Site Reliability Engineer - Data Platform

Posted 5 months ago
United States | Full-Time | Crypto
Company: Kraken
Location: United States
Languages: English
Seniority level: Senior, 5+ years
Experience: 5+ years
Skills:
AWS, Docker, Python, SQL, Apache Airflow, Bash, Kafka, Kubernetes, CI/CD, Linux, Terraform
Requirements:
  • 5+ years experience as a Site Reliability Engineer, Infrastructure Engineer, Data Infrastructure Engineer, or similar roles, with a focus on data infrastructure and security.
  • Experience maintaining real-time data processing technologies (Kafka, Flink, Debezium).
  • Working experience managing hybrid multi-tenant cloud systems, particularly on AWS.
  • Experience with Infrastructure as Code tools (Terraform, Terragrunt, Atlantis).
  • Experience with containerization and orchestration tools (Kubernetes, Nomad, Docker).
  • Solid understanding of bash/shell scripting.
  • Proficiency in at least one programming language (preferably Python or JVM languages).
  • Experience maintaining data-related technologies (Apache Airflow, Apache Spark, DBs, BI tooling).
  • Experience solving data access management issues in a large-scale data lake.
  • Familiarity with CI/CD deployment pipelines and related tools.
  • Strong problem-solving skills.
  • Experience with data-related technologies (databases, data lakes, Airflow, Spark) is a plus.
Responsibilities:
  • Design data governance mechanisms for lakehouse interaction, security, and compliance.
  • Implement infrastructure for data ingestion, storage, cataloging, and lineage.
  • Provide a state-of-the-art suite of BI tools.
  • Guarantee availability, high performance, scalability, and cost efficiency of the data platform.
  • Implement data infrastructure solutions supporting multiple business units.
  • Utilize Infrastructure as Code (IaC) with Terraform.
  • Develop automation scripts using bash/shell scripting.
  • Enhance and manage CI/CD pipelines.
  • Implement robust data monitoring and alerting solutions.
  • Manage role-based access control (RBAC).
  • Maintain real-time streaming data architecture (Kafka, Debezium).
  • Ensure timely and accurate processing of streaming data.
  • Utilize Kubernetes for containerized application management.
  • Implement incident response procedures and participate in on-call rotations.
  • Collaborate with teams to understand requirements and implement solutions.
  • Document architecture, processes, and best practices.
  • Support AI/ML teams with infrastructure requests.
About the Company
Kraken
1001-5000 employees | Ethereum