Staff Cloud Platform Engineer - Core Infra
Posted about 1 month ago
💎 Seniority level: Staff, 8+ years
📍 Location: USA
🔍 Industry: Software Development
🏢 Company: Sift 👥 251-500 💰 Secondary Market about 3 years ago | Fraud Detection, Big Data, Predictive Analytics, Analytics, Network Security
🗣️ Languages: English
⏳ Experience: 8+ years
🪄 Skills: AWS, Docker, Python, SQL, Cloud Computing, GCP, Java, Jenkins, Kafka, Kubernetes, Ruby, Ruby on Rails, Snowflake, Airflow, Algorithms, Data engineering, Data Structures, REST API, Spark, CI/CD, Problem Solving, RESTful APIs, Linux, DevOps, Terraform, Microservices, Scala, Data modeling, Scripting, Data management, Debugging
Requirements:
- 8+ years of experience as a Software Engineer focused on infrastructure/platform services or in a Site Reliability Engineering (SRE) role.
- Strong programming skills in languages such as Java, Scala, or Python.
- Experience designing and implementing distributed systems.
- Experience building and managing cloud infrastructure on AWS or GCP.
- Expertise in building infrastructure as code and automating provisioning processes using tools like CloudFormation or Terraform.
- Proficiency in setting up and managing monitoring and alerting systems, both open-source and commercial.
- Familiarity with Docker and container orchestration technologies like Kubernetes, GKE, or AWS ECS.
- Strong experience troubleshooting and resolving production system issues, with a focus on building automated solutions to prevent future occurrences.
- Proven expertise in automation and a solid understanding of configuration management tools.
Responsibilities:
- Own the availability, performance, and scalability of Sift’s primary online storage systems and infrastructure.
- Design and build immutable infrastructure and fault-tolerant, multi-AZ/multi-region systems that are resilient and self-healing.
- Design and implement multi-region deployments, such as BigTable clusters spanning multiple regions, with strategies to ensure specific customers are routed to designated regions (e.g., sticky sessions at the regional level).
- Solve complex problems that arise from our unique data volume and request rate, which may involve digging deep into data store and messaging internals.
- Optimize local development and testing workflows to be fast, efficient, and seamless.
- Design and implement services and libraries for components to interact with data stores, the messaging layer, and the services platform.
- Develop tools for monitoring, detecting faults, and automatically repairing distributed systems.
- Provide design support to internal engineering teams for optimal usage of data stores, data growth planning, production workload optimization, messaging, caching, and the service platform.
- Participate in on-call support and incident response activities, providing 12/7 coverage for one calendar week approximately once every 3-4 weeks.