Staff Cloud Platform Engineer - Core Infra

Posted about 1 month ago

💎 Seniority level: Staff, 8+ years

📍 Location: USA

🔍 Industry: Software Development

🏢 Company: Sift | 👥 251-500 | 💰 Secondary Market about 3 years ago | Fraud Detection, Big Data, Predictive Analytics, Analytics, Network Security

🗣️ Languages: English

⏳ Experience: 8+ years

🪄 Skills: AWS, Docker, Python, SQL, Cloud Computing, GCP, Java, Jenkins, Kafka, Kubernetes, Ruby, Ruby on Rails, Snowflake, Airflow, Algorithms, Data engineering, Data Structures, REST API, Spark, CI/CD, Problem Solving, RESTful APIs, Linux, DevOps, Terraform, Microservices, Scala, Data modeling, Scripting, Data management, Debugging

Requirements:
  • 8+ years of experience as a Software Engineer focused on infrastructure/platform services or in a Site Reliability Engineering (SRE) role.
  • Strong programming skills in languages such as Java, Scala, or Python.
  • Experience designing and implementing distributed systems.
  • Experience building and managing cloud infrastructure on AWS or GCP.
  • Expertise in building infrastructure as code and automating provisioning processes using tools like CloudFormation or Terraform.
  • Proficiency in setting up and managing monitoring and alerting systems, both open-source and commercial.
  • Familiarity with Docker and container orchestration technologies like Kubernetes, GKE, or AWS ECS.
  • Strong experience troubleshooting and resolving production system issues, with a focus on building automated solutions to prevent future occurrences.
  • Proven expertise in automation and a solid understanding of configuration management tools.
Responsibilities:
  • Own the availability, performance, and scalability of Sift’s primary online storage systems and infrastructure.
  • Design and build immutable infrastructure and fault-tolerant, multi-AZ/multi-region systems that are resilient and self-healing.
  • Design and implement multi-region deployments, such as BigTable clusters spanning multiple regions, with strategies to ensure specific customers are routed to designated regions (e.g., sticky sessions at the regional level).
  • Solve complex problems that arise from our unique data volume and request rate, which may involve digging deep into data store and messaging internals.
  • Optimize local development and testing workflows to be fast, efficient, and seamless.
  • Design and implement services and libraries for components to interact with data stores, the messaging layer, and the services platform.
  • Develop tools for monitoring, detecting faults, and automatically repairing distributed systems.
  • Provide design support to internal engineering teams for optimal usage of data stores, data growth planning, production workload optimization, messaging, caching, and the service platform.
  • Participate in on-call support and incident response activities, providing 12/7 coverage for one calendar week approximately once every 3-4 weeks.