Site Reliability Engineer (SRE)

Posted about 21 hours ago

💎 Seniority level: Senior, 5+ years

📍 Location: California, Florida, Georgia, Idaho, Illinois, Massachusetts, Colorado, New Jersey, New York, Oregon, Pennsylvania, Texas, Vermont, Virginia, Washington, EST

💸 Salary: 142,500 - 180,000 USD per year

πŸ” Industry: Software Development

πŸ—£οΈ Languages: English

⏳ Experience: 5+ years

🪄 Skills: AWS, Docker, PostgreSQL, Python, SQL, Kubernetes, RabbitMQ, Grafana, Redis, CI/CD, Linux, Terraform, Microservices

Requirements:
  • 5+ years of experience as a DevOps engineer or SRE.
  • Proficiency with AWS, Docker, and Kubernetes.
  • Experience with infrastructure as code (Terraform).
Responsibilities:
  • Architect and maintain services for continuous operation and compliance with SLAs.
  • Establish and manage a platform for product teams to deploy and monitor services.
  • Lead initiatives to improve operational processes and systems.

Related Jobs

πŸ“ California, Florida, Georgia, Idaho, Illinois, Massachusetts, Colorado, New Jersey, New York, Oregon, Pennsylvania, Texas, Vermont, Virginia, Washington

🧭 Full-Time

💸 142,500 - 180,000 USD per year

🔍 Software Development

🏒 Company: Veriff

👥 501-1000

💰 $100,000,000 Series C about 3 years ago

🫂 Last layoff over 1 year ago

Artificial Intelligence (AI), Fraud Detection, Information Technology, Cyber Security, Identity Management

Requirements:
  • 5+ years of experience as a DevOps engineer or SRE
  • Strong knowledge of AWS, Docker, and Kubernetes
  • Proficient in infrastructure as code (Terraform)
  • Understanding of SRE principles for reliability and scalability
  • Experience with Linux, SQL/NoSQL databases, and microservices
Responsibilities:
  • Architect and maintain services for continuous operation
  • Establish and manage a platform for service deployment
  • Lead initiatives for operational excellence and process improvement
  • Ensure transparent communication and conduct postmortems
  • Develop and enhance CI/CD pipelines
  • Implement SRE best practices for monitoring and security

AWS, Docker, PostgreSQL, Python, RabbitMQ, Grafana, Redis, CI/CD, Linux, Terraform, Microservices

Posted 1 day ago

πŸ“ United States, Canada

🧭 Full-Time

πŸ” Software Development

Requirements:
  • Degree in Computer Science or related field
  • 5+ years of experience in site reliability engineering
  • Proficiency in AWS, Azure, or Google Cloud
  • Experience with IaC tools like Terraform or CloudFormation
Responsibilities:
  • Develop and document disaster recovery plans and procedures
  • Collaborate with teams to identify and mitigate risks
  • Monitor system performance and enhance reliability

AWS, Azure, Terraform

Posted 3 months ago

πŸ“ United States, Canada

🧭 Contract

πŸ” Site Reliability Engineering

Requirements:
  • 5-7 years of experience in site reliability engineering
  • Experience with DFR, FMEA, and MTBF methodologies
  • Proficiency with monitoring tools like Datadog and PagerDuty
  • Strong coding skills in languages used in SRE
Responsibilities:
  • Identify and resolve complex bugs
  • Write and maintain code for system reliability
  • Investigate complex system issues
  • Design and build fault-tolerant systems
  • Develop and maintain reliability standards

Python, Debugging

Posted 3 months ago

πŸ“ US, Portugal

🧭 Full-Time

πŸ” Health Technology

Requirements:
  • Proficiency in programming languages such as Python, Go, and JavaScript.
  • 5+ years of experience with cloud platforms such as AWS, Google Cloud, or Azure.
  • Strong understanding of Linux/Unix systems and networking.
  • Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
  • Knowledge of CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).
  • Proficiency with relational and NoSQL databases (e.g., MySQL, PostgreSQL, Redis, Elasticsearch).
  • Willingness to collaborate and share knowledge with colleagues.
  • Ability to take responsibility for work and demonstrate accountability.
Responsibilities:
  • Develop and maintain monitoring and alerting solutions.
  • Respond to incidents, troubleshoot issues, and perform root cause analysis.
  • Automate repetitive tasks and improve deployment processes.
  • Develop and maintain tools to support infrastructure and applications.
  • Analyze system performance and implement optimizations to improve efficiency and reduce latency.
  • Ensure systems are secure and compliant with relevant standards and regulations.
  • Maintain comprehensive documentation of systems and processes.
  • Share knowledge and best practices with team members.
  • Ensure the reliability, performance, and scalability of databases.
  • Perform database optimization, maintenance, and troubleshooting.

AWS, Docker, PostgreSQL, Python, Elasticsearch, JavaScript, Jenkins, Kubernetes, MySQL, Azure, Go, Grafana, Prometheus, Redis, NoSQL, CI/CD

Posted 4 months ago