Site Reliability Engineer (SRE)

Posted about 21 hours ago

💎 Seniority level: Senior, 5+ years

📍 Location: California, Florida, Georgia, Idaho, Illinois, Massachusetts, Colorado, New Jersey, New York, Oregon, Pennsylvania, Texas, Vermont, Virginia, Washington, EST

💸 Salary: 142,500 - 180,000 USD per year

🔍 Industry: Software Development

🗣️ Languages: English

⏳ Experience: 5+ years

🪄 Skills: AWS, Docker, PostgreSQL, Python, SQL, Kubernetes, RabbitMQ, Grafana, Redis, CI/CD, Linux, Terraform, Microservices

Requirements:

5+ years of experience as a DevOps engineer or SRE.
Proficiency with AWS, Docker, and Kubernetes.
Experience with infrastructure as code (Terraform).

Responsibilities:

Architect and maintain services for continuous operation and compliance with SLAs.
Establish and manage a platform for product teams to deploy and monitor services.
Lead initiatives to improve operational processes and systems.

Related Jobs

🔥 Site Reliability Engineer (SRE)

Posted 1 day ago

📍 California, Florida, Georgia, Idaho, Illinois, Massachusetts, Colorado, New Jersey, New York, Oregon, Pennsylvania, Texas, Vermont, Virginia, Washington

🧭 Full-Time

💸 142,500 - 180,000 USD per year

🔍 Software Development

🏢 Company: Veriff

👥 501-1000

💰 $100,000,000 Series C about 3 years ago

🫂 Last layoff over 1 year ago

Artificial Intelligence (AI), Fraud Detection, Information Technology, Cyber Security, Identity Management

🔧 Requirements

5+ years of experience as a DevOps engineer or SRE
Strong knowledge of AWS, Docker, and Kubernetes
Proficient in infrastructure as code (Terraform)
Understanding of SRE principles for reliability and scalability
Experience with Linux, SQL/NoSQL databases, and microservices

💡 Responsibilities

Architect and maintain services for continuous operation
Establish and manage a platform for service deployment
Lead initiatives for operational excellence and process improvement
Ensure transparent communication and conduct postmortems
Develop and enhance CI/CD pipelines
Implement SRE best practices for monitoring and security

AWS, Docker, PostgreSQL, Python, RabbitMQ, Grafana, Redis, CI/CD, Linux, Terraform, Microservices

🔥 Senior Site Reliability Engineer (SRE) - Disaster Recovery Specialist (m/f/x)

Posted 3 months ago

📍 United States, Canada

🧭 Full-Time

🔍 Software Development

🔧 Requirements

Degree in Computer Science or related field
5+ years experience in site reliability engineering
Proficiency in AWS, Azure, or Google Cloud
Experience with IaC tools like Terraform or CloudFormation

💡 Responsibilities

Develop and document disaster recovery plans and procedures
Collaborate with teams to identify and mitigate risks
Monitor system performance and enhance reliability

AWS, Azure, Terraform

🔥 Senior Site Reliability Engineer (SRE)

Posted 3 months ago

📍 United States, Canada

🧭 Contract

🔍 Site Reliability Engineering

🔧 Requirements

5-7 years in Site Reliability Engineering
Experience with DFR, FMEA, MTBF methodologies
Proficiency with monitoring tools like DataDog, PagerDuty
Strong coding skills in languages used in SRE

💡 Responsibilities

Identify and resolve complex bugs
Write and maintain code for system reliability
Investigate complex system issues
Design and build fault-tolerant systems
Develop and maintain reliability standards

Python, Debugging

🔥 Senior Site Reliability Engineer (SRE)

Posted 4 months ago

📍 US, Portugal

🧭 Full-Time

🔍 Health Technology

🔧 Requirements

Proficiency in programming languages such as Python, Go, or JavaScript.
5+ years of experience with cloud platforms such as AWS, Google Cloud, or Azure.
Strong understanding of Linux/Unix systems and networking.
Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
Knowledge of CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).
Proficiency with relational and NoSQL databases (e.g., MySQL, PostgreSQL, Redis, Elasticsearch).
Willingness to collaborate and share knowledge with colleagues.
Ability to take responsibility for work and demonstrate accountability.

💡 Responsibilities

Develop and maintain monitoring and alerting solutions.
Respond to incidents, troubleshoot issues, and perform root cause analysis.
Automate repetitive tasks and improve deployment processes.
Develop and maintain tools to support infrastructure and applications.
Analyze system performance and implement optimizations to improve efficiency and reduce latency.
Ensure systems are secure and compliant with relevant standards and regulations.
Maintain comprehensive documentation of systems and processes.
Share knowledge and best practices with team members.
Ensure the reliability, performance, and scalability of databases.
Perform database optimization, maintenance, and troubleshooting.

AWS, Docker, PostgreSQL, Python, Elasticsearch, JavaScript, Jenkins, Kubernetes, MySQL, Azure, Go, Grafana, Prometheus, Redis, NoSQL, CI/CD
