Senior Site Reliability Engineer

Posted 2024-10-05

📍 Location: United States

💸 Salary: $161,000 - $180,000 per year

🔍 Industry: Adult entertainment

🏢 Company: Multi Media LLC

🗣️ Languages: English

🪄 Skills: Docker, Python, Java, Kubernetes, Terraform

Requirements:

STEM degree and relevant experience as a Site Reliability Engineer
Exceptional problem-solving skills
High proficiency in one of the following: C, C++, Java, Python, Go, etc.
High proficiency in Unix/Linux environment, excellent knowledge of internals (e.g., filesystems, system calls)
Networking knowledge (e.g., routing, switching, TCP stack) for both metal and cloud (VPC, Security Groups) environments
Experience in database administration and configuration
Experience with DevOps tools such as Terraform, Ansible, Docker, Kubernetes
On-call availability to respond to monitoring and alerting for core website functions as needed

Responsibilities:

Performance analysis to identify sources of instability using data from APM and distributed telemetry tools
Analysis of complex systems to identify operational surprises and minimize downtime
Software engineering and patching to incrementally improve performance, scalability, and reliability
Infrastructure modifications in both a data center metal environment with advanced routing/switching and in the public cloud
Predictive failure analysis and disaster planning
Author new tools and automation to streamline the DevOps pipeline
Collaborate with other engineering teams
Database and key-value store administration and configuration with a focus on uptime and performance
Incident response and postmortem reports

Related Jobs

🔥 Senior Site Reliability Engineer

Posted 2024-11-21

📍 United States

🧭 Full-Time

🔍 Legal technology

🏢 Company: Ramp Talent

🔧 Requirements

Curiosity, willingness to learn, and passion for continuous improvement.
Proficiency in all skills expected of SRE IIs.
Bachelor's degree in computer science, information systems, or a related field; comparable certifications; or equivalent direct work experience.
A minimum of 8 years of experience in hands-on technical roles.
A minimum of 2 years of Site Reliability Engineering experience.
Experience building autonomous systems that manage software operational details without human intervention.

💡 Responsibilities

Developing autonomous systems that manage the details necessary to build, deploy, test, and operate all Filevine Inc. products.
Being the voice of Reliability on your team throughout the SDLC.
Collecting, monitoring, aggregating, dashboarding, and alerting on software and server events.
Improving the CI/CD pipeline.
Developing playbooks, tools, and scripts to streamline processes and shorten problem resolution time.
Identifying and fixing gaps in the availability of systems.
Improving and defending the security of software and systems.
Documenting and diagramming processes, procedures, and best practices.
Finding, learning, improving, or creating new tools that are reliable, usable, and helpful.
Mentoring, training, and reviewing more junior engineers.
Participating in an on-call rotation for 24/7 production reliability support.

🪄 Skills: Leadership, CI/CD, Mentoring

🔥 Senior Site Reliability Engineer, Databases

Posted 2024-11-20

📍 USA

💸 170000 - 190000 USD per year

🔍 Email Security

🏢 Company: Valimail

🔧 Requirements

5+ years experience building and maintaining highly available relational databases.
Work collaboratively with cross-functional teams
Value team success over individual success
Apply industry and engineering best practices and promote them to others
Passion for reliable, scalable, and performant datastores with a strong sense of ownership
Experience building and supporting highly performant and highly reliable datastores
Deep experience working with Postgres
Expert in database fundamentals, SQL, PL/pgSQL (or other)
Experience with NoSQL datastores and caching solutions
Working knowledge of AWS or Azure cloud providers
Experience with Infrastructure-as-Code tools, such as Terraform

💡 Responsibilities

Evangelizing standard methodologies for building and operating highly reliable data storage systems
Serving as the subject matter expert in datastore design and performance
Building and supporting Valimail’s mission-critical datastores
Conducting timely postmortems of production datastore incidents
Collaboratively designing systems with other engineers to meet reliability, scalability, and performance requirements
Providing assistance to teams working with datastores
Automating routine database tasks
Participating in on-call rotation and incident response
Upgrading data storage systems as necessary

🪄 Skills: AWS, SQL, Azure, Postgres, NoSQL, Terraform

🔥 Senior Site Reliability Engineer

Posted 2024-11-16

📍 U.S.

🧭 Full-Time

💸 140000 - 160000 USD per year

🔍 Cybersecurity / Open source software

🔧 Requirements

Sense of curiosity, resourcefulness, and pragmatism.
Expertise with multi-region deployments in public cloud environments.
Demonstrable production Kubernetes experience (Managed Kubernetes, Helm, kubectl, kOps, etc.).
Strong background in Reliability Engineering, DevOps, Software Engineering.
Fluency with at least one programming language, such as C#, Python, or Go.
Experience with cloud deployment and automation tools/methodologies (e.g., GitOps, Terraform, Pulumi).
Proficiency using source control such as Git.
Ability to maintain discretion and handle sensitive information.
Staying current with trends and new technologies.
Collaborative and adaptable mindset.
Excellent communication skills.
Strong problem-solving skills.

💡 Responsibilities

Take ownership of the Bitwarden cloud infrastructure, focusing on quality.
Evaluate infrastructure regularly, making recommendations for reliability, security, availability, scalability, and cost management.
Implement site reliability tools and observability systems.
Respond to outages and participate in a 24x7 support strategy.
Contribute to architectural designs and engineering operations at scale.
Engage in code reviews and spread technical knowledge.
Contribute to incident management processes.
Collaborate with teams to refine priorities and deliverables.
Align SLIs, SLOs, and SLAs with product owners.
Identify opportunities for new initiatives.
Influence the SDLC as Bitwarden scales.
Mentor team members.

🪄 Skills: Python, Git, Kubernetes, C#, Strategy, Go, Communication Skills, DevOps, Terraform

🔥 Senior Site Reliability Engineer - US/Canada

Posted 2024-11-09

📍 United States, Canada

🧭 Full-Time

🔍 Security and fraud detection

🏢 Company: DataVisor

🔧 Requirements

5+ years of experience with production environments running Linux.
3+ years of experience with cloud solutions such as AWS, Azure, or Aliyun.
Familiarity with big data technologies such as Spark and/or Flink.
Passion for automating tasks through coding and scripting.
Experience with algorithms, data structures, complexity analysis, and software design.
Proficient coding skills in Python, Java, and Bash.

💡 Responsibilities

Design, implement, and maintain release automation pipelines to streamline the deployment process.
Develop systems for proactive monitoring, auto-diagnosis, and incident resolution in production environments.
Work with big data platforms such as Apache Spark or Apache Flink, optimizing and scaling data processing pipelines.
Perform maintenance and troubleshooting for databases, preferably Yugabyte, ClickHouse, and MySQL.
Ensure the reliability of cloud infrastructure using Kubernetes on AWS or GCP.
Participate in on-call rotation for system reliability, focusing on automation to minimize manual intervention.
Collaborate with engineering teams to enhance system performance and manage capacity planning.

🪄 Skills: Linux

🔥 Senior Site Reliability Engineer (SRE)

Posted 2024-11-07

📍 US

🧭 Full-Time

💸 198000 - 220000 USD per year

🔍 Blockchain, Cryptocurrency

🏢 Company: Uniswap Labs

🔧 Requirements

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
5+ years of experience in site reliability engineering, DevOps, or related fields.
Strong understanding of reliability engineering principles and tools.
Proficiency in monitoring tools like Prometheus, Grafana, Nagios.
Experience with cloud platforms (AWS, Azure, GCP) and container orchestration systems (Kubernetes, Docker).
Proficiency in scripting tools such as Python, Bash, Ansible, or Terraform.

💡 Responsibilities

Design, implement, and maintain systems for reliability, availability, and performance of services.
Develop and manage monitoring, alerting, and incident response strategies.
Conduct root cause analysis of failures.
Collaborate with cross-functional teams on reliability practices.
Drive improvements and innovations in systems and processes.

🪄 Skills: AWS, Docker, Python, Bash, GCP, Kubernetes, Azure, Grafana, Prometheus, Collaboration, CI/CD, DevOps

🔥 Senior Site Reliability Engineer II

Posted 2024-11-07

📍 United States

🧭 Full-Time

💸 150000 - 230000 USD per year

🔍 Public safety technology

🏢 Company: Axon

🔧 Requirements

This position involves handling of classified federal data; under federal regulations, it is open to US Citizens only.
10+ years of applicable experience.
Experience managing cloud platforms such as Azure, AWS, or similar.
Experience operating in Kubernetes platforms like AKS, EKS, or similar.
Experience using managed languages such as Python, Go, C#, Java, or similar.
Experience utilizing CI/CD platforms to automate provisioning infrastructure, software builds, tests, and releases.
Experience using observability tools such as APM, logging, and metrics to assist with debugging issues.
Experience designing tooling to simplify the operational management of SaaS/PaaS systems.
Familiarity with building flexible and testable Infrastructure as Code modules.
Empathy to support the needs of software engineers.

💡 Responsibilities

Build robust, easy-to-use foundational platforms and tools that enable engineering teams to provision services rapidly, consistently, and securely.
Exemplify cloud-native site reliability best practices.
Write code that is performant, maintainable, clear, and concise.
Employ strong problem-solving skills to debug problems in cloud-native distributed systems.
Influence and educate the engineering organization to adopt new and improved architectural patterns.
Provide robust documentation for use by engineers to promote self-service.
Take calculated risks, champion new ideas, and cultivate your craft.

🪄 Skills: AWS, Python, Java, Kubernetes, C#, Azure, Go, CI/CD

🔥 Senior Site Reliability Engineer - SRE - 12 months rolling contract

Posted 2024-11-07

📍 America

🧭 Contract

🔍 Digital paper solutions and learning ecosystem

🏢 Company: Goodnotes

🔧 Requirements

Strong experience working in AWS-hosted environments.
Experience supporting production workloads and firefighting.
Knowledge of SRE best practices and common issues.
Proficient with system monitoring tools.
Understanding and experience with distributed databases.
Background in Linux and Networking fundamentals.
Experience in back-end development, including API usage and creation.
Knowledge of Security for networks and containers.
Understanding of container orchestration, especially Kubernetes.
Experience managing relational and non-relational databases, including backup and restore operations.
Familiarity with automation/configuration management tools, preferably CDK and/or Terraform.

💡 Responsibilities

Design, build, and maintain the Goodnotes infrastructure according to Dickerson’s Hierarchy of Reliability.
Refine and execute new and existing playbooks.
Educate teams on SRE best practices including design and capacity planning.
Act as a higher-level escalation point for applications.
Optimize latency and error rates and improve SLAs.
Enhance system monitoring, health reporting, and logging.
Implement security practices and maintain information security.
Participate in the on-call rotation during Americas time zone hours.

🪄 Skills: Linux

🔥 Senior Site Reliability Engineer (SRE)

Posted 2024-11-07

📍 US, Portugal

🧭 Full-Time

🔍 Health Technology

🔧 Requirements

Proficiency in programming languages such as Python, Go, and JavaScript.
5+ years of experience with cloud platforms such as AWS, Google Cloud, or Azure.
Strong understanding of Linux/Unix systems and networking.
Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
Knowledge of CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).
Proficiency with relational and NoSQL databases (e.g., MySQL, PostgreSQL, Redis, Elasticsearch).
Willingness to collaborate and share knowledge with colleagues.
Ability to take responsibility for work and demonstrate accountability.

💡 Responsibilities

Develop and maintain monitoring and alerting solutions.
Respond to incidents, troubleshoot issues, and perform root cause analysis.
Automate repetitive tasks and improve deployment processes.
Develop and maintain tools to support infrastructure and applications.
Analyze system performance and implement optimizations to improve efficiency and reduce latency.
Ensure systems are secure and compliant with relevant standards and regulations.
Maintain comprehensive documentation of systems and processes.
Share knowledge and best practices with team members.
Ensure the reliability, performance, and scalability of databases.
Perform database optimization, maintenance, and troubleshooting.

🪄 Skills: AWS, Docker, PostgreSQL, Python, Elasticsearch, JavaScript, Jenkins, Kubernetes, MySQL, Azure, Go, Grafana, Prometheus, Redis, NoSQL, CI/CD

🔥 Senior Site Reliability Engineer

Posted 2024-10-21

📍 AL, AZ, CA, CO, CT, FL, GA, ID, IL, IN, IA, KY, ME, MD, MA, MI, MN, MO, NV, NJ, NY, NC, OH, OR, PA, TN, TX, VA, WA, WI

🧭 Full-Time

💸 110000 - 135000 USD per year

🔍 Childcare software

🏢 Company: Procare Solutions

🔧 Requirements

Minimum 5 years of hands-on experience with AWS services including EC2, S3, RDS, Lambda, ECS/EKS.
Deep knowledge and extensive experience with Linux operating systems, including system administration and troubleshooting.
Familiarity with common SRE-related tools such as Kubernetes, Docker, Prometheus, Grafana, and the ELK stack.
Proficiency in infrastructure as code (IaC) tools like Terraform, Ansible, and CloudFormation.
Experience with monitoring solutions, including metrics setup and creating alerts.
Strong understanding of networking concepts, including DNS, load balancing, and firewalls.
Proficiency in at least one programming or scripting language such as Python, Go, or Bash.
Excellent problem-solving skills with a proactive and analytical approach.
Strong written and verbal communication skills, with the ability to collaborate effectively.
Experience in DevOps engineering, including CI/CD practices and tools.

💡 Responsibilities

Design, implement, and maintain scalable, reliable, and secure AWS infrastructure using best practices.
Develop and maintain monitoring, logging, and alerting solutions to ensure system health and performance.
Automate infrastructure provisioning, configuration, and deployment processes using tools like Terraform and Ansible.
Respond to production incidents, conduct root cause analysis, and implement corrective measures.
Continuously analyze system performance and implement tuning improvements.
Ensure systems comply with security best practices and manage IAM roles and policies.
Collaborate with development teams on reliability integration into the software development lifecycle.
Maintain comprehensive documentation of infrastructure and processes.

🪄 Skills: AWS, Docker, Python, Bash, Elasticsearch, Jenkins, Kibana, Kubernetes, Go, Grafana, Prometheus, Communication Skills, Collaboration, CI/CD, Problem Solving

🔥 Senior Site Reliability Engineer

Posted 2024-10-21

📍 CA, CO, CT, FL, GA, IL, IN, KY, MA, MI, MN, NC, NJ, NY, OH, OR, PA, SC, TN, TX, UT, VA, WA, WI

💸 145000 - 175000 USD per year

🔍 Benefits and employee experience

🏢 Company: Jellyvision

🔧 Requirements

Demonstrated experience with cloud computing platforms, particularly AWS.
Proficient in programming languages including Ruby, Python, and JavaScript.
Experienced with configuration management tools such as Ansible, Packer, CloudFormation, and a strong emphasis on Terraform.
Skilled in container technologies and orchestration tools like Docker, ECS, and Kubernetes.
Experience with continuous integration tools such as GitLab, GitHub, and Jenkins.
Knowledge of best practices for monitoring and alerting to ensure system reliability.
Exceptional communication skills with various stakeholders.
Strong data-driven decision-making capabilities.

💡 Responsibilities

Design applications by advising development teams on best practices and architecting solutions for optimal performance.
Optimize CI/CD pipelines through strategic guidance, minimizing manual tasks, and enhancing operational efficiency.
Monitor systems by efficiently resolving alerts, participating in on-call rotations, and supporting application management.
Mentor team members by providing guidance, seeking continuous learning opportunities, and giving constructive feedback.

🪄 Skills: AWS, Docker, Python, Cloud Computing, JavaScript, Jenkins, Kubernetes, Ruby, Communication Skills, Collaboration, CI/CD
