Senior Site Reliability Engineer (SRE)

Posted 2024-11-07

💎 Seniority level: Senior, 5+ years

📍 Location: US, Portugal

🔍 Industry: Health Technology

🗣️ Languages: English

⏳ Experience: 5+ years

🪄 Skills: AWS, Docker, PostgreSQL, Python, Elasticsearch, JavaScript, Jenkins, Kubernetes, MySQL, Azure, Go, Grafana, Prometheus, Redis, NoSQL, CI/CD

Requirements:
  • Proficiency in programming languages such as Python, Go, or JavaScript.
  • 5+ years of experience with cloud platforms such as AWS, Google Cloud, or Azure.
  • Strong understanding of Linux/Unix systems and networking.
  • Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
  • Knowledge of CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).
  • Proficiency with relational and NoSQL databases (e.g., MySQL, PostgreSQL, Redis, Elasticsearch).
  • Willingness to collaborate and share knowledge with colleagues.
  • Ability to take responsibility for work and demonstrate accountability.
Responsibilities:
  • Develop and maintain monitoring and alerting solutions.
  • Respond to incidents, troubleshoot issues, and perform root cause analysis.
  • Automate repetitive tasks and improve deployment processes.
  • Develop and maintain tools to support infrastructure and applications.
  • Analyze system performance and implement optimizations to improve efficiency and reduce latency.
  • Ensure systems are secure and compliant with relevant standards and regulations.
  • Maintain comprehensive documentation of systems and processes.
  • Share knowledge and best practices with team members.
  • Ensure the reliability, performance, and scalability of databases.
  • Perform database optimization, maintenance, and troubleshooting.

Related Jobs

📍 US

🧭 Full-Time

💸 198,000 - 220,000 USD per year

🔍 Blockchain, Cryptocurrency

🏢 Company: Uniswap Labs

Requirements:
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • 5+ years of experience in site reliability engineering, DevOps, or related fields.
  • Strong understanding of reliability engineering principles and tools.
  • Proficiency in monitoring tools such as Prometheus, Grafana, and Nagios.
  • Experience with cloud platforms (AWS, Azure, GCP) and with containerization and orchestration tools (Docker, Kubernetes).
  • Proficiency in scripting and automation tools such as Python, Bash, Ansible, or Terraform.

Responsibilities:
  • Design, implement, and maintain systems for reliability, availability, and performance of services.
  • Develop and manage monitoring, alerting, and incident response strategies.
  • Conduct root cause analysis of failures.
  • Collaborate with cross-functional teams on reliability practices.
  • Drive improvements and innovations in systems and processes.

🪄 Skills: AWS, Docker, Python, Bash, GCP, Kubernetes, Azure, Grafana, Prometheus, Collaboration, CI/CD, DevOps

Posted 2024-11-07

📍 America

🧭 Contract

🔍 Digital paper solutions and learning ecosystem

🏢 Company: Goodnotes

Requirements:
  • Strong experience working in AWS-hosted environments.
  • Experience supporting production workloads and firefighting.
  • Knowledge of SRE best practices and common issues.
  • Proficient with system monitoring tools.
  • Understanding and experience with distributed databases.
  • Background in Linux and networking fundamentals.
  • Experience in back-end development, including API usage and creation.
  • Knowledge of network and container security.
  • Understanding of container orchestration, especially Kubernetes.
  • Experience managing relational and non-relational databases, including backup and restore operations.
  • Familiarity with automation/configuration management tools, preferably CDK and/or Terraform.

Responsibilities:
  • Design, build, and maintain the Goodnotes infrastructure according to Dickerson’s Hierarchy of Reliability.
  • Refine and execute new and existing playbooks.
  • Educate teams on SRE best practices including design and capacity planning.
  • Act as a higher-level escalation point for applications.
  • Reduce latency and error rates, and improve SLAs.
  • Enhance system monitoring, health reporting, and logging.
  • Implement security practices and maintain information security.
  • Participate in the on-call rotation during Americas time zone hours.

🪄 Skills: Linux

Posted 2024-11-07