Site Reliability Engineer (SRE)

Posted 2024-11-21

Location: Portugal

Industry: Vertical AI SaaS solutions

Company: Intapp

Skills: Python, SQL, Agile, Jenkins, JVM, Product Management, Azure, Go, Postgres, NoSQL, Collaboration, CI/CD, DevOps

Requirements:
  • Hands-on experience in building fault-tolerant and scalable systems.
  • Experience with database technologies such as SQL Server, Postgres, and NoSQL.
  • Expertise in Configuration Management and CI/CD tools like Ansible, Jenkins, and Azure DevOps.
  • Hands-on experience building and running production workloads on Azure.
  • Strong scripting abilities in languages like Python, Perl, Go, or JVM-based languages.
  • Solid understanding of continuous integration, deployment, and operations concepts.
  • Production experience managing Windows infrastructure running IIS workloads.
  • Passion for resolving reliability issues and automating processes.
Responsibilities:
  • Work with Development and Product Management to design and deliver new functionality.
  • Perform deep dives into systemic and latent reliability issues while collaborating with software engineers.
  • Drive standardization efforts across multiple disciplines and services with SREs.
  • Identify and drive opportunities to improve automation for deployment and management of services.
  • Work in an agile operations framework, balancing sprint-based work with daily operations needs.
  • Participate in a 24x7 on-call rotation.

Related Jobs

Location: Portugal

Industry: Vertical AI SaaS solutions

Company: Intapp

  • Hands-on experience in building fault-tolerant and scalable systems.
  • Experience with different database technologies such as SQL Server, Postgres, and NoSQL.
  • Expertise in Configuration Management and CI/CD tools such as Ansible, Jenkins, and Azure DevOps.
  • Hands-on experience building and running production workloads on Azure.
  • Strong scripting abilities in Python, Perl, Go, or JVM-based languages.
  • Solid understanding of continuous integration, deployment and operations concepts.
  • Production experience managing Windows infrastructure running IIS workloads.
  • Passion for resolving reliability issues and developing strategies to mitigate future ones.
  • Automation mindset: if you can automate it, do it.

  • Work with Development and Product Management to design and deliver new functionality.
  • Perform deep dives into both systemic and latent reliability issues; partner with software engineers across the organization to produce and roll out fixes.
  • Drive standardization efforts across multiple disciplines and services in conjunction with SREs throughout the organization.
  • Identify and drive opportunities to improve automation for the company; scope and create automation for deployment, management and visibility of our services.
  • Work in an agile operations framework, balancing sprint-based work with daily operations needs.
  • Participate in a 24x7 on-call rotation with 12-hour shifts.

Skills: Python, SQL, Agile, Jenkins, JVM, Azure, Go, Postgres, NoSQL, Collaboration, CI/CD, DevOps

Posted 2024-11-21

Location: Germany and within Europe

Employment type: Full-Time

Industry: Technology / Employee Communication

Company: Flip App

  • Experience in operating and scaling cloud infrastructures (Azure, AWS, GCP).
  • Deep knowledge of Kubernetes and container solutions.
  • Interest in observability tools such as Prometheus, VictoriaMetrics, Mimir, Loki, ELK.
  • Familiarity with SLO, error budget, and Apdex.
  • Good knowledge of software development languages like Go, Python, Kotlin.
  • Business-level fluency in English; German is a plus.
  • Experience with infrastructure as code tools (e.g., Pulumi, OpenTofu) and automation tools (e.g., Ansible, Chef).

  • Ensure the availability, performance, and scalability of the infrastructure.
  • Promote practices like CI/CD, observability, and developer experience.
  • Shape goals for scalable systems and observability.
  • Expand the cloud infrastructure and Kubernetes cluster.
  • Ensure resilience and safety through zero-downtime rollouts.
  • Create observability through the further development of the LGTM stack.
  • Design, develop, and optimize infrastructure as code using Pulumi in Go.

Skills: AWS, Python, Software Development, GCP, Kotlin, Kubernetes, Azure, Go, Grafana, Prometheus, CI/CD

Posted 2024-11-07

Location: US, Portugal

Employment type: Full-Time

Industry: Health Technology

  • Proficiency in programming languages such as Python, Go, and JavaScript.
  • 5+ years of experience with cloud platforms such as AWS, Google Cloud, or Azure.
  • Strong understanding of Linux/Unix systems and networking.
  • Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
  • Knowledge of CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).
  • Proficiency with relational and NoSQL databases (e.g., MySQL, PostgreSQL, Redis, Elasticsearch).
  • Willingness to collaborate and share knowledge with colleagues.
  • Ability to take responsibility for work and demonstrate accountability.

  • Develop and maintain monitoring and alerting solutions.
  • Respond to incidents, troubleshoot issues, and perform root cause analysis.
  • Automate repetitive tasks and improve deployment processes.
  • Develop and maintain tools to support infrastructure and applications.
  • Analyze system performance and implement optimizations to improve efficiency and reduce latency.
  • Ensure systems are secure and compliant with relevant standards and regulations.
  • Maintain comprehensive documentation of systems and processes.
  • Share knowledge and best practices with team members.
  • Ensure the reliability, performance, and scalability of databases.
  • Perform database optimization, maintenance, and troubleshooting.

Skills: AWS, Docker, PostgreSQL, Python, Elasticsearch, JavaScript, Jenkins, Kubernetes, MySQL, Azure, Go, Grafana, Prometheus, Redis, NoSQL, CI/CD

Posted 2024-11-07

Location: Europe

Employment type: Full-Time

Industry: Technology

Company: Flip GmbH

  • Experience in operating and scaling cloud infrastructures (Azure, AWS, GCP).
  • Deep knowledge of Kubernetes and container solutions.
  • Interest in observability tools and concepts such as SLOs and error budgets.
  • Good knowledge of software development (e.g., Go, Python, Kotlin).
  • Business-level fluency in English.

  • Help scale the cloud infrastructure and Kubernetes clusters.
  • Ensure zero downtime with effective rollout, redundancy, migration strategies, and rollback mechanisms.
  • Develop and optimize the LGTM stack and analyze SLOs.
  • Enhance operational safety and resilience of systems.
  • Design, develop, and optimize production, development, and cloud infrastructure with Pulumi in Go.
  • Increase development efficiency and optimize code deployments through effective tools and processes.
  • Improve the CI/CD pipeline for faster feedback cycles and secure rollouts.

Skills: AWS, Docker, Python, Software Development, Agile, GCP, Kubernetes, Scrum, Azure, CI/CD

Posted 2024-10-29

Location: EMEA

Industry: Blockchain

  • Proven experience in an independent contributor role with cloud platform technologies (AWS, GCP, Azure, etc.).
  • Proficiency in scripting and programming languages such as Python, Golang, or TypeScript.
  • Experience with container technologies and microservices architecture (e.g., Docker, Kubernetes).
  • Hands-on experience with monitoring tools such as Prometheus, Grafana, and the ELK stack.
  • Excellent problem-solving skills and ability to troubleshoot complex issues independently.
  • Strong understanding of Linux/Unix systems administration and networking concepts.
  • Strong communication and collaboration skills for effective work in cross-functional teams.

  • Collaborate with software engineering teams to design scalable, highly available, and resilient systems.
  • Develop automation tools and scripts for deployment, monitoring, and incident response.
  • Configure monitoring systems to proactively detect issues and define alerting procedures.
  • Respond to critical incidents, conduct root cause analysis, and implement preventive measures.
  • Analyze performance metrics to identify bottlenecks and propose optimizations.
  • Implement best practices for security and compliance through collaboration with security teams.
  • Document system configurations and share knowledge with team members.

Skills: AWS, Docker, Python, Kubernetes, Golang, Grafana, Prometheus

Posted 2024-07-11