Strong experience with a cloud provider (AWS preferred but we’re open to Azure and GCP)
Experience designing and building unified observability platforms that enable companies to use metrics, logs, and traces to quickly determine whether their application or service is operating as desired
Experience using Terraform for infrastructure as code
Comfortable programming in Python or Golang
Experience building and maintaining production-grade Kubernetes clusters
Knowledge of security best practices and the ability to implement security controls at the infrastructure level
Experience with monitoring and logging tools like Datadog or Grafana’s observability stack (Prometheus, Tempo, Loki, Grafana)
Familiarity with the OpenTelemetry open standard
Responsibilities:
Be the focal point for the observability roadmap and best practices
Configure and maintain observability solutions like Datadog, ensuring their scalability, reliability, and alignment with our operational objectives
Collaborate with multiple product teams and their respective owners to design observability solutions and build alerting strategies as needed
Build custom metrics and features to enhance Primer’s observability
Develop infrastructure as code
Write processes and documentation for system design, troubleshooting, and maintenance