Staff Site Reliability Engineer

Posted 10 months ago
Costa Rica, Full-Time, Information services
Company:
Location: Costa Rica
Languages: English
Seniority level: Staff, 5+ years
Experience: 5+ years
Skills:
AWS, Docker, Python, Git, Jenkins, Kubernetes, Prometheus, Linux
Requirements:
5+ years of direct experience supporting complex scaled systems in production. Linux knowledge with experience in troubleshooting and predicting issues. Networking, troubleshooting, and monitoring skills. Experience with cloud-native application designs for performance and resilience. Skills in incident management and coordination, including blameless post-incident reviews. Familiarity with technologies like Kubernetes, Splunk, Dynatrace, ServiceNow, Jira, Jenkins, Python, Prometheus, Java, Cassandra, Redis, MongoDB, AWS, and Infrastructure as Code.
Responsibilities:
Maintain uptime of Experian One, Experian's Cloud SaaS offering for Decision Analytics. Monitor platform performance and provide alerts. Respond to incidents and restore service promptly. Understand the systems well enough to assess issues and assign them for resolution. Identify and eliminate manual processes to prevent issues from recurring. Manage incidents and coordinate during service disruptions. Write complex queries using various tools. Review system designs to identify resiliency, scalability, and monitoring issues.