Senior Site Reliability Engineer (L3)

Posted 5 months ago
RM 16,407 - 18,047 per month
Malaysia, Full-Time, Cryptocurrency Data
Company:
Location: Malaysia
Languages: English
Seniority level: Senior, 3 to 5 years
Experience: 3 to 5 years
Skills:
AWS, Python, Cloud Computing, Ruby, Software Architecture, Go, Release Management, CI/CD, Linux, DevOps, Terraform, Microservices
Requirements:
3 to 5 years of experience managing software deployments and instrumentation in production environments with defined SLAs and SLOs.
Strong knowledge of software delivery and DevOps principles.
Experience with cloud platforms (e.g., AWS, Cloudflare, GCP).
Experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation).
Strong programming and scripting skills, preferably in languages such as Python, Go, or Ruby.
Bachelor’s degree in Computer Science, Information Security, or a similar field, or professional certificates (e.g., Certified DevOps Professional, or Certified Solutions Architect Professional in AWS or GCP).
Fully capable of taking substantial features from concept to shipping as a sole contributor.
Works effectively on open-ended projects and is self-sufficient enough to deep-dive and evaluate multiple solutions to a problem.
Solves hard problems with many constraints, using sound judgment to assess risks and present arguments in a well-structured, data-backed written narrative.
Has passion, creativity, and empathy for users.
Able to derive information, think critically, and make snap judgments based on measured data in high-pressure situations.
Strong communicator who can build positive working relationships between teams and with key customers.
Experience supporting on-call rotations for 24x7 services: troubleshooting, executing runbooks, and escalating incidents.
Responsibilities:
Review architecture and software components with software engineers.
Ensure best practices are consistent across all teams.
Own SLOs and SLAs and ensure they are met.
Monitor operational metrics and lead improvement plans.
Develop and maintain tools, including infrastructure-as-code resources.
Manage and audit security controls to meet enterprise requirements.
Implement and maintain best practices and compliance standards.
Lead strategic release plans (e.g., canary or blue-green deployments).
Lead incident response and post-mortems.
Develop and implement disaster recovery (DR) plans and procedures.
Perform and improve day-to-day tasks, including access onboarding and offboarding, configuration management, and patch management.
Plan capacity so that systems have sufficient resources.
Develop and extend runbooks, documentation, and other technical assets.
Stay up to date with emerging trends and technologies.
Contribute to knowledge sharing.
Collaborate with cross-functional teams.
Answer technical questions from other teams.
Provide feedback on the performance of junior staff.
Participate in people development initiatives.
Support any ad hoc tasks as required by the company.
Similar Jobs:
Posted about 1 month ago
APAC, Contract, Software Development
Site Reliability Engineer
Company: Rocket.Chat
Posted over 1 year ago
Worldwide, Full-Time, Software Development
Senior Site Reliability / GitOps Engineer
Company: Canonical
Posted over 1 year ago
Worldwide, Full-Time, Software Development
Site Reliability / GitOps Engineer
Company: Canonical