Senior Reliability Engineer

Posted 4 months ago

💎 Seniority level: Senior, 7+ years

📍 Location: Canada

🔍 Industry: Financial technology

⏳ Experience: 7+ years

🪄 Skills: Docker, Project Management, GCP, Kibana, Kubernetes, C#, Strategy, Grafana, .NET, Prometheus, Documentation

Operationally focused with expertise in incident management and live production issue resolution.
Strong debugging and troubleshooting skills, particularly in performance optimization of large-scale applications.
Proven experience in building and maintaining monitoring and alerting systems.
7+ years of experience with the .NET Framework (C#), applied to production stability.
Strong knowledge of Kubernetes, Docker, and cloud platforms like GCP.
Proficiency with monitoring tools such as Prometheus, Grafana, and Kibana.
Experience with incident ticketing and documentation tools such as Freshdesk and Confluence.
Critical thinking skills to identify system weaknesses and devise innovative solutions.
Strong project management skills focused on scalability and stability.
ITIL Service Management certification (or equivalent) is highly desired.
Experience with Power BI, web scraping, or Golang is a plus.

Provide live operational support for multiple client software applications, ensuring rapid restoration of services.
Develop and maintain code to quickly resolve production issues.
Own and resolve incidents, adhering to client SLA and internal SLO timelines.
Troubleshoot complex incidents and implement solutions to prevent recurrence.
Utilize data-driven approaches to prepare detailed analyses and reports.
Conduct deep technical analyses of product deficiencies and address client pain points.
Develop monitoring systems and implement robust alert mechanisms.
Provide guidance on improving operational system stability.
Lead initiatives that automate processes for operational efficiency.
Facilitate postmortem meetings following incidents.
Collaborate with cross-functional teams for rapid resolution of production issues.
Lead and motivate project teams to ensure quality standards.
Mentor reliability engineers and track their progress.
Participate in after-hours on-call support.