Production Support Engineer

Posted about 1 month ago

💎 Seniority level: Senior, 5+ years

📍 Location: Canada, Brazil

🔍 Industry: Fintech

🏢 Company: Flinks

👥 101-250

💰 Series B over 3 years ago

Developer APIs, Big Data, Financial Services, Banking, Asset Management, Analytics, Mobile Apps, FinTech

🗣️ Languages: English

⏳ Experience: 5+ years

Requirements:
  • 5+ years of experience with .NET Framework (C#), ensuring production system stability
  • Strong coding, debugging, and troubleshooting skills, particularly in performance optimization of large-scale applications
  • Operationally focused with expertise in incident management and resolving live production issues
  • Proven experience in building and maintaining reliable monitoring and alerting systems in high-demand environments, with a focus on production support
  • Strong knowledge of Kubernetes, Docker, and cloud platforms (GCP preferred)
  • Proficiency with monitoring tools like Prometheus, Grafana, and Kibana
  • Experience with incident ticketing/documentation tools like FreshDesk and Confluence
  • Critical thinker who can identify system weaknesses and find innovative solutions
  • Strong project management skills with a focus on scalability and system stability

Responsibilities:
  • Develop and maintain code to quickly resolve product issues, ensuring fast recovery and long-term system stability.
  • Provide live operational support across multiple client applications, monitoring services and alerts to detect and resolve critical failures with minimal downtime.
  • Own and troubleshoot complex incidents, conduct root cause analyses, and implement long-term solutions—adhering to SLAs and internal SLOs.
  • Build monitoring dashboards and alerting systems to proactively detect and address issues, supporting system scalability and stability.
  • Analyze operational metrics and KPIs to identify trends, surface client pain points, and drive improvements.
  • Automate tooling and processes to improve efficiency and reduce manual work across LiveOps.
  • Collaborate with cross-functional teams to deliver lasting fixes for production issues and contribute to technical analyses of product gaps.
  • Lead and mentor reliability engineers, providing guidance and ensuring consistent delivery of high-quality work.
  • Participate in post-incident reviews, documenting outcomes and driving preventative action items.
  • Support after-hours on-call coverage as part of the LiveOps rotation.