5+ years of experience in observability, monitoring, or data engineering roles, with a strong track record of building and managing robust monitoring systems.
Proficiency in observability platforms like Splunk, Prometheus, and Grafana for full-stack monitoring and visualization.
Advanced skills in creating dashboards and data visualizations with tools like Apache Superset, Grafana, and Splunk to deliver actionable insights.
Strong knowledge of AWS monitoring and event management tools, including CloudWatch, CloudTrail, and SNS.
Expertise in logging systems such as Elasticsearch and Logstash for efficient log aggregation and analysis.
Advanced SQL skills for querying, transforming, and analyzing complex datasets to support decision-making and operational improvements.
Responsibilities:
Build, configure, and maintain observability platforms, integrating tools like Splunk, Prometheus, Grafana, and CloudWatch for full-stack visibility.
Design and manage Apache Superset dashboards to monitor transactions, system performance, and operational analytics, delivering actionable insights to stakeholders.
Develop and refine alerting mechanisms that detect system anomalies quickly and support rapid resolution, improving overall platform reliability.
Aggregate and analyze logs across the technology stack using Splunk and Elasticsearch, identifying performance trends, potential bottlenecks, and system anomalies.
Produce detailed reports and visualizations that support data-driven decision-making across teams.
Continuously evaluate and improve monitoring practices, introducing enhancements that boost platform scalability, reliability, and performance.