📍 United States
🧭 Full-Time
💸 132,000 - 172,000 USD per year
🔍 Software Development
🏢 Company: Infinite Reality
👥 101-250
💰 $350,000,000 9 months ago
Media and Entertainment, Web3, Metaverse
- Extensive DevOps & Security Experience: You bring 5+ years of hands-on experience in DevOps and security monitoring, with a strong focus on logging, monitoring, and incident response. Your background allows you to design, implement, and optimize observability frameworks that enhance system security and performance.
- Incident Management Expertise: You have a proven track record of managing both security and operational incidents. From detection through resolution, you are adept at coordinating incident response efforts, leading post-incident reviews, and driving improvements to reduce future risks and downtime.
- Scripting & Automation Skills: You are proficient in scripting languages like Python or Bash, and are passionate about automating repetitive tasks to increase operational efficiency. Your automation solutions help streamline workflows, improve response times, and reduce manual intervention.
- Proficiency with Logging & Monitoring Tools: You have deep experience with tools like the ELK Stack, Splunk, Prometheus, and other observability platforms. Your expertise enables you to identify patterns, vulnerabilities, and trends in system health and security, empowering teams to act proactively.
- Collaboration & Cross-Functional Teamwork: You excel at working across engineering, IT, and security teams, helping foster a culture of observability and continuous improvement. Your ability to communicate technical concepts clearly ensures alignment across stakeholders with varying levels of technical expertise.
- Strong Problem-Solving Skills: You thrive on solving complex issues, whether it’s a security breach or a system performance bottleneck. Your analytical mindset and experience with root cause analysis ensure that you can resolve problems efficiently and implement lasting solutions.
- Design & Optimize Logging and Monitoring Systems: Lead the design and implementation of advanced logging and monitoring architectures, ensuring that system performance, security threats, and infrastructure health are captured in real time. You will drive best practices in observability to keep our systems secure and resilient.
- Incident Response & Analysis: Own the full incident management lifecycle—from detection to resolution. Respond to both security and operational incidents, working across teams to minimize impact and quickly resolve issues. Lead post-incident analysis, identify root causes, and drive improvements to prevent future occurrences.
- Develop Automation Solutions: Build and implement automation workflows to streamline alerting, incident detection, and response processes. By reducing manual intervention, you’ll help teams respond to system events faster.
- Collaborate with Cross-Functional Teams: Work closely with engineering, security, and operations teams to foster a culture of observability. Share best practices, establish clear protocols for incident detection and resolution, and ensure alignment across teams to improve overall system reliability.
- Monitor Security & Operational Alerts: Establish and fine-tune alerting rules to ensure actionable, precise, and timely notifications for security and system performance events. You’ll ensure that alerts are well-defined and routed to the right teams, minimizing response time to critical issues.
- Leverage Data for Continuous Improvement: Analyze logs and metrics to identify trends, anomalies, and potential security vulnerabilities. You’ll generate data-driven insights that help improve system health, performance, and security posture, contributing to ongoing process improvements.
- Mentor and Coach: Provide guidance to junior engineers and colleagues, promoting best practices in monitoring, incident management, and automation. Lead by example to elevate the technical capabilities of the team and drive knowledge-sharing across the organization.
AWS, Docker, Python, Bash, Cloud Computing, Cybersecurity, Kubernetes, Microsoft Azure, API testing, Azure, Grafana, Prometheus, REST API, Communication Skills, Analytical Skills, Collaboration, CI/CD, Problem Solving, RESTful APIs, Linux, DevOps, Terraform, Compliance, Ansible, Scripting
Posted about 3 hours ago