Staff Engineer (Fleet Performance)

Posted 19 days ago

💎 Seniority level: Staff, 7+ years

💸 Salary: 230,000 - 270,000 USD per year

🔍 Industry: Cloud computing

🏢 Company: DigitalOcean

👥 1001-5000

💰 $34,913,641 Post-IPO Equity over 3 years ago

🫂 Last layoff almost 2 years ago

Virtualization, DevOps, Web Hosting, Cloud Computing, SaaS

🗣️ Languages: English

⏳ Experience: 7+ years

Requirements:
  • Bachelor's or Master's degree in Computer Science, Mathematics, Statistics, or Computer/Electrical Engineering or equivalent work experience.
  • Extensive knowledge of the Linux kernel, hypervisors, and open-source operating systems.
  • 7+ years of experience with performance measurement tools such as profilers, eBPF, XDP, fio, TPC-C, MLPerf, and NCCL.
  • 5+ years of experience developing strategies for managing, monitoring, and analyzing infrastructure, applications, and services.
  • Strong proficiency in Go, Python, or Ruby.
  • Deep understanding of kernel performance aspects, including scheduling, context switching, and hardware acceleration.
  • Expertise in distributed systems performance.
  • Knowledge of GPU technology and programming for multi-GPU workloads.
  • Proven problem-solving ability at scale.
  • Strong security mindset and adherence to security best practices.
  • Excellent collaboration and communication skills.
  • Leadership experience in skills development and mentorship.
  • Professional-level written and spoken English with strong presentation abilities.
Responsibilities:
  • Develop and implement comprehensive performance metrics, analysis tools, and reporting systems.
  • Lead initiatives to enhance shared infrastructure, balancing performance optimization with security standards.
  • Collaborate with hardware engineering teams and vendors to validate GPU fabric performance.
  • Engage with the open-source Linux community to advance virtualization technologies.
  • Conduct in-depth performance analysis to devise optimization strategies.
  • Identify system bottlenecks and drive optimizations across the hypervisor software stack.
  • Work cross-functionally to harness new performance capabilities from evolving hardware architectures.
  • Enhance test frameworks and pipelines for robust performance validation.
  • Investigate and resolve virtual machine downtime and performance issues.
  • Participate in on-call rotations to support system reliability.