
Lead Infrastructure Engineer

Posted 5 days ago


💎 Seniority level: Lead

🔍 Industry: Software Development

🏢 Company: Integration App

🗣️ Languages: English

Requirements:
  • You’ve built and run cloud infrastructure at scale (AWS preferred).
  • You work fluently with IaC tools (Terraform, CDK, etc.) and container platforms (Docker, Kubernetes).
  • You’ve implemented observability and understand distributed systems debugging.
  • You care about security, reliability, and helping others ship faster.
Responsibilities:
  • Own our cloud infrastructure, primarily AWS—design for scale, reliability, and security.
  • Improve observability—build out logging, monitoring, and tracing to catch issues before users do.
  • Streamline deployments—refine CI/CD pipelines, speed up builds, and improve dev workflows.
  • Make things reliable and efficient—automate failover, improve uptime, reduce cloud spend.
  • Level up developer experience: make development smooth, fast, and safe for our team.
  • Lead infrastructure work—set direction, share best practices, and mentor others as we grow.

Related Jobs


📍 United States

🧭 Full-Time

🔍 Advertising Software

🏢 Company: MNTN

👥 251-500

💰 $2,000,000 Seed over 2 years ago

Advertising, Real Time, Marketing, Software

Requirements:
  • 8+ years in infrastructure engineering or systems administration, with increasing scope and leadership.
  • Demonstrated experience tuning Linux kernel settings for disk and network performance.
  • Deep experience with virtualized environments (multiple hypervisors).
  • Proven ability to support large-scale SaaS infrastructure and large database clusters.
  • Strong scripting and automation skills in Python and Bash.
  • Familiarity with storage technologies, particularly iSCSI and network-based storage.
  • Understanding of core networking concepts, including layer 3 routing and TCP/IP fundamentals.
  • Experience with Ansible or similar configuration management tools.
  • Strong documentation skills and operational discipline.
  • Ability to travel on-site twice per year.
Responsibilities:
  • Architect and implement high-performance data warehousing infrastructure in collaboration with Data Engineering.
  • Tune Linux kernel parameters for optimal disk and network throughput—e.g., adjusting block sizes, optimizing IOPS, striping.
  • Design and support hybrid infrastructure solutions that combine colocated servers and cloud platforms.
  • Lead automation efforts using Ansible and scripting (Python, Bash) to configure, deploy, and maintain server clusters.
  • Own the performance and scalability of systems supporting large-scale database clusters (e.g., Postgres, MySQL, Oracle).
  • Define templates and standards for infrastructure deployment and management.
  • Drive ongoing performance improvements across the infrastructure stack.
  • Manage all aspects of data center operations including rack layout, IP planning, and hardware logistics.
  • Establish robust monitoring and alerting for all infrastructure components.

AWS, PostgreSQL, Python, SQL, Bash, Kubernetes, MySQL, Oracle, Data engineering, RDBMS, CI/CD, Linux, DevOps, Terraform, Networking, Ansible, Scripting

Posted 1 day ago

🧭 Full-Time

🔍 Software Development

🏢 Company: Pallon

Requirements:
  • 5+ years owning infrastructure end-to-end, ideally in startup environments.
  • Comfortable at every layer — from bare-metal servers and NVMe drives to container orchestration and cloud-native tools.
  • Strong Linux fundamentals, and you know your way around networking, storage, and distributed systems.
  • Can code well enough to automate, debug, and build tooling across a variety of languages.
  • Communicate clearly and collaborate well — especially with engineers who aren’t infra specialists.
Responsibilities:
  • Designing and building a custom GPU cluster for deep learning workloads.
  • Deciding how we manage and scale our infrastructure — both on-prem and in the cloud.
  • Keeping systems running smoothly and securely — from data pipelines to distributed training jobs.
  • Troubleshooting weird kernel errors, configuring systemd units, or debugging Kubernetes evictions.
  • Making calls on when to script, when to automate, and when to just fix the thing.
Posted 14 days ago

Related Articles

Posted about 1 month ago

How to Overcome Burnout While Working Remotely: Practical Strategies for Recovery

Burnout is a silent epidemic among remote workers. The blurred lines between work and home life, coupled with the pressure to always be “on,” can leave even the most dedicated professionals feeling drained. But burnout doesn’t have to define your remote work experience. With the right strategies, you can recover, recharge, and prevent future episodes. Here’s how.



Posted 7 days ago

Top 10 Skills to Become a Successful Remote Worker by 2025

Remote work is here to stay, and by 2025, the competition for remote jobs will be tougher than ever. To stand out, you need more than just basic skills. Employers want people who can adapt, communicate well, and stay productive without constant supervision. Here’s a simple guide to the top 10 skills that will make you a top candidate for remote jobs in the near future.

Posted 9 months ago

Google is gearing up to expand its remote job listings, promising more opportunities across various departments and regions. Find out how this move can benefit job seekers and impact the market.

Posted 10 months ago

Read about the recent updates in remote work policies by major companies, the latest tools enhancing remote work productivity, and predictive statistics for remote work in 2024.

Posted 10 months ago

In-depth analysis of the tech layoffs in 2024, covering the reasons behind the layoffs, comparisons to previous years, immediate impacts, statistics, and the influence on the remote job market. Discover how startups and large tech companies are adapting, and learn strategies for navigating the new dynamics of the remote job market.