
Infrastructure Engineer

Posted 3 days ago

🏢 Company: Paradigm

Requirements:
  • Proven experience maintaining and scaling bare metal servers and cloud environments for production systems
  • Proficient at building tooling and scripts using Rust, Go or Python
  • Deep expertise deploying Kubernetes within production environments and working with IaC and configuration management tools like Terraform, Helm and ArgoCD
  • Skilled at deploying monitoring, alerting and observability systems (e.g., Prometheus, Grafana), securing and hardening those systems, and troubleshooting issues with engineers
  • Knowledgeable about Linux and networking, and troubleshooting on Linux systems
  • Familiarity with blockchain infrastructure, particularly the Ethereum ecosystem
Responsibilities:
  • Implement and manage the infrastructure that allows the engineering team to ship quickly and effectively
  • Proactively identify and eliminate bottlenecks in the DevOps process to ensure optimal developer velocity

Related Jobs


📍 United States

🧭 Full-Time

🔍 Advertising Software

🏢 Company: MNTN 👥 251-500 💰 $2,000,000 Seed over 2 years ago | Advertising, Real Time, Marketing, Software

  • 8+ years in infrastructure engineering or systems administration, with increasing scope and leadership.
  • Demonstrated experience tuning Linux kernel settings for disk and network performance.
  • Deep experience with virtualized environments (multiple hypervisors).
  • Proven ability to support large-scale SaaS infrastructure and large database clusters.
  • Strong scripting and automation skills in Python and Bash.
  • Familiarity with storage technologies, particularly iSCSI and network-based storage.
  • Understanding of core networking concepts, including layer 3 routing and TCP/IP fundamentals.
  • Experience with Ansible or similar configuration management tools.
  • Strong documentation skills and operational discipline.
  • Ability to travel on-site twice per year.
  • Architect and implement high-performance data warehousing infrastructure in collaboration with Data Engineering.
  • Tune Linux kernel parameters for optimal disk and network throughput—e.g., adjusting block sizes, optimizing IOPS, striping.
  • Design and support hybrid infrastructure solutions that combine colocated servers and cloud platforms.
  • Lead automation efforts using Ansible and scripting (Python, Bash) to configure, deploy, and maintain server clusters.
  • Own the performance and scalability of systems supporting large-scale database clusters (e.g., Postgres, MySQL, Oracle).
  • Define templates and standards for infrastructure deployment and management.
  • Drive ongoing performance improvements across the infrastructure stack.
  • Manage all aspects of data center operations including rack layout, IP planning, and hardware logistics.
  • Establish robust monitoring and alerting for all infrastructure components.

AWS, PostgreSQL, Python, SQL, Bash, Kubernetes, MySQL, Oracle, Data engineering, RDBMS, CI/CD, Linux, DevOps, Terraform, Networking, Ansible, Scripting

Posted 1 day ago

📍 United States

🔍 Software Development

🏢 Company: Worth AI 👥 11-50 💰 $12,000,000 Seed over 1 year ago | Artificial Intelligence (AI), Business Intelligence, Risk Management, FinTech

  • Strong programming skills in Python, Node.js.
  • Proficient in SQL and experience with distributed query engines (e.g., Trino, Presto).
  • Experience with cloud-native data platforms such as AWS Glue
  • Hands-on experience with infrastructure-as-code tools (Terraform, Pulumi, CloudFormation).
  • Familiarity with containerization and orchestration tools such as Kubernetes.
  • Solid understanding of data governance, quality frameworks, and data lifecycle management.
  • Experience in streaming data architecture and tools like Apache Kafka, Kinesis, or Pub/Sub.
  • Background in supporting machine learning or analytics platforms.
  • Exposure to data mesh, data contracts, or modern data stack concepts.
  • Knowledge of DevOps principles applied to data systems.
  • Design, build, and maintain scalable and resilient data infrastructure in a cloud environment (AWS, Azure, or GCP).
  • Develop and maintain ETL/ELT pipelines using orchestration tools such as Airflow, Dagster, or dbt.
  • Optimize data workflows for reliability, performance, and cost efficiency across structured and unstructured datasets.
  • Manage data lake and data warehouse environments (e.g., Snowflake, BigQuery, Redshift, Delta Lake).
  • Ensure data security, privacy, and compliance, including role-based access control, data encryption, and audit logging.
  • Collaborate with data scientists, analysts, and product teams to ensure data accessibility, accuracy, and availability.
  • Support real-time and batch data processing frameworks, including Kafka, Spark, Flink, or similar tools.
  • Monitor, troubleshoot, and improve the observability and performance of data systems using tools like Prometheus, Grafana, or Datadog.
  • Maintain CI/CD pipelines for data infrastructure using Terraform, GitHub Actions, or similar tools.
Posted 1 day ago

🧭 Part-Time

🔍 E-Learning

🏢 Company: Truelogic 👥 101-250 | Consulting, Web Development, Web Design, Software

  • 3+ years of hands-on experience with AWS cloud infrastructure, particularly ECS Fargate, Lambda, DynamoDB, RDS, S3, CloudFront, and VPC configuration
  • Strong proficiency in a Python web framework (the application in this role uses Flask)
  • Extensive experience with Docker, containerization, and container orchestration
  • Working knowledge of uWSGI, Nginx, and web server configuration
  • Familiarity with Linux system administration and shell scripting
  • Experience with infrastructure as code tools (CloudFormation preferred)
  • Understanding of networking concepts, security best practices, and performance optimization
  • Ability to manage multiple technical priorities and communicate clearly about complex systems
  • Self-motivated with a proactive approach to problem-solving
  • Comfort working in a part-time capacity while delivering high-impact results
  • Manage and optimize our AWS cloud infrastructure (ECS Fargate, Lambda, DynamoDB, S3, CloudFront, Aurora RDS, etc.)
  • Monitor and troubleshoot container deployments, ensuring high availability and performance
  • Implement and improve CI/CD pipelines for automated testing and deployment
  • Maintain security best practices and compliance across our infrastructure
  • Optimize costs while maintaining performance and reliability
  • Support and extend our Python Flask application architecture
  • Integrate and configure services including uWSGI, Nginx, and Redis
  • Manage DynamoDB and Aurora RDS tables and data workflows
  • Develop and maintain Lambda functions for various processing tasks
  • Work with Docker containers and containerization strategies
  • Diagnose and resolve technical issues across our development and production environments
  • Perform system upgrades and patches with minimal service disruption
  • Document technical processes, architecture decisions, and system configurations
  • Participate in on-call rotation for critical system support (as needed)
  • Recommend and implement architecture improvements based on evolving requirements
  • Research and evaluate new technologies and services that could benefit our infrastructure
  • Collaborate with the team to establish best practices for code quality and infrastructure management
Posted 2 days ago

🧭 Full-Time

🔍 Software Development

🏢 Company: FluidStack 👥 11-50 💰 Private 8 months ago | Private Cloud, Cloud Computing, Machine Learning, Generative AI, Information Technology, Small and Medium Businesses, Cloud Storage, Software, GPU

  • 5+ years of experience in compute infrastructure engineering.
  • Strong knowledge of Linux systems administration and performance tuning.
  • Experience with bare metal provisioning tools (MaaS, Metal3, Tinkerbell, or other).
  • Familiarity with GPU hardware and workload optimization, especially kernel and driver level requirements.
  • Proficiency in automation tools (e.g., Ansible, Terraform).
  • Experience operating Kubernetes and SLURM clusters.
  • Design and implement GPU/ASIC infrastructure at the server, rack, and system level.
  • Troubleshoot complex GPU and compute system related failures.
  • Develop and maintain hardware/firmware management services.
  • Automate all aspects of the server lifecycle.
  • Own end-to-end compute lifecycle, including partnering with vendors on RMAs.
  • Serve as the main point of contact for hardware escalation and troubleshooting.
  • Monitor system performance, identifying and resolving bottlenecks.
  • Automate deployment and management tasks to improve efficiency.
  • Collaborate with storage and network teams to ensure cohesive infrastructure operations.
Posted 2 days ago

🧭 Full-Time

💸 145,000 - 195,000 USD per year

🔍 Software Development

🏢 Company: Cavnue 👥 101-250 💰 $130,000,000 Series A about 3 years ago | Information Services, Autonomous Vehicles, Software

  • 5+ years of hands-on experience in infrastructure engineering, DevOps, or SRE roles, with a track record of operating production cloud environments at scale.
  • Strong experience using Terraform for infrastructure provisioning and configuration management in cloud environments.
  • Proficiency in multi-cloud operations – Google Cloud Platform (GCP) is highly preferred; experience with Amazon Web Services (AWS) and/or Microsoft Azure is also acceptable.
  • Deep understanding of Kubernetes (required), including experience setting up and managing Kubernetes clusters, deploying containerized applications, and debugging cluster and networking issues.
  • Ability to write clean, maintainable code for automation and tooling in Python and/or Golang.
  • Familiarity with basic networking concepts and protocols (TCP/IP, DNS, load balancing, VLANs/VPCs, firewalls) and how they apply in cloud and hybrid environments.
  • Willingness to take part in on-call rotations and proven skills in troubleshooting and resolving infrastructure incidents under pressure.
  • Strong hands-on skills with Linux and command-line tools; you are comfortable using terminals and utilities (e.g. k9s for Kubernetes, tmux sessions, zsh or similar shells) to manage and debug systems efficiently.
  • Knowledge of zero trust architecture principles and a habit of incorporating security best practices into infrastructure design (formal security certifications are not required).
  • Excellent communication skills with the ability to work cross-functionally. You can collaborate in a fast-paced engineering organization, explain complex infrastructure concepts to team members, and contribute to a positive engineering culture.
  • Design and implement cloud and edge infrastructure
  • Use Terraform to provision and manage infrastructure resources consistently across multiple cloud providers (GCP preferred, with AWS/Azure as needed), enabling reproducible and auditable infrastructure changes.
  • Deploy, administer, and optimize Kubernetes clusters for containerized workloads. Handle cluster upgrades, scaling, monitoring, and troubleshoot complex issues in production Kubernetes environments.
  • Develop robust automation scripts and internal tools/services in Python and/or Golang to automate routine tasks, integrate systems, and improve operational efficiency across the infrastructure.
  • Implement monitoring, logging, and alerting solutions to track system performance and reliability. Proactively tune systems and address bottlenecks to maintain smooth operation of critical services.
  • Embed security best practices into the infrastructure, enforcing zero trust architecture principles (e.g. least privilege, identity-based access) to protect systems and data. Work closely with security teams to remediate vulnerabilities and ensure compliance with company policies.
  • Participate in an on-call rotation during the team’s initial growth phase, quickly responding to infrastructure incidents and leading efforts to restore service and perform root cause analysis.
  • Work closely with all teams to understand application needs and translate them into scalable infrastructure solutions. Communicate clearly across teams and document designs and processes for broad understanding.
  • Stay up to date with emerging technologies and industry best practices in cloud infrastructure, DevOps, and platform engineering. Lead or contribute to infrastructure projects that enhance deployment speed, cost efficiency, and overall platform reliability.
Posted 3 days ago

🧭 Full-Time

🔍 Software Development

🏢 Company: Clickatell

  • Related IT qualification / 5+ years in a system administrative position
  • Red Hat Enterprise Linux certified (RHCE or better) or other appropriate Linux/Unix certification (preferred)
  • Cloud certifications (AWS preferred)
  • Proven experience as a SysOps Engineer or similar role.
  • Experience in virtualisation and cloud environments such as Amazon Web Services (AWS) or similar.
  • Red Hat Enterprise Linux certified (RHCE or better) or other appropriate Linux/Unix certification advantageous
  • Perl, Python, Ruby and/or PHP scripting experience advantageous
  • Containerisation (Docker/Kubernetes) knowledge advantageous
  • Operating systems, from bare steel to network services
  • Containerisation (Docker/Kubernetes) knowledge
  • Use of CI/CD tools (Ansible/Puppet/Terraform) advantageous
  • Proven experience in and with a large, ISP-type environment and infrastructure advantageous
  • Monitoring and alerting experience with open-source technologies like Icinga/Nagios, Nagvis, Logstash, Elasticsearch, Graphite and Kibana advantageous
  • Proven experience in production environments with the following is advantageous: SAN storage solutions
  • Software package building and release management with software tools such as Puppet, Chef or Salt
  • Network/OS clustering
  • Must understand and demonstrate knowledge of: Networking, from Ethernet to IP
  • Operating systems, from bare steel to network services
  • IP networks, including but not limited to working knowledge of DHCP, DNS, SMTP, FTP, HTTP
  • Minimum of 5 years in a system administrative position
  • Minimum of 3 years working experience with Unix or derivative
  • Minimum of 3 years working experience with troubleshooting hardware and/or software
  • Minimum of 3 years programming or scripting experience (advantageous)
  • Amazon Web Services and Virtualization technologies such as VMware
  • Be a thought leader with regard to Clickatell’s overall cloud adoption strategy.
  • Divisional policy and process formulation, strategic planning, resource coordination and operational execution of projects and assisting in procurement process.
  • Installation/configuration, operation, maintenance, and monitoring of the Clickatell messaging engine hardware, software, and related infrastructure with a focus on high availability, stability and security.
  • Work closely with software development teams to facilitate smooth integration of applications with cloud infrastructure.
  • Scripting and coding to automate routine tasks and improve operational efficiency.
  • Technical research and development to enable continuing innovation within the infrastructure
  • Provide technical support and troubleshooting for cloud-based infrastructure issues.
  • Ensuring network, hardware, operating systems, software applications and any related procedures adhere to organizational values, enabling staff, customers, and partners
  • Technical liaison with enterprise customers and vendors as required
  • Service, maintain, commission, and support global platforms, with a view towards high availability
Posted 4 days ago

🧭 Full-Time

🔍 Software Development

🏢 Company: FluidStack 👥 11-50 💰 Private 8 months ago | Private Cloud, Cloud Computing, Machine Learning, Generative AI, Information Technology, Small and Medium Businesses, Cloud Storage, Software, GPU

  • 5+ years of experience in storage engineering, with a focus on high-performance environments.
  • Proficiency in storage protocols (NFS, S3) and technologies (RAID, ZFS).
  • Experience with storage hardware from major vendors (e.g. Weka, VAST, DDN) or open source tools (Lustre, MinIO, etc.).
  • Strong scripting skills (e.g., Python, Bash) for automation and monitoring.
  • Familiarity with data center operations and infrastructure management.
  • Design and deploy scalable storage architectures (SAN, NAS, object storage) tailored for GPU-intensive workloads.
  • Implement and manage backup, replication, and disaster recovery strategies.
  • Monitor storage performance and capacity, optimizing for efficiency and reliability.
  • Collaborate with compute and network teams to ensure seamless integration and performance.
  • Evaluate and integrate emerging storage technologies to maintain cutting-edge infrastructure.
Posted 4 days ago

💸 190,000 - 240,000 USD per year

🔍 Software Development

🏢 Company: Engine

  • Hands-on with TensorFlow Serving, TorchServe, or similar frameworks.
  • Build production-grade APIs and integrate model inference into application workflows.
  • Containerize and orchestrate inference services at scale.
  • Deploy and operate machine learning models optimized for low-latency, high-throughput inference in production environments.
  • Build and maintain clean gRPC interfaces to expose model predictions to upstream services.
  • Own the production code paths that deliver features to the model—writing maintainable, testable application logic that integrates cleanly with the broader system.
Posted 5 days ago

🧭 Full-Time

🔍 Software Development

🏢 Company: Integration App

  • You’ve built and run cloud infrastructure at scale (AWS preferred).
  • You work fluently with IaC tools (Terraform, CDK, etc.) and container platforms (Docker, Kubernetes).
  • You’ve implemented observability and understand distributed systems debugging.
  • You care about security, reliability, and helping others ship faster.
  • Own our cloud infrastructure, primarily AWS—design for scale, reliability, and security.
  • Improve observability—build out logging, monitoring, and tracing to catch issues before users do.
  • Streamline deployments—refine CI/CD pipelines, speed up builds, and improve dev workflows.
  • Make things reliable and efficient—automate failover, improve uptime, reduce cloud spend.
  • Level up developer experience—make the development experience for our team smooth, fast, and safe.
  • Lead infrastructure work—set direction, share best practices, and mentor others as we grow.
Posted 5 days ago

📍 Europe, South Africa

🧭 Full-Time

🔍 Air Cargo

🏢 Company: cargo.one

  • At least 2 years of experience as a cloud infrastructure engineer with one of the major cloud providers.
  • A strong growth mindset.
  • Exceptional written and verbal skills with fluency in English
  • Operate our cloud infrastructure on GCP and Hetzner including multiple Kubernetes clusters (GKE, Rancher)
  • Run and maintain infrastructure components hosted within Kubernetes, for example HashiCorp Vault, Redis and nginx-ingress
  • Keep track of what our infrastructure is doing through Grafana dashboards and alerts.
  • Assess security risks and actively increase security of our operations by thinking about refined approaches to authorization, network segmentation or encryption (during transport and at rest).
  • Identify and implement improvements across our infrastructure stack, for example more infrastructure as code, better alerting, and lower cloud costs.

Docker, Python, Cloud Computing, GCP, Kubernetes, Grafana, CI/CD, Linux, DevOps, Terraform, Ansible

Posted 5 days ago

Related Articles

Posted about 1 month ago

How to Overcome Burnout While Working Remotely: Practical Strategies for Recovery

Burnout is a silent epidemic among remote workers. The blurred lines between work and home life, coupled with the pressure to always be “on,” can leave even the most dedicated professionals feeling drained. But burnout doesn’t have to define your remote work experience. With the right strategies, you can recover, recharge, and prevent future episodes. Here’s how.



Posted 6 days ago

Top 10 Skills to Become a Successful Remote Worker by 2025

Remote work is here to stay, and by 2025, the competition for remote jobs will be tougher than ever. To stand out, you need more than just basic skills. Employers want people who can adapt, communicate well, and stay productive without constant supervision. Here’s a simple guide to the top 10 skills that will make you a top candidate for remote jobs in the near future.

Posted 9 months ago

Google is gearing up to expand its remote job listings, promising more opportunities across various departments and regions. Find out how this move can benefit job seekers and impact the market.

Posted 10 months ago

Read about the recent updates in remote work policies by major companies, the latest tools enhancing remote work productivity, and predictive statistics for remote work in 2024.

Posted 10 months ago

In-depth analysis of the tech layoffs in 2024, covering the reasons behind the layoffs, comparisons to previous years, immediate impacts, statistics, and the influence on the remote job market. Discover how startups and large tech companies are adapting, and learn strategies for navigating the new dynamics of the remote job market.