Infrastructure Engineer

Posted 1 day ago

🔍 Industry: Software Development

🏢 Company: Paradex | 👥 1-10 | Cryptocurrency, Financial Services, Payments, Trading Platform

🗣️ Languages: English

Requirements:
  • Deep experience in site reliability engineering within multi-datacenter cloud environments with high demands on uptime, performance, and security.
  • Strong background in AWS, Kubernetes, Docker, Terraform, and software engineering, with the ability to adopt and integrate open-source and commercial technologies.
  • Proven experience in leading teams and projects, with the ability to work closely with senior management to prioritize and allocate resources effectively.
  • Experience with cloud infrastructure, low-latency systems, HashiCorp tools, and multi-cloud environments.
  • Expertise in securing complex cloud network architectures.
Responsibilities:
  • Own the site reliability process from design to deployment, ensuring our systems are robust, scalable, and secure.
  • Guide the software engineering team on best practices and work together to evolve our engineering processes.
  • Manage and optimize multiple Kubernetes clusters across environments, and build core services that drive our platform.
  • Resolve production incidents, ensuring minimal downtime and high performance.
  • Help build and lead a high-velocity, adaptable infrastructure engineering team.

Related Jobs

📍 Bungie-approved Remote Locations

🧭 Full-Time

💸 141,000 - 167,000 USD per year

🔍 Software Development

🏢 Company: Bungie | 👥 501-1000 | 💰 $100,000,000 Corporate almost 7 years ago | 🫂 Last layoff 4 months ago | Video Games, Digital Media, Media and Entertainment

  • Experience in datacenter and live production best practices
  • Experience working in live high availability customer facing production environments
  • Networking and compute at scale, experience with management, security best practices, automation and tooling, performance tuning, and hardware troubleshooting
  • Managing, participating, and collaborating in complex projects with multiple team members, many stakeholders and department or team leads.
  • Understanding of virtualized environments in VMware
  • Technical proficiency in most of these areas and products: network troubleshooting, OSPF, IPsec, BGP, VRRP, VXLAN, iSCSI, RAID
  • Working knowledge of core Internet protocols and services (e.g., IP, TCP, UDP, NTP, DNS, HTTP, SMTP, SSH, SMB, NFS, syslog) required
  • Ability to work independently to solve challenging problems and provide recommendations in a timely manner
  • Ability to design and engineer environments at scale, including redundancy planning, network connectivity design, performance troubleshooting
  • Manage network hardware including switches, routers, and firewalls (Juniper, Palo Alto)
  • Administer and maintain VMware virtualization environments and configure and manage virtual networking components
  • Support customer-facing datacenter infrastructure and compute
  • Troubleshoot and resolve complex network and infrastructure issues, and work with support on escalations
  • Assist with ongoing training for our IT Support team and cross training for LiveOps team members
  • Research, selection, planning, and implementation of new technologies to support scaling of solutions to meet business needs
  • Collaborate with InfoSec teams to ensure infrastructure and designs align with security requirements
  • Develop tools and scripts to assist with routine maintenance tasks
  • Support for all related products and triage any issues that come up
  • Create and update technical documentation, including network diagrams, policies, and procedures
  • Provide after-hours escalation assistance as the highest escalation point for sub-discipline related issues
Posted about 12 hours ago

🔥 Infrastructure Engineer

📍 Ireland, UK, Sweden, the Netherlands, Germany, Spain, Bulgaria, Denmark, Finland, France, Italy, Poland

🔍 Software Development

🏢 Company: MongoDB | 👥 1001-5000 | 💰 Post-IPO Equity about 7 years ago | Database, Open Source, Cloud Computing, SaaS, Software

  • Pragmatic, detail-oriented, self-motivated, and understands the benefits of collaboration
  • Takes a software-driven approach to solving problems and routinely uses git to track progress
  • Familiar with software engineering principles, dependency injection, composition, and test driven development
  • Experience designing/implementing medium/large scale software projects (preferably with Go)
  • Familiar with standard authentication protocols (e.g., OAuth)
  • Familiar with the development of web services and/or Kubernetes controllers
  • Experienced performing deep technical analysis and fixing applications, systems, and networks
  • Experience working with Linux, the command line, and TCP/IP networking
  • Experience working with container runtime toolchains (containerd, Docker, Podman)
  • Solid knowledge of cloud infrastructure (preferably AWS)
  • Experience with configuration management tools and managing infrastructure through code
  • Familiar with how to use CI/CD workflows and tooling to deploy production services
  • Experience running containers in a production environment, preferably Kubernetes based
  • Experience with observability concepts and tooling, metrics, logging, traces, Prometheus, Grafana, OpenTelemetry
  • Has practical knowledge of delivering production level services with SLI/SLOs and understands how to measure, track and adjust them
  • Work with engineering teams across MongoDB to investigate gaps and limitations in existing development workflows and understand new infrastructure and platform requirements
  • Design self-service platform services and developer tooling that focuses on reliability, usability, and provides the appropriate level of abstraction from cloud infrastructure
  • Regularly write and review automation, configuration management, and application code
  • Author and review functional specifications and scoping documents for large platform projects and services
  • Own and operate much of the internal development platform that runs MongoDB
  • Work on a distributed team that frequently interacts with remote engineers across multiple time zones (primarily PST/EST/GMT)

AWS, Backend Development, Docker, Cloud Computing, Git, Kubernetes, OAuth, Go, Grafana, Prometheus, CI/CD, RESTful APIs, Linux, Terraform, Microservices, Scripting, Software Engineering

Posted about 23 hours ago

📍 United States, Hong Kong, United Kingdom

🧭 Full-Time

💸 175,000 - 245,000 USD per year

🔍 Software Development

🏢 Company: Ontra | 👥 101-250 | 💰 $200,000,000 Series B over 3 years ago | Legal Tech, Document Management, Information Technology, Legal, Software

  • 6+ years setting up and maintaining tools like Jenkins, Travis CI, CircleCI, or GitHub Actions
  • 6+ years of experience using tools such as Terraform, CloudFormation, or Pulumi
  • A general background in using programming languages such as Python, Ruby, Go, and/or JavaScript
  • 6+ years of experience provisioning, configuring, and optimizing cloud resources (we use AWS)
  • A background in working with cross-functional teams to facilitate an efficient software delivery pipeline and align on projects and goals
  • Use Kubernetes and Docker to create, manage, and scale containerized applications
  • Implement and manage monitoring and logging tools such as Datadog and Prometheus to ensure the health and performance of applications and infrastructure
  • Manage GitHub and implement branching strategies and code reviews to ensure code quality
  • Gather and analyze requirements from various stakeholders to define infrastructure and deployment needs.
  • Maintain clear documentation for infrastructure and processes to ensure that team members can understand and reproduce the environment as needed

AWS, Docker, Python, Bash, Cloud Computing, Git, Jenkins, Kubernetes, Prometheus, CI/CD, Terraform

Posted 6 days ago

📍 United States

🧭 Full-Time

💸 185,500 - 293,750 USD per year

🔍 Software Development

🏢 Company: Upwork | 👥 501-1000 | 💰 about 8 years ago | 🫂 Last layoff almost 2 years ago | Marketplace, Freelance, Copywriting, Peer to Peer

  • Strong technical expertise in designing and building scalable ML infrastructure.
  • Experience with distributed systems and cloud-based ML platforms.
  • Proficiency in programming languages such as Python, Java, or Scala.
  • Deep understanding of ML workflows, including data pipelines, model training, and deployment.
  • Passion for innovation and eagerness to implement the latest advancements in ML infrastructure.
  • Strong problem-solving skills and ability to optimize complex systems for performance and reliability.
  • Collaborative mindset with excellent communication skills to work across teams.
  • Ability to thrive in a fast-paced, dynamic environment with evolving technical challenges.
  • Design, implement, and optimize distributed systems and infrastructure components to support large-scale machine learning workflows, including data ingestion, feature engineering, model training, and serving.
  • Develop and maintain frameworks, libraries, and tools that streamline the end-to-end machine learning lifecycle, from data preparation and experimentation to model deployment and monitoring.
  • Architect and implement highly available, fault-tolerant, and secure systems that meet the performance and scalability requirements of production machine learning workloads.
  • Collaborate with machine learning researchers and data scientists to understand their requirements and translate them into scalable and efficient software solutions.
  • Stay current with advancements in machine learning infrastructure, distributed computing, and cloud technologies, integrating them into our platform to drive innovation.
  • Mentor junior engineers, conduct code reviews, and uphold engineering best practices to ensure the delivery of high-quality software solutions.

AWS, Docker, Leadership, Python, Software Development, SQL, Cloud Computing, Java, Kubeflow, Kubernetes, Machine Learning, MLFlow, Algorithms, Data engineering, Data Structures, REST API, Collaboration, CI/CD, Problem Solving, Mentoring, Linux, DevOps, Terraform, Excellent communication skills, Scala, Data modeling

Posted 6 days ago

📍 Australia, Japan

🧭 Full-Time

🔍 Software Development

🏢 Company: Laravel | 👥 11-50 | 💰 $57,000,000 Series A 6 months ago | Developer Tools, Web Development, Enterprise Software, Software

  • Experience with managing and scaling Apache Kafka.
  • Mastery of AWS and potentially other cloud providers.
  • Experience with Kubernetes (K8s) and container orchestration as a whole.
  • Experienced and opinionated about logging, monitoring, alerting tooling, and processes.
  • Experienced and comfortable with using IaC in a team, enforcing sanity checks and policies to help us scale.
  • Knowledge of network protocols, load balancing, caching, and DNS management.
  • Work together with a world class team, touching the developer experience of hundreds of thousands of developers around the world.
  • Work on our most ambitious projects yet, the recently announced Laravel Nightwatch and Laravel Cloud platforms.
  • Spend time researching the best solutions. We want to make sure we do the right things right.
  • Value the collaborative process; we often pair program or have quick huddles to tackle tricky problems, and we like to come up with solutions together.
  • Enjoy programming(!). Even though we're often working with cloud providers such as AWS, we also sometimes need to build custom tooling and integrations and we enjoy doing that ourselves.
  • Documentation is king, especially in an all-remote company. We pride ourselves on documenting for ourselves and for our users.
  • Flexible hours and minimum bureaucracy, allowing you the freedom to get your best work done whichever way you prefer.

AWS, Kubernetes, Apache Kafka, Clickhouse, Go, CI/CD, Linux, DevOps, Terraform, Microservices

Posted 8 days ago

📍 United States, CA

💸 110,000 - 180,000 USD per year

🔍 Cybersecurity

🏢 Company: crowdstrikecareers

  • Kubernetes
  • Python, Bash
  • Experience within the broader Cloud Native infrastructure stack, including at least some tools such as Helm, Rook, FluxCD, Argo, Cluster API, Cilium
  • Experience with large scale, business-critical Linux environments
  • Experience operating in clouds, preferably Amazon Web Services
  • Experience operating in custom data centers
  • Familiarity with on-call shifts and the ability to manage them
  • Design and implement scalable hybrid multi-cloud Kubernetes platform solutions using your deep expertise in cloud-native infrastructure architecture and operations at scale
  • Optimize and maintain high reliability of large-scale infrastructure platform systems
  • Evaluate and integrate open-source technologies into existing systems workflows
  • Provide insights and technical direction for the continued evolution of Kubernetes infrastructure services
  • Mentor and help develop less-senior engineers to drive collaborative technical decision-making for infrastructure choices

AWS, Python, Bash, Cloud Computing, Kubernetes, CI/CD, Linux, Terraform, Ansible

Posted 10 days ago

📍 Glasgow, Scotland, United Kingdom

🔍 Business Technology Solutions

🏢 Company: Sword Group

  • Proven work experience as an Infrastructure Engineer or similar role.
  • Microsoft AD Management
  • Microsoft Server Build development
  • Server Hardware Physical Deployment
  • VMware deployment & management
  • Dell VxRail or equivalent hyperconverged infrastructure management
  • Deployment of Backup, Anti-virus, IDS, SIEM technologies
  • Hybrid Cloud Migration
  • Networking - SD-WAN, Security
  • Vendor Management
  • Plan the technical elements of the deployment plan
  • Execute Project Infrastructure tasks
  • Supply best practice advice on deployment of Infrastructure components within projects
  • Liaise with central technical authorities to verify delivery
  • Work in conjunction with Site teams to deploy Infrastructure
  • Testing and implementing new systems and infrastructure
  • Stakeholder engagement and coordination of onsite teams
Posted 13 days ago

📍 EU, UK

🧭 Full-Time

🔍 Software Development

  • Experience with AWS services
  • Experience with GCP
  • Experience with Kubernetes
  • Experience with Terraform
  • Experience with modern CI/CD tools
  • Help build, maintain, and optimize our cloud infrastructure

AWS, Backend Development, GCP, Kubernetes, CI/CD, Terraform

Posted 13 days ago

📍 Anywhere in European time zones

🧭 Full-Time

🔍 Software Development

🏢 Company: Primer.io

  • Strong experience with a cloud provider (AWS preferred but we’re open to Azure and GCP)
  • Experience designing and building unified observability platforms that enable companies to use metrics, logs, and traces to determine quickly if their application or service is operating as desired.
  • Use of Terraform as infrastructure as code
  • Comfortable using Python/Golang as a programming language
  • Experience around building and maintaining production-grade Kubernetes clusters
  • Knowledge of security best practices and the ability to implement security controls at the infrastructure level
  • Experience with monitoring and logging tools like Datadog or Grafana’s observability stack (Prometheus, Tempo, Loki, Grafana)
  • Familiarity with the open standard OpenTelemetry
  • Be a focal point for observability roadmap and best practices
  • Configure and maintain observability solutions like Datadog, ensuring their scalability, reliability, and alignment with our operational objectives.
  • Collaborate with multiple product teams and their respective owners to design observability solutions and build alerting strategies as needed
  • Build custom metrics and features to enhance Primer’s observability
  • Infrastructure as Code development
  • Writing processes and documentation for system design, troubleshooting and maintenance

AWS, Python, Cloud Computing, Git, Kubernetes, Data Structures, Grafana, CI/CD, RESTful APIs, Linux, Terraform

Posted 17 days ago

📍 London

🧭 Full-Time

🔍 AI

🏢 Company: Cohere | 👥 251-500 | 💰 $169,509,482 Grant 3 months ago | 🫂 Last layoff 8 months ago | Artificial Intelligence (AI), Machine Learning, Generative AI, Natural Language Processing

  • Extremely strong software engineering skills.
  • Experienced in HPC environments and auto-scaling.
  • Experienced with container orchestration platforms (e.g., Docker, Kubernetes).
  • Experience using large-scale distributed training strategies.
  • Familiarity with autoregressive sequence models, such as Transformers.
  • Design and write high-performance, scalable code that allows fast training and sampling.
  • Read the latest papers, implement the relevant ideas, and run experiments.
  • Research, implement, and experiment with ideas on our super compute and data infrastructure.
  • Learn from and work with the best researchers in the field.

Docker, Python, Cloud Computing, Kubernetes, Machine Learning, PyTorch, Software Architecture, Algorithms, Data engineering, Data science, Data Structures, REST API, Tensorflow, CI/CD, Problem Solving, Linux, DevOps, Research skills, Software Engineering

Posted 19 days ago

Related Articles

Posted 12 days ago

Why remote work is such a nice opportunity?

Why is remote work so nice? Let's try to see!

Posted 7 months ago

Insights into the evolving landscape of remote work in 2024 reveal the importance of certifications and continuous learning. This article breaks down emerging trends, sought-after certifications, and provides practical solutions for enhancing your employability and expertise. What skills will be essential for remote job seekers, and how can you navigate this dynamic market to secure your dream role?

Posted 7 months ago

Explore the challenges and strategies of maintaining work-life balance while working remotely. Learn about unique aspects of remote work, associated challenges, historical context, and effective strategies to separate work and personal life.

Posted 7 months ago

Google is gearing up to expand its remote job listings, promising more opportunities across various departments and regions. Find out how this move can benefit job seekers and impact the market.

Posted 7 months ago

Learn about the importance of pre-onboarding preparation for remote employees, including checklist creation, documentation, tools and equipment setup, communication plans, and feedback strategies. Discover how proactive pre-onboarding can enhance job performance, increase retention rates, and foster a sense of belonging from day one.