Infrastructure Software Engineer

Posted about 5 hours ago

💎 Seniority level: Junior, 1-3 years

🔍 Industry: Software Development

🏢 Company: Baseten · 👥 11-50 · 💰 $40,000,000 Series B about 1 year ago · Developer Tools, Artificial Intelligence (AI), Machine Learning, Software Engineering, Software

⏳ Experience: 1-3 years

Requirements:
  • Bachelor's degree or higher in Computer Science or related field
  • 1-3 years experience in software engineering or infrastructure
  • Proficient coding abilities in one or more popular programming or scripting languages; Go proficiency is a plus
  • Working knowledge of Kubernetes and containerization
  • Basic understanding of machine learning concepts and model serving
  • Familiarity with distributed systems concepts
  • Experience with basic monitoring and logging tools
  • Interest in ML/AI infrastructure and willingness to learn
  • Strong collaboration and communication skills
Responsibilities:
  • Develop infrastructure components for our ML inference platform using Python and Go
  • Implement and maintain Kubernetes deployments for model serving
  • Contribute to our inference orchestration layer for model deployments
  • Build and enhance monitoring systems for model performance metrics
  • Implement efficient resource management solutions for ML workloads
  • Support infrastructure automation to improve ML deployment workflows
  • Work closely with team members to implement technical solutions
  • Help balance performance optimization with system reliability
  • Participate in technical discussions around infrastructure improvements
  • Learn and apply infrastructure best practices

Related Jobs

💸 $200,000 - $250,000 USD per year

🔍 Software Development

Requirements:
  • 5+ years experience building production infrastructure systems
  • Expert-level proficiency in Go, with Python experience a plus
  • Deep expertise with Kubernetes in production environments
  • Extensive experience with major cloud providers (AWS, GCP) and neo-cloud providers (Crusoe, DigitalOcean, Nebius) a plus.
  • Advanced understanding of distributed systems concepts and performance tuning
  • Proven experience designing observability systems
  • Track record of leading technical initiatives and mentoring engineers
  • Experience with ML/AI workloads and MLOps platforms highly valued
Responsibilities:
  • Design and architect scalable infrastructure systems for our ML inference platform
  • Lead optimization of Kubernetes deployments for efficient, cost-effective model serving
  • Drive enhancements to our inference orchestration layer for complex model deployments
  • Define monitoring strategies for model performance, latency, and resource utilization
  • Develop advanced solutions for GPU capacity management and throughput optimization
  • Establish infrastructure automation standards to streamline ML deployment workflows
  • Partner with other engineers to translate complex inference requirements into technical solutions
  • Make critical architectural decisions balancing performance with system reliability
  • Lead technical discussions and mentor junior engineers on infrastructure best practices
  • Contribute to long-term technical strategy and infrastructure roadmap
Posted about 5 hours ago

📍 United States

🔍 AI

🏢 Company: Worth AI · 👥 11-50 · 💰 $12,000,000 Seed over 1 year ago · Artificial Intelligence (AI), Business Intelligence, Risk Management, FinTech

Requirements:
  • Bachelor's degree in Computer Science, Software Engineering, or a related field.
  • Proven experience as a Software Engineer, with a focus on infrastructure development and operations.
  • Strong programming skills in languages such as Python, JavaScript, or Go.
  • Experience with cloud platforms (preferably AWS) and cost optimization strategies.
  • Familiarity with container orchestration (e.g., Kubernetes, Docker).
  • Expertise in Infrastructure as Code (IaC) tools, particularly Terraform (AWS CDK is a plus).
  • Design, scale, and maintain infrastructure to support Big Data workloads and real-time streaming systems such as Apache Spark, Hadoop, and Kafka.
  • Understanding of networking concepts, protocols, and security practices.
  • Proficiency in source control systems, especially Git.
  • Experience with CI/CD tools such as GitHub Actions and ArgoCD.
  • Familiarity with observability tools (Datadog, New Relic, etc.) for monitoring and logging.
  • Excellent problem-solving skills and the ability to work in a collaborative environment.
  • Strong communication skills to effectively share knowledge with team members.
  • Experience in the Risk, Underwriting, and/or Payments Industry is a plus.
Responsibilities:
  • Design and develop cloud infrastructure components and services to support our AI-driven platforms.
  • Collaborate with software engineers to integrate applications with underlying infrastructure.
  • Automate deployment processes and infrastructure management using Infrastructure as Code (IaC) practices.
  • Implement monitoring and logging strategies to optimize system performance and availability.
  • Optimize infrastructure for cost efficiency, ensuring resources are utilized effectively without compromising performance.
  • Coordinate with security teams to ensure the infrastructure is compliant with best practices and standards.
  • Troubleshoot and resolve infrastructure-related issues efficiently.
  • Continuously evaluate, recommend, and implement changes to improve system reliability and performance.
  • Maintain documentation for infrastructure services and processes.
  • Support on-call rotation as needed for critical infrastructure issues.
  • Other duties as assigned.

AWS, Docker, Python, SQL, Apache Hadoop, Cloud Computing, Git, Hadoop, JavaScript, Kafka, Kubernetes, Algorithms, Apache Kafka, Data Structures, Go, CI/CD, RESTful APIs, Linux, DevOps, Terraform, Microservices, Networking, Software Engineering

Posted 12 days ago

📍 United States

🧭 Full-Time

💸 $133,450 - $232,000 USD per year

🔍 Software Development

🏢 Company: ClickHouse · 👥 101-250 · 💰 Series B over 2 years ago · Database, Artificial Intelligence (AI), Big Data, Analytics, Software

Requirements:
  • 5+ years experience in software development
  • Experience with AWS, Azure, or GCP
  • Familiarity with infrastructure-as-code tools
  • Knowledge of Kubernetes and microservices
  • Experience with security principles and network protocols
Responsibilities:
  • Architect and build distributed infrastructure
  • Build a cloud-native platform and automate resource management
  • Work with core database and security teams
  • Improve reliability and scalability of services
  • Design and build security components
  • Enhance performance and cost efficiency

AWS, Cybersecurity, GCP, Java, Kubernetes, C++, Azure, Go, Terraform, Networking

Posted 27 days ago

📍 Bay Area, NYC

🧭 Full-Time

💸 $125,000 - $225,000 USD per year

🔍 AI observability and evaluation

🏢 Company: Arize AI · 👥 51-100 · 💰 $38,000,000 Series B over 2 years ago · Artificial Intelligence (AI), Machine Learning, Information Technology, Software

Requirements:
  • 5+ years of experience building infrastructure and developer tools.
  • A focus on user needs rather than technology preferences.
  • Proven track record of improving developer productivity with pragmatic solutions.
  • Strong empathy for engineering teams' challenges and ability to prioritize impactful solutions.
  • Working knowledge of Kubernetes, Terraform, and Bazel.
Responsibilities:
  • Partner with engineering and security teams to architect and scale infrastructure.
  • Design and develop infrastructure for AI fine-tuning workloads.
  • Create best-in-class tooling for internal systems observability and security.
  • Lead system optimization initiatives including capacity planning and performance tuning.
  • Maintain productivity tools that enhance engineering velocity.
  • Optimize infrastructure costs while ensuring high performance.
  • Drive technical decisions impacting the entire infrastructure stack.

Kubernetes, Terraform

Posted 3 months ago