Infrastructure Software Engineer

Posted 3 months ago

💎 Seniority level: Senior, 5+ years

📍 Location: Bay Area, NYC

💸 Salary: 125,000 - 225,000 USD per year

🔍 Industry: AI observability and evaluation

🏒 Company: Arize AIπŸ‘₯ 51-100πŸ’° $38,000,000 Series B over 2 years agoArtificial Intelligence (AI)Machine LearningInformation TechnologySoftware

⏳ Experience: 5+ years

🪄 Skills: Kubernetes, Terraform

Requirements:
  • 5+ years of experience building infrastructure and developer tools.
  • A focus on user needs rather than technology preferences.
  • Proven track record of improving developer productivity with pragmatic solutions.
  • Strong empathy for engineering teams' challenges and ability to prioritize impactful solutions.
  • Working knowledge of Kubernetes, Terraform, and Bazel.
Responsibilities:
  • Partner with engineering and security teams to architect and scale infrastructure.
  • Design and develop infrastructure for AI fine-tuning workloads.
  • Create best-in-class tooling for internal systems observability and security.
  • Lead system optimization initiatives including capacity planning and performance tuning.
  • Maintain productivity tools that enhance engineering velocity.
  • Optimize infrastructure costs while ensuring high performance.
  • Drive technical decisions impacting the entire infrastructure stack.

Related Jobs

📍 United States

🔍 AI

🏒 Company: Worth AIπŸ‘₯ 11-50πŸ’° $12,000,000 Seed over 1 year agoArtificial Intelligence (AI)Business IntelligenceRisk ManagementFinTech

Requirements:
  • Bachelor's degree in Computer Science, Software Engineering, or a related field.
  • Proven experience as a Software Engineer, with a focus on infrastructure development and operations.
  • Strong programming skills in languages such as Python, JavaScript, or Go.
  • Experience with cloud platforms (preferably AWS) and cost optimization strategies.
  • Familiarity with container orchestration (e.g., Kubernetes, Docker).
  • Expertise in Infrastructure as Code (IaC) tools, particularly Terraform (AWS CDK is a plus).
  • Experience designing, scaling, and maintaining infrastructure to support Big Data workloads and real-time streaming systems such as Apache Spark, Hadoop, and Kafka.
  • Understanding of networking concepts, protocols, and security practices.
  • Proficiency in source control systems, especially Git.
  • Experience with CI/CD tools such as GitHub Actions and ArgoCD.
  • Familiarity with observability tools (Datadog, New Relic, etc.) for monitoring and logging.
  • Excellent problem-solving skills and the ability to work in a collaborative environment.
  • Strong communication skills to effectively share knowledge with team members.
  • Experience in the Risk, Underwriting, and/or Payments Industry is a plus.
Responsibilities:
  • Design and develop cloud infrastructure components and services to support our AI-driven platforms.
  • Collaborate with software engineers to integrate applications with underlying infrastructure.
  • Automate deployment processes and infrastructure management using Infrastructure as Code (IaC) practices.
  • Implement monitoring and logging strategies to optimize system performance and availability.
  • Optimize infrastructure for cost efficiency, ensuring resources are utilized effectively without compromising performance.
  • Coordinate with security teams to ensure the infrastructure is compliant with best practices and standards.
  • Troubleshoot and resolve infrastructure-related issues efficiently.
  • Continuously evaluate, recommend, and implement changes to improve system reliability and performance.
  • Maintain documentation for infrastructure services and processes.
  • Support on-call rotation as needed for critical infrastructure issues.
  • Other duties as assigned.

AWS, Docker, Python, SQL, Apache Hadoop, Cloud Computing, Git, Hadoop, JavaScript, Kafka, Kubernetes, Algorithms, Apache Kafka, Data Structures, Go, CI/CD, RESTful APIs, Linux, DevOps, Terraform, Microservices, Networking, Software Engineering

Posted 12 days ago

📍 United States

🧭 Full-Time

💸 133,450 - 232,000 USD per year

🔍 Software Development

🏒 Company: ClickHouseπŸ‘₯ 101-250πŸ’° Series B over 2 years agoDatabaseArtificial Intelligence (AI)Big DataAnalyticsSoftware

Requirements:
  • 5+ years of experience in software development
  • Experience with AWS, Azure, or GCP
  • Familiarity with infrastructure-as-code tools
  • Knowledge of Kubernetes and microservices
  • Experience with security principles and network protocols
Responsibilities:
  • Architect and build distributed infrastructure
  • Build a cloud-native platform and automate resource management
  • Work with core database and security teams
  • Improve reliability and scalability of services
  • Design and build security components
  • Enhance performance and cost efficiency

AWS, Cybersecurity, GCP, Java, Kubernetes, C++, Azure, Go, Terraform, Networking

Posted 27 days ago

📍 Bay Area, NYC

💸 160,000 - 210,000 USD per year

🔍 AI observability and evaluation

🏒 Company: Arize AIπŸ‘₯ 51-100πŸ’° $38,000,000 Series B over 2 years agoArtificial Intelligence (AI)Machine LearningInformation TechnologySoftware

Requirements:
  • 5+ years of experience in infrastructure engineering, with significant exposure to enterprise on-premises deployments.
  • Deep expertise in container orchestration platforms (especially Kubernetes) and infrastructure-as-code tools like Terraform.
  • Strong background in enterprise networking, security requirements, and compliance considerations.
  • Experience working with multiple cloud providers (AWS, GCP, Azure) and understanding how to adapt cloud-native architectures for on-premises environments.
  • Track record of building and maintaining automation tools that improve deployment reliability and reduce operational overhead.
  • Excellent communication skills and ability to work directly with customers to understand and address their infrastructure needs.
  • Experience with Python, Golang, and JavaScript is highly desirable.
Responsibilities:
  • Design and implement robust deployment architectures that enable Arize to run efficiently in diverse customer environments, from air-gapped systems to hybrid cloud setups.
  • Collaborate with customers to understand their infrastructure requirements and create tailored deployment solutions that meet their security, compliance, and performance needs.
  • Develop and maintain automation tools and frameworks that streamline deployment, upgrade, and maintenance processes for on-premises installations.
  • Partner with the product team to ensure new features are compatible with on-premises deployments and create efficient release mechanisms.
  • Create comprehensive monitoring and observability solutions that enable customers to maintain and troubleshoot their Arize installations effectively.

AWS, Python, GCP, JavaScript, Kubernetes, Azure, Terraform

Posted about 1 month ago