AI Research Engineer

United States, Full-Time, Senior
Salary not disclosed

Job Details

Languages
English

Requirements

  • Strong experience in AI/ML engineering with a focus on inference optimization, model serving, or AI systems performance.
  • Deep understanding of model deployment architectures and inference frameworks for large-scale AI applications.
  • Expertise in optimizing latency, throughput, scalability, and memory footprint in production AI systems.
  • Hands-on experience with performance monitoring, benchmarking, profiling, and bottleneck analysis.
  • Strong knowledge of advanced AI model architectures, including multi-modal systems and resource-efficient models.
  • Experience building and deploying AI systems across cloud, edge, or low-resource hardware environments.
  • Proficiency in programming languages commonly used in AI infrastructure and optimization workflows.
  • Strong analytical and problem-solving abilities with a research-oriented mindset.
  • Ability to work independently in a highly distributed and fast-moving global environment.
  • Excellent English communication skills and the ability to collaborate across technical and non-technical teams.
  • Passion for innovation, experimentation, and scalable AI infrastructure development.

Responsibilities

  • Design, develop, and optimize advanced model serving architectures focused on high throughput, low latency, and efficient memory utilization.
  • Build scalable inference pipelines capable of running across cloud, edge, and resource-constrained environments.
  • Conduct controlled inference experiments in simulated and production environments to evaluate system performance and reliability.
  • Monitor and analyze key performance metrics such as latency, throughput, memory consumption, token response time, and error rates.
  • Develop and maintain benchmarking methodologies and performance validation frameworks for AI inference systems.
  • Identify bottlenecks in serving pipelines, including batch processing inefficiencies, network overhead, and excessive memory usage.
  • Optimize inference frameworks and deployment strategies for scalability, resilience, and operational efficiency.
  • Collaborate with cross-functional engineering and research teams to integrate optimized inference solutions into production environments.
  • Create high-quality testing datasets and deployment scenarios that reflect real-world operational challenges.
  • Continuously improve inference infrastructure through experimentation, iteration, and adoption of cutting-edge AI serving techniques.