Senior AI Systems Engineer - Edge & Inference

Brazil · Full-Time · Senior
Salary not disclosed

Job Details

Required Skills
Docker · Python · PyTorch · C++ · TensorFlow · Linux · NLP · Computer Vision

Requirements

  • Strong experience deploying and optimizing machine learning models in production environments
  • Solid expertise with PyTorch and TensorFlow, especially model export workflows
  • Deep understanding of ONNX and ONNX Runtime
  • Hands-on experience with inference servers such as Triton Inference Server
  • Strong knowledge of inference optimization techniques including INT8, FP16, and mixed precision
  • Advanced Python programming skills
  • Intermediate to advanced C++ skills with a focus on performance optimization
  • Experience working in Linux environments, Docker, and containerized infrastructures
  • Background in performance profiling, benchmarking, and system optimization
  • Familiarity with LLMs, vision models, NLP architectures, and multimodal AI systems
  • Experience in at least one of the following domains: GenAI at scale, real-time computer vision, edge AI systems, or large-scale NLP applications
  • Strong analytical thinking, autonomy, and ability to work in complex technical environments
  • Clear communication skills and ability to collaborate with cross-functional teams
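To illustrate the INT8 inference optimization the requirements mention, here is a minimal sketch (not part of the posting, and a simplification of what frameworks like ONNX Runtime do internally): symmetric INT8 quantization of float weights with a single scale factor, plus dequantization to show the accuracy/precision trade-off.

```python
# Illustrative sketch: symmetric INT8 quantization of float values.
# Real toolchains (ONNX Runtime, TensorRT) handle this per-tensor or
# per-channel with calibration; this shows only the core idea.

def quantize_int8(values):
    """Map floats to int8 codes in [-127, 127] using one scale factor."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid scale == 0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from int8 codes."""
    return [code * scale for code in q]

weights = [0.5, -1.25, 3.0, -0.01]
codes, scale = quantize_int8(weights)
approx = dequantize_int8(codes, scale)
# Per-element error is bounded by scale / 2 (no clipping occurs here).
max_err = max(abs(a - b) for a, b in zip(weights, approx))
```

The same round-trip check is how quantized models are validated in practice: quantize, dequantize, and confirm the error stays within an acceptable bound for the task.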

Responsibilities

  • Lead the deployment and productionization of AI models in enterprise-grade environments, ensuring stability, scalability, and performance
  • Optimize inference pipelines through quantization, pruning, and tuning techniques to balance accuracy, latency, throughput, and energy consumption
  • Design, implement, and maintain inference services using tools such as Triton Inference Server and ONNX Runtime
  • Integrate AI models with specialized hardware accelerators to maximize execution efficiency
  • Develop monitoring, telemetry, and health-check systems for AI workloads in production
  • Perform profiling, benchmarking, and performance analysis to continuously improve system efficiency
  • Architect and support advanced AI use cases such as LLM serving, retrieval-augmented generation systems, copilots, and real-time video analytics
  • Build APIs and inference services using Python and C++, collaborating closely with data, ML, and engineering teams
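The profiling and benchmarking work described above can be sketched as follows. This is an illustrative helper (not from the posting): it times a callable and reports latency percentiles, the kind of measurement used to tune an inference endpoint; the lambda stands in for a real Triton or ONNX Runtime request.

```python
# Illustrative sketch: minimal latency benchmark with warmup and
# percentile reporting, as used when profiling inference services.

import time
import statistics

def benchmark(fn, *, warmup=5, iters=100):
    """Time fn() repeatedly; return latency stats in milliseconds."""
    for _ in range(warmup):  # warm caches/JIT so measurements are steady-state
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[min(iters - 1, int(iters * 0.99))],
        "mean_ms": statistics.fmean(samples),
    }

# Stand-in workload; replace with a call to the model server under test.
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
```

Reporting tail latency (p99) alongside the median matters because throughput tuning (batching, mixed precision) often trades median latency against the tail.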