Senior AI Systems Engineer - Edge & Inference

Brazil · Full-Time · Senior
Salary not disclosed

Job Details

Required Skills
Docker · Python · PyTorch · C++ · TensorFlow · Linux · NLP · Computer Vision

Requirements

  • Strong experience deploying and optimizing machine learning models in production environments
  • Solid expertise with PyTorch and TensorFlow, especially model export workflows
  • Deep understanding of ONNX and ONNX Runtime
  • Hands-on experience with inference servers such as Triton Inference Server
  • Strong knowledge of inference optimization techniques including INT8, FP16, and mixed precision
  • Advanced Python programming skills
  • Intermediate to advanced C++ skills with a focus on performance optimization
  • Experience working in Linux environments, Docker, and containerized infrastructures
  • Background in performance profiling, benchmarking, and system optimization
  • Familiarity with LLMs, vision models, NLP architectures, and multimodal AI systems
  • Experience in at least one of the following domains: GenAI at scale, real-time computer vision, edge AI systems, or large-scale NLP applications
  • Strong analytical thinking, autonomy, and ability to work in complex technical environments
  • Clear communication skills and ability to collaborate with cross-functional teams
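To illustrate the INT8 inference optimization the requirements mention, here is a minimal sketch (not part of the posting, and a simplification of what frameworks like ONNX Runtime do internally): symmetric INT8 quantization of float weights with a single scale factor, plus dequantization to show the accuracy/precision trade-off.

```python
# Illustrative sketch: symmetric INT8 quantization of float values.
# Real toolchains (ONNX Runtime, TensorRT) handle this per-tensor or
# per-channel with calibration; this shows only the core idea.

def quantize_int8(values):
    """Map floats to int8 codes in [-127, 127] using one scale factor."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid scale == 0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from int8 codes."""
    return [code * scale for code in q]

weights = [0.5, -1.25, 3.0, -0.01]
codes, scale = quantize_int8(weights)
approx = dequantize_int8(codes, scale)
# Per-element error is bounded by scale / 2 (no clipping occurs here).
max_err = max(abs(a - b) for a, b in zip(weights, approx))
```

The same round-trip check is how quantized models are validated in practice: quantize, dequantize, and confirm the error stays within an acceptable bound for the task.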

Responsibilities

  • Lead the deployment and productionization of AI models in enterprise-grade environments, ensuring stability, scalability, and performance
  • Optimize inference pipelines through quantization, pruning, and tuning techniques to balance accuracy, latency, throughput, and energy consumption
  • Design, implement, and maintain inference services using tools such as Triton Inference Server and ONNX Runtime
  • Integrate AI models with specialized hardware accelerators to maximize execution efficiency
  • Develop monitoring, telemetry, and health-check systems for AI workloads in production
  • Perform profiling, benchmarking, and performance analysis to continuously improve system efficiency
  • Architect and support advanced AI use cases such as LLM serving, retrieval-augmented generation systems, copilots, and real-time video analytics
  • Build APIs and inference services using Python and C++, collaborating closely with data, ML, and engineering teams
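The profiling and benchmarking work described above can be sketched as follows. This is an illustrative helper (not from the posting): it times a callable and reports latency percentiles, the kind of measurement used to tune an inference endpoint; the lambda stands in for a real Triton or ONNX Runtime request.

```python
# Illustrative sketch: minimal latency benchmark with warmup and
# percentile reporting, as used when profiling inference services.

import time
import statistics

def benchmark(fn, *, warmup=5, iters=100):
    """Time fn() repeatedly; return latency stats in milliseconds."""
    for _ in range(warmup):  # warm caches/JIT so measurements are steady-state
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[min(iters - 1, int(iters * 0.99))],
        "mean_ms": statistics.fmean(samples),
    }

# Stand-in workload; replace with a call to the model server under test.
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
```

Reporting tail latency (p99) alongside the median matters because throughput tuning (batching, mixed precision) often trades median latency against the tail.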