Senior AI Inference Engineer

Posted about 1 month ago

North America, South America; Full-Time; AI, Software Development

Company: Monks

Location: North America, South America, EST, PST

Languages: English

Seniority level: Senior, Significant professional experience (senior level)

Experience: Significant professional experience (senior level)

Skills:

AWS, Docker, Leadership, Python, Software Development, Artificial Intelligence, Cloud Computing, Kubernetes, Machine Learning, NLTK, NumPy, OpenCV, Software Architecture, Data Science, Pandas, RESTful APIs, DevOps, Microservices

Requirements:

Significant professional experience (senior level) building and shipping AI/ML systems in production, with strong Python and a modern data/ML stack.
Proven track record taking models from notebooks or prototypes into robust, low-latency inference services.
Extensive hands-on experience building agentic systems, especially those involving computer vision or multi-modal inputs.
Demonstrated experience architecting autonomous agents that can “see” and reason about video content.
Practical experience integrating Vision Language Models (e.g., GPT-4o, Gemini Pro Vision, LLaVA) into complex workflows.
Familiarity with LLM/agent orchestration frameworks (e.g., LangGraph, AutoGen, Semantic Kernel or equivalents) applied to visual or multi-modal tasks.
Strong practical experience with Kubernetes in production.
Experience architecting distributed systems on AWS beyond simply provisioning basic instances.
Understanding of modern NVIDIA GPU architectures (e.g., Ampere, Hopper, Blackwell) and how to optimize workloads for them.
Product-minded and value-driven: able to align technical decisions with business outcomes and ROI.
Excellent communication skills, with the ability to explain complex architectures to both CTO-level and non-technical stakeholders and to participate comfortably in client and pre-sales conversations.
Self-starter who thrives in ambiguity, enjoys reading source code, and is motivated by solving problems that lack clear existing patterns or documentation.
Experience with FFmpeg, GStreamer, NVENC/NVDEC, and modern video codecs; OpenShift or NVIDIA Holoscan for Media; Mojo language; and/or deploying AI systems on edge devices or hybrid/on-prem environments.

Responsibilities:

Architect, implement, and optimize end-to-end AI inference services and agentic pipelines in Python.
Design autonomous agents that can interpret, reason about, and act on video and multi-modal content.
Integrate Vision Language Models (e.g., GPT-4o, Gemini Pro Vision, LLaVA) into robust, production-grade workflows.
Leverage LLM/agent orchestration frameworks (e.g., LangGraph, AutoGen, Semantic Kernel or similar) to coordinate complex visual AI tasks.
Deploy and operate services on Kubernetes (and potentially OpenShift or NVIDIA Holoscan), ensuring reliability and scalability under heavy media workloads.
Architect distributed systems on AWS, making informed trade-offs across performance, cost, and resilience.
Optimize workloads for modern NVIDIA GPU architectures (Ampere, Hopper, Blackwell), focusing on real-time and high-throughput media use cases.
Collaborate directly with clients in MEGS, including participating in pre-sales discussions to validate feasibility, shape solutions, and clarify the “why” behind requirements.
Create clear architecture diagrams and technical documentation that align both technical and non-technical stakeholders.
Provide technical leadership to project teams, guiding implementation to stay true to the intended architecture and product value.
Work with video tooling such as FFmpeg, GStreamer, NVENC/NVDEC, and modern codecs (H.264/H.265), and explore emerging tools such as Mojo or NVIDIA Holoscan for Media.
Design and deploy AI solutions to edge devices and on-premises or hybrid clusters.