Software Engineer (Applied AI)

Centralize

Enterprise sales

This role is open to remote candidates in the US, with a strong preference for candidates based in or willing to relocate to San Francisco or New York City.

Full-Time

Salary: 170,000 - 220,000 USD per year

Job Details

Languages: Excellent written and verbal English communication.
Required Skills: AWS, Python, Machine Learning, TypeScript, Data Science, Postgres, Prompt Engineering, LLM

Requirements

Demonstrated experience shipping LLM-powered products to production with real customers and real evals.
Demonstrated experience training, fine-tuning, or shipping classical ML models in production: ranking, classification, embeddings, retrieval.
Strong fluency with multi-agent systems, tool use, function calling, RAG, and the orchestration patterns that make them reliable.
Real expertise in evaluation across both LLM and ML systems.
Strong backend engineering fundamentals. Python is required; familiarity with TypeScript, Postgres, queues, and AWS is a major plus.
Sharp instinct for cost, latency, and reliability tradeoffs across the AI stack.
Excellent written and verbal English communication.
Demonstrated ability to operate independently.
Background as an MLE who has flexed into LLM application work, or as an LLM engineer with deep MLE foundations.
Experience fine-tuning open or closed models for specific tasks, including data curation, training infrastructure, and post-training evaluation.
Experience with multi-agent orchestration frameworks (LangGraph, Mastra, custom orchestrators) at production scale.
Experience with classical ML systems in production: ranking models, embedding models, entity resolution, recommendation systems.
Open-source contributions, technical blog posts, or papers on applied AI or ML work.
Direct exposure to enterprise sales cycles or B2B SaaS products.

Responsibilities

Design and ship multi-agent systems that handle the hardest reasoning problems in the product: stakeholder mapping, account research, deal health analysis, conversation intelligence.
Own the LLM pipelines end to end: prompt engineering, retrieval, tool use, structured outputs, guardrails, and the orchestration glue that ties it all together.
Build and maintain the ML and DS work that LLMs aren't the right tool for: ranking models, classifiers, embedding models, entity resolution across messy CRM data, signal extraction from sales conversations.
Fine-tune models when frontier APIs aren't enough. Curate training data, design eval sets, run experiments, and ship the results to production.
Build the eval infrastructure that lets us ship AI features without breaking them. LLM-as-judge, human-in-the-loop, classical metrics for ML systems, regression suites.
Own the data flywheel. The product generates rich signal from customer conversations, deal outcomes, and stakeholder interactions. Turn that into training data, eval data, and the feedback loops that compound over time.
Stay on the frontier. New models drop monthly. You'll know which ones move the needle for our use cases, when to switch, and when to wait.
Talk to customers. Sit on calls, see what's actually broken, and translate that into the AI capabilities that matter.