Senior Software Engineer II - Applied AI and Evaluations
US, Full-Time, Senior
Salary: 175,000 - 245,000 USD per year
Job Details
- Experience: 8+ years
- Required Skills: Python, MLflow, Databricks, Prompt Engineering
Requirements
- 8+ years of software engineering experience
- 2+ years working directly with LLMs in production
- Deep, hands-on experience with prompt engineering
- Deep, hands-on experience with context engineering
- Strong working knowledge of RAG architectures
- Experience building or extending LLM evaluation frameworks
- Fluency in agent system design
- Strong Python skills
- Comfortable working in data-heavy environments (Databricks, Delta tables, or equivalent)
- Ability to communicate complex quality findings (written and verbal) to both technical and non-technical stakeholders
- Strong cross-functional judgment
- A bias for clarity in ambiguous situations
- BS or MS in Computer Science, a related field, or equivalent industry experience
- Experience with MLflow or similar experiment tracking platforms (Strong Plus)
- Familiarity with CI-integrated evaluation pipelines (Strong Plus)
- Experience with multi-agent orchestration frameworks (Strong Plus)
- Prior work in an Applied AI or LLMOps function within a product company (Strong Plus)
Responsibilities
- Own agent quality end-to-end: diagnosis, improvement, and validation across SmartAssist's orchestrator and subagents
- Identify failure modes across quality dimensions (factual accuracy, completeness, tone, actionability, and latency) and prioritize what to fix
- Drive quality improvements through prompt engineering, context engineering, and RAG retrieval tuning
- Extend and mature our evaluation framework: scorers, golden datasets, regression gates, and online evaluation for production traffic
- Close the feedback loop: ensure that every change has a measurable, attributable quality signal
- Collaborate with our Agent Architecture lead to distinguish quality problems that require prompt/context solutions from those that require structural fixes
- Establish repeatable methodology that scales beyond any single agent or subagent