- Build a scalable self-serve evaluation platform to power our research and development
- Design a Python framework that makes it easy for poolsiders to implement both internal and public benchmarks in one central place
- Build and maintain the pipeline that runs distributed evaluations at scale
- Collaborate with modeling and product teams to identify opportunities to improve our experimentation and evaluation tooling
AWS, Python, GCP, +5 more