
Staff Machine Learning Engineer

Posted 10 days ago


💎 Seniority level: Staff, 10+ years

📍 Location: Canada, Latin America, Alabama, Arizona, California, Colorado, Connecticut, Florida, Georgia, Illinois, Indiana, Massachusetts, Minnesota, Nevada, New Jersey, New York, North Carolina, Oregon, Pennsylvania, Rhode Island, Tennessee, Texas, Utah, Virginia, Washington

💸 Salary: 140,000 - 220,000 USD per year

🔍 Industry: Retail AI

🏢 Company: Lily AI

⏳ Experience: 10+ years

🪄 Skills: Python, Kubeflow, Kubernetes, Machine Learning, MLflow, PyTorch, Azure

Requirements:
  • 10+ years of experience building large-scale machine learning solutions and applying MLOps practices.
  • Experience working with LLM APIs and serving LLMs in-house at scale.
  • Proficiency in Kubernetes, RDBMS, and API-driven development.
  • Experience in model serving in low-latency, high-throughput use cases.
  • Knowledge of observability, data pipeline design, service scaling, and cost optimization.
  • Strong emphasis on code hygiene, including review, documentation, testing, and CI/CD practices.
  • Proficiency in Python and PyTorch.
  • Extensive experience with the scientific Python ecosystem.
  • Proficiency in cloud-native application development.
  • Action-oriented with the ability to articulate complex concepts.
Responsibilities:
  • Define, design, and maintain scalable Machine Learning data pipelines, training infrastructure, and inference systems.
  • Optimize, benchmark, and productionize deep learning models to extract high-value product attributes.
  • Drive cost efficiency and throughput improvements, owning relevant KPIs.
  • Promote and implement software engineering best practices across the team.
  • Shape and evolve the technical stack to meet business and technical needs.
  • Transition research prototypes into robust, production-ready systems.
  • Deploy, monitor, and continuously improve models in production environments.
  • Optimize model performance, focusing on memory usage and latency.
  • Automate workflows by building efficient pipelines and orchestration frameworks.
  • Develop tools and shared libraries to boost team productivity.

Related Jobs


📍 United States, Canada

🧭 Full-Time

💸 230,000 - 322,000 USD per year

🔍 Digital Advertising

Requirements:
  • Significant experience in one or more general-purpose programming languages like Java, Python, Go, Scala, C++ or similar.
  • Experience with data processing frameworks like Spark, Flink, Kafka, Druid, etc.
  • Experience with a cloud service provider like AWS or GCP.
  • Familiarity with tools: Kubernetes, Drone, CircleCI, Spinnaker, Argo, Airflow, Docker, Thrift.
  • Experience with datastores: ElasticSearch / Amazon OpenSearch, Redis, Postgres, Cassandra, BigQuery.
  • Experience with a machine learning modeling framework like TensorFlow or PyTorch.
Responsibilities:
  • Building Reddit-scale optimizations to improve advertiser outcomes using cutting-edge techniques in the industry.
  • Leveraging live auction data and model predictions to adjust campaign bids in real time.
  • Incorporating knowledge of the Reddit ads marketplace into budget pacing algorithms powered by control & reinforcement learning systems.
  • Leading the team on designing new bid & budget optimization products and algorithms as well as conducting rigorous A/B experiments to evaluate the business impact.
  • Actively participating and working with other leads to set the long-term direction for the team, plan and oversee engineering designs and project execution.

🪄 Skills: AWS, Docker, Python, ElasticSearch, Kafka, Kubernetes, Machine Learning, PyTorch, Algorithms, Cassandra, Postgres, Redis, Spark, TensorFlow

Posted 30 days ago
🔥 Staff Machine Learning Engineer
Posted about 2 months ago

📍 U.S.

🧭 Full-Time

💸 188,000 - 225,000 USD per year

🔍 FinTech

🏢 Company: Flex

Requirements:
  • Master’s or Ph.D. in Computer Science, Engineering, or a related field.
  • 6+ years of experience as a Machine Learning Engineer with expertise in production environments.
  • Strong proficiency in Python or similar programming languages.
  • Experience with ML libraries like TensorFlow, PyTorch, and scikit-learn.
  • Extensive experience with cloud platforms (e.g., AWS, GCP, Azure) and distributed computing frameworks (e.g., Spark, Kubernetes).
  • Proven track record of implementing end-to-end machine learning pipelines.
  • Strong background in model optimization, version control, and CI/CD practices.
  • Excellent problem-solving abilities and the capacity to collaborate with cross-functional teams.
Responsibilities:
  • Own the end-to-end lifecycle of machine learning projects, from data collection and preprocessing to model deployment, monitoring, and maintenance in a production environment.
  • Build, maintain, and optimize robust data pipelines that support model development, training, and deployment at scale.
  • Implement machine learning algorithms and models that meet performance, scalability, and reliability requirements.
  • Collaborate with data scientists, engineers, and product teams to design and deploy machine learning systems.
  • Continuously monitor and improve model performance, conducting experiments and tuning hyperparameters.
  • Leverage distributed computing frameworks and cloud-based platforms for efficient processing of large-scale datasets.

🪄 Skills: AWS, Python, GCP, Kubernetes, Machine Learning, PyTorch, Azure, Spark, TensorFlow, CI/CD


📍 United States, Canada

🧭 Full-Time

💸 260,800 - 365,100 USD per year

🔍 Digital Advertising

🏢 Company: Reddit

👥 1001-5000

💰 $410,000,000 Series F over 3 years ago

🫂 Last layoff over 1 year ago

News, Content, Social Network, Social Media

Requirements:
  • 10+ years of contributing high-quality code to production systems that operate at scale.
  • 2+ years of experience operating as a Senior Staff engineer.
  • 5+ years of experience in building control systems, PID controllers, multi-armed bandits, reinforcement learning algorithms, or bid/pricing optimization systems.
  • Experience leading large engineering teams and collaborating with cross-functional partners.
  • Experience designing optimization algorithms in an ad serving platform or other marketplaces is preferred.
  • Familiarity with control systems and reinforcement learning algorithms is a strong plus.
  • Significant experience in Java, Python, Go, Scala, C++, or similar languages.
  • Experience with data processing frameworks like Spark, Flink, Kafka.
  • Familiarity with AWS or GCP as a cloud service provider.
  • Experience with tools like Kubernetes, Docker, Airflow, etc.
  • Familiarity with datastores such as ElasticSearch, Redis, and Postgres.
Responsibilities:
  • Building Reddit-scale optimizations to improve advertiser outcomes using innovative techniques.
  • Leveraging live auction data and model predictions to adjust campaign bids in real-time.
  • Incorporating knowledge of the Reddit ads marketplace into budget pacing algorithms.
  • Leading a team of 15+ engineers on designing new bid & budget optimization products.
  • Conducting rigorous A/B experiments to evaluate business impact.
  • Collaborating with cross-functional teams for customer representation.

🪄 Skills: AWS, Docker, Python, ElasticSearch, GCP, Java, Kafka, Kubernetes, Machine Learning, PyTorch, C++, Airflow, Algorithms, Cassandra, Go, Postgres, Redis, Spark, TensorFlow, Scala

Posted about 2 months ago

📍 US

🧭 Full-Time

💸 165,000 - 283,000 USD per year

🔍 Technology

🏢 Company: Mozilla

👥 5001-10000

💰 $300,000 Angel about 20 years ago

🫂 Last layoff 3 months ago

Internet, Open Source, Web Browsers, Software, Browser Extensions

Requirements:
  • 6+ years of experience as a Machine Learning engineer building tooling and services for ML applications in production.
  • Experience with designing and building ML tooling and infrastructure for training, deploying, inference, and validation of models.
  • Experience with distributed systems and platforms for AI integrations.
  • Strong problem-solving skills and the ability to communicate complex concepts effectively.
  • Experience working collaboratively with product managers and non-engineering teams.
  • Effective documentation and communication skills.
Responsibilities:
  • Lead the design, development, and integration of Generative AI solutions in Firefox.
  • Collaborate cross-functionally with product management, full-stack engineering, and design.
  • Build infrastructure for training and inference of LLMs and small models.
  • Implement robust validation and testing procedures.
  • Continuously monitor and optimize deployed models for performance and efficiency.

🪄 Skills: AWS, Artificial Intelligence, Cloud Computing, Kubernetes, Machine Learning, PyTorch, Algorithms, Data engineering, Data science, TensorFlow, Communication Skills, CI/CD, Documentation

Posted 2 months ago

📍 United States, Canada

🧭 Full-Time

💸 230,000 - 322,000 USD per year

🔍 Advertising technology

Requirements:
  • 7+ years of contributing high-quality code to production systems that operate at scale.
  • 5+ years of experience building control systems, PID controllers, multi-armed bandits, reinforcement learning algorithms, or bid/pricing optimization systems.
  • Experience leading large engineering teams and collaborating with cross-functional partners.
  • Experience designing optimization algorithms in an ad serving platform and/or other marketplaces.
  • Significant experience in one or more general-purpose programming languages like Java, Python, Go, Scala, C++, or similar.
  • Familiarity with data processing frameworks like Spark, Flink, Kafka, Druid, etc.
  • Experience with a cloud service provider like AWS or GCP.
  • Knowledge of tools like Kubernetes, Drone, CircleCI, Spinnaker, Argo, Airflow, Docker, Thrift.
  • Experience with datastores such as ElasticSearch/Amazon OpenSearch, Redis, Postgres, Cassandra, BigQuery.
  • Experience with machine learning modeling frameworks like TensorFlow or PyTorch.
Responsibilities:
  • Build Reddit-scale optimizations to improve advertiser outcomes using cutting-edge techniques in the industry.
  • Leverage live auction data and model predictions to adjust campaign bids in real time.
  • Incorporate knowledge of the Reddit ads marketplace into budget pacing algorithms powered by control & reinforcement learning systems.
  • Lead the team on designing new bid & budget optimization products and algorithms as well as conducting rigorous A/B experiments to evaluate the business impact.
  • Actively participate and work with other leads to set the long-term direction for the team, plan and oversee engineering designs and project execution.

🪄 Skills: AWS, Docker, Python, ElasticSearch, GCP, Java, Kafka, Kubernetes, Machine Learning, PyTorch, C++, Airflow, Algorithms, Cassandra, Go, Postgres, Redis, Spark, TensorFlow

Posted 3 months ago

📍 Canada

🧭 Full-Time

🔍 Artificial Intelligence

🏢 Company: Cresta

👥 101-250

💰 $125,000,000 Series D 3 months ago

Automotive, Customer Service, Artificial Intelligence (AI), Intelligent Systems, Retail, Machine Learning, Telecommunications, Natural Language Processing, Software

Requirements:
  • Bachelor’s Degree in Computer Science, Mathematics, or a related field; Master’s or Ph.D. preferred, or equivalent professional experience.
  • 7+ years of hands-on industry experience with AI and machine learning, preferably with 3+ years of experience working with LLMs in large-scale production environments.
  • Expert knowledge of machine learning concepts and methods, especially those related to NLP, Generative AI, and working with LLMs.
  • Proven leadership in designing and deploying AI solutions at scale, with a deep understanding of model optimization and real-time AI applications.
  • Extensive practical knowledge of modern machine learning frameworks and technologies (e.g., PyTorch, TensorFlow, Hugging Face, NumPy), as well as experience with distributed systems and cloud-based AI infrastructure.
  • Strong problem-solving and strategic thinking abilities, with a proven ability to lead cross-functional teams and work collaboratively to deliver innovative AI solutions in production.
  • A passion for driving AI adoption and pushing the boundaries of AI technology into real-world applications, with an ability to mentor junior engineers and influence strategic decisions across the organization.
Responsibilities:
  • Design, develop, and deploy Cresta’s AI Agent solutions and proprietary models.
  • Focus on practical AI challenges such as improving reasoning, planning capabilities, and evaluation in real-world scenarios.
  • Collaborate with cross-functional teams including front-end and back-end software engineers to integrate AI Agents into Cresta’s customer solutions.
  • Lead initiatives to scale AI systems for production environments, ensuring performance and reliability across use cases.
  • Contribute to solving cutting-edge problems in AI and help define the future roadmap for Cresta’s AI Agents.
  • Innovate and research ways to improve security, cost-efficiency, and reliability of AI systems.

🪄 Skills: Leadership, Machine Learning, NumPy, PyTorch, TensorFlow, Problem Solving, Mentoring

Posted 3 months ago

📍 Canada

🧭 Full-Time

💸 160,000 - 242,000 CAD per year

🔍 Technology and Internet

🏢 Company: Mozilla

👥 5001-10000

💰 $300,000 Angel about 20 years ago

🫂 Last layoff 3 months ago

Internet, Open Source, Web Browsers, Software, Browser Extensions

Requirements:
  • A bachelor’s degree in Statistics, Computer Science, related technical field, or equivalent practical experience.
  • A minimum of 6 years of experience in a quantitative role, ideally with much of that as a machine learning engineer or a data scientist.
  • Knowledge of and expertise in Natural Language Processing (NLP).
  • Proficiency in a data query language (e.g., SQL) and a programming language (e.g., Python).
  • Demonstrable experience with the full lifecycle of machine learning models, from development to deployment and monitoring.
  • Being an excellent team player with a proven ability to work effectively in cross-functional teams.
  • Ability to be self-directed after work is assigned and help less experienced team members get unblocked.
Responsibilities:
  • Apply statistical and machine learning techniques to process and analyze unstructured textual data.
  • Develop and fine-tune machine learning models for tasks such as entity recognition, classification, and text generation.
  • Utilize pretrained language models (e.g., GPT, Llama) and adapt them for specific use cases.
  • Optimize models for production usage, including considerations for scalability, latency, and resource usage.
  • Monitor and refine deployed models for performance and efficiency, and conduct troubleshooting when necessary.
  • Work closely with interdisciplinary teams to deliver high-quality features and solutions.
  • Stay current with advancements in NLP research, methodologies, and best practices.
  • Be consistently productive and operate with a high degree of autonomy.

🪄 Skills: Python, SQL, Machine Learning, Collaboration

Posted 4 months ago