Senior Data & ML Engineer I
Posted about 1 month ago
💎 Seniority level: Senior, 5+ years
📍 Location: United States
💸 Salary: 150,000 - 165,000 USD per year
🔍 Industry: Financial Services
🏢 Company: Zip Co Limited
🗣️ Languages: English
⏳ Experience: 5+ years
🪄 Skills: Python, SQL, Kafka, MLflow, Snowflake, Azure, Data engineering, NoSQL, Spark, CI/CD, DevOps, Terraform, Microservices, Data visualization, Data modeling
Requirements:
- 5+ years of experience in Data Engineering, Machine Learning Engineering or similar.
- Proven experience working with both batch and streaming data pipelines (e.g., dbt, Spark, Snowflake for batch; Kafka/Event Hubs, Delta Lake for streaming)
- Strong SQL and Python skills, and comfort working with large-scale datasets.
- Solid understanding of data modeling and architecture paradigms: Kimball/dimensional modeling, Data Vault, Medallion.
- Hands-on experience with Snowflake, Azure Blob Storage, Databricks, and dbt
- Experience working in Azure-native environments, ideally with exposure to tools like Event Hubs, ADLS, Azure DevOps, or Synapse.
- Exposure to MLOps and machine learning workflows: supporting ML teams, building feature pipelines, managing model inputs/outputs, monitoring model performance, or deploying models via Databricks ML, MLflow, or Azure ML.
- Experience working in a microservices architecture and writing asynchronous Python code (see the sketch after this list).
- Understanding of ML-specific challenges such as feature drift, data versioning, or batch scoring at scale.
- Familiarity with NoSQL databases such as Azure Cosmos DB.
- Infrastructure-as-code or DevOps tooling experience (Terraform, CI/CD, monitoring) is a nice-to-have.
- Knowledge of Atlan or other data management tools.
- Knowledge of Tableau, Power BI or other Analytics tools.
- Strong communication skills, with an ability to work collaboratively with cross-functional teams, both technical and non-technical.
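As a rough illustration of the asynchronous Python this role involves, the sketch below shows a scoring handler that fetches several feature groups concurrently instead of serially. All names, the in-memory "feature store", and the placeholder model are hypothetical, not Zip's actual code:

```python
# Minimal sketch (hypothetical names throughout) of async Python in a
# model-serving microservice: fetch feature groups concurrently.
import asyncio


async def fetch_feature_group(store: dict, entity_id: str, group: str) -> dict:
    """Stand-in for an async feature-store lookup (e.g., a NoSQL store)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return store.get(group, {}).get(entity_id, {})


async def score(entity_id: str, store: dict) -> float:
    # Fire all feature lookups concurrently and await them together.
    groups = ("profile", "transactions", "risk")
    results = await asyncio.gather(
        *(fetch_feature_group(store, entity_id, g) for g in groups)
    )
    features = {k: v for group in results for k, v in group.items()}
    # Placeholder "model": sum of feature values.
    return float(sum(features.values()))


if __name__ == "__main__":
    demo_store = {
        "profile": {"u1": {"age": 30}},
        "transactions": {"u1": {"txn_30d": 12}},
        "risk": {"u1": {"chargebacks": 0}},
    }
    print(asyncio.run(score("u1", demo_store)))
```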
Responsibilities:
- Architect and scale streaming pipelines with tools like Kafka/Event Hubs and Databricks Structured Streaming (a minimal sketch follows this list).
- Design and optimize batch processing workflows using dbt, Snowflake, and Azure Data Lake.
- Build robust ELT and CDC pipelines using Airbyte, and model them cleanly for downstream use.
- Implement observability and testing frameworks for data quality, lineage, and freshness.
- Implement feature pipelines in Spark for machine learning models, and build microservices to serve those models in production.
- Develop self-service patterns and tooling for analytics and ML teams to move faster.
- Help maintain and evolve our Delta Lake environment and push performance boundaries in Databricks.
- Collaborate with analytics, engineering, and product teams to ensure data is trusted and accessible.
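To give a flavor of the streaming work above, here is a hedged sketch of a PySpark Structured Streaming job that reads events from Kafka (Event Hubs exposes a Kafka-compatible endpoint) and appends them to a Delta table. The broker, topic, schema, and paths are illustrative assumptions, and it presumes a Databricks-like environment where the Kafka and Delta connectors are available:

```python
# Hedged sketch, not Zip's actual pipeline: Kafka -> Delta with
# PySpark Structured Streaming. All names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("events-to-delta").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "payments.events")            # hypothetical topic
    .load()
    # Kafka delivers bytes; parse the JSON payload into typed columns.
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/payments_events")
    .outputMode("append")
    .start("/mnt/delta/payments_events")
)
query.awaitTermination()
```

The checkpoint location is what makes the stream restartable with exactly-once sinks on Delta; in practice the parsed events would also be modeled downstream (e.g., with dbt) before analytics or ML use.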