Senior Data Engineer

Posted 3 months ago

💎 Seniority level: Senior, 5+ years

📍 Location: United States

💸 Salary: 100,000 - 120,000 USD per year

🔍 Industry: Healthcare

🏢 Company: Found | 👥 51-100 | 💰 $45,999,997 Series C 10 months ago | Financial Services, Banking, FinTech

🗣️ Languages: English

⏳ Experience: 5+ years

🪄 Skills: Python, SQL, Apache Airflow, ETL, Snowflake, Pandas, Spark

Requirements:
  • 5+ years of experience in data engineering or related areas. You are an end-to-end data engineer whose experience goes beyond creating ETL pipelines.
  • Expertise in SQL and data manipulation languages.
  • Proficiency in data pipeline tools (Airflow, AWS Glue, Spark/PySpark, Pandas).
  • Strong programming skills in Python.
  • Experience with data storage technologies like warehouses (Snowflake, Redshift) and data lakes (Databricks, Glue Catalog/S3).
Responsibilities:
  • Design, implement, and manage robust and scalable data pipelines to ingest, process, and transform data from various sources.
  • Develop and maintain data models to support business intelligence, reporting, and analytics needs.
  • Design and implement data warehousing solutions to store and organize large volumes of data efficiently.
  • Develop and optimize ETL (Extract, Transform, Load) processes to ensure data accuracy and integrity.
  • Implement data quality checks and monitoring processes to maintain data integrity and reliability.
  • Continuously monitor and optimize data pipelines and queries for performance and scalability.
  • Work closely with data analysts and other stakeholders to understand their data needs and provide solutions.
  • Create and maintain clear and comprehensive documentation of data architecture, processes, and data dictionaries.

Related Jobs

📍 Worldwide

🔍 Hospitality

🏢 Company: Lighthouse

  • 4+ years of professional experience using Python, Java, or Scala for data processing (Python preferred)
  • You stay up-to-date with industry trends, emerging technologies, and best practices in data engineering.
  • Improve, manage, and teach standards for code maintainability and performance in code submitted and reviewed
  • Ship large features independently, generate architecture recommendations and have the ability to implement them
  • Great communication: Regularly achieve consensus amongst teams
  • Familiarity with GCP, Kubernetes (GKE preferred), and CI/CD tools (GitLab CI preferred), as well as the concept of Lambda Architecture.
  • Experience with Apache Beam or Apache Spark for distributed data processing or event sourcing technologies like Apache Kafka.
  • Familiarity with monitoring tools like Grafana & Prometheus.
  • Design and develop scalable, reliable data pipelines using the Google Cloud stack.
  • Optimise data pipelines for performance and scalability.
  • Implement and maintain data governance frameworks, ensuring data accuracy, consistency, and compliance.
  • Monitor and troubleshoot data pipeline issues, implementing proactive measures for reliability and performance.
  • Collaborate with the DevOps team to automate deployments and improve developer experience on the data front.
  • Work with data science and analytics teams to help them bring their research to production-grade data solutions, using technologies such as (but not limited to) Airflow, dbt, or MLflow.
  • As a part of a platform team, you will communicate effectively with teams across the entire engineering organisation, to provide them with reliable foundational data models and data tools.
  • Mentor and provide technical guidance to other engineers working with data.

Python, SQL, Apache Airflow, ETL, GCP, Kubernetes, Apache Kafka, Data engineering, CI/CD, Mentoring, Terraform, Scala, Data modeling

Posted 4 days ago

📍 United States

🧭 Full-Time

💸 183,600 - 216,000 USD per year

🔍 Software Development

  • 6+ years of experience in a data engineering role building products, ideally in a fast-paced environment
  • Good foundations in Python and SQL.
  • Experience with Spark, PySpark, DBT, Snowflake and Airflow
  • Knowledge of visualization tools, such as Metabase, Jupyter Notebooks (Python)
  • Collaborate on the design and improvements of the data infrastructure
  • Partner with product and engineering to advocate best practices and build supporting systems and infrastructure for the various data needs
  • Create data pipelines that stitch together various data sources in order to produce valuable business insights
  • Create real-time data pipelines in collaboration with the Data Science team

Python, SQL, Snowflake, Airflow, Data engineering, Spark, Data visualization, Data modeling

Posted 5 days ago

📍 United States

🧭 Full-Time

🔍 Healthcare

🏢 Company: Rad AI | 👥 101-250 | 💰 $60,000,000 Series C 2 months ago | Artificial Intelligence (AI), Enterprise Software, Health Care

  • 4+ years relevant experience in data engineering.
  • Expertise in designing and developing distributed data pipelines using big data technologies on large scale data sets.
  • Deep and hands-on experience designing, planning, productionizing, maintaining and documenting reliable and scalable data infrastructure and data products in complex environments.
  • Solid experience with big data processing and analytics on AWS, using services such as Amazon EMR and AWS Batch.
  • Experience in large scale data processing technologies such as Spark.
  • Expertise in orchestrating workflows using tools like Metaflow.
  • Experience with various database technologies, including SQL and NoSQL databases (e.g., AWS DynamoDB, Elasticsearch, PostgreSQL).
  • Hands-on experience with containerization technologies, such as Docker and Kubernetes.
  • Design and implement the data architecture, ensuring scalability, flexibility, and efficiency using pipeline authoring tools like Metaflow and large-scale data processing technologies like Spark.
  • Define and extend our internal standards for style, maintenance, and best practices for a high-scale data platform.
  • Collaborate with researchers and other stakeholders to understand their data needs including model training and production monitoring systems and develop solutions that meet those requirements.
  • Take ownership of key data engineering projects and work independently to design, develop, and maintain high-quality data solutions.
  • Ensure data quality, integrity, and security by implementing robust data validation, monitoring, and access controls.
  • Evaluate and recommend data technologies and tools to improve the efficiency and effectiveness of the data engineering process.
  • Continuously monitor, maintain, and improve the performance and stability of the data infrastructure.

AWS, Docker, SQL, Elasticsearch, ETL, Kubernetes, Data engineering, NoSQL, Spark, Data modeling

Posted 5 days ago

📍 Worldwide

🧭 Full-Time

NOT STATED
  • Own the design and implementation of cross-domain data models that support key business metrics and use cases.
  • Partner with analysts and data engineers to translate business logic into performant, well-documented dbt models.
  • Champion best practices in testing, documentation, CI/CD, and version control, and guide others in applying them.
  • Act as a technical mentor to other analytics engineers, supporting their development and reviewing their code.
  • Collaborate with central data platform and embedded teams to improve data quality, metric consistency, and lineage tracking.
  • Drive alignment on model architecture across domains, ensuring models are reusable, auditable, and trusted.
  • Identify and lead initiatives to reduce technical debt and modernise legacy reporting pipelines.
  • Contribute to the long-term vision of analytics engineering at Pleo and help shape our roadmap for scalability and impact.

SQL, Data Analysis, ETL, Data engineering, CI/CD, Mentoring, Documentation, Data visualization, Data modeling, Data analytics, Data management

Posted 5 days ago

📍 United States

🧭 Full-Time

💸 183,600 - 216,000 USD per year

🔍 Mental Healthcare

🏢 Company: Headway | 👥 201-500 | 💰 $125,000,000 Series C over 1 year ago | Mental Health Care

  • 6+ years of experience in a data engineering role building products, ideally in a fast-paced environment
  • Good foundations in Python and SQL.
  • Experience with Spark, PySpark, DBT, Snowflake and Airflow
  • Knowledge of visualization tools, such as Metabase, Jupyter Notebooks (Python)
  • A knack for simplifying data, expressing information in charts and tables
  • Collaborate on the design and improvements of the data infrastructure
  • Partner with product and engineering to advocate best practices and build supporting systems and infrastructure for the various data needs
  • Create data pipelines that stitch together various data sources in order to produce valuable business insights
  • Create real-time data pipelines in collaboration with the Data Science team

Python, SQL, ETL, Snowflake, Airflow, Data engineering, RDBMS, Spark, RESTful APIs, Data visualization, Data modeling

Posted 5 days ago

📍 United States, Canada

🧭 Full-Time

🔍 Software Development

  • Strong hands-on experience with Python and core Python Data Processing tools such as pandas, numpy, scipy, scikit
  • Experience with cloud tools and environments like Docker, Kubernetes, GCP, and/or Azure
  • Experience with Spark/PySpark
  • Experience with Data Lineage and Data Cataloging
  • Relational and non-relational database experience
  • Experience with Data Warehouses and Lakes, such as BigQuery, Databricks, or Snowflake
  • Experience in designing and building data pipelines that scale
  • Strong communication skills, with the ability to convey technical solutions to both technical and non-technical stakeholders
  • Experience working effectively in a fast-paced, agile environment as part of a collaborative team
  • Ability to work independently and as part of a team
  • Willingness and enthusiasm to learn new technologies and tackle challenging problems
  • Experience in Infrastructure as Code tools like Terraform
  • Advanced SQL expertise, including experience with complex queries, query optimization, and working with various database systems
  • Work with business stakeholders to understand their goals, challenges, and decisions
  • Assist with building solutions that standardize their data approach to common problems across the company
  • Incorporate observability and testing best practices into projects
  • Assist in the development of processes to ensure their data is trusted and well-documented
  • Effectively work with data analysts on refining the data model used for reporting and analytical purposes
  • Improve the availability and consistency of data points crucial for analysis
  • Standing up a reporting system in BigQuery from scratch, including data replication, infrastructure setup, dbt model creation, and integration with reporting endpoints
  • Revamping orchestration and execution to reduce critical data delivery times
  • Database archiving to move data from a live database to cold storage

AWS, SQL, Cloud Computing, Data Analysis, ETL, Data engineering, Data visualization, Data modeling

Posted 12 days ago

📍 United States

🧭 Full-Time

💸 144,000 - 180,000 USD per year

🔍 Software Development

🏢 Company: Hungryroot | 👥 101-250 | 💰 $40,000,000 Series C almost 4 years ago | Artificial Intelligence (AI), Food and Beverage, E-Commerce, Retail, Consumer Goods, Software

  • 5+ years of experience in ETL development and data modeling
  • 5+ years of experience in both Scala and Python
  • 5+ years of experience in Spark
  • Excellent problem-solving skills and the ability to translate business problems into practical solutions
  • 2+ years of experience working with the Databricks Platform
  • Develop pipelines in Spark (Python + Scala) in the Databricks Platform
  • Build cross-functional working relationships with business partners in Food Analytics, Operations, Marketing, and Web/App Development teams to power pipeline development for the business
  • Ensure system reliability and performance
  • Deploy and maintain data pipelines in production
  • Set an example of code quality, data quality, and best practices
  • Work with Analysts and Data Engineers to enable high quality self-service analytics for all of Hungryroot
  • Investigate datasets to answer business questions, ensuring data quality and business assumptions are understood before deploying a pipeline

AWS, Python, SQL, Apache Airflow, Data Mining, ETL, Snowflake, Algorithms, Amazon Web Services, Data engineering, Data Structures, Spark, CI/CD, RESTful APIs, Microservices, JSON, Scala, Data visualization, Data modeling, Data analytics, Data management

Posted 22 days ago

📍 United States

💸 135,000 - 155,000 USD per year

🔍 Software Development

🏢 Company: Jobgether | 👥 11-50 | 💰 $1,493,585 Seed about 2 years ago | Internet

  • 8+ years of experience as a data engineer, with a strong background in data lake systems and cloud technologies.
  • 4+ years of hands-on experience with AWS technologies, including S3, Redshift, EMR, Kafka, and Spark.
  • Proficient in Python or Node.js for developing data pipelines and creating ETLs.
  • Strong experience with data integration and frameworks like Informatica and Python/Scala.
  • Expertise in creating and managing AWS services (EC2, S3, Lambda, etc.) in a production environment.
  • Solid understanding of Agile methodologies and software development practices.
  • Strong analytical and communication skills, with the ability to influence both IT and business teams.
  • Design and develop scalable data pipelines that integrate enterprise systems and third-party data sources.
  • Build and maintain data infrastructure to ensure speed, accuracy, and uptime.
  • Collaborate with data science teams to build feature engineering pipelines and support machine learning initiatives.
  • Work with AWS cloud technologies like S3, Redshift, and Spark to create a world-class data mesh environment.
  • Ensure proper data governance and implement data quality checks and lineage at every stage of the pipeline.
  • Develop and maintain ETL processes using AWS Glue, Lambda, and other AWS services.
  • Integrate third-party data sources and APIs into the data ecosystem.

AWS, Node.js, Python, SQL, ETL, Kafka, Data engineering, Spark, Agile methodologies, Scala, Data modeling, Data management

Posted 22 days ago

📍 United States

🧭 Full-Time

💸 170,000 - 190,000 USD per year

🔍 Software Development

🏢 Company: Productiv | 👥 101-250 | 💰 $45,000,000 Series C about 4 years ago | Developer Platform, Communities, Service Industry, SaaS, Data Integration, Analytics, Enterprise Software, Software, Application Performance Management

  • Strong experience designing and implementing ETL/ELT data pipelines using modern data stack technologies (e.g., Redshift, Athena, Presto, DynamoDB).
  • Expertise in data modeling and designing scalable storage solutions for analytics and reporting.
  • Strong proficiency in SQL, NoSQL, and JavaScript
  • Experience with monitoring, logging, and alerting for data systems to ensure proactive issue resolution.
  • Experience with data migration to-and-from S3, DynamoDB, Redshift, and Athena.
  • Design, build, and maintain scalable, efficient, and reliable data pipelines
  • Ensure data integrity and quality
  • Implement monitoring, alerting, and logging systems
  • Design and optimize data models and storage solutions
  • Collaborate with cross-functional teams
  • Continuously improve data engineering processes and standards
  • Troubleshoot and resolve complex data issues
  • Mentor and provide technical leadership

AWS, SQL, Apache Airflow, DynamoDB, ETL, JavaScript, Cross-functional Team Leadership, Algorithms, Data engineering, Data Structures, REST API, NoSQL, Communication Skills, CI/CD, Problem Solving, Mentoring, Data visualization, Data modeling, Software Engineering, Data management

Posted 26 days ago

📍 United States

🧭 Full-Time

💸 170,000 - 210,000 USD per year

🔍 Health and Fitness

  • Minimum of 6 years of experience working in data engineering
  • Expertise both in using SQL and Python for data cleansing, transformation, modeling, pipelining, etc.
  • Proficient in working with other stakeholders and converting requirements into detailed technical specifications; owning and leading projects from inception to completion
  • Proficiency in working with high volume datasets in SQL-based warehouses such as BigQuery
  • Proficiency with parallelized Python-based data processing frameworks such as Google Dataflow (Apache Beam), Apache Spark, etc.
  • Experience using ELT tools like Dataform or dbt
  • Professional experience maintaining data systems in GCP and AWS
  • Deep understanding of data modeling, access, storage, caching, replication, and optimization techniques
  • Experienced with orchestrating data pipelines and Kubernetes-based jobs with Apache Airflow
  • Understanding of the software development lifecycle and CI/CD
  • Monitoring and metrics-gathering (e.g., Datadog, New Relic, CloudWatch)
  • Willingness to participate in a weekly on-call support rotation - currently the rotation is monthly
  • Proficiency with Git and working collaboratively in a shared codebase
  • Excellent documentation skills
  • Self motivation and a deep sense of pride in your work
  • Passion for the outdoors
  • Comfort with ambiguity, and an instinct for moving quickly
  • Humility, empathy and open-mindedness - no egos
  • Work cross-functionally to ensure data scientists have access to clean, reliable, and secure data, the backbone for new algorithmic product features
  • Build, deploy, and orchestrate large-scale batch and stream data pipelines to transform and move data to/from our data warehouse and other systems
  • Deliver scalable, testable, maintainable, and high-quality code
  • Investigate, test-for, monitor, and alert on inconsistencies in our data, data systems, or processing costs
  • Create tools to improve data and model discoverability and documentation
  • Ensure data collection and storage adheres to GDPR and other privacy and legal compliance requirements
  • Uphold best data-quality standards and practices, promoting such knowledge throughout the organization
  • Deploy and build systems that enable machine learning and artificial intelligence product solutions
  • Mentoring others on best industry practices

AWS, PostgreSQL, Python, SQL, Apache Airflow, Cloud Computing, Data Analysis, ETL, Apache Kafka, Data engineering, CI/CD, RESTful APIs, Microservices, Data visualization, Data modeling, Data analytics, Data management

Posted 27 days ago