
Staff Data Engineer

Posted 2 months ago

πŸ’Ž Seniority level: Staff, 12+ years

πŸ“ Location: United States, Canada

πŸ” Industry: Software Development

πŸ—£οΈ Languages: English

⏳ Experience: 12+ years

πŸͺ„ Skills: PostgreSQL, Python, DynamoDB, ETL, MySQL, Data engineering, CI/CD, Scala

Requirements:
  • 12+ years of experience in data engineering
  • Expertise in designing scalable data architectures
  • Strong programming skills in Python and Scala
  • Experience with Apache Spark, Databricks, Delta Lake
  • Proficiency with relational and NoSQL databases
Responsibilities:
  • Design and implement scalable data pipelines
  • Define and enforce data engineering best practices
  • Conduct code reviews and mentor team members
  • Build and maintain batch and real-time data pipelines
  • Ensure data quality, governance, and security

Related Jobs

πŸ”₯ Staff Data Engineer
Posted about 3 hours ago

πŸ“ United States

🧭 Full-Time

πŸ” Healthcare

🏒 Company: Atropos Health

  • At least 5 years of experience working with messy real-world healthcare data (EHR data, claims), including at least 3 years of experience with Real World Data / Evidence
  • Expertise working with common data models and terminology systems in healthcare data (data models and standards such as FHIR, OMOP, codesets like ICD 9/10, CPT, ATC, RxNorm, NDC, etc.)
  • Knowledge of common EHR and medical claim / administrative workflows and the basics of the US healthcare system (patient movement through a hospital or clinic, how medical procedures are reimbursed, etc.)
  • SQL (any dialect): at least 5 years; Python: at least 4 years (Required)
  • Significant and deep experience as a data professional working with public cloud infrastructure to build data products (Required)
  • Knowledge of EHR data systems and at least 4 years of experience working intensively with EHR data in a data engineering and/or analytics environment
  • High level knowledge of ETL/ELT operations and data modeling
  • Create and maintain data pipelines to integrate, enrich, and map clinical, claims, and other data from multiple sources in the cloud
  • Design systems of data quality checks and assessments on clinical datasets
  • Expert knowledge of how to automate workflows and create reusable infrastructure in multiple public cloud environments (AWS, Azure, GCP)
  • Map data from one source into a common data model using many different frameworks, with a particular focus on PySpark and Spark SQL.
  • Create, maintain, tune, and document data flows in Databricks, Snowflake, BigQuery, and other diverse cloud tools
  • Filter, clean, transform, and integrate clinical data using your domain expertise of healthcare data and code systems and embedded knowledge
  • Work remotely with an interdisciplinary team and build and maintain strong relationships with research, clinical, product, and commercial stakeholders
  • Manage external customer and vendor relationships and create internal and external communications
  • Travel required to company or team offsites 3-4 weeks per year

AWS, Python, SQL, Cloud Computing, ETL, Snowflake, Data engineering, RDBMS, Spark, Data modeling

πŸ”₯ Staff Data Engineer
Posted about 5 hours ago

πŸ“ US

🧭 Full-Time

πŸ’Έ 185,000 - 200,000 USD per year

πŸ” Adtech

  • 8+ years of experience in data engineering.
  • Proven experience building data infrastructure using Spark with Scala.
  • Familiarity with data lakes, cloud warehouses, and storage formats.
  • Strong proficiency in AWS services.
  • Expertise in SQL for data manipulation and extraction.
  • Bachelor's degree in Computer Science or a related field.
  • Design and implement robust data infrastructure using Spark with Scala.
  • Collaborate with our cross-functional teams to design data solutions that meet business needs.
  • Build out our core data pipelines, store data in optimal engines and formats, and feed our machine learning models.
  • Leverage and optimize AWS resources.
  • Collaborate closely with the Data Science team.

AWS, SQL, Cloud Computing, ETL, Machine Learning, Data engineering, Data science, Spark, Scala, Data modeling


πŸ“ AL, AR, AZ, CA (exempt only), CO, CT, FL, GA, ID, IL, IN, IA, KS, KY, MA, ME, MD, MI, MN, MO, MT, NC, NE, NJ, NM, NV, NY, OH, OK, OR, PA, SC, SD, TN, TX, UT, VT, VA, WA, WI

🧭 Full-Time

πŸ” Insurance

🏒 Company: Kin Insurance

  • Depth of experience in modern big data environments.
  • Advanced knowledge and experience with SQL and Python is expected.
  • Insurance domain knowledge.
  • Design and develop data pipelines and model raw data for downstream ingestion.
  • Mentor and guide data engineers on your team and across the organization, while collaborating with other engineers, product managers, analysts, and stakeholders.
  • Lead a cross-functional project team with members from AppEng, DataEng, BI, and business stakeholders.

AWS, Python, SQL, Apache Airflow, ETL, Cross-functional Team Leadership, Data engineering, Problem Solving, Mentoring, Documentation, Compliance, Data visualization, Data modeling, Data management

Posted about 17 hours ago

πŸ“ United States

🏒 Company: ge_externalsite

  • Hands-on experience in programming languages like Java, Python, or Scala, and experience writing SQL scripts for Oracle, MySQL, PostgreSQL, or HiveQL
  • Exposure to industry standard data modeling tools (e.g., ERWin, ER Studio, etc.).
  • Exposure to Extract, Transform & Load (ETL) tools like Informatica or Talend
  • Exposure to industry standard data catalog, automated data discovery, and data lineage tools (e.g., Alation, Collibra)
  • Experience with Big Data / Hadoop / Spark / Hive / NoSQL database engines (e.g., Cassandra or HBase)
  • Exposure to unstructured datasets and ability to handle XML, JSON file formats
  • Conduct exploratory data analysis and generate visual summaries of data. Identify data quality issues proactively.
  • Developing reusable code pipelines through CI/CD.
  • Hands-on experience with big data or MPP databases.
  • Developing and executing integrated test plans.
  • Be responsible for identifying solutions for complex data analysis and data structure.
  • Be responsible for creating digital thread requirements
  • Be responsible for change management of database artifacts to support next gen QMS applications
  • Be responsible for monitoring data availability and data health of complex systems
  • Understand industry trends and stay up to date on associated Quality and tech landscape.
  • Design & build technical data dictionaries and support business glossaries to analyze the datasets
  • This role may also work on other Quality team digital and strategic deliveries that support the business.
  • Perform data profiling and data analysis for source systems, manually maintained data, machine or sensor generated data and target data repositories
  • Design & build both logical and physical data models for both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) solutions
  • Develop and maintain data mapping specifications based on the results of data analysis and functional requirements
  • Build a variety of data loading & data transformation methods using multiple tools and technologies.
  • Design & build automated Extract, Transform & Load (ETL) jobs based on data mapping specifications
  • Manage metadata structures needed for building reusable Extract, Transform & Load (ETL) components.
  • Analyze reference datasets and gain familiarity with Master Data Management (MDM) tools.
  • Analyze the impact of changes to downstream systems/products and recommend alternatives to minimize the impact.
  • Derive solutions and make recommendations from deep dive data analysis proactively.
  • Design and build Data Quality (DQ) rules.
  • Drives design and implementation of the roadmap.
  • Design and develop complex code in multiple languages.

PostgreSQL, Python, SQL, Data Analysis, ETL, Hadoop, Java, MySQL, Oracle, Data engineering, NoSQL, Spark, CI/CD, Agile methodologies, JSON, Scala, Data visualization, Data modeling

Posted 7 days ago

πŸ“ United States

🧭 Full-Time

πŸ” Software Development

🏒 Company: Apollo.io πŸ‘₯ 501-1000 πŸ’° $100,000,000 Series D over 1 year ago | Software Development

  • 8+ years of experience as a data platform engineer, software engineer in data, or big data engineer.
  • Experience in data modeling, data warehousing, APIs, and building data pipelines.
  • Deep knowledge of databases and data warehousing with an ability to collaborate cross-functionally.
  • Bachelor's degree in a quantitative field (Physical/Computer Science, Engineering, Mathematics, or Statistics).
  • Develop and maintain scalable data pipelines and build new integrations to support continuing increases in data volume and complexity.
  • Develop and improve Data APIs used in machine learning / AI product offerings
  • Implement automated monitoring, alerting, self-healing (restartable/graceful failures) features while building the consumption pipelines.
  • Implement processes and systems to monitor data quality, ensuring production data is always accurate and available.
  • Write unit/integration tests, contribute to the engineering wiki, and document work.
  • Define company data models and write jobs to populate data models in our data warehouse.
  • Work closely with all business units and engineering teams to develop a strategy for long-term data platform architecture.

Python, SQL, Apache Airflow, Apache Hadoop, Cloud Computing, ETL, Apache Kafka, Data engineering, FastAPI, Data modeling, Data analytics

Posted 8 days ago

πŸ“ United States

🧭 Full-Time

πŸ’Έ 117,800 - 214,300 USD per year

πŸ” Software Development

🏒 Company: careers_gm

  • 7+ years of hands-on experience.
  • Bachelor's degree (or equivalent work experience) in Computer Science, Data Science, Software Engineering, or a related field.
  • Strong understanding and ability to provide mentorship in the areas of data ETL processes and tools for designing and managing data pipelines
  • Proficient with big data frameworks and tools like Apache Hadoop, Apache Spark, or Apache Kafka for processing and analyzing large datasets.
  • Hands on experience with data serialization formats like JSON, Parquet and XML
  • Consistently models and leads in best practices and optimization for scripting skills in languages like Python, Java, Scala, etc for automation and data processing.
  • Proficient with database administration and performance tuning for databases like MySQL, PostgreSQL, or NoSQL databases
  • Proficient with containerization (e.g., Docker) and orchestration platforms (e.g., Kubernetes) for managing data applications.
  • Experience with cloud platforms and data services for data storage and processing
  • Consistently designs and builds data solutions that are highly automated and performant, with quality checks that ensure data consistency and accuracy
  • Experienced at actively managing large-scale data engineering projects, including planning, resource allocation, risk management, and ensuring successful project delivery, adjusting style across delivery methods (e.g., Waterfall, Agile, POD)
  • Understands data governance principles, data privacy regulations, and experience implementing security measures to protect data
  • Able to integrate data engineering pipelines with machine learning models and platforms
  • Strong problem-solving skills to identify and resolve complex data engineering issues efficiently.
  • Ability to work effectively in cross-functional teams, collaborate with data scientists, analysts, and stakeholders to deliver data solutions.
  • Ability to lead and mentor junior data engineers, providing guidance and support in complex data engineering projects.
  • Influential communication skills to effectively convey technical concepts to non-technical stakeholders and document data engineering processes.
  • Models a mindset of continuous learning, staying updated with the latest advancements in data engineering technologies, and a drive for innovation.
  • Design, construct, install and maintain data architectures, including database and large-scale processing systems.
  • Develop and maintain ETL (Extract, Transform, Load) processes to collect, cleanse and transform data from various sources inclusive of cloud.
  • Design and implement data pipelines to collect, process and transfer data from various sources to storage systems (data warehouses, data lakes, etc)
  • Implement security measures to protect sensitive data and ensure compliance with data privacy regulations.
  • Build data solutions that ensure data quality, integrity and security through data validation, monitoring, and compliance with data governance policies
  • Administer and optimize databases for performance and scalability
  • Maintain Master Data, Metadata, Data Management Repositories, Logical Data Models, and Data Standards
  • Troubleshoot and resolve data-related issues affecting data quality and fidelity
  • Document data architectures, processes and best practices for knowledge sharing across the GM data engineering community
  • Participate in the evaluation and selection of data related tools and technologies
  • Collaborate across other engineering functions within EDAI, Marketing Technology, and Software & Services

AWS, Docker, PostgreSQL, Python, SQL, Apache Hadoop, Cloud Computing, Data Analysis, ETL, Java, Kubernetes, MySQL, Algorithms, Apache Kafka, Data engineering, Data science, Data Structures, REST API, NoSQL, CI/CD, Problem Solving, JSON, Scala, Data visualization, Data modeling, Scripting, Data analytics, Data management

Posted 13 days ago

πŸ“ United States

πŸ’Έ 204,000 - 260,000 USD per year

πŸ” Software Development

🏒 Company: Airbnb πŸ‘₯ 5001-10000 πŸ’° Secondary Market almost 5 years ago πŸ«‚ Last layoff about 2 years ago | Hospitality, Travel Accommodations, PropTech, Marketplace, Mobile Apps, Travel

  • 9+ years of experience with a BS/Masters or 6+ years with a PhD
  • Expertise in SQL and proficient in at least one data engineering language, such as Python or Scala
  • Experience with Superset and Tableau
  • Expertise in large-scale distributed data processing frameworks like Presto or Spark
  • Experience with an ETL framework like Airflow
  • Extensive knowledge of data management concepts, including data modeling, ETL processes, data warehousing, and data governance.
  • Understanding of data security and privacy principles, as well as regulatory compliance requirements (e.g., GDPR, CCPA).
  • Strong problem-solving skills and the ability to translate business requirements into technical solutions.
  • Excellent communication skills, both written and verbal, ability to distill complex ideas for technical and non-technical stakeholders
  • Strong capability to forge trusted partnerships across working teams
  • Design and implement data pipelines by leveraging best-in-class tools and infrastructure to meet critical business and product requirements.
  • Develop high quality data assets for product and AI/ML use-cases
  • Collaborate with cross-functional teams to gather requirements, assess data needs, and design efficient solutions that align with business objectives.
  • Contribute to the development of long-term data strategies and roadmaps and ML infrastructure development within the organization.
  • Influence the trajectory of data in decision making
  • Improve trust in our data by championing data quality across the stack
  • Identify and act on opportunities for automation, and implement data management tools and frameworks to enhance efficiency and productivity.
  • Mentor and coach team members, providing guidance in data engineering best practices and support to enhance their skills and performance.

Leadership, Python, SQL, ETL, Machine Learning, Cross-functional Team Leadership, Tableau, Airflow, Data engineering, REST API, Spark, CI/CD, Problem Solving, Excellent communication skills, Scala, Data visualization, Data modeling, Data management

Posted 17 days ago

πŸ“ United States

🧭 Full-Time

πŸ” Software Development

🏒 Company: Life360 πŸ‘₯ 251-500 πŸ’° $33,038,258 Post-IPO Equity over 2 years ago πŸ«‚ Last layoff about 2 years ago | Android, Family, Apps, Mobile Apps, Mobile

  • Minimum 7 years of experience working with high volume data infrastructure.
  • Experience with Databricks and AWS.
  • Experience with dbt.
  • Experience with job orchestration tooling like Airflow.
  • Proficient programming in Python.
  • Proficient with SQL and the ability to optimize complex queries.
  • Proficient with large-scale data processing using Spark and/or Presto/Trino.
  • Proficient in data modeling and database design.
  • Experience with streaming data with a tool like Kinesis or Kafka.
  • Experience working with high-volume, event-based data architectures like Amplitude and Braze.
  • Experience in modern development lifecycle including Agile methodology, CI/CD, automated deployments using Terraform, GitHub Actions, etc.
  • Knowledge and proficiency in the latest open source and data frameworks, modern data platform tech stacks and tools.
  • Always learning and staying up to speed with the fast-moving data world.
  • You have good communication and collaboration skills and can work independently.
  • BS in Computer Science, Software Engineering, Mathematics, or equivalent experience.
  • Design, implement, and manage scalable data processing platforms used for real-time analytics and exploratory data analysis.
  • Manage our financial data from ingestion through ETL to storage and batch processing.
  • Automate, test and harden all data workflows.
  • Architect logical and physical data models to ensure the needs of the business are met.
  • Collaborate across the data teams, engineering, data science, and analytics, to understand their needs, while applying engineering best practices.
  • Architect and develop systems and algorithms for distributed real-time analytics and data processing.
  • Implement strategies for acquiring data to develop new insights.
  • Mentor junior engineers, imparting best practices and institutionalizing efficient processes to foster growth and innovation within the team.
  • Champion data engineering best practices and institutionalize efficient processes to foster growth and innovation within the team.

AWS, Project Management, Python, SQL, Apache Airflow, ETL, Kafka, Algorithms, Data engineering, Data Structures, REST API, Spark, Communication Skills, Analytical Skills, Collaboration, CI/CD, Problem Solving, Agile methodologies, Mentoring, Terraform, Data visualization, Technical support, Data modeling, Data analytics, Data management, Debugging

Posted 27 days ago
πŸ”₯ Staff Data Engineer
Posted about 1 month ago

πŸ“ US

🧭 Full-Time

πŸ’Έ 160,000 - 182,000 USD per year

πŸ” Adtech

🏒 Company: tvScientific πŸ‘₯ 11-50 πŸ’° $9,400,000 Convertible Note about 1 year ago | Internet, Advertising

  • 7+ years of experience in data engineering.
  • Proven experience building data infrastructure using Spark with Scala.
  • Familiarity with data lakes, cloud warehouses, and storage formats.
  • Strong proficiency in AWS services.
  • Expertise in SQL for data manipulation and extraction.
  • Bachelor's degree in Computer Science or a related field.
  • Design and implement robust data infrastructure using Spark with Scala.
  • Collaborate with our cross-functional teams to design data solutions that meet business needs.
  • Build out our core data pipelines, store data in optimal engines and formats, and feed our machine learning models.
  • Leverage and optimize AWS resources.
  • Collaborate closely with the Data Science team.

AWS, SQL, Cloud Computing, ETL, Machine Learning, Data engineering, Spark, Scala, Data modeling

πŸ”₯ Staff Data Engineer
Posted about 2 months ago

πŸ“ United States, Canada

🧭 Full-Time

πŸ’Έ 200,000 - 228,000 USD per year

πŸ” Software Development

🏒 Company: Later πŸ‘₯ 1-10 | Consumer Electronics, iOS, Apps, Software

  • 10+ years of experience in data engineering, software engineering, or related fields.
  • Proven experience leading the technical strategy and execution of large-scale data platforms.
  • Expertise in cloud technologies (Google Cloud Platform, AWS, Azure) with a focus on scalable data solutions (BigQuery, Snowflake, Redshift, etc.).
  • Strong proficiency in SQL, Python, and distributed data processing frameworks (Apache Spark, Flink, Beam, etc.).
  • Extensive experience with streaming data architectures using Kafka, Flink, Pub/Sub, Kinesis, or similar technologies.
  • Expertise in data modeling, schema design, indexing, partitioning, and performance tuning for analytical workloads, including data governance (security, access control, compliance: GDPR, CCPA, SOC 2)
  • Strong experience designing and optimizing scalable, fault-tolerant data pipelines using workflow orchestration tools like Airflow, Dagster, or Dataflow.
  • Ability to lead and influence engineering teams, drive cross-functional projects, and align stakeholders towards a common data vision.
  • Experience mentoring senior and mid-level data engineers to enhance team performance and skill development.
  • Lead the design and evolution of a scalable data architecture that meets analytical, machine learning, and operational needs.
  • Architect and optimize data pipelines for batch and real-time data processing, ensuring efficiency and reliability.
  • Implement best practices for distributed data processing, ensuring scalability, performance, and cost-effectiveness of data workflows.
  • Define and enforce data governance policies, implement automated validation checks, and establish monitoring frameworks to maintain data integrity.
  • Ensure data security and compliance with industry regulations by designing appropriate access controls, encryption mechanisms, and auditing processes.
  • Drive innovation in data engineering practices by researching and implementing new technologies, tools, and methodologies.
  • Work closely with data scientists, engineers, analysts, and business stakeholders to understand data requirements and deliver impactful solutions.
  • Develop reusable frameworks, libraries, and automation tools to improve efficiency, reliability, and maintainability of data infrastructure.
  • Guide and mentor data engineers, fostering a high-performing engineering culture through best practices, peer reviews, and knowledge sharing.
  • Establish and monitor SLAs for data pipelines, proactively identifying and mitigating risks to ensure high availability and reliability.

AWS, Python, SQL, Apache Airflow, Cloud Computing, Data Analysis, ETL, GCP, Kafka, Machine Learning, Snowflake, Data engineering, Data modeling, Data management
