
Staff Data Engineer

Posted 2 months ago

πŸ’Ž Seniority level: Staff, 12+ years

πŸ“ Location: United States, Canada

πŸ” Industry: Software Development

πŸ—£οΈ Languages: English

⏳ Experience: 12+ years

πŸͺ„ Skills: PostgreSQL, Python, DynamoDB, ETL, MySQL, Data engineering, CI/CD, Scala

Requirements:
  • 12+ years of experience in data engineering
  • Expertise in designing scalable data architectures
  • Strong programming skills in Python and Scala
  • Experience with Apache Spark, Databricks, Delta Lake
  • Proficiency with relational and NoSQL databases
Responsibilities:
  • Design and implement scalable data pipelines
  • Define and enforce data engineering best practices
  • Conduct code reviews and mentor team members
  • Build and maintain batch and real-time data pipelines
  • Ensure data quality, governance, and security

Related Jobs

πŸ”₯ Staff Data Engineer
Posted about 3 hours ago

πŸ“ United States

🧭 Full-Time

πŸ” Healthcare

🏒 Company: Atropos Health

  • At least 5 years of experience working with messy real-world healthcare data (EHR data, claims), including at least 3 years of experience with Real World Data / Evidence
  • Expertise working with common data models and terminology systems in healthcare data (data models and standards such as FHIR, OMOP, codesets like ICD 9/10, CPT, ATC, RxNorm, NDC, etc.)
  • Knowledge of common EHR and medical claim / administrative workflows and the basics of the US healthcare system (patient movement through a hospital or clinic, how medical procedures are reimbursed, etc.)
  • SQL (any dialect): at least 5 years; Python: at least 4 years (Required)
  • Significant and deep experience as a data professional working with public cloud infrastructure to build data products (Required)
  • Knowledge of EHR data systems and at least 4 years of experience working intensively with EHR data in a data engineering and/or analytics environment
  • High level knowledge of ETL/ELT operations and data modeling
  • Create and maintain data pipelines to integrate, enrich, and map clinical, claims, and other data from multiple sources in the cloud
  • Design systems of data quality checks and assessments on clinical datasets
  • Expert knowledge of how to automate workflows and create reusable infrastructure in multiple public cloud environments (AWS, Azure, GCP)
  • Map data from one source into a common data model using many different frameworks, with a particular focus on PySpark and Spark SQL.
  • Create, maintain, tune, and document data flows in Databricks, Snowflake, BigQuery, and other diverse cloud tools
  • Filter, clean, transform, and integrate clinical data using your domain expertise of healthcare data and code systems and embedded knowledge
  • Work remotely with an interdisciplinary team and build and maintain strong relationships with research, clinical, product, and commercial stakeholders
  • Manage external customer and vendor relationships and create internal and external communications
  • Travel required to company or team offsites 3-4 weeks per year

AWS, Python, SQL, Cloud Computing, ETL, Snowflake, Data engineering, RDBMS, Spark, Data modeling

πŸ”₯ Staff Data Engineer
Posted about 5 hours ago

πŸ“ US

🧭 Full-Time

πŸ’Έ 185,000 - 200,000 USD per year

πŸ” Adtech

  • 8+ years of experience in data engineering.
  • Proven experience building data infrastructure using Spark with Scala.
  • Familiarity with data lakes, cloud warehouses, and storage formats.
  • Strong proficiency in AWS services.
  • Expertise in SQL for data manipulation and extraction.
  • Bachelor's degree in Computer Science or a related field.
  • Design and implement robust data infrastructure using Spark with Scala.
  • Collaborate with our cross-functional teams to design data solutions that meet business needs.
  • Build out our core data pipelines, store data in optimal engines and formats, and feed our machine learning models.
  • Leverage and optimize AWS resources.
  • Collaborate closely with the Data Science team.

AWS, SQL, Cloud Computing, ETL, Machine Learning, Data engineering, Data science, Spark, Scala, Data modeling


πŸ“ AL, AR, AZ, CA (exempt only), CO, CT, FL, GA, ID, IL, IN, IA, KS, KY, MA, ME, MD, MI, MN, MO, MT, NC, NE, NJ, NM, NV, NY, OH, OK, OR, PA, SC, SD, TN, TX, UT, VT, VA, WA, WI

🧭 Full-Time

πŸ” Insurance

🏒 Company: Kin Insurance

  • Depth of experience in modern big data environments.
  • Advanced knowledge and experience with SQL and Python is expected.
  • Insurance domain knowledge.
  • Design and develop data pipelines and model raw data for downstream ingestion.
  • Mentor and guide data engineers on your team and across the organization, while collaborating with other engineers, product managers, analysts, and stakeholders.
  • Lead a cross-functional project team with members from AppEng, DataEng, BI, and business stakeholders.

AWS, Python, SQL, Apache Airflow, ETL, Cross-functional Team Leadership, Data engineering, Problem Solving, Mentoring, Documentation, Compliance, Data visualization, Data modeling, Data management

Posted about 17 hours ago

πŸ“ United States

🏒 Company: ge_externalsite

  • Hands-on experience in programming languages like Java, Python, or Scala, and experience writing SQL scripts for Oracle, MySQL, PostgreSQL, or HiveQL
  • Exposure to industry standard data modeling tools (e.g., ERWin, ER Studio, etc.).
  • Exposure to Extract, Transform & Load (ETL) tools like Informatica or Talend
  • Exposure to industry standard data catalog, automated data discovery, and data lineage tools (e.g., Alation, Collibra)
  • Experience with Big Data / Hadoop / Spark / Hive / NoSQL database engines (e.g., Cassandra or HBase)
  • Exposure to unstructured datasets and ability to handle XML, JSON file formats
  • Conduct exploratory data analysis and generate visual summaries of data. Identify data quality issues proactively.
  • Developing reusable code pipelines through CI/CD.
  • Hands-on experience with big data or MPP databases.
  • Developing and executing integrated test plans.
  • Be responsible for identifying solutions for complex data analysis and data structure.
  • Be responsible for creating digital thread requirements
  • Be responsible for change management of database artifacts to support next gen QMS applications
  • Be responsible for monitoring data availability and data health of complex systems
  • Understand industry trends and stay up to date on associated Quality and tech landscape.
  • Design & build technical data dictionaries and support business glossaries to analyze the datasets
  • This role may also work on other Quality team digital and strategic deliveries that support the business.
  • Perform data profiling and data analysis for source systems, manually maintained data, machine or sensor generated data and target data repositories
  • Design & build both logical and physical data models for both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) solutions
  • Develop and maintain data mapping specifications based on the results of data analysis and functional requirements
  • Build a variety of data loading & data transformation methods using multiple tools and technologies.
  • Design & build automated Extract, Transform & Load (ETL) jobs based on data mapping specifications
  • Manage metadata structures needed for building reusable Extract, Transform & Load (ETL) components.
  • Analyze reference datasets and gain familiarity with Master Data Management (MDM) tools.
  • Analyze the impact of changes to downstream systems/products and recommend alternatives to minimize the impact.
  • Derive solutions and make recommendations from deep dive data analysis proactively.
  • Design and build Data Quality (DQ) rules.
  • Drives design and implementation of the roadmap.
  • Design and develop complex code in multiple languages.

PostgreSQL, Python, SQL, Data Analysis, ETL, Hadoop, Java, MySQL, Oracle, Data engineering, NoSQL, Spark, CI/CD, Agile methodologies, JSON, Scala, Data visualization, Data modeling

Posted 7 days ago

πŸ“ United States

🧭 Full-Time

πŸ” Software Development

🏒 Company: Apollo.io πŸ‘₯ 501-1000 πŸ’° $100,000,000 Series D over 1 year ago | Software Development

  • 8+ years of experience as a data platform engineer, software engineer in data, or big data engineer.
  • Experience in data modeling, data warehousing, APIs, and building data pipelines.
  • Deep knowledge of databases and data warehousing with an ability to collaborate cross-functionally.
  • Bachelor's degree in a quantitative field (Physical/Computer Science, Engineering, Mathematics, or Statistics).
  • Develop and maintain scalable data pipelines and build new integrations to support continuing increases in data volume and complexity.
  • Develop and improve Data APIs used in machine learning / AI product offerings
  • Implement automated monitoring, alerting, self-healing (restartable/graceful failures) features while building the consumption pipelines.
  • Implement processes and systems to monitor data quality, ensuring production data is always accurate and available.
  • Write unit/integration tests, contribute to the engineering wiki, and document work.
  • Define company data models and write jobs to populate data models in our data warehouse.
  • Work closely with all business units and engineering teams to develop a strategy for long-term data platform architecture.

Python, SQL, Apache Airflow, Apache Hadoop, Cloud Computing, ETL, Apache Kafka, Data engineering, FastAPI, Data modeling, Data analytics

Posted 8 days ago

πŸ“ United States

🧭 Full-Time

πŸ’Έ 117,800 - 214,300 USD per year

πŸ” Software Development

🏒 Company: careers_gm

  • 7+ years of hands-on experience.
  • Bachelor's degree (or equivalent work experience) in Computer Science, Data Science, Software Engineering, or a related field.
  • Strong understanding and ability to provide mentorship in the areas of data ETL processes and tools for designing and managing data pipelines
  • Proficient with big data frameworks and tools like Apache Hadoop, Apache Spark, or Apache Kafka for processing and analyzing large datasets.
  • Hands on experience with data serialization formats like JSON, Parquet and XML
  • Consistently models and leads in best practices and optimization for scripting skills in languages like Python, Java, Scala, etc for automation and data processing.
  • Proficient with database administration and performance tuning for databases like MySQL, PostgreSQL, or NoSQL databases
  • Proficient with containerization (e.g., Docker) and orchestration platforms (e.g., Kubernetes) for managing data applications.
  • Experience with cloud platforms and data services for data storage and processing
  • Consistently designs and builds data solutions that are highly automated and performant, with quality checks that ensure data consistency and accuracy
  • Experienced at actively managing large-scale data engineering projects, including planning, resource allocation, risk management, and ensuring successful project delivery, adjusting style across delivery methods (e.g., Waterfall, Agile, POD)
  • Understands data governance principles, data privacy regulations, and experience implementing security measures to protect data
  • Able to integrate data engineering pipelines with machine learning models and platforms
  • Strong problem-solving skills to identify and resolve complex data engineering issues efficiently.
  • Ability to work effectively in cross-functional teams, collaborate with data scientists, analysts, and stakeholders to deliver data solutions.
  • Ability to lead and mentor junior data engineers, providing guidance and support in complex data engineering projects.
  • Influential communication skills to effectively convey technical concepts to non-technical stakeholders and document data engineering processes.
  • Models a mindset of continuous learning, staying updated with the latest advancements in data engineering technologies, and a drive for innovation.
  • Design, construct, install and maintain data architectures, including database and large-scale processing systems.
  • Develop and maintain ETL (Extract, Transform, Load) processes to collect, cleanse and transform data from various sources inclusive of cloud.
  • Design and implement data pipelines to collect, process and transfer data from various sources to storage systems (data warehouses, data lakes, etc)
  • Implement security measures to protect sensitive data and ensure compliance with data privacy regulations.
  • Build data solutions that ensure data quality, integrity and security through data validation, monitoring, and compliance with data governance policies
  • Administer and optimize databases for performance and scalability
  • Maintain Master Data, Metadata, Data Management Repositories, Logical Data Models, and Data Standards
  • Troubleshoot and resolve data-related issues affecting data quality and fidelity
  • Document data architectures, processes and best practices for knowledge sharing across the GM data engineering community
  • Participate in the evaluation and selection of data related tools and technologies
  • Collaborate across other engineering functions within EDAI, Marketing Technology, and Software & Services

AWS, Docker, PostgreSQL, Python, SQL, Apache Hadoop, Cloud Computing, Data Analysis, ETL, Java, Kubernetes, MySQL, Algorithms, Apache Kafka, Data engineering, Data science, Data Structures, REST API, NoSQL, CI/CD, Problem Solving, JSON, Scala, Data visualization, Data modeling, Scripting, Data analytics, Data management

Posted 13 days ago

πŸ“ United States

πŸ’Έ 204,000 - 260,000 USD per year

πŸ” Software Development

🏒 Company: Airbnb πŸ‘₯ 5001-10000 πŸ’° Secondary Market almost 5 years ago πŸ«‚ Last layoff about 2 years ago | Hospitality, Travel Accommodations, PropTech, Marketplace, Mobile Apps, Travel

  • 9+ years of experience with a BS/Masters or 6+ years with a PhD
  • Expertise in SQL and proficient in at least one data engineering language, such as Python or Scala
  • Experience with Superset and Tableau
  • Expertise in large-scale distributed data processing frameworks like Presto or Spark
  • Experience with an ETL framework like Airflow
  • Extensive knowledge of data management concepts, including data modeling, ETL processes, data warehousing, and data governance.
  • Understanding of data security and privacy principles, as well as regulatory compliance requirements (e.g., GDPR, CCPA).
  • Strong problem-solving skills and the ability to translate business requirements into technical solutions.
  • Excellent communication skills, both written and verbal, ability to distill complex ideas for technical and non-technical stakeholders
  • Strong capability to forge trusted partnerships across working teams
  • Design and implement data pipelines by leveraging best-in-class tools and infrastructure to meet critical business and product requirements.
  • Develop high quality data assets for product and AI/ML use-cases
  • Collaborate with cross-functional teams to gather requirements, assess data needs, and design efficient solutions that align with business objectives.
  • Contribute to the development of long-term data strategies and roadmaps and ML infrastructure development within the organization.
  • Influence the trajectory of data in decision making
  • Improve trust in our data by championing data quality across the stack
  • Identify and act on opportunities for automation, and implement data management tools and frameworks to enhance efficiency and productivity.
  • Mentor and coach team members, providing guidance in data engineering best practices and support to enhance their skills and performance.

Leadership, Python, SQL, ETL, Machine Learning, Cross-functional Team Leadership, Tableau, Airflow, Data engineering, REST API, Spark, CI/CD, Problem Solving, Excellent communication skills, Scala, Data visualization, Data modeling, Data management

Posted 17 days ago

πŸ“ United States

🧭 Full-Time

πŸ” Software Development

🏒 Company: Life360 πŸ‘₯ 251-500 πŸ’° $33,038,258 Post-IPO Equity over 2 years ago πŸ«‚ Last layoff about 2 years ago | Android, Family, Apps, Mobile Apps, Mobile

  • Minimum 7 years of experience working with high volume data infrastructure.
  • Experience with Databricks and AWS.
  • Experience with dbt.
  • Experience with job orchestration tooling like Airflow.
  • Proficient programming in Python.
  • Proficient with SQL and the ability to optimize complex queries.
  • Proficient with large-scale data processing using Spark and/or Presto/Trino.
  • Proficient in data modeling and database design.
  • Experience with streaming data with a tool like Kinesis or Kafka.
  • Experience working with high-volume, event-based data architectures like Amplitude and Braze.
  • Experience in modern development lifecycle including Agile methodology, CI/CD, automated deployments using Terraform, GitHub Actions, etc.
  • Knowledge and proficiency in the latest open source and data frameworks, modern data platform tech stacks and tools.
  • Always learning and staying up to speed with the fast-moving data world.
  • You have good communication and collaboration skills and can work independently.
  • BS in Computer Science, Software Engineering, Mathematics, or equivalent experience.
  • Design, implement, and manage scalable data processing platforms used for real-time analytics and exploratory data analysis.
  • Manage our financial data from ingestion through ETL to storage and batch processing.
  • Automate, test and harden all data workflows.
  • Architect logical and physical data models to ensure the needs of the business are met.
  • Collaborate across the data teams, engineering, data science, and analytics, to understand their needs, while applying engineering best practices.
  • Architect and develop systems and algorithms for distributed real-time analytics and data processing.
  • Implement strategies for acquiring data to develop new insights.
  • Mentor junior engineers, imparting best practices and institutionalizing efficient processes to foster growth and innovation within the team.
  • Champion data engineering best practices and institutionalize efficient processes to foster growth and innovation within the team.

AWS, Project Management, Python, SQL, Apache Airflow, ETL, Kafka, Algorithms, Data engineering, Data Structures, REST API, Spark, Communication Skills, Analytical Skills, Collaboration, CI/CD, Problem Solving, Agile methodologies, Mentoring, Terraform, Data visualization, Technical support, Data modeling, Data analytics, Data management, Debugging

Posted 27 days ago
πŸ”₯ Staff Data Engineer
Posted about 1 month ago

πŸ“ US

🧭 Full-Time

πŸ’Έ 160,000 - 182,000 USD per year

πŸ” Adtech

🏒 Company: tvScientific πŸ‘₯ 11-50 πŸ’° $9,400,000 Convertible Note about 1 year ago | Internet, Advertising

  • 7+ years of experience in data engineering.
  • Proven experience building data infrastructure using Spark with Scala.
  • Familiarity with data lakes, cloud warehouses, and storage formats.
  • Strong proficiency in AWS services.
  • Expertise in SQL for data manipulation and extraction.
  • Bachelor's degree in Computer Science or a related field.
  • Design and implement robust data infrastructure using Spark with Scala.
  • Collaborate with our cross-functional teams to design data solutions that meet business needs.
  • Build out our core data pipelines, store data in optimal engines and formats, and feed our machine learning models.
  • Leverage and optimize AWS resources.
  • Collaborate closely with the Data Science team.

AWS, SQL, Cloud Computing, ETL, Machine Learning, Data engineering, Spark, Scala, Data modeling

πŸ”₯ Staff Data Engineer
Posted about 2 months ago

πŸ“ United States, Canada

🧭 Full-Time

πŸ’Έ 200,000 - 228,000 USD per year

πŸ” Software Development

🏒 Company: Later πŸ‘₯ 1-10 | Consumer Electronics, iOS, Apps, Software

  • 10+ years of experience in data engineering, software engineering, or related fields.
  • Proven experience leading the technical strategy and execution of large-scale data platforms.
  • Expertise in cloud technologies (Google Cloud Platform, AWS, Azure) with a focus on scalable data solutions (BigQuery, Snowflake, Redshift, etc.).
  • Strong proficiency in SQL, Python, and distributed data processing frameworks (Apache Spark, Flink, Beam, etc.).
  • Extensive experience with streaming data architectures using Kafka, Flink, Pub/Sub, Kinesis, or similar technologies.
  • Expertise in data modeling, schema design, indexing, partitioning, and performance tuning for analytical workloads, including data governance (security, access control, compliance: GDPR, CCPA, SOC 2)
  • Strong experience designing and optimizing scalable, fault-tolerant data pipelines using workflow orchestration tools like Airflow, Dagster, or Dataflow.
  • Ability to lead and influence engineering teams, drive cross-functional projects, and align stakeholders towards a common data vision.
  • Experience mentoring senior and mid-level data engineers to enhance team performance and skill development.
  • Lead the design and evolution of a scalable data architecture that meets analytical, machine learning, and operational needs.
  • Architect and optimize data pipelines for batch and real-time data processing, ensuring efficiency and reliability.
  • Implement best practices for distributed data processing, ensuring scalability, performance, and cost-effectiveness of data workflows.
  • Define and enforce data governance policies, implement automated validation checks, and establish monitoring frameworks to maintain data integrity.
  • Ensure data security and compliance with industry regulations by designing appropriate access controls, encryption mechanisms, and auditing processes.
  • Drive innovation in data engineering practices by researching and implementing new technologies, tools, and methodologies.
  • Work closely with data scientists, engineers, analysts, and business stakeholders to understand data requirements and deliver impactful solutions.
  • Develop reusable frameworks, libraries, and automation tools to improve efficiency, reliability, and maintainability of data infrastructure.
  • Guide and mentor data engineers, fostering a high-performing engineering culture through best practices, peer reviews, and knowledge sharing.
  • Establish and monitor SLAs for data pipelines, proactively identifying and mitigating risks to ensure high availability and reliability.

AWS, Python, SQL, Apache Airflow, Cloud Computing, Data Analysis, ETL, GCP, Kafka, Machine Learning, Snowflake, Data engineering, Data modeling, Data management
