Staff Data Engineer

Posted 3 months ago

Seniority level: Staff

Location: US, Ontario, CAN

Industry: Food waste reduction and grocery technology

Company: Afresh

Employees: 51-100

Funding: $115,000,000 Series B (over 2 years ago)

Categories: Artificial Intelligence (AI), Logistics, Food and Beverage, Machine Learning, Agriculture, Supply Chain Management, Software

Languages: English

Experience: Significant experience designing and maintaining ETLs that process large-scale datasets.

Skills: Python, SQL, ETL, Data engineering, Data management

Requirements:
  • Significant experience designing and maintaining ETLs that process large-scale datasets.
  • Proficiency with Python, PySpark, and SQL, and experience with tools like Databricks, Snowflake, or dbt.
  • Strong problem-solving skills with ambiguous requirements.
  • Focus on practical outcomes balancing technical rigor and execution.
  • Experience with complex, unclean datasets and innovative processing methods.
  • Identifying areas for tooling or automation to simplify workflows.
  • Excellent communication skills for technical presentation.
  • Proven leadership in technical projects with mentoring ability.
Responsibilities:
  • Build tools and frameworks that streamline customer integrations.
  • Create robust ETLs in PySpark and dbt to process billions of records.
  • Collaborate with teams to design and deliver data solutions for new products.
  • Identify optimizations to improve ETL runtime and scalability.
  • Solve data quality challenges with messy datasets.
  • Investigate and implement new technologies into the data platform.
  • Support team members by mentoring and leading technical discussions.

Related Jobs

Location: AL, AR, AZ, CA (exempt only), CO, CT, FL, GA, ID, IL, IN, IA, KS, KY, MA, ME, MD, MI, MN, MO, MT, NC, NE, NJ, NM, NV, NY, OH, OK, OR, PA, SC, SD, TN, TX, UT, VT, VA, WA, WI

Employment type: Full-Time

Industry: Insurance

Company: Kin Insurance

  • Depth of experience in modern big data environments.
  • Advanced knowledge and experience with SQL and Python is expected.
  • Insurance domain knowledge.
  • Design and develop data pipelines and modeling raw data for downstream ingestion.
  • Mentor and guide data engineers on your team and across the organization, while collaborating with other engineers, product managers, analysts, and stakeholders.
  • Lead a cross-functional project team with members from AppEng, DataEng, BI, and business stakeholders.

Skills: AWS, Python, SQL, Apache Airflow, ETL, Cross-functional Team Leadership, Data engineering, Problem Solving, Mentoring, Documentation, Compliance, Data visualization, Data modeling, Data management

Posted about 6 hours ago

Location: United States

Company: ge_externalsite

  • Hands-on experience in programming languages like Java, Python or Scala and experience in writing SQL scripts for Oracle, MySQL, PostgreSQL or HiveQL
  • Exposure to industry-standard data modeling tools (e.g., ERWin, ER Studio).
  • Exposure to Extract, Transform & Load (ETL) tools like Informatica or Talend
  • Exposure to industry-standard data catalog, automated data discovery, and data lineage tools (e.g., Alation, Collibra)
  • Experience with Big Data / Hadoop / Spark / Hive / NoSQL database engines (e.g., Cassandra or HBase)
  • Exposure to unstructured datasets and ability to handle XML, JSON file formats
  • Conduct exploratory data analysis and generate visual summaries of data. Identify data quality issues proactively.
  • Developing reusable code pipelines through CI/CD.
  • Hands-on experience of big data or MPP databases.
  • Developing and executing integrated test plans.
  • Be responsible for identifying solutions for complex data analysis and data structure.
  • Be responsible for creating digital thread requirements
  • Be responsible for change management of database artifacts to support next gen QMS applications
  • Be responsible for monitoring data availability and data health of complex systems
  • Understand industry trends and stay up to date on associated Quality and tech landscape.
  • Design & build technical data dictionaries and support business glossaries to analyze the datasets
  • This role may also work on other Quality team digital and strategic deliveries that support the business.
  • Perform data profiling and data analysis for source systems, manually maintained data, machine or sensor generated data and target data repositories
  • Design & build both logical and physical data models for both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) solutions
  • Develop and maintain data mapping specifications based on the results of data analysis and functional requirements
  • Build a variety of data loading & data transformation methods using multiple tools and technologies.
  • Design & build automated Extract, Transform & Load (ETL) jobs based on data mapping specifications
  • Manage metadata structures needed for building reusable Extract, Transform & Load (ETL) components.
  • Analyze reference datasets and familiarize with Master Data Management (MDM) tools.
  • Analyze the impact of changes to downstream systems/products and recommend alternatives to minimize the impact.
  • Derive solutions and make recommendations from deep dive data analysis proactively.
  • Design and build Data Quality (DQ) rules.
  • Drive design and implementation of the roadmap.
  • Design and develop complex code in multiple languages.

Skills: PostgreSQL, Python, SQL, Data Analysis, ETL, Hadoop, Java, MySQL, Oracle, Data engineering, NoSQL, Spark, CI/CD, Agile methodologies, JSON, Scala, Data visualization, Data modeling

Posted 6 days ago

Location: United States

Employment type: Full-Time

Industry: Software Development

Company: Apollo.io

Employees: 501-1000

Funding: $100,000,000 Series D (over 1 year ago)

Categories: Software Development

  • 8+ years of experience as a data platform engineer, a software engineer working in data, or a big data engineer.
  • Experience in data modeling, data warehousing, APIs, and building data pipelines.
  • Deep knowledge of databases and data warehousing with an ability to collaborate cross-functionally.
  • Bachelor's degree in a quantitative field (Physical/Computer Science, Engineering, Mathematics, or Statistics).
  • Develop and maintain scalable data pipelines and build new integrations to support continuing increases in data volume and complexity.
  • Develop and improve Data APIs used in machine learning / AI product offerings
  • Implement automated monitoring, alerting, self-healing (restartable/graceful failures) features while building the consumption pipelines.
  • Implement processes and systems to monitor data quality, ensuring production data is always accurate and available.
  • Write unit/integration tests, contribute to the engineering wiki, and document work.
  • Define company data models and write jobs to populate data models in our data warehouse.
  • Work closely with all business units and engineering teams to develop a strategy for long-term data platform architecture.

Skills: Python, SQL, Apache Airflow, Apache Hadoop, Cloud Computing, ETL, Apache Kafka, Data engineering, FastAPI, Data modeling, Data analytics

Posted 7 days ago

Location: United States

Employment type: Full-Time

Salary: 117,800 - 214,300 USD per year

Industry: Software Development

Company: careers_gm

  • 7+ years of hands-on experience.
  • Bachelor's degree (or equivalent work experience) in Computer Science, Data Science, Software Engineering, or a related field.
  • Strong understanding and ability to provide mentorship in the areas of data ETL processes and tools for designing and managing data pipelines
  • Proficient with big data frameworks and tools like Apache Hadoop, Apache Spark, or Apache Kafka for processing and analyzing large datasets.
  • Hands on experience with data serialization formats like JSON, Parquet and XML
  • Consistently models and leads best practices and optimization in scripting languages such as Python, Java, and Scala for automation and data processing.
  • Proficient with database administration and performance tuning for databases like MySQL, PostgreSQL, or NoSQL databases
  • Proficient with containerization (e.g., Docker) and orchestration platforms (e.g., Kubernetes) for managing data applications.
  • Experience with cloud platforms and data services for data storage and processing
  • Consistently designs and builds data solutions that are highly automated and performant, with quality checks that ensure data consistency and accuracy
  • Experienced at actively managing large-scale data engineering projects, including planning, resource allocation, risk management, and ensuring successful project delivery, and at adjusting style for all delivery methods (e.g., Waterfall, Agile, POD)
  • Understands data governance principles, data privacy regulations, and experience implementing security measures to protect data
  • Able to integrate data engineering pipelines with machine learning models and platforms
  • Strong problem-solving skills to identify and resolve complex data engineering issues efficiently.
  • Ability to work effectively in cross-functional teams, collaborate with data scientists, analysts, and stakeholders to deliver data solutions.
  • Ability to lead and mentor junior data engineers, providing guidance and support in complex data engineering projects.
  • Influential communication skills to effectively convey technical concepts to non-technical stakeholders and document data engineering processes.
  • Models a mindset of continuous learning, staying updated with the latest advancements in data engineering technologies, and a drive for innovation.
  • Design, construct, install and maintain data architectures, including database and large-scale processing systems.
  • Develop and maintain ETL (Extract, Transform, Load) processes to collect, cleanse and transform data from various sources inclusive of cloud.
  • Design and implement data pipelines to collect, process and transfer data from various sources to storage systems (data warehouses, data lakes, etc)
  • Implement security measures to protect sensitive data and ensure compliance with data privacy regulations.
  • Build data solutions that ensure data quality, integrity and security through data validation, monitoring, and compliance with data governance policies
  • Administer and optimize databases for performance and scalability
  • Maintain Master Data, Metadata, Data Management Repositories, Logical Data Models, and Data Standards
  • Troubleshoot and resolve data-related issues affecting data quality and fidelity
  • Document data architectures, processes and best practices for knowledge sharing across the GM data engineering community
  • Participate in the evaluation and selection of data related tools and technologies
  • Collaborate across other engineering functions within EDAI, Marketing Technology, and Software & Services

Skills: AWS, Docker, PostgreSQL, Python, SQL, Apache Hadoop, Cloud Computing, Data Analysis, ETL, Java, Kubernetes, MySQL, Algorithms, Apache Kafka, Data engineering, Data science, Data Structures, REST API, NoSQL, CI/CD, Problem Solving, JSON, Scala, Data visualization, Data modeling, Scripting, Data analytics, Data management

Posted 13 days ago

Location: United States

Salary: 204,000 - 260,000 USD per year

Industry: Software Development

Company: Airbnb

Employees: 5001-10000

Funding: Secondary Market (almost 5 years ago)

Last layoff: about 2 years ago

Categories: Hospitality, Travel Accommodations, PropTech, Marketplace, Mobile Apps, Travel

  • 9+ years of experience with a BS/Masters or 6+ years with a PhD
  • Expertise in SQL and proficient in at least one data engineering language, such as Python or Scala
  • Experience with Superset and Tableau
  • Expertise in large-scale distributed data processing frameworks like Presto or Spark
  • Experience with an ETL framework like Airflow
  • Extensive knowledge of data management concepts, including data modeling, ETL processes, data warehousing, and data governance.
  • Understanding of data security and privacy principles, as well as regulatory compliance requirements (e.g., GDPR, CCPA).
  • Strong problem-solving skills and the ability to translate business requirements into technical solutions.
  • Excellent written and verbal communication skills, and the ability to distill complex ideas for technical and non-technical stakeholders
  • Strong capability to forge trusted partnerships across working teams
  • Design and implement data pipelines by leveraging best-in-class tools and infrastructure to meet critical business and product requirements.
  • Develop high quality data assets for product and AI/ML use-cases
  • Collaborate with cross-functional teams to gather requirements, assess data needs, and design efficient solutions that align with business objectives.
  • Contribute to the development of long-term data strategies and roadmaps and ML infrastructure development within the organization.
  • Influence the trajectory of data in decision making
  • Improve trust in our data by championing data quality across the stack
  • Identify and act on opportunities for automation, and implement data management tools and frameworks to enhance efficiency and productivity.
  • Mentor and coach team members, providing guidance in data engineering best practices and support to enhance their skills and performance.

Skills: Leadership, Python, SQL, ETL, Machine Learning, Cross-functional Team Leadership, Tableau, Airflow, Data engineering, REST API, Spark, CI/CD, Problem Solving, Excellent communication skills, Scala, Data visualization, Data modeling, Data management

Posted 16 days ago

Location: United States

Employment type: Full-Time

Industry: Software Development

Company: Life360

Employees: 251-500

Funding: $33,038,258 Post-IPO Equity (over 2 years ago)

Last layoff: about 2 years ago

Categories: Android, Family, Apps, Mobile Apps, Mobile

  • Minimum 7 years of experience working with high volume data infrastructure.
  • Experience with Databricks and AWS.
  • Experience with dbt.
  • Experience with job orchestration tooling like Airflow.
  • Proficient programming in Python.
  • Proficient with SQL and the ability to optimize complex queries.
  • Proficient with large-scale data processing using Spark and/or Presto/Trino.
  • Proficient in data modeling and database design.
  • Experience with streaming data with a tool like Kinesis or Kafka.
  • Experience working with high volume event based data architecture like Amplitude and Braze.
  • Experience in modern development lifecycle including Agile methodology, CI/CD, automated deployments using Terraform, GitHub Actions, etc.
  • Knowledge and proficiency in the latest open source and data frameworks, modern data platform tech stacks and tools.
  • Always learning and staying up to speed with the fast moving data world.
  • You have good communication and collaboration skills and can work independently.
  • BS in Computer Science, Software Engineering, Mathematics, or equivalent experience.
  • Design, implement, and manage scalable data processing platforms used for real-time analytics and exploratory data analysis.
  • Manage our financial data from ingestion through ETL to storage and batch processing.
  • Automate, test and harden all data workflows.
  • Architect logical and physical data models to ensure the needs of the business are met.
  • Collaborate across the data teams, engineering, data science, and analytics, to understand their needs, while applying engineering best practices.
  • Architect and develop systems and algorithms for distributed real-time analytics and data processing.
  • Implement strategies for acquiring data to develop new insights.
  • Mentor junior engineers, imparting best practices and institutionalizing efficient processes to foster growth and innovation within the team.
  • Champion data engineering best practices and institutionalize efficient processes to foster growth and innovation within the team.

Skills: AWS, Project Management, Python, SQL, Apache Airflow, ETL, Kafka, Algorithms, Data engineering, Data Structures, REST API, Spark, Communication Skills, Analytical Skills, Collaboration, CI/CD, Problem Solving, Agile methodologies, Mentoring, Terraform, Data visualization, Technical support, Data modeling, Data analytics, Data management, Debugging

Posted 27 days ago
Staff Data Engineer
Posted about 2 months ago

Location: United States, Canada

Employment type: Full-Time

Salary: 200,000 - 228,000 USD per year

Industry: Software Development

Company: Later

Employees: 1-10

Categories: Consumer Electronics, iOS, Apps, Software

  • 10+ years of experience in data engineering, software engineering, or related fields.
  • Proven experience leading the technical strategy and execution of large-scale data platforms.
  • Expertise in cloud technologies (Google Cloud Platform, AWS, Azure) with a focus on scalable data solutions (BigQuery, Snowflake, Redshift, etc.).
  • Strong proficiency in SQL, Python, and distributed data processing frameworks (Apache Spark, Flink, Beam, etc.).
  • Extensive experience with streaming data architectures using Kafka, Flink, Pub/Sub, Kinesis, or similar technologies.
  • Expertise in data modeling, schema design, indexing, partitioning, and performance tuning for analytical workloads, including data governance (security, access control, compliance: GDPR, CCPA, SOC 2)
  • Strong experience designing and optimizing scalable, fault-tolerant data pipelines using workflow orchestration tools like Airflow, Dagster, or Dataflow.
  • Ability to lead and influence engineering teams, drive cross-functional projects, and align stakeholders towards a common data vision.
  • Experience mentoring senior and mid-level data engineers to enhance team performance and skill development.
  • Lead the design and evolution of a scalable data architecture that meets analytical, machine learning, and operational needs.
  • Architect and optimize data pipelines for batch and real-time data processing, ensuring efficiency and reliability.
  • Implement best practices for distributed data processing, ensuring scalability, performance, and cost-effectiveness of data workflows.
  • Define and enforce data governance policies, implement automated validation checks, and establish monitoring frameworks to maintain data integrity.
  • Ensure data security and compliance with industry regulations by designing appropriate access controls, encryption mechanisms, and auditing processes.
  • Drive innovation in data engineering practices by researching and implementing new technologies, tools, and methodologies.
  • Work closely with data scientists, engineers, analysts, and business stakeholders to understand data requirements and deliver impactful solutions.
  • Develop reusable frameworks, libraries, and automation tools to improve efficiency, reliability, and maintainability of data infrastructure.
  • Guide and mentor data engineers, fostering a high-performing engineering culture through best practices, peer reviews, and knowledge sharing.
  • Establish and monitor SLAs for data pipelines, proactively identifying and mitigating risks to ensure high availability and reliability.

Skills: AWS, Python, SQL, Apache Airflow, Cloud Computing, Data Analysis, ETL, GCP, Kafka, Machine Learning, Snowflake, Data engineering, Data modeling, Data management

Staff Data Engineer
Posted about 2 months ago

Location: United States

Employment type: Full-Time

Salary: 85,500 - 117,500 USD per year

Industry: Software Development

  • 5+ years of work experience as a data engineer or full-stack engineer, coding in Python.
  • 5+ years of experience building web scraping tools in Python, using Beautiful Soup, Scrapy, Selenium, or similar tooling.
  • 3-5 years of deployment experience with CI/CD.
  • Strong experience with HTML, CSS, JavaScript, and browser behavior.
  • Experience with RESTful APIs and JSON/XML data formats.
  • Knowledge of cloud platforms and containerization technologies (e.g., Docker, Kubernetes).
  • Advanced understanding of how at least one big data processing technology works under the hood (e.g. Spark / Hadoop / HDFS / Redshift / BigQuery / Snowflake)
  • Use modern tooling to build a robust, extensible, and performant web scraping platform.
  • Build thoughtful and reliable data acquisition and integration solutions to meet business requirements and data sourcing needs.
  • Deliver best in class infrastructure solutions for flexible and repeatable applications across disparate sources.
  • Troubleshoot, improve and scale existing data pipelines, models and solutions
  • Build upon data engineering's CI/CD deployments, and infrastructure-as-code for provisioning AWS and 3rd party (Apify) services.

Skills: AWS, Backend Development, PostgreSQL, Python, SQL, Apache Airflow, ETL, Data engineering, REST API, NodeJS, Software Engineering, Data analytics

Staff Data Engineer
Posted about 2 months ago

Location: United States

Salary: 131,414 - 197,100 USD per year

Industry: Mental healthcare

Company: Headspace

Employees: 11-50

Categories: Wellness, Health Care, Child Care

  • 10+ years of success in enterprise data solutions and high-impact initiatives.
  • Expertise in platforms like Databricks, Snowflake, dbt, and Redshift.
  • Experience designing and optimizing real-time and batch ETL pipelines.
  • Demonstrated leadership and mentorship abilities in engineering.
  • Strong collaboration skills with product and analytics stakeholders.
  • Bachelor's or advanced degree in Computer Science, Engineering, or a related field.
  • Drive the architecture and implementation of PySpark data pipelines.
  • Create and enforce design patterns in code and schema.
  • Design and lead secure and compliant data warehousing platforms.
  • Partner with analytics and product leaders for actionable insights.
  • Mentor team members on dbt architecture and foster a data-first culture.
  • Act as a thought leader on data strategy and cross-functional roadmaps.

Skills: SQL, Cloud Computing, ETL, Snowflake, Data engineering, Data modeling, Data analytics


Location: United States

Employment type: Full-Time

Salary: 170,000 - 195,000 USD per year

Industry: Healthcare

Company: Parachute Health

Employees: 101-250

Funding: $1,000 (over 5 years ago)

Categories: Medical, Health Care, Software

  • 5+ years of relevant experience.
  • Experience in Data Engineering with Python.
  • Experience building customer-facing software.
  • Strong listening and communication skills.
  • Time management and organizational skills.
  • A proactive, driven self-starter who can work independently or as part of a team.
  • Ability to think with the 'big picture' in mind.
  • Passionate about improving patient outcomes in the healthcare space.
  • Architect solutions to integrate and manage large volumes of data across various internal and external systems.
  • Establish best practices and data governance standards to ensure that data infrastructure is built for long-term scalability.
  • Build and maintain a reporting product for external customers that visualizes data and provides tabular reports.
  • Collaborate across the organization to assess data engineering needs.

Skills: Python, ETL, Airflow, Data engineering, Data visualization

Posted 2 months ago