
Staff Data Engineer

Posted 6 days ago


💎 Seniority level: Staff, 8+ years

📍 Location: United States

🔍 Industry: Software Development

🏢 Company: Apollo.io 👥 501-1000 💰 $100,000,000 Series D over 1 year ago · Software Development

🗣️ Languages: English

⏳ Experience: 8+ years

🪄 Skills: Python, SQL, Apache Airflow, Apache Hadoop, Cloud Computing, ETL, Apache Kafka, Data engineering, FastAPI, Data modeling, Data analytics

Requirements:
  • 8+ years of experience as a data platform engineer, software engineer in data, or big data engineer.
  • Experience in data modeling, data warehousing, APIs, and building data pipelines.
  • Deep knowledge of databases and data warehousing with an ability to collaborate cross-functionally.
  • Bachelor's degree in a quantitative field (Physical/Computer Science, Engineering, Mathematics, or Statistics).
Responsibilities:
  • Develop and maintain scalable data pipelines and build new integrations to support continuing increases in data volume and complexity.
  • Develop and improve Data APIs used in machine learning / AI product offerings.
  • Implement automated monitoring, alerting, self-healing (restartable/graceful failures) features while building the consumption pipelines.
  • Implement processes and systems to monitor data quality, ensuring production data is always accurate and available.
  • Write unit/integration tests, contribute to the engineering wiki, and document work.
  • Define company data models and write jobs to populate data models in our data warehouse.
  • Work closely with all business units and engineering teams to develop a strategy for long-term data platform architecture.

Related Jobs


πŸ“ United States

🏒 Company: ge_externalsite

  • Hands-on experience in programming languages like Java, Python or Scala and experience in writing SQL scripts for Oracle, MySQL, PostgreSQL or HiveQL
  • Exposure to industry standard data modeling tools (e.g., ERWin, ER Studio, etc.).
  • Exposure to Extract, Transform & Load (ETL) tools like Informatica or Talend
  • Exposure to industry standard data catalog, automated data discovery, and data lineage tools (e.g., Alation, Collibra, etc.)
  • Experience with Big Data / Hadoop / Spark / Hive / NoSQL database engines (e.g., Cassandra or HBase)
  • Exposure to unstructured datasets and ability to handle XML, JSON file formats
  • Conduct exploratory data analysis and generate visual summaries of data. Identify data quality issues proactively.
  • Developing reusable code pipelines through CI/CD.
  • Hands-on experience of big data or MPP databases.
  • Developing and executing integrated test plans.
  • Be responsible for identifying solutions for complex data analysis and data structure.
  • Be responsible for creating digital thread requirements
  • Be responsible for change management of database artifacts to support next gen QMS applications
  • Be responsible for monitoring data availability and data health of complex systems
  • Understand industry trends and stay up to date on associated Quality and tech landscape.
  • Design & build technical data dictionaries and support business glossaries to analyze the datasets
  • This role may also work on other Quality team digital and strategic deliveries that support the business.
  • Perform data profiling and data analysis for source systems, manually maintained data, machine or sensor generated data and target data repositories
  • Design & build both logical and physical data models for both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) solutions
  • Develop and maintain data mapping specifications based on the results of data analysis and functional requirements
  • Build a variety of data loading & data transformation methods using multiple tools and technologies.
  • Design & build automated Extract, Transform & Load (ETL) jobs based on data mapping specifications
  • Manage metadata structures needed for building reusable Extract, Transform & Load (ETL) components.
  • Analyze reference datasets and familiarize with Master Data Management (MDM) tools.
  • Analyze the impact of changes to downstream systems/products and recommend alternatives to minimize the impact.
  • Derive solutions and make recommendations from deep dive data analysis proactively.
  • Design and build Data Quality (DQ) rules.
  • Drives design and implementation of the roadmap.
  • Design and develop complex code in multiple languages.

PostgreSQL, Python, SQL, Data Analysis, ETL, Hadoop, Java, MySQL, Oracle, Data engineering, NoSQL, Spark, CI/CD, Agile methodologies, JSON, Scala, Data visualization, Data modeling

Posted 6 days ago

πŸ“ United States

🧭 Full-Time

πŸ’Έ 117800.0 - 214300.0 USD per year

πŸ” Software Development

🏒 Company: careers_gm

  • 7+ years of hands-on experience.
  • Bachelor's degree (or equivalent work experience) in Computer Science, Data Science, Software Engineering, or a related field.
  • Strong understanding and ability to provide mentorship in the areas of data ETL processes and tools for designing and managing data pipelines
  • Proficient with big data frameworks and tools like Apache Hadoop, Apache Spark, or Apache Kafka for processing and analyzing large datasets.
  • Hands on experience with data serialization formats like JSON, Parquet and XML
  • Consistently models and leads best practices and optimization in scripting languages like Python, Java, and Scala for automation and data processing.
  • Proficient with database administration and performance tuning for databases like MySQL, PostgreSQL, or NoSQL databases
  • Proficient with containerization (e.g., Docker) and orchestration platforms (e.g., Kubernetes) for managing data applications.
  • Experience with cloud platforms and data services for data storage and processing
  • Consistently designs and builds data solutions that are highly automated and performant, with quality checks that ensure data consistency and accuracy
  • Experienced at actively managing large-scale data engineering projects, including planning, resource allocation, risk management, and ensuring successful project delivery; adjusts style across delivery methods (e.g., Waterfall, Agile, POD)
  • Understands data governance principles, data privacy regulations, and experience implementing security measures to protect data
  • Able to integrate data engineering pipelines with machine learning models and platforms
  • Strong problem-solving skills to identify and resolve complex data engineering issues efficiently.
  • Ability to work effectively in cross-functional teams, collaborate with data scientists, analysts, and stakeholders to deliver data solutions.
  • Ability to lead and mentor junior data engineers, providing guidance and support in complex data engineering projects.
  • Influential communication skills to effectively convey technical concepts to non-technical stakeholders and document data engineering processes.
  • Models a mindset of continuous learning, staying updated with the latest advancements in data engineering technologies, and a drive for innovation.
  • Design, construct, install and maintain data architectures, including database and large-scale processing systems.
  • Develop and maintain ETL (Extract, Transform, Load) processes to collect, cleanse and transform data from various sources inclusive of cloud.
  • Design and implement data pipelines to collect, process and transfer data from various sources to storage systems (data warehouses, data lakes, etc)
  • Implement security measures to protect sensitive data and ensure compliance with data privacy regulations.
  • Build data solutions that ensure data quality, integrity and security through data validation, monitoring, and compliance with data governance policies
  • Administer and optimize databases for performance and scalability
  • Maintain Master Data, Metadata, Data Management Repositories, Logical Data Models, and Data Standards
  • Troubleshoot and resolve data-related issues affecting data quality fidelity
  • Document data architectures, processes and best practices for knowledge sharing across the GM data engineering community
  • Participate in the evaluation and selection of data related tools and technologies
  • Collaborate across other engineering functions within EDAI, Marketing Technology, and Software & Services

AWS, Docker, PostgreSQL, Python, SQL, Apache Hadoop, Cloud Computing, Data Analysis, ETL, Java, Kubernetes, MySQL, Algorithms, Apache Kafka, Data engineering, Data science, Data Structures, REST API, NoSQL, CI/CD, Problem Solving, JSON, Scala, Data visualization, Data modeling, Scripting, Data analytics, Data management

Posted 12 days ago

πŸ“ United States

πŸ’Έ 204000.0 - 260000.0 USD per year

πŸ” Software Development

🏒 Company: AirbnbπŸ‘₯ 5001-10000πŸ’° Secondary Market almost 5 years agoπŸ«‚ Last layoff about 2 years agoHospitalityTravel AccommodationsPropTechMarketplaceMobile AppsTravel

  • 9+ years of experience with a BS/Masters or 6+ years with a PhD
  • Expertise in SQL and proficient in at least one data engineering language, such as Python or Scala
  • Experience with Superset and Tableau
  • Expertise in large-scale distributed data processing frameworks like Presto or Spark
  • Experience with an ETL framework like Airflow
  • Extensive knowledge of data management concepts, including data modeling, ETL processes, data warehousing, and data governance.
  • Understanding of data security and privacy principles, as well as regulatory compliance requirements (e.g., GDPR, CCPA).
  • Strong problem-solving skills and the ability to translate business requirements into technical solutions.
  • Excellent communication skills, both written and verbal, with the ability to distill complex ideas for technical and non-technical stakeholders
  • Strong capability to forge trusted partnerships across working teams
  • Design and implement data pipelines by leveraging best-in-class tools and infrastructure to meet critical business and product requirements.
  • Develop high quality data assets for product and AI/ML use-cases
  • Collaborate with cross-functional teams to gather requirements, assess data needs, and design efficient solutions that align with business objectives.
  • Contribute to the development of long-term data strategies and roadmaps and ML infrastructure development within the organization.
  • Influence the trajectory of data in decision making
  • Improve trust in our data by championing for data quality across the stack
  • Identify and actively work upon opportunities for automation and implement data management tools and frameworks to enhance efficiency and productivity.
  • Mentor and coach team members, providing guidance in data engineering best practices and support to enhance their skills and performance.

Leadership, Python, SQL, ETL, Machine Learning, Cross-functional Team Leadership, Tableau, Airflow, Data engineering, REST API, Spark, CI/CD, Problem Solving, Excellent communication skills, Scala, Data visualization, Data modeling, Data management

Posted 16 days ago

πŸ“ United States

🧭 Full-Time

πŸ” Software Development

🏒 Company: Life360πŸ‘₯ 251-500πŸ’° $33,038,258 Post-IPO Equity over 2 years agoπŸ«‚ Last layoff about 2 years agoAndroidFamilyAppsMobile AppsMobile

  • Minimum 7 years of experience working with high volume data infrastructure.
  • Experience with Databricks and AWS.
  • Experience with dbt.
  • Experience with job orchestration tooling like Airflow.
  • Proficient programming in Python.
  • Proficient with SQL and the ability to optimize complex queries.
  • Proficient with large-scale data processing using Spark and/or Presto/Trino.
  • Proficient in data modeling and database design.
  • Experience with streaming data with a tool like Kinesis or Kafka.
  • Experience working with high-volume, event-based data architectures like Amplitude and Braze.
  • Experience in modern development lifecycle including Agile methodology, CI/CD, automated deployments using Terraform, GitHub Actions, etc.
  • Knowledge and proficiency in the latest open source and data frameworks, modern data platform tech stacks and tools.
  • Always learning and staying up to speed with the fast moving data world.
  • You have good communication and collaboration skills and can work independently.
  • BS in Computer Science, Software Engineering, Mathematics, or equivalent experience.
  • Design, implement, and manage scalable data processing platforms used for real-time analytics and exploratory data analysis.
  • Manage our financial data from ingestion through ETL to storage and batch processing.
  • Automate, test and harden all data workflows.
  • Architect logical and physical data models to ensure the needs of the business are met.
  • Collaborate across the data teams, engineering, data science, and analytics, to understand their needs, while applying engineering best practices.
  • Architect and develop systems and algorithms for distributed real-time analytics and data processing.
  • Implement strategies for acquiring data to develop new insights.
  • Mentor junior engineers, imparting best practices and institutionalizing efficient processes to foster growth and innovation within the team.
  • Champion data engineering best practices and institutionalize efficient processes to foster growth and innovation within the team.

AWS, Project Management, Python, SQL, Apache Airflow, ETL, Kafka, Algorithms, Data engineering, Data Structures, REST API, Spark, Communication Skills, Analytical Skills, Collaboration, CI/CD, Problem Solving, Agile methodologies, Mentoring, Terraform, Data visualization, Technical support, Data modeling, Data analytics, Data management, Debugging

Posted 26 days ago
🔥 Staff Data Engineer
Posted about 2 months ago

πŸ“ United States, Canada

🧭 Full-Time

πŸ’Έ 200000.0 - 228000.0 USD per year

πŸ” Software Development

🏒 Company: LaterπŸ‘₯ 1-10Consumer ElectronicsiOSAppsSoftware

  • 10+ years of experience in data engineering, software engineering, or related fields.
  • Proven experience leading the technical strategy and execution of large-scale data platforms.
  • Expertise in cloud technologies (Google Cloud Platform, AWS, Azure) with a focus on scalable data solutions (BigQuery, Snowflake, Redshift, etc.).
  • Strong proficiency in SQL, Python, and distributed data processing frameworks (Apache Spark, Flink, Beam, etc.).
  • Extensive experience with streaming data architectures using Kafka, Flink, Pub/Sub, Kinesis, or similar technologies.
  • Expertise in data modeling, schema design, indexing, partitioning, and performance tuning for analytical workloads, including data governance (security, access control, compliance: GDPR, CCPA, SOC 2)
  • Strong experience designing and optimizing scalable, fault-tolerant data pipelines using workflow orchestration tools like Airflow, Dagster, or Dataflow.
  • Ability to lead and influence engineering teams, drive cross-functional projects, and align stakeholders towards a common data vision.
  • Experience mentoring senior and mid-level data engineers to enhance team performance and skill development.
  • Lead the design and evolution of a scalable data architecture that meets analytical, machine learning, and operational needs.
  • Architect and optimize data pipelines for batch and real-time data processing, ensuring efficiency and reliability.
  • Implement best practices for distributed data processing, ensuring scalability, performance, and cost-effectiveness of data workflows.
  • Define and enforce data governance policies, implement automated validation checks, and establish monitoring frameworks to maintain data integrity.
  • Ensure data security and compliance with industry regulations by designing appropriate access controls, encryption mechanisms, and auditing processes.
  • Drive innovation in data engineering practices by researching and implementing new technologies, tools, and methodologies.
  • Work closely with data scientists, engineers, analysts, and business stakeholders to understand data requirements and deliver impactful solutions.
  • Develop reusable frameworks, libraries, and automation tools to improve efficiency, reliability, and maintainability of data infrastructure.
  • Guide and mentor data engineers, fostering a high-performing engineering culture through best practices, peer reviews, and knowledge sharing.
  • Establish and monitor SLAs for data pipelines, proactively identifying and mitigating risks to ensure high availability and reliability.

AWS, Python, SQL, Apache Airflow, Cloud Computing, Data Analysis, ETL, GCP, Kafka, Machine Learning, Snowflake, Data engineering, Data modeling, Data management

🔥 Staff Data Engineer
Posted about 2 months ago

πŸ“ United States

🧭 Full-Time

πŸ’Έ 85500.0 - 117500.0 USD per year

πŸ” Software Development

  • 5+ years of work experience as a data engineer or full-stack engineer, coding in Python.
  • 5+ years of experience building web scraping tools in Python, using Beautiful Soup, Scrapy, Selenium, or similar tooling.
  • 3-5 years of deployment experience with CI/CD.
  • Strong experience with HTML, CSS, JavaScript, and browser behavior.
  • Experience with RESTful APIs and JSON/XML data formats.
  • Knowledge of cloud platforms and containerization technologies (e.g., Docker, Kubernetes).
  • Advanced understanding of how at least one big data processing technology works under the hood (e.g. Spark / Hadoop / HDFS / Redshift / BigQuery / Snowflake)
  • Use modern tooling to build a robust, extensible, and performant web scraping platform
  • Build thoughtful and reliable data acquisition and integration solutions to meet business requirements and data sourcing needs.
  • Deliver best-in-class infrastructure solutions for flexible and repeatable applications across disparate sources.
  • Troubleshoot, improve and scale existing data pipelines, models and solutions
  • Build upon data engineering's CI/CD deployments, and infrastructure-as-code for provisioning AWS and 3rd party (Apify) services.

AWS, Backend Development, PostgreSQL, Python, SQL, Apache Airflow, ETL, Data engineering, REST API, NodeJS, Software Engineering, Data analytics

🔥 Staff Data Engineer
Posted about 2 months ago

πŸ“ United States

πŸ’Έ 131414.0 - 197100.0 USD per year

πŸ” Mental healthcare

🏒 Company: HeadspaceπŸ‘₯ 11-50WellnessHealth CareChild Care

  • 10+ years of success in enterprise data solutions and high-impact initiatives.
  • Expertise in platforms like Databricks, Snowflake, dbt, and Redshift.
  • Experience designing and optimizing real-time and batch ETL pipelines.
  • Demonstrated leadership and mentorship abilities in engineering.
  • Strong collaboration skills with product and analytics stakeholders.
  • Bachelor’s or advanced degree in Computer Science, Engineering, or a related field.
  • Drive the architecture and implementation of PySpark data pipelines.
  • Create and enforce design patterns in code and schema.
  • Design and lead secure and compliant data warehousing platforms.
  • Partner with analytics and product leaders for actionable insights.
  • Mentor team members on dbt architecture and foster a data-first culture.
  • Act as a thought leader on data strategy and cross-functional roadmaps.

SQL, Cloud Computing, ETL, Snowflake, Data engineering, Data modeling, Data analytics


πŸ“ United States

🧭 Full-Time

πŸ’Έ 170000.0 - 195000.0 USD per year

πŸ” Healthcare

🏒 Company: Parachute HealthπŸ‘₯ 101-250πŸ’° $1,000 over 5 years agoMedicalHealth CareSoftware

  • 5+ years of relevant experience.
  • Experience in Data Engineering with Python.
  • Experience building customer-facing software.
  • Strong listening and communication skills.
  • Time management and organizational skills.
  • Proactive, a driven self-starter who can work independently or as part of a team.
  • Ability to think with the 'big picture' in mind.
  • Passionate about improving patient outcomes in the healthcare space.
  • Architect solutions to integrate and manage large volumes of data across various internal and external systems.
  • Establish best practices and data governance standards to ensure that data infrastructure is built for long-term scalability.
  • Build and maintain a reporting product for external customers that visualizes data and provides tabular reports.
  • Collaborate across the organization to assess data engineering needs.

Python, ETL, Airflow, Data engineering, Data visualization

Posted 2 months ago

πŸ“ US, Ontario, CAN

πŸ” Food waste reduction and grocery technology

🏒 Company: AfreshπŸ‘₯ 51-100πŸ’° $115,000,000 Series B over 2 years agoArtificial Intelligence (AI)LogisticsFood and BeverageMachine LearningAgricultureSupply Chain ManagementSoftware

  • Significant experience designing and maintaining ETLs that process large-scale datasets.
  • Proficiency with Python, PySpark, and SQL, and experience with tools like Databricks, Snowflake, or dbt.
  • Strong problem-solving skills with ambiguous requirements.
  • Focus on practical outcomes balancing technical rigor and execution.
  • Experience with complex, unclean datasets and innovative processing methods.
  • Identifying areas for tooling or automation to simplify workflows.
  • Excellent communication skills for technical presentation.
  • Proven leadership in technical projects with mentoring ability.
  • Build tools and frameworks that streamline customer integrations.
  • Create robust ETLs in PySpark and dbt to process billions of records.
  • Collaborate with teams to design and deliver data solutions for new products.
  • Identify optimizations to improve ETL runtime and scalability.
  • Solve data quality challenges with messy datasets.
  • Investigate and implement new technologies into the data platform.
  • Support team members by mentoring and leading technical discussions.

Python, SQL, ETL, Data engineering, Data management

Posted 3 months ago

πŸ“ United States

πŸ” Cyber security

🏒 Company: BeyondTrustπŸ‘₯ 1001-5000πŸ’° Private almost 4 years agoCloud ComputingSecurityCloud SecurityCyber SecuritySoftware

  • Strong programming and technology knowledge in cloud data processing.
  • Previous experience working in matured data lakes.
  • Strong data modelling skills for analytical workloads.
  • Spark (or equivalent parallel processing framework) experience is needed; existing Databricks knowledge is a plus.
  • Interest and aptitude for cybersecurity; interest in identity security is highly preferred.
  • Technical understanding of underlying systems and computation minutiae.
  • Experience working with distributed systems and data processing on object stores.
  • Ability to work autonomously.
  • Optimize data workloads at a software level by improving processing efficiency.
  • Develop new data processing routes to remove redundancy or reduce transformation overhead.
  • Monitor and maintain existing data workflows.
  • Use observability best practices to ensure pipeline performance.
  • Perform complex transformations on both real time and batch data assets.
  • Create new ML/Engineering solutions to tackle existing issues in the cybersecurity space.
  • Leverage CI/CD best practices to effectively develop and release source code.

Python, Spark, CI/CD, Data modeling

Posted 3 months ago