
Staff Data Engineer

Posted 6 days ago


💎 Seniority level: Staff, 8+ years

📍 Location: United States

🔍 Industry: Software Development

🏢 Company: Apollo.io 👥 501-1000 💰 $100,000,000 Series D over 1 year ago · Software Development

🗣️ Languages: English

⏳ Experience: 8+ years

🪄 Skills: Python, SQL, Apache Airflow, Apache Hadoop, Cloud Computing, ETL, Apache Kafka, Data engineering, FastAPI, Data modeling, Data analytics

Requirements:
  • 8+ years of experience as a data platform engineer, software engineer in data, or big data engineer.
  • Experience in data modeling, data warehousing, APIs, and building data pipelines.
  • Deep knowledge of databases and data warehousing with an ability to collaborate cross-functionally.
  • Bachelor's degree in a quantitative field (Physical/Computer Science, Engineering, Mathematics, or Statistics).
Responsibilities:
  • Develop and maintain scalable data pipelines and build new integrations to support continuing increases in data volume and complexity.
  • Develop and improve Data APIs used in machine learning / AI product offerings.
  • Implement automated monitoring, alerting, self-healing (restartable/graceful failures) features while building the consumption pipelines.
  • Implement processes and systems to monitor data quality, ensuring production data is always accurate and available.
  • Write unit/integration tests, contribute to the engineering wiki, and document work.
  • Define company data models and write jobs to populate data models in our data warehouse.
  • Work closely with all business units and engineering teams to develop a strategy for long-term data platform architecture.

Related Jobs


πŸ“ United States

🏒 Company: ge_externalsite

  • Hands-on experience in programming languages like Java, Python or Scala and experience in writing SQL scripts for Oracle, MySQL, PostgreSQL or HiveQL
  • Exposure to industry standard data modeling tools (e.g., ERWin, ER Studio, etc.).
  • Exposure to Extract, Transform & Load (ETL) tools like Informatica or Talend
  • Exposure to industry standard data catalog, automated data discovery, and data lineage tools (e.g., Alation, Collibra, etc.)
  • Experience with Big Data / Hadoop / Spark / Hive / NoSQL database engines (e.g., Cassandra or HBase)
  • Exposure to unstructured datasets and ability to handle XML, JSON file formats
  • Conduct exploratory data analysis and generate visual summaries of data. Identify data quality issues proactively.
  • Developing reusable code pipelines through CI/CD.
  • Hands-on experience of big data or MPP databases.
  • Developing and executing integrated test plans.
  • Be responsible for identifying solutions for complex data analysis and data structure.
  • Be responsible for creating digital thread requirements
  • Be responsible for change management of database artifacts to support next gen QMS applications
  • Be responsible for monitoring data availability and data health of complex systems
  • Understand industry trends and stay up to date on associated Quality and tech landscape.
  • Design & build technical data dictionaries and support business glossaries to analyze the datasets
  • This role may also work on other Quality team digital and strategic deliveries that support the business.
  • Perform data profiling and data analysis for source systems, manually maintained data, machine or sensor generated data and target data repositories
  • Design & build both logical and physical data models for both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) solutions
  • Develop and maintain data mapping specifications based on the results of data analysis and functional requirements
  • Build a variety of data loading & data transformation methods using multiple tools and technologies.
  • Design & build automated Extract, Transform & Load (ETL) jobs based on data mapping specifications
  • Manage metadata structures needed for building reusable Extract, Transform & Load (ETL) components.
  • Analyze reference datasets and familiarize with Master Data Management (MDM) tools.
  • Analyze the impact of changes to downstream systems/products and recommend alternatives to minimize the impact.
  • Derive solutions and make recommendations from deep dive data analysis proactively.
  • Design and build Data Quality (DQ) rules.
  • Drives design and implementation of the roadmap.
  • Design and develop complex code in multiple languages.

PostgreSQL, Python, SQL, Data Analysis, ETL, Hadoop, Java, MySQL, Oracle, Data engineering, NoSQL, Spark, CI/CD, Agile methodologies, JSON, Scala, Data visualization, Data modeling

Posted 6 days ago

πŸ“ United States

🧭 Full-Time

πŸ’Έ 117800.0 - 214300.0 USD per year

πŸ” Software Development

🏒 Company: careers_gm

  • 7+ years of hands-on experience.
  • Bachelor's degree (or equivalent work experience) in Computer Science, Data Science, Software Engineering, or a related field.
  • Strong understanding and ability to provide mentorship in the areas of data ETL processes and tools for designing and managing data pipelines
  • Proficient with big data frameworks and tools like Apache Hadoop, Apache Spark, or Apache Kafka for processing and analyzing large datasets.
  • Hands on experience with data serialization formats like JSON, Parquet and XML
  • Consistently models and leads best practices and optimization in scripting languages like Python, Java, and Scala for automation and data processing.
  • Proficient with database administration and performance tuning for databases like MySQL, PostgreSQL, or NoSQL databases
  • Proficient with containerization (e.g., Docker) and orchestration platforms (e.g., Kubernetes) for managing data applications.
  • Experience with cloud platforms and data services for data storage and processing
  • Consistently designs and builds data solutions that are highly automated and performant, with quality checks that ensure data consistency and accuracy
  • Experienced at actively managing large-scale data engineering projects, including planning, resource allocation, risk management, and ensuring successful project delivery; adjusts style across delivery methods (e.g., Waterfall, Agile, POD)
  • Understands data governance principles, data privacy regulations, and experience implementing security measures to protect data
  • Able to integrate data engineering pipelines with machine learning models and platforms
  • Strong problem-solving skills to identify and resolve complex data engineering issues efficiently.
  • Ability to work effectively in cross-functional teams, collaborate with data scientists, analysts, and stakeholders to deliver data solutions.
  • Ability to lead and mentor junior data engineers, providing guidance and support in complex data engineering projects.
  • Influential communication skills to effectively convey technical concepts to non-technical stakeholders and document data engineering processes.
  • Models a mindset of continuous learning, staying updated with the latest advancements in data engineering technologies, and a drive for innovation.
  • Design, construct, install and maintain data architectures, including database and large-scale processing systems.
  • Develop and maintain ETL (Extract, Transform, Load) processes to collect, cleanse and transform data from various sources inclusive of cloud.
  • Design and implement data pipelines to collect, process and transfer data from various sources to storage systems (data warehouses, data lakes, etc)
  • Implement security measures to protect sensitive data and ensure compliance with data privacy regulations.
  • Build data solutions that ensure data quality, integrity and security through data validation, monitoring, and compliance with data governance policies
  • Administer and optimize databases for performance and scalability
  • Maintain Master Data, Metadata, Data Management Repositories, Logical Data Models, and Data Standards
  • Troubleshoot and resolve data-related issues affecting data quality fidelity
  • Document data architectures, processes and best practices for knowledge sharing across the GM data engineering community
  • Participate in the evaluation and selection of data related tools and technologies
  • Collaborate across other engineering functions within EDAI, Marketing Technology, and Software & Services

AWS, Docker, PostgreSQL, Python, SQL, Apache Hadoop, Cloud Computing, Data Analysis, ETL, Java, Kubernetes, MySQL, Algorithms, Apache Kafka, Data engineering, Data science, Data Structures, REST API, NoSQL, CI/CD, Problem Solving, JSON, Scala, Data visualization, Data modeling, Scripting, Data analytics, Data management

Posted 12 days ago

πŸ“ United States

πŸ’Έ 204000.0 - 260000.0 USD per year

πŸ” Software Development

🏒 Company: AirbnbπŸ‘₯ 5001-10000πŸ’° Secondary Market almost 5 years agoπŸ«‚ Last layoff about 2 years agoHospitalityTravel AccommodationsPropTechMarketplaceMobile AppsTravel

  • 9+ years of experience with a BS/Masters or 6+ years with a PhD
  • Expertise in SQL and proficient in at least one data engineering language, such as Python or Scala
  • Experience with Superset and Tableau
  • Expertise in large-scale distributed data processing frameworks like Presto or Spark
  • Experience with an ETL framework like Airflow
  • Extensive knowledge of data management concepts, including data modeling, ETL processes, data warehousing, and data governance.
  • Understanding of data security and privacy principles, as well as regulatory compliance requirements (e.g., GDPR, CCPA).
  • Strong problem-solving skills and the ability to translate business requirements into technical solutions.
  • Excellent communication skills, both written and verbal, with the ability to distill complex ideas for technical and non-technical stakeholders
  • Strong capability to forge trusted partnerships across working teams
  • Design and implement data pipelines by leveraging best-in-class tools and infrastructure to meet critical business and product requirements.
  • Develop high quality data assets for product and AI/ML use-cases
  • Collaborate with cross-functional teams to gather requirements, assess data needs, and design efficient solutions that align with business objectives.
  • Contribute to the development of long-term data strategies and roadmaps and ML infrastructure development within the organization.
  • Influence the trajectory of data in decision making
  • Improve trust in our data by championing for data quality across the stack
  • Identify and actively work upon opportunities for automation and implement data management tools and frameworks to enhance efficiency and productivity.
  • Mentor and coach team members, providing guidance in data engineering best practices and support to enhance their skills and performance.

Leadership, Python, SQL, ETL, Machine Learning, Cross-functional Team Leadership, Tableau, Airflow, Data engineering, REST API, Spark, CI/CD, Problem Solving, Excellent communication skills, Scala, Data visualization, Data modeling, Data management

Posted 16 days ago

πŸ“ United States

🧭 Full-Time

πŸ” Software Development

🏒 Company: Life360πŸ‘₯ 251-500πŸ’° $33,038,258 Post-IPO Equity over 2 years agoπŸ«‚ Last layoff about 2 years agoAndroidFamilyAppsMobile AppsMobile

  • Minimum 7 years of experience working with high volume data infrastructure.
  • Experience with Databricks and AWS.
  • Experience with dbt.
  • Experience with job orchestration tooling like Airflow.
  • Proficient programming in Python.
  • Proficient with SQL and the ability to optimize complex queries.
  • Proficient with large-scale data processing using Spark and/or Presto/Trino.
  • Proficient in data modeling and database design.
  • Experience with streaming data with a tool like Kinesis or Kafka.
  • Experience working with high-volume, event-based data architectures like Amplitude and Braze.
  • Experience in modern development lifecycle including Agile methodology, CI/CD, automated deployments using Terraform, GitHub Actions, etc.
  • Knowledge and proficiency in the latest open source and data frameworks, modern data platform tech stacks and tools.
  • Always learning and staying up to speed with the fast moving data world.
  • You have good communication and collaboration skills and can work independently.
  • BS in Computer Science, Software Engineering, Mathematics, or equivalent experience.
  • Design, implement, and manage scalable data processing platforms used for real-time analytics and exploratory data analysis.
  • Manage our financial data from ingestion through ETL to storage and batch processing.
  • Automate, test and harden all data workflows.
  • Architect logical and physical data models to ensure the needs of the business are met.
  • Collaborate across the data teams, engineering, data science, and analytics, to understand their needs, while applying engineering best practices.
  • Architect and develop systems and algorithms for distributed real-time analytics and data processing.
  • Implement strategies for acquiring data to develop new insights.
  • Mentor junior engineers, imparting best practices and institutionalizing efficient processes to foster growth and innovation within the team.
  • Champion data engineering best practices and institutionalize efficient processes to foster growth and innovation within the team.

AWS, Project Management, Python, SQL, Apache Airflow, ETL, Kafka, Algorithms, Data engineering, Data Structures, REST API, Spark, Communication Skills, Analytical Skills, Collaboration, CI/CD, Problem Solving, Agile methodologies, Mentoring, Terraform, Data visualization, Technical support, Data modeling, Data analytics, Data management, Debugging

Posted 26 days ago
🔥 Staff Data Engineer
Posted about 2 months ago

πŸ“ United States, Canada

🧭 Full-Time

πŸ’Έ 200000.0 - 228000.0 USD per year

πŸ” Software Development

🏒 Company: LaterπŸ‘₯ 1-10Consumer ElectronicsiOSAppsSoftware

  • 10+ years of experience in data engineering, software engineering, or related fields.
  • Proven experience leading the technical strategy and execution of large-scale data platforms.
  • Expertise in cloud technologies (Google Cloud Platform, AWS, Azure) with a focus on scalable data solutions (BigQuery, Snowflake, Redshift, etc.).
  • Strong proficiency in SQL, Python, and distributed data processing frameworks (Apache Spark, Flink, Beam, etc.).
  • Extensive experience with streaming data architectures using Kafka, Flink, Pub/Sub, Kinesis, or similar technologies.
  • Expertise in data modeling, schema design, indexing, partitioning, and performance tuning for analytical workloads, including data governance (security, access control, compliance: GDPR, CCPA, SOC 2)
  • Strong experience designing and optimizing scalable, fault-tolerant data pipelines using workflow orchestration tools like Airflow, Dagster, or Dataflow.
  • Ability to lead and influence engineering teams, drive cross-functional projects, and align stakeholders towards a common data vision.
  • Experience mentoring senior and mid-level data engineers to enhance team performance and skill development.
  • Lead the design and evolution of a scalable data architecture that meets analytical, machine learning, and operational needs.
  • Architect and optimize data pipelines for batch and real-time data processing, ensuring efficiency and reliability.
  • Implement best practices for distributed data processing, ensuring scalability, performance, and cost-effectiveness of data workflows.
  • Define and enforce data governance policies, implement automated validation checks, and establish monitoring frameworks to maintain data integrity.
  • Ensure data security and compliance with industry regulations by designing appropriate access controls, encryption mechanisms, and auditing processes.
  • Drive innovation in data engineering practices by researching and implementing new technologies, tools, and methodologies.
  • Work closely with data scientists, engineers, analysts, and business stakeholders to understand data requirements and deliver impactful solutions.
  • Develop reusable frameworks, libraries, and automation tools to improve efficiency, reliability, and maintainability of data infrastructure.
  • Guide and mentor data engineers, fostering a high-performing engineering culture through best practices, peer reviews, and knowledge sharing.
  • Establish and monitor SLAs for data pipelines, proactively identifying and mitigating risks to ensure high availability and reliability.

AWS, Python, SQL, Apache Airflow, Cloud Computing, Data Analysis, ETL, GCP, Kafka, Machine Learning, Snowflake, Data engineering, Data modeling, Data management

🔥 Staff Data Engineer
Posted about 2 months ago

πŸ“ United States

🧭 Full-Time

πŸ’Έ 85500.0 - 117500.0 USD per year

πŸ” Software Development

  • 5+ years of work experience as a data engineer or full-stack engineer, coding in Python.
  • 5+ years of experience building web scraping tools in Python, using Beautiful Soup, Scrapy, Selenium, or similar tooling.
  • 3-5 years of deployment experience with CI/CD.
  • Strong experience with HTML, CSS, JavaScript, and browser behavior.
  • Experience with RESTful APIs and JSON/XML data formats.
  • Knowledge of cloud platforms and containerization technologies (e.g., Docker, Kubernetes).
  • Advanced understanding of how at least one big data processing technology works under the hood (e.g. Spark / Hadoop / HDFS / Redshift / BigQuery / Snowflake)
  • Use modern tooling to build a robust, extensible, and performant web scraping platform
  • Build thoughtful and reliable data acquisition and integration solutions to meet business requirements and data sourcing needs.
  • Deliver best-in-class infrastructure solutions for flexible and repeatable applications across disparate sources.
  • Troubleshoot, improve and scale existing data pipelines, models and solutions
  • Build upon data engineering's CI/CD deployments, and infrastructure-as-code for provisioning AWS and 3rd party (Apify) services.

AWS, Backend Development, PostgreSQL, Python, SQL, Apache Airflow, ETL, Data engineering, REST API, NodeJS, Software Engineering, Data analytics

🔥 Staff Data Engineer
Posted about 2 months ago

πŸ“ United States

πŸ’Έ 131414.0 - 197100.0 USD per year

πŸ” Mental healthcare

🏒 Company: HeadspaceπŸ‘₯ 11-50WellnessHealth CareChild Care

  • 10+ years of success in enterprise data solutions and high-impact initiatives.
  • Expertise in platforms like Databricks, Snowflake, dbt, and Redshift.
  • Experience designing and optimizing real-time and batch ETL pipelines.
  • Demonstrated leadership and mentorship abilities in engineering.
  • Strong collaboration skills with product and analytics stakeholders.
  • Bachelor’s or advanced degree in Computer Science, Engineering, or a related field.
  • Drive the architecture and implementation of PySpark data pipelines.
  • Create and enforce design patterns in code and schema.
  • Design and lead secure and compliant data warehousing platforms.
  • Partner with analytics and product leaders for actionable insights.
  • Mentor team members on dbt architecture and foster a data-first culture.
  • Act as a thought leader on data strategy and cross-functional roadmaps.

SQL, Cloud Computing, ETL, Snowflake, Data engineering, Data modeling, Data analytics


πŸ“ United States

🧭 Full-Time

πŸ’Έ 170000.0 - 195000.0 USD per year

πŸ” Healthcare

🏒 Company: Parachute HealthπŸ‘₯ 101-250πŸ’° $1,000 over 5 years agoMedicalHealth CareSoftware

  • 5+ years of relevant experience.
  • Experience in Data Engineering with Python.
  • Experience building customer-facing software.
  • Strong listening and communication skills.
  • Time management and organizational skills.
  • Proactive, a driven self-starter who can work independently or as part of a team.
  • Ability to think with the 'big picture' in mind.
  • Passionate about improving patient outcomes in the healthcare space.
  • Architect solutions to integrate and manage large volumes of data across various internal and external systems.
  • Establish best practices and data governance standards to ensure that data infrastructure is built for long-term scalability.
  • Build and maintain a reporting product for external customers that visualizes data and provides tabular reports.
  • Collaborate across the organization to assess data engineering needs.

Python, ETL, Airflow, Data engineering, Data visualization

Posted 2 months ago

πŸ“ US, Ontario, CAN

πŸ” Food waste reduction and grocery technology

🏒 Company: AfreshπŸ‘₯ 51-100πŸ’° $115,000,000 Series B over 2 years agoArtificial Intelligence (AI)LogisticsFood and BeverageMachine LearningAgricultureSupply Chain ManagementSoftware

  • Significant experience designing and maintaining ETLs that process large-scale datasets.
  • Proficiency with Python, PySpark, and SQL, and experience with tools like Databricks, Snowflake, or dbt.
  • Strong problem-solving skills with ambiguous requirements.
  • Focus on practical outcomes balancing technical rigor and execution.
  • Experience with complex, unclean datasets and innovative processing methods.
  • Identifying areas for tooling or automation to simplify workflows.
  • Excellent communication skills for technical presentation.
  • Proven leadership in technical projects with mentoring ability.
  • Build tools and frameworks that streamline customer integrations.
  • Create robust ETLs in PySpark and dbt to process billions of records.
  • Collaborate with teams to design and deliver data solutions for new products.
  • Identify optimizations to improve ETL runtime and scalability.
  • Solve data quality challenges with messy datasets.
  • Investigate and implement new technologies into the data platform.
  • Support team members by mentoring and leading technical discussions.

Python, SQL, ETL, Data engineering, Data management

Posted 3 months ago

πŸ“ United States

πŸ” Cyber security

🏒 Company: BeyondTrustπŸ‘₯ 1001-5000πŸ’° Private almost 4 years agoCloud ComputingSecurityCloud SecurityCyber SecuritySoftware

  • Strong programming and technology knowledge in cloud data processing.
  • Previous experience working in matured data lakes.
  • Strong data modelling skills for analytical workloads.
  • Spark (or equivalent parallel processing framework) experience is needed; existing Databricks knowledge is a plus.
  • Interest and aptitude for cybersecurity; interest in identity security is highly preferred.
  • Technical understanding of underlying systems and computation minutiae.
  • Experience working with distributed systems and data processing on object stores.
  • Ability to work autonomously.
  • Optimize data workloads at a software level by improving processing efficiency.
  • Develop new data processing routes to remove redundancy or reduce transformation overhead.
  • Monitor and maintain existing data workflows.
  • Use observability best practices to ensure pipeline performance.
  • Perform complex transformations on both real time and batch data assets.
  • Create new ML/Engineering solutions to tackle existing issues in the cybersecurity space.
  • Leverage CI/CD best practices to effectively develop and release source code.

Python, Spark, CI/CD, Data modeling

Posted 3 months ago