Apply

Data Engineer

Posted 6 days ago


💎 Seniority level: Middle, 3+ years

📍 Location: United States

💸 Salary: 153,000 - 216,000 USD per year

🔍 Industry: Software Development

⏳ Experience: 3+ years

🪄 Skills: Python, SQL, ETL, Snowflake, Airflow, Data engineering, Spark, RESTful APIs, Data visualization

Requirements:
  • 3+ years of experience in a data engineering role building products, ideally in a fast-paced environment
  • Good foundations in Python and SQL.
  • Experience with Spark, PySpark, DBT, Snowflake and Airflow
  • Knowledge of visualization tools such as Metabase and Jupyter notebooks (Python)
Responsibilities:
  • Collaborate on the design and improvements of the data infrastructure
  • Partner with product and engineering to advocate best practices and build supporting systems and infrastructure for the various data needs
  • Create data pipelines that stitch together various data sources to produce valuable business insights (see the Airflow sketch after this list)
  • Create real-time data pipelines in collaboration with the Data Science team
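
The stack named above (Python, SQL, Airflow, Spark, Snowflake) suggests pipelines orchestrated roughly along these lines. A minimal sketch, assuming Apache Airflow 2.4+; the DAG id, task names, and placeholder task bodies are illustrative, not details from the posting:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_sources() -> None:
    # Placeholder: pull data from the various upstream sources (APIs, databases, files).
    print("extracting source data")


def load_warehouse() -> None:
    # Placeholder: load transformed output into the warehouse (e.g. Snowflake via COPY INTO).
    print("loading warehouse tables")


with DAG(
    dag_id="business_insights_daily",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_sources", python_callable=extract_sources)
    load = PythonOperator(task_id="load_warehouse", python_callable=load_warehouse)
    extract >> load  # stitch the sources together, then load the warehouse
```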
Apply

Related Jobs

Apply

📍 United States

🧭 Full-Time

💸 114,000 - 171,599 USD per year

🔍 Fintech

  • Strong expertise in data pipeline development (ETL/ELT) and workflow automation.
  • Proficiency in Python, SQL, and scripting languages for data processing and automation.
  • Hands-on experience with Workato, Google Apps Script, and API-driven automation.
  • Automate customer support, success, and service workflows to improve speed, accuracy, and responsiveness.
  • Build and maintain scalable ETL/ELT pipelines to ensure real-time access to critical customer data (a generic sketch follows this list).
  • Implement self-service automation to enable customers and internal teams to quickly access information.
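
The posting's own automation stack is Workato and Google Apps Script; purely as a generic stand-in, here is a small Python sketch of one API-driven ELT step, using a hypothetical support-ticket endpoint and a local SQLite staging table:

```python
import sqlite3

import requests

API_URL = "https://support.example.com/api/tickets"  # hypothetical endpoint


def sync_open_tickets(db_path: str = "staging.db") -> int:
    """Fetch open support tickets and upsert them into a staging table."""
    tickets = requests.get(API_URL, params={"status": "open"}, timeout=30).json()
    rows = [
        {"id": t["id"], "subject": t["subject"], "updated_at": t["updated_at"]}
        for t in tickets
    ]

    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS stg_tickets (id TEXT PRIMARY KEY, subject TEXT, updated_at TEXT)"
    )
    con.executemany(
        "INSERT OR REPLACE INTO stg_tickets VALUES (:id, :subject, :updated_at)",
        rows,
    )
    con.commit()
    con.close()
    return len(rows)
```

A real pipeline would target the warehouse rather than SQLite and run on a schedule or event trigger.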

Python, SQL, ETL, Jira, API testing, Data engineering, CI/CD, RESTful APIs, Data visualization, Scripting, Customer Success

Posted about 17 hours ago
Apply
🔥 Staff Data Engineer
Posted about 19 hours ago

📍 United States

🏢 Company: ge_externalsite

  • Hands-on experience in programming languages like Java, Python or Scala and experience in writing SQL scripts for Oracle, MySQL, PostgreSQL or HiveQL
  • Exposure to industry standard data modeling tools (e.g., ERWin, ER Studio, etc.).
  • Exposure to Extract, Transform & Load (ETL) tools like Informatica or Talend
  • Exposure to industry standard data catalog, automated data discovery and data lineage tools (e.g., Alation, Collibra)
  • Experience with Big Data / Hadoop / Spark / Hive / NoSQL database engines (e.g., Cassandra or HBase)
  • Exposure to unstructured datasets and ability to handle XML, JSON file formats
  • Conduct exploratory data analysis and generate visual summaries of data. Identify data quality issues proactively.
  • Developing reusable code pipelines through CI/CD.
  • Hands-on experience with big data or MPP databases.
  • Developing and executing integrated test plans.
  • Be responsible for identifying solutions for complex data analysis and data structure.
  • Be responsible for creating digital thread requirements
  • Be responsible for change management of database artifacts to support next gen QMS applications
  • Be responsible for monitoring data availability and data health of complex systems
  • Understand industry trends and stay up to date on associated Quality and tech landscape.
  • Design & build technical data dictionaries and support business glossaries to analyze the datasets
  • This role may also work on other Quality team digital and strategic deliveries that support the business.
  • Perform data profiling and data analysis for source systems, manually maintained data, machine or sensor generated data and target data repositories
  • Design & build both logical and physical data models for both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) solutions
  • Develop and maintain data mapping specifications based on the results of data analysis and functional requirements
  • Build a variety of data loading & data transformation methods using multiple tools and technologies.
  • Design & build automated Extract, Transform & Load (ETL) jobs based on data mapping specifications
  • Manage metadata structures needed for building reusable Extract, Transform & Load (ETL) components.
  • Analyze reference datasets and familiarize with Master Data Management (MDM) tools.
  • Analyze the impact of changes to downstream systems/products and recommend alternatives to minimize the impact.
  • Derive solutions and make recommendations from deep dive data analysis proactively.
  • Design and build Data Quality (DQ) rules (see the PySpark sketch after this list).
  • Drive the design and implementation of the roadmap.
  • Design and develop complex code in multiple languages.
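
As an illustration of the data-profiling and DQ-rule work listed above, a hedged PySpark sketch; the table name, column, and threshold are assumptions, not details from the posting:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_profile").enableHiveSupport().getOrCreate()

df = spark.table("quality.inspection_results")  # hypothetical Hive table

# Basic profile: row count plus per-column null rate.
total = df.count()
df.select(
    [(F.count(F.when(F.col(c).isNull(), c)) / F.lit(total)).alias(c) for c in df.columns]
).show(truncate=False)

# Simple DQ rule: fail the job if more than 1% of rows lack a serial number.
missing = df.filter(F.col("serial_number").isNull()).count()
if total and missing / total > 0.01:
    raise ValueError(f"DQ rule violated: {missing}/{total} rows missing serial_number")
```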

PostgreSQL, Python, SQL, Data Analysis, ETL, Hadoop, Java, MySQL, Oracle, Data engineering, NoSQL, Spark, CI/CD, Agile methodologies, JSON, Scala, Data visualization, Data modeling

Posted about 19 hours ago
Apply

📍 United States

🧭 Full-Time

🔍 Software Development

🏢 Company: Apollo.io (501-1000 employees, $100,000,000 Series D over 1 year ago, Software Development)

  • 8+ years of experience as a data platform engineer, or as a software engineer working in data or big data engineering.
  • Experience in data modeling, data warehousing, APIs, and building data pipelines.
  • Deep knowledge of databases and data warehousing with an ability to collaborate cross-functionally.
  • Bachelor's degree in a quantitative field (Physical/Computer Science, Engineering, Mathematics, or Statistics).
  • Develop and maintain scalable data pipelines and build new integrations to support continuing increases in data volume and complexity.
  • Develop and improve Data APIs used in machine learning / AI product offerings (see the sketch after this list)
  • Implement automated monitoring, alerting, and self-healing (restartable/graceful failure) features while building the consumption pipelines.
  • Implement processes and systems to monitor data quality, ensuring production data is always accurate and available.
  • Write unit/integration tests, contribute to the engineering wiki, and document work.
  • Define company data models and write jobs to populate data models in our data warehouse.
  • Work closely with all business units and engineering teams to develop a strategy for long-term data platform architecture.
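
FastAPI appears in the skills below, so a data API like the one mentioned above might look roughly like this minimal sketch; the endpoint, model, and in-memory lookup are hypothetical stand-ins for a real feature store:

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

# Stand-in for a real feature store / warehouse lookup.
_FEATURES = {"acct_123": {"sessions_30d": 42, "seats": 7}}


class FeatureVector(BaseModel):
    account_id: str
    features: dict


@app.get("/features/{account_id}", response_model=FeatureVector)
def get_features(account_id: str) -> FeatureVector:
    """Return the latest feature vector for an account, for ML/AI scoring services."""
    if account_id not in _FEATURES:
        raise HTTPException(status_code=404, detail="unknown account")
    return FeatureVector(account_id=account_id, features=_FEATURES[account_id])
```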

Python, SQL, Apache Airflow, Apache Hadoop, Cloud Computing, ETL, Apache Kafka, Data engineering, FastAPI, Data modeling, Data analytics

Posted 1 day ago
Apply

📍 United States

🧭 Full-Time

💸 150,363 - 180,870 USD per year

🔍 Software Development

🏢 Company: phData (501-1000 employees, $2,499,997 Seed about 7 years ago; Information Services, Analytics, Information Technology)

  • At least a Bachelor's degree or foreign equivalent in Computer Science, Computer Engineering, Electrical and Electronics Engineering, or a closely related technical field, and at least five (5) years of post-bachelor's, progressive experience writing shell scripts; validating data; and engaging in data wrangling.
  • Experience must include at least three (3) years of experience debugging data; transforming data into Microsoft SQL Server; developing processes to import data into HDFS using Sqoop; and using Java, UNIX Shell Scripts, and Python.
  • Experience must also include at least one (1) year of experience developing Hive scripts for data transformation on data lake projects; converting Hive scripts to PySpark applications; automating in Hadoop; and implementing CI/CD pipelines (see the sketch after this list).
  • Design, develop, test, and implement Big Data technical solutions.
  • Recommend the right technologies and solutions for a given use case, from the application layer to infrastructure.
  • Lead the delivery of compiling and installing database systems, integrating data from a variety of data sources (data warehouse, data marts) utilizing on-prem or cloud-based data structures.
  • Drive solution architecture and perform deployments of data pipelines and applications.
  • Author DDL and DML SQL spanning technical stacks.
  • Develop data transformation code and highly complex provisioning pipelines.
  • Ingest data from relational databases.
  • Execute automation strategy.
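
As a rough idea of the Hive-to-PySpark conversion mentioned above, a short sketch; the database, table, and column names are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hive_to_pyspark").enableHiveSupport().getOrCreate()

# Roughly equivalent to a Hive script such as:
#   INSERT OVERWRITE TABLE analytics.daily_orders
#   SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
#   FROM raw.orders GROUP BY order_date;
daily = (
    spark.table("raw.orders")
    .groupBy("order_date")
    .agg(F.count("*").alias("orders"), F.sum("amount").alias("revenue"))
)
daily.write.mode("overwrite").saveAsTable("analytics.daily_orders")
```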

AWS, Python, SQL, ETL, Hadoop, Java, Kafka, Snowflake, Data engineering, Spark, CI/CD, Linux, Scala

Posted 2 days ago
Apply
🔥 Data Engineer
Posted 2 days ago

📍 Worldwide

🧭 Full-Time

💸 145,000 - 160,000 USD per year

  • Proficiency in managing MongoDB databases, including performance tuning and maintenance.
  • Experience with cloud-based data warehousing, particularly using BigQuery.
  • Familiarity with DBT for data transformation and modeling.
  • Exposure to tools like Segment for data collection and integration.
  • Basic knowledge of integrating third-party data sources to build a comprehensive data ecosystem.
  • Overseeing our production MongoDB database to ensure optimal performance, reliability, and security.
  • Assisting in the management and optimization of data pipelines into BigQuery, ensuring data is organized and accessible for downstream users (see the sketch after this list).
  • Utilizing DBT to transform raw data into structured formats, making it useful for analysis and reporting.
  • Collaborating on the integration of data from Segment and various third-party sources to create a unified, clean data ecosystem.
  • Working closely with BI, Marketing, and Data Science teams to understand data requirements and ensure our infrastructure meets their needs.
  • Participating in code reviews, learning new tools, and contributing to the refinement of data processes and best practices.
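
A hedged sketch of the MongoDB-to-BigQuery leg described above, using pymongo and the google-cloud-bigquery client; the collection, dataset, and field names are assumptions, and downstream DBT models would build on the staging table:

```python
from google.cloud import bigquery
from pymongo import MongoClient


def sync_users(mongo_uri: str, project: str) -> None:
    """Copy a small MongoDB collection into a BigQuery staging table."""
    users = MongoClient(mongo_uri)["app"]["users"]  # hypothetical database/collection
    rows = [
        {"id": str(doc["_id"]), "email": doc.get("email"), "plan": doc.get("plan")}
        for doc in users.find({}, {"email": 1, "plan": 1})
    ]

    client = bigquery.Client(project=project)
    errors = client.insert_rows_json(f"{project}.staging.users", rows)
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```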

SQL, ETL, MongoDB, Data engineering, Data modeling

Posted 2 days ago
Apply

📍 United States

🧭 Full-Time

🔍 Software Development

  • Experience with Infrastructure as Code tools such as Terraform or CloudFormation. Ability to automate the deployment and management of data infrastructure.
  • Familiarity with Continuous Integration and Continuous Deployment (CI/CD) processes. Experience setting up and maintaining CI/CD pipelines for data applications.
  • Proficiency in the software development lifecycle; able to release fast and improve incrementally.
  • Experience with tools and frameworks for ensuring data quality, such as data validation, anomaly detection, and monitoring. Ability to design systems to track and enforce data quality standards.
  • Proven experience in designing, building, and maintaining scalable data pipelines capable of processing terabytes of data daily using modern data processing frameworks (e.g., Apache Spark, Apache Kafka, Flink, Open Table Formats, modern OLAP databases).
  • Strong foundation in data architecture principles and the ability to evaluate emerging technologies.
  • Proficient in at least one modern programming language (Go, Python, Java, Rust) and SQL.
  • Design and implement both real-time and batch data processing pipelines, leveraging technologies like Apache Kafka, Apache Flink, or managed cloud streaming services to ensure scalability and resilience (see the consumer sketch after this list)
  • Create data pipelines that efficiently process terabytes of data daily, leveraging data lakes and data warehouses within the AWS cloud. Must be proficient with technologies like Apache Spark to handle large-scale data processing.
  • Implement robust schema management practices and lay the groundwork for future data contracts. Ensure pipeline integrity by establishing and enforcing data quality checks, improving overall data reliability and consistency.
  • Develop tools to support rapid development of data products. Provide recommended patterns to support data pipeline deployments.
  • Design, implement, and maintain data governance frameworks and best practices to ensure data quality, security, compliance, and accessibility across the organization.
  • Develop tools to support the rapid development of data products and establish recommended patterns for data pipeline deployments. Mentor and guide junior engineers, fostering their growth in best practices and efficient development processes.
  • Collaborate with the DevOps team to integrate data needs into DevOps tooling.
  • Champion DataOps practices within the organization, promoting a culture of collaboration, automation, and continuous improvement in data engineering processes.
  • Stay abreast of emerging technologies, tools, and trends in data processing and analytics, and evaluate their potential impact and relevance to Fetch's strategy.
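
A minimal sketch of a real-time consumer with an inline data-quality check, in the spirit of the streaming items above; the topic, broker address, and required fields are assumptions, and kafka-python stands in for whichever Kafka/Flink client the team actually uses:

```python
import json

from kafka import KafkaConsumer

REQUIRED_FIELDS = {"event_id", "user_id", "ts"}

consumer = KafkaConsumer(
    "receipt-events",  # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        # A production pipeline would route these to a dead-letter topic instead.
        print(f"dropping malformed event, missing {missing}")
        continue
    # ... hand the validated event to the downstream sink (lake, OLAP store, etc.)
```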

AWS, Python, SQL, ETL, Java, Apache Kafka, Data engineering, Go, Rust, CI/CD, DevOps, Terraform, Data visualization, Data modeling, Data analytics, Data management

Posted 4 days ago
Apply

📍 United States, Canada

🔍 Software Development

AWS, SQL, Cloud Computing, Data Analysis, ETL, Data engineering, Data visualization, Data modeling

Posted 7 days ago
Apply

📍 United States

🧭 Full-Time

💸 117,800 - 214,300 USD per year

🔍 Software Development

🏢 Company: careers_gm

  • 7+ years of hands-on experience.
  • Bachelor's degree (or equivalent work experience) in Computer Science, Data Science, Software Engineering, or a related field.
  • Strong understanding and ability to provide mentorship in the areas of data ETL processes and tools for designing and managing data pipelines
  • Proficient with big data frameworks and tools like Apache Hadoop, Apache Spark, or Apache Kafka for processing and analyzing large datasets.
  • Hands-on experience with data serialization formats like JSON, Parquet, and XML (see the small example after this list)
  • Consistently models and leads best practices and optimization in scripting languages like Python, Java, and Scala for automation and data processing.
  • Proficient with database administration and performance tuning for databases like MySQL, PostgreSQL, or NoSQL databases
  • Proficient with containerization (e.g., Docker) and orchestration platforms (e.g., Kubernetes) for managing data applications.
  • Experience with cloud platforms and data services for data storage and processing
  • Consistently designs and builds data solutions that are highly automated and performant, with quality checks that ensure data consistency and accuracy
  • Experienced at actively managing large-scale data engineering projects, including planning, resource allocation, risk management, and ensuring successful project delivery, adjusting style for all delivery methods (e.g., Waterfall, Agile, POD)
  • Understands data governance principles, data privacy regulations, and experience implementing security measures to protect data
  • Able to integrate data engineering pipelines with machine learning models and platforms
  • Strong problem-solving skills to identify and resolve complex data engineering issues efficiently.
  • Ability to work effectively in cross-functional teams, collaborate with data scientists, analysts, and stakeholders to deliver data solutions.
  • Ability to lead and mentor junior data engineers, providing guidance and support in complex data engineering projects.
  • Influential communication skills to effectively convey technical concepts to non-technical stakeholders and document data engineering processes.
  • Models a mindset of continuous learning, staying updated with the latest advancements in data engineering technologies, and a drive for innovation.
  • Design, construct, install and maintain data architectures, including database and large-scale processing systems.
  • Develop and maintain ETL (Extract, Transform, Load) processes to collect, cleanse, and transform data from various sources, including cloud sources.
  • Design and implement data pipelines to collect, process, and transfer data from various sources to storage systems (data warehouses, data lakes, etc.)
  • Implement security measures to protect sensitive data and ensure compliance with data privacy regulations.
  • Build data solutions that ensure data quality, integrity and security through data validation, monitoring, and compliance with data governance policies
  • Administer and optimize databases for performance and scalability
  • Maintain Master Data, Metadata, Data Management Repositories, Logical Data Models, and Data Standards
  • Troubleshoot and resolve data-related issues affecting data quality and fidelity
  • Document data architectures, processes and best practices for knowledge sharing across the GM data engineering community
  • Participate in the evaluation and selection of data related tools and technologies
  • Collaborate across other engineering functions within EDAI, Marketing Technology, and Software & Services
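
A small illustrative example of moving records between the serialization formats named above (JSON in, Parquet out) with pandas; the file and column names are placeholders:

```python
import pandas as pd

# Read newline-delimited JSON exported from a source system.
df = pd.read_json("vehicle_telemetry.jsonl", lines=True)

# Light cleansing typical of an ETL step: drop duplicates and enforce a key type.
df = df.drop_duplicates(subset=["reading_id"]).astype({"reading_id": "string"})

# Write columnar Parquet for the lake/warehouse side (requires pyarrow or fastparquet).
df.to_parquet("vehicle_telemetry.parquet", index=False)
```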

AWS, Docker, PostgreSQL, Python, SQL, Apache Hadoop, Cloud Computing, Data Analysis, ETL, Java, Kubernetes, MySQL, Algorithms, Apache Kafka, Data engineering, Data science, Data Structures, REST API, NoSQL, CI/CD, Problem Solving, JSON, Scala, Data visualization, Data modeling, Scripting, Data analytics, Data management

Posted 7 days ago
Apply
🔥 Data Engineer
Posted 8 days ago

📍 United States, Latin America, India

🔍 Software Development

  • 1-4 years of experience as a Software Engineer, Data Engineer, or Data Analyst
  • Programming expertise in Java, Python and/or Scala
  • Experience with core cloud data platforms including Snowflake, AWS, Azure, Databricks and GCP
  • SQL and the ability to write, debug, and optimize SQL queries
  • 4-year Bachelor's degree in Computer Science or a related field
  • Develop end-to-end technical solutions into production and help ensure performance, security, scalability, and robust data integration.
  • Client-facing written and verbal communication skills and experience
  • Create and deliver detailed presentations
  • Detailed solution documentation (e.g., POCs and roadmaps, sequence diagrams, class hierarchies, logical system views, etc.)

AWS, Python, Software Development, SQL, Data Analysis, GCP, Snowflake, Azure, Data engineering, Communication Skills, Written communication, Client relationship management, Data modeling

Posted 8 days ago
Apply

📍 United States

💸 120,000 - 145,000 USD per year

🔍 Software Development

🏢 Company: Energy Solutions - USA

  • High proficiency in programming languages commonly used in ETL development, such as PL/SQL, SQL, and Python.
  • Expertise in utilizing AWS services, including but not limited to Amazon S3, Glue, Data Catalog, Amazon Redshift, Redshift Spectrum, and Amazon Athena (see the sketch after this list).
  • Proficiency in working with relational databases such as Postgres, Oracle, MySQL, or SQL Server.
  • Design, implement, and optimize scalable ETL processes using industry-leading tools and technologies.
  • Develop and refine enterprise-level data models and database architectures for optimal storage, retrieval, and analytics.
  • Identify and resolve performance bottlenecks across ETL pipelines, database operations, and distributed systems.
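
A hedged sketch of one ETL step against the AWS services named above, querying S3-backed data through Amazon Athena with boto3; the region, database, query, and output bucket are assumptions:

```python
import time

import boto3

athena = boto3.client("athena", region_name="us-west-2")

resp = athena.start_query_execution(
    QueryString="SELECT program_id, SUM(kwh_saved) AS kwh FROM savings GROUP BY program_id",
    QueryExecutionContext={"Database": "energy"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = resp["QueryExecutionId"]

# Poll until the query finishes, then fetch the first page of results.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(f"fetched {len(rows) - 1} result rows")  # the first row is the header
```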

AWS, PostgreSQL, Python, SQL, ETL, Data modeling

Posted 8 days ago
Apply