Apply

Data Engineer

Posted 5 months agoViewed

View full description

📍 Location: United States

🔍 Industry: Consulting and Technology

🏢 Company: BCC-NIH

🗣️ Languages: English

🪄 Skills: PythonSQLAgileBashKubernetesSCRUMJiraInterpersonal skillsTroubleshootingScripting

Requirements:
  • Bachelor's degree in a STEM field (Engineering, Computer Science, Mathematics, Physics) or equivalent industry experience in bioinformatics.
  • Experience in large and complex data operations environments including relational databases and SQL.
  • Proficiency in scripting languages like Bash and Python.
  • Familiarity with LINUX/UNIX systems and troubleshooting operational pipelines.
  • Excellent interpersonal skills and team collaboration.
Responsibilities:
  • Support data engineering efforts at the National Institutes of Health (NIH) by managing and optimizing data pipelines.
  • Troubleshoot operational pipelines to resolve priority issues and implement effective solutions.
Apply

Related Jobs

Apply

📍 United States

🧭 Full-Time

💸 150363.0 - 180870.0 USD per year

🔍 Software Development

  • At least a Bachelors Degree or foreign equivalent in Computer Science, Computer Engineering, Electrical and Electronics Engineering, or a closely related technical field, and at least five (5) years of post-bachelor’s, progressive experience writing shell scripts; validating data; and engaging in data wrangling.
  • Experience must include at least three (3) years of experience debugging data; transforming data into Microsoft SQL server; developing processes to import data into HDFS using Sqoop; and using Java, UNIX Shell Scripts, and Python.
  • Experience must also include at least one (1) year of experience developing Hive scripts for data transformation on data lake projects; converting Hive scripts to Pyspark applications; automating in Hadoop; and implementing CI/CD pipelines.
  • Design, develop, test, and implement Big Data technical solutions.
  • Recommend the right technologies and solutions for a given use case, from the application layer to infrastructure.
  • Lead the delivery of compiling and installing database systems, integrating data from a variety of data sources (data warehouse, data marts) utilizing on-prem or cloud-based data structures.
  • Drive solution architecture and perform deployments of data pipelines and applications.
  • Author DDL and DML SQL spanning technical tacks.
  • Develop data transformation code and highly complex provisioning pipelines.
  • Ingest data from relational databases.
  • Execute automation strategy.

AWSPythonSQLHadoopJavaKafkaSnowflakeData engineeringSparkCI/CDScalaScriptingDebugging

Posted 1 day ago
Apply
Apply
🔥 Data Engineer-I
Posted 1 day ago

📍 USA

🔍 Healthcare

🏢 Company: Innovaccer Inc.

  • SQL knowledge
  • ETL/ELT/Data pipeline knowledge
  • Python knowledge
  • Powershell / Bash knowledge
  • Excellent problem-solving and effective communication skills
  • Self-motivation, integrity and honesty
  • Collaborate with team, management, departments using virtual tools
  • Run Production data pipelines/processes, ensure the integrity of the data, and send out deliverables based on requirement/runbook documentation
  • Coordinate with the various technical teams to resolve issues/bugs/optimize said production processes
  • Coordinate with internal client facing team members to communicate the status of deliverables
  • Help develop/improve technical documentation to guide future software development projects and operations
  • Dedicated time to explore building out tech stack and capabilities where there are applicable use cases
  • Provide critical thinking, technical innovation, and extra attention to detail by serving as a trusted team member and peer code reviewer
  • Assists with external client communications when deliverables or receivables do not meet technical or project requirements, ensuring timely resolution and alignment

PythonSQLBashETLMicrosoft AzurePostgresData modeling

Posted 1 day ago
Apply
Apply
🔥 Senior Data Engineer
Posted 4 days ago

📍 Worldwide

🔍 Hospitality

🏢 Company: Lighthouse

  • 4+ years of professional experience using Python, Java, or Scala for data processing (Python preferred)
  • You stay up-to-date with industry trends, emerging technologies, and best practices in data engineering.
  • Improve, manage, and teach standards for code maintainability and performance in code submitted and reviewed
  • Ship large features independently, generate architecture recommendations and have the ability to implement them
  • Great communication: Regularly achieve consensus amongst teams
  • Familiarity with GCP, Kubernetes (GKE preferred),  CI/CD tools (Gitlab CI preferred), familiarity with the concept of Lambda Architecture.
  • Experience with Apache Beam or Apache Spark for distributed data processing or event sourcing technologies like Apache Kafka.
  • Familiarity with monitoring tools like Grafana & Prometheus.
  • Design and develop scalable, reliable data pipelines using the Google Cloud stack.
  • Optimise data pipelines for performance and scalability.
  • Implement and maintain data governance frameworks, ensuring data accuracy, consistency, and compliance.
  • Monitor and troubleshoot data pipeline issues, implementing proactive measures for reliability and performance.
  • Collaborate with the DevOps team to automate deployments and improve developer experience on the data front.
  • Work with data science and analytics teams to enable them to bring their research to production grade data solutions, using technologies like airflow, dbt or MLflow (but not limited to)
  • As a part of a platform team, you will communicate effectively with teams across the entire engineering organisation, to provide them with reliable foundational data models and data tools.
  • Mentor and provide technical guidance to other engineers working with data.

PythonSQLApache AirflowETLGCPKubernetesApache KafkaData engineeringCI/CDMentoringTerraformScalaData modeling

Posted 4 days ago
Apply
Apply
🔥 Senior Data Engineer
Posted 5 days ago

📍 United States

🧭 Full-Time

💸 183600.0 - 216000.0 USD per year

🔍 Software Development

  • 6+ years of experience in a data engineering role building products, ideally in a fast-paced environment
  • Good foundations in Python and SQL.
  • Experience with Spark, PySpark, DBT, Snowflake and Airflow
  • Knowledge of visualization tools, such as Metabase, Jupyter Notebooks (Python)
  • Collaborate on the design and improvements of the data infrastructure
  • Partner with product and engineering to advocate best practices and build supporting systems and infrastructure for the various data needs
  • Create data pipelines that stitch together various data sources in order to produce valuable business insights
  • Create real-time data pipelines in collaboration with the Data Science team

PythonSQLSnowflakeAirflowData engineeringSparkData visualizationData modeling

Posted 5 days ago
Apply
Apply
🔥 Senior Data Engineer
Posted 5 days ago

📍 United States

🧭 Full-Time

🔍 Healthcare

🏢 Company: Rad AI👥 101-250💰 $60,000,000 Series C 2 months agoArtificial Intelligence (AI)Enterprise SoftwareHealth Care

  • 4+ years relevant experience in data engineering.
  • Expertise in designing and developing distributed data pipelines using big data technologies on large scale data sets.
  • Deep and hands-on experience designing, planning, productionizing, maintaining and documenting reliable and scalable data infrastructure and data products in complex environments.
  • Solid experience with big data processing and analytics on AWS, using services such as Amazon EMR and AWS Batch.
  • Experience in large scale data processing technologies such as Spark.
  • Expertise in orchestrating workflows using tools like Metaflow.
  • Experience with various database technologies including SQL, NoSQL databases (e.g., AWS DynamoDB, ElasticSearch, Postgresql).
  • Hands-on experience with containerization technologies, such as Docker and Kubernetes.
  • Design and implement the data architecture, ensuring scalability, flexibility, and efficiency using pipeline authoring tools like Metaflow and large-scale data processing technologies like Spark.
  • Define and extend our internal standards for style, maintenance, and best practices for a high-scale data platform.
  • Collaborate with researchers and other stakeholders to understand their data needs including model training and production monitoring systems and develop solutions that meet those requirements.
  • Take ownership of key data engineering projects and work independently to design, develop, and maintain high-quality data solutions.
  • Ensure data quality, integrity, and security by implementing robust data validation, monitoring, and access controls.
  • Evaluate and recommend data technologies and tools to improve the efficiency and effectiveness of the data engineering process.
  • Continuously monitor, maintain, and improve the performance and stability of the data infrastructure.

AWSDockerSQLElasticSearchETLKubernetesData engineeringNosqlSparkData modeling

Posted 5 days ago
Apply
Apply

📍 Worldwide

🧭 Full-Time

NOT STATED
  • Own the design and implementation of cross-domain data models that support key business metrics and use cases.
  • Partner with analysts and data engineers to translate business logic into performant, well-documented dbt models.
  • Champion best practices in testing, documentation, CI/CD, and version control, and guide others in applying them.
  • Act as a technical mentor to other analytics engineers, supporting their development and reviewing their code.
  • Collaborate with central data platform and embedded teams to improve data quality, metric consistency, and lineage tracking.
  • Drive alignment on model architecture across domains—ensuring models are reusable, auditable, and trusted.
  • Identify and lead initiatives to reduce technical debt and modernise legacy reporting pipelines.
  • Contribute to the long-term vision of analytics engineering at Pleo and help shape our roadmap for scalability and impact.

SQLData AnalysisETLData engineeringCI/CDMentoringDocumentationData visualizationData modelingData analyticsData management

Posted 5 days ago
Apply
Apply
🔥 Senior Data Engineer
Posted 5 days ago

📍 United States

🧭 Full-Time

💸 183600.0 - 216000.0 USD per year

🔍 Mental Healthcare

🏢 Company: Headway👥 201-500💰 $125,000,000 Series C over 1 year agoMental Health Care

  • 6+ years of experience in a data engineering role building products, ideally in a fast-paced environment
  • Good foundations in Python and SQL.
  • Experience with Spark, PySpark, DBT, Snowflake and Airflow
  • Knowledge of visualization tools, such as Metabase, Jupyter Notebooks (Python)
  • A knack for simplifying data, expressing information in charts and tables
  • Collaborate on the design and improvements of the data infrastructure
  • Partner with product and engineering to advocate best practices and build supporting systems and infrastructure for the various data needs
  • Create data pipelines that stitch together various data sources in order to produce valuable business insights
  • Create real-time data pipelines in collaboration with the Data Science team

PythonSQLETLSnowflakeAirflowData engineeringRDBMSSparkRESTful APIsData visualizationData modeling

Posted 5 days ago
Apply
Apply

📍 United States

🧭 Full-Time

💸 114000.0 - 171599.0 USD per year

🔍 Fintech

  • Strong expertise in data pipeline development (ETL/ELT) and workflow automation.
  • Proficiency in Python, SQL, and scripting languages for data processing and automation.
  • Hands-on experience with Workato, Google Apps Script, and API-driven automation.
  • Automate customer support, success, and service workflows to improve speed, accuracy, and responsiveness.
  • Build and maintain scalable ETL/ELT pipelines to ensure real-time access to critical customer data.
  • Implement self-service automation to enable customers and internal teams to quickly access information.

PythonSQLETLJiraAPI testingData engineeringCI/CDRESTful APIsData visualizationScriptingCustomer Success

Posted 6 days ago
Apply
Apply
🔥 Staff Data Engineer
Posted 6 days ago

📍 United States

🏢 Company: ge_externalsite

  • Hands-on experience in programming languages like Java, Python or Scala and experience in writing SQL scripts for Oracle, MySQL, PostgreSQL or HiveQL
  • Exposure to industry standard data modeling tools (e.g., ERWin, ER Studio, etc.).
  • Exposure to Extract, Transform & Load (ETL) tools like Informatica or Talend
  • Exposure to industry standard data catalog, automated data discovery and data lineage tools (e.g., Alation, Collibra, etc., )
  • Experience with Big Data / Hadoop / Spark / Hive / NoSQL database engines (i.e. Cassandra or HBase)
  • Exposure to unstructured datasets and ability to handle XML, JSON file formats
  • Conduct exploratory data analysis and generate visual summaries of data. Identify data quality issues proactively.
  • Developing reusable code pipelines through CI/CD.
  • Hands-on experience of big data or MPP databases.
  • Developing and executing integrated test plans.
  • Be responsible for identifying solutions for complex data analysis and data structure.
  • Be responsible for creating digital thread requirements
  • Be responsible for change management of database artifacts to support next gen QMS applications
  • Be responsible for monitoring data availability and data health of complex systems
  • Understand industry trends and stay up to date on associated Quality and tech landscape.
  • Design & build technical data dictionaries and support business glossaries to analyze the datasets
  • This role may also work on other Quality team digital and strategic deliveries that support the business.
  • Perform data profiling and data analysis for source systems, manually maintained data, machine or sensor generated data and target data repositories
  • Design & build both logical and physical data models for both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) solutions
  • Develop and maintain data mapping specifications based on the results of data analysis and functional requirements
  • Build a variety of data loading & data transformation methods using multiple tools and technologies.
  • Design & build automated Extract, Transform & Load (ETL) jobs based on data mapping specifications
  • Manage metadata structures needed for building reusable Extract, Transform & Load (ETL) components.
  • Analyze reference datasets and familiarize with Master Data Management (MDM) tools.
  • Analyze the impact of changes to downstream systems/products and recommend alternatives to minimize the impact.
  • Derive solutions and make recommendations from deep dive data analysis proactively.
  • Design and build Data Quality (DQ) rules.
  • Drives design and implementation of the roadmap.
  • Design and develop complex code in multiple languages.
  • This role may also work on other Quality team digital and strategic deliveries that support the business.

PostgreSQLPythonSQLData AnalysisETLHadoopJavaMySQLOracleData engineeringNosqlSparkCI/CDAgile methodologiesJSONScalaData visualizationData modeling

Posted 6 days ago
Apply
Apply
🔥 Staff Data Engineer
Posted 7 days ago

📍 United States

🧭 Full-Time

🔍 Software Development

🏢 Company: Apollo.io👥 501-1000💰 $100,000,000 Series D over 1 year agoSoftware Development

  • 8+ years of experience as a data platform engineer or a software engineer in data or big data engineer.
  • Experience in data modeling, data warehousing, APIs, and building data pipelines.
  • Deep knowledge of databases and data warehousing with an ability to collaborate cross-functionally.
  • Bachelor's degree in a quantitative field (Physical/Computer Science, Engineering, Mathematics, or Statistics).
  • Develop and maintain scalable data pipelines and build new integrations to support continuing increases in data volume and complexity.
  • Develop and improve Data APIs used in machine learning / AI product offerings
  • Implement automated monitoring, alerting, self-healing (restartable/graceful failures) features while building the consumption pipelines.
  • Implement processes and systems to monitor data quality, ensuring production data is always accurate and available.
  • Write unit/integration tests, contribute to the engineering wiki, and document work.
  • Define company data models and write jobs to populate data models in our data warehouse.
  • Work closely with all business units and engineering teams to develop a strategy for long-term data platform architecture.

PythonSQLApache AirflowApache HadoopCloud ComputingETLApache KafkaData engineeringFastAPIData modelingData analytics

Posted 7 days ago
Apply