Data Engineer

Posted 4 months ago

πŸ’Ž Seniority level: Junior, 1-4 years

πŸ“ Location: United States, Latin America, India

πŸ” Industry: Data and artificial intelligence services

πŸ—£οΈ Languages: English

⏳ Experience: 1-4 years

πŸͺ„ Skills: AWS, Python, Software Development, SQL, ElasticSearch, GCP, Hadoop, Java, Kafka, Snowflake, Airflow, Azure, Cassandra, NoSQL, Spark, Communication Skills, Documentation, Scala, Software Engineering

Requirements:
  • 1-4 years of experience as a Software Engineer, Data Engineer, or Data Analyst.
  • Ability to develop end-to-end technical solutions into production.
  • Programming expertise in Java, Python, and/or Scala.
  • Core cloud data platforms knowledge including Snowflake, AWS, Azure, Databricks, and GCP.
  • SQL proficiency with the ability to write, debug, and optimize SQL queries.
  • A 4-year Bachelor's degree in Computer Science or a related field.
Responsibilities:
  • Develop end-to-end technical solutions into production.
  • Help ensure performance, security, scalability, and robust data integration.
  • Communicate with clients effectively, both in writing and verbally.
  • Create and deliver detailed presentations.
  • Produce detailed solution documentation including POCs, roadmaps, sequence diagrams, and logical system views.

Related Jobs


πŸ“ Brazil

🧭 Full-Time

πŸ” Digital Engineering and Modernization

🏒 Company: EncoraπŸ‘₯ 10001-10001πŸ’° $200,000,000 Private over 5 years agoBig DataCloud ComputingSoftware

Requirements:
  • Experience in data modeling.
  • Experience developing and maintaining data pipelines.
  • Proficiency in SQL.
  • Proficiency in Python.
  • Experience with AWS Redshift.
  • Experience with Apache Airflow.
  • Familiarity with BI tools.

Responsibilities:
  • Develop and maintain efficient and scalable data pipelines.
  • Model and transform data to meet analysis and reporting needs.
  • Collaborate closely with the customer, including BI and software engineering.
  • Lead other BI or DE team members.
  • Create and maintain detailed technical documentation.
  • Develop dashboards in AWS Quicksight with support from a BI Analyst.

πŸͺ„ Skills: Python, SQL, Apache Airflow, Business Intelligence, Data modeling

Posted 5 days ago

πŸ“ US, Europe

🧭 Full-Time

πŸ’Έ 175000.0 - 205000.0 USD per year

πŸ” Cloud computing and AI services

🏒 Company: CoreWeaveπŸ’° $642,000,000 Secondary Market about 1 year agoCloud ComputingMachine LearningInformation TechnologyCloud Infrastructure

Requirements:
  • 5+ years of experience with Kubernetes and Helm, with a deep understanding of container orchestration.
  • Hands-on experience administering and optimizing clustered computing technologies on Kubernetes, such as Spark, Trino, Flink, Ray, Kafka, StarRocks or similar.
  • 5+ years of programming experience in C++, C#, Java, or Python.
  • 3+ years of experience scripting in Python or Bash for automation and tooling.
  • Strong understanding of data storage technologies, distributed computing, and big data processing pipelines.
  • Proficiency in data security best practices and managing access in complex systems.

Responsibilities:
  • Architect, deploy, and scale data storage and processing infrastructure to support analytics and data science workloads.
  • Manage and maintain data lake and clustered computing services, ensuring reliability, security, and scalability.
  • Build and optimize frameworks and tools to simplify the usage of big data technologies.
  • Collaborate with cross-functional teams to align data infrastructure with business goals and requirements.
  • Ensure data governance and security best practices across all platforms.
  • Monitor, troubleshoot, and optimize system performance and resource utilization.

πŸͺ„ Skills: Python, Bash, Kubernetes, Apache Kafka

Posted 7 days ago

πŸ“ US

🧭 Full-Time

πŸ’Έ 110000.0 - 125000.0 USD per year

πŸ” Beauty industry

Requirements:
  • Bachelor's degree in data engineering or a relevant discipline.
  • 5+ years of hands-on experience as a data engineer managing data pipelines.
  • Advanced proficiency in modern ETL/ELT stacks - expertise in Fivetran, DBT, and Snowflake.
  • Understanding of data analytics and tools, including Metabase and PowerBI.
  • Expert-level Python and SQL skills with deep experience in DBT transformations.
  • Strong understanding of cloud-native data architectures and modern data warehousing principles.
  • Familiarity with data security, governance, and compliance standards.
  • Adept at designing and delivering interactive dashboards.

Responsibilities:
  • Building and managing robust ETL/ELT pipelines using Fivetran, DBT, and Snowflake.
  • Developing and optimizing data models and analytical reports.
  • Collaborating with stakeholders to create data pipelines that align with BI needs.
  • Designing and developing SQL-based solutions to transform data.
  • Building the reporting infrastructure from ambiguous requirements.
  • Continuously improving customer experience and providing technical leadership.
  • Evangelizing best practices and technologies.

πŸͺ„ Skills: Python, SQL, Business Intelligence, ETL, Snowflake, Data engineering, Data modeling

Posted 9 days ago

πŸ“ USA, Canada, Mexico

🧭 Full-Time

πŸ’Έ 175000.0 USD per year

πŸ” Digital tools for hourly employees

🏒 Company: TeamSenseπŸ‘₯ 11-50πŸ’° Seed 11 months agoInformation ServicesInformation TechnologySoftware

Requirements:
  • Bachelor's or Master's degree in Computer Science, Software Engineering, or a related technical field.
  • 7+ years of professional experience in software engineering including 5+ years of experience in data engineering.
  • Proven expertise in building and managing scalable data platforms.
  • Proficiency in Python.
  • Strong knowledge of SQL, data modeling, data migration and database systems such as PostgreSQL and MongoDB.
  • Exceptional problem-solving skills optimizing data systems.

Responsibilities:
  • As a Senior Data Engineer, contribute to the design, development, and maintenance of a scalable and reliable data platform.
  • Analyze the current database and warehouse.
  • Design and develop scalable ETL/ELT pipelines to support data migration.
  • Build and maintain robust, scalable, and high-performing data platforms, including data lakes and/or warehouses.
  • Implement data engineering best practices and design patterns.
  • Guide design reviews for new features impacting data.

πŸͺ„ Skills: PostgreSQL, Python, SQL, ETL, MongoDB, Data engineering, Data modeling

Posted 9 days ago

πŸ“ South Africa, Mauritius, Kenya, Nigeria

πŸ” Technology, Marketplaces

Requirements:
  • BSc degree in Computer Science, Information Systems, Engineering, or related technical field, or equivalent work experience.
  • 3+ years related work experience.
  • Minimum of 2 years of experience building and optimizing 'big data' pipelines and architectures and maintaining data sets.
  • Experienced in Python.
  • Experienced in SQL (PostgreSQL, MS SQL).
  • Experienced in using cloud services: AWS, Azure or GCP.
  • Proficiency in version control, CI/CD and GitHub.
  • Understanding/experience in Glue and PySpark highly desirable.
  • Experience in managing data life cycle.
  • Proficiency in manipulating, processing and architecting large disconnected data sets for analytical requirements.
  • Ability to maintain and optimise processes supporting data transformation, data structures, metadata, dependency and workload management.
  • Good understanding of data management principles - data quality assurance and governance.
  • Strong analytical skills related to working with unstructured datasets.
  • Understanding of message queuing, stream processing, and highly scalable 'big data' datastores.
  • Strong attention to detail.
  • Good communication and interpersonal skills.

Responsibilities:
  • Suggest and implement internal process improvements that automate manual processes.
  • Implement enhancements and new features across data systems.
  • Streamline processes within data systems with support from the Senior Data Engineer.
  • Test CI/CD process for optimal data pipelines.
  • Assemble large, complex data sets that meet functional / non-functional business requirements.
  • Build and maintain highly efficient ETL processes.
  • Develop and conduct unit tests on data pipelines as well as ensuring data consistency.
  • Develop and maintain automated monitoring solutions.
  • Support reporting and analytics infrastructure.
  • Maintain data quality and data governance as well as upkeep of overall maintenance of data infrastructure systems.
  • Maintain data warehouse and data lake metadata, data catalogue, and user documentation for internal business users.
  • Ensure best practice is implemented and maintained on database.

πŸͺ„ Skills: AWS, PostgreSQL, Python, SQL, ETL, Git, CI/CD

Posted 9 days ago

πŸ“ United States

πŸ’Έ 170000.0 - 220000.0 USD per year

πŸ” Healthcare technology

🏒 Company: Red Cell PartnersπŸ‘₯ 11-50Financial ServicesVenture CapitalFinance

Requirements:
  • Proven experience in a leadership role driving ML system development and optimization, preferably in healthcare or related fields.
  • Demonstrated expertise in training ML models and building robust training pipelines for healthcare applications.
  • Strong understanding of machine learning frameworks such as TensorFlow, PyTorch, or similar, with applications in healthcare.
  • Proficient in programming languages like Python or Go, with the ability to write efficient, clean, and maintainable code for healthcare systems.
  • Excellent written and verbal communication skills, with the ability to convey technical concepts to both technical and non-technical audiences in healthcare settings.
  • A track record of delivering impactful machine learning solutions that have been successfully deployed in real-world healthcare applications.
  • Familiarity with healthcare data privacy regulations and best practices for handling sensitive medical information.

Responsibilities:
  • Lead the team in architecting, building, and optimizing ML systems to deliver high-quality, real-world results in healthcare settings.
  • Design and implement robust training pipelines for machine learning models, ensuring efficiency and scalability for healthcare data.
  • Fine-tune ML models to meet specific healthcare needs and optimize their performance for various medical applications.
  • Develop and implement feedback mechanisms to continuously improve the accuracy and effectiveness of ML in healthcare contexts.
  • Collaborate with cross-functional teams to understand healthcare business requirements and translate them into actionable ML solutions.
  • Stay up-to-date with the latest advancements in machine learning and healthcare technology, implementing best practices to enhance our ML infrastructure.
  • Coach and mentor junior data engineers, fostering a culture of continuous learning and growth within the Lightbox Health team.
  • Communicate complex technical concepts and findings to non-technical stakeholders in a clear and concise manner, particularly in healthcare contexts.

πŸͺ„ Skills: Leadership, Python, Machine Learning, PyTorch, Data engineering, TensorFlow

Posted 10 days ago

πŸ“ US

🧭 Full-Time

πŸ’Έ 206700.0 - 289400.0 USD per year

πŸ” Social Media / Technology

Requirements:
  • MS or PhD in a quantitative discipline: engineering, statistics, operations research, computer science, informatics, applied mathematics, economics, etc.
  • 7+ years of experience with large-scale ETL systems and building clean, maintainable code (Python preferred).
  • Strong programming proficiency in Python, SQL, Spark, Scala.
  • Experience with data modeling, ETL concepts, and patterns for data governance.
  • Experience with data workflows, data modeling, and engineering.
  • Experience in data visualization and dashboard design using tools like Looker, Tableau, and D3.
  • Deep understanding of relational and MPP database designs.
  • Proven track record of cross-functional collaboration and excellent communication skills.

Responsibilities:
  • Act as the analytics engineering lead within the Ads DS team, contributing to data science data-quality and automation initiatives.
  • Work on ETLs, reporting dashboards, and data aggregations for business tracking and ML model development.
  • Develop and maintain robust data pipelines for data ingestion, processing, and transformation.
  • Create user-friendly tools for internal team use, streamlining analysis and reporting processes.
  • Lead efforts to build a data-driven culture by enabling data self-service.
  • Provide technical guidance and mentorship to data analysts.

πŸͺ„ Skills: Python, SQL, ETL, Airflow, Spark, Scala, Data visualization, Data modeling

Posted 12 days ago

πŸ“ United States

🧭 Full-Time

πŸ’Έ 175000.0 - 215000.0 USD per year

πŸ” Mental Health Care

🏒 Company: Charlie HealthπŸ‘₯ 501-1000πŸ’° $850,000 Seed over 4 years agoMental Health Care

Requirements:
  • Bachelor's degree in Computer Science, Mathematics, or other technical discipline, or equivalent practical experience.
  • 7+ years experience as a software engineer, with at least 5 years of experience in a data engineering role.
  • Deep expertise in SQL, including CTEs, aggregation functions, window functions, partitioning, and clustering.
  • High proficiency in Python and experience with common data engineering libraries such as Pandas, Numpy, and Great Expectations.
  • Experience with a modern data stack and tools such as FiveTran, Snowflake, DBT, Dagster, Hightouch, and Tableau.
  • Experience with data exploration, profiling, governance, visualization, and activation.
  • Proven ability to thrive in an ambiguous and rapidly changing environment.
  • Experience working with sensitive data in a regulated environment.
  • Expertise in healthcare is a plus.

Responsibilities:
  • Develop, release, and maintain high-quality data pipelines using Python, FiveTran, DBT, and Snowflake.
  • Own and guide the development of the data infrastructure.
  • Develop custom integrations using Dagster.
  • Configure reverse ETL integrations using Hightouch.
  • Identify bottlenecks and implement improvements to data engineering processes, tools, and procedures.
  • Promote collaboration and learning across teams through mentoring and knowledge sharing.
  • Participate in on-call rotation to ensure data availability.

πŸͺ„ Skills: Python, SQL, NumPy, Snowflake, Pandas

Posted 17 days ago

πŸ“ Seattle, WA

🧭 Full-Time

πŸ’Έ 110000.0 - 185000.0 USD per year

πŸ” Life sciences

🏒 Company: Synthesize BioπŸ‘₯ 1-50

Requirements:
  • Fluency with data manipulation and processing in Python, R, and SQL (e.g., dplyr and pandas).
  • Experience managing and working with large datasets in the cloud (AWS and/or GCP).
  • Familiarity with cloud database solutions and emerging technologies (e.g., Athena, BigQuery, tileDB).
  • Familiarity with containers (Docker, Singularity) and workflow managers (Nextflow, WDL, CWL).
  • Bioinformatics experience, including working with common genetic and genomics data formats (e.g., BAM, FASTQ, VCF).
  • Excitement about generative AI and interest in employing AI to change how science happens.

Responsibilities:
  • Be foundational in developing the roadmap for scalable, efficient, and future-proof data and bioinformatics infrastructure.
  • Collaborate with technical teams (Data, AI, Platform) to develop data solutions for improved performance and access.
  • Implement new and improve existing data processing pipelines with an emphasis on automation and robustness.

πŸͺ„ Skills: AWS, Docker, Python, SQL, GCP

Posted 17 days ago

πŸ“ US

πŸ’Έ 103200.0 - 128950.0 USD per year

πŸ” Genetics and healthcare

🏒 Company: NateraπŸ‘₯ 1001-5000πŸ’° $250,000,000 Post-IPO Equity over 1 year agoπŸ«‚ Last layoff almost 2 years agoWomen'sBiotechnologyMedicalGeneticsHealth Diagnostics

Requirements:
  • BS degree in computer science or a comparable program, or equivalent experience.
  • 8+ years of overall software development experience, ideally in complex data management applications.
  • Experience with SQL and NoSQL databases including DynamoDB, Cassandra, Postgres, and Snowflake.
  • Proficiency in data technologies such as Hive, Hbase, Spark, EMR, Glue.
  • Ability to manipulate and extract value from large datasets.
  • Knowledge of data management fundamentals and distributed systems.

Responsibilities:
  • Work with other engineers and product managers to make design and implementation decisions.
  • Define requirements in collaboration with stakeholders and users to create reliable applications.
  • Implement best practices in development processes.
  • Write specifications, design software components, fix defects, and create unit tests.
  • Review design proposals and perform code reviews.
  • Develop solutions for the Clinicogenomics platform utilizing AWS cloud services.

πŸͺ„ Skills: AWS, Python, SQL, Agile, DynamoDB, Snowflake, Data engineering, Postgres, Spark, Data modeling, Data management

Posted 19 days ago