
Data Engineer

Posted about 16 hours ago

📍 Location: United States

🔍 Industry: Sustainable Agriculture

🏢 Company: Agrovision

🗣️ Languages: English

🪄 Skills: Python, SQL, Apache Hadoop, Bash, ETL, Data engineering, Data science, RDBMS, Pandas, Spark, Communication Skills, Analytical Skills, Collaboration, Problem Solving, Data modeling

Requirements:
  • Experience with RDBMS (e.g., Teradata, MS SQL Server, Oracle) in production environments is preferred
  • Hands-on experience in data engineering and databases/data warehouses
  • Familiarity with Big Data platforms (e.g., Hadoop, Spark, Hive, HBase, Map/Reduce)
  • Expert-level understanding of Python (e.g., Pandas)
  • Proficient in shell scripting (e.g., Bash) and Python data application development (or similar)
  • Excellent collaboration and communication skills with teams
  • Strong analytical and problem-solving skills, essential for tackling complex challenges
  • Experience working with BI teams and tooling (e.g., Power BI), supporting analytics work and interfacing with Data Scientists
Responsibilities:
  • Collaborate with data scientists to ensure high-quality, accessible data for analytical and predictive modeling
  • Design and implement data pipelines (ETLs) tailored to meet business needs and digital/analytics solutions (a minimal Pandas sketch follows this list)
  • Enhance data integrity, security, quality, and automation, addressing system gaps proactively
  • Support pipeline maintenance, troubleshoot issues, and optimize performance
  • Lead and contribute to defining detailed scalable data models for our global operations
  • Ensure data security standards are met and upheld by contributors, partners and regional teams through programmatic solutions and tooling
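
To ground the ETL responsibility above, here is a minimal sketch of a Pandas-based extract-transform-load step of the kind this role might build. It is illustrative only; the file name, column handling, target table, and connection URL are hypothetical, not taken from the posting.

```python
# Illustrative only: a minimal extract-transform-load step with Pandas.
# File paths, column names, and the target table are hypothetical.
import pandas as pd
import sqlalchemy

def run_etl(source_csv: str, target_table: str, engine_url: str) -> int:
    """Extract a CSV, apply basic cleaning, and load it into an RDBMS table."""
    # Extract
    df = pd.read_csv(source_csv)

    # Transform: drop exact duplicates and normalise column names
    df = df.drop_duplicates()
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

    # Load: append into the target table via SQLAlchemy
    engine = sqlalchemy.create_engine(engine_url)
    df.to_sql(target_table, engine, if_exists="append", index=False)
    return len(df)

if __name__ == "__main__":
    rows = run_etl("harvest_yields.csv", "stg_harvest_yields", "sqlite:///demo.db")
    print(f"Loaded {rows} rows")
```

In a production pipeline the same function would typically be scheduled by an orchestrator and write to Teradata/SQL Server/Oracle rather than the SQLite URL used here for the sketch.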
Apply

Related Jobs

Apply

📍 United States

💸 64,000 - 120,000 USD per year

  • Strong PL/SQL, SQL development skills
  • Proficient in multiple languages used in data engineering such as Python, Java
  • Minimum 3-5 years of experience in Data engineering working with Oracle and MS SQL
  • Experience with data warehousing concepts and technologies including cloud-based services (e.g. Snowflake)
  • Experience with cloud platforms like Azure and knowledge of infrastructure
  • Experience with data orchestration tools (e.g. Azure Data Factory, Databricks workflows)
  • Understanding of data privacy regulations and best practices
  • Experience working with remote teams
  • Experience working on a team with a CI/CD process
  • Familiarity using tools like Git, Jira
  • Bachelor's degree in Computer Science or Computer Engineering
  • Design, implement and maintain scalable pipelines and architecture to collect, process, and store data from various sources.
  • Unit test and document solutions that meet product quality standards prior to release to QA.
  • Identify and resolve performance bottlenecks in pipelines due to data, queries and processing workflows to ensure efficient and timely data delivery.
  • Implement data quality checks and validation processes to ensure accuracy, completeness, and consistency of data delivery (see the sketch after this list).
  • Work with Data Architect and implement best practices for data governance, quality and security.
  • Collaborate with cross-functional teams to identify and address data needs.
  • Ensure technology solutions support the needs of the customer and/or organization.
  • Define and document technical requirements.
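
The quality-check responsibility above is the kind of validation that is easiest to show in code. A minimal, hypothetical sketch in Python/Pandas; the column names are invented for illustration:

```python
# Illustrative only: simple completeness/consistency checks of the kind a
# pipeline might run before handing data to QA. Column names are hypothetical.
import pandas as pd

def validate(df: pd.DataFrame) -> list:
    """Return a list of data-quality failures; an empty list means the batch passes."""
    failures = []
    if df.empty:
        failures.append("batch is empty")
    if df["order_id"].isna().any():
        failures.append("null order_id values found")
    if df["order_id"].duplicated().any():
        failures.append("duplicate order_id values found")
    if (df["amount"] < 0).any():
        failures.append("negative amounts found")
    return failures

batch = pd.DataFrame({"order_id": [1, 2, 2], "amount": [10.0, -5.0, 3.0]})
for problem in validate(batch):
    print("FAILED:", problem)
```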

Python, SQL, ETL, Git, Java, Oracle, Snowflake, Azure, Data engineering, CI/CD, RESTful APIs

Posted about 3 hours ago
Apply

📍 United States

💸 144,000 - 180,000 USD per year

🔍 Software Development

🏢 Company: Hungryroot 👥 101-250 💰 $40,000,000 Series C almost 4 years ago · Artificial Intelligence (AI), Food and Beverage, E-Commerce, Retail, Consumer Goods, Software

  • 5+ years of experience in ETL development and data modeling
  • 5+ years of experience in both Scala and Python
  • 5+ years of experience in Spark
  • Excellent problem-solving skills and the ability to translate business problems into practical solutions
  • 2+ years of experience working with the Databricks Platform
  • Develop pipelines in Spark (Python + Scala) on the Databricks Platform (a PySpark sketch follows this list)
  • Build cross-functional working relationships with business partners in Food Analytics, Operations, Marketing, and Web/App Development teams to power pipeline development for the business
  • Ensure system reliability and performance
  • Deploy and maintain data pipelines in production
  • Set an example of code quality, data quality, and best practices
  • Work with Analysts and Data Engineers to enable high quality self-service analytics for all of Hungryroot
  • Investigate datasets to answer business questions, ensuring data quality and business assumptions are understood before deploying a pipeline
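
As an illustration of the Spark-on-Databricks pipeline work listed above, here is a small PySpark sketch. Table names, columns, and the aggregation are hypothetical and only meant to show the shape of such a job:

```python
# Illustrative only: a small PySpark transformation of the kind run on Databricks.
# Table and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_daily").getOrCreate()

orders = spark.read.table("raw.orders")          # hypothetical source table
daily = (
    orders
    .filter(F.col("status") == "delivered")
    .groupBy("delivery_date")
    .agg(F.sum("basket_value").alias("revenue"),
         F.countDistinct("customer_id").alias("customers"))
)
daily.write.mode("overwrite").saveAsTable("analytics.orders_daily")  # hypothetical target
```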

AWS, Python, SQL, Apache Airflow, Data Mining, ETL, Snowflake, Algorithms, Amazon Web Services, Data engineering, Data Structures, Spark, CI/CD, RESTful APIs, Microservices, JSON, Scala, Data visualization, Data modeling, Data analytics, Data management

Posted 1 day ago
Apply

📍 United States

💸 135,000 - 155,000 USD per year

🔍 Software Development

🏢 Company: Jobgether 👥 11-50 💰 $1,493,585 Seed about 2 years ago · Internet

  • 8+ years of experience as a data engineer, with a strong background in data lake systems and cloud technologies.
  • 4+ years of hands-on experience with AWS technologies, including S3, Redshift, EMR, Kafka, and Spark.
  • Proficient in Python or Node.js for developing data pipelines and creating ETLs.
  • Strong experience with data integration and frameworks like Informatica and Python/Scala.
  • Expertise in creating and managing AWS services (EC2, S3, Lambda, etc.) in a production environment.
  • Solid understanding of Agile methodologies and software development practices.
  • Strong analytical and communication skills, with the ability to influence both IT and business teams.
  • Design and develop scalable data pipelines that integrate enterprise systems and third-party data sources.
  • Build and maintain data infrastructure to ensure speed, accuracy, and uptime.
  • Collaborate with data science teams to build feature engineering pipelines and support machine learning initiatives.
  • Work with AWS cloud technologies like S3, Redshift, and Spark to create a world-class data mesh environment.
  • Ensure proper data governance and implement data quality checks and lineage at every stage of the pipeline.
  • Develop and maintain ETL processes using AWS Glue, Lambda, and other AWS services (a Lambda-style sketch follows this list).
  • Integrate third-party data sources and APIs into the data ecosystem.
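
To illustrate the Glue/Lambda-style ETL responsibility above, here is a minimal, hypothetical AWS Lambda handler that moves a JSON object from a raw bucket to a curated prefix after light transformation. Bucket names, keys, and fields are invented for the sketch:

```python
# Illustrative only: a minimal Lambda-style handler for S3-triggered ETL.
# Bucket names, keys, and fields are hypothetical.
import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]

    raw = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())

    # Light transformation: keep only the fields downstream models need
    curated = {"id": raw["id"], "ts": raw["timestamp"], "value": raw["value"]}

    s3.put_object(
        Bucket="curated-data-bucket",                # hypothetical target bucket
        Key=f"curated/{curated['id']}.json",
        Body=json.dumps(curated).encode("utf-8"),
    )
    return {"status": "ok", "key": key}
```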

AWS, Node.js, Python, SQL, ETL, Kafka, Data engineering, Spark, Agile methodologies, Scala, Data modeling, Data management

Posted 1 day ago
Apply
🔥 Data Engineer, IT
Posted 3 days ago

📍 United States

🧭 Full-Time

💸 100,000 - 120,000 USD per year

🔍 IT

🏢 Company: Adswerve, Inc

  • Extensive experience with Google Cloud Platform (GCP) services, including BigQuery, Dataform, Cloud Storage, Cloud Functions, Cloud Composer, Dataflow, and Pub/Sub
  • In-depth understanding of SQL, including the optimization of queries and data transformations
  • Comfortable with Javascript and Python programming languages
  • Excellent communication and collaboration skills
  • Solid understanding of data warehouse, data modeling and data design concepts
  • Experience with ELT processes and data transformation techniques
  • Experience with version control systems (e.g., Git)
  • Strong analytical and problem-solving skills as well as the ability to decompose complex problems
  • Proven track record of managing workloads to consistently meet project deadlines
  • Develop scalable and efficient data architectures to support enterprise applications, analytics, and reporting.
  • Design, develop, and maintain efficient and scalable ELT pipelines on the Google Cloud Platform (a BigQuery sketch follows this list)
  • Partner with business stakeholders to understand their business needs and data requirements, and translate those needs into clear, actionable technical solutions that directly address business needs.
  • Create and maintain detailed documentation, including architecture diagrams, standards, and models.
  • Ensure data security, integrity, and availability across the organization.
  • Manage, maintain, and develop custom-built ELT pipelines for systems unsupported by the organization's integration platform
  • Manage, maintain and optimize the data infrastructure on the Google Cloud Platform
  • Implement and manage data validation and quality checks to ensure accuracy, consistency, and completeness of data across pipelines.
  • Set up and manage monitoring and alerting for data pipelines and infrastructure to proactively identify failures and performance issues.
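
A hypothetical sketch of the BigQuery-centred ELT work described above, using the google-cloud-bigquery client to run a transformation as a query job. Dataset and table names are assumptions, not taken from the posting:

```python
# Illustrative only: running an ELT transformation inside BigQuery.
# Dataset and table names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
CREATE OR REPLACE TABLE analytics.sessions_daily AS
SELECT DATE(event_timestamp) AS event_date,
       COUNT(DISTINCT session_id) AS sessions
FROM raw.events
GROUP BY event_date
"""

job = client.query(sql)   # submit the transformation as a query job
job.result()              # block until the job finishes
print(f"Job {job.job_id} finished")
```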

Python, SQL, GCP, Git, Javascript, Data engineering, RESTful APIs, Data visualization, Data modeling

Apply
🔥 Sr. Data Engineer
Posted 3 days ago

📍 United States

🔍 Software Development

🏢 Company: ge_externalsite

  • Exposure to industry standard data modeling tools (e.g., ERWin, ER Studio, etc.).
  • Exposure to Extract, Transform & Load (ETL) tools like Informatica or Talend
  • Exposure to industry standard data catalog, automated data discovery, and data lineage tools (e.g., Alation, Collibra, TAMR, etc.)
  • Hands-on experience in programming languages like Java, Python or Scala
  • Hands-on experience in writing SQL scripts for Oracle, MySQL, PostgreSQL or HiveQL
  • Experience with Big Data / Hadoop / Spark / Hive / NoSQL database engines (e.g., Cassandra or HBase)
  • Exposure to unstructured datasets and ability to handle XML, JSON file formats
  • Work independently as well as with a team to develop and support Ingestion jobs
  • Evaluate and understand various data sources (databases, APIs, flat files, etc.) to determine optimal ingestion strategies
  • Develop a comprehensive data ingestion architecture, including data pipelines, data transformation logic, and data quality checks, considering scalability and performance requirements.
  • Choose appropriate data ingestion tools and frameworks based on data volume, velocity, and complexity
  • Design and build data pipelines to extract, transform, and load data from source systems to target destinations, ensuring data integrity and consistency
  • Implement data quality checks and validation mechanisms throughout the ingestion process to identify and address data issues
  • Monitor and optimize data ingestion pipelines to ensure efficient data processing and timely delivery
  • Set up monitoring systems to track data ingestion performance, identify potential bottlenecks, and trigger alerts for issues
  • Work closely with data engineers, data analysts, and business stakeholders to understand data requirements and align ingestion strategies with business objectives.
  • Build technical data dictionaries and support business glossaries to analyze the datasets
  • Perform data profiling and data analysis for source systems, manually maintained data, machine-generated data, and target data repositories (a profiling sketch follows this list)
  • Build logical and physical data models for both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) solutions
  • Develop and maintain data mapping specifications based on the results of data analysis and functional requirements
  • Perform a variety of data loads & data transformations using multiple tools and technologies.
  • Build automated Extract, Transform & Load (ETL) jobs based on data mapping specifications
  • Maintain metadata structures needed for building reusable Extract, Transform & Load (ETL) components.
  • Analyze reference datasets and familiarize with Master Data Management (MDM) tools.
  • Analyze the impact of downstream systems and products
  • Derive solutions and make recommendations from deep dive data analysis.
  • Design and build Data Quality (DQ) rules needed
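
The profiling and DQ-rule work above often starts with a quick pass like the following Pandas sketch; the file and its columns are hypothetical:

```python
# Illustrative only: quick data profiling of a source extract before defining
# mapping specifications and DQ rules. File and column names are hypothetical.
import pandas as pd

df = pd.read_csv("source_extract.csv")

profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "null_pct": df.isna().mean().round(3) * 100,
    "distinct": df.nunique(),
})
print(profile)
print(df.describe(include="all").transpose())
```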

AWS, PostgreSQL, Python, SQL, Apache Airflow, Apache Hadoop, Data Analysis, Data Mining, Erwin, ETL, Hadoop HDFS, Java, Kafka, MySQL, Oracle, Snowflake, Cassandra, Clickhouse, Data engineering, Data Structures, REST API, Nosql, Spark, JSON, Data visualization, Data modeling, Data analytics, Data management

Apply
🔥 Data Engineer II
Posted 4 days ago

📍 US

💸 100,000 - 120,000 USD per year

🔍 Education Technology

🏢 Company: Blueprint Test Prep

  • Experience in identifying, designing and implementing infrastructure for greater scalability, optimizing data delivery, and automating manual processes
  • Experience in backend databases and surrounding technologies such as Redshift, DynamoDB, Glue and S3
  • Experience in building BI models and visualizations such as Looker, Tableau or Power BI
  • Ability to create visualizations of complex data, experience with Looker preferred
  • Knowledge of modeling including proficiency in acquiring, organizing, and analyzing large amounts of data
  • Strong attention to detail and data accuracy, and the ability to think holistically
  • Some experience with the analysis of AI algorithms
  • Design innovative solutions that push the boundaries of the education technology space
  • Understand that quality data is the differentiator for our learners
  • Generate models and visualizations that tell clear stories and allow for data-driven solutions (a KPI-table sketch follows this list)
  • Understand the KPIs that move the needle on the business side and that quality insights can be a huge differentiator for our learners
  • Resolve complex problems, break down complex data and propose creative solutions
  • Be a beacon of trust for everyone at Blueprint and provide analytical and logical solutions to problems
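
As a small, hypothetical example of the modeling and visualization work above, this Pandas sketch shapes event-level data into a weekly KPI table that a Looker or Tableau dashboard could read. The file and column names are invented:

```python
# Illustrative only: shaping event-level data into a small KPI table for a BI tool.
# File and column names are hypothetical.
import pandas as pd

events = pd.read_parquet("study_events.parquet")   # hypothetical learner activity extract

kpis = (
    events
    .assign(week=events["event_ts"].dt.to_period("W").dt.start_time)
    .groupby(["week", "course_id"], as_index=False)
    .agg(active_learners=("user_id", "nunique"),
         avg_score=("score", "mean"))
)
kpis.to_csv("weekly_course_kpis.csv", index=False)  # handed to the BI layer
```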

AWS, Python, SQL, Cloud Computing, Data Analysis, DynamoDB, ETL, Tableau, Algorithms, Data engineering, Analytical Skills, CI/CD, Problem Solving, Attention to detail, Data visualization, Data modeling, Data analytics

Apply
🔥 Data Engineer
Posted 5 days ago

📍 United States

💸 112,800 - 126,900 USD per year

🔍 Software Development

🏢 Company: Titan Cloud

  • 4+ years of work experience with ETL, Data Modeling, Data Analysis, and Data Architecture.
  • Experience operating very large data warehouses or data lakes.
  • Experience with building data pipelines and applications to stream and process datasets at low latencies.
  • MySQL, MSSQL Database, Postgres, Python
  • Design, implement, and maintain standardized data models that align with business needs and analytical use cases.
  • Optimize data structures and schemas for efficient querying, scalability, and performance across various storage and compute platforms.
  • Provide guidance and best practices for data storage, partitioning, indexing, and query optimization.
  • Developing and maintaining a data pipeline design.
  • Build robust and scalable ETL/ELT data pipelines to transform raw data into structured datasets optimized for analysis (an Airflow sketch follows this list).
  • Collaborate with data scientists to streamline feature engineering and improve the accessibility of high-value data assets.
  • Designing, building, and maintaining the data architecture needed to support business decisions and data-driven applications. This includes collecting, storing, processing, and analyzing large amounts of data using AWS, Azure, and local tools and services.
  • Develop and enforce data governance standards to ensure consistency, accuracy, and reliability of data across the organization.
  • Ensure data quality, integrity, and completeness in all pipelines by implementing automated validation and monitoring mechanisms.
  • Implement data cataloging, metadata management, and lineage tracking to enhance data discoverability and usability.
  • Work with Engineering to manage and optimize data warehouse and data lake architectures, ensuring efficient storage and retrieval of structured and semi-structured data.
  • Evaluate and integrate emerging cloud-based data technologies to improve performance, scalability, and cost efficiency.
  • Assist with designing and implementing automated tools for collecting and transferring data from multiple source systems to the AWS and Azure cloud platform.
  • Work with DevOps Engineers to integrate any new code into existing pipelines
  • Collaborate with teams in troubleshooting functional and performance issues.
  • Must be a team player, able to work in an agile environment
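
Since the stack for this role lists Apache Airflow, here is a skeleton DAG showing how the ETL/ELT responsibility above might be orchestrated. It assumes a recent Airflow 2.x install; the DAG id, schedule, and task bodies are hypothetical:

```python
# Illustrative only: skeleton Airflow DAG for an extract -> transform -> load run.
# DAG id, schedule, and task bodies are hypothetical.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull source extract")       # placeholder task body

def transform():
    print("apply transformations")     # placeholder task body

def load():
    print("load curated tables")       # placeholder task body

with DAG(
    dag_id="facility_readings_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```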

AWS, PostgreSQL, Python, SQL, Agile, Apache Airflow, Cloud Computing, Data Analysis, ETL, Hadoop, MySQL, Data engineering, Data science, REST API, Spark, Communication Skills, Analytical Skills, CI/CD, Problem Solving, Terraform, Attention to detail, Organizational skills, Microservices, Teamwork, Data visualization, Data modeling, Scripting

Apply

📍 United States

🧭 Full-Time

💸 108,000 - 162,000 USD per year

🔍 Insurance

🏢 Company: Openly 👥 251-500 💰 $100,000,000 Series D over 1 year ago · Life Insurance, Property Insurance, Insurance, Commercial Insurance, Auto Insurance

  • 1 to 2 years of data engineering and data management experience.
  • Scripting skills in Python.
  • Basic understanding and usage of a development and deployment lifecycle, automated code deployments (CI/CD), code repositories, and code management.
  • Experience with Google Cloud data store and data orchestration technologies and concepts.
  • Hands-on experience and understanding of the entire data pipeline architecture: Data replication tools, staging data, data transformation, data movement, and cloud based data platforms.
  • Understanding of modern, next-generation data warehouse platforms, such as the lakehouse and the multi-layered data warehouse.
  • Proficiency with SQL optimization and development.
  • Ability to understand data architecture and modeling as it relates to business goals and objectives.
  • Ability to gain an understanding of data requirements, translate them into source-to-target data mappings, and build a working solution (a mapping sketch follows this list).
  • Experience with Terraform preferred but not required.
  • Design, create, and maintain data solutions. This includes data pipelines and data structures.
  • Work with data users, data science, and business intelligence personnel, to create data solutions to be used in various projects.
  • Translate concepts into code to enhance our data management frameworks and services, striving to provide a high-quality data product to our data users.
  • Collaborate with our product, operations, and technology teams to develop and deploy new solutions related to data architecture and data pipelines to enable a best-in-class product for our data users.
  • Collaborating with teammates to derive design and solution decisions related to architecture, operations, deployment techniques, technologies, policies, processes, etc.
  • Participate in domain meetings, stand-ups, weekly 1:1s, team collaborations, and biweekly retros
  • Assist in educating others on different aspects of data (e.g. data management best practices, data pipelining best practices, SQL tuning)
  • Build and share your knowledge within the data engineer team and with others in the company (e.g. tech all-hands, tech learning hour, domain meetings, code sync meetings, etc.)
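
A hypothetical sketch of the source-to-target mapping work mentioned above: the mapping is expressed as data and applied with Pandas. The field names and the mapping itself are invented for illustration:

```python
# Illustrative only: a source-to-target mapping expressed as data and applied
# with Pandas. Field names and the mapping are hypothetical.
import pandas as pd

SOURCE_TO_TARGET = {
    "POLICY_NO": "policy_id",
    "EFF_DT": "effective_date",
    "PREM_AMT": "premium_usd",
}

def map_to_target(source: pd.DataFrame) -> pd.DataFrame:
    target = source.rename(columns=SOURCE_TO_TARGET)[list(SOURCE_TO_TARGET.values())]
    target["effective_date"] = pd.to_datetime(target["effective_date"])
    return target

raw = pd.DataFrame({"POLICY_NO": ["P-1"], "EFF_DT": ["2024-05-01"], "PREM_AMT": [1200.0]})
print(map_to_target(raw))
```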

Docker, PostgreSQL, Python, SQL, Apache Airflow, Cloud Computing, ETL, GCP, Kafka, Kubernetes, Data engineering, Go, REST API, CI/CD, Terraform, Data modeling, Scripting, Data management

Posted 5 days ago
Apply
🔥 Staff Data Engineer
Posted 6 days ago

📍 United States

🧭 Full-Time

🔍 Software Development

🏢 Company: Life360 👥 251-500 💰 $33,038,258 Post-IPO Equity over 2 years ago 🫂 Last layoff about 2 years ago · Android, Family, Apps, Mobile Apps, Mobile

  • Minimum 7 years of experience working with high volume data infrastructure.
  • Experience with Databricks and AWS.
  • Experience with dbt.
  • Experience with job orchestration tooling like Airflow.
  • Proficient programming in Python.
  • Proficient with SQL and the ability to optimize complex queries.
  • Proficient with large-scale data processing using Spark and/or Presto/Trino.
  • Proficient in data modeling and database design.
  • Experience with streaming data with a tool like Kinesis or Kafka.
  • Experience working with high-volume, event-based data architectures like Amplitude and Braze.
  • Experience in modern development lifecycle including Agile methodology, CI/CD, automated deployments using Terraform, GitHub Actions, etc.
  • Knowledge and proficiency in the latest open source and data frameworks, modern data platform tech stacks and tools.
  • Always learning and staying up to speed with the fast moving data world.
  • You have good communication and collaboration skills and can work independently.
  • BS in Computer Science, Software Engineering, Mathematics, or equivalent experience.
  • Design, implement, and manage scalable data processing platforms used for real-time analytics and exploratory data analysis (a streaming sketch follows this list).
  • Manage our financial data from ingestion through ETL to storage and batch processing.
  • Automate, test and harden all data workflows.
  • Architect logical and physical data models to ensure the needs of the business are met.
  • Collaborate across the data teams, engineering, data science, and analytics, to understand their needs, while applying engineering best practices.
  • Architect and develop systems and algorithms for distributed real-time analytics and data processing.
  • Implement strategies for acquiring data to develop new insights.
  • Mentor junior engineers, imparting best practices and institutionalizing efficient processes to foster growth and innovation within the team.
  • Champion data engineering best practices and institutionalize efficient processes to foster growth and innovation within the team.
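
To illustrate the streaming side of the role (Kafka, Spark), here is a hypothetical Spark Structured Streaming sketch that reads a Kafka topic and lands it as Parquet. The broker, topic, and paths are assumptions, and the job requires the Spark-Kafka connector package on the cluster:

```python
# Illustrative only: a Spark Structured Streaming job reading a Kafka topic.
# Broker, topic, and output paths are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("location_events_stream").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "location-events")
    .load()
)

parsed = events.select(
    F.col("key").cast("string"),
    F.col("value").cast("string").alias("payload"),
    "timestamp",
)

query = (
    parsed.writeStream.format("parquet")
    .option("path", "s3://events-lake/location/")
    .option("checkpointLocation", "s3://events-lake/_checkpoints/location/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()   # runs until the streaming job is stopped
```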

AWS, Project Management, Python, SQL, Apache Airflow, ETL, Kafka, Algorithms, Data engineering, Data Structures, REST API, Spark, Communication Skills, Analytical Skills, Collaboration, CI/CD, Problem Solving, Agile methodologies, Mentoring, Terraform, Data visualization, Technical support, Data modeling, Data analytics, Data management, Debugging

Apply

📍 United States

💸 70,000 - 105,000 USD per year

🔍 Software Development

🏢 Company: VUHL

  • Relevant experience in data engineering or a related discipline.
  • Demonstrated ability to code effectively and a solid understanding of software engineering principles.
  • Experience using SQL or other query language to manage and process data.
  • Experience using Python to build ETL pipelines
  • Experience working with data from various sources and in various formats, including flat files, REST APIs, Excel files, JSON, XML, etc.
  • Experience with Snowflake, SQL Server, or related database technologies.
  • Experience using orchestration tools like Dagster (preferred), Apache Airflow, or similar.
  • Preference for Agile product delivery.
  • Familiarity with GIT, Change Management, and application lifecycle management tools.
  • Ability to influence others without positional control.
  • Create and deliver functional ETL pipelines and other data solutions using core technologies like SQL, Python, Snowflake, Dagster, and SSIS in an agile development environment (a Dagster sketch follows this list). Apply sound database design principles and adhere to Clean Code practices.
  • Engage in whole team planning, retrospectives, and communication. Interact with Architects and Product Owners to translate requirements into actionable business logic.
  • Participate in proposing and adopting Engineering standards related to architectural considerations and non-functional requirements such as security, reliability, and stability. Ensure proper management and visibility of borrower data and the life of a loan. Contribute to data governance initiatives.
  • Actively contribute to strengthening the team and culture by taking on various duties as needed, excluding licensed activities.
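
Because this posting names Dagster explicitly, here is a small, hypothetical sketch of two software-defined assets: one pulling an extract, one aggregating it for downstream use. File, column, and asset names are invented:

```python
# Illustrative only: two Dagster software-defined assets forming a tiny pipeline.
# File, column, and asset names are hypothetical.
import pandas as pd
from dagster import asset

@asset
def loan_applications() -> pd.DataFrame:
    """Pull the latest loan application extract into a DataFrame."""
    # In a real pipeline this would read from Snowflake/SQL Server instead of a file.
    return pd.read_json("loan_applications.json")

@asset
def loan_summary(loan_applications: pd.DataFrame) -> pd.DataFrame:
    """Aggregate applications by status for downstream reporting."""
    return loan_applications.groupby("status", as_index=False).agg(count=("loan_id", "size"))
```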

Python, SQL, Agile, Apache Airflow, ETL, Git, Snowflake, Data engineering, REST API, JSON, Data modeling, Software Engineering, Data management

Posted 6 days ago
Apply