Apache Hadoop Jobs

Find remote positions requiring Apache Hadoop skills. Browse opportunities where you can apply your expertise and grow your career.

Apache Hadoop
14 jobs found.

Set alerts to receive daily emails with new job openings that match your preferences.

Apply
🔥 Data Engineer
Posted about 14 hours ago

📍 United States

🧭 Full-Time

🔍 Sustainable Agriculture

🏢 Company: Agrovision

  • Experience with RDBMS (e.g., Teradata, MS SQL Server, Oracle) in production environments is preferred
  • Hands-on experience in data engineering and databases/data warehouses
  • Familiarity with Big Data platforms (e.g., Hadoop, Spark, Hive, HBase, Map/Reduce)
  • Expert level understanding of Python (e.g., Pandas)
  • Proficient in shell scripting (e.g., Bash) and Python data application development (or similar)
  • Excellent collaboration and communication skills with teams
  • Strong analytical and problem-solving skills, essential for tackling complex challenges
  • Experience working with BI teams and tooling (e.g. PowerBI), supporting analytics work and interfacing with Data Scientists
  • Collaborate with data scientists to ensure high-quality, accessible data for analytical and predictive modeling
  • Design and implement data pipelines (ETLs) tailored to meet business needs and digital/analytics solutions
  • Enhance data integrity, security, quality, and automation, addressing system gaps proactively
  • Support pipeline maintenance, troubleshoot issues, and optimize performance
  • Lead and contribute to defining detailed scalable data models for our global operations
  • Ensure data security standards are met and upheld by contributors, partners and regional teams through programmatic solutions and tooling
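The pipeline responsibilities above boil down to extract-transform-load steps. A minimal, standard-library-only sketch (table and field names are hypothetical, not from the posting):

```python
import sqlite3

def transform(rows):
    """Drop incomplete records and cast yield values to floats."""
    clean = []
    for r in rows:
        if r.get("farm_id") and r.get("yield_kg") not in (None, ""):
            clean.append((r["farm_id"], float(r["yield_kg"])))
    return clean

def load(rows, conn):
    """Load the cleaned rows into a (hypothetical) warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS yields (farm_id TEXT, yield_kg REAL)")
    conn.executemany("INSERT INTO yields VALUES (?, ?)", rows)
    conn.commit()

raw = [
    {"farm_id": "F1", "yield_kg": "1200.5"},
    {"farm_id": "", "yield_kg": "900"},  # dropped: missing farm_id
    {"farm_id": "F2", "yield_kg": "800"},
]
conn = sqlite3.connect(":memory:")
load(transform(raw), conn)
count = conn.execute("SELECT COUNT(*) FROM yields").fetchone()[0]
```

A production pipeline for a role like this would likely use Pandas and a real warehouse rather than sqlite3, but the validate-cast-load shape is the same.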

Python, SQL, Apache Hadoop, Bash, ETL, Data engineering, Data science, RDBMS, Pandas, Spark, Communication Skills, Analytical Skills, Collaboration, Problem Solving, Data modeling

Apply

📍 United States

🔍 Software Development

🏢 Company: ge_externalsite

  • Exposure to industry standard data modeling tools (e.g., ERWin, ER Studio, etc.).
  • Exposure to Extract, Transform & Load (ETL) tools like Informatica or Talend
  • Exposure to industry standard data catalog, automated data discovery, and data lineage tools (e.g., Alation, Collibra, TAMR, etc.)
  • Hands-on experience in programming languages like Java, Python or Scala
  • Hands-on experience in writing SQL scripts for Oracle, MySQL, PostgreSQL or HiveQL
  • Experience with Big Data / Hadoop / Spark / Hive / NoSQL database engines (i.e. Cassandra or HBase)
  • Exposure to unstructured datasets and ability to handle XML, JSON file formats
  • Work independently as well as with a team to develop and support Ingestion jobs
  • Evaluate and understand various data sources (databases, APIs, flat files, etc.) to determine optimal ingestion strategies
  • Develop a comprehensive data ingestion architecture, including data pipelines, data transformation logic, and data quality checks, considering scalability and performance requirements.
  • Choose appropriate data ingestion tools and frameworks based on data volume, velocity, and complexity
  • Design and build data pipelines to extract, transform, and load data from source systems to target destinations, ensuring data integrity and consistency
  • Implement data quality checks and validation mechanisms throughout the ingestion process to identify and address data issues
  • Monitor and optimize data ingestion pipelines to ensure efficient data processing and timely delivery
  • Set up monitoring systems to track data ingestion performance, identify potential bottlenecks, and trigger alerts for issues
  • Work closely with data engineers, data analysts, and business stakeholders to understand data requirements and align ingestion strategies with business objectives.
  • Build technical data dictionaries and support business glossaries to analyze the datasets
  • Perform data profiling and data analysis for source systems, manually maintained data, machine generated data and target data repositories
  • Build both logical and physical data models for both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) solutions
  • Develop and maintain data mapping specifications based on the results of data analysis and functional requirements
  • Perform a variety of data loads & data transformations using multiple tools and technologies.
  • Build automated Extract, Transform & Load (ETL) jobs based on data mapping specifications
  • Maintain metadata structures needed for building reusable Extract, Transform & Load (ETL) components.
  • Analyze reference datasets and familiarize with Master Data Management (MDM) tools.
  • Analyze the impact of downstream systems and products
  • Derive solutions and make recommendations from deep dive data analysis.
  • Design and build Data Quality (DQ) rules needed
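Several bullets above center on data quality checks during ingestion. A toy illustration of such a gate, with hypothetical field names: valid records pass through, malformed or incomplete ones are quarantined for review.

```python
import json

REQUIRED_FIELDS = {"id", "timestamp"}  # hypothetical schema

def passes_quality_check(record):
    """A record passes if every required field is present."""
    return REQUIRED_FIELDS <= record.keys()

def ingest(lines):
    """Route each raw input line to accepted or quarantined."""
    accepted, quarantined = [], []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            quarantined.append(line)  # unparseable: keep raw text for review
            continue
        (accepted if passes_quality_check(record) else quarantined).append(record)
    return accepted, quarantined
```

Real ingestion frameworks express the same idea as declarative validation rules rather than hand-written checks, but the accept/quarantine split is the core mechanism.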

AWS, PostgreSQL, Python, SQL, Apache Airflow, Apache Hadoop, Data Analysis, Data Mining, Erwin, ETL, Hadoop HDFS, Java, Kafka, MySQL, Oracle, Snowflake, Cassandra, Clickhouse, Data engineering, Data Structures, REST API, NoSQL, Spark, JSON, Data visualization, Data modeling, Data analytics, Data management

Posted 3 days ago
Apply

📍 India

🏢 Company: BlackStone eIT 👥 251-500 · Augmented Reality · Robotics · Analytics · Project Management

  • 5+ years of experience in AI or machine learning roles.
  • Strong proficiency in programming languages such as Python, Java, or C++.
  • Expertise with machine learning frameworks like TensorFlow or PyTorch.
  • In-depth knowledge of AI algorithms and techniques, including machine learning, deep learning, and NLP.
  • Experience with big data technologies like Hadoop or Spark.
  • Familiarity with cloud services (AWS, Google Cloud, Azure) for deploying AI models.
  • Design, develop, and implement advanced artificial intelligence solutions that enhance our products and services.
  • Work closely with data scientists, software engineers, and other stakeholders to transform business requirements into robust AI applications.

AWS, Python, Apache Hadoop, Artificial Intelligence, Java, Machine Learning, Microsoft Azure, PyTorch, C++, Spark, TensorFlow

Posted 8 days ago
Apply

📍 United States

🧭 Full-Time

🔍 Software Development

🏢 Company: Worth AI 👥 11-50 💰 $12,000,000 Seed over 1 year ago · Artificial Intelligence (AI) · Business Intelligence · Risk Management · FinTech

  • Bachelor's degree in Computer Science, Software Engineering, or a related field.
  • Proven experience as a Software Engineer, with a focus on infrastructure development and operations.
  • Strong programming skills in languages such as Python, Javascript, or Go.
  • Experience with cloud platforms (preferably AWS) and cost optimization strategies.
  • Familiarity with container orchestration (e.g., Kubernetes, Docker).
  • Expertise in Infrastructure as Code (IaC) tools, particularly Terraform (AWS CDK is a plus).
  • Design, scale, and maintain infrastructure to support Big Data workloads and real-time streaming systems such as Apache Spark, Hadoop, and Kafka.
  • Understanding of networking concepts, protocols, and security practices.
  • Proficiency in source control systems, especially Git.
  • Experience with CI/CD tools such as GitHub Actions and ArgoCD.
  • Familiarity with observability tools (Datadog, New Relic, etc.) for monitoring and logging.
  • Excellent problem-solving skills and the ability to work in a collaborative environment.
  • Strong communication skills to effectively share knowledge with team members.
  • Experience in the Risk, Underwriting, and/or Payments Industry is a plus.
  • Design and develop cloud infrastructure components and services to support our AI-driven platforms.
  • Collaborate with software engineers to integrate applications with underlying infrastructure.
  • Automate deployment processes and infrastructure management using Infrastructure as Code (IaC) practices.
  • Implement monitoring and logging strategies to optimize system performance and availability.
  • Optimize infrastructure for cost efficiency, ensuring resources are utilized effectively without compromising performance.
  • Coordinate with security teams to ensure the infrastructure is compliant with best practices and standards.
  • Troubleshoot and resolve infrastructure-related issues efficiently.
  • Continuously evaluate, recommend, and implement changes to improve system reliability and performance.
  • Maintain documentation for infrastructure services and processes.
  • Support on-call rotation as needed for critical infrastructure issues.
  • Other Duties as assigned

AWS, Docker, Python, SQL, Apache Hadoop, Cloud Computing, Git, Hadoop, JavaScript, Kafka, Kubernetes, Algorithms, Apache Kafka, Data Structures, Go, CI/CD, RESTful APIs, Linux, DevOps, Terraform, Microservices, Networking, Software Engineering

Posted 15 days ago
Apply

📍 United States, Canada

🧭 Full-Time

🔍 Software Development

🏢 Company: Global InfoTek, Inc.

  • 10-12 years of experience in cloud engineering
  • Working knowledge of AWS, Azure, or Google Cloud
  • Experience with programming languages like Python, Java, or C#
  • Design and implement cloud infrastructure
  • Engineer integration of applications into cloud and hybrid environments
  • Monitor cloud system performance and optimize resources

AWS, Docker, Python, Apache Hadoop, Cybersecurity, ElasticSearch, Kubernetes, C#, Azure, CI/CD, Terraform, Networking, Ansible

Posted 26 days ago
Apply

📍 USA

💸 176,000 - 207,000 USD per year

🔍 Cybersecurity

🏢 Company: Abnormal Security 👥 501-1000 💰 $250,000,000 Series D 7 months ago · Artificial Intelligence (AI) · Email · Information Technology · Cyber Security · Network Security

  • 5+ years of experience as a data engineer or similar role, with hands-on experience in building data-focused solutions.
  • Expertise in ETL, data pipeline design, and data engineering tools and technologies (e.g., Apache Spark, Hadoop, Airflow, Kafka).
  • Experience with maintaining real-time and near real-time data pipelines or streaming services at high scale.
  • Experience with maintaining large scale distributed systems on cloud platforms such as AWS, GCP, or Azure.
  • Background in implementing data quality frameworks, including validation, monitoring, and anomaly detection.
  • Proven ability to collaborate effectively with cross-functional teams.
  • Excellent problem-solving skills and ability to work independently in a fast-paced environment.
  • Architect, design, build, and deploy backend ETL jobs and infrastructure that support a world-class Detection Engine.
  • Own projects that enable us to meet ambitious goals, including scaling components of Detection's Data Pipeline by 10x.
  • Own real-time, near real-time streaming pipelines and online feature serving services.
  • Collaborate closely with MLE and Data Science teams, distilling feedback and executing strategy.
  • Coach and mentor junior engineers through 1on1s, pair programming, code reviews, and design reviews.
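The near real-time pipelines mentioned above ultimately reduce to windowed aggregations over event streams. A deliberately tiny sketch of a tumbling-window count (the event shape is hypothetical; a real system of this kind would run on Spark or Kafka):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_s=60):
    """Count (timestamp_s, key) events per fixed, non-overlapping window.

    Returns {(window_index, key): count}.
    """
    counts = defaultdict(int)
    for ts, key in events:
        counts[(ts // window_s, key)] += 1  # integer division buckets the timestamp
    return dict(counts)
```

Events at t=0 and t=30 with the same key land in window 0; an event at t=61 lands in window 1.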

AWS, Apache Airflow, Apache Hadoop, ETL, GCP, Kafka, Azure, Data engineering

Posted about 1 month ago
Apply

📍 United States, India

🧭 Contract, Part-Time, Full-Time

🔍 Life sciences

🏢 Company: ValGenesis 👥 501-1000 💰 $24,000,000 Private almost 4 years ago · Pharmaceutical · Medical Device · Software

  • Bachelor's or Master's degree in Computer Science, Data Science, or a related field.
  • 8+ years in AI/ML solution development.
  • Proven software development experience in life sciences or regulated industries.
  • Strong analytical thinking and problem-solving skills.
  • Excellent communication and collaboration abilities.
  • Knowledge of life sciences validation processes and regulatory compliance.
  • Build scalable AI/ML models for document classification, intelligent search, and predictive analytics.
  • Implement image processing solutions for visual inspections and anomaly detection.
  • Define AI architecture and select technologies from open-source and commercial offerings.
  • Deploy AI/ML solutions in cloud-based environments with a focus on high availability and security.
  • Mentor a team of AI/ML engineers, fostering collaborative research and development.

AWS, Docker, PostgreSQL, Python, SQL, Apache Hadoop, Artificial Intelligence, Cloud Computing, Git, Image Processing, Jenkins, Kubernetes, Machine Learning, MongoDB, NumPy, OpenCV, PyTorch, Tableau, Azure, Pandas, Spark, TensorFlow, CI/CD, Compliance, Data visualization

Posted about 1 month ago
Apply

📍 United States, Canada

🧭 Full-Time

🔍 Information Technology

  • 5+ years of experience in IT focusing on full stack development
  • 3+ years of software engineering for front-end and back-end applications
  • Experience with Apache Hadoop, Jenkins, and microservices architecture
  • Develop and deploy applications in AWS cloud environments
  • Create and manage CI/CD pipelines
  • Collaborate with functional teams on application development

AWS, Apache Hadoop, Java, JavaScript, Jenkins, Kotlin, Kubernetes, Spring Boot, TypeScript, Cassandra, gRPC, Prometheus, Tomcat, React, Selenium, Spark, CI/CD, Microservices, JSON, Scala

Posted about 2 months ago
Apply

📍 United States

🧭 Full-Time

💸 200,000 - 255,000 USD per year

🔍 Blockchain intelligence and financial crime prevention

🏢 Company: TRM Labs 👥 101-250 💰 $70,000,000 Series B over 2 years ago · Cryptocurrency · Compliance · Blockchain · Big Data

  • Academic background in a quantitative field such as Computer Science, Mathematics, Engineering, or Physics.
  • Strong knowledge of algorithm design and data structures with practical application experience.
  • Experience optimizing large-scale distributed data processing systems like Apache Spark, Apache Hadoop, Dask, and graph databases.
  • Experience in converting academic research into products with a history of collaborating on feature releases.
  • Strong programming experience in Python and SQL.
  • Excellent communication skills for technical and non-technical audiences.
  • Delivery-oriented with the ability to lead feature development from start to finish.
  • Autonomous ownership of work, capable of moving swiftly and efficiently.
  • Knowledge of basic graph theory concepts.
  • Designing and implementing graph algorithms that analyze large cryptocurrency transaction networks at multi-blockchain scale.
  • Researching new graph-native technology to evaluate benefit to data science and data engineering teams at TRM.
  • Collaborating with cryptocurrency investigators to identify key user stories and requirements for new graph algorithms and features.
  • Understanding and refining TRM's risk models to assign risk scores to addresses.
  • Communicating complex implementation details to various audiences from investigators to data engineers.
  • Integrating with diverse data inputs ranging from raw blockchain data to model outputs.
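The graph-algorithm work described above starts from traversals of a transaction graph. A toy example of one such building block: the set of addresses reachable from a starting address within a bounded number of transfers (the adjacency-dict representation is hypothetical; at multi-blockchain scale this runs on distributed engines, not in-memory dicts).

```python
from collections import deque

def reachable_within(graph, start, max_hops):
    """Breadth-first search: all nodes reachable from `start` in at most
    `max_hops` edges. `graph` maps an address to the list of addresses
    it sent funds to."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop budget
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return seen
```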

Python, SQL, Apache Hadoop, Algorithms, Data engineering, Data science, Data Structures

Posted about 2 months ago
Apply

📍 US

💸 177,309 - 310,291 USD per year

🔍 Software Development

🏢 Company: Pinterest 👥 5001-10000 💰 Post-IPO Equity over 2 years ago 🫂 Last layoff about 2 years ago · Internet · Social Network · Software · Social Media · Social Bookmarking

  • 4+ years of industry experience applying machine learning methods (e.g., user modeling, personalization, recommender systems, search, ranking, natural language processing, reinforcement learning, and graph representation learning)
  • End-to-end hands-on experience with building data processing pipelines, large scale machine learning systems, and big data technologies (e.g., Hadoop/Spark)
  • MS/PhD in Computer Science, ML, NLP, Statistics, Information Sciences, related field, or equivalent experience.
  • Build cutting edge technology using the latest advances in deep learning and machine learning to personalize Pinterest
  • Partner closely with teams across Pinterest to experiment and improve ML models for various product surfaces (Homefeed, Ads, Growth, Shopping, and Search), while gaining knowledge of how ML works in different areas
  • Use data-driven methods and leverage the unique properties of our data to improve candidate retrieval
  • Work in a high-impact environment with quick experimentation and product launches
  • Keep up with industry trends in recommendation systems
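Candidate retrieval, mentioned in the bullets above, is often framed as scoring item embeddings against a user embedding. A toy dot-product version (function name and vectors are illustrative only; real retrieval uses learned embeddings and approximate nearest-neighbor indexes):

```python
def top_k_candidates(user_vec, item_vecs, k=2):
    """Score each item embedding by dot product with the user embedding
    and return the names of the k highest-scoring items."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(item_vecs, key=lambda name: dot(user_vec, item_vecs[name]),
                    reverse=True)
    return ranked[:k]
```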

Python, Software Development, Apache Hadoop, Machine Learning, NumPy, PyTorch, Algorithms, Data Structures, Spark, TensorFlow

Posted 2 months ago
Apply
Showing 10 of 14 jobs