Apache Hadoop Jobs

Find remote positions requiring Apache Hadoop skills. Browse opportunities where you can apply your expertise and grow your career.

Apache Hadoop
14 jobs found.

Set alerts to receive daily emails with new job openings that match your preferences.

Apply
🔥 Data Engineer
Posted about 14 hours ago

📍 United States

🧭 Full-Time

🔍 Sustainable Agriculture

🏢 Company: Agrovision

  • Experience with RDBMS (e.g., Teradata, MS SQL Server, Oracle) in production environments is preferred
  • Hands-on experience in data engineering and databases/data warehouses
  • Familiarity with Big Data platforms (e.g., Hadoop, Spark, Hive, HBase, Map/Reduce)
  • Expert level understanding of Python (e.g., Pandas)
  • Proficient in shell scripting (e.g., Bash) and Python data application development (or similar)
  • Excellent collaboration and communication skills with teams
  • Strong analytical and problem-solving skills, essential for tackling complex challenges
  • Experience working with BI teams and tooling (e.g. PowerBI), supporting analytics work and interfacing with Data Scientists
  • Collaborate with data scientists to ensure high-quality, accessible data for analytical and predictive modeling
  • Design and implement data pipelines (ETLs) tailored to meet business needs and digital/analytics solutions
  • Enhance data integrity, security, quality, and automation, addressing system gaps proactively
  • Support pipeline maintenance, troubleshoot issues, and optimize performance
  • Lead and contribute to defining detailed scalable data models for our global operations
  • Ensure data security standards are met and upheld by contributors, partners and regional teams through programmatic solutions and tooling
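The pipeline responsibilities above boil down to extract-transform-load steps. A minimal, standard-library-only sketch (table and field names are hypothetical, not from the posting):

```python
import sqlite3

def transform(rows):
    """Drop incomplete records and cast yield values to floats."""
    clean = []
    for r in rows:
        if r.get("farm_id") and r.get("yield_kg") not in (None, ""):
            clean.append((r["farm_id"], float(r["yield_kg"])))
    return clean

def load(rows, conn):
    """Load the cleaned rows into a (hypothetical) warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS yields (farm_id TEXT, yield_kg REAL)")
    conn.executemany("INSERT INTO yields VALUES (?, ?)", rows)
    conn.commit()

raw = [
    {"farm_id": "F1", "yield_kg": "1200.5"},
    {"farm_id": "", "yield_kg": "900"},  # dropped: missing farm_id
    {"farm_id": "F2", "yield_kg": "800"},
]
conn = sqlite3.connect(":memory:")
load(transform(raw), conn)
count = conn.execute("SELECT COUNT(*) FROM yields").fetchone()[0]
```

A production pipeline for a role like this would likely use Pandas and a real warehouse rather than sqlite3, but the validate-cast-load shape is the same.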

Python, SQL, Apache Hadoop, Bash, ETL, Data engineering, Data science, RDBMS, Pandas, Spark, Communication Skills, Analytical Skills, Collaboration, Problem Solving, Data modeling

Apply

📍 United States

🔍 Software Development

🏢 Company: ge_externalsite

  • Exposure to industry standard data modeling tools (e.g., ERWin, ER Studio, etc.).
  • Exposure to Extract, Transform & Load (ETL) tools like Informatica or Talend
  • Exposure to industry standard data catalog, automated data discovery, and data lineage tools (e.g., Alation, Collibra, TAMR, etc.)
  • Hands-on experience in programming languages like Java, Python or Scala
  • Hands-on experience in writing SQL scripts for Oracle, MySQL, PostgreSQL or HiveQL
  • Experience with Big Data / Hadoop / Spark / Hive / NoSQL database engines (i.e. Cassandra or HBase)
  • Exposure to unstructured datasets and ability to handle XML, JSON file formats
  • Work independently as well as with a team to develop and support Ingestion jobs
  • Evaluate and understand various data sources (databases, APIs, flat files, etc.) to determine optimal ingestion strategies
  • Develop a comprehensive data ingestion architecture, including data pipelines, data transformation logic, and data quality checks, considering scalability and performance requirements.
  • Choose appropriate data ingestion tools and frameworks based on data volume, velocity, and complexity
  • Design and build data pipelines to extract, transform, and load data from source systems to target destinations, ensuring data integrity and consistency
  • Implement data quality checks and validation mechanisms throughout the ingestion process to identify and address data issues
  • Monitor and optimize data ingestion pipelines to ensure efficient data processing and timely delivery
  • Set up monitoring systems to track data ingestion performance, identify potential bottlenecks, and trigger alerts for issues
  • Work closely with data engineers, data analysts, and business stakeholders to understand data requirements and align ingestion strategies with business objectives.
  • Build technical data dictionaries and support business glossaries to analyze the datasets
  • Perform data profiling and data analysis for source systems, manually maintained data, machine generated data and target data repositories
  • Build both logical and physical data models for both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) solutions
  • Develop and maintain data mapping specifications based on the results of data analysis and functional requirements
  • Perform a variety of data loads & data transformations using multiple tools and technologies.
  • Build automated Extract, Transform & Load (ETL) jobs based on data mapping specifications
  • Maintain metadata structures needed for building reusable Extract, Transform & Load (ETL) components.
  • Analyze reference datasets and familiarize with Master Data Management (MDM) tools.
  • Analyze the impact of downstream systems and products
  • Derive solutions and make recommendations from deep dive data analysis.
  • Design and build Data Quality (DQ) rules needed
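Several bullets above center on data quality checks during ingestion. A toy illustration of such a gate, with hypothetical field names: valid records pass through, malformed or incomplete ones are quarantined for review.

```python
import json

REQUIRED_FIELDS = {"id", "timestamp"}  # hypothetical schema

def passes_quality_check(record):
    """A record passes if every required field is present."""
    return REQUIRED_FIELDS <= record.keys()

def ingest(lines):
    """Route each raw input line to accepted or quarantined."""
    accepted, quarantined = [], []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            quarantined.append(line)  # unparseable: keep raw text for review
            continue
        (accepted if passes_quality_check(record) else quarantined).append(record)
    return accepted, quarantined
```

Real ingestion frameworks express the same idea as declarative validation rules rather than hand-written checks, but the accept/quarantine split is the core mechanism.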

AWS, PostgreSQL, Python, SQL, Apache Airflow, Apache Hadoop, Data Analysis, Data Mining, Erwin, ETL, Hadoop HDFS, Java, Kafka, MySQL, Oracle, Snowflake, Cassandra, Clickhouse, Data engineering, Data Structures, REST API, NoSQL, Spark, JSON, Data visualization, Data modeling, Data analytics, Data management

Posted 3 days ago
Apply

📍 India

🏢 Company: BlackStone eIT 👥 251-500 · Augmented Reality · Robotics · Analytics · Project Management

  • 5+ years of experience in AI or machine learning roles.
  • Strong proficiency in programming languages such as Python, Java, or C++.
  • Expertise with machine learning frameworks like TensorFlow or PyTorch.
  • In-depth knowledge of AI algorithms and techniques, including machine learning, deep learning, and NLP.
  • Experience with big data technologies like Hadoop or Spark.
  • Familiarity with cloud services (AWS, Google Cloud, Azure) for deploying AI models.
  • Design, develop, and implement advanced artificial intelligence solutions that enhance our products and services.
  • Work closely with data scientists, software engineers, and other stakeholders to transform business requirements into robust AI applications.

AWS, Python, Apache Hadoop, Artificial Intelligence, Java, Machine Learning, Microsoft Azure, PyTorch, C++, Spark, TensorFlow

Posted 8 days ago
Apply

📍 United States

🧭 Full-Time

🔍 Software Development

🏢 Company: Worth AI 👥 11-50 💰 $12,000,000 Seed over 1 year ago · Artificial Intelligence (AI) · Business Intelligence · Risk Management · FinTech

  • Bachelor's degree in Computer Science, Software Engineering, or a related field.
  • Proven experience as a Software Engineer, with a focus on infrastructure development and operations.
  • Strong programming skills in languages such as Python, Javascript, or Go.
  • Experience with cloud platforms (preferably AWS) and cost optimization strategies.
  • Familiarity with container orchestration (e.g., Kubernetes, Docker).
  • Expertise in Infrastructure as Code (IaC) tools, particularly Terraform (AWS CDK is a plus).
  • Design, scale, and maintain infrastructure to support Big Data workloads and real-time streaming systems such as Apache Spark, Hadoop, and Kafka.
  • Understanding of networking concepts, protocols, and security practices.
  • Proficiency in source control systems, especially Git.
  • Experience with CI/CD tools such as GitHub Actions and ArgoCD.
  • Familiarity with observability tools (Datadog, New Relic, etc.) for monitoring and logging.
  • Excellent problem-solving skills and the ability to work in a collaborative environment.
  • Strong communication skills to effectively share knowledge with team members.
  • Experience in the Risk, Underwriting, and/or Payments Industry is a plus.
  • Design and develop cloud infrastructure components and services to support our AI-driven platforms.
  • Collaborate with software engineers to integrate applications with underlying infrastructure.
  • Automate deployment processes and infrastructure management using Infrastructure as Code (IaC) practices.
  • Implement monitoring and logging strategies to optimize system performance and availability.
  • Optimize infrastructure for cost efficiency, ensuring resources are utilized effectively without compromising performance.
  • Coordinate with security teams to ensure the infrastructure is compliant with best practices and standards.
  • Troubleshoot and resolve infrastructure-related issues efficiently.
  • Continuously evaluate, recommend, and implement changes to improve system reliability and performance.
  • Maintain documentation for infrastructure services and processes.
  • Support on-call rotation as needed for critical infrastructure issues.
  • Other Duties as assigned

AWS, Docker, Python, SQL, Apache Hadoop, Cloud Computing, Git, Hadoop, JavaScript, Kafka, Kubernetes, Algorithms, Apache Kafka, Data Structures, Go, CI/CD, RESTful APIs, Linux, DevOps, Terraform, Microservices, Networking, Software Engineering

Posted 15 days ago
Apply

📍 United States, Canada

🧭 Full-Time

🔍 Software Development

🏢 Company: Global InfoTek, Inc.

  • 10-12 years of experience in cloud engineering
  • Working knowledge of AWS, Azure, or Google Cloud
  • Experience with programming languages like Python, Java, or C#
  • Design and implement cloud infrastructure
  • Engineer integration of applications into cloud and hybrid environments
  • Monitor cloud system performance and optimize resources

AWS, Docker, Python, Apache Hadoop, Cybersecurity, ElasticSearch, Kubernetes, C#, Azure, CI/CD, Terraform, Networking, Ansible

Posted 26 days ago
Apply

📍 USA

💸 176,000 - 207,000 USD per year

🔍 Cybersecurity

🏢 Company: Abnormal Security 👥 501-1000 💰 $250,000,000 Series D 7 months ago · Artificial Intelligence (AI) · Email · Information Technology · Cyber Security · Network Security

  • 5+ years of experience as a data engineer or similar role, with hands-on experience in building data-focused solutions.
  • Expertise in ETL, data pipeline design, and data engineering tools and technologies (e.g., Apache Spark, Hadoop, Airflow, Kafka).
  • Experience with maintaining real-time and near real-time data pipelines or streaming services at high scale.
  • Experience with maintaining large scale distributed systems on cloud platforms such as AWS, GCP, or Azure.
  • Background in implementing data quality frameworks, including validation, monitoring, and anomaly detection.
  • Proven ability to collaborate effectively with cross-functional teams.
  • Excellent problem-solving skills and ability to work independently in a fast-paced environment.
  • Architect, design, build, and deploy backend ETL jobs and infrastructure that support a world-class Detection Engine.
  • Own projects that enable us to meet ambitious goals, including scaling components of Detection's Data Pipeline by 10x.
  • Own real-time, near real-time streaming pipelines and online feature serving services.
  • Collaborate closely with MLE and Data Science teams, distilling feedback and executing strategy.
  • Coach and mentor junior engineers through 1on1s, pair programming, code reviews, and design reviews.
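The near real-time pipelines mentioned above ultimately reduce to windowed aggregations over event streams. A deliberately tiny sketch of a tumbling-window count (the event shape is hypothetical; a real system of this kind would run on Spark or Kafka):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_s=60):
    """Count (timestamp_s, key) events per fixed, non-overlapping window.

    Returns {(window_index, key): count}.
    """
    counts = defaultdict(int)
    for ts, key in events:
        counts[(ts // window_s, key)] += 1  # integer division buckets the timestamp
    return dict(counts)
```

Events at t=0 and t=30 with the same key land in window 0; an event at t=61 lands in window 1.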

AWS, Apache Airflow, Apache Hadoop, ETL, GCP, Kafka, Azure, Data engineering

Posted about 1 month ago
Apply

📍 United States, India

🧭 Contract, Part-Time, Full-Time

🔍 Life sciences

🏢 Company: ValGenesis 👥 501-1000 💰 $24,000,000 Private almost 4 years ago · Pharmaceutical · Medical Device · Software

  • Bachelor's or Master's degree in Computer Science, Data Science, or a related field.
  • 8+ years in AI/ML solution development.
  • Proven software development experience in life sciences or regulated industries.
  • Strong analytical thinking and problem-solving skills.
  • Excellent communication and collaboration abilities.
  • Knowledge of life sciences validation processes and regulatory compliance.
  • Build scalable AI/ML models for document classification, intelligent search, and predictive analytics.
  • Implement image processing solutions for visual inspections and anomaly detection.
  • Define AI architecture and select technologies from open-source and commercial offerings.
  • Deploy AI/ML solutions in cloud-based environments with a focus on high availability and security.
  • Mentor a team of AI/ML engineers, fostering collaborative research and development.

AWS, Docker, PostgreSQL, Python, SQL, Apache Hadoop, Artificial Intelligence, Cloud Computing, Git, Image Processing, Jenkins, Kubernetes, Machine Learning, MongoDB, NumPy, OpenCV, PyTorch, Tableau, Azure, Pandas, Spark, TensorFlow, CI/CD, Compliance, Data visualization

Posted about 1 month ago
Apply

📍 United States, Canada

🧭 Full-Time

🔍 Information Technology

  • 5+ years of experience in IT focusing on full stack development
  • 3+ years of software engineering for front-end and back-end applications
  • Experience with Apache Hadoop, Jenkins, and microservices architecture
  • Develop and deploy applications in AWS cloud environments
  • Create and manage CI/CD pipelines
  • Collaborate with functional teams on application development

AWS, Apache Hadoop, Java, JavaScript, Jenkins, Kotlin, Kubernetes, Spring Boot, TypeScript, Cassandra, gRPC, Prometheus, Tomcat, React, Selenium, Spark, CI/CD, Microservices, JSON, Scala

Posted about 2 months ago
Apply

📍 United States

🧭 Full-Time

💸 200,000 - 255,000 USD per year

🔍 Blockchain intelligence and financial crime prevention

🏢 Company: TRM Labs 👥 101-250 💰 $70,000,000 Series B over 2 years ago · Cryptocurrency · Compliance · Blockchain · Big Data

  • Academic background in a quantitative field such as Computer Science, Mathematics, Engineering, or Physics.
  • Strong knowledge of algorithm design and data structures with practical application experience.
  • Experience optimizing large-scale distributed data processing systems like Apache Spark, Apache Hadoop, Dask, and graph databases.
  • Experience in converting academic research into products with a history of collaborating on feature releases.
  • Strong programming experience in Python and SQL.
  • Excellent communication skills for technical and non-technical audiences.
  • Delivery-oriented with the ability to lead feature development from start to finish.
  • Autonomous ownership of work, capable of moving swiftly and efficiently.
  • Knowledge of basic graph theory concepts.
  • Designing and implementing graph algorithms that analyze large cryptocurrency transaction networks at multi-blockchain scale.
  • Researching new graph-native technology to evaluate benefit to data science and data engineering teams at TRM.
  • Collaborating with cryptocurrency investigators to identify key user stories and requirements for new graph algorithms and features.
  • Understanding and refining TRM's risk models to assign risk scores to addresses.
  • Communicating complex implementation details to various audiences from investigators to data engineers.
  • Integrating with diverse data inputs ranging from raw blockchain data to model outputs.
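The graph-algorithm work described above starts from traversals of a transaction graph. A toy example of one such building block: the set of addresses reachable from a starting address within a bounded number of transfers (the adjacency-dict representation is hypothetical; at multi-blockchain scale this runs on distributed engines, not in-memory dicts).

```python
from collections import deque

def reachable_within(graph, start, max_hops):
    """Breadth-first search: all nodes reachable from `start` in at most
    `max_hops` edges. `graph` maps an address to the list of addresses
    it sent funds to."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop budget
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return seen
```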

Python, SQL, Apache Hadoop, Algorithms, Data engineering, Data science, Data Structures

Posted about 2 months ago
Apply

📍 US

💸 177,309 - 310,291 USD per year

🔍 Software Development

🏢 Company: Pinterest 👥 5001-10000 💰 Post-IPO Equity over 2 years ago 🫂 Last layoff about 2 years ago · Internet · Social Network · Software · Social Media · Social Bookmarking

  • 4+ years of industry experience applying machine learning methods (e.g., user modeling, personalization, recommender systems, search, ranking, natural language processing, reinforcement learning, and graph representation learning)
  • End-to-end hands-on experience with building data processing pipelines, large scale machine learning systems, and big data technologies (e.g., Hadoop/Spark)
  • MS/PhD in Computer Science, ML, NLP, Statistics, Information Sciences, related field, or equivalent experience.
  • Build cutting edge technology using the latest advances in deep learning and machine learning to personalize Pinterest
  • Partner closely with teams across Pinterest to experiment and improve ML models for various product surfaces (Homefeed, Ads, Growth, Shopping, and Search), while gaining knowledge of how ML works in different areas
  • Use data-driven methods and leverage the unique properties of our data to improve candidate retrieval
  • Work in a high-impact environment with quick experimentation and product launches
  • Keep up with industry trends in recommendation systems
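Candidate retrieval, mentioned in the bullets above, is often framed as scoring item embeddings against a user embedding. A toy dot-product version (function name and vectors are illustrative only; real retrieval uses learned embeddings and approximate nearest-neighbor indexes):

```python
def top_k_candidates(user_vec, item_vecs, k=2):
    """Score each item embedding by dot product with the user embedding
    and return the names of the k highest-scoring items."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(item_vecs, key=lambda name: dot(user_vec, item_vecs[name]),
                    reverse=True)
    return ranked[:k]
```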

Python, Software Development, Apache Hadoop, Machine Learning, NumPy, PyTorch, Algorithms, Data Structures, Spark, TensorFlow

Posted 2 months ago
Apply
Showing 10 of 14 jobs