United States
Software Development
- Exposure to industry-standard data modeling tools (e.g., ERwin, ER/Studio)
- Exposure to Extract, Transform & Load (ETL) tools such as Informatica or Talend
- Exposure to industry-standard data catalog, automated data discovery, and data lineage tools (e.g., Alation, Collibra, Tamr)
- Hands-on experience in programming languages such as Java, Python, or Scala
- Hands-on experience writing SQL scripts for Oracle, MySQL, PostgreSQL, or HiveQL
- Experience with Big Data / Hadoop / Spark / Hive and NoSQL database engines (e.g., Cassandra or HBase)
- Exposure to unstructured datasets and the ability to handle XML and JSON file formats
- Work independently as well as with a team to develop and support ingestion jobs
- Evaluate and understand various data sources (databases, APIs, flat files, etc.) to determine optimal ingestion strategies
- Develop a comprehensive data ingestion architecture, including data pipelines, data transformation logic, and data quality checks, considering scalability and performance requirements
- Choose appropriate data ingestion tools and frameworks based on data volume, velocity, and complexity
- Design and build data pipelines to extract, transform, and load data from source systems to target destinations, ensuring data integrity and consistency (a minimal pipeline sketch follows this list)
- Implement data quality checks and validation mechanisms throughout the ingestion process to identify and address data issues
- Monitor and optimize data ingestion pipelines to ensure efficient data processing and timely delivery
- Set up monitoring systems to track data ingestion performance, identify potential bottlenecks, and trigger alerts for issues (see the monitoring sketch after this list)
- Work closely with data engineers, data analysts, and business stakeholders to understand data requirements and align ingestion strategies with business objectives
- Build technical data dictionaries and support business glossaries to aid analysis of the datasets
- Perform data profiling and data analysis on source systems, manually maintained data, machine-generated data, and target data repositories (see the profiling sketch after this list)
- Build logical and physical data models for both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) solutions
- Develop and maintain data mapping specifications based on the results of data analysis and functional requirements
- Perform a variety of data loads and data transformations using multiple tools and technologies
- Build automated Extract, Transform & Load (ETL) jobs based on data mapping specifications
- Maintain metadata structures needed for building reusable Extract, Transform & Load (ETL) components
- Analyze reference datasets and become familiar with Master Data Management (MDM) tools
- Analyze impacts on downstream systems and products
- Derive solutions and make recommendations from deep-dive data analysis
- Design and build the Data Quality (DQ) rules needed
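As a rough illustration of the pipeline and data quality work above, here is a minimal Python sketch. It is not a prescribed stack: the source file, table name, and validation rules are hypothetical, and the standard library's sqlite3 stands in for whatever target database is actually in use.

```python
import csv
import sqlite3

def extract(path):
    """Read source rows from a delimited flat file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def validate(rows):
    """Basic data quality checks: required key present, amount is numeric."""
    good, bad = [], []
    for row in rows:
        amount = row.get("amount") or ""
        if row.get("order_id") and amount.replace(".", "", 1).isdigit():
            good.append(row)
        else:
            bad.append(row)  # quarantine for review instead of silently dropping
    return good, bad

def load(rows, conn):
    """Insert validated rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id TEXT PRIMARY KEY, amount REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO orders (order_id, amount) VALUES (:order_id, :amount)",
        rows,
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    good, bad = validate(extract("orders.csv"))  # hypothetical source file
    load(good, conn)
    print(f"loaded {len(good)} rows, quarantined {len(bad)}")
```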
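The monitoring and alerting bullets can reduce, in the simplest case, to threshold checks around each job run. A hypothetical sketch follows; the thresholds and the logging-based alert hook are made up, and a real deployment would wire into the team's observability tooling.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingestion-monitor")

# Hypothetical thresholds; real values would come from SLAs.
MAX_RUNTIME_SECONDS = 300
MIN_ROWS_EXPECTED = 1_000

def monitored_run(job, expected_rows=MIN_ROWS_EXPECTED):
    """Run an ingestion job callable; flag slow runs and low row counts."""
    start = time.monotonic()
    rows_loaded = job()  # the job callable returns how many rows it loaded
    elapsed = time.monotonic() - start
    if elapsed > MAX_RUNTIME_SECONDS:
        log.warning("runtime budget exceeded: %.1fs", elapsed)  # alert hook
    if rows_loaded < expected_rows:
        log.warning("row count %d below expected %d", rows_loaded, expected_rows)
    log.info("job finished: %d rows in %.1fs", rows_loaded, elapsed)
    return rows_loaded

# Example: monitored_run(lambda: 1_500)
```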
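Likewise, data profiling can start very simply: per-column null rates and distinct counts over a source extract. A standard-library sketch, with a hypothetical file name:

```python
import csv

def profile(path):
    """Print a basic per-column profile: row count, null rate, distinct values."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        print("empty file")
        return
    total = len(rows)
    for col in rows[0]:
        values = [r[col] for r in rows]
        nulls = sum(1 for v in values if v in (None, ""))
        print(f"{col}: {total} rows, {nulls / total:.1%} null, {len(set(values))} distinct")

# profile("source_extract.csv")  # hypothetical source extract
```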
Skills: AWS, PostgreSQL, Python, SQL, Apache Airflow, Apache Hadoop, Data Analysis, Data Mining, Erwin, ETL, Hadoop HDFS, Java, Kafka, MySQL, Oracle, Snowflake, Cassandra, ClickHouse, Data Engineering, Data Structures, REST API, NoSQL, Spark, JSON, Data Visualization, Data Modeling, Data Analytics, Data Management