Location: LatAm
Employment type: Full-Time
Industry: E-Learning
Company: Truelogic (101-250 employees; Consulting, Web Development, Web Design, Software)
Requirements:
- 1-3 years of experience working with PySpark and Apache Spark in Big Data environments.
- Experience with SQL, and with both relational and NoSQL databases (PostgreSQL, MySQL, MongoDB, etc.).
- Knowledge of ETL processes and data processing in distributed environments.
- Familiarity with Apache Hadoop, Hive, or Delta Lake.
- Experience with cloud storage (AWS S3, Google Cloud Storage, Azure Blob).
- Proficiency in Git and version control.
- Strong problem-solving skills and a proactive attitude.
- A passion for learning and continuous improvement.
Responsibilities:
- Design, develop, and optimize data pipelines using PySpark and Apache Spark.
- Integrate and process data from multiple sources (databases, APIs, files, streaming).
- Implement efficient data transformations for Big Data in distributed environments.
- Optimize code to improve performance, scalability, and efficiency in data processing.
- Collaborate with Data Science, BI, and DevOps teams to ensure seamless integration.
- Monitor and debug data processes to ensure quality and reliability.
- Apply best practices in data engineering and maintain clear documentation.
- Stay up to date with the latest trends in Big Data and distributed computing.
Skills: PostgreSQL, SQL, Apache Hadoop, Cloud Computing, ETL, Git, MongoDB, MySQL, Apache Kafka
Posted 6 days ago