📍 United States
🔍 Consulting
Requirements:
- 2+ years of hands-on experience with Azure Databricks and Apache Spark for large-scale data processing.
- Strong programming skills in Python, Scala, and SQL.
- Expertise in Azure Data Lake, Blob Storage, Azure Synapse, and Azure Data Factory.
- Hands-on experience with CDC tools and frameworks, including Debezium, SQL Server CDC, or similar technologies.
- Expertise in configuring and managing CDC pipelines within the Azure cloud.
- Experience with Delta Lake architecture for data reliability and performance.
- Knowledge of Spark job performance tuning and optimization strategies.
- Strong understanding of data security and governance in cloud environments.
- Experience with CI/CD for Databricks and Infrastructure as Code (Terraform, ARM templates).
Responsibilities:
- Design and implement data ingestion and transformation pipelines using Azure Databricks and other Azure data services.
- Develop ETL/ELT processes for structured, semi-structured, and unstructured data.
- Optimize and tune Apache Spark jobs for performance and cost efficiency.
- Build and manage scalable data lakehouse solutions using Delta Lake and Azure Data Lake Storage.
- Integrate Databricks with Azure Synapse Analytics, Data Factory, and other Azure resources.
- Implement security best practices: role-based access control, encryption, and data masking.
- Collaborate with data scientists and analysts to operationalize machine learning models using MLflow.
- Automate workflows with Databricks Jobs and CI/CD pipelines.
- Monitor and troubleshoot performance issues in Databricks clusters and Spark applications.
Python · SQL · CI/CD · Terraform · Scala
Posted 2 months ago