📍 United States
🔍 Consulting
Requirements:
- 2+ years of hands-on experience with Azure Databricks and Apache Spark for large-scale data processing.
- Strong programming skills in Python, Scala, and SQL.
- Expertise in Azure Data Lake, Blob Storage, Azure Synapse, and Azure Data Factory.
- Hands-on experience with CDC tools and frameworks, including Debezium, SQL Server CDC, or similar technologies.
- Expertise in configuring and managing CDC pipelines within the Azure cloud.
- Experience with Delta Lake architecture for data reliability and performance.
- Knowledge of Spark job performance tuning and optimization strategies.
- Strong understanding of data security and governance in cloud environments.
- Experience with CI/CD for Databricks and Infrastructure as Code (Terraform, ARM templates).
Responsibilities:
- Design and implement data ingestion and transformation pipelines using Azure Databricks and other Azure data services.
- Develop ETL/ELT processes for structured, semi-structured, and unstructured data.
- Optimize and tune Apache Spark jobs for performance and cost efficiency.
- Build and manage scalable data lakehouse solutions using Delta Lake and Azure Data Lake Storage.
- Integrate Databricks with Azure Synapse Analytics, Data Factory, and other Azure resources.
- Implement security best practices: role-based access control, encryption, and data masking.
- Collaborate with data scientists and analysts to operationalize machine learning models using MLflow.
- Automate workflows with Databricks Jobs and CI/CD pipelines.
- Monitor and troubleshoot performance issues in Databricks clusters and Spark applications.
Python · SQL · CI/CD · Terraform · Scala
Posted 2 months ago