Data Engineer – Databricks & Lakehouse
Inactive
Remote opportunity based in India · Full-Time · Senior
This job is no longer active. We keep the page for reference, but the employer may not accept new applications.
Salary not disclosed
Job Details
- Experience: 8+ years
- Required Skills: Python, SQL, Microsoft Power BI, Spark, CI/CD, DevOps, Data modeling, Databricks
Requirements
- 8+ years of experience in data engineering or related roles within large-scale enterprise environments.
- Strong hands-on experience with Databricks, Apache Spark, and Delta Lake.
- Advanced proficiency in SQL and Python for data processing and pipeline development.
- Experience working with Power BI or similar BI and visualization tools.
- Solid understanding of data modeling, business logic translation, and enterprise data architecture.
- Experience with cloud data services such as Azure Data Factory, Azure Synapse Analytics, or data lake environments.
- Familiarity with CI/CD pipelines, DevOps practices, and automated deployment workflows.
- Strong analytical and problem-solving skills.
- Exposure to AI-assisted development tools is a plus.
Responsibilities
- Design, build, and maintain scalable data pipelines using Databricks, Spark, and Delta Lake within a Lakehouse architecture.
- Develop and manage medallion-layer (bronze, silver, and gold) transformations, ensuring optimized performance, reliability, and cost efficiency.
- Integrate data from enterprise systems such as ERP, CRM, APIs, and other internal platforms into unified data models.
- Build curated, business-ready datasets aligned with standardized definitions and Power BI semantic models.
- Implement data quality checks, validation rules, and testing frameworks to ensure production-grade reliability.
- Monitor pipeline performance, troubleshoot issues, and maintain consistency across development, testing, and production environments.
- Collaborate with BI teams, business stakeholders, and governance functions to ensure accurate and well-documented data models.
- Contribute to CI/CD practices, deployment processes, and data lineage tracking for improved transparency and control.