- 5+ years of hands-on experience with Apache Spark, primarily in Scala.
- Proven track record of optimizing large-scale data pipelines for performance and cost.
- Strong AWS EMR experience, including instance fleet management and instance-type optimization.
- Proficiency with AWS Step Functions.
- Deep understanding of distributed computing principles and cluster resource management.
- Experience debugging and tuning multi-terabyte daily workloads.
- Comfort working across Scala, Python, and SQL as needed.
- Experience with probabilistic data structures (e.g., HyperLogLog, Bloom filters) for high-cardinality data processing.
- Advanced troubleshooting abilities in distributed systems.
- Strong understanding of data skew mitigation strategies.
- Metrics-first mindset with root cause analysis expertise.
- Cost-conscious approach that balances innovation with operational excellence; proactive about optimization.
- Communicates early and often, especially around risks and blockers.
- Ability to interpret needs beyond stated requirements.
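To illustrate the kind of high-cardinality work referenced above, the sketch below uses Spark's built-in `approx_count_distinct`, a HyperLogLog++-based aggregate that estimates distinct counts with a small fixed-size sketch instead of shuffling every distinct value. This is a minimal local-mode sketch, not a statement about any specific pipeline; the object name and data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Hypothetical sketch: approximate distinct-user counts per shard using
// HyperLogLog++ via approx_count_distinct. The second argument is the
// maximum relative standard deviation; a smaller value means a larger
// (but still bounded) sketch and a tighter estimate.
object CardinalitySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hll-cardinality-sketch")
      .master("local[*]") // local mode for illustration only
      .getOrCreate()
    import spark.implicits._

    // Synthetic events: one million users spread across 7 shards.
    val events = (1 to 1000000)
      .map(i => (i % 7, s"user-$i"))
      .toDF("shard", "userId")

    events.groupBy($"shard")
      .agg(approx_count_distinct($"userId", 0.02).as("approxUsers"))
      .show()

    spark.stop()
  }
}
```

The trade-off is memory for accuracy: exact `countDistinct` must hold and shuffle every distinct value, while the HLL sketch stays at a fixed size regardless of cardinality, which matters at multi-terabyte daily volumes.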
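One common skew-mitigation strategy worth knowing for the role is key salting: a hot join key is split across N synthetic sub-keys so its rows spread over N partitions instead of piling onto one executor. The sketch below is a hypothetical, local-mode illustration of the pattern, not any team's actual pipeline; all names and data are invented.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Hypothetical sketch of salted joins for data-skew mitigation.
object SaltedJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("salted-join-sketch")
      .master("local[*]") // local mode for illustration only
      .getOrCreate()
    import spark.implicits._

    val numSalts = 8

    // Skewed fact table: nearly every row shares the key "hot".
    val facts = (Seq.fill(1000)(("hot", 1)) ++ Seq(("cold", 1)))
      .toDF("key", "value")
      // Random salt per row spreads the hot key across partitions.
      .withColumn("saltedKey",
        concat_ws("#", $"key", (rand() * numSalts).cast("int")))

    // Small dimension table: replicate each key once per salt value
    // so every salted fact row still finds its match.
    val dims = Seq(("hot", "H"), ("cold", "C")).toDF("key", "attr")
    val saltedDims = dims
      .withColumn("salt", explode(array((0 until numSalts).map(lit): _*)))
      .withColumn("saltedKey", concat_ws("#", $"key", $"salt"))
      .select($"saltedKey", $"attr")

    // Join on the salted key, then aggregate back on the original key.
    facts.join(saltedDims, Seq("saltedKey"))
      .groupBy($"key")
      .agg(sum($"value").as("total"))
      .show()

    spark.stop()
  }
}
```

On recent Spark versions, Adaptive Query Execution can handle many skewed-join cases automatically, but manual salting remains useful for skewed aggregations and for older clusters.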