Foundational understanding of distributed systems (partitioning, replication, fault tolerance)
Experience with, or curiosity about, columnar formats (Parquet, ORC) and low-level data encoding
Familiarity with metadata-driven architectures or data query planning
Exposure to, or hands-on use of, Spark, Flink, or similar distributed engines on cloud storage
Proficiency in Java, Rust, Go, or C++
Curiosity about how compression, entropy, and representation shape system efficiency and learning