Distributed MapReduce Framework
C++ MapReduce engine deployed on EC2 with custom shuffle and performance tuning.
Identified and explained the reducer lock bottleneck behind major slowdowns in non-barrier runs during AWS performance sweeps.
- Built a distributed execution pipeline with mapper coordination, reducer services, and HDFS-backed output handling
- Implemented custom TCP shuffle with Protobuf serialization, bounded buffers, and optional barrier-based backpressure
- Ran AWS experiments across mapper counts, thread counts, and buffer sizes to isolate a reducer lock bottleneck
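The bounded-buffer backpressure mentioned above can be illustrated with a minimal sketch. This is not the project's code: `BoundedBuffer` and `run_shuffle_demo` are hypothetical names, the records are plain integers rather than Protobuf messages, and in-process threads stand in for TCP connections. It assumes a blocking design in which a producer stalls whenever the buffer is full.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded queue: push() blocks while the queue is full,
// which is how backpressure reaches the producers.
template <typename T>
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks until there is room, then enqueues the item.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mu_);
        not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push_back(std::move(item));
        not_empty_.notify_one();
    }

    // Blocks until an item is available, then dequeues it.
    T pop() {
        std::unique_lock<std::mutex> lock(mu_);
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        T item = std::move(queue_.front());
        queue_.pop_front();
        not_full_.notify_one();
        return item;
    }

private:
    const std::size_t capacity_;
    std::mutex mu_;  // single lock: every producer and the consumer serialize here
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<T> queue_;
};

// Hypothetical driver: `producers` threads each push the values
// 1..per_producer while the calling thread drains the buffer.
// Returns the sum of all consumed values.
long long run_shuffle_demo(int producers, int per_producer, std::size_t capacity) {
    BoundedBuffer<int> buffer(capacity);
    std::vector<std::thread> threads;
    for (int p = 0; p < producers; ++p) {
        threads.emplace_back([&buffer, per_producer] {
            for (int i = 1; i <= per_producer; ++i) buffer.push(i);
        });
    }
    long long sum = 0;
    for (int n = 0; n < producers * per_producer; ++n) sum += buffer.pop();
    for (auto& t : threads) t.join();
    return sum;
}
```

In this sketch every `push` and `pop` takes the same mutex, so adding producer threads increases contention on that one lock. That is the general shape of a lock bottleneck on the consuming side, though the source does not say where the project's lock actually sits.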

