The template for this repo is forked from git://g.csail.mit.edu/6.5840-golabs-2025, but the implementation is mine.
This repo will contain all the lab assignments for MIT 6.824: Distributed Systems.
A Go implementation of MapReduce, a programming model for processing large datasets across distributed systems. MapReduce simplifies distributed data processing by abstracting the complexities of parallelization, fault-tolerance, data distribution, and load balancing behind a clean programming interface.
This implementation was built to explore the fundamentals of distributed computing, fault tolerance, and parallel processing.
This project was made using the original MapReduce paper as a reference.
My implementation passes all the test cases except the early-exit test; so far I haven't been able to pinpoint exactly why it fails.
The MapReduce system consists of two main components: the coordinator (master) and the workers.

The coordinator:
- Coordinates the entire MapReduce operation
- Distributes map and reduce tasks to workers
- Monitors worker health and handles failures
- Manages intermediate files and task scheduling
- Maintains state information about task progress
The worker:
- Repeatedly requests tasks from the coordinator until the coordinator exits
- Executes map and reduce tasks as assigned by the master
- Handles input/output operations for their assigned tasks
- Communicates with the master about task completion and status
- Can be dynamically added or removed from the cluster
src/
├── main/   # Main applications and test programs (provided in the template)
├── mrapps/ # MapReduce applications (map and reduce functions) (provided in the template)
└── mr/     # Core MapReduce library implementation (implemented by me)
- The pre-partitioned input is distributed to map workers
- Each map worker applies the map function to its input chunk
- Map function outputs key-value pairs
- Output is partitioned into nReduce buckets using a hash function
- Intermediate files are written to disk
`inter_{map-task-id}_{bucket-id}`
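The bucketing step can be sketched with the FNV-1a hash the lab handout suggests; `bucketFor` is a hypothetical helper name, not part of the actual code:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ihash maps a key to a non-negative integer using FNV-1a,
// as suggested by the lab handout.
func ihash(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() & 0x7fffffff)
}

// bucketFor picks the reduce bucket in [0, nReduce) for a key,
// which determines the inter_{map-task-id}_{bucket-id} file it lands in.
func bucketFor(key string, nReduce int) int {
	return ihash(key) % nReduce
}

func main() {
	for _, k := range []string{"apple", "banana", "cherry"} {
		fmt.Printf("key %q -> bucket %d\n", k, bucketFor(k, 10))
	}
}
```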
- Reduce workers read intermediate files from all map tasks
- Data is sorted by key to group related values
- Reduce function is applied to each unique key and its values
- Final output is written to result files
`mr-out-{bucket-id}`
- Master assigns tasks to available workers
- Workers report task completion back to master
- Master tracks task states and handles failures
- Master detects failed workers through timeouts and automatically reassigns their tasks to healthy workers
- System proceeds to reduce phase after all map tasks complete
Map and Reduce functions are provided dynamically at runtime via Go plugins.
| Concept | Why It Matters | Lab Context |
|---|---|---|
| Dynamic Loading | Enables swapping Map/Reduce logic without recompiling workers | Lets you test different apps (wc, indexer) with the same worker code |
| Type Assertion | Ensures loaded functions match the expected signatures | Prevents runtime errors |
| Plugin System | Go-specific way to share compiled code | Alternative to RPC-based distribution |
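Loading Map and Reduce from a compiled plugin (`go build -buildmode=plugin`) might look like this. Note that in the real lab, `KeyValue` must come from the shared `mr` package for the type assertions to succeed against a plugin; the local definition here is only to keep the sketch self-contained:

```go
package main

import (
	"fmt"
	"os"
	"plugin"
)

// KeyValue is defined locally here for illustration; in practice it
// must be the shared mr.KeyValue type, or the assertions below fail.
type KeyValue struct {
	Key   string
	Value string
}

// loadPlugin opens a compiled .so and type-asserts the exported
// Map and Reduce symbols to the expected signatures.
func loadPlugin(filename string) (func(string, string) []KeyValue, func(string, []string) string, error) {
	p, err := plugin.Open(filename)
	if err != nil {
		return nil, nil, err
	}
	xmapf, err := p.Lookup("Map")
	if err != nil {
		return nil, nil, err
	}
	mapf, ok := xmapf.(func(string, string) []KeyValue)
	if !ok {
		return nil, nil, fmt.Errorf("Map has the wrong signature")
	}
	xreducef, err := p.Lookup("Reduce")
	if err != nil {
		return nil, nil, err
	}
	reducef, ok := xreducef.(func(string, []string) string)
	if !ok {
		return nil, nil, fmt.Errorf("Reduce has the wrong signature")
	}
	return mapf, reducef, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: worker xxx.so")
		return
	}
	mapf, reducef, err := loadPlugin(os.Args[1])
	if err != nil {
		fmt.Println("cannot load plugin:", err)
		return
	}
	_, _ = mapf, reducef
}
```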
This specific implementation is meant for a single machine with threads cosplaying as worker nodes, so Unix sockets seem like a better fit: they are faster and lighter for communication between processes on the same machine because they avoid the full network stack.
| Unix Domain Socket | TCP Socket |
|---|---|
| Communicates via a file on disk | Communicates via an IP address and port |
| Only works for processes on the same host | Can work across machines |
| Lower overhead, faster for local IPC | Slightly more overhead; full network stack involved |
| Access controlled by filesystem permissions | Access controlled by firewall, IP, etc. |
- MapReduce: Simplified Data Processing on Large Clusters - Original Google paper
- MIT 6.824 Distributed Systems - Course materials and assignments
- Go RPC Package Documentation - RPC implementation details