ShivamJha2436/go-mapreduce

Go MapReduce

This repository contains a fully functional implementation of the MapReduce programming model in Go, inspired by Google’s MapReduce paper. It runs locally on a single machine but follows the structure of a distributed workflow, using a concurrent, modular design to process input files in parallel. The implementation is intended both as a learning exercise and as a foundation for understanding the inner workings of MapReduce frameworks.

The core idea behind MapReduce is to process large datasets by dividing the work into two phases: a Map phase, where input data is transformed into intermediate key-value pairs, and a Reduce phase, where those intermediate results are aggregated to produce the final output. In between, a Shuffle phase groups all values by key, ensuring that the reduce function receives all relevant data for each key. This project faithfully follows this architecture while leveraging Go’s concurrency primitives for parallel execution.
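
To make the Shuffle step concrete, here is a minimal sketch of grouping intermediate pairs by key. The `KeyValue` type and the `shuffle` function are illustrative names chosen for this example, and integer values are an assumption; the repository's actual types and signatures may differ.

```go
package main

import "fmt"

// KeyValue is one intermediate pair emitted by a map function.
// The type name and the int value type are assumptions for this sketch.
type KeyValue struct {
	Key   string
	Value int
}

// shuffle groups all intermediate values under their key, so that a
// reduce function can later receive every value for a given key at once.
func shuffle(pairs []KeyValue) map[string][]int {
	grouped := make(map[string][]int)
	for _, kv := range pairs {
		grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
	}
	return grouped
}

func main() {
	pairs := []KeyValue{
		{Key: "go", Value: 1},
		{Key: "map", Value: 1},
		{Key: "go", Value: 1},
	}
	grouped := shuffle(pairs)
	fmt.Println(len(grouped["go"]), len(grouped["map"])) // prints "2 1"
}
```

Because the shuffle runs after all map tasks have finished, this sketch needs no locking: it is a single pass over data that is no longer being written to.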

The Map phase in this framework reads each input file and applies a user-defined Map function to transform raw data into intermediate key-value pairs. Each file is processed concurrently using goroutines, enabling multiple map tasks to run in parallel. The framework ensures safe accumulation of intermediate results with a mutex, preventing race conditions during concurrent writes. Once all map tasks are complete, the Shuffle phase consolidates intermediate pairs by key, effectively preparing them for the Reduce phase.
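
The concurrent Map phase described above could look roughly like the following sketch: one goroutine per input, a `sync.WaitGroup` to wait for all of them, and a `sync.Mutex` guarding the shared slice. The names `runMapPhase` and `MapFunc` are hypothetical, and the inputs are passed as in-memory strings rather than read from files so that the example runs on its own.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// KeyValue is one intermediate pair (illustrative type, not the repo's).
type KeyValue struct {
	Key   string
	Value int
}

// MapFunc is a hypothetical signature for a user-defined map function.
type MapFunc func(contents string) []KeyValue

// runMapPhase runs mapFn on every input in its own goroutine and
// collects all emitted pairs into one slice protected by a mutex.
func runMapPhase(inputs []string, mapFn MapFunc) []KeyValue {
	var (
		mu           sync.Mutex
		wg           sync.WaitGroup
		intermediate []KeyValue
	)
	for _, contents := range inputs {
		wg.Add(1)
		go func(contents string) {
			defer wg.Done()
			pairs := mapFn(contents) // user code runs outside the lock
			mu.Lock()
			intermediate = append(intermediate, pairs...)
			mu.Unlock()
		}(contents)
	}
	wg.Wait() // all map tasks must finish before the shuffle starts
	return intermediate
}

func main() {
	// Hypothetical map function: emit the word count of each input.
	countWords := func(contents string) []KeyValue {
		return []KeyValue{{Key: "words", Value: len(strings.Fields(contents))}}
	}
	pairs := runMapPhase([]string{"a b c", "d e"}, countWords)
	total := 0
	for _, kv := range pairs {
		total += kv.Value
	}
	fmt.Println(len(pairs), total) // prints "2 5"
}
```

The order of pairs in the result depends on goroutine scheduling, which is acceptable here because the shuffle groups by key regardless of order. Holding the lock only around the `append` keeps the map functions themselves running in parallel.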

The Reduce phase executes the user-defined Reduce function for each key concurrently. Each goroutine applies the function to all values associated with one key (in the Word Count example, it sums them) and writes its result to the final output in a thread-safe manner. This approach mirrors the worker-based execution model used in distributed MapReduce frameworks while remaining lightweight and easy to understand in Go.
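
As a sketch of the per-key concurrency described above, the following example starts one goroutine per key and stores each result in a shared map under a mutex. `runReducePhase`, `ReduceFunc`, and `sumValues` are names invented for this illustration, not the repository's API.

```go
package main

import (
	"fmt"
	"sync"
)

// ReduceFunc is a hypothetical signature for a user-defined reduce function.
type ReduceFunc func(key string, values []int) int

// runReducePhase calls reduceFn once per key, each call in its own
// goroutine, and records the results in a map guarded by a mutex.
func runReducePhase(grouped map[string][]int, reduceFn ReduceFunc) map[string]int {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	results := make(map[string]int, len(grouped))
	for key, values := range grouped {
		wg.Add(1)
		go func(key string, values []int) {
			defer wg.Done()
			r := reduceFn(key, values) // user code runs outside the lock
			mu.Lock()
			results[key] = r
			mu.Unlock()
		}(key, values)
	}
	wg.Wait()
	return results
}

// sumValues is an example reduce function that adds up all values.
func sumValues(_ string, values []int) int {
	total := 0
	for _, v := range values {
		total += v
	}
	return total
}

func main() {
	grouped := map[string][]int{"go": {1, 1, 1}, "map": {1}}
	results := runReducePhase(grouped, sumValues)
	fmt.Println(results["go"], results["map"]) // prints "3 1"
}
```

One goroutine per key is simple to read but can create a very large number of goroutines on inputs with many distinct keys; a bounded worker pool would be a natural refinement for larger workloads.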

For demonstration, this project includes a Word Count application, one of the canonical examples of MapReduce. The Map function splits text files into words and emits (word, 1) pairs, while the Reduce function sums all counts for each unique word. The framework reads files from the data/ directory, runs the map, shuffle, and reduce phases in sequence (with the map and reduce tasks executing concurrently), and writes the aggregated results to output/result.txt. This modular design allows users to replace the Map and Reduce functions with custom logic for other types of computation.
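
The Word Count functions described above might be written along these lines. This is a sketch under assumptions: the names `wordCountMap` and `wordCountReduce` are hypothetical, and words are split on whitespace only, with no lowercasing or punctuation handling, which the repository's version may treat differently.

```go
package main

import (
	"fmt"
	"strings"
)

// KeyValue is one intermediate pair (illustrative type, not the repo's).
type KeyValue struct {
	Key   string
	Value int
}

// wordCountMap emits a (word, 1) pair for every whitespace-separated
// word in the input. Case and punctuation are left untouched.
func wordCountMap(contents string) []KeyValue {
	words := strings.Fields(contents)
	pairs := make([]KeyValue, 0, len(words))
	for _, w := range words {
		pairs = append(pairs, KeyValue{Key: w, Value: 1})
	}
	return pairs
}

// wordCountReduce sums the counts collected for a single word.
func wordCountReduce(_ string, values []int) int {
	total := 0
	for _, v := range values {
		total += v
	}
	return total
}

func main() {
	pairs := wordCountMap("the quick the")
	fmt.Println(len(pairs))                          // prints "3"
	fmt.Println(wordCountReduce("the", []int{1, 1})) // prints "2"
}
```

With this splitting rule, "The" and "the" count as different words, as do "word" and "word,"; normalizing the input inside the Map function is the usual place to change that.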

The project is structured with a focus on clarity, modularity, and concurrency best practices. The internal/mr package contains the core MapReduce engine, including task execution, shuffling, and utility functions for output. The cmd directory hosts the example application demonstrating the framework in action. All files are designed with generic, reusable functions, making it easy to extend the framework for more complex or large-scale workloads in the future.

By building this project from scratch in Go, readers gain a deep understanding of the MapReduce lifecycle, including how tasks are scheduled, how intermediate data is shuffled, and how concurrency is handled safely and efficiently. It also serves as a practical guide to applying Go’s goroutines, channels, mutexes, and slices to solve real-world data processing problems in a clean and idiomatic way.

How to Run

To try out this MapReduce framework:

  • Clone the repository.
  • Add some text files inside the data/ directory.
  • Run the example Word Count program:
      go run cmd/main.go
  • View the results in the console or in the output file output/result.txt.

This project is ideal for anyone wanting to understand distributed data processing, concurrency in Go, and the architecture behind large-scale frameworks like Hadoop and Spark, all in a compact and educational implementation.

About

A concurrent MapReduce framework implemented in Go, inspired by Google’s MapReduce paper.
