How to design and reason about systems that run across multiple machines. Covers consistency, fault tolerance, replication, distributed consensus, and the trade-offs inherent in distributed system design. This is essential knowledge for building modern cloud-based and scalable applications.
Required:
- Computer Networking - Understanding network communication is fundamental
- Operating Systems - Concurrency and system-level concepts apply to distributed systems
Recommended:
- Databases - Many distributed systems concepts overlap with distributed databases
- Algorithms and Data Structures - Distributed algorithms build on fundamental algorithmic concepts
- Any modern language (Intermediate) - Go, Java, Python, or others are suitable for implementing distributed systems
- Focus on understanding distributed system concepts rather than language specifics
- Designing Data-Intensive Applications by Martin Kleppmann
  - Highly readable, practitioner-oriented treatment
  - Covers replication, partitioning, transactions, consistency, and distributed data systems
  - Widely recommended as a modern introduction to distributed systems
  - Level: Intermediate
  - ISBN: 978-1449373320
- Distributed Systems by Maarten van Steen and Andrew Tanenbaum (3rd Edition)
  - Traditional textbook approach
  - More formal than DDIA
  - Level: Intermediate
  - Available online
MIT 6.824: Distributed Systems
- Instructor: Robert Morris
- Institution: MIT
- Platform: MIT course site
- URL: Course site
- Description: Graduate-level course with excellent labs implementing distributed systems (MapReduce, Raft, key-value store)
- Challenging but highly regarded
- Readings include important papers in distributed systems
- MIT 6.824 readings - Essential papers in distributed systems
- Papers We Love - Distributed Systems - Community discussions of important papers
- Jepsen analyses - Real-world distributed system consistency analyses
- Source: MIT 6.824 labs
- Difficulty: Advanced
- Estimated time: 100-200 hours
- Topics: MapReduce, Raft consensus, fault-tolerant key-value store, sharded systems
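As a warm-up for the MapReduce topic listed above, the programming model can be sketched in a single process. This is only an illustration of the map, shuffle, and reduce phases using word count; it is not the lab's code, and the function names (`map_word_count`, `reduce_word_count`, `run_mapreduce`) are hypothetical. A real implementation distributes map and reduce tasks across workers and handles worker failure, which this sketch omits.

```python
from collections import defaultdict
from typing import Callable, Iterable


def map_word_count(document: str) -> list[tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_word_count(word: str, counts: Iterable[int]) -> int:
    """Reduce phase: combine all values emitted for a single key."""
    return sum(counts)


def run_mapreduce(
    documents: Iterable[str],
    map_fn: Callable[[str], list[tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], int],
) -> dict[str, int]:
    """Run map, shuffle, and reduce sequentially in one process."""
    # Shuffle: group intermediate values by key so each key is reduced once.
    grouped: dict[str, list[int]] = defaultdict(list)
    for document in documents:
        for key, value in map_fn(document):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_mapreduce(docs, map_word_count, reduce_word_count))
```

Keeping `map_fn` and `reduce_fn` as plain functions mirrors the idea that the framework owns scheduling and grouping while the application supplies only the two functions.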
- Source: Papers from MIT 6.824 reading list
- Difficulty: Intermediate to Advanced
- Topics: MapReduce, GFS, Raft, Spanner, Dynamo, and more
- Activity: Read papers and implement concepts
- Implement Raft consensus algorithm
- Build distributed key-value store
- Complete MIT 6.824 lab sequence
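Before starting the Raft project, it can help to see the leader-election voting rule in isolation. The sketch below follows the vote-granting conditions described in the Raft paper: reject stale terms, grant at most one vote per term, and only vote for a candidate whose log is at least as up to date. The class and function names (`RaftNode`, `handle_request_vote`, `has_majority`) are hypothetical, and the sketch assumes in-memory state only; persistence, election timers, RPC transport, and log replication are all left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RaftNode:
    """In-memory election state for one server (hypothetical, simplified)."""
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None
    last_log_index: int = 0
    last_log_term: int = 0

    def handle_request_vote(
        self,
        term: int,
        candidate_id: str,
        last_log_index: int,
        last_log_term: int,
    ) -> bool:
        """Return True if this node grants its vote to the candidate."""
        # Reject candidates from an older term.
        if term < self.current_term:
            return False
        # A newer term resets any vote cast in the previous term.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
        # The candidate's log must be at least as up to date as ours:
        # compare last log term first, then last log index.
        log_ok = (last_log_term, last_log_index) >= (
            self.last_log_term,
            self.last_log_index,
        )
        # Grant at most one vote per term (repeat requests from the same
        # candidate are answered consistently).
        if log_ok and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def has_majority(votes: int, cluster_size: int) -> bool:
    """A candidate wins with votes from a strict majority of the cluster."""
    return votes > cluster_size // 2


if __name__ == "__main__":
    followers = [RaftNode("b"), RaftNode("c"), RaftNode("d"), RaftNode("e")]
    # Candidate "a" votes for itself, then asks the other four servers.
    votes = 1 + sum(f.handle_request_vote(1, "a", 0, 0) for f in followers)
    print("elected:", has_majority(votes, cluster_size=5))
```

The strict-majority check is what prevents two leaders in the same term: any two majorities of the same cluster share at least one server, and that server votes only once per term.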
[Personal notes and key insights can be added here during study]
- Completed Designing Data-Intensive Applications
- Watched MIT 6.824 lectures
- Read essential distributed systems papers
- Completed MIT 6.824 labs or similar projects
- Review and reinforcement