Class Time and Place: T F 12:00pm - 1:50pm, Low 3051
Instructor: Stacy Patterson sep AT cs.rpi.edu
Office Hours: T F 2:00pm - 3:00pm or by appointment, Lally 301
TA: Tim Castiglia castit AT rpi.edu
TA Office Hours: W 1:00pm - 3:00pm, AE 127
This course explores the principles of distributed systems, emphasizing fundamental issues underlying the design of such systems: communication, coordination, synchronization, and fault-tolerance. We will study key algorithms and theoretical results and explore how these foundations play out in modern systems and applications like cloud computing, edge computing, and peer-to-peer systems.
Submitty course page
Project 0 - No due date
Your projects will evaluated on the CS department's servers using Docker containers. We will use Submitty for autograding of some test cases. Additional evaluation will be performed in live demonstrations of your code.
- Basic instructions for using Docker for this class are here.
- Instructions for managing a Docker Network are here
- Example knownhosts.json file
- Examples projects with build.sh and run.sh scripts: Python Java
Project 1 description - due Sunday, Oct 20, 2019 at 11:00pm
Project 2 description - due Tuesday, Nov 26, 2019 at 11:00pm
Some papers are behind a pay wall and can only be accessed from the RPI network.
(8/30/19) Time, clocks, and the ordering of events in a distributed system,
Leslie Lamport, Communications of the ACM, 1978.
The version of the algorithm presented in class can be found in Section 7 of Virtual time and global states of distributed systems, Friedemann Mattern, Parallel and Distributed Algorithms, 1989
A paper that presents a similar algorithm is Timestamps in Message-Passing Systems That Preserve the Partial Ordering, Colin J. Fidge, 1987. The paper by Fidge gives a slightly different algorithm and uses a different method to compare vector timestamps.
(9/10/19) Coulouris, et al. Sections 14.3
(9/13/19) Efficient solutions to the replicated log and dictionary problems, Gene T.J. Wuu and Arthur R. Berntsein, Principles of Distributed Computing, 1984.
(9/13/19) Intro to Mutual Exclusion. Coulouris, Sec. 15.2.
(9/20/19) A optimal algorithm
for mutual exclusion in computer networks, Glenn Ricart and Ashok K. Agrawala, Communications of the ACM, 1981.
Also see pages 561-562 in the first paper by L. Lamport
(9/27/19) A tree-based algorithm for distributed mutual exclusion, Kerry Raymond, ACM Transactions on Computer Systems, 1989.
(10/1/19) A √ N Algorithm for mutual exclusion in decentralized systems, Mamoru Maekawa, ACM Transactions on Computer Systems, 1985.
The Information Structure of Distributed Mutual Exclusion Algorithms, Beverly Sanders, ACM Transactions on Computer Systems, 1987
(10/4/19) Distributed snapshots: determining global states of distributed systems, K. Chandy and L. Lamport, ACM Transactions on Computer Systems, 1985.
(also Coulouris Sec. 14.5)
(10/21/19) Paxos Made Simple, L. Lamport, ACM SIGACT News, 2001.
The Part-Time Parliament, L. Lamport, ACM Transactions on Computer Systems, 1998.
(11/1/19) Concurrency Control and Recovery in Database Systems, Bernstein, Hadzilacos, and Goodman (2PC and 3PC in Chapter 7)
(11/8/19) Impossibility of Distributed Consensus with One Faulty Process, M. Fischer, N. Lynch, and M. Paterson, Journal of the ACM, 1985.
(11/15/19) The Byzantine Generals Problem, L. Lamport, R. Shostak, and M. Pease, ACM Transactions on Programming Languages and Systems, 1982.
(12/3/19) Perspectives on the CAP Theorem, S. Gilbert and N. Lynch, Computer, 2012
Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services, S. Gilbert and N. Lynch, ACM SIGACT News, 2002.
(12/3/19) Chord: a scalable peer-to-peer lookup protocol for Internet applications, I. Stoica, et al., IEEE/ACM Transactions on Networking, 2003.
(12/6/19) Dynamo: Amazon's Highly Available Key-value Store, G. DeCandia et al., Symposium on Operating Systems Principles, 2007.