Professor: | David Choffnes |
Room: | Behrakis Health Sciences Cntr 007 |
Time: | Mondays and Thursdays 11:45 AM-1:25 PM |
Office Hours: | By appointment |
Teaching Assistants: | N/A |
TA Email: | cs7780fa15@ccs.neu.edu |
Lab Hours: | N/A |
Class Forum: | On Piazza |
Paper List: | Here |
In today’s increasingly connected world, distributed systems permeate our daily lives. This course will cover fundamental principles and theory of distributed systems, and will discuss the design and implementation of systems from industry that incorporate them. This course will be hands-on with projects based on real-world systems. Key topics include understanding and managing concurrency, consistency/consensus, availability, partition/fault tolerance, time and logical clocks, scalability/performance, and security. We will discuss design and implementation concepts that include distributed file systems (e.g., NFS, HDFS), caches and distributed hash tables (e.g., Dynamo, Cassandra), distributed computing frameworks (MapReduce, Spark), overlay networks (BitTorrent, Bitcoin), content delivery networks (Akamai, Limelight), and data center architectures and protocols. Students will learn about these concepts and practices both from textbook/lecture material and from hands-on experience through projects that include building, deploying, and evaluating working distributed systems.
The basic requirement is that you are a Ph.D. student who has taken an upper-level networking and/or systems course, preferably both. Exceptions will be handled on a case-by-case basis.
Note for MS/Undergrad students: You may sign up for this course only with prior consent from the instructor (me!). To be considered for this class, you must be interested in research in the field and have done exceptionally well in my FCN course.
The class forum is on Piazza. Why Piazza? Because they have a nice web interface, as well as iPhone and Android apps. Piazza is the best place to ask questions about projects, programming, debugging issues, exams, etc. If you have questions during lecture, feel free to post them to the folder for that lecture. I will also use Piazza to broadcast announcements to the class. Bottom line: unless you have a private problem, post to Piazza before writing me an e-mail.
Meeting Date | Slides | Piazza Folder | Readings | Presenters | Comments |
---|---|---|---|---|---|
Sept. 10 | Welcome: Overview and Challenges Slides | #meeting1 | Overview: 1, 2, 3 | DRC | Join Piazza |
Sept. 14 | No class: Rosh Hashanah | ||||
Sept. 17 | Core Topics: CAP, Time/Clocks Slides | #meeting2 | Overview: 4, 5; Time: 3, 4 | DRC | |
Sept. 21 | Core topic: Consistency/Consensus: 2PC, Paxos, and in between Slides | #meeting3 | Consensus: 6, 9, 8; Consistency: 4 | DRC | |
Sept. 24 | NSDI Deadline | #meeting4 | Project proposal due Friday | ||
Sept. 28 | Core topic: Fault Tolerance: RAID, Erasure encoding, Byzantine Fault Tolerance | #meeting5 | Fault tolerance/Consistency: 3; File systems: 5, 9 | ||
Oct. 1 | Core topic: Availability: Redundancy, Distribution | #meeting6 | Fault tolerance / Consistency: 6, 7, 10 | ||
Oct. 5 | Application: Distributed and remote processing: RPC, MPI, MR, Spark | #meeting7 | RPC: 1; Distributed computation: 4, 2 | ||
Oct. 8 | Application: Distributed caches: Memcache | #meeting8 | Distributed Cache: 1, 2 | ||
Oct. 12 | No Class: Columbus Day | ||||
Oct. 15 | Application: DHTs: Dynamo, Chord, Kademlia | #meeting9 | Overlay/P2P: 2, 6; Consistency: 6 | ||
Oct. 19 | Field trip to NENS at BU | ||||
Oct. 22 | Application: Distributed File Systems: Early attempts (NFS, ...) | #meeting10 | File systems: 1, 2 | ||
Oct. 26 | Application: Distributed File Systems: Modern systems (HDFS, Spanner, f4 ...) | #meeting11 | File systems: 6, 7 | Mid-term report due | |
Oct. 29 | No Class: IMC | ||||
Nov. 2 | Guest Lecture | #meeting12 | TBD | ||
Nov. 5 | Application: Datacenter Networks: Topologies and augmentation | #meeting13 | DCNs: 1, 2 | ||
Nov. 9 | Application: The Internet: Routing, DNS | #meeting14 | Internet: 3, 5, 6 | ||
Nov. 12 | Application: CDNs: Akamai, Anycast | #meeting15 | CDN: 1, 4, 5 | ||
Nov. 16 | Field trip to DTL | #meeting16 | |||
Nov. 19 | Application: Privacy/Anonymity: Bitcoin, Tracking, Hiding | #meeting17 | Overlay/P2P: 5, 4, Herd | ||
Nov. 23 | Application: Overlays: Unstructured, Structured | #meeting18 | Overlay/P2P: 1, 8 | ||
Nov. 26 | No Class: Thanksgiving | ||||
Nov. 30 | Application: Management: SDNs and New Protocols | #meeting19 | Internet: 8; New environments: 1 | ||
Dec. 3 | Application: Security: Botnets, DDoS, PKI | #meeting20 | Overlay/P2P: 7, IMC PKI paper | ||
Dec. 7 | Presentations | #meeting21 | |||
Dec. 10 | Presentations | #meeting22 | |||
Dec. 14 | Reports due |
The focus of this course will be on lectures, presentation of research papers, discussion, and original research projects/presentations. Thus, I do not require that you get a textbook. However, a textbook may be useful if you are not totally comfortable with fundamentals, or if you just want to have a handy reference book. I will post suggestions soon.
As previously mentioned, a large component of this course will be reading important papers from the distributed systems research community. Some of these papers are classics: older, but instrumental in guiding the design of today's networks. Other papers will be more contemporary, and focus on improving existing networks, or even replacing them entirely. All the papers can be found here (Link TBD).
One to two papers will be assigned as reading before each meeting. Each paper will be presented by one student, after which we will have a discussion about questions raised during the presentation, the merit/impact of the work, and other comments inspired by the research.
During class, students may be called at random to answer questions about papers. Thus, although attendance in lectures is not required, if you get called and you are not present (or you haven't read the paper), then you are busted.
25% of your final grade will be based on participation in the form of giving presentations and participating in discussions.
Each student will take on an original research project that entails designing, building, evaluating, and/or measuring a distributed system. As you might imagine, the variety of topics that fit this description is vast. This project will culminate with a project presentation in front of your peers (20 minutes) and a writeup of results that is no shorter than 6 pages (in standard sig-alternate format). The project will be worth 75% of your grade: 10% for the proposal, 25% for the presentation, 40% for the writeup.
Students must provide the instructor with a project proposal no more than 2 pages in length, and must obtain approval from the instructor. Failure to do so will result in a zero for the project. The proposal must include: (1) a brief project summary, (2) a description of the problem being solved, (3) a brief outline of research questions you will answer during your work, (4) a short description of how you will address these questions, and (5) a timeline for completing the work described in (4).
I expect students to work alone on projects, but will make exceptions if the research requires a small group.
There will be no exams. Instead, your project presentation will serve as an oral exam.
Project proposal: | 10% |
Project presentation and writeup: | 25% and 40% |
Participation: | 25% |
To calculate final grades, I simply sum up the points obtained by each student (the points will sum up to some number x out of 100) and then use the following scale to determine the letter grade: [0-59] F, [60-62] D-, [63-66] D, [67-69] D+, [70-72] C-, [73-76] C, [77-79] C+, [80-82] B-, [83-86] B, [87-89] B+, [90-92] A-, [93-100] A. I do not curve the grades in any way.
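As an unofficial illustration of the arithmetic, here is a minimal Python sketch of the weighted sum and the letter-grade lookup. The weights and band boundaries come from this page; the function names, the 0-100 scale for each component, and the handling of edge cases are my assumptions for the example: a total of exactly 60 is counted as D-, and a fractional total falls into the band whose lower bound it has reached (so 62.5 is a D-).

```python
# Component weights in percent, as listed on this page.
WEIGHTS = {
    "proposal": 10,
    "presentation": 25,
    "writeup": 40,
    "participation": 25,
}

# (lower bound, letter) pairs, highest band first.
# Assumption: each band starts at its listed lower bound and 60 counts as D-.
SCALE = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]


def final_score(component_scores):
    """Weighted total out of 100; each component score is assumed to be 0-100."""
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Return the letter for a total score; anything below 60 is an F."""
    for lower_bound, letter in SCALE:
        if score >= lower_bound:
            return letter
    return "F"


# Hypothetical student: the scores below are made up for illustration.
example = {"proposal": 90, "presentation": 80, "writeup": 85, "participation": 100}
total = final_score(example)   # (900 + 2000 + 3400 + 2500) / 100 = 88.0
grade = letter_grade(total)    # 88 falls in the [87-89] band
```

No curve is applied at any step, matching the policy above.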
This is a pretty straightforward seminar. Please don't turn in your project late, or miss your presentation. If I can't grade you before the grade-submission deadline, I'll have to give you a zero.
Projects must be entirely the work of the students turning them in, i.e. you and your group members. Copying code or text from other students (past or present) or websites is strictly prohibited. If you have any questions about using a particular resource, ask the course staff or post a question to the class forum.
All students are subject to the Northeastern University Academic Integrity Policy. All cases of suspected plagiarism or other academic dishonesty will be referred to the Office of Student Conduct and Conflict Resolution (OSCCR).
Consequences of Violating Academic Integrity Policy
Someone may already have written a program that does part of what you'll need to do for your assignments. For your assignments, however, you are expected to write all of the source code yourself, without copying source code from any other program, even if there are programs out there that would allow you to copy their source code. You also should not post your work for others to obtain.
You may discuss problems with other students, but you should not share or show code to anyone other than your assigned partner.
You are responsible for keeping your code hidden from all other students. If you keep your local repository on our CCS servers, make sure that it is protected with mode 600 (readable and writable only by you). Leaving it group- or world-readable means that anyone can steal your work. Your home directory includes, by default, a directory called classes that is readable only by you. Put all your class work here. If you put class material in some unprotected directory, and somebody else copies it, you will be held responsible.
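If you want to check or fix permissions in bulk, something like the following Python sketch may help. It is not an official course tool: the function names are made up for this example, and it assumes a POSIX system. It sets files to mode 600 and directories to mode 700, because a directory needs the owner's execute bit to be entered at all. Note that mode 600 also clears the execute bit on scripts, so you may need to restore it on files you run directly.

```python
import os
import stat


def lock_down(root):
    """Make root and everything under it accessible only to the owner."""
    os.chmod(root, 0o700)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):   # do not follow symlinks elsewhere
                os.chmod(path, 0o700)      # directories need the execute bit
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                os.chmod(path, 0o600)      # owner read/write only


def exposed_paths(root):
    """Return paths under root that grant any group or world permission."""
    group_or_world = stat.S_IRWXG | stat.S_IRWXO
    exposed = []
    if os.lstat(root).st_mode & group_or_world:
        exposed.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            if os.lstat(path).st_mode & group_or_world:
                exposed.append(path)
    return exposed
```

For example, calling `lock_down(os.path.expanduser("~/classes/myproject"))` (a hypothetical path) and then confirming that `exposed_paths(...)` on the same path returns an empty list is a quick way to verify that nothing is left group- or world-readable.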