CS 7780 - Seminar on Distributed Systems

General Information

Professor: David Choffnes
Room: Behrakis Health Sciences Cntr 007
Time: Mondays and Thursdays 11:45 AM-1:25 PM
Office Hours: By appointment
Teaching Assistants: N/A
TA Email: cs7780fa15@ccs.neu.edu
Lab Hours: N/A
Class Forum: On Piazza
Paper List: Here

Course Description

In today’s increasingly connected world, distributed systems permeate our daily lives. This course will cover fundamental principles and theory of distributed systems, and will discuss the design and implementation of systems from industry that incorporate them. This course will be hands-on with projects based on real-world systems. Key topics include understanding and managing concurrency, consistency/consensus, availability, partition/fault tolerance, time and logical clocks, scalability/performance, and security. We will discuss design and implementation concepts that include distributed file systems (e.g., NFS, HDFS), caches and distributed hash tables (e.g., Dynamo, Cassandra), distributed computing frameworks (MapReduce, Spark), overlay networks (BitTorrent, Bitcoin), content delivery networks (Akamai, Limelight), and data center architectures and protocols. Students will learn about these concepts and practices both from textbook/lecture material and from hands-on experience through projects that include building, deploying, and evaluating working distributed systems.

Prerequisites

The basic requirement is that you are a Ph.D. student who has taken an upper-level networking and/or systems course, preferably both. Exceptions will be handled on a case-by-case basis.

Note for MS/Undergrad students: You may sign up for this course only with prior consent from the instructor (me!). To be considered for this class, you must be interested in research in the field, and have done exceptionally well in my FCN course.

Class Forum

The class forum is on Piazza. Why Piazza? Because they have a nice web interface, as well as iPhone and Android apps. Piazza is the best place to ask questions about projects, programming, debugging issues, exams, etc. If you have questions while in lecture, feel free to post them to the folder for that lecture. I will also use Piazza to broadcast announcements to the class. Bottom line: unless you have a private problem, post to Piazza before writing me an e-mail.

Schedule, Meeting Slides, and Assigned Readings/Presentations

Meeting Date | Topic | Slides | Piazza Folder | Readings | Presenters | Comments
--- | --- | --- | --- | --- | --- | ---
Sept. 10 | Welcome: Overview and Challenges | Slides | #meeting1 | Overview: 1, 2, 3 | DRC | Join Piazza
Sept. 14 | No class: Rosh Hashanah | | | | |
Sept. 17 | Core topic: CAP, Time/Clocks | Slides | #meeting2 | Overview: 4, 5; Time: 3, 4 | DRC |
Sept. 21 | Core topic: Consistency/Consensus: 2PC, Paxos, and in between | Slides | #meeting3 | Consensus: 6, 9, 8; Consistency: 4 | DRC |
Sept. 24 | NSDI Deadline | | #meeting4 | | | Project proposal due Friday
Sept. 28 | Core topic: Fault Tolerance: RAID, Erasure encoding, Byzantine Fault Tolerance | | #meeting5 | Fault tolerance/Consistency: 3; File systems: 5, 9 | |
Oct. 1 | Core topic: Availability: Redundancy, Distribution | | #meeting6 | Fault tolerance/Consistency: 6, 7, 10 | |
Oct. 5 | Application: Distributed and remote processing: RPC, MPI, MR, Spark | | #meeting7 | RPC: 1; Distributed computation: 4, 2 | |
Oct. 8 | Application: Distributed caches: Memcache | | #meeting8 | Distributed Cache: 1, 2 | |
Oct. 12 | No class: Columbus Day | | | | |
Oct. 15 | Application: DHTs: Dynamo, Chord, Kademlia | | #meeting9 | Overlay/P2P: 2, 6; Consistency: 6 | |
Oct. 19 | Field trip to NENS at BU | | | | |
Oct. 22 | Application: Distributed File Systems: Early attempts (NFS, ...) | | #meeting10 | File systems: 1, 2 | |
Oct. 26 | Application: Distributed File Systems: Modern systems (HDFS, Spanner, f4 ...) | | #meeting11 | File systems: 6, 7 | | Mid-term report due
Oct. 29 | No class: IMC | | | | |
Nov. 2 | Guest Lecture | | #meeting12 | TBD | |
Nov. 5 | Application: Datacenter Networks: Topologies and augmentation | | #meeting13 | DCNs: 1, 2 | |
Nov. 9 | Application: The Internet: Routing, DNS | | #meeting14 | Internet: 3, 5, 6 | |
Nov. 12 | Application: CDNs: Akamai, Anycast | | #meeting15 | CDN: 1, 4, 5 | |
Nov. 16 | Field trip to DTL | | #meeting16 | | |
Nov. 19 | Application: Privacy/Anonymity: Bitcoin, Tracking, Hiding | | #meeting17 | Overlay/P2P: 5, 4, Herd | |
Nov. 23 | Application: Overlays: Unstructured, Structured | | #meeting18 | Overlay/P2P: 1, 8 | |
Nov. 26 | No class: Thanksgiving | | | | |
Nov. 30 | Application: Management: SDNs and New Protocols | | #meeting19 | Internet: 8; New environments: 1 | |
Dec. 3 | Application: Security: Botnets, DDoS, PKI | | #meeting20 | Overlay/P2P: 7, IMC PKI paper | |
Dec. 7 | Presentations | | #meeting21 | | |
Dec. 10 | Presentations | | #meeting22 | | |
Dec. 14 | Reports due | | | | |

Textbook

The focus of this course will be on lectures, presentation of research papers, discussion, and original research projects/presentations. Thus, I do not require that you get a textbook. However, a textbook may be useful if you are not totally comfortable with fundamentals, or if you just want to have a handy reference book. I will post suggestions soon.

Reading and Participation

As previously mentioned, a large component of this course will be reading important papers from the distributed systems research community. Some of these papers are classics: older, but instrumental in guiding the design of today's networks. Other papers will be more contemporary, and focus on improving existing networks, or even replacing them entirely. All the papers can be found here (Link TBD).

One to two papers will be assigned as reading before each meeting. Each paper will be presented by one student, after which we will have a discussion about questions raised during the presentation, the merit/impact of the work, and other comments inspired by the research.

During class, students may be called at random to answer questions about papers. Thus, although attendance in lectures is not required, if you get called and you are not present (or you haven't read the paper), then you are busted.

25% of your final grade will be based on participation in the form of giving presentations and participating in discussions.

Projects

Each student will take on an original research project that entails designing, building, evaluating, and/or measuring a distributed system. As you might imagine, the variety of topics that fit this description is vast. This project will culminate with a project presentation in front of your peers (20 minutes) and a writeup of results that is no shorter than 6 pages (in standard sig-alternate format). The project will be worth 75% of your grade: 10% for the proposal, 25% for the presentation, 40% for the writeup.

Students must provide the instructor with a project proposal no more than 2 pages in length, and must obtain approval from the instructor. Failure to do so will result in a zero for the project. The proposal must include: (1) a brief project summary, (2) a description of the problem being solved, (3) a brief outline of research questions you will answer during your work, (4) a short description of how you will address these questions, and (5) a timeline for completing the work described in (4).

I expect students to work alone on projects, but will make exceptions if the research requires a small group.

Exams

There will be no exams. Instead, your project presentation will serve as an oral exam.

Grading

Project proposal: 10%
Project presentation: 25%
Project writeup: 40%
Participation: 25%

To calculate final grades, I simply sum up the points obtained by each student (the points will sum up to some number x out of 100) and then use the following scale to determine the letter grade: [0-59] F, [60-62] D-, [63-66] D, [67-69] D+, [70-72] C-, [73-76] C, [77-79] C+, [80-82] B-, [83-86] B, [87-89] B+, [90-92] A-, [93-100] A. I do not curve the grades in any way.
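As an illustration only, the calculation above can be sketched in a few lines of Python. The function and variable names here are hypothetical, and two details are assumptions rather than stated policy: a total of exactly 60 is treated as a D-, and fractional totals fall into the band whose lower bound they have reached (so 92.5 is an A-). The -100% cheating penalty is not modeled.

```python
# Weights taken from the grading breakdown (they sum to 100).
WEIGHTS = {
    "proposal": 10,
    "presentation": 25,
    "writeup": 40,
    "participation": 25,
}

# Lower bound of each letter grade, highest first.
# Assumption: exactly 60 points maps to D-, and a fractional total
# belongs to the band whose lower bound it has reached.
SCALE = [
    (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]


def total_points(fractions):
    """Sum weighted points; `fractions` maps component -> score in [0, 1]."""
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


def letter_grade(points):
    """Map a point total out of 100 to a letter grade, with no curving."""
    for lower_bound, letter in SCALE:
        if points >= lower_bound:
            return letter
    return "F"
```

For example, full credit on the proposal and participation, 80% on the presentation, and 90% on the writeup gives 10 + 20 + 36 + 25 = 91 points, which falls in the A- band.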

Late Policy

This is a pretty straightforward seminar. Please don't turn in your project late, or miss your presentation. If I can't grade you before the grade-submission deadline, I'll have to give you a zero.

Cheating Policy

Projects must be entirely the work of the students turning them in, i.e. you and your group members. Copying code or text from other students (past or present) or websites is strictly prohibited. If you have any questions about using a particular resource, ask the course staff or post a question to the class forum.

All students are subject to the Northeastern University Academic Integrity Policy. All cases of suspected plagiarism or other academic dishonesty will be referred to the Office of Student Conduct and Conflict Resolution (OSCCR).

Consequences of Violating Academic Integrity Policy

  1. All students who are caught cheating will be referred to the Office of Student Conduct and Conflict Resolution (OSCCR). Students who have been referred to OSCCR will be given the opportunity to accept responsibility for their infraction or to request a hearing before a student conduct board. If a student accepts responsibility, a minimum sanction of deferred suspension will follow. A second violation will meet with expulsion from the University.
  2. All students who are caught cheating will receive a -100% for the item on which cheating occurred. Cheating is worse than not turning in the item.

Someone may already have written a program that does part of what you'll need to do for your assignments. For your assignments, however, you are expected to write all of the source code yourself, without copying source code from any other program, even if there are programs out there that would allow you to copy their source code. You also should not post your work for others to obtain.

You may discuss problems with other students, but you should not share or show code to anyone other than your assigned partner.

You are responsible for keeping your code hidden from all other students. If you keep your local repository on our CCS servers, make sure that it is protected with mode 600 (readable and writable only by you). Leaving it group- or world-readable means that anyone can steal your work. Your home directory includes, by default, a directory called classes that is readable only by you. Put all your class work here. If you put class material in some unprotected directory, and somebody else copies it, you will be held responsible.
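One way to check and tighten permissions is sketched below in Python, assuming a POSIX system. The helper names `is_exposed` and `lock_down` are hypothetical. Note one assumption that goes beyond the text above: the sketch applies mode 600 to regular files but mode 700 to directories, because a directory needs the owner execute bit before even its owner can enter it.

```python
import os
import stat

# Any permission bit granted to the group or to others counts as exposure.
GROUP_OTHER_BITS = stat.S_IRWXG | stat.S_IRWXO


def is_exposed(path):
    """Return True if the group or others have any access to `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & GROUP_OTHER_BITS)


def lock_down(path):
    """Restrict `path` to its owner: 600 for files, 700 for directories."""
    # Assumption: directories get 700 so the owner can still traverse them.
    owner_only = 0o700 if os.path.isdir(path) else 0o600
    os.chmod(path, owner_only)
```

Calling `lock_down` on each file and directory in a repository, then confirming that `is_exposed` returns False for each, gives a quick check that nothing is left group- or world-readable.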