Research: David Choffnes, Ph.D.

Research Vision (last updated March 18, 2019)

My research approach is to combine science and engineering to understand and improve the performance, reliability, and security of Internet systems. With respect to science, I empirically measure computer systems that interact over the Internet to understand how well they match existing models and assumptions, then investigate the root causes of violations of those models and assumptions, which often leads to the design of new models. In many cases, our observations also suggest the design of systems that exploit previously unknown information about how our Internet-enabled systems work, and as an engineer I build and evaluate such systems in a way that other researchers, users, and policy makers can benefit from the result.

My research agenda embraces a broad view of Internet systems, comprising core Internet routers and interconnections, home networks, mobile and Internet of Things (IoT) devices, cellular networks, and infrastructure servers such as DNS, content delivery networks (CDNs), and web servers; further, the subtopics of performance, reliability, and security include latency, loss, quality of experience, availability, anonymity, privacy, and vulnerability analysis. A common thread throughout my research is to focus on topics that impact the user and identify novel, deployable solutions to address pressing problems in today’s Internet ecosystem. As such, I tend to develop user-facing systems, gather data from large-scale user studies, and focus on regions of the network where most activity occurs (e.g., mobile networks) or where there is great impact (e.g., core Internet routing and CDNs).

My research has been published at venues such as SIGCOMM, IMC, MobiSys, CCS, USENIX Security, IEEE S&P, NDSS, and WWW. Beyond publications, my research has found broad impact through film, popular press, television, technology demonstrations, bug fixes, legislative guidance, and evidence for regulators. Specifically, my ReCon project became the foundation for a documentary film, Harvest, that appeared at numerous prestigious film festivals in 2017 and was selected as Vimeo’s pick of the month. ReCon has helped developers find and fix more than 30 privacy and security vulnerabilities in apps (including Pinterest and Match), the Federal Trade Commission uses ReCon to investigate deceptive business practices that violate user privacy, and my team has collaborated with investigative journalists to identify shady behavior in popular apps. My Wehe project for identifying net neutrality violations attracted substantial attention in the popular press due in part to Apple’s initial attempt to reject our app. As part of the public backlash, dozens of articles were written about the topic internationally, and our tool has been downloaded and used by more than 100,000 users, collectively running more than 1 million tests. We are also under contract with the French national telecom regulator, Arcep, to provide Wehe as a technology for auditing net neutrality violations (which are illegal in France).

My research has been highly collaborative, including not only faculty and students at Northeastern, but also colleagues at UMass, USC, Columbia, Inria, Akamai, Duke, Maryland, Northwestern, MPI, EPFL, University of Helsinki, and LUMS. From my perspective, the best research often comes from multiple perspectives on the same problem, so I frequently seek collaborations where there is mutual interest.

In the sections below, I highlight a subset of the research areas I have explored while at Northeastern.

ReCon: Improving Privacy by Interposing on Network Traffic

In the ReCon project, we develop strategies to improve user privacy in mobile and IoT environments by interposing on network traffic to detect and mitigate personal information (PI) leaks. The goals of ReCon are as follows: (1) Accurately identify PI in network flows, without requiring knowledge of users’ PI a priori. (2) Improve awareness of PI leaks by presenting this information to users. (3) Automatically improve the classification of sensitive PI based on user feedback. (4) Enable users to change these flows by modifying or removing PI. To achieve the first three goals, we determine what PI is leaked in network flows using network trace analysis, machine learning, and crowd-sourced user feedback. We achieve the last goal by providing users with an interface to block or modify the PI shared over the network.
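As a rough illustration of the fourth goal, the sketch below replaces the values of flagged fields in a URL query string before the request leaves the device. It is not ReCon's implementation: the function name, the placeholder string, and the idea that a classifier hands over a set of flagged key names are all assumptions made for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redact_flagged_fields(url, flagged_keys, placeholder="REDACTED"):
    """Return `url` with the values of flagged query keys replaced.

    `flagged_keys` is a hypothetical set of lowercase key names that some
    upstream PI classifier (not shown here) has marked as sensitive.
    """
    parts = urlsplit(url)
    # Decode the query string into (key, value) pairs, keeping empty values.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    cleaned = [
        (key, placeholder if key.lower() in flagged_keys else value)
        for key, value in pairs
    ]
    # Re-encode the query and rebuild the URL with everything else unchanged.
    return urlunsplit(parts._replace(query=urlencode(cleaned)))


if __name__ == "__main__":
    leaky = "http://tracker.example/collect?email=a%40b.com&v=2"
    print(redact_flagged_fields(leaky, {"email"}))
```

A real interposition system would also need to handle request bodies, headers, and encrypted traffic; this sketch covers only the query string of a plaintext URL.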

Our work in this area has led to publications at MobiSys, IMC, NDSS, and PETS. Namely, we found that machine learning is an effective tool to identify privacy leaks across multiple mobile platforms and those leaks can have serious privacy implications, that the kind of information leaked by apps and websites differs even for the same exact service from the same company, and that over time the number of leaks and parties receiving this data is growing, making privacy worse. In a study that recently appeared in PETS, we identified leaks of media (non-text) data such as images and videos, and discovered a new, critical, and exploited vulnerability in how Android permits third-party libraries to capture screen contents. This project is supported by grants from the Department of Homeland Security, the Data Transparency Lab, and the Comcast Innovation Fund. This research led to responsible disclosure of more than 30 security and privacy vulnerabilities in apps, was used by hundreds of volunteers in our user study, became the basis for an acclaimed short documentary film Harvest, and appears in numerous articles and popular news outlets.

In ongoing work, we are applying these ideas to the IoT environment using a unique testbed at Northeastern, the Mon(IoT)r Lab, which provides a fully functional studio apartment environment (sans bedroom and bathroom). The key challenges we are tackling include how to identify PI contained in encrypted flows without decrypting them, how to reliably identify privacy risks from statistical patterns in network traffic from IoT devices, and how to intercept and modify network traffic to improve privacy without loss of functionality.

A Principled Approach to Identifying Traffic Differentiation

In this line of research we have developed a systematic approach to understanding, exposing, and evading traffic differentiation policies, e.g., those that violate net neutrality. Our key insights are that identifying such policies requires reproducing application-generated traffic, that there is a need for rigorous statistical techniques that flag differentiation correctly even in noisy wireless environments, and that it is possible to systematically reverse engineer the devices that implement differentiation and use this to develop targeted, efficient strategies to evade them. This work has led to four IMC papers, a PAM paper, and a SIGCOMM workshop publication. It also produced the Wehe app, which has garnered more than 100,000 users running 1 million tests, and is being used to audit net neutrality violations worldwide.
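To make the statistical step more concrete, here is a minimal sketch of one way to compare two sets of throughput samples. The text above does not name a specific test, so the two-sample Kolmogorov-Smirnov test used here is an assumption chosen for illustration. The sketch also assumes, hypothetically, that one set of samples comes from a replay of recorded application traffic and the other from a control replay that a classifier should not recognize. The function names and the asymptotic critical value are likewise illustrative choices.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples `a` and `b`."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # Both empirical CDFs are step functions that change only at data points,
    # so checking every observed value is enough to find the largest gap.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / n
        cdf_b = bisect_right(b_sorted, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    return d


def looks_differentiated(original, control, alpha=0.05):
    """Flag the pair if the KS statistic exceeds the asymptotic threshold.

    The threshold is the standard large-sample approximation; for small
    sample sizes it is only a rough guide.
    """
    n, m = len(original), len(control)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(original, control) > critical


if __name__ == "__main__":
    # Made-up throughput samples in Mbps, for illustration only.
    original = [1.4, 1.5, 1.5, 1.6, 1.5, 1.4, 1.6, 1.5, 1.5, 1.4]
    control = [5.1, 6.0, 7.2, 6.5, 5.8, 6.9, 7.0, 5.5, 6.2, 6.8]
    print(looks_differentiated(original, control))
```

A single test like this can still be fooled by noisy wireless conditions, so in practice one would likely repeat the measurements and require consistent results before reporting differentiation.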

In our initial work, we developed statistical techniques and empirically validated approaches for reliably identifying traffic differentiation and proxies in mobile networks. We identified several cases of differentiation that were forbidden under the 2015 Open Internet Order, and that disappeared soon after the order took effect. However, later that same year T-Mobile introduced a program for zero-rating and throttling certain video traffic, and we conducted an in-depth study of how they did so and what the risks of their approach were (both from false positives and negatives). We subsequently generalized our approach for reverse engineering T-Mobile’s traffic-classification rules and found that other providers use similarly brittle rules. Most recently, we leveraged what we learned from our studies of traffic-classifying middleboxes to develop highly efficient, unilateral approaches for evading such devices.

This research has been funded by the National Science Foundation, a Google Research Award, a sponsored project with Verizon Labs, and a contract with Arcep (the national telecom regulator in France). A key product of our research is our Wehe app, which allows average users to test for net neutrality violations from their phone. It has tests from more than 100,000 users, received substantial attention in the press in early and late 2018, and has been selected by Arcep to be the tool of choice for auditing net neutrality violations in France (where such violations are illegal). I have also testified to the Massachusetts state legislature to report on our findings and on how to craft legislation that encourages net neutral behavior, explained our findings to the FCC in person and via a public comment, and advised Ofcom (the telecom regulator in the UK) and the Massachusetts Department of Telecommunications and Cable.

Towards Strong Anonymity in Computer Networks

In this line of research, we demonstrate that there exist anonymity network designs that are resilient to traffic analysis and that exhibit an acceptable benefit-cost ratio under some set of realistic assumptions. In our work, we leverage three key ideas to build anonymity networks that are low-cost, high-performance, and resistant to traffic analysis. First, we combine trusted infrastructure (for mixing traffic) with untrusted P2P nodes (for scalability). Second, we incorporate the notion of zones to give users the ability to select the jurisdiction in which they trust their proxies to run. Third, instead of using a one-size-fits-all solution, we leverage empirically observed properties of communication workloads to design optimized anonymity networks that provide acceptable performance/cost trade-offs.

This line of research, which is a collaboration with the Max Planck Institute for Software Systems (MPI-SWS) and the École Polytechnique Fédérale de Lausanne (EPFL), has resulted in two SIGCOMM publications. We are currently investigating how to build systems that permit anonymous communication during an Internet blackout (e.g., those used for censorship) and those that enable real-time, strongly anonymous group audio and video calls. This research is funded by the National Science Foundation.

Personal Virtual Networks

Our mobile devices (e.g., cellphones) regularly encounter and connect to multiple networks to maintain seamless communication, enabling the variety of services we increasingly rely on, such as Internet access, text messaging, social media access, and entertainment. However, such ubiquitous network connections raise a number of important concerns. For example, our devices regularly send data over networks they do not fully trust and that are not under our control, which can lead to security vulnerabilities, poor service, and privacy violations, many of which are described in the sections above. I argue that instead of focusing on point solutions to these problems, there is an opportunity to rethink how we connect to and interact with the networks that provide us with Internet connectivity. Specifically, I propose developing personal virtual networks, or PVNs, that provide each device with its own network within a network provider that can provide customizable security and privacy.

In my HotNets paper, I introduced the idea of ubiquitous PVNs that can be configured to provide privacy, security and performance across untrusted heterogeneous networks. PVNs will allow devices to deploy their own trusted network configurations inside of network providers, define policies for network traffic, and even deploy limited code that interposes on their traffic using a software middlebox environment. In short, PVNs provide the device with a single user-defined and user-controlled network configuration, wherever the device happens to connect. By making a network provider’s in-network resources available to devices via a secure and flexible interface, PVNs can enable more secure, flexible, private, and performant network experiences for users.