The Internet has grown to reach a wide range of devices such as phones,
cars and televisions, but our ability to measure, optimize and evolve
these increasingly complex networks has not kept pace. As a result, we
have networks that continue to experience frequent failures and deliver
I posit that a central reason for these persistent problems is that it is difficult to gain visibility into network behavior at the expanding edges of the Internet, leaving most researchers with unrepresentative modeling, simulation and test-bed experiments [CoNEXT 2009, CCR 2010, SIGCOMM 2011]. My research aims to address these limitations by focusing on the following challenges: 1) recruiting end users to participate in large-scale Internet measurement efforts, 2) using these end-system measurements to understand the issues limiting performance and reliability and 3) developing solutions to address them.
First, my work crowdsources measurement and performance evaluation of Internet systems by deploying software to tens or hundreds of thousands of users. To date, more than 1 million users have installed software produced from my research. With users' permission, I collect measurement data from this software to understand network performance at the Internet's edges.
Second, my research uses this large-scale empirical data to identify Internet performance and reliability issues that impact a large portion of the population. My work has focused on problems with cross-ISP traffic in P2P systems, network performance degradation for users at the edge of the network and Internet outages affecting content and service providers.
Third, my work addresses these issues via software systems that users and network administrators can immediately deploy inside their networks. Below, I discuss how I developed a highly scalable system for reducing costs and improving performance in peer-to-peer (P2P) systems. Next, I describe how I crowdsourced network monitoring to identify Internet performance problems and automatically repaired such problems using existing network protocols in unintended ways.
Last, I discuss my recent work focusing on crowdsourcing monitoring of mobile networks. I describe my current vision for building a research platform that aims to improve transparency and control for mobile networking traffic.
Over the past decade, the peer-to-peer (P2P) model for building distributed systems has enjoyed incredible success and popularity, forming the basis for a wide variety of important Internet applications such as file sharing, voice-over-IP (VoIP) and video streaming. This success has not been universally welcomed. Internet Service Providers (ISPs) and P2P systems, for example, have developed a complicated relationship that has been the focus of much media attention. While P2P bandwidth demands have yielded significant revenues for ISPs, as users upgrade to broadband for improved P2P performance, P2P systems are also one of their greatest and costliest traffic engineering challenges because peers establish connections largely independently of Internet routing. To address these issues, I developed Ono, an extension to a popular BitTorrent client that biases P2P connections to avoid many of these costs without sacrificing -- and potentially improving -- BitTorrent performance [SIGCOMM 2008].
The Ono software, which has been installed more than 1,000,000 times, provides an alternative approach to the unsustainable cat-and-mouse game where ISPs would block P2P traffic and P2P software would circumvent these measures. In addition to addressing the problem of cross-ISP traffic in P2P systems, Ono provides a clear instance of reusing information made available by existing long-running services [SIGCOMM 2006]. In this case, we used dynamic CDN redirections as hints regarding network proximity: if two peers are sent to the same CDN replica servers, they are likely to be close to those servers, and by transitivity, close to each other [ICDCS 2008]. Finally, by providing an immediately deployable system that locates nearby peers to improve P2P users' performance, we showed that the right user incentives are essential for a successful approach to reducing costs for ISPs.
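To make the proximity hint concrete, the following is a minimal sketch of how two peers' CDN redirections could be compared. It assumes each peer records which replica servers it was redirected to and summarizes them as a frequency map, and it uses cosine similarity as the comparison. The function names, the sample replica names and the threshold value are illustrative assumptions, not a description of the deployed Ono code.

```python
from collections import Counter
from math import sqrt


def ratio_map(redirections):
    """Summarize observed replica-server names as a frequency map."""
    counts = Counter(redirections)
    total = sum(counts.values())
    return {server: n / total for server, n in counts.items()}


def cosine_similarity(a, b):
    """Cosine similarity of two sparse maps: 0 for disjoint, 1 for identical."""
    dot = sum(a[s] * b[s] for s in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def likely_nearby(a, b, threshold=0.15):
    """Treat two peers as close if their maps overlap enough.

    The threshold is a hypothetical value chosen for illustration.
    """
    return cosine_similarity(a, b) >= threshold


# Hypothetical observations: peers A and B share replicas, peer C does not.
peer_a = ratio_map(["replica-1", "replica-1", "replica-2"])
peer_b = ratio_map(["replica-1", "replica-2", "replica-2"])
peer_c = ratio_map(["replica-9"])
print(likely_nearby(peer_a, peer_b), likely_nearby(peer_a, peer_c))
```

A peer could use such a comparison to prefer connections to peers whose redirection behavior resembles its own, without issuing any additional network probes.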
Despite decades of improvement in performance and reliability, the public Internet remains insufficiently robust to support critical functionality such as remote health care monitoring. This occurs because today's Internet consists of multiple layers of protocols and services that occasionally enter conflicting states, preventing the normal flow of network traffic between endpoints. There is a need to detect, isolate and determine the root causes of these network events so operators can resolve them in a timely manner, minimizing their impact on revenue and reputation.
A central challenge is how to quickly and accurately detect that a network problem has occurred. Most network problems occur near the edges of the network; however, they are largely invisible to network operators. The reason is that most existing approaches to monitoring simply do not scale to the large number of network elements at the edge. My thesis work proposed that the most effective way to detect network problems is by monitoring the end systems where the services are used [SIGCOMM 2010]. This approach detects network performance problems by crowdsourcing network monitoring, achieving scalable, real-time network coverage by pushing monitoring to end systems at the network edge. I used probability theory, extensive traces from BitTorrent users and ground-truth information from ISPs to design and build a system that detects network problems effectively, quickly and reliably. Its current implementation for BitTorrent, called the Network Early Warning System (NEWS), has been installed more than 50,000 times.
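One way probability theory can help separate real network problems from noise is to ask how likely it is that several end systems in the same network would report a local performance drop at the same time purely by coincidence. The sketch below is a simplified illustration of that idea, not the NEWS implementation: it assumes peers detect local events independently with a known per-window probability, and the parameter names and the significance level are hypothetical.

```python
from math import comb


def prob_coincidence(n_peers, k_detecting, p_local):
    """Probability that at least k of n independent peers report a local
    event in the same time window by chance (binomial upper tail)."""
    return sum(
        comb(n_peers, i) * p_local**i * (1 - p_local) ** (n_peers - i)
        for i in range(k_detecting, n_peers + 1)
    )


def is_network_event(n_peers, k_detecting, p_local, alpha=0.01):
    """Flag a network-wide event when coincidence is an unlikely explanation.

    alpha is an illustrative significance level, not a value from the source.
    """
    return prob_coincidence(n_peers, k_detecting, p_local) < alpha


# Hypothetical case: 20 peers in one network, each with a 5% chance of a
# spurious local detection per window.
print(is_network_event(20, 8, 0.05))  # many concurrent reports
print(is_network_event(20, 1, 0.05))  # a single report
```

Under these assumptions, a single report is easily explained by chance, whereas many concurrent reports from the same network are not, which is what makes corroboration across end systems useful.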
Another key challenge is how to automatically fix network problems once they have been detected. My postdoc work (in collaboration with researchers at UW, Georgia Tech and UPMC Sorbonne) focused on cases where Internet destinations are unreachable from service and content providers. Unfortunately, the Internet does not currently provide a standard way to notify other networks when an ISP is not forwarding traffic properly. Ideally, we would like to send a message telling all other ISPs in the Internet to avoid a broken network link. To approximate this "avoid" message, we use the standard Border Gateway Protocol (BGP) in an unintended way. We showed that BGP's built-in loop prevention functionality can be exploited to force ISPs to avoid a broken network link, and this allowed us to restore connectivity to previously unreachable networks in a large number of cases. Having demonstrated the effectiveness of this approach, we suggested a new protocol message for future versions of BGP [HotNets 2011, SIGCOMM 2012].
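The loop prevention rule can be illustrated with a toy model. In BGP, a network discards any route announcement whose AS path already contains its own AS number. If the origin deliberately inserts the AS number of a misbehaving network into the path it announces, that network rejects the route and stops carrying traffic toward the origin, so other networks must reach the origin another way. The sketch below models only this acceptance rule; the function names and the private-use AS numbers are illustrative assumptions, and real route selection and propagation are far more involved.

```python
def accepts_route(local_asn, as_path):
    """BGP loop prevention: reject a route whose AS path contains our own ASN."""
    return local_asn not in as_path


def poisoned_path(origin_asn, avoid_asn):
    """AS path the origin announces to make avoid_asn reject the route.

    The origin's ASN is kept at both ends so the route still appears to
    originate from the origin network.
    """
    return [origin_asn, avoid_asn, origin_asn]


def ases_accepting(candidate_asns, as_path):
    """Return the ASes that would accept an announcement with this path."""
    return [asn for asn in candidate_asns if accepts_route(asn, as_path)]


# Hypothetical topology using private-use AS numbers.
origin, broken = 65001, 65002
others = [65002, 65003, 65004]

print(ases_accepting(others, [origin]))                       # normal announcement
print(ases_accepting(others, poisoned_path(origin, broken)))  # poisoned announcement
```

With the normal announcement every network accepts the route; with the poisoned one, only the network to be avoided drops it.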
Internet performance and reliability in mobile systems (e.g., smartphones) are at least a decade behind their fixed-line counterparts. I posit that many problems in this area persist due to a lack of visibility into what network traffic is generated by mobile devices and a lack of control over this traffic.
The key challenge here is that most mobile devices operate in a closed, locked-down environment that encompasses apps, operating systems and the mobile network carrying Internet traffic. To address this challenge, I propose building a system that redirects all mobile-device network traffic to a server outside the carrier's network, thus providing a point of control where one can characterize, modify or block this traffic before sending it to the intended destination.
I argue that we can do this today without any additional support from devices or carriers: the key idea is to combine software middleboxes (e.g., packet filters and proxies) with virtual private networks (VPNs), an approach I call Meddle. Meddle exploits widespread support for VPN tunnels to redirect mobile traffic for nearly all devices and networks. Meddle uses VPNs to capture a continuous and comprehensive view of how mobile devices interact with the Internet. Once packets arrive at a Meddle VPN server, we use a variety of middlebox approaches to interpose on mobile-device traffic.
I am currently leading a collaborative effort with researchers at UW, UC Berkeley and INRIA (along with support from Google) to explore opportunities enabled by Meddle. We are recruiting users to contribute to an IRB-approved study with the goal of characterizing mobile network traffic and experimenting with new techniques for modifying traffic to improve performance, security and reliability.