formal methods, programming languages,

Making Networks Safe and Agile with Formal Methods and Programming Abstractions: Future Directions

Nate Foster Nate Foster View Nov 27, 2023 · 8 mins read
Arvind Krishnamurthy Arvind Krishnamurthy View Nov 27, 2023 · 8 mins read
Ratul Mahajan Ratul Mahajan View Nov 27, 2023 · 8 mins read
Todd Millstein Todd Millstein View Nov 27, 2023 · 8 mins read
David Walker David Walker View Nov 27, 2023 · 8 mins read
Anduo Wang Anduo Wang View Nov 27, 2023 · 8 mins read
Pamela Zave Pamela Zave View Nov 27, 2023 · 8 mins read
Making Networks Safe and Agile with Formal Methods and Programming Abstractions: Future Directions

For years, networks have been seen as hard to manage and hard to evolve. They are hard to manage because even small networks are complex, with multiple devices and protocols interacting to provide basic services, and human operators usually have few tools to help them reason about end-to-end behavior. They are hard to evolve because true innovation usually requires buy-in from vendors and standards bodies, and researchers and practitioners often do not have the stamina to wait the 3-4 years required to go from idea to fruition.

In recent years, however, research at the intersection of formal methods, programming languages, and networking has enabled researchers and practitioners to experiment with new architectures and abstractions. In a nutshell, this research enables operating on a different trade-off curve between agility (i.e., how quickly can you change a network) and safety (how confident can you be that the changes made are correct). These ideas have been developed in the context of software-defined networking (SDN), programmable data planes (P4, eBPF, etc.) as well as conventional networks (e.g., distributed routing protocols, fixed-function devices, etc.)

While there has been admirable progress over the past decade, many hard challenges remain–challenges that only approaches based on formal methods and programming languages can address. For agility, efforts like SDN, P4, eBPF, etc. provide the basic building blocks for programmable networks, but they largely leave open the question of what software architectures, languages, and APIs we should actually use to specify and implement real-world functionality. A lack of investment in this area risks: a return to ossification, fragmenting the community as proprietary abstractions get entrenched, and networks again become complex monoliths that are hard to program, stifling innovation. This concern is especially important in emerging areas like SmartNICs as well as network support for ML and AI, where hardware vendors are currently engaged in intense competition to dominate the marketplace.

For safety, experience has shown that even the most sophisticated tech companies (i.e., hyperscalers) struggled to provide the kind of reliability they need to support their businesses. There are growing efforts to apply ideas from formal methods and software engineering to networking – e.g., most cloud companies now have network verification teams (in some form) and there have also been startups in this space (Intentionet, Forward, VeriFlow, etc.) But it’s still early days. Most of these tools only operate in certain niches, which limits their applicability and hinders transfer of ideas between researchers working on similar topics. To truly move the community forward, we need new approaches and the research community needs access to representative scenarios and benchmarks.

The rest of this document sketches four four directions for future research in the area.

Modular design and reasoning

The global network is composed of many overlapping domains. These include not just traditional ISPs and enterprise networks but also virtual networks that span physical and (multiple) clouds, IoT networks, telco networks with different standards, etc. Despite this compositional structure, the perspective of almost all current tools for network synthesis and verification tends to be that of a single (homogeneous) domain. This makes verification and synthesis an all-or-nothing proposition—it does not admit fast local checking and enforcement of the local rules that lead to strong end-to-end properties. As developers update and maintain components of a network, they get no feedback about the effects of changes on neighboring components.

A failure to recognize the modular structure also makes it hard to scale verification and develop robust specifications. A “domain” or “virtual network” has its own purpose and abstractions, as well as its own network design, including namespace, topology, routing, and session protocols. Only with recognition of these specialized concepts is it possible to write strong specifications of network services as experienced by their users. Further, the only way to ensure that lower levels of the network do not interfere unnecessarily with applications is to properly separate their concerns.

We should make modularity and composition first-order concerns in the next generation of synthesis and verification tools. This research will require defining interfaces between domains and, while reasoning about one domain, handling uncertainty in the behavior of other domains. What information should such interfaces contain? How should it be expressed? Modular and compositional reasoning can facilitate proofs of end-to-end reachability, security, reliability, and performance guarantees across heterogeneous domains or systems and formalize how those domains depend on each other. In the control plane, they can help deliver greater reliability, fault isolation, and a means to support richer policies and algorithms.

Along with leveraging the modular structure in specification and verification, we should be encouraging modular structure in network implementations. Even without verification, it can lead to simplification, reuse, and increased reliability. We should also investigate how modularity can be exploited to solve practical problems experienced by network operators and application developers, such as untangling the “tangle of tunnels” in IP networks.

Verifying modern control planes

Network control planes used to (primarily) be the configuration of distributed routing protocols, and they used to be self-contained with no external dependencies. But with the rise of SDN and network virtualization, the network control planes have gotten richer and depend on many other systems (e.g., for performance and demand data) to determine how the network forwards traffic. In some cases, these dependencies are circular as well, where a distributed database depends on the network, and the network depends on the database service. Such control planes, while highly flexible, can pose a serious threat to network reliability, as was demonstrated by the global Facebook outage in Oct 2021 that lasted for over seven hours and other large, cascading failures.

Most current verification technology has not targeted such modern control planes. Doing so effectively will require carefully formalizing interfaces between components and their interactions. The modular reasoning abilities mentioned earlier are a prerequisite here. Once we have the dependencies formalized, we can verify that no combination of K failures can cascade to a system-wide outage. We can also verify that the core control plane, which directly interacts with the network devices, is statically stable – that is, it can function (in degraded mode) even if all external dependencies are missing and it can recover from disasters.

Accessible network testing

While network verification has had undeniable success inside the largest of networks, such as those of hyperscalers, its penetration beyond them is still fairly limited. We think this state of affairs is down to the difficulty of using the current technologies for most network engineers. This difficulty stems from a host of factors, including (1) the lack of precise specifications that verification tools need, (2) mismatch in user expertise (e.g., network engineers cannot be expected to write logic queries), and (3) verification tools being conceptually different from traditional tools such as traceroute and ping (e.g., reasoning about large sets of packets instead of concrete packets). To make the vast majority of networks more reliable, we need to build tools that are easy to use for 95% of engineers building non-hyperscale industrial networks.

We should take inspiration from the software domain. Here, while verification is still not commonplace, many capabilities to improve the reliability of software have become commonplace over the last two decades. These include type-safe languages such as Rust and Java, test frameworks such as junit, and programming environments that make it easy to employ such capabilities. We believe that networking can take a similar journey, where we turn focus on the accessibility of automatic testing instead of focusing solely on verification technologies that can provide strong safety guarantees. Example explorations: What does a type system for networks look like? How can we develop a test framework for networks that enables both proactive (before changes are pushed to the network) testing and continuous monitoring?

Thriving ecosystem of networking languages

A language-oriented approach to designing networks has many advantages. Languages with precise semantics as well as strong abstraction and composition features facilitate communication, reuse, and reorganization of ideas and algorithms. These ideas may be shared between developers and organizations, and accelerate network development. P4 is a good example, and it has enabled a broad range of hardware and application innovation. But there is significant unexplored territory, not just for languages that describe packet forwarding inside a switch but other aspects of networks such as the control plane, network-wide forwarding, middleboxes, access control, and layer7 networks such as those built using service meshes.

The research agenda here goes beyond developing surface language for different tasks. It includes developing one or more intermediate languages (ILs) that provide a common representation for multiple surface languages, in the same way, LLVM provides a platform for compiling and analyzing many general-purpose languages. Such ILs will provide a common basis for verification and optimization, without needing to do this work for each surface language.

Further, to facilitate verification and synthesis, it is usually helpful for languages to be designed with these goals in mind from the get-go. Languages facilitate the verification process through “correct by construction design”, which eliminates failure modes. Linguistic features such as embedded assertions, constraints, types, proofs, and symbolic values allow programmers to write, and more importantly, maintain proofs as they construct and maintain, synthesize, or re-synthesize programs. Equally important are the development environments in which programmers operate. Such environments can facilitate code construction and analysis of all kinds, as well as maintain proof state.

This article is a byproduct of discussions amongst the authors in the context of NSF Community Workshop on Long-term Research Directions in Wired Networking

Nate Foster
Author Nate Foster Follow
Nate Foster is a Professor of Computer Science at Cornell University and a Platform Architect at Intel. The goal of his research is to develop languages and tools that make it easy for programmers to build secure and reliable systems. His current work focuses on the design and implementation of languages for programming software-defined networks. In the past he has also worked on bidirectional languages (also known as “lenses”), database query languages, data provenance, type systems, mechanized proof, and formal semantics. He received a PhD in Computer Science from the University of Pennsylvania, an MPhil in History and Philosophy of Science from Cambridge University, and a BA in Computer Science from Williams College. His awards include a Sloan Research Fellowship, an NSF CAREER Award, the SIGCOMM Rising Star Award, a Most Influential POPL Paper Award, a most Influential ICFP Paper Award, a Tien ‘72 Teaching Award, several Google Research Awards, a Yahoo! Academic Career Enhancement Award, a Cornell Engineering Research Excellence Award, and the Morris and Dorothy Rubinoff Award.
Arvind Krishnamurthy
Author Arvind Krishnamurthy Follow
Arvind Krishnamurthy is the Short-Dooley Professor in the Paul G. Allen School of Computer Science & Engineering. His research interests are in building effective and robust computer systems in the context of both data centers and Internet-scale systems. More recently, his research has focussed on programmable networks and systems for machine learning. He is an ACM fellow, a past program chair of ACM SIGCOMM and Usenix NSDI, is the Vice President of Usenix, and serves on the ICSI and CRA boards.
Ratul Mahajan
Author Ratul Mahajan Follow
Ratul Mahajan is an Associate Professor at the University of Washington (Paul G. Allen School of Computer Science) and a Co-founder and CEO of Intentionet, a company that is enabling safe, rapid evolution of computer networks using formal analysis and high-level design approaches. Prior to that, he was a Principal Researcher at Microsoft Research. He got his PhD at the University of Washington and B.Tech at Indian Institute of Technology, Delhi, both in Computer Science and Engineering.
Todd Millstein
Author Todd Millstein
Todd Millstein is a Professor of Computer Science at UCLA, as well as a Co-founder and Chief Scientist at Intentionet. He has also been an Academic Visitor at Oxford University, a Visiting Researcher at Microsoft Research, and a Visiting Fellow at Princeton University. Todd received a Ph.D. from the University of Washington and an A.B. from Brown University, both in Computer Science.
David Walker
Author David Walker
David Walker is a Professor of Computer Science at Princeton University. He received his doctoral and master’s degrees in computer science from Cornell, and his bachelor’s from Queen’s University in Kingston, Ontario. During sabbaticals from Princeton, he has served as a visiting researcher at Microsoft Research in Redmond (2008) and in Cambridge (2009), and as Associate Visiting Faculty at the University of Pennsylvania (2015-2016).
Anduo Wang
Author Anduo Wang
Anduo Wang is an Associate Professor at Temple University, where she specializes in improving network state management using formal methods, databases, knowledge representation and reasoning, and logic programming. She received her Ph.D. from UPenn in 2013 and was a postdoctoral researcher at UIUC with Matthew Caesar and Brighten Godfrey before joining Temple University in 2016. Anduo has been the recipient of three NSF grants as a single-PI investigator, including the CRII in 2017, CNS core small in 2019, and CAREER in 2022. She has served in various networking venues, including as the General Chair for the ACM SOSR conference in 2020 and as a member of the ACM SIGCOMM Executive Committee as the SIGCOMM Information Services Director from 2017 to 2021.
Pamela Zave
Author Pamela Zave
Pamela Zave is a researcher in the Computer Science Department of Princeton University. She received a PhD degree in computer sciences from the University of Wisconsin--Madison and an AB degree in English from Cornell University. She spent 36 years in the research divisions of Bell Labs and AT&T Labs, doing interdisciplinary work at the intersection of formal methods and network services. Dr. Zave is an ACM Fellow, an AT&T Fellow, and the 2017 recipient of the IEEE Harlan D. Mills Award for the development of sound software-engineering theory and sustained, impactful applications to practice. In addition to her research publications, she holds 32 patents.