Adaptive Learned Distributed Systems (VLDB’23)
Using reinforcement learning to optimize distributed systems, starting with our case study on blockchain architectures in VLDB’23.
Declarative Smart Contracts (ESEC/FSE’22)
Domain-specific language for implementing smart contracts that is easier to verify and optimize.
Developing the next-generation scalable blockchain system, with tunable consensus protocols and strong privacy and correctness guarantees.
NetSpec (IEEE/ACM Transactions on Networking’22)
NetSpec synthesizes network specifications in a declarative logic programming language from input-output examples. NetSpec aims to accelerate the adoption of formal verification in networking practice, by reducing the effort and expertise required to specify network models or properties.
Big data meets startup analytics (IJSC Journal’22, CIKM’17, ExploreDB’16)
A fun side project I did with several business school professors (Wharton, Harvard, UMass-Boston) on using big data analytics to predict the success of startups.
Data-intensive Systems on Disaggregated Data Centers (CIDR’20, VLDB’20, CIDR’22, SIGMOD’22)
A big data systems project with several other colleagues on developing next-generation data-intensive systems on emerging hardware based on disaggregated data centers. Qizhen Zhang received the best thesis award in the Computer and Information Science department at the University of Pennsylvania for this work.
Perséphone is a kernel-bypass OS scheduler designed to minimize tail latency for applications executing at microsecond-scale and exhibiting wide service time distributions. Perséphone integrates a new scheduling policy, Dynamic Application-aware Reserved Cores (DARC), that reserves cores for requests with short processing times.
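To make the idea of reserving cores for short requests concrete, here is a minimal sketch of a dispatch decision. It is only an illustration, not Perséphone's implementation: the function name `pick_core`, the two-way short/long split, and the rule that short requests may fall back to any idle core are all assumptions made for this example.

```python
from typing import Optional, Set


def pick_core(is_short: bool, idle: Set[int], reserved: Set[int]) -> Optional[int]:
    """Return the id of an idle core for a request, or None if it must wait.

    Hypothetical policy for illustration:
    - long requests may only use cores that are NOT reserved;
    - short requests prefer reserved cores and (an assumption of this
      sketch) may fall back to any other idle core.
    """
    if is_short:
        candidates = sorted(idle & reserved) or sorted(idle)
    else:
        candidates = sorted(idle - reserved)
    return candidates[0] if candidates else None
```

With cores 0 to 3 idle and core 0 reserved, a long request is placed on core 1 and a short request on core 0. If only core 0 is idle, a long request waits, so a burst of long requests cannot occupy every core and delay short ones.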
Sharingan uses program synthesis techniques to generate network classification programs at the session layer. Sharingan accepts raw network traces as inputs and reports potential patterns of the target traffic in NetQRE, a domain-specific language designed for specifying session-layer quantitative properties.
Flightplan is a tool-chain for the flexible decomposition of P4 (Programming Protocol-Independent Packet Processors) programs and their allocation to heterogeneous hardware to improve performance, reliability and utilization of software-defined networks. We also develop tools to understand and showcase the behavior of both the decomposed parts of the program and the distributed whole. Code release: Flightplan
Provenance for Probabilistic Logic Programming (EDBT’20 Best Paper)
P3 (Provenance for Probabilistic logic Programs) is a novel provenance model and system for analyzing probabilistic logic programs. P3 enables four types of provenance queries: traditional explanation queries, queries for finding the set of most important derivations within an approximate error, top-K most influential queries, and modification queries that enable us to modify tuple probabilities with the fewest modifications to the program or input data. This work received the best paper award at EDBT 2020.
GraphRex is an efficient, robust, scalable, and easy-to-program framework for graph processing on datacenter infrastructure. To users, GraphRex presents a declarative, Datalog-like interface that is natural and expressive. Underneath, it compiles those queries into efficient implementations.
Network protocols such as TLS can be customized through a vast array of options with unpredictable interactions and dangerous security implications. We are developing accountable protocol customization, a framework that combines synthesis, testing, and verification techniques to extract backward-compatible, lean protocol subsets while meeting realistic functional and security requirements. Our ultimate goal is to target widely used standards for network infrastructure, secure messaging, federated authentication, and the Internet of Things.
Prevalent software engineering practices have significantly increased the complexity and bloat of today’s software. This bloat has led to decreased performance and increased security vulnerabilities. The ASPIRE project seeks to develop a toolkit and methodology to effectively counter this trend. Bloat reduction poses significant challenges that ASPIRE endeavors to address. ASPIRE aims for a fully automated reduction process that can handle legacy source and binary code, and scale to large, complex programs that implement protocols.
STRANDS is a declarative programming-based framework for verification and debugging of distributed systems, allowing for static analysis of the system specification and runtime analysis of the system implementation. STRANDS relies on Network Datalog (NDlog) for system modeling and programming. NDlog is a declarative programming language for network protocols, extending the classical query language Datalog.
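As a rough illustration of the Datalog style that NDlog builds on, the classic two-rule reachability program `path(X,Y) :- link(X,Y).` and `path(X,Z) :- link(X,Y), path(Y,Z).` can be evaluated by repeating the rules until no new facts appear. The sketch below uses plain Datalog semantics only. It does not use NDlog syntax or its distribution features, it is not how STRANDS evaluates programs, and the function name `reachable` is made up for this example.

```python
def reachable(links):
    """Naive bottom-up evaluation of the two-rule path program.

    links: a set of (src, dst) pairs, i.e. the link(X, Y) facts.
    Returns the set of all derivable path(X, Y) facts.
    """
    # Rule 1: path(X, Y) :- link(X, Y).
    path = set(links)
    while True:
        # Rule 2: path(X, Z) :- link(X, Y), path(Y, Z).
        derived = {(x, z) for (x, y) in links for (y2, z) in path if y == y2}
        new_facts = derived - path
        if not new_facts:
            return path  # fixpoint reached: nothing new can be derived
        path |= new_facts
```

For the chain of links a→b, b→c, c→d, this derives six path facts, including the transitive fact from a to d.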
The goal of the DeDOS project is to create fundamentally new defenses against distributed denial-of-service (DDoS) attacks that can provide far greater resilience to these attacks compared to existing solutions. Today’s responses to DDoS attacks largely rely on old-school network-based filtering or scrubbing, which are slow and manual, and cannot handle new attacks. DeDOS takes a radically different approach that combines techniques from declarative programming, program analysis, and real-time resource allocation in the cloud. Code release: DeDOS
The overall research objective is to combine big data analytics with econometric techniques and social psychology user studies, to enhance our fundamental understanding of how early-stage high-tech startups operate – and how we might model (and optimize) their behavior and decision processes. This project is in collaboration with Wharton School.
NetPoirot: Failure Diagnosis for Data-center Networks
Today, root cause analysis of failures in data centers is mostly done through manual inspection. More often than not, customers blame the network as the culprit. However, other components of the system might have caused these failures. This project aims to develop a lightweight, accurate, non-intrusive tool for inferring the causes of failures in the data center using TCP statistics collected at endpoints. Behnaz Arzani received the best thesis award in the Computer and Information Science department at the University of Pennsylvania for this work.
NetEgg is a programming tool that allows network operators who may not be trained in programming to develop network policies by describing representative example behaviors. Given these scenarios, a synthesis algorithm automatically infers the controller state that needs to be maintained along with the rules to process network events and update state.
The goal of ExCAPE is to transform the way programmers develop software by advancing the theory and practice of software synthesis. In the proposed paradigm, a programmer can express insights through a variety of forms such as incomplete programs, example behaviors, and high-level requirements, and the synthesis tool generates the implementation relying on powerful analysis algorithms and programmer collaboration.
The goal of this project is to provide secure network provenance, that is, the ability to correctly explain system states even when (and especially when) the system is faulty or under attack. Towards this goal, we are substantially extending and generalizing the concept of network provenance by adding capabilities needed in a forensic setting, we are developing techniques for securely storing provenance without trusted components, and we are designing methods for efficiently querying secure provenance. We are evaluating our techniques in the context of concrete applications, such as Hadoop MapReduce or BGP interdomain routing.
NEBULA is a future Internet architecture that is intrinsically more secure and addresses threats to the emerging computer utility capabilities (called cloud computing) while meeting the challenges of flexibility, extensibility and economic viability.
Interdomain routing protocol stability depends on the absence of policy conflicts between autonomous systems; but since most policy is kept private, it is hard to ensure that conflicts are avoided. We show that even limited information can be used to help guide network (re-)configuration, by automated tools that assist network operators. This work is based on an underlying formalism of partially specified policy configurations, which has related applications in network optimization, resilience, and giving insight.
Cologne is a declarative optimization platform that enables constraint optimization problems (COPs) to be declaratively specified and incrementally executed in distributed systems. Cologne integrates the RapidNet declarative networking engine with the Gecode constraint solver. Cologne uses the Colog declarative language that combines distributed Datalog used in declarative networking with language constructs for concisely specifying goals and constraints used in COPs. The Cologne platform has tremendous practical value in facilitating extensible distributed systems optimizations.
PUMA is a declarative constraint solving platform for policy-based routing and channel selection in multi-radio wireless mesh networks. We have developed a prototype of the PUMA system using the RapidNet declarative networking system deployed on the ORBIT testbed.
The FVR project addresses a long-standing challenge in networking research: bridging the gap between formal theories and actual implementations. We present the FSR (Formally Safe Routing) toolkit, which unifies research in routing algebras with declarative networking to produce provably correct distributed implementations for inter-domain routing. In addition to the FSR toolkit, the FVR project has also explored the use of theorem proving and rewriting logic techniques to verify routing protocols.
Operators of distributed systems often need to answer forensic questions in order to perform a variety of managerial tasks. We present NetTrails, a novel provenance-based approach that provides the fundamental functionality required for answering forensic questions: the capability to “explain” the existence (or change) of a distributed system state at a given time in a potentially adversarial environment. Wenchao Zhou received honorable mention for the SIGMOD dissertation award for this work.
RapidNet is a development toolkit for rapid simulation, implementation and experimentation of network protocols. RapidNet utilizes declarative networking, a declarative, database-inspired extensible infrastructure that uses query languages to specify behavior. The long term goal of RapidNet is to provide a unified platform for rapid prototyping, synthesis, and deployment of new network protocols.
Code release: RapidNet
The DS2 project explores a unified declarative platform for specifying, implementing, and analyzing secure extensible distributed systems. Our work is motivated by the proliferation of large-scale network information systems currently deployed for a variety of application domains including network monitoring infrastructures, cloud computing, content distribution networks, and network routing.
SAFEST: Selectable Anonymity for Enabling SAFER Telecommunications (DARPA SAFER program)
Older Research Projects
- Networked Systems Programming Projects in ns-3
- A3: Application Aware Anonymity (Micah Sherr received the best thesis award in the Computer and Information Science department at the University of Pennsylvania for this work.)
- MOSAIC: Declarative Platform for Dynamic Overlay Composition
- OntoNet: Scalable Knowledge-based Middleware for Networked Systems
- Veracity: Fully-decentralized Secure Coordinate Systems