Augmenting Modern Software Systems

I build systems that transform other systems. My systems lift the capabilities of programmers called upon to deal with the complexity of modern software systems. They automate away inessential complexity and automate in desired features — securing programs that use hundreds of software dependencies, bolting distribution onto existing applications, and parallelizing large-scale pipelines built out of multi-language components. I characterize the behavior of my systems using real workloads seen in practice, often paired with mathematical models and proofs of key properties of interest.

Join us!
To students and potential collaborators: I enjoy creative work that is highly collaborative and has a positive impact on as many people as possible. Please skim the research thrusts below and email me to chat!

Automating Protections Against Software Supply-Chain Threats

Modern software incorporates thousands of dependencies as a means of accelerating its development and reducing its cost—at a significant risk to safety and security for both developers and end-users. We have built a series of systems targeting the JavaScript dependency ecosystem—the largest such ecosystem out there—automating the analysis, transformation, and synthesis of dependencies across a variety of threat models.

Papers: Our PLOS17 paper highlights the problem with third-party libraries. Our NDSS18 paper proposes automated transformations that use operating-system protection mechanisms to isolate selected libraries. Our FSE21 paper proposes language-based instrumentation techniques applied to the context around each library, offering Turing-complete analysis and protection at a reduced runtime cost. Our CCS21a paper proposes an RWX permission model applied at the library boundary, combined with static and load-time program analysis that automates permission inference. Our CCS21b paper uses active learning and regeneration to synthesize vulnerability-free replacement libraries that fall under certain computational domains. Ongoing research (i) develops the model behind library recontextualization and its proofs of soundness properties, and (ii) develops techniques for protecting against memory-unsafe native addons, such as ones written in C/C++ or available in binary form.
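To give a feel for what a read/write/execute permission model at a library boundary means, here is a minimal sketch. It is written in Python rather than JavaScript purely for illustration, and it is not the implementation from any of the papers above: the names `Guarded`, `AccessDenied`, and the toy `fs` object are all hypothetical, and the permission map is written by hand here, whereas the CCS21a work infers permissions through program analysis.

```python
import types


class AccessDenied(Exception):
    """Raised when code touches a name outside its granted permissions."""


class Guarded:
    """Wraps an object and checks each access against a per-name permission string.

    Permissions are strings over the letters "r", "w", "x" (hypothetical encoding).
    """

    def __init__(self, target, permissions):
        # Store internal state directly, bypassing our own __setattr__ check.
        object.__setattr__(self, "_target", target)
        object.__setattr__(self, "_permissions", permissions)

    def _check(self, name, mode):
        if mode not in self._permissions.get(name, ""):
            raise AccessDenied(f"{mode} access to {name!r} denied")

    def __getattr__(self, name):
        # Only reached for names that live on the wrapped target.
        value = getattr(self._target, name)
        if callable(value):
            # Simplification: callables are handed out wrapped, and the
            # "x" permission is checked at call time.
            def call(*args, **kwargs):
                self._check(name, "x")
                return value(*args, **kwargs)
            return call
        self._check(name, "r")
        return value

    def __setattr__(self, name, value):
        self._check(name, "w")
        setattr(self._target, name, value)


# A toy stand-in for a sensitive module exposed to a third-party library.
fs = types.SimpleNamespace(
    root="/tmp",
    read_file=lambda path: f"contents of {path}",
    write_file=lambda path, data: None,
)

# The library may read `root` and execute `read_file`, and nothing else.
guarded_fs = Guarded(fs, {"root": "r", "read_file": "x"})
```

With this wrapper, `guarded_fs.read_file("a.txt")` succeeds, while `guarded_fs.write_file("a.txt", "x")` or `guarded_fs.root = "/"` raises `AccessDenied`. A real system would also need to guard values that flow out of the wrapped object, which this sketch does not attempt.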

Software: Lya (GitHub) is a system for dynamic program analysis and instrumentation at the boundaries of JavaScript libraries. It forms the basis for much of our runtime security work around JavaScript. Mir (GitHub) is a system for static analysis at the boundaries of JavaScript libraries.

Require Security is our company transitioning some of these and other supply-chain security technologies to industry.

Collaborators: Achilles Benetopoulos, Alizee Schoen, André DeHon, Ben Karel, Cristian-Alexandru Staicu, Grigoris Ntousakis, Jiasi Shen, Jonathan M. Smith, Konstantinos Kallas, Martin Rinard, Michael Pradel, Nathan Dautenhahn, Nick Roessler, Shivam Handa, and Veit Heller.

Automating Shell Script Parallelization/Distribution

Shell scripting is used pervasively, partly due to its simplicity in combining components (commands) written in multiple languages. Unfortunately, this language-agnostic composition hinders automated parallelization and distribution, often forcing developers to manually rewrite shell programs (and their components) in other languages that support these features. We have several projects that, combined, offer automated parallelization (and, soon, distribution) of Unix/Linux shell scripts—along with serious correctness and compatibility guarantees.

Papers: Our HotOS15 paper outlines the problem with today's distributed computing software and offers a vision for the future. Our EuroSys21 paper describes our PaSh system for parallelizing commands. Our ICFP21 paper formalizes the model sitting at the core of PaSh and proves its parallelizing transformations correct. Our HotOS21 paper outlines a vision for the future of the shell, and our HotOS21 panel discussed future avenues for cross-discipline shell-related research, captured in the panel report. Our OSDI22 paper tackles POSIX-compliant parallelization in the presence of the fully dynamic behavior pervasive in the shell, using just-in-time compilation that intermixes evaluation and optimization of individual expressions. Ongoing research tackles automated generation of critical runtime components through a combination of active learning and program synthesis.
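As a rough illustration of what parallelizing a pipeline stage involves, the sketch below splits the input into chunks, runs a command on each chunk concurrently, and merges the partial results. It is a Python toy, not PaSh itself: the functions `grep_error` and `count_lines` are hypothetical stand-ins for commands such as `grep error` and `wc -l`, and the merge function for each command is supplied by hand as an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def split(lines, n):
    """Split lines into at most n contiguous chunks, preserving order."""
    size = max(1, -(-len(lines) // n))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parallel_stage(command, merge, lines, n=4):
    """Run `command` on each chunk concurrently, then merge the partial outputs."""
    chunks = split(lines, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        # pool.map returns results in chunk order, so output order is kept.
        partials = list(pool.map(command, chunks))
    return merge(partials)


def grep_error(lines):
    """Stand-in for `grep error`: each line is handled independently."""
    return [line for line in lines if "error" in line]


def count_lines(lines):
    """Stand-in for `wc -l`: partial counts must be summed afterwards."""
    return len(lines)


def concat(partials):
    """Merge for line-by-line commands: concatenate outputs in order."""
    return [line for part in partials for line in part]


log = [f"line {i} " + ("error" if i % 3 == 0 else "ok") for i in range(10)]
matches = parallel_stage(grep_error, concat, log, n=3)
total = parallel_stage(count_lines, sum, log, n=3)
```

The point of the two examples is that the merge step depends on the command: a filter only needs its outputs concatenated in order, while a counter needs its partial results combined. Establishing that such a rewrite preserves the sequential result is the kind of property a correctness proof has to cover.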

Software: PaSh (website, GitHub) is an award-winning shell parallelization system that forms the basis for all our shell-related research. PaSh is open-source, hosted by the Linux Foundation, and under heavy development.

Collaborators: Konstantinos Kallas, Michael Greenberg, Achilles Benetopoulos, Thurston Dang, Shivam Handa, Dimitris Karnikis, Konstantinos Mamouras, Lazar M. Cvetković, Tammam Mustafa, Martin Rinard, Jiasi Shen, and several open-source contributors.

Automated Transformation Towards Secure Scalable Computing Paradigms

Recent trends are pushing developers towards new paradigms of secure and scalable computing, such as confidential computing, microservices, serverless computing, and edge computing. Transforming a conventional program to leverage these paradigms is a laborious manual process that can lead to suboptimal performance and in many cases even break the program. We are developing systems that automate this kind of decomposition, leveraging special hardware capabilities when available.

Papers: Our PLDI paper presents module-level decomposition, resource awareness, and scale-out of bottlenecked components. Our EdgeSys, APSys, and APNet papers describe several components of this vision. Ongoing work tackles automated decomposition of monolithic applications towards (i) confidential computing and (ii) microservice and serverless computing.

Software: Atlas (GitHub) is a new runtime environment that supports seamless offloading and scale-out, including over Trusted Execution Environments such as Intel SGX. Atlas forms the basis upon which we build much of our distributed infrastructure.

Collaborators: Ricardo A. Baratto, Ben Karel, Dimitris Karnikis, Henri Maxime Demoulin, Yash Palkhiwala, Martin Rinard, and Felix Stutz.