Bibtex file

Nikos Vasilakis

This file is available in both BibTeX and Markdown forms, from which it is generated.

General-purpose Distributed Environments

From Lone Dwarfs to Giant Superclusters: Rethinking Operating System Abstractions for the Cloud

Unix took a rich smorgasbord of operating system features from its predecessors and pared it down to a small but powerful set of abstractions: files, processes, pipes, and the shell to glue the system together. In the intervening forty years, the common-case computational substrate has evolved from a lone PDP-11 minicomputer to vast clouds of virtualized computational resources. Contemporary distributed systems are being built by adding layer upon layer atop the foundation established by Unix’s chosen abstractions. Unfortunately, the resulting mess has lost the “simplicity, elegance, and ease of use” that was a hallmark of the original Unix design. To cope with distribution at astronomic scale, we must take our operating systems back to the drawing board. We are living in a new world, and it is time to be brave.

The Web As a Distributed Computing Platform

Although the web is perceived as a vast, interconnected graph of content, its reality is very different. Immense computational resources are used to deliver this content and associated services. An even larger pool of computing power is made up of edge user devices. This latent potential has gone unused. Ar frames the web as a distributed computing platform, unifying processing and storage infrastructure with a core programming model and a common set of browser-provided services. By exposing the inherent capacities to programmers, a far more powerful capability has been unleashed, that of the Internet as a distributed computing system. We have implemented a prototype system that, while modest in scale, fully illustrates what can be realized.

Ignis: Light-touch Scale-out of Distribution-oblivious Systems

Distributed systems can speed up computations, mitigate resource-exhaustion attacks, improve fault-tolerance, and balance load during spikes. However, current approaches require developers to identify and rewrite bottlenecked components or systems, a process quite different from how they normally compose software.

Light-touch distribution is a new approach, introduced as a drop-in replacement of a language runtime’s module system, that converts legacy systems into distributed ones using automated transformations that operate at the boundaries of bottlenecked modules. Transformations are parametrizable by optional distribution recipes, lightweight annotations that guide the intended semantics of the distributed systems. Transformations and recipes operate at runtime, adapting systems to current load patterns by scaling out only saturated components. Experiments with our Ignis prototype show substantial speedups, attractive elasticity characteristics, and memory gains over full system replication, achieved via small backward-compatible code changes.

Andromeda: A Distributed Userspace

Distributed Partitioning Data Structures

Query-efficient Partitions for Dynamic Data

Large-scale data storage requirements have led to the development of distributed, non-relational databases (NoSQL). Single-dimension NoSQL achieves scalability by partitioning data over a single key space. Queries on primary (“key”) properties are made efficient at the cost of queries on other properties. Multidimensional NoSQL systems attempt to remedy this inefficiency by creating multiple key spaces. Unfortunately, the structure of data needs to be known a priori and must remain fixed, eliminating many of the original benefits of NoSQL.

This paper presents three techniques that together enable query-efficient partitioning of dynamic data. First, unispace hashing (UH) extends multidimensional hashing to data of unknown structure with the goal of improving queries on secondary properties. Second, compression formulas leverage user insight to address UH’s inefficiencies and further accelerate lookups by certain properties. Third, formula spaces use UH to simplify compression formulas and accelerate queries on the structure of objects. The resulting system supports dynamic data similar to single-dimension NoSQL systems, efficient data queries on secondary properties, and novel intersection, union, and negation queries on the structure of dynamic data.
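The trade-off that motivates this work, cheap lookups on the primary key versus expensive queries on any other property, can be sketched in a few lines. This is only an illustration of single-dimension partitioning, not of unispace hashing or of the paper's system; `NUM_PARTITIONS`, `SingleKeyStore`, and the sample objects are hypothetical names chosen for the example.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical cluster size, chosen for illustration


def partition_of(key: str) -> int:
    """Map a primary key to a partition using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


class SingleKeyStore:
    """Toy single-dimension store: data is partitioned by primary key only."""

    def __init__(self) -> None:
        self.partitions = [dict() for _ in range(NUM_PARTITIONS)]

    def put(self, key: str, obj: dict) -> None:
        self.partitions[partition_of(key)][key] = obj

    def get(self, key: str):
        """Primary-key lookup: returns (object, partitions touched)."""
        return self.partitions[partition_of(key)].get(key), 1

    def find_by(self, prop: str, value):
        """Secondary-property query: every partition has to be scanned."""
        hits = []
        for part in self.partitions:
            hits.extend(k for k, o in part.items() if o.get(prop) == value)
        return sorted(hits), len(self.partitions)


# Objects need not share a structure (dynamic data).
store = SingleKeyStore()
store.put("u1", {"name": "ada", "city": "London"})
store.put("u2", {"name": "alan", "city": "London", "age": 41})
store.put("u3", {"name": "grace"})
```

Here `store.get("u1")` touches one partition, while `store.find_by("city", "London")` touches all of them, which is the inefficiency on secondary properties that the techniques above aim to reduce.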


BreakApp: Automated, Flexible Application Compartmentalization

Developers of large-scale software systems may use third-party modules to reduce costs and accelerate release cycles, at some risk to safety and security. BreakApp exploits module boundaries to automate compartmentalization of systems and enforce security policies, enhancing reliability and security. BreakApp transparently spawns modules in protected compartments while preserving their original behavior. Optional high-level policies decouple security assumptions made during development from requirements imposed for module composition and use. These policies allow fine-tuning trade-offs such as security and performance based on changing threat models or load patterns. Evaluation of BreakApp with a prototype implementation for JavaScript demonstrates feasibility by enabling simplified security hardening of existing systems with low performance overhead.

Towards Fine-grained, Automated Application Compartmentalization

The rise of language-specific, third-party packages simplifies application development. However, relying on untrusted code poses a threat to security and reliability.

In this work, we propose exploiting module boundaries – and the general trend towards many, small modules – to achieve fine-grained compartmentalization. Automated transformations can hide compartment boundaries and minimize developer effort. Optional policy expressions can decouple security assumptions at development time from requirements during composition and runtime. Using JavaScript’s flourishing ecosystem, we discuss a wide range of risks and sketch how the use of language-level solutions coupled with systemic mechanisms can protect against them.

Detecting Asymmetric Application-layer Denial-of-Service Attacks In-Flight with Finelame

Denial of service (DoS) attacks increasingly exploit algorithmic, semantic, or implementation characteristics dormant in victim applications, often with minimal attacker resources. Practical and efficient detection of these asymmetric DoS attacks requires us to (i) catch offending requests in-flight, before they consume a critical amount of resources, (ii) remain agnostic to the application internals, such as the programming language or runtime system, and (iii) introduce low overhead in terms of both performance and programmer effort.

This paper introduces Finelame, a language-independent framework for detecting asymmetric DoS attacks. Finelame leverages operating system visibility across the entire software stack to instrument key resource allocation and negotiation points. It leverages recent advances in the Linux extended Berkeley Packet Filter virtual machine to attach application-level interposition probes to key request processing functions, and lightweight resource monitors—user/kernel-level probes—to key resource allocation functions. The data collected is used to train a model of resource utilization that occurs throughout the lifetime of individual requests. The model parameters are then shared with the resource monitors, which use them to catch offending requests in-flight, inline with resource allocation. We demonstrate that Finelame can be integrated with legacy applications with minimal effort, and that it is able to detect resource abuse attacks much earlier than their intended completion time while posing low performance overheads.


Active Learning for Software Engineering

Software applications have grown increasingly complex to deliver the features desired by users. Software modularity has been used as a way to mitigate the costs of developing such complex software. Active learning-based program inference provides an elegant framework that exploits this modularity to tackle development correctness, performance and cost in large applications. Inferred programs can be used for many purposes, including generation of secure code, code re-use through automatic encapsulation, adaptation to new platforms or languages, and optimization. We show through detailed examples how our approach can infer three modules in a representative application. Finally, we outline the broader paradigm and open research questions.


TMC: Pay-as-you-Go Distributed Communication

We revisit the gap between what distributed systems need from the transport layer and what protocols in wide deployment provide. Such a gap complicates the implementation of distributed systems and impacts their performance. We introduce Tunable Multicast Communication (TMC), an abstraction that allows developers to easily specialize communication channels in distributed systems. TMC is presented as a deployable and extensible user-space library that exposes high-level tunable guarantees. TMC has the potential to improve the performance of distributed applications with minimal-to-zero development and deployment effort.

Programmable Metadata Processing

Architectural Support for Software-Defined Metadata Processing

Optimized hardware for propagating and checking software-programmable metadata tags can achieve low runtime overhead. We generalize prior work on hardware tagging by considering a generic architecture that supports software-defined policies over metadata of arbitrary size and complexity; we introduce several novel microarchitectural optimizations that keep the overhead of this rich processing low. Our model thus achieves the efficiency of previous hardware-based approaches with the flexibility of the software-based ones. We demonstrate this by using it to enforce four diverse safety and security policies—spatial and temporal memory safety, taint tracking, control-flow integrity, and code and data separation—plus a composite policy that enforces all of them simultaneously. Experiments on SPEC CPU2006 benchmarks with a PUMP-enhanced RISC processor show modest impact on runtime (typically under 10%) and power ceiling (less than 10%), in return for some increase in energy usage (typically under 60%) and area for on-chip memory structures (110%).
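As a rough software analogy for propagating and checking software-defined metadata tags, the sketch below expresses a taint-tracking policy as an ordinary function from an operation and its input tags to a result tag. It is not the paper's rule format, instruction set, or hardware design; the `TaggedMachine` class, the opcode names, and the small rule cache are assumptions made for illustration.

```python
TAINTED, CLEAN = "tainted", "clean"


class PolicyViolation(Exception):
    """Raised when a rule rejects an operation."""


def taint_rule(opcode: str, tag_a: str, tag_b: str) -> str:
    """Hypothetical taint policy: check the operation, then propagate tags."""
    if opcode == "jump" and tag_a == TAINTED:
        raise PolicyViolation("jump to a tainted target")
    # The result is tainted if either input is tainted.
    return TAINTED if TAINTED in (tag_a, tag_b) else CLEAN


class TaggedMachine:
    """Looks up result tags, consulting the software rule only on a miss."""

    def __init__(self, rule) -> None:
        self.rule = rule
        self.cache = {}   # assumed cache: (opcode, tag_a, tag_b) -> result tag
        self.misses = 0

    def result_tag(self, opcode: str, tag_a: str, tag_b: str = CLEAN) -> str:
        key = (opcode, tag_a, tag_b)
        if key not in self.cache:
            self.misses += 1
            # A violation propagates as an exception and is never cached.
            self.cache[key] = self.rule(opcode, tag_a, tag_b)
        return self.cache[key]
```

Because the policy is a plain function, swapping in a different policy (or a composite of several) only changes the `rule` argument, while the lookup path stays the same.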

PUMP: A Programmable Unit for Metadata Processing

We introduce the Programmable Unit for Metadata Processing (PUMP), a novel software-hardware element that allows flexible computation with uninterpreted metadata alongside the main computation with modest impact on runtime performance (typically 10–40% for single policies, compared to metadata-free computation on 28 SPEC CPU2006 C, C++, and Fortran programs). While a host of prior work has illustrated the value of ad hoc metadata processing for specific policies, we introduce an architectural model for extensible, programmable metadata processing that can handle arbitrary metadata and arbitrary sets of software-defined rules in the spirit of the time-honored 0-1-∞ rule. Our results show that we can match or exceed the performance of dedicated hardware solutions that use metadata to enforce a single policy, while adding the ability to enforce multiple policies simultaneously and achieving flexibility comparable to software solutions for metadata processing. We demonstrate the PUMP by using it to support four diverse safety and security policies—spatial and temporal memory safety, code and data taint tracking, control-flow integrity including return-oriented-programming protection, and instruction/data separation—and quantify the performance they achieve, both singly and in combination.

Internet of Things

Developing Multiplayer Pervasive Games and Networked Interactive Installations Using Ad Hoc Mobile Sensor Nets

We present here Fun in Numbers (FinN), a framework for developing pervasive applications and interactive installations for entertainment and educational purposes. Using ad hoc mobile wireless sensor network nodes as the enabling devices, FinN allows for the quick prototyping of applications that utilize input from multiple physical sources (sensors and other means of interfacing), by offering a set of programming templates and services, such as topology discovery, localization and synchronization, that hide the underlying complexity. We present the target application domains of FinN, along with a set of multiplayer games and interactive installations. We describe the overall architecture of our platform and discuss some key implementation issues of the application domains. Finally, we present the experience gained by deploying the applications developed with our platform.

Demo: Multiplayer Pervasive Games and Networked Interactive Installations Using Ad Hoc Mobile Sensor Networks

In this work, we showcase a set of implemented multiplayer games and interactive installations based on Fun in Numbers (FinN). FinN allows the quick prototyping of applications that utilize input from multiple physical sources (sensors and other means of interfacing), by offering a set of programming templates and services, such as proximity, localization and synchronization, that hide the underlying complexity.

Using wireless sensor networks to develop pervasive multi-player games

In this work we present two mobile, locative and collaborative distributed games that are played using wireless sensor devices. We briefly present the architecture of the two games and demonstrate their capabilities. The key characteristic of these games is that players interact with each other and their surrounding environment by moving, running and gesturing as a means to perform game-related actions, using sensor devices. We demonstrate our system’s implementation, which uses a combination of Java Standard and Mobile editions.

A software platform for developing multi-player pervasive games using small programmable object technologies

As of 2008, the total number of mobile phone subscribers has well surpassed 3 billion. Along with the increase in the number of subscribers, there has been an increase in the capabilities of such devices. The vast majority of the current generation of mobile phones are capable of executing J2ME applications. Moreover, manufacturers have started integrating various kinds of sensors into their handsets, e.g., accelerometers or thermistors. Therefore, there is already a continually growing user base for such devices. It is our belief that there is great potential in combining sensors and mobile devices to produce exciting entertainment applications. Games have been a major part of the computer industry for decades, and are generally recognized as a means of pushing the technological boundaries, both in hardware and software. We expect that pervasive games will become a major application field for wireless sensor networks.


A Novel Application of Ubiquitous Computing Using Interactive Installations


Transactions in a Decentralized Control Plane of a Computing System


Network Function Virtualization: Don’t Give up on Least Privilege!

HandsFree: Next Generation Sequence Processing, Mapping and Analysis Made Easy