20% Projects

The following are some ideas for ways in which Googlers can get involved with AutoSRS. Some of these projects are already underway and would welcome additional help. Please contact Prof. George Candea, for more information.

Infrastructure

Constraint Solving as a Service

SMT/SAT constraint solving plays a key role in many software verification techniques. Unfortunately, state-of-the-art constraint solvers are too slow for practical program analysis in large software (we are particularly interested in symbolic execution). This is in part because constraint solvers are optimized for winning competitions instead of solving really fast real-world constraints. The goal of this project is to offer constraint solving through a standardized service interface, which will then allow unlimited systems optimizations (caching, parallelization, layering, etc.) behind this interface, as well as employing portfolios of solvers and other unorthodox ideas. The solver service can transparently get better over, while the users of this service (such as program analysis tools) are decoupled and can automatically take advantage of the improvements. Click here for more details.



Parallel Symbolic Execution with S2E

S2E is our framework for multi-path in vivo analysis of binary software systems. Using S2E, we built DDT, a device driver tester that found many bugs in Microsoft-certified drivers, RevNIC, a reverse engineering tool that ports various closed-source NIC drivers to different platforms, as well as a performance analysis utility. S2E systematically tests large binary programs by selective symbolic execution: Relevant parts of a program are executed with symbolic instead of concrete values, allowing coverage unachievable by random testing.

However, S2E is currently still single-threaded. Your goal will be to parallelize S2E and to integrate it with Cloud9, our existing parallel symbolic execution engine that scales to large clusters. Your work will allow to unleash the full power of parallel symbolic execution, finding bugs in commercial software and ultimately making it more reliable.



Graphical User Interface for S2E

S2E is a platform for a full system software analysis that enables developers to explore all potential behavior of their program, by automatically identifying and exercising every possible execution path. The S2E platform was successfully applied by multiple research groups for a variety of tasks ranging from automatic device driver reverse engineering (RevEng) and testing, to multi-path performance profiling. S2E is currently configured prior to execution and has little interaction with its users during execution. S2E users have a hard time monitoring and controlling the evolution of their tests. The purpose of this project is to integrate S2E with existing development platforms, such as gdb and Eclipse. This project will bring an entirely new dimension to software debugging in Eclipse. Imagine performing queries on the debugged program, such as: “find all test scenarios for my program such that x < 15”, or “insert a breakpoint at each possible segmentation fault in my program”. Students who choose this project will gain insight in state-of-the art research (S2E obtained the best paper award at a top-tier conference in 2011), as well as experience in large system development (S2E is based on the QEMU virtual machine).


Applications

Testing PNaCl Applications with Cloud9

PNaCl is a portable version of Google Native Client. PNaCl applications are distributed as LLVM bitcode and are translated from LLVM to native code on the client before execution. Being already available as LLVM bytecode and having well-defined external APIs makes PNaCl applications a perfect target for testing in Cloud9. Such testing can verify properties like absence of crashes and failed assertions, or check how the application handles private information (such as user's location). The project would require implementing a model of NaCl API in Cloud9 and developing an infrastructure (and, preferably, a web interface) to plug a PNaCl application into Cloud9, test it and present the results of the testing to the user.



Classifying Data Races Reported by ThreadSanitizer

Google ThreadSanitizer is a dynamic race detector for C/C++ programs that is used to test Chromium and other key products. As with all dynamic race detectors, ThreadSanitizer reports lots of harmless races. The goal of this project is to integrate ThreadSanitizer with Portend, our race classification tool, which can classify data races according to their severity and the impact they might have during execution. In this way, developers can focus their attention on those data races that matter the most.



Symbolic Debugger

It would be great to be able to debug a program symbolically, especially given our Cloud9 and ESD tools. We would like to interactively step not through just one but through all possible executions originating from a given point; print variable values not as concrete numbers but as mathematical expressions showing how they depend on program inputs; etc. We would like to do all this from the familiar LLDB environment (the new LLVM debugger).


LLVM Debugger

It would be very useful to have an LLVM debugger front end (similar to lldb in XCode) for Linux, which could help, for example, debug the Cloud9 POSIX model or any other LLVM library. This could be written as a Cloud9 module at first and then expanded to be integrated with a development environment. It will require understanding LLVM debug information (which has changed quite a bit over the last few versions of LLVM).




Linux Instrumentation Toolkit for S2E

S2E is a platform for a full system software analysis that enables developers to explore all potential behavior of their program, by automatically identifying and exercising every possible execution path. The S2E platform was successfully applied by multiple research groups for a variety of tasks ranging from automatic device driver reverse engineering (RevEng) and testing (DDT), to multi-path performance profiling. These tools work by looking at the guest OS’s kernel and driver state in order to analyze their behavior, detect bugs, etc. Unfortunately, using S2E to perform any analysis that requires deep introspection or modification of system state is hard as the introspection API provided by S2E is very basic and low-level. The aim of this project is to provide a high-level introspection API for Linux targets in S2E (potentially based on the insight-vmi tool). Such API would significantly simplify the use of S2E for Linux software analysis and enable new applications, such as testing of multi-threaded software, performing security analysis, etc.



Web Interface for Device Driver Testing

Device drivers are the plague of modern software. Consumers have no idea whether the driver that comes with the latest gadget they bought is trustworthy or not. Worse, there is no independent service that would allow them to thoroughly check a driver's reliability. Your goal will be to create such a service by interfacing with DDT, our automated testing tool for binary closed-source device drivers. On the website, consumers should be able to upload their drivers and get a test report summarizing all the bugs found as well as the steps to reproduce them. The challenges include automating the upload process, generating understandable test results, scaling to many clients, and coming up with metrics for grading a driver's reliability.



Debugging Based on Execution Synthesis

Debugging is one of the hardest and most time-consuming activities in developing system software. Bug reports rarely provide much more than a coredump of the failed program, and the developer is required to turn into a detective to determine how the program arrived to the failure point. ESD is a tool for automating most of this detective work. Without requiring any program modifications or execution tracing, ESD finds a feasible execution of the program that exhibits the reported bug. This project's aim is to extend ESD with the ability to efficiently synthesize execution paths that lead to bugs caused by data races.