The goal of this project is to statically analyze the program to
verify that its data structures remain consistent. We are interested
in two kinds of consistency:
- Local Consistency: Each individual data structure
(such as a tree, list, or hash table) maintains its representation
invariant. For example, the representation invariant for a doubly-linked
list states that following the next pointer then the prev pointer
produces a pointer to the original list element. The representation
invariant for a hash table states that each entry is correctly indexed.
Representation invariants are typically quite detailed and
universally true for all instantiations of the given data structure.
- Global Consistency: Global consistency properties cut
across traditional encapsulation boundaries to involve multiple data
structures. A global consistency property might state, for example,
that two data structures contain disjoint sets of objects, or that
one data structure contains a subset of the objects present in another
data structure. Global consistency properties often capture some
key requirement of the application domain that must be
reflected in the way that the application represents data from that
domain.
Two problems complicate the realization of this goal. The first
problem is scalability - analyses that are capable of
verifying our target class of sophisticated consistency
requirements simply do not scale to sizable programs and there
is little hope that they will ever do so.
The second problem is diversity. There is an enormous range of
data structures with very different consistency properties.
Consider, for example, a list and a hash table.
The list consistency properties involve multiple objects linked together
with pointers, while the hash table consistency properties involve
arrays and array indices. These different kinds of properties
require analyses that focus on very different kinds of basic
properties.
Our project adopts a new perspective on the data structure
consistency problem: we instead propose a technique that developers can use to
apply multiple pluggable analyses to the same program, with each analysis
applied to the modules for which it is appropriate.
The key features of this technique include:
- Modular Analysis and Shared Objects:
In our approach, the program contains a collection of modules, each of
which encapsulates the implementation of some of the data structures.
Instead of attempting to analyze the entire program,
each analysis operates on a single module.
By focusing each analysis on only those parts of the program
that are relevant for the properties it is designed to verify,
we enable the application of precise analyses to
programs composed of multiple modules.
One factor that complicates this approach is the need for
objects to participate in multiple data structures and
therefore the need to share objects between modules analyzed by
different algorithms.
To eliminate the possibility that one module may corrupt another's
data structures (and to ensure that each algorithm analyzes all of the
relevant code), modules encapsulate fields {\em (and not objects):}
the underlying language prevents one module from accessing the
fields of another module. Each module therefore encapsulates all of the
fields required to implement
its data structure; objects that participate in multiple data
structures from multiple modules contain fields from each of these
modules.
- Sets and Interfaces: Each module has an
implementation and an interface. It is the responsibility of the
analysis to verify that the implementation correctly implements the
interface. Instead of exposing the implementation details of the
encapsulated data structure, the interface uses a collection of
abstract sets to summarize the effect of each procedure. This
collection of sets characterizes how objects participate in various
data structures. For example, the interface for a linked list
might have an abstract set that contains all of the objects in the
linked list. The interface for the insert procedure would
indicate that the procedure adds the inserted object into the set; the
interface for the remove procedure would indicate a corresponding
removal from the set.
- Abstraction Modules and Data Structure
Consistency: Each analysis uses an abstraction module to establish
the connection between the module's implementation and its interface.
This abstraction module enables each analysis to
translate the set membership properties of those objects that cross
module boundaries back into concrete data structure
properties. These concrete properties are often crucial for preserving
the internal consistency of the data structures.
For example, a list insert procedure may
require that an object to be inserted must not already be a member of the
abstract set containing all of the objects in the list. This
set membership property, in turn, may enable the analysis of the insert procedure to
recognize that the object is not already present in the list, which may be
the precondition required to preserve the internal consistency of the list
data structure.
- Properties Involving Multiple Modules:
Systems often maintain important data structure
consistency invariants that involve
multiple data structures and cross encapsulation
boundaries.
For example, some systems
may require the sets of objects that participate in two
data structures to be disjoint; others may require
every object present in one data structure to also be
present in another. Such invariants are typically violated
(temporarily and legitimately) as modules execute a
coordinated sequence of data structure updates.
Because these invariants involve objects that participate
in different data structures encapsulated in different
modules analyzed by different analyses, the
analyses must somehow interoperate if they are to successfully
verify the invariant.
Our approach uses analysis scopes to identify
invariants that involve sets from multiple modules.
Some of these modules are
exported and can be invoked from outside the scope; the other
modules may be invoked only from within the scope. When our analysis
processes an exported module it ensures that, if the invariant holds upon
entry to the exported modules, then it is correctly restored upon exit.
For invariants that involve multiple modules,
the set abstraction enables different analyses to combine
their results to verify that the program properly coordinates its
data structure operations to preserve the invariant. This approach
therefore enables the application of arbitrarily sophisticated
analyses to the modules that require the precision such analyses
can bring, while supporting the use of simpler, more efficient
analyses in the remainder of the program.
Together, these features enable the focused application of a
full range of precise analyses to programs
that contain multiple data structures encapsulated in
multiple modules. They promote the development of a range of
pluggable analyses that developers can deploy as necessary
to verify important data structure consistency properties.
Abstract sets enable
different analyses to communicate and interoperate to verify
properties that cross module
boundaries to involve multiple data structures.
Our approach supports the appropriately verified participation
of objects in multiple data structures, including patterns
in which objects migrate sequentially through different data
structures and patterns in which objects participate in
multiple data structures simultaneously.