9 Library of CIL Modules
We are developing a suite of modules that use CIL for program analyses and
transformations that we have found useful. You can use these modules directly
on your code, or generally as inspiration for writing similar modules. A
particularly big and complex application written on top of CIL is CCured
(../ccured/index.html).
9.1 Points-to Analysis
The module ptranal.ml contains a context-sensitive, interprocedural
points-to analysis for CIL. The analysis has the following
characteristics:
-
Not based on C types (inferred pointer relationships are sound
despite most kinds of C casts)
- One level of subtyping
- One level of context sensitivity
- Monomorphic type structures
- Field insensitive (fields of structs are conflated)
- Demand-driven (points-to queries are solved on demand)
- Handles function pointers
The analysis itself is factored into two components: ptranal.ml,
which walks over the CIL file and generates constraints, and
golf.ml, which solves the constraints. The analysis is invoked
with the function Ptranal.analyze_file: Cil.file -> unit. This
function builds the points-to graph for the CIL file and stores it
internally. There is currently no facility for clearing internal
state, so Ptranal.analyze_file should only be called once.
The constructed points-to graph supports several kinds of queries,
including alias queries (may two expressions be aliased?) and
points-to queries (to what set of locations may an expression point?).
The main interface with the alias analysis is as follows:
-
Ptranal.may_alias : Cil.exp -> Cil.exp -> bool. If true, the
two expressions may have the same value.
- Ptranal.resolve_funptr : Cil.exp -> (Cil.fundec list). Returns
the list of functions to which the given expression may point.
The precision of the analysis can be customized by changing the values
of several flags:
-
Ptranal.no_sub : bool ref. If true, subtyping is
disabled.
- Ptranal.analyze_mono : bool ref. If true, context
sensitivity is disabled and the analysis is effectively monomorphic.
- Ptranal.smart_aliases : bool ref. If true, ``smart''
disambiguation of aliases is enabled. Otherwise, aliases are computed
by intersecting points-to sets (this is an experimental feature).
In practice, the best precision/efficiency tradeoff is achieved by
setting Ptranal.no_sub to false, Ptranal.analyze_mono to
true, and Ptranal.smart_aliases to false. These are the
default values of the flags.
There are also a few flags that can be used to inspect or serialize
the results of the analysis:
-
Ptranal.print_constraints : bool ref. If true, the
analysis will print each constraint as it is generated.
- Ptranal.print_types : bool ref. If true, the analysis
will print the inferred type of each variable in the program. If
Ptranal.analyze_mono and Ptranal.no_sub are both true,
this output is sufficient to reconstruct the points-to graph. One nice
feature is that there is a pretty printer for recursive types, so the
print routine doesn't loop.
- Ptranal.compute_results : bool ref. If true, the
analysis will print out the points-to set of each variable in the
program. This will essentially serialize the points-to graph.
9.2 StackGuard
The module heapify.ml contains a transformation similar to the one
described in ``StackGuard: Automatic Adaptive Detection and Prevention of
Buffer-Overflow Attacks'', Proceedings of the 7th USENIX Security
Conference. In essence it modifies the program to maintain a separate
stack for return addresses. Even if a buffer overrun attack occurs the
actual correct return address will be taken from the special stack.
Although it does work, this CIL module is provided mainly as an example of
how to perform a simple source-to-source program analysis and
transformation. As an optimization only functions that contain a dangerous
local array make use of the special return address stack.
For a concrete example, you can see how cilly --dostackGuard
transforms the following dangerous code:
int dangerous() {
char array[10];
scanf("%s",array); // possible buffer overrun!
}
int main () {
return dangerous();
}
See the CIL output for this
code fragment
9.3 Heapify
The module heapify.ml also contains a transformation that moves all
dangerous local arrays to the heap. This also prevents a number of buffer
overruns.
For a concrete example, you can see how cilly --doheapify
transforms the following dangerous code:
int dangerous() {
char array[10];
scanf("%s",array); // possible buffer overrun!
}
int main () {
return dangerous();
}
See the CIL output for this
code fragment
9.4 One Return
The module oneret.ml contains a transformation the ensures that all
function bodies have at most one return statement. This simplifies a number
of analyses by providing a canonical exit-point.
For a concrete example, you can see how cilly --dooneRet
transforms the following code:
int foo (int predicate) {
if (predicate <= 0) {
return 1;
} else {
if (predicate > 5)
return 2;
return 3;
}
}
See the CIL output for this
code fragment
9.5 Control-Flow Graphs
CIL can reduce high-level C control-flow constructs like switch and
continue to lower-level gotos. This completely eliminates some
possible classes of statements from the program and may make the result
easier to analyze (e.g., it simplifies data-flow analysis).
For a concrete example, you can see how cilly --domakeCFG
transforms the following code (note the fall-through in case 1):
int foo (int predicate) {
int x = 0;
switch (predicate) {
case 0: return 111;
case 1: x = x + 1;
case 2: return (x+3);
case 3: break;
default: return 222;
}
return 333;
}
See the CIL output for this
code fragment
9.6 Partial Evaluation and Constant Folding
The partial.ml module provides a simple interprocedural partial
evaluation and constant folding data-flow analysis and transformation. This
transformation requires the --domakeCFG option.
For a concrete example, you can see how cilly --domakeCFG --dopartial
transforms the following code (note the eliminated if branch and the
partial optimization of foo):
int foo(int x, int y) {
int unknown;
if (unknown)
return y+2;
return x+3;
}
int main () {
int a,b,c;
a = foo(5,7) + foo(6,7);
b = 4;
c = b * b;
if (b > c)
return b-c;
else
return b+c;
}
See the CIL output for this
code fragment
9.7 Simple Memory Operations
The simplemem.ml module allows CIL lvalues that contain memory accesses
to be even futher simplified via the introduction of well-typed
temporaries. After this transformation all lvalues involve at
most one memory reference.
For a concrete example, you can see how cilly --dosimpleMem
transforms the following code:
int main () {
int ***three;
int **two;
***three = **two;
}
See the CIL output for this
code fragment
9.8 Simple Three-Address Code
The simplify.ml module further reduces the complexity of program
expressions and gives you a form of three-address code. After this
transformation all expressions will adhere to the following grammar:
basic::=
Const _
Addrof(Var v, NoOffset)
StartOf(Var v, NoOffset)
Lval(Var v, off), where v is a variable whose address is not taken
and off contains only "basic"
exp::=
basic
Lval(Mem basic, NoOffset)
BinOp(bop, basic, basic)
UnOp(uop, basic)
CastE(t, basic)
lval ::=
Mem basic, NoOffset
Var v, off, where v is a variable whose address is not taken and off
contains only "basic"
In addition, all sizeof and alignof forms are turned into
constants. Accesses to arrays and variables whose address is taken are
turned into "Mem" accesses. All field and index computations are turned
into address arithmetic.
For a concrete example, you can see how cilly --dosimplify
transforms the following code:
int main() {
struct mystruct {
int a;
int b;
} m;
int local;
int arr[3];
int *ptr;
ptr = &local;
m.a = local + sizeof(m) + arr[2];
return m.a;
}
See the CIL output for this
code fragment