[klee-dev] Using KLEE in LLVM Pass

Thu Jul 20 15:08:51 BST 2017

Hi,

On 19 July 2017 at 13:55, Khaled Yakdan <khaled.yakzan at gmail.com> wrote:
> Hi,
>
> I am working on an LLVM analysis pass where I need to reason about whether
> two values are equal. I am wondering if KLEE can be used for this purpose.
> One idea would be to symbolically execute the given function and then ask
> the solver if two expressions of interest are actually equal. Can KLEE be
> used as previously described? If yes, I would be grateful if you can provide
> some hints how this can be realized.

Unfortunately at this point in time only certain parts of KLEE can
used as a library but it's header layout is a complete mess.
An attempt was made to fix this in [1] but this change was rejected.

It's not impossible to do but you may find the current state of KLEE
makes this a challenge.

You might want to consider whether the reasoning you wish to perform
can be done directly using an SMT solver directly (e.g. Z3) can solve
your problem.
You could also consider doing what the Souper project [2] does. It
uses's KLEE's Expression language which has a close resemblance to
some parts of LLVM IR
and constructs queries and dispatches them to an external solver.

However if you want to go the KLEE route you would need to construct
LLVM IR that effectively does the following

```
// Functions to check
extern int foo0(int arg0);
extern int foo1(int arg0);

int main() {
 int branch = klee_range(0, 2); // symbolic variable constrained to be
in range [0,1]
 int arg0 = 0;
 klee_make_symbolic(&arg0, sizeof(arg0), "arg0");
 int result0 = 0;
 int result1 = 0;
 switch (branch) {
   case 0:
   // First order
   result0 = foo0(arg0);
   result1 = foo1(arg1);
   break;
   case 1:
   // Second order
   result1 = foo1(arg1);
   result0 = foo0(arg0);
   break;
 }
 // Check output equivalence
 // Equivalence is proved if and only the abort()
 // below is never called and path exploration is
 // exhaustive
 if (result0 != result1)
   abort();
}
```

This is essentially a "driver" that sets up the symbolic arguments to
the functions (in the example `foo0()` and `foo1()`) and calls them
appropriately
and checks whether or not they are equivalent.

You might be wondering why the `branch` variable is created (which
itself is symbolic) and then is used to control the order `foo0()` and
`foo1()` are called in.
The reason for this is a bit a subtle. If we call `foo0()` first it
could be the case that for some inputs the function gets terminated
(e.g. `if (arg0 == 0) abort();`).
That would mean that for all paths leaving `foo0()` some values for
`arg0` would be disallowed when executing `foo1()`, even if `foo1()`
does something special
for those values. This means we wouldn't be fully confirming
equivalence for all inputs so it is necessary to also consider calling
the `foo1()` and `foo2()` in the other order.

[1] https://github.com/klee/klee/pull/478
[2] https://github.com/google/souper

HTH,
Dan