[klee-dev] Option for generating MC/DC testcases

Wed May 11 08:34:31 BST 2016

Hi Damir,

On 9 May 2016 at 01:55, Damir <lost404 at gmail.com> wrote:
>
> Hello everyone!
>
> I've recently started using KLEE, and I'm wondering how hard it would be to
> implement full MC/DC testcases generation in KLEE.
>
> You can read about MC/DC on Wiki
>
> https://en.wikipedia.org/wiki/Modified_condition/decision_coverage
>
> I've done some analysis, and figured out how it can be done in klee,
> theoretically,

Your analysis here isn't really correct here. If KLEE worked directly
on C code your analysis might be in some cases but that is not how
KLEE is implemented. KLEE works on LLVM IR so that needs to be
considered when thinking about implementing new features. I also don't
think you are trying to do will work in general as it assumes
"conditions" are simple expressions that don't contain forking.

> As I see it, implementing such feature would require tampering with forking
> mechanism in KLEE, in following ways:
>
> 1) Allow breaking a complex decision into conditions, for example
>
>     if (A && B && C)
> the decision is (A && B && C) here, and conditions are A, B and C.
>
>  All conditions are functions of symbolic variables, returning boolean, and
> combined into decision using only boolean logic.

This is wrong in several ways.

First in the LLVM IR that KLEE interprets the conditions will likely
appear as separate branch instructions as they are short-circuiting
operators (they might not depending on if LLVM's optimizer modifies
the branches)

Second your conditions in general might not be plain expressions. They
could be function calls for example which may contain forking. E.g.

```
void foo(const char* a) {
  if ( strlen(a) == 5 && a[0]='h' && (a[1] == 'i] || a[1] == 'x')) {
      // do something
  }

}
```

one or more of the conditionals may change the state of the program too.

> 2) Instead of forking on whether decision is true or false, forks are based
> on conditions.
>
> Before (current KLEE):
>
> fork 1: Decision(A, B, C) == false
> fork 2: Decision(A, B, C) == true

As stated above KLEE does not work this way.

> MC/DC way:
>
> Before forking, set of conditions (in addition to previous ones) must be
> solved:
>
> For example, for condition A:
>
> A(x1, y1, ... z1) == false
> A(x2, y2, ... z2) == true
>
> B(x1, y1, ... z1) == Bfixed
> B(x2, y2, .... z2) == Bfixed
>
> C(x1, y1, ... z1) == Cfixed
> C(x2, y2, .... z2) == Cfixed
>
> Decision(x1, y1, ... z1) != Decision(x2, y2, ... z2)
>
> If there is a solution (Bfixed, Cfixed) to those constraints, two forks are
> created with following conditions on them:
>
> fork 1: A == true, B == Bfixed, C == Cfixed
> fork 2: A == false, B == Bfixed, C == Cfixed

As stated above this will not work in general because your conditions
may not be simple expressions.

However if your conditions are simple expressions it would be
technically possible to change the forking behaviour to target MC/DC
coverage. You would need to perform some sort of LLVM pass before
running that

- labels the different conditions (probably using LLVM's metadata) so
that later we can work out which expressions correspond to what
conditions
- Merge the basic blocks and branching used in the short circuit
operators so there is only a single branch instruction corresponding
to the ``if()``

For these merged branches we would handle them specially by trying for
each condition to fork on that condition while holding the other
conditions as the having the same value on either side of the fork. If
for a particular condition we can't fork then we know that the
condition can't affect the branch and that for the particular path we
are on that condition is irrelevant. Note I said "particular path"
here! There could be another path where the condition could affect the
branch. I'm not sure if MC/DC says anything about this (i.e. whether a
condition must independently effect a branch for all paths leading to
the condition or if just a single path must do this).