Memcheck: runtime typechecking for Ocaml

Warning: This is a legacy page that is only displayed for historical reasons. I stopped developing Memcheck in 2007 because Olmar development stopped then and Memcheck was only developed for Olmar. In contrast to Olmar there exist no alternatives to Memcheck. In 2016 I updated the sources to compile with camlp5 and OCaml 4.03.

Memcheck can check at runtime whether a given value has a certain type. This is useful

after unmarshalling data from disk
after creating an ocaml value in a C program

Memcheck is very similar to SafeUnmarshal. I wrote Memcheck as a substitute for SafeUnmarshal when I could not longer live with the deficiencies, performance problems and the seemingly dead status of SafeUnmarshal (see this message and this message on the ocaml mailing list).

Strictly speaking Memcheck (and SafeUnmarshal) only checks if the memory representation of a value agrees with a given type. The memory representation for, e.g. 0 and None are the same, therefore checking None against the type int will succeed. Similarly a tuple can be considered as a record or an array.

Status

Memcheck is just good enough to be used in the regression test of Olmar. If you want to use it for something else you might have to do some nontrivial extensions. I am a bit reluctent to invest more time than absolutely needed into this, because Memcheck will fall into oblivion when the problems with SafeUnmarshal are resolved.

Please drop me a line at hendrik@askra.de if you are interested in Memcheck and it doesn't work out of the box. I will then try to help out.

Download / Installation

install camlp5

download the last cvs snapshot: memcheck-2016-10-14.tgz (should compile with camlp5 6.16 and OCaml 4.03.0)
compilation: make all

Usage

generate the runtime type descriptions from the source:
```
echo "open Memcheck;;" > type_descr.ml
camlp5o path_to_memcheck/generate_type_descr.cmo type_def.ml >> type_descr.ml
```
This will generate a constant name_type_descr for every type definition found. For parametric type definitions it will be a function taking a type description argument for each parameter type.

compile the runtime type descriptions:

ocamlc -I path_to_memcheck -c type_descr.ml

use it like

open Memcheck;;
check [Channel stdout; Verbose_trace] some_value (<type_name>_type_descr arguments ...)

Comparison with SafeUnmarshal

Memcheck was partly inspired by SafeUnmarshal and shares a lot of idias with it. The most important differances are

Memcheck is lightening fast in comparison with SafeUnmarshal: 50 seconds instead of several hours for checking 7 MB of data. See timing comparisons plot 1 and plot 2. These graphs permit only one possible interpretation: Either SafeUnmarshal has some big additional feature or there is a servere bug either in Memcheck or in SafeUnmarshal.
On error Memcheck can print a trace, which helps a lot, if the error is in the value and not in the type.
Memcheck can do arrays types and therefore hash tables. I haven't checked, but there are certainly types that SafeUnmarshal does support while Memcheck does not.
Memcheck relies on camlp4 while SafeUnmarshal exploits compiler support.
Memcheck runs with all ocaml versions while SafeUnmarshal is only available in the ty patch.

Technology

The most important point is that Memcheck needs unique tags or identifications for type expressions that occor during runtime type checking. Because

Memcheck supports circular data structures and it therefore must remember visited memory blocks.
Memcheck supports shared data and it therefore must remember the type that a block is supposed to have (in order to check that all fields that refer to the same shared block have the same type).
Type expressions are circular (think of the list type) and can therefore not be compared with the builtin structural equality.

Memcheck does the following to ensure that every type that occurs during runtime type checking is associated with precisely one tag:

Memcheck distinguishes between type expressions and type constructors. Type definitions in the source are in a straightforward way translated into type constructor definitions. Therefore, in the beginning there are only type constructors. Even builtin constant types like int are given as type constructors. To form a type one has to apply a type constructor to a (possibly empty) list of argument types.
Every type constructor gets a (hopefully) unique id (containing the source file and the location therein).
During runtime time checking types are computed from type constructor applications. First Memcheck computes type expressions for all arguments and thereby the unique tags of the arguments. Then a type constructor hash is consulted in order to see if this type constructor has already been applied to these arguments. If not a new type is built by substituting the arguments and finitely the new type gets a new tag.

The remainder is straightforward: Every memory block that is checked is registered in a hash table with the tag of its type. To detect sharing and cycles this hash is consulted. One only has to use structural equality in this hash (such that equal memory blocks at different addresses can have different types, like for instance for the two lists [0] and [None]).

Type-constructor applications are expanded when they are met during runtime type checking. Type-constructor applications are always boxed in a reference cell, which gets updated once the application is performed. Therefore, for instance, to check an int list, the list type constructor needs only to be applied twice: first at top level and then inside the list type constructor.

Known Problems

Does not support all base/builtin types yet. Please complain if you hit something.
Does not support types defined inside modules and functors. Modules should be easy. I have not thought about functors. The problem is that different functor applications yield incompatible types and therefore must yield different type constructor definitions.
Performance decreases with larger values because the hash function Hashtbl.hash_param is not good enough for this job. It tends to give the same hash to a large number of blocks such that some buckets in the value hash table have thousends of entries (while the average bucket fill is below 2). I'll believe (but haven't checked) that in those examples the values that end up in one overfilled bucket are different.
Complexity is quadratic in the number of non-shared structurally equal blocks. But it should be linear in the number of shared and structurally different blocks. I believe it is impossible to do better without changing the garbage collector or modifying the blocks themselves.

Contact

Please contact me at hendrik@askra.de with anything Memcheck related.

last changed on 14 Oct 2016 by Hendrik