Memcheck: runtime typechecking for Ocaml
Warning: This is a legacy page that is only
displayed for historical reasons. I stopped developing Memcheck
in 2007 because Olmar development stopped
then and Memcheck was only developed for Olmar. In contrast to
Olmar there exist no alternatives to Memcheck. In 2016 I updated
the sources to compile with camlp5 and OCaml 4.03.
Memcheck can check at runtime whether a given value has a certain
type. This is useful
- after unmarshalling data from disk
- after creating an ocaml value in a C program
Memcheck is very similar to SafeUnmarshal. I wrote Memcheck as a substitute for
SafeUnmarshal when I could not longer live with the deficiencies,
performance problems and the seemingly dead status of
SafeUnmarshal (see this message and this message on the ocaml mailing list).
Strictly speaking Memcheck (and SafeUnmarshal) only checks if the
memory representation of a value agrees with a given type. The
memory representation for, e.g. 0 and None
are the same, therefore checking None against the type
int will succeed. Similarly a tuple can be considered
as a record or an array.
Page navigation
Status
Memcheck is just good enough to be used in the regression test of
Olmar. If you want
to use it for something else you might have to do some nontrivial
extensions. I am a bit reluctent to invest more time than
absolutely needed into this, because Memcheck will fall into
oblivion when the problems with SafeUnmarshal are resolved.
Please drop me a line at hendrik@askra.de
if you are interested in Memcheck and it doesn't work out of the
box. I will then try to help out.
- install camlp5
- download the
last cvs snapshot: memcheck-2016-10-14.tgz (should compile with camlp5 6.16 and
OCaml 4.03.0)
- compilation: make all
Usage
- generate the runtime type descriptions from the source:
echo "open Memcheck;;" > type_descr.ml
camlp5o path_to_memcheck/generate_type_descr.cmo type_def.ml >> type_descr.ml
This will generate a constant name_type_descr for
every type definition found. For parametric type definitions it
will be a function taking a type description argument for each
parameter type.
- compile the runtime type descriptions:
ocamlc -I path_to_memcheck -c type_descr.ml
- use it like
open Memcheck;;
check [Channel stdout; Verbose_trace] some_value (<type_name>_type_descr arguments ...)
Comparison with SafeUnmarshal
Memcheck was partly inspired by SafeUnmarshal and shares a lot of idias with it. The most
important differances are
- Memcheck is lightening fast in comparison
with SafeUnmarshal: 50 seconds instead of several hours for
checking 7 MB of data. See timing comparisons
plot
1 and plot 2.
These graphs permit only one possible interpretation: Either
SafeUnmarshal has some big additional feature or there is a
servere bug either in Memcheck or in SafeUnmarshal.
- On error Memcheck can print a trace, which
helps a lot, if the error is in the value and not in the type.
- Memcheck can do arrays types and therefore hash tables. I
haven't checked, but there are certainly types that SafeUnmarshal
does support while Memcheck does not.
- Memcheck relies on camlp4 while SafeUnmarshal exploits
compiler support.
- Memcheck runs with all ocaml versions while
SafeUnmarshal is only available in the ty patch.
Technology
The most important point is that Memcheck needs unique
tags or identifications for type expressions that occor during runtime
type checking. Because
- Memcheck supports circular data structures and it therefore
must remember visited memory blocks.
- Memcheck supports shared data and it therefore must remember
the type that a block is supposed to have (in order to check that
all fields that refer to the same shared block have the same type).
- Type expressions are circular (think of the list type) and
can therefore not be compared with the builtin structural equality.
Memcheck does the following to ensure that every type that occurs
during runtime type checking is associated with precisely one
tag:
- Memcheck distinguishes between type expressions and type
constructors. Type definitions in the source are in a
straightforward way translated into type constructor definitions.
Therefore, in the beginning there are only type constructors.
Even builtin constant types like int are given as type
constructors. To form a type one has to apply a type constructor
to a (possibly empty) list of argument types.
- Every type constructor gets a (hopefully) unique id
(containing the source file and the location therein).
- During runtime time checking types are computed from type
constructor applications. First Memcheck computes type
expressions for all arguments and thereby the unique tags of the
arguments. Then a type constructor hash is consulted in order to
see if this type constructor has already been applied to these
arguments. If not a new type is built by substituting the
arguments and finitely the new type gets a new tag.
The remainder is straightforward: Every memory block that is
checked is registered in a hash table with the tag of its type.
To detect sharing and cycles this hash is consulted. One only has
to use structural equality in this hash (such that equal memory
blocks at different addresses can have different types, like for
instance for the two lists [0] and [None]).
Type-constructor applications are expanded when they are met
during runtime type checking. Type-constructor applications are
always boxed in a reference cell, which gets updated once the
application is performed. Therefore, for instance, to check an
int list, the list type constructor needs
only to be applied twice: first at top level and then inside the
list type constructor.
- Does not support all base/builtin types yet. Please complain if you
hit something.
- Does not support types defined inside modules and functors.
Modules should be easy. I have not thought about functors. The
problem is that different functor applications yield incompatible
types and therefore must yield different type
constructor definitions.
- Performance decreases with larger values because the hash
function Hashtbl.hash_param is not good enough for this job. It
tends to give the same hash to a large number of blocks such that
some buckets in the value hash table have thousends of entries
(while the average bucket fill is below 2). I'll believe (but
haven't checked) that in those examples the values that end up in
one overfilled bucket are different.
- Complexity is quadratic in the number of non-shared
structurally equal blocks. But it should be linear in the number
of shared and structurally different blocks. I believe it is
impossible to do better without changing the garbage collector or
modifying the blocks themselves.
Contact
Please contact me at hendrik@askra.de with anything
Memcheck related.
last changed on
14 Oct 2016
by Hendrik