This chapter describes the OCaml batch compiler ocamlc, which compiles OCaml source files to bytecode object files and links these object files to produce standalone bytecode executable files. These executable files are then run by the bytecode interpreter ocamlrun.
The ocamlc command has a command-line interface similar to that of most C compilers. It accepts several types of arguments and processes them sequentially:
If the interface file x.mli exists, the implementation x.ml is checked against the corresponding compiled interface x.cmi, which is assumed to exist. If no interface x.mli is provided, the compilation of x.ml produces a compiled interface file x.cmi in addition to the compiled object code file x.cmo. The file x.cmi produced corresponds to an interface that exports everything that is defined in the implementation x.ml.
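The effect of an interface on what a compilation unit exports can be pictured inside a single file, using a signature constraint as a stand-in for x.mli. The module names With_interface and Without_interface and their contents are illustrative assumptions, not something ocamlc defines:

```ocaml
(* Stand-in for x.ml compiled together with an x.mli:
   only the names listed in the signature are exported. *)
module With_interface : sig
  val incr : int -> int
end = struct
  let step = 1              (* hidden: not mentioned in the signature *)
  let incr n = n + step
end

(* Stand-in for x.ml compiled without an x.mli:
   the inferred interface exports every definition. *)
module Without_interface = struct
  let step = 1              (* visible to other modules *)
  let incr n = n + step
end
```

Here Without_interface.step can be used by other code, whereas With_interface.step would be rejected by the type checker.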
The output of the linking phase is a file containing compiled bytecode that can be executed by the OCaml bytecode interpreter: the command named ocamlrun. If a.out is the name of the file produced by the linking phase, the command
ocamlrun a.out arg1 arg2 … argn
executes the compiled code contained in a.out, passing it as arguments the character strings arg1 to argn. (See chapter 10 for more details.)
On most systems, the file produced by the linking phase can be run directly, as in:
./a.out arg1 arg2 … argn
The produced file has the executable bit set, and it manages to launch the bytecode interpreter by itself.
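To see how the program receives arg1 to argn, here is a minimal sketch of a program that lists its own arguments. The file name args.ml and the helper user_args are hypothetical; Sys.argv is the standard array holding the program name followed by the arguments:

```ocaml
(* args.ml (hypothetical file name): list the arguments received
   on the command line. *)

(* Drop element 0 (the program name) and keep the user-supplied arguments. *)
let user_args (argv : string array) : string list =
  match Array.to_list argv with
  | [] -> []
  | _program :: args -> args

let () =
  List.iteri
    (fun i arg -> Printf.printf "arg%d = %s\n" (i + 1) arg)
    (user_args Sys.argv)
```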
The following command-line options are recognized by ocamlc. The options -pack, -a, -c and -output-obj are mutually exclusive.
If -custom, -cclib or -ccopt options are passed on the command line, these options are stored in the resulting .cma library. Then, linking with this library automatically adds back the -custom, -cclib and -ccopt options as if they had been provided on the command line, unless the -noautolink option is given.
isatty(stderr)
holds.
Unix: Never use the strip command on executables produced by ocamlc -custom; this would remove the bytecode part of the executable.
If the given directory starts with +, it is taken relative to the standard library directory. For instance, -I +labltk adds the subdirectory labltk of the standard library to the search path.
ocamlc -pack -o p.cmo a.cmo b.cmo c.cmo
generates compiled files p.cmo and p.cmi describing a compilation unit having three sub-modules A, B and C, corresponding to the contents of the object files a.cmo, b.cmo and c.cmo. These contents can be referenced as P.A, P.B and P.C in the remainder of the program.
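The namespace obtained by packing can be pictured with ordinary nested modules. This single-file sketch only mimics the resulting structure: the member value name is an invented placeholder, and real packing operates on compiled .cmo files rather than on source text.

```ocaml
(* Source-level picture of the packed unit P with sub-modules A, B and C.
   The value [name] in each sub-module is a placeholder for illustration. *)
module P = struct
  module A = struct let name = "a" end
  module B = struct let name = "b" end
  module C = struct let name = "c" end
end

(* The rest of the program refers to the contents as P.A, P.B and P.C. *)
let names = [ P.A.name; P.B.name; P.C.name ]
```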
The warning-list argument is a sequence of warning specifiers, with no separators between them. A warning specifier is one of the following:
Warning numbers and letters which are out of the range of warnings that are currently defined are ignored. The warnings are as follows.
Some warnings are described in more detail in section 8.5.
The default setting is -w +a-4-6-7-9-27-29-32..39-41..42-44-45. It is displayed by ocamlc -help. Note that warnings 5 and 10 are not always triggered, depending on the internals of the type checker. [The default seems to be -w +a-4-6-7-9-27-29-32..39-41..42-44-45-48-50]
Note: it is not recommended to use warning sets (i.e. letters) as arguments to -warn-error in production code, because this can break your build when future versions of OCaml add some new warnings.
The default setting is -warn-error -a (all warnings are non-fatal).
On native Windows, the following environment variable is also consulted:
This short section is intended to clarify the relationship between the names of the modules corresponding to compilation units and the names of the files that contain their compiled interface and compiled implementation.
The compiler always derives the module name by taking the capitalized base name of the source file (.ml or .mli file). That is, it strips the leading directory name, if any, as well as the .ml or .mli suffix; then, it sets the first letter to uppercase, in order to comply with the requirement that module names must be capitalized. For instance, compiling the file mylib/misc.ml provides an implementation for the module named Misc. Other compilation units may refer to components defined in mylib/misc.ml under the names Misc.name; they can also do open Misc, then use unqualified names name.
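The naming rule can be written down as a small function. This is a sketch of the rule as stated above, using standard library helpers; the function name module_name_of_source is hypothetical and this is not the compiler's own code:

```ocaml
(* Strip the directory and the suffix, then capitalize the first letter. *)
let module_name_of_source (path : string) : string =
  path
  |> Filename.basename          (* "mylib/misc.ml" -> "misc.ml" *)
  |> Filename.remove_extension  (* "misc.ml"       -> "misc"    *)
  |> String.capitalize_ascii    (* "misc"          -> "Misc"    *)
```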
The .cmi and .cmo files produced by the compiler have the same base name as the source file. Hence, the compiled files always have their base name equal (modulo capitalization of the first letter) to the name of the module they describe (for .cmi files) or implement (for .cmo files).
When the compiler encounters a reference to a free module identifier Mod, it looks in the search path for a file named Mod.cmi or mod.cmi and loads the compiled interface contained in that file. As a consequence, renaming .cmi files is not advised: the name of a .cmi file must always correspond to the name of the compilation unit it implements. It is admissible to move them to another directory, if their base name is preserved, and the correct -I options are given to the compiler. The compiler will flag an error if it loads a .cmi file that has been renamed.
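The lookup described above can be approximated as follows. The function names cmi_candidates and find_cmi are hypothetical, and the order in which directories and capitalizations are tried is an assumption of this sketch, not a statement about the compiler's internals:

```ocaml
(* File names tried for a module identifier: as written, then uncapitalized. *)
let cmi_candidates (modname : string) : string list =
  [ modname ^ ".cmi"; String.uncapitalize_ascii modname ^ ".cmi" ]

(* Return the first existing candidate found along the search path, if any.
   Assumption: directories are scanned in the order given. *)
let find_cmi (search_path : string list) (modname : string) : string option =
  let candidates =
    List.concat_map
      (fun dir -> List.map (Filename.concat dir) (cmi_candidates modname))
      search_path
  in
  List.find_opt Sys.file_exists candidates
```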
Compiled bytecode files (.cmo files), on the other hand, can be freely renamed once created. That’s because the linker never attempts to find by itself the .cmo file that implements a module with a given name: it relies instead on the user providing the list of .cmo files by hand.
This section describes and explains the most frequently encountered error messages.
If filename has the format mod.cmo, this means you are trying to link a bytecode object file that does not exist yet. Fix: compile mod.ml first.
If your program spans several directories, this error can also appear because you haven’t specified the directories to look into. Fix: add the correct -I options to the command line.
In some cases, it is hard to understand why the two types t1 and t2 are incompatible. For instance, the compiler can report that “expression of type foo cannot be used with type foo”, and it really seems that the two types foo are compatible. This is not always true. Two type constructors can have the same name, but actually represent different types. This can happen if a type constructor is redefined. Example:
type foo = A | B
let f = function A -> 0 | B -> 1
type foo = C | D
f C
This results in the error message “expression C of type foo cannot be used with type foo”.
Non-generalized type variables in a type cause no difficulties inside a given structure or compilation unit (the contents of a .ml file, or an interactive session), but they cannot be allowed inside signatures nor in compiled interfaces (.cmi file), because they could be used inconsistently later. Therefore, the compiler flags an error when a structure or compilation unit defines a value name whose type contains non-generalized type variables. There are two ways to fix this error:
let sort_int_list = Sort.list (<)
(* inferred type 'a list -> 'a list, with 'a not generalized *)
write
let sort_int_list = (Sort.list (<) : int list -> int list);;
let map_length = List.map Array.length
(* inferred type 'a array list -> int list, with 'a not generalized *)
write
let map_length lv = List.map Array.length lv
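Both fixes can be tried in a current toplevel with the following sketch. It assumes List.sort compare as a replacement for Sort.list (<), since the Sort module may not be present in recent standard libraries:

```ocaml
(* Fix 1: a type constraint makes the partial application monomorphic,
   so no non-generalized type variable remains. *)
let sort_int_list = (List.sort compare : int list -> int list)

(* Fix 2: an explicit parameter turns the definition into a function,
   so its type is generalized to 'a array list -> int list. *)
let map_length lv = List.map Array.length lv
```

With the second fix, map_length remains usable at several element types, which a non-generalized definition would not allow.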
Of course, you will always encounter this error if you have mutually recursive functions across modules. That is, function Mod1.f calls function Mod2.g, and function Mod2.g calls function Mod1.f. In this case, no matter what permutations you perform on the command line, the program will be rejected at link-time. Fixes:
mod1.ml:    let f x = ... Mod2.g ...
mod2.ml:    let g y = ... Mod1.f ...
define
mod1.ml:    let f g x = ... g ...
mod2.ml:    let rec g y = ... Mod1.f g ...
and link mod1.cmo before mod2.cmo.
mod1.ml:    let forward_g =
                ref((fun x -> failwith "forward_g") : <type>)
            let f x = ... !forward_g ...
mod2.ml:    let g y = ... Mod1.f ...
            let _ = Mod1.forward_g := g
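The two fixes above can be sketched as runnable code by using submodules of one file in place of mod1.ml and mod2.ml. The module names and the bodies of f and g (a simple countdown) are invented for illustration; only the structure follows the text:

```ocaml
(* Fix 1: pass g to f as an argument, so that the first module
   no longer refers to the second. *)
module Mod1_param = struct
  let f g x = if x <= 0 then 0 else 1 + g (x - 1)
end

module Mod2_param = struct
  let rec g y = if y <= 0 then 0 else 1 + Mod1_param.f g (y - 1)
end

(* Fix 2: a forward reference, initialized with a failing placeholder
   and filled in once g is defined. *)
module Mod1_ref = struct
  let forward_g : (int -> int) ref = ref (fun _ -> failwith "forward_g")
  let f x = if x <= 0 then 0 else 1 + !forward_g (x - 1)
end

module Mod2_ref = struct
  let g y = if y <= 0 then 0 else 1 + Mod1_ref.f (y - 1)
  let () = Mod1_ref.forward_g := g
end
```

In both variants the first module can be compiled and linked before the second, because it never names the second module directly.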
This section describes and explains in detail some warnings:
Some constructors, such as the exception constructors Failure and Invalid_argument, take as parameter a string value holding a text message intended for the user.
These text messages are usually not stable over time: call sites building these constructors may refine the message in a future version to make it more explicit, etc. Therefore, it is dangerous to match over the precise value of the message. For example, until OCaml 4.02, Array.iter2 would raise the exception
Invalid_argument "arrays must have the same length"
Since 4.03 it raises the more helpful message
Invalid_argument "Array.iter2: arrays must have the same length"
but this means that any code of the form
try ... with Invalid_argument "arrays must have the same length" -> ...
is now broken and may suffer from uncaught exceptions.
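A handler that does not depend on the message text keeps working across such changes. The helper name try_iter2 is hypothetical; note that it also catches any Invalid_argument raised by f itself, which is a limitation of this sketch:

```ocaml
(* Returns false when Array.iter2 rejects its arguments (for instance
   because the arrays differ in length), whatever the message says. *)
let try_iter2 f a b =
  match Array.iter2 f a b with
  | () -> true
  | exception Invalid_argument _ -> false
```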
Warning 52 is there to prevent users from writing such fragile code in the first place. It does not occur on every matching on a literal string, but only in the case in which library authors expressed their intent to possibly change the constructor parameter value in the future, by using the attribute ocaml.warn_on_literal_pattern (see the manual section on builtin attributes in 7.18.1):
type t =
  | Foo of string [@ocaml.warn_on_literal_pattern]
  | Bar of string

let no_warning = function
  | Bar "specific value" -> 0
  | _ -> 1

let warning = function
  | Foo "specific value" -> 0
  | _ -> 1

>    | Foo "specific value" -> 0
>          ^^^^^^^^^^^^^^^^
> Warning 52: the argument of this constructor should not be matched against a
> constant pattern; the actual value of the argument could change
> in the future.
If your code raises this warning, you should not change the way you test for the specific string to avoid the warning (for example using a string equality inside the right-hand-side instead of a literal pattern), as your code would remain fragile. You should instead enlarge the scope of the pattern by matching on all possible values. This may require some care: if the scrutinee may return several different cases of the same pattern, or raise distinct instances of the same exception, you may need to modify your code to separate those several cases.
For example,
try (int_of_string count_str, bool_of_string choice_str) with
  | Failure "int_of_string" -> (0, true)
  | Failure "bool_of_string" -> (-1, false)
should be rewritten into more atomic tests. For example, using the exception patterns documented in Section 7.21, one can write:
match int_of_string count_str with
  | exception (Failure _) -> (0, true)
  | count ->
    begin match bool_of_string choice_str with
    | exception (Failure _) -> (-1, false)
    | choice -> (count, choice)
    end
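A self-contained variant of this rewriting is sketched below, wrapped in a hypothetical function parse. One deliberate difference: in the standard library versions we are aware of, bool_of_string reports invalid input with Invalid_argument rather than Failure, so the inner handler catches that exception instead.

```ocaml
(* Each conversion is tested separately, and no handler inspects
   the message carried by the exception. *)
let parse (count_str : string) (choice_str : string) : int * bool =
  match int_of_string count_str with
  | exception (Failure _) -> (0, true)
  | count ->
    begin match bool_of_string choice_str with
    | exception (Invalid_argument _) -> (-1, false)  (* assumption: see above *)
    | choice -> (count, choice)
    end
```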
The semantics of or-patterns in OCaml is specified with a left-to-right bias: a value v matches the pattern p | q if it matches p or q, but if it matches both, the environment captured by the match is the environment captured by p, never the one captured by q.
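The left-to-right bias can be observed with a small function; first_some is a name invented for this sketch:

```ocaml
(* When both components are Some, x is bound by the left alternative. *)
let first_some = function
  | (Some x, _) | (_, Some x) -> x
  | (None, None) -> 0
```

Applied to (Some 1, Some 2), both alternatives match, and the result is 1, the binding captured by the left one.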
While this property is generally intuitive, there is at least one specific case where a different semantics might be expected. Consider a pattern followed by a when-guard: | p when g -> e, for example:
| ((Const x, _) | (_, Const x)) when is_neutral x -> branch
The semantics is clear: match the scrutinee against the pattern; if it matches, test the guard; and if the guard passes, take the branch. In particular, consider the input (Const a, Const b), where a fails the test is_neutral a, while b passes the test is_neutral b. With the left-to-right semantics, the clause above is not taken for this input: matching (Const a, Const b) against the or-pattern succeeds in the left branch and returns the environment x -> a; the guard is_neutral a is then tested and fails, so the branch is not taken.
However, another semantics may be considered more natural here: any pair that has one side passing the test will take the branch. With this semantics the previous code fragment would be equivalent to
| (Const x, _) when is_neutral x -> branch
| (_, Const x) when is_neutral x -> branch
This is not the semantics adopted by OCaml.
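The difference between the two readings can be checked with the following sketch. The type expr, the definition of is_neutral (zero is neutral) and the result strings are assumptions made for the example; the compiler is expected to report warning 57 on the first function, which is the point of the demonstration.

```ocaml
type expr = Const of int | Other

(* Assumption for this example: 0 is the neutral constant. *)
let is_neutral x = (x = 0)

(* Single clause with an or-pattern: x comes from the left alternative
   whenever the left alternative matches. *)
let with_or_pattern = function
  | ((Const x, _) | (_, Const x)) when is_neutral x -> "branch"
  | _ -> "fallthrough"

(* Two clauses: the guard is tried for each side in turn. *)
let with_split_clauses = function
  | (Const x, _) when is_neutral x -> "branch"
  | (_, Const x) when is_neutral x -> "branch"
  | _ -> "fallthrough"
```

On the input (Const 1, Const 0), with_or_pattern falls through because the guard is tested on 1 only, while with_split_clauses takes the branch through its second clause.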
Warning 57 is dedicated to these confusing cases where the specified left-to-right semantics is not equivalent to a non-deterministic semantics (any branch can be taken) relative to a specific guard. More precisely, it warns when the guard uses “ambiguous” variables, that is, variables bound to different parts of the scrutinee by different sides of an or-pattern.