Camlp5 undocumented features

Warning: This is a legacy page that is only available for historical reasons. I stopped maintaining this page in spring 2007 when Nicolas Pouillard announced a rewrite of Camlp5 on the OCaml list.

On this page I collected information on Camlp5 that, around 2007, could be found neither in the reference manual nor in the tutorial. I found out about all of it during my work on a quotation system in original syntax.

Undocumented revised syntax

Issue: parentheses required in record update
  Original syntax: { record_expr with field = expr; ... }
  Revised syntax:  { (record_expr) with field = expr; ... }

Issue: fun bindings in record field definitions
  Original syntax: { field = fun a -> ... }
  Revised syntax:  { field a = ... }

Issue: no bigarray syntax
  Original syntax: a.{1}
  Revised syntax:  Bigarray.Array1.get a 1
  Original syntax: a.{1,2,3,4} <- b
  Revised syntax:  Bigarray.Genarray.set a [| 1;2;3;4 |] b

Issue: nested as patterns require parentheses
  Original syntax: function | Some a as opt, b -> ...
  Revised syntax:  function [ ((Some a as opt), b) -> ...

Issue: manifest types
  Original syntax: type 'a a = 'a option = None | Some of 'a
  Revised syntax:  type 'a a = 'a option == None | Some of 'a

Issue: top level quantification on type variables
  Revised syntax:  type a = ! 'a . list 'a

Issue: abstract module type specifications
  Original syntax: module type MT
  Revised syntax:  module type MT = 'a

Issue: class declarations
  Original syntax: class [ 'a, 'b ] c = ...
  Revised syntax:  class c [ 'a, 'b ] = ...

Issue: class instantiation
  Original syntax: [ 'a, 'b ] c
  Revised syntax:  c [ 'a, 'b ]

Issue: class valued function types
  Original syntax: typ -> class_type
  Revised syntax:  [ typ ] -> class_type

Issue: semicolon required after every class field
  Original syntax: method virtual m : int
  Revised syntax:  method virtual m : int;

Issue: order of private and virtual
  Original syntax: method private virtual m : int
                   method virtual private m : int
  Revised syntax:  method virtual private m : int;

Issue: object instance variables
  Original syntax: val a = ...
  Revised syntax:  value a = ...

Issue: class constraints
  Original syntax: constraint t1 = t2
  Revised syntax:  type t1 = t2
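
To make two of the rows concrete, here is a small sketch in original syntax. The revised forms are the ones listed above and appear only in comments, because compiling them would need Camlp5. The types point and op and all value names are made up for illustration.

```ocaml
(* Hypothetical record types, used only for illustration. *)
type point = { x : int; y : int }
type op = { apply : int -> int }

let origin = { x = 0; y = 0 }

(* Record update in original syntax. The revised syntax wraps the
   record expression in parentheses:  { (origin) with x = 1 } *)
let moved = { origin with x = 1 }

(* A function-valued record field in original syntax. The revised
   syntax allows the binding form:  { apply a = a + 1 } *)
let succ_op = { apply = fun a -> a + 1 }
```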

SLIST0, SLIST1, SOPT

The module pa_extend_m defines the meta-symbols SLIST0, SLIST1 and SOPT. They behave like LIST0, LIST1 and OPT from pa_extend with the -quotify switch on. That is, they produce Qast.t terms of the syntax tree instead of the syntax tree itself. (If you don't understand the last sentence, don't worry. I don't understand it either. It's only that q_MLast uses SLIST and SOPT where one would have expected LIST and OPT.)

SLIST0 and SLIST1 auto-magically provide an $list:...$ anti-quotation. SOPT provides $opt:...$. However, I don't understand (yet) where these anti-quotations come from.

There are also some limitations with SLIST:

More Power for Camlp5 recursive descent parsing

There are situations where plain recursive descent parsing is just not sufficient, for instance when you want to parse something outside LL(n), like OCaml in the original syntax. There are two possible solutions.

Non-destructive recogniser

Consider the following two rules from the OCaml grammar:

expr ::= ...
  | { field =  expr  { ; field =  expr } }
  | { expr with  field =  expr  { ; field =  expr } }

You cannot distinguish the two rules with an LL(1) grammar, because you might have to look arbitrarily far ahead to see the difference between field = and expr with.

The Camlp5 solution looks as follows. (The following code is taken from paqo_o.ml. It originates from Daniel de Rauglaudre's pa_o.ml. However, Daniel used the revised syntax.)

(* stream_peek_nth : int -> 'a Stream.t -> 'a option *)
let stream_peek_nth n strm =
  try
    Some (List.nth (Stream.npeek n strm) (n - 1))
  with
  (* List.nth fails when fewer than n elements are available *)
  | Failure _ -> None

The function stream_peek_nth just returns the n-th element of a stream, or None if the stream holds fewer than n elements. Nothing spectacular. Note that it does not discard elements of the stream.

Now we define an oracle that distinguishes the two grammar rules in question:

(* test_label_eq : unit Grammar.Entry.e *)
let test_label_eq =
  Grammar.Entry.of_parser gram "test_label_eq"
    (let rec test lev strm =
       match stream_peek_nth lev strm with
       | Some (("UIDENT", _) | ("LIDENT", _) | ("", ".")) ->
           test (lev + 1) strm
       | Some ("", "=") -> ()
       | _ -> raise Stream.Failure
     in
       test 1
    )

First look at the inner test function. It non-destructively scans the token stream from the Camlp5 lexer. It skips over any sequence of identifiers and dots. If it eventually hits = it returns. If there is a token which is neither an identifier nor a dot it raises Stream.Failure. With Grammar.Entry.of_parser this test function is converted into a grammar entry with the name test_label_eq. It is important that test behaves like a real grammar entry: it raises an exception if it cannot parse the input, so that Camlp5 can try the next rule.
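
If you want to play with the oracle without a Camlp5 installation, the following is a minimal stand-alone sketch. It is not the code from paqo_o.ml: a plain token list stands in for the lexer's stream, the exception Reject stands in for Stream.Failure, and peek_nth, accepts and the two sample token lists are names I made up for illustration.

```ocaml
(* Tokens as (kind, text) pairs, mirroring the patterns used above.
   A plain list stands in for the lexer's token stream (assumption). *)
type token = string * string

(* Stand-in for Stream.Failure. *)
exception Reject

(* Non-destructive look at the n-th token (1-based). *)
let peek_nth (n : int) (toks : token list) : token option =
  List.nth_opt toks (n - 1)

(* Skip identifiers and dots; accept on "=", reject on anything else.
   The list is never modified, so nothing is consumed. *)
let test_label_eq (toks : token list) : unit =
  let rec test lev =
    match peek_nth lev toks with
    | Some (("UIDENT", _) | ("LIDENT", _) | ("", ".")) -> test (lev + 1)
    | Some ("", "=") -> ()
    | _ -> raise Reject
  in
  test 1

(* The tokens that follow "{" in  { M.x = 1 }  and  { r with x = 1 }. *)
let record_literal : token list =
  [ ("UIDENT", "M"); ("", "."); ("LIDENT", "x"); ("", "=");
    ("INT", "1"); ("", "}") ]

let record_update : token list =
  [ ("LIDENT", "r"); ("", "with"); ("LIDENT", "x"); ("", "=");
    ("INT", "1"); ("", "}") ]

(* Turn the exception-based protocol into a boolean for easy testing. *)
let accepts (toks : token list) : bool =
  match test_label_eq toks with
  | () -> true
  | exception Reject -> false
```

Here accepts record_literal holds, because the scan reaches = after M, the dot and x. accepts record_update does not hold, because the scan stops at the keyword with, which is neither an identifier nor a dot.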

The Camlp5 grammar is now as follows:

EXTEND
  expr:
    [ ...
    | "simple" LEFTA
      [ ...
      | "{"; test_label_eq; lel = lbl_expr_list; "}" -> ...
      | "{"; e = expr LEVEL "."; "with"; lel = lbl_expr_list; "}" -> ...
   ...

In the first rule we see test_label_eq. It tests whether the first rule has to be taken or not. If test_label_eq returns unit, parsing continues as usual with lbl_expr_list, because the test does not consume any tokens. If the second rule has to be taken, test_label_eq raises an exception, the first rule is aborted and the next one is tried.
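
The fall-through from one rule to the next can be modelled in a few lines. This is only a loose model under my own assumptions, not how the Camlp5 grammar engine is implemented: rules are ordinary functions on a token list, Reject stands in for Stream.Failure, and shape, literal_rule, update_rule, first_of and classify are hypothetical names. The rules only report which alternative matched instead of building a syntax tree.

```ocaml
type token = string * string

(* Stand-in for Stream.Failure. *)
exception Reject

(* Compact look-ahead test: identifiers and dots, followed by "=". *)
let rec test_label_eq (toks : token list) : unit =
  match toks with
  | (("UIDENT" | "LIDENT"), _) :: rest | ("", ".") :: rest ->
    test_label_eq rest
  | ("", "=") :: _ -> ()
  | _ -> raise Reject

type shape = Literal | Update

(* Each rule sees the tokens after "{". A real rule would go on to
   parse lbl_expr_list; this sketch only reports which rule matched. *)
let literal_rule (toks : token list) : shape =
  test_label_eq toks;
  Literal

let update_rule (toks : token list) : shape =
  if List.mem ("", "with") toks then Update else raise Reject

(* Try the rules in order; a Reject moves on to the next rule. *)
let rec first_of rules toks =
  match rules with
  | [] -> raise Reject
  | rule :: others ->
    (try rule toks with Reject -> first_of others toks)

let classify : token list -> shape = first_of [ literal_rule; update_rule ]
```

Because the test never consumes anything, a rejected first rule leaves the input exactly as it was, which is what makes trying the second rule safe.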

Do it yourself

Instead of just recognising, you can do the whole parsing yourself. Consider the following rule from paqo_o.ml (again all the following originates from pa_o.ml):

EXTEND
  expr:
    [ ...
    | "simple" LEFTA
      [ ...
      | "("; op = operator_rparen -> ...
   ...

This rule parses infix operators in parentheses, which become normal functions this way. The entry operator_rparen parses an infix operator and the following closing parenthesis. It is defined as follows:

let operator_rparen =
  Grammar.Entry.of_parser gram "operator_rparen"
    (fun strm ->
       match Stream.npeek 2 strm with
       | [("", s); ("", ")")] when is_operator s ->
           begin
             Stream.junk strm;
             Stream.junk strm;
             s
           end
       | _ -> raise Stream.Failure)

So it is basically a function that checks for certain tokens at the front of the token stream. If successful, it discards them and returns an appropriate semantic value.
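
The same peek, check and consume pattern can be tried out without Camlp5. The sketch below is my own simplification, not code from paqo_o.ml: a record with a mutable token list stands in for the stream, npeek and junk imitate the Stream functions of the same name, Reject stands in for Stream.Failure, and is_operator is a hypothetical stand-in, since its definition is not shown on this page.

```ocaml
type token = string * string

(* Stand-in for Stream.Failure. *)
exception Reject

(* A mutable list stands in for the token stream (assumption). *)
type stream = { mutable rest : token list }

(* First n elements of a list; fewer if the list is shorter. *)
let rec take n = function
  | x :: tl when n > 0 -> x :: take (n - 1) tl
  | _ -> []

(* Look at the first n tokens without consuming them. *)
let npeek (n : int) (s : stream) : token list = take n s.rest

(* Discard the first token, if there is one. *)
let junk (s : stream) : unit =
  match s.rest with
  | [] -> ()
  | _ :: tl -> s.rest <- tl

(* Hypothetical stand-in; the real is_operator is not shown here. *)
let is_operator (s : string) : bool = List.mem s [ "+"; "-"; "*"; "/" ]

(* Consume an operator and the closing parenthesis, or consume nothing
   and raise Reject. *)
let operator_rparen (s : stream) : string =
  match npeek 2 s with
  | [ ("", op); ("", ")") ] when is_operator op ->
    junk s;
    junk s;
    op
  | _ -> raise Reject
```

The point of peeking first is that a failed match leaves the stream untouched, so another rule can still be tried on the same input.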


last changed on 20 Sep 2011 by Hendrik