This document describes the standard representation of parse trees for Erlang
programs as Erlang terms. This representation is known as the abstract
format.
Functions dealing with such parse trees are compile:forms/[1,2] and functions in the modules epp, erl_eval, erl_lint, erl_pp, erl_parse, and io. They are also used as input and output for parse transforms (see the module compile).
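The following minimal sketch shows how these modules work together. It assumes a recent OTP release. It does four things:

- scans and parses three forms with erl_scan:string/1 and erl_parse:parse_form/1;
- passes the resulting list of forms to compile:forms/1;
- loads the compiled binary with code:load_binary/3;
- calls the new function.

The module names abs_compile_demo and hello are hypothetical and exist only for this illustration.

```erlang
-module(abs_compile_demo).
-export([run/0]).

%% Scan and parse one form; the string must end with a period.
parse_form(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.

%% Build a module from source strings, compile the abstract forms
%% in memory, load the result and call hello:greet/1.
run() ->
    Forms = [parse_form(S) || S <- ["-module(hello).",
                                    "-export([greet/1]).",
                                    "greet(Name) -> {hello, Name}."]],
    {ok, hello, Bin} = compile:forms(Forms),
    {module, hello} = code:load_binary(hello, "hello.erl", Bin),
    hello:greet(world).
```

Calling abs_compile_demo:run() returns {hello,world}. No source file is written at any point; the parse trees go straight to the compiler.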
We use the function Rep to denote the mapping from an Erlang source construct C to its abstract format representation R, and write R = Rep(C).
The word LINE below represents an integer, and denotes the number of the line in the source file where the construction occurred. Several instances of LINE in the same construction may denote different lines.
Since operators are not terms in their own right, when operators are mentioned below, the representation of an operator should be taken to be the atom with a printname consisting of the same characters as the operator.
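Both points can be checked from the shell. The sketch below uses a hypothetical module, abs_line_demo, and assumes a recent OTP release. It parses two things:

- a declaration spread over two source lines;
- an arithmetic expression.

With the default options of erl_scan:string/1, the annotations are plain line integers, as this text assumes.

```erlang
-module(abs_line_demo).
-export([function_form/0, operator_expr/0]).

%% A two-line function declaration. The function and its clause are
%% annotated with line 1, the body atom ok with line 2:
%% {function,1,f,0,[{clause,1,[],[],[{atom,2,ok}]}]}
function_form() ->
    {ok, Tokens, _} = erl_scan:string("f() ->\n    ok."),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.

%% Operators are represented by atoms such as '+' and '*':
%% {op,1,'+',{var,1,'X'},{op,1,'*',{integer,1,2},{var,1,'Y'}}}
operator_expr() ->
    {ok, Tokens, _} = erl_scan:string("X + 2 * Y."),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.
```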
A module declaration consists of a sequence of forms that are either function declarations or attributes.
If D is a module declaration consisting of the forms F_1, ..., F_k, then Rep(D) = [Rep(F_1), ..., Rep(F_k)].
If F is an attribute -module(Mod), then Rep(F) = {attribute,LINE,module,Mod}.
If F is an attribute -export([Fun_1/A_1, ..., Fun_k/A_k]), then Rep(F) = {attribute,LINE,export,[{Fun_1,A_1}, ..., {Fun_k,A_k}]}.
If F is an attribute -import(Mod,[Fun_1/A_1, ..., Fun_k/A_k]), then Rep(F) = {attribute,LINE,import,{Mod,[{Fun_1,A_1}, ..., {Fun_k,A_k}]}}.
If F is an attribute -compile(Options), then Rep(F) = {attribute,LINE,compile,Options}.
If F is an attribute -file(File,Line), then Rep(F) = {attribute,LINE,file,{File,Line}}.
If F is a record declaration -record(Name,{V_1, ..., V_k}), then Rep(F) = {attribute,LINE,record,{Name,[Rep(V_1), ..., Rep(V_k)]}}. For Rep(V), see below.
If F is a wild attribute -A(T), then Rep(F) = {attribute,LINE,A,T}.
If F is a function declaration Name(Ps_1) when Gs_1 -> B_1 ; ... ; Name(Ps_k) when Gs_k -> B_k, where each Ps_i, Gs_i and B_i is a pattern sequence, a guard sequence and a body, respectively, and each Ps_i has the same length Arity, then
Rep(F) =
{function,LINE,Name,Arity,
[{clause,LINE,Rep(Ps_1),Rep(Gs_1),Rep(B_1)}, ...,
{clause,LINE,Rep(Ps_k),Rep(Gs_k),Rep(B_k)}]}.
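The following sketch parses one form of each kind listed above, using erl_scan:string/1 and erl_parse:parse_form/1. It assumes a recent OTP release. The module abs_form_demo and the forms it parses are hypothetical. The comments show the expected terms. Every term carries line 1 because each string is a single line.

```erlang
-module(abs_form_demo).
-export([forms/0]).

%% Scan and parse one form; the string must end with a period.
parse_form(String) ->
    {ok, Tokens, _} = erl_scan:string(String),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.

forms() ->
    %% {attribute,1,module,m}
    [parse_form("-module(m)."),
     %% {attribute,1,export,[{f,1}]}
     parse_form("-export([f/1])."),
     %% {attribute,1,import,{lists,[{reverse,1}]}}
     parse_form("-import(lists, [reverse/1])."),
     %% {attribute,1,compile,export_all}
     parse_form("-compile(export_all)."),
     %% {attribute,1,author,joe}, a wild attribute
     parse_form("-author(joe)."),
     %% {function,1,f,1,
     %%  [{clause,1,[{var,1,'X'}],
     %%    [[{op,1,'>',{var,1,'X'},{integer,1,0}}]],[{var,1,'X'}]},
     %%   {clause,1,[{var,1,'_'}],[],[{integer,1,0}]}]}
     parse_form("f(X) when X > 0 -> X; f(_) -> 0.")].
```

Note that the pattern sequence of each clause appears directly as the list [{var,1,'X'}]. The guard sequence is a list of guards, and each guard is itself a list of tests.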
Each field in a record declaration may have an optional explicit default initializer expression:
If V is A, then Rep(V) = {record_field,LINE,Rep(A)}.
If V is A = E, then Rep(V) = {record_field,LINE,Rep(A),Rep(E)}.
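Both record field variants can be seen in a single declaration. The sketch below uses a hypothetical module, abs_record_decl_demo, and assumes a recent OTP release. In it, field x has no initializer and field y defaults to 0. The field names come out as atom literals, that is, as Rep(A).

```erlang
-module(abs_record_decl_demo).
-export([record_form/0]).

%% Expected:
%% {attribute,1,record,
%%  {point,[{record_field,1,{atom,1,x}},
%%          {record_field,1,{atom,1,y},{integer,1,0}}]}}
record_form() ->
    {ok, Tokens, _} = erl_scan:string("-record(point, {x, y = 0})."),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```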
In addition to the representations of forms, the list that represents a module declaration (as returned by functions in erl_parse and epp) may contain tuples {error,E}, denoting syntactically incorrect forms, and {eof,LINE}, denoting an end of stream encountered before a complete form had been parsed.
There are five kinds of atomic literals, which are represented in the same way in patterns, expressions and guard expressions:
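If L is an integer or character literal, then Rep(L) = {integer,LINE,L}.
If L is a float literal, then Rep(L) = {float,LINE,L}.
If L is a string literal consisting of the characters C_1, ..., C_k, then Rep(L) = {string,LINE,[C_1, ..., C_k]}.
If L is an atom literal, then Rep(L) = {atom,LINE,L}.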
Note that negative integer and float literals do not occur as such; they are parsed as an application of the unary negation operator.
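The sketch below parses a few literals as single expressions. It uses a hypothetical module, abs_literal_demo, and assumes a recent OTP release. The last case shows that -7 comes back as the unary operator applied to 7, not as a negative integer literal. Character literals are left out because recent releases represent them differently from the rule given here.

```erlang
-module(abs_literal_demo).
-export([examples/0]).

%% Scan and parse one expression; the string must end with a period.
parse_expr(String) ->
    {ok, Tokens, _} = erl_scan:string(String),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

examples() ->
    [parse_expr("42."),       %% {integer,1,42}
     parse_expr("2.5."),      %% {float,1,2.5}
     parse_expr("\"ab\"."),   %% {string,1,"ab"}
     parse_expr("foo."),      %% {atom,1,foo}
     parse_expr("-7.")].      %% {op,1,'-',{integer,1,7}}
```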
If Ps is a sequence of patterns P_1, ..., P_k, then Rep(Ps) = [Rep(P_1), ..., Rep(P_k)]. Such sequences occur as the list of arguments to a function or fun.
Individual patterns are represented as follows:
If P is a compound pattern P_1 = P_2, then Rep(P) = {match,LINE,Rep(P_1),Rep(P_2)}.
If P is a variable pattern V, then Rep(P) = {var,LINE,A}, where A is an atom with a printname consisting of the same characters as V.
If P is the universal pattern _, then Rep(P) = {var,LINE,'_'}.
If P is a tuple pattern {P_1, ..., P_k}, then Rep(P) = {tuple,LINE,[Rep(P_1), ..., Rep(P_k)]}.
If P is the nil pattern [], then Rep(P) = {nil,LINE}.
If P is a cons pattern [P_h | P_t], then Rep(P) = {cons,LINE,Rep(P_h),Rep(P_t)}.
If P is a binary pattern <<P_1:Size_1/TSL_1, ..., P_k:Size_k/TSL_k>>, then Rep(P) = {bin,LINE,[{bin_element,LINE,Rep(P_1),Rep(Size_1),Rep(TSL_1)}, ..., {bin_element,LINE,Rep(P_k),Rep(Size_k),Rep(TSL_k)}]}. For Rep(TSL), see below. An omitted Size is represented by the atom default. An omitted TSL (type specifier list) is also represented by the atom default.
If P is P_1 Op P_2, where Op is a binary operator (this is either an occurrence of ++ applied to a literal string or character list, or an occurrence of an expression that can be evaluated to a number at compile time), then Rep(P) = {op,LINE,Op,Rep(P_1),Rep(P_2)}.
If P is Op P_0, where Op is a unary operator (this is an occurrence of an expression that can be evaluated to a number at compile time), then Rep(P) = {op,LINE,Op,Rep(P_0)}.
If P is a record pattern #Name{Field_1=P_1, ..., Field_k=P_k}, then
Rep(P) =
{record,LINE,Name,
[{record_field,LINE,Rep(Field_1),Rep(P_1)}, ...,
{record_field,LINE,Rep(Field_k),Rep(P_k)}]}.
Note that every pattern has the same source form as some expression, and is represented the same way as the corresponding expression.
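The sketch below shows patterns in two positions. It uses a hypothetical module, abs_pattern_demo, and assumes a recent OTP release:

- on the left of a match expression, parsed with erl_parse:parse_exprs/1;
- in a function head, parsed with erl_parse:parse_form/1.

For the function head, only the pattern sequence Rep(Ps) is returned. The record r does not have to be declared just for parsing.

```erlang
-module(abs_pattern_demo).
-export([match_expr/0, clause_head/0]).

%% Expected:
%% {match,1,{tuple,1,[{atom,1,ok},{cons,1,{var,1,'H'},{var,1,'T'}},
%%                    {var,1,'_'}]},
%%  {var,1,'Msg'}}
match_expr() ->
    {ok, Tokens, _} = erl_scan:string("{ok, [H | T], _} = Msg."),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

%% Expected pattern sequence:
%% [{record,1,r,[{record_field,1,{atom,1,a},{var,1,'A'}}]},{nil,1}]
clause_head() ->
    {ok, Tokens, _} = erl_scan:string("g(#r{a = A}, []) -> A."),
    {ok, {function, _, g, 2, [{clause, _, Ps, _, _}]}} =
        erl_parse:parse_form(Tokens),
    Ps.
```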
A body B is a sequence of expressions E_1, ..., E_k, and Rep(B) = [Rep(E_1), ..., Rep(E_k)].
An expression E is one of the following alternatives:
If E is an atomic literal L, then Rep(E) = Rep(L).
If E is a match expression P = E_0, where P is a pattern, then Rep(E) = {match,LINE,Rep(P),Rep(E_0)}.
If E is a variable V, then Rep(E) = {var,LINE,A}, where A is an atom with a printname consisting of the same characters as V.
If E is a tuple skeleton {E_1, ..., E_k}, then Rep(E) = {tuple,LINE,[Rep(E_1), ..., Rep(E_k)]}.
If E is [], then Rep(E) = {nil,LINE}.
If E is a cons skeleton [E_h | E_t], then Rep(E) = {cons,LINE,Rep(E_h),Rep(E_t)}.
If E is a binary constructor <<V_1:Size_1/TSL_1, ..., V_k:Size_k/TSL_k>>, then Rep(E) = {bin,LINE,[{bin_element,LINE,Rep(V_1),Rep(Size_1),Rep(TSL_1)}, ..., {bin_element,LINE,Rep(V_k),Rep(Size_k),Rep(TSL_k)}]}. For Rep(TSL), see below. An omitted Size is represented by the atom default. An omitted TSL (type specifier list) is also represented by the atom default.
If E is E_1 Op E_2, where Op is a binary operator, then Rep(E) = {op,LINE,Op,Rep(E_1),Rep(E_2)}.
If E is Op E_0, where Op is a unary operator, then Rep(E) = {op,LINE,Op,Rep(E_0)}.
If E is a record creation #Name{Field_1=E_1, ..., Field_k=E_k}, then
Rep(E) =
{record,LINE,Name,
[{record_field,LINE,Rep(Field_1),Rep(E_1)}, ...,
{record_field,LINE,Rep(Field_k),Rep(E_k)}]}.
If E is a record update E_0#Name{Field_1=E_1, ..., Field_k=E_k}, then
Rep(E) =
{record,LINE,Rep(E_0),Name,
[{record_field,LINE,Rep(Field_1),Rep(E_1)}, ...,
{record_field,LINE,Rep(Field_k),Rep(E_k)}]}.
If E is a record field index #Name.Field, then Rep(E) = {record_index,LINE,Name,Rep(Field)}.
If E is a record field access E_0#Name.Field, then Rep(E) = {record_field,LINE,Rep(E_0),Name,Rep(Field)}.
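The four record expression forms can be compared side by side. The sketch below uses a hypothetical module, abs_record_expr_demo, and assumes a recent OTP release. Only the record update carries Rep(E_0) before the record name. The field access puts the record name between Rep(E_0) and the field.

```erlang
-module(abs_record_expr_demo).
-export([examples/0]).

%% Scan and parse one expression; the string must end with a period.
parse_expr(String) ->
    {ok, Tokens, _} = erl_scan:string(String),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

examples() ->
    %% {record,1,r,[{record_field,1,{atom,1,a},{integer,1,1}}]}
    [parse_expr("#r{a = 1}."),
     %% {record,1,{var,1,'R'},r,[{record_field,1,{atom,1,a},{integer,1,2}}]}
     parse_expr("R#r{a = 2}."),
     %% {record_index,1,r,{atom,1,a}}
     parse_expr("#r.a."),
     %% {record_field,1,{var,1,'R'},r,{atom,1,a}}
     parse_expr("R#r.a.")].
```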
If E is catch E_0, then Rep(E) = {'catch',LINE,Rep(E_0)}.
If E is a function call E_0(E_1, ..., E_k), then Rep(E) = {call,LINE,Rep(E_0),[Rep(E_1), ..., Rep(E_k)]}.
If E is a remote function call E_m:E_0(E_1, ..., E_k), then
Rep(E) =
{call,LINE,{remote,LINE,Rep(E_m),Rep(E_0)},[Rep(E_1), ..., Rep(E_k)]}.
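The sketch below shows a local call, a remote call and a catch expression. It uses a hypothetical module, abs_call_demo, and assumes a recent OTP release. In the remote call, the module and the function name are both ordinary atom literals wrapped in a remote tuple.

```erlang
-module(abs_call_demo).
-export([examples/0]).

%% Scan and parse one expression; the string must end with a period.
parse_expr(String) ->
    {ok, Tokens, _} = erl_scan:string(String),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

examples() ->
    %% {call,1,{atom,1,foo},[{integer,1,1}]}
    [parse_expr("foo(1)."),
     %% {call,1,{remote,1,{atom,1,lists},{atom,1,reverse}},[{var,1,'L'}]}
     parse_expr("lists:reverse(L)."),
     %% {'catch',1,{var,1,'X'}}
     parse_expr("catch X.")].
```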
If E is a list comprehension [E_0 || W_1, ..., W_k], where each W_i is a generator or a filter, then Rep(E) = {lc,LINE,Rep(E_0),[Rep(W_1), ..., Rep(W_k)]}. For Rep(W), see below.
If E is begin B end, where B is a body, then Rep(E) = {block,LINE,Rep(B)}.
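The sketch below parses a list comprehension and a block expression. It uses a hypothetical module, abs_lc_demo, and assumes a recent OTP release. The comprehension has one generator, which becomes a generate tuple, and one filter, which is represented as the plain expression. The generator and filter representations are defined further down in this text.

```erlang
-module(abs_lc_demo).
-export([examples/0]).

%% Scan and parse one expression; the string must end with a period.
parse_expr(String) ->
    {ok, Tokens, _} = erl_scan:string(String),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

examples() ->
    %% {lc,1,{op,1,'*',{var,1,'X'},{integer,1,2}},
    %%  [{generate,1,{var,1,'X'},{var,1,'L'}},
    %%   {op,1,'>',{var,1,'X'},{integer,1,0}}]}
    [parse_expr("[X * 2 || X <- L, X > 0]."),
     %% {block,1,[{atom,1,a},{atom,1,b}]}
     parse_expr("begin a, b end.")].
```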
If E is if Gs_1 -> B_1 ; ... ; Gs_k -> B_k end, where each Gs_i and B_i is a guard sequence and a body, respectively, then
Rep(E) =
{'if',LINE,[{clause,LINE,[],Rep(Gs_1),Rep(B_1)}, ...,
{clause,LINE,[],Rep(Gs_k),Rep(B_k)}]}.
If E is case E_0 of P_1 when Gs_1 -> B_1 ; ... ; P_k when Gs_k -> B_k end, where E_0 is an expression and each P_i, Gs_i and B_i is a pattern, a guard sequence and a body, respectively, then
Rep(E) =
{'case',LINE,Rep(E_0),
[{clause,LINE,[Rep(P_1)],Rep(Gs_1),Rep(B_1)}, ...,
{clause,LINE,[Rep(P_k)],Rep(Gs_k),Rep(B_k)}]}.
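The sketch below parses an if expression and a case expression. It uses a hypothetical module, abs_if_case_demo, and assumes a recent OTP release. The if clauses have an empty pattern list. Each case clause wraps its single pattern in a one-element list. A case clause without a guard gets the empty guard sequence [].

```erlang
-module(abs_if_case_demo).
-export([examples/0]).

%% Scan and parse one expression; the string must end with a period.
parse_expr(String) ->
    {ok, Tokens, _} = erl_scan:string(String),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

examples() ->
    %% {'if',1,
    %%  [{clause,1,[],[[{op,1,'>',{var,1,'X'},{integer,1,0}}]],[{atom,1,pos}]},
    %%   {clause,1,[],[[{atom,1,true}]],[{atom,1,other}]}]}
    [parse_expr("if X > 0 -> pos; true -> other end."),
     %% {'case',1,{var,1,'X'},
    %%  [{clause,1,[{tuple,1,[{atom,1,ok},{var,1,'V'}]}],[],[{var,1,'V'}]},
     %%   {clause,1,[{var,1,'_'}],[],[{atom,1,none}]}]}
     parse_expr("case X of {ok, V} -> V; _ -> none end.")].
```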
If E is receive P_1 when Gs_1 -> B_1 ; ... ; P_k when Gs_k -> B_k end, where each P_i, Gs_i and B_i is a pattern, a guard sequence and a body, respectively, then
Rep(E) =
{'receive',LINE,
[{clause,LINE,[Rep(P_1)],Rep(Gs_1),Rep(B_1)}, ...,
{clause,LINE,[Rep(P_k)],Rep(Gs_k),Rep(B_k)}]}.
If E is receive P_1 when Gs_1 -> B_1 ; ... ; P_k when Gs_k -> B_k after E_0 -> B_t end, where each P_i, Gs_i and B_i is a pattern, a guard sequence and a body, respectively, E_0 is an expression and B_t is a body, then
Rep(E) =
{'receive',LINE,
[{clause,LINE,[Rep(P_1)],Rep(Gs_1),Rep(B_1)}, ...,
{clause,LINE,[Rep(P_k)],Rep(Gs_k),Rep(B_k)}],
Rep(E_0),Rep(B_t)}.
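The two receive forms differ only in the trailing timeout expression and timeout body. The sketch below uses a hypothetical module, abs_receive_demo, and assumes a recent OTP release.

```erlang
-module(abs_receive_demo).
-export([examples/0]).

%% Scan and parse one expression; the string must end with a period.
parse_expr(String) ->
    {ok, Tokens, _} = erl_scan:string(String),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

examples() ->
    %% {'receive',1,[{clause,1,[{atom,1,stop}],[],[{atom,1,ok}]}]}
    [parse_expr("receive stop -> ok end."),
     %% {'receive',1,
     %%  [{clause,1,[{tuple,1,[{atom,1,msg},{var,1,'M'}]}],[],[{var,1,'M'}]}],
     %%  {integer,1,100},[{atom,1,timeout}]}
     parse_expr("receive {msg, M} -> M after 100 -> timeout end.")].
```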
If E is fun Name/Arity, then Rep(E) = {'fun',LINE,{function,Name,Arity}}.
If E is fun (Ps_1) when Gs_1 -> B_1 ; ... ; (Ps_k) when Gs_k -> B_k end, where each Ps_i, Gs_i and B_i is a pattern sequence, a guard sequence and a body, respectively, then
Rep(E) =
{'fun',LINE,{clauses,
[{clause,LINE,Rep(Ps_1),Rep(Gs_1),Rep(B_1)},
...,
{clause,LINE,Rep(Ps_k),Rep(Gs_k),Rep(B_k)}]}}.
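The sketch below parses both kinds of fun. It uses a hypothetical module, abs_fun_demo, and assumes a recent OTP release. In the clause form, the argument list of each clause is Rep(Ps_i) itself, for example [{var,1,'X'}]. It is not wrapped in a further list, because a pattern sequence is already represented as a list.

```erlang
-module(abs_fun_demo).
-export([examples/0]).

%% Scan and parse one expression; the string must end with a period.
parse_expr(String) ->
    {ok, Tokens, _} = erl_scan:string(String),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

examples() ->
    %% {'fun',1,{function,foo,1}}
    [parse_expr("fun foo/1."),
     %% {'fun',1,{clauses,
     %%   [{clause,1,[{var,1,'X'}],
     %%     [[{op,1,'>',{var,1,'X'},{integer,1,0}}]],[{var,1,'X'}]},
     %%    {clause,1,[{var,1,'_'}],[],[{integer,1,0}]}]}}
     parse_expr("fun(X) when X > 0 -> X; (_) -> 0 end.")].
```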
If E is query [E_0 || W_1, ..., W_k] end, where each W_i is a generator or a filter, then Rep(E) = {'query',LINE,{lc,LINE,Rep(E_0),[Rep(W_1), ..., Rep(W_k)]}}. For Rep(W), see below.
If E is E_0.Field, a Mnesia record access inside a query, then Rep(E) = {record_field,LINE,Rep(E_0),Rep(Field)}.
If E is ( E_0 ), then Rep(E) = Rep(E_0), i.e., parenthesized expressions cannot be distinguished from their bodies.
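The sketch below shows that parentheses leave no trace in the tree. It uses a hypothetical module, abs_paren_demo, and assumes a recent OTP release. (X) and ((X)) both parse to the bare variable. In (1 + 2) * 3 the grouping survives only as the nesting of the op tuples.

```erlang
-module(abs_paren_demo).
-export([examples/0]).

%% Scan and parse one expression; the string must end with a period.
parse_expr(String) ->
    {ok, Tokens, _} = erl_scan:string(String),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

examples() ->
    [parse_expr("(X)."),          %% {var,1,'X'}
     parse_expr("((X))."),        %% {var,1,'X'}
     %% {op,1,'*',{op,1,'+',{integer,1,1},{integer,1,2}},{integer,1,3}}
     parse_expr("(1 + 2) * 3.")].
```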
When W is a generator or a filter (in the body of a list comprehension), then:
If W is a generator P <- E, where P is a pattern and E is an expression, then Rep(W) = {generate,LINE,Rep(P),Rep(E)}.
If W is a filter E, which is an expression, then Rep(W) = Rep(E).
A type specifier list TSL for a binary element is a sequence of type specifiers TS_1 - ... - TS_k, and Rep(TSL) = [Rep(TS_1), ..., Rep(TS_k)].
When TS is a type specifier for a binary element, then:
If TS is an atom A, then Rep(TS) = A.
If TS is A:Value, where A is an atom and Value is an integer, then Rep(TS) = {A,Value}.
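The sketch below parses one binary expression whose three elements are written differently:

- X has an explicit size and a two-specifier list;
- Y has only a type specifier;
- Z has neither, so both its size and its TSL become the atom default.

It uses a hypothetical module, abs_bin_demo, and assumes a recent OTP release. A binary pattern written the same way is represented identically.

```erlang
-module(abs_bin_demo).
-export([bin_expr/0]).

%% Expected:
%% {bin,1,
%%  [{bin_element,1,{var,1,'X'},{integer,1,8},[integer,{unit,1}]},
%%   {bin_element,1,{var,1,'Y'},default,[binary]},
%%   {bin_element,1,{var,1,'Z'},default,default}]}
bin_expr() ->
    {ok, Tokens, _} = erl_scan:string("<<X:8/integer-unit:1, Y/binary, Z>>."),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.
```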
A guard Gs is a nonempty sequence of guard tests G_1, ..., G_k, and Rep(Gs) = [Rep(G_1), ..., Rep(G_k)].
A guard sequence Gss is a sequence of guards Gs_1; ...; Gs_k, and Rep(Gss) = [Rep(Gs_1), ..., Rep(Gs_k)]. If the guard sequence is empty, Rep(Gss) = [].
A guard test G is either true, an application of a BIF to a sequence of guard expressions (syntactically this includes guard record tests), or a binary operator applied to two guard expressions.
If G is true, then Rep(G) = {atom,LINE,true}.
If G is an application A(E_1, ..., E_k), where A is an atom and E_1, ..., E_k are guard expressions, then Rep(G) = {call,LINE,{atom,LINE,A},[Rep(E_1), ..., Rep(E_k)]}.
If G is E_1 Op E_2, where Op is a binary operator, and E_1, E_2 are guard expressions, then Rep(G) = {op,LINE,Op,Rep(E_1),Rep(E_2)}.
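The sketch below extracts the guard sequence from a one-clause function. It uses a hypothetical module, abs_guard_demo, and assumes a recent OTP release. The guard sequence contains two guards separated by ";". The first guard consists of two tests separated by ",": a BIF application and a comparison. The outer list is therefore Rep(Gss), and each inner list is one Rep(Gs).

```erlang
-module(abs_guard_demo).
-export([guards/0]).

%% Expected:
%% [[{call,1,{atom,1,is_integer},[{var,1,'X'}]},
%%   {op,1,'>',{var,1,'X'},{integer,1,0}}],
%%  [{op,1,'=:=',{var,1,'X'},{atom,1,a}}]]
guards() ->
    {ok, Tokens, _} =
        erl_scan:string("f(X) when is_integer(X), X > 0; X =:= a -> ok."),
    {ok, {function, _, f, 1, [{clause, _, _, Guards, _}]}} =
        erl_parse:parse_form(Tokens),
    Guards.
```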
All guard expressions are expressions and are represented in the same way as the corresponding expressions.
When Erlang source code is compiled, the abstract code, after some preprocessing, is stored as the abstract_code chunk in the BEAM file, for debugging purposes. The version of the preprocessed format in OTP R7 is called abstract_v1, and in R8 abstract_v2. The preprocessing changes the representation so it becomes slightly incompatible with the format described above. The differences are:
{remote, ...}
form (which is not allowed in source form).
{'fun',LINE,{clauses, Clauses},Extra}
. The form of this extra
element may change from one OTP release to the next.
{'fun',LINE,{function,Name,Arity},Extra}
.
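The stored abstract code can be read back with beam_lib:chunks/2 after compiling with the debug_info option. The sketch below is rough and version-dependent. It uses a hypothetical module, abs_chunk_demo, which compiles an equally hypothetical module, chunk_demo.

On recent OTP releases the version tag is expected to be raw_abstract_v1 rather than the R7/R8 names given above. For that reason the sketch returns the tagged pair as it is, without assuming any particular tag.

```erlang
-module(abs_chunk_demo).
-export([stored_forms/0]).

%% Scan and parse one form; the string must end with a period.
parse_form(String) ->
    {ok, Tokens, _} = erl_scan:string(String),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.

%% Compile a tiny module with debug_info (compile:forms/2 implies
%% the binary option) and read back the abstract_code chunk from
%% the in-memory BEAM binary as a {Version, Forms} pair.
stored_forms() ->
    Forms = [parse_form(S) || S <- ["-module(chunk_demo).",
                                    "-export([f/0]).",
                                    "f() -> fun() -> ok end."]],
    {ok, chunk_demo, Bin} = compile:forms(Forms, [debug_info]),
    {ok, {chunk_demo, [{abstract_code, AbsCode}]}} =
        beam_lib:chunks(Bin, [abstract_code]),
    AbsCode.
```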