This document describes the erl_crash.dump
file generated
upon abnormal exit of the Erlang runtime system.
The system writes the crash dump in the current directory of the emulator (whatever that means on the current operating system), or in the file pointed out by the environment variable ERL_CRASH_DUMP. For a crash dump to be written, there has to be a writable file system mounted.
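As a sketch of how this can be inspected, the snippet below (an assumption, not part of the original text) checks from a running node where a crash dump would end up; note that ERL_CRASH_DUMP must be set in the environment before the emulator starts, e.g. `ERL_CRASH_DUMP=/tmp/my.dump erl`:

```erlang
%% Hypothetical check of the crash dump destination.
%% os:getenv/1 returns false when the variable is unset.
case os:getenv("ERL_CRASH_DUMP") of
    false -> io:format("dump goes to ./erl_crash.dump~n");
    Path  -> io:format("dump goes to ~s~n", [Path])
end.
```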
Crash dumps are written mainly for one of two reasons: either the built-in function erlang:halt/1 is called explicitly with a string argument from running Erlang code, or the runtime system has detected an error that cannot be handled. The most common reason that the system cannot handle an error is external limitations, such as running out of memory. A crash dump due to an internal error may be caused by the system reaching limits in the emulator itself (such as the number of atoms in the system, or too many simultaneous ETS tables). Usually the emulator or the operating system can be reconfigured to avoid the crash, which is why interpreting the crash dump correctly is important.
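As a minimal sketch of the first case above, calling the BIF with a string (rather than an integer exit status) makes the emulator write a crash dump; the function name here is invented for illustration:

```erlang
%% Hypothetical helper: halting with a string argument makes the
%% emulator write erl_crash.dump, with the string as the Slogan.
crash_with_dump() ->
    erlang:halt("deliberate crash for dump inspection").
```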
The reason for the dump is noted at the beginning of the file as Slogan: <reason> (the word "slogan" has historical roots). If the system was halted by the BIF erlang:halt/1, the slogan is the string parameter passed to the BIF; otherwise it is a description generated by the emulator or the (Erlang) kernel. Normally the message should be enough to understand the problem, but some messages are nevertheless described here. Note, however, that the suggested reasons for the crash are only suggestions. The exact reasons for the errors may vary depending on the local applications and the underlying operating system.
"Cannot allocate N bytes of memory" - The system has run out of memory. N indicates the amount of memory needed (in bytes), which could give some hint of what the problem is. If N is very large, it could be that an Erlang process consumes vast amounts of memory, possibly due to an error in the Erlang code.
"Unexpected op code N" - Error in compiled code: a beam file is damaged or there is an error in the compiler.
"Module Name undefined" | "Function Name undefined" | "No function Name:Name/1" | "No function Name:start/2" - The kernel/stdlib applications are damaged, or the start script is damaged.
"Driver_select called with too large file descriptor N" - The number of file descriptors for sockets exceeds 1024 (Unix only). The limit on file descriptors can in some Unix flavors be set above 1024, but only 1024 sockets/pipes can be used simultaneously by Erlang (due to limitations in the Unix select call). The number of open regular files is not affected by this.
"Kernel pid terminated (Who) (Exit-reason)" - The kernel supervisor has detected a failure, usually that the application_controller has shut down (Who = application_controller, Why = shutdown). The application controller may have shut down for a number of reasons, the most usual being that the node name of the distributed Erlang node is already in use. A complete supervisor-tree "crash" (i.e., the top supervisors have exited) gives about the same result. This message comes from the Erlang code and not from the virtual machine itself. It is always due to some kind of failure in an application, either within OTP or a "user-written" one. Looking at the error log for your application is probably the first step to take.
"Init terminating in do_boot ()" - The primitive Erlang boot sequence was terminated, most probably because the boot script has errors or cannot be read. This is usually a configuration error; the system may have been started with a faulty -boot parameter or with a boot script from the wrong version of OTP.
"Could not start kernel pid (Who) ()" - One of the kernel processes could not start. This is probably due to faulty arguments (such as errors in a -config argument) or faulty configuration files. Check that all files are in their correct location and that the configuration files (if any) are not damaged. Usually there are also messages written to the controlling terminal and/or the error log explaining what's wrong.
Errors other than those mentioned above may occur, as the erlang:halt/1 BIF can generate any message. If the message is not generated by the BIF and does not occur in the list above, it may be due to an error in the emulator. There may, however, be unusual messages not covered here that are still connected to an application failure. There is a lot more information available, so a more thorough reading of the crash dump may reveal the crash reason. The sizes of processes, the number of ETS tables, and the Erlang data on each process stack can be useful for tracking down the problem.
After the general information in the crash dump (the date, slogan and version information) follows a listing of each living Erlang process in the system, and zombie processes. The process information for one process may look like this (line numbers have been added):
(1)  <0.2.0> Waiting. Registered as: erl_prim_loader
(2)  Spawned as: erl_prim_loader:start_it/4
(3)  Message buffer data: 262 words
(4)  Link list: [<0.0.0>,<0.1.0>]
(5)  Dictionary: [{fake, entry}]
(6)  Reductions 2194 stack+heap 987 old_heap_sz=987
(7)  Heap unused=85 OldHeap unused=987
(8)  Stack dump:
(9)  program counter = 0x1875e4 (erl_prim_loader:loop/3 + 52)
(10) cp = 0xed830 (<terminate process normally>)
(11) arity = 0
(12)
(13) 1d4ae0 Return addr 0xED830 (<terminate process normally>)
(14) y(0)  ["/usr/local/product/releases/otp_beam_sunos5_r7b_patched/lib/kernel-2.6.1.6/ebin","/usr/local/product/releases/otp_beam_sunos5_r7b_patched/lib/stdlib-1.9.3/ebin"]
(15) y(1)  <0.1.0>
(16) y(2)  {state,[],none,get_from_port_efile,stop_port,exit_port,#Port<0.2>,infinity,dummy_in_handler}
(17) y(3)  infinity
Each line of the output should be interpreted as follows:
Line (1): The process identifier (<0.2.0>), the state of the process (Waiting) and the registered name of the process, if any (erl_prim_loader). The state of the process can be one of the following:
  Scheduled - The process was scheduled to run, but was not currently running.
  Waiting - The process was waiting for something (usually in a receive).
  Running - The process was currently running. If the BIF erlang:halt/1 was called, this was the process calling it.
  Exiting - The process was on its way to exit.
  Garbing - The process was garbage collecting when the dump was written; the information for this process may be limited.
  Suspended - The process is suspended, either by the BIF erlang:suspend_process/1 or because it is trying to write to a busy port.
Line (2): The entry point of the process, i.e., the function given in the spawn or spawn_link call that started the process.
Line (3): The number of words used for buffered (pending) messages.
Line (4): The processes and ports this process is linked to.
Line (5): The process dictionary (the put/2 and get/1 thing), if non-empty.
Line (6): The number of reductions executed, the combined stack and heap size, and the size of the old heap.
Line (7): The amount of unused memory on the heap and the old heap.
Lines (8) and onwards: The stack dump, with the program counter, continuation pointer and the stack slots (y(N)); the state term of this particular process can be seen on line (16).
When interpreting the data for a process, it is helpful to know that anonymous function objects (funs) are given a name constructed from the name of the function in which they are created, and a number (starting with 0) indicating the number of that fun within that function.
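As a sketch of that naming scheme (the module and function names here are invented, and the exact form of the generated name varies between releases):

```erlang
-module(sample).
-export([go/0]).

%% The fun below is the first (number 0) fun created inside go/0,
%% so in a crash dump it appears under a generated name along the
%% lines of '-go/0-fun-0-'.
go() ->
    F = fun(X) -> X + 1 end,
    F(1).
```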
This section lists the open ports, their owners, any linked processes, and the name of their driver or external process.
This section mostly contains information for runtime-system developers. The following fields can be of interest:
For the atom table, objs(N) is the number of atoms present in the system at the time of the crash. Some tens of thousands of atoms are perfectly normal, but more could indicate that the BIF erlang:list_to_atom/1 is being used to dynamically generate a lot of different atoms, which is never a good idea.
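The danger is that atoms are never garbage collected, so every distinct string passed to list_to_atom/1 stays in the atom table for the life of the node. A safer pattern, assuming a release that provides list_to_existing_atom/1 (the helper name below is invented for illustration):

```erlang
%% Risky: each distinct N adds a permanent atom_tab entry.
%%   list_to_atom("user_" ++ integer_to_list(N))
%% Safer: only resolve atoms that already exist.
safe_atom(S) ->
    try list_to_existing_atom(S)
    catch error:badarg -> undefined
    end.
```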
For the module table, objs(N) indicates the number of loaded modules in the system.
Allocated binary data (N) - This number indicates how many bytes are allocated to binaries (the binary data type) for the whole system. Binaries allocated directly on process heaps (small binaries) are not accounted for here.
The rest of the information is only of interest for runtime system developers.
This section contains information about all the ETS tables in the system. The following fields are interesting for each table:
Type: Set | Bag | Ordered set (AVL tree), Elements: N - The most interesting here is that it indicates whether the table is an ordered_set or not.
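To illustrate the distinction (the table name `demo` is an arbitrary choice), an ordered_set table keeps its elements sorted in term order, unlike a set or bag:

```erlang
%% In an ordered_set table, traversal follows term order,
%% so ets:first/1 returns the smallest key.
T = ets:new(demo, [ordered_set]),
true = ets:insert(T, {b, 2}),
true = ets:insert(T, {a, 1}),
a = ets:first(T).
```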
This section contains information about all the timers started
with the BIFs erlang:start_timer/3
and
erlang:send_after/3
. Each line
includes the message to be sent, the pid to receive the message
and how many milliseconds were left until the message
would have been sent.
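A minimal sketch of how such timers are created (any timer started like this and not yet expired at the time of the crash would appear in this section):

```erlang
%% send_after/3 delivers the message as-is; start_timer/3 wraps it
%% as {timeout, Ref2, ping} so the receiver can match on the ref.
Ref1 = erlang:send_after(5000, self(), ping),
Ref2 = erlang:start_timer(5000, self(), ping).
```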
If the Erlang node was alive, i.e., set up for communicating with other nodes, this section lists the connections that were active.
This is a list of all loaded modules, together with the memory usage of each module, in bytes. Note that loaded code is usually larger than the packed format in the beam files.
At the end of the list, the memory usage by loaded code is summarized. There is one field for "Current code", which is the latest version of each module, and one for "Old code", which is code for which a newer version exists in the system but which has not yet been purged.
Finally, all the atoms in the system are listed. This is only interesting if one suspects that dynamic generation of atoms could be a problem; otherwise this section can be ignored.
The format of the crash dump evolves between releases of OTP, so some information here may not apply to your version. A description such as this can never be complete; it is meant as an explanation of the crash dump in general and as a help when trying to find application errors, not as a complete specification.