This document describes the erl_crash.dump
file generated
upon abnormal exit of the Erlang runtime system.
Important: In OTP release R9C the Erlang crash dump was given a major facelift. This means that the information in this document does not apply directly to older dumps. However, if you use the Crashdump Viewer tool on older dumps, they are translated into a format similar to this one.
The system will write the crash dump in the current directory of the emulator (whatever that means on the current operating system) or in the file pointed out by the environment variable ERL_CRASH_DUMP. For a crash dump to be written, there has to be a writable file system mounted.
Crash dumps are written mainly for one of two reasons: either the
builtin function erlang:halt/1
is called explicitly with a
string argument from running Erlang code, or else the runtime
system has detected an error that cannot be handled. The most
common reason that the system cannot handle the error is an
external limitation, such as running out of memory. A
crash dump due to an internal error may be caused by the system
reaching limits in the emulator itself (like the number of atoms
in the system, or too many simultaneous ets tables). Usually the
emulator or the operating system can be reconfigured to avoid the
crash, which is why interpreting the crash dump correctly is
important.
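As an illustration, a crash dump can be produced deliberately from Erlang code; the reason string below is only an example:

%% Writes the crash dump (erl_crash.dump, or the file named by the
%% ERL_CRASH_DUMP environment variable) and terminates the runtime
%% system; the string becomes the slogan of the dump.
erlang:halt("example: unrecoverable state, shutting down").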
The Erlang crash dump is a readable text file, but it might not be
very easy to read. Using the Crashdump Viewer tool in the
observer application will simplify the task. This is an
HTML-based tool for browsing Erlang crash dumps.
The first part of the dump shows the creation time for the dump, a slogan indicating the reason for the dump, the system version of the node from which the dump originates, the compile time of the emulator running the originating node, and the number of atoms in the atom table.
The reason for the dump is noted in the beginning of the file
as Slogan: <reason> (the word "slogan" has historical
roots). If the system is halted by the BIF
erlang:halt/1
, the slogan is the string parameter
passed to the BIF, otherwise it is a description generated by
the emulator or the (Erlang) kernel. Normally the message
should be enough to understand the problem, but nevertheless
some messages are described here. Note however that the
suggested reasons for the crash are only
suggestions. The exact reasons for the errors may vary
depending on the local applications and the underlying
operating system.
"Cannot allocate <N> bytes of memory (of type "<T>")" - The system has run out of memory. <N> is the number of bytes requested, and <T> is the memory block type the memory was needed for, for example heap, old_heap, heap_frag, or binary. For more information on allocators, see erts_alloc(3).
"Unexpected op code <N>" - Error in compiled code: the beam file is damaged, or there is an error in the compiler.
"Module Name undefined" | "Function Name undefined" | "No function Name:Name/1" | "No function Name:start/2" - The kernel/stdlib applications are damaged or the start script is damaged.
"Driver_select called with too large file descriptor N" - The number of file descriptors for sockets exceeds 1024 (Unix only). The limit on file descriptors in some Unix flavors can be set above 1024, but only 1024 sockets/pipes can be used simultaneously by Erlang (due to limitations in the Unix select call). The number of open regular files is not affected by this.
"Kernel pid terminated (Who) (Exit reason)" - The kernel supervisor has detected a failure, usually that the application_controller has shut down (Who = application_controller, Why = shutdown). The application controller may have shut down for a number of reasons, the most usual being that the node name of the distributed Erlang node is already in use. A complete supervisor tree "crash" (i.e., the top supervisors have exited) will give about the same result. This message comes from the Erlang code and not from the virtual machine itself. It is always due to some kind of failure in an application, either within OTP or a "user-written" one. Looking at the error log for your application is probably the first step to take.
"Init terminating in do_boot ()" - The primitive Erlang boot sequence was terminated, most probably because the boot script has errors or cannot be read. This is usually a configuration error; the system may have been started with a faulty -boot parameter or with a boot script from the wrong version of OTP.
"Could not start kernel pid (Who) ()" - One of the kernel processes could not start. This is probably due to faulty arguments (like errors in a -config argument) or faulty configuration files. Check that all files are in their correct location and that the configuration files (if any) are not damaged. Usually there are also messages written to the controlling terminal and/or the error log explaining what's wrong.
Errors other than the ones mentioned above may occur, as the
erlang:halt/1 BIF may generate any message. If the message is not
generated by the BIF and does not occur in the list above, it may
be due to an error in the emulator. There may, however, be unusual
messages not described here that are still connected to an
application failure. There is a lot more information available in
the dump, so a more thorough reading may reveal the crash reason.
The sizes of processes, the number of ets tables, and the Erlang
data on each process stack can all be useful for tracking down the
problem.
The number of atoms in the system at the time of the crash is
shown as Atoms: <number>. Some tens of thousands of atoms are
perfectly normal, but more could indicate that the BIF
erlang:list_to_atom/1 is used to dynamically generate a lot of
different atoms, which is never a good idea.
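As an illustration (the module and function names below are made up), code like the following grows the atom table without bound, since atoms are never garbage collected:

-module(atom_leak).
-export([leak/1]).

%% Creates N unique atoms. Atoms are never reclaimed, so a large N
%% drives up the "Atoms:" count in a later crash dump and can
%% eventually exhaust the atom table.
leak(N) ->
    [list_to_atom("user_" ++ integer_to_list(I)) || I <- lists:seq(1, N)],
    ok.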
Under the tag =memory you will find information similar to what you can obtain on a living node with erlang:memory().
The tags =hash_table:<table_name> and =index_table:<table_name> present internal tables. These are mostly of interest for runtime system developers.
Under the tag =allocated_areas you will find information similar to what you can obtain on a living node with erlang:system_info(allocated_areas).
Under the tag =allocator:<A> you will find various information about allocator <A>. The information is similar to what you can obtain on a living node with erlang:system_info({allocator, <A>}). For more information see the documentation of erlang:system_info({allocator, <A>}), and the erts_alloc(3) documentation.
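For comparison, roughly the same information can be obtained on a living node; a small sketch (binary_alloc is just one example of an allocator name):

%% Live-node counterparts of the =memory, =allocated_areas and
%% =allocator:<A> sections of a crash dump.
Mem   = erlang:memory(),
Areas = erlang:system_info(allocated_areas),
Alloc = erlang:system_info({allocator, binary_alloc}).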
The Erlang crash dump contains a listing of each living Erlang process in the system.
The following fields can exist for a process:
State - The state of the process. Among the possible states: Waiting (the process was waiting for something, usually in a receive), Running (if the BIF erlang:halt/1 was called, this was the process calling it), and Suspended (the process is suspended, either by the BIF erlang:suspend_process/1 or because it is trying to write to a busy port).
Spawned as - The entry point of the process, i.e., what function was referenced in the spawn or spawn_link call that started the process.
Spawned by - The parent of the process, i.e., the process that executed spawn or spawn_link.
See also the section about process data.
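As an illustration of the Spawned as and Spawned by fields (the module name is made up), the process created below would report proc_demo:init/1 as its entry point and the calling process as its parent:

-module(proc_demo).
-export([start/0, init/1]).

%% The spawned process's "Spawned as" field would name proc_demo:init/1,
%% and "Spawned by" would be the pid of the process calling start/0.
start() ->
    spawn_link(?MODULE, init, [self()]).

%% While blocked in this receive, the process would be listed with
%% State: Waiting.
init(Parent) ->
    receive
        {Parent, stop} -> ok
    end.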
This section lists the open ports, their owners, any linked processes, and the name of their driver or external process.
This section contains information about all the ETS tables in the system. The following fields are interesting for each table:
Table - The identifier for the table. If the table is a named_table, this is the name.
Name - The name of the table, regardless of whether it is a named_table or not.
Buckets - This occurs if the table is a hash table, i.e., if it is not an ordered_set.
Ordered set (AVL tree), Elements - This occurs only if the table is an ordered_set. (The number of elements is the same as the number of objects in the table.)
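A small sketch of the table kinds referred to above (the table names are arbitrary):

%% A named hash table: listed in the dump under its name.
ets:new(my_cache, [named_table, set]),
%% An ordered_set: reported with an element count instead of hash buckets.
T = ets:new(sorted_data, [ordered_set]),
ets:insert(my_cache, {key, value}),
ets:insert(T, {1, one}).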
This section contains information about all the timers started with the BIFs erlang:start_timer/3 and erlang:send_after/3. The following fields exist for each timer:
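For reference, timers of the kind listed here are created like this (the messages are arbitrary examples):

%% Both of these would show up in the timer section of a crash dump,
%% together with the message and the time left.
erlang:send_after(60000, self(), tick),
erlang:start_timer(5000, self(), {timeout, job_1}).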
If the Erlang node was alive, i.e., set up for communicating with other nodes, this section lists the connections that were active. The following fields can exist:
This section contains information about all loaded modules. First, the memory usage by loaded code is summarized. There is one field for "Current code", which is the current (newest) version of the modules. There is also a field for "Old code", which is code for which a newer version exists in the system but where the old version has not yet been purged. The memory usage is in bytes.
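As a sketch of how old code arises (my_mod is a made-up module), loading a new version of an already loaded module keeps the previous version around as old code until it is purged:

%% Loading a second version of my_mod leaves the first as "Old code".
{module, my_mod} = code:load_file(my_mod),
%% ... my_mod is recompiled and loaded again ...
{module, my_mod} = code:load_file(my_mod),
true = erlang:check_old_code(my_mod),
code:purge(my_mod).    %% removes the old version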
All loaded modules are then listed. The following fields exist:
In this section, all funs are listed. The following fields exist for each fun:
For each process there will be at least one =proc_stack and one =proc_heap tag followed by the raw memory information for the stack and heap of the process.
For each process there will also be a =proc_messages
tag if the process' message queue is non-empty and a
=proc_dictionary tag if the process' dictionary (the
put/2
and get/1
thing) is non-empty.
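For instance, a process running code like the following (the key and message are arbitrary) would get both a =proc_dictionary and a =proc_messages entry:

%% Process dictionary entry -> appears under =proc_dictionary.
put(last_request, {user, 42}),
%% A message that is never received -> stays in the queue and
%% appears under =proc_messages.
self() ! {unhandled, message},
get(last_request).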
The raw memory information can be decoded by the Crashdump Viewer tool. You will then be able to see the stack dump, the message queue (if any) and the dictionary (if any).
The stack dump is a dump of the Erlang process stack. Most of
the live data (i.e., variables currently in use) are placed on
the stack; thus this can be quite interesting. One has to
"guess" what's what, but as the information is symbolic,
thorough reading of this information can be very useful. As an
example we can find the state variable of the Erlang primitive
loader on line (5)
in the example below:
(1) 3cac44 Return addr 0x13BF58 (<terminate process normally>)
(2) y(0)  ["/view/siri_r10_dev/clearcase/otp/erts/lib/kernel/ebin","/view/siri_r10_dev/
(3)        clearcase/otp/erts/lib/stdlib/ebin"]
(4) y(1)  <0.1.0>
(5) y(2)  {state,[],none,#Fun<erl_prim_loader.6.7085890>,undefined,#Fun<erl_prim_loader.7.9000327>,#Fun<erl_prim_loader.8.116480692>,#Port<0.2>,infinity,#Fun<erl_prim_loader.9.10708760>}
(6) y(3)  infinity
When interpreting the data for a process, it is helpful to know that anonymous function objects (funs) are given a name constructed from the name of the function in which they are created, and a number (starting with 0) indicating the number of that fun within that function.
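A sketch of that naming scheme (fun_demo is a made-up module): the first fun created in make_handlers/0 gets number 0 and the second gets number 1, so they would be displayed roughly as #Fun<fun_demo.0.NNN> and #Fun<fun_demo.1.NNN>:

-module(fun_demo).
-export([make_handlers/0]).

%% Per the naming described above, these funs are numbered in creation
%% order within the function.
make_handlers() ->
    Inc = fun(X) -> X + 1 end,
    Dbl = fun(X) -> X * 2 end,
    {Inc, Dbl}.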
Now all the atoms in the system are written. This is only interesting if one suspects that dynamic generation of atoms could be a problem, otherwise this section can be ignored.
Note that the last created atom is printed first.
The format of the crash dump evolves between releases of OTP. Some information here may not apply to your version. A description like this will never be complete; it is meant as an explanation of the crash dump in general and as a help when trying to find application errors, not as a complete specification.