Two management-protocol-independent EVA services are provided: the basic Event and Alarm service and the Log Control service. The basic EVA service provides clients with an API for registering and sending events and alarms. The Log Control service provides a mechanism for controlling generic logs. Also included is a specialization of the generic log function for logging events and alarms.
Each service provides client functions that can be used from applications in the system to, for example, send alarms. There is also an API that management applications can use to monitor and control the system. This API can be extended for specific management protocols, such as SNMP or CORBA.
The basic EVA service contains functions for the client API to EVA. EVA is a distributed global application, which means that clients can access the EVA functionality from any node.
Clients can register and send events and alarms. Management applications can subscribe to events and alarms, and control how they are treated.
An event is a notification sent from the NE to a management application. An event is uniquely identified by its name. A special form of an event is an alarm. An alarm represents a fault in the system that needs to be reported to the manager. An example of an alarm could be equipment_on_fire. When an alarm is sent, it becomes active and is stored in an active alarm list. When the application that sent the alarm notices that the fault that caused the alarm is no longer present, it clears the alarm. When an alarm is cleared, it is deleted from the active alarm list, and a clear_alarm event is generated by EVA.
Each fault may give rise to several alarms, possibly with different severities. However, there can be only one active alarm for each fault at any time. For example, two alarms may be associated with disk space usage, disk_80_percent_filled and disk_90_percent_filled. These two alarms represent the same fault, but only one of them can be active at a time. An active alarm is identified by its fault_id. In contrast to alarms, ordinary events do not represent faults and are not stored in the active alarm list.
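The life cycle described above can be illustrated with a small in-memory model. This is only a sketch and not the EVA implementation: the module name active_alarms and all of its functions are hypothetical, and it assumes that a second alarm sent for the same fault_id replaces the first one.

```erlang
-module(active_alarms).
-export([new/0, send_alarm/3, clear_alarm/2, names/1]).

%% Hypothetical in-memory model of the active alarm list; it is not the
%% EVA implementation. The state is {NextIndex, Alarms}, where Alarms
%% maps FaultId => {Index, AlarmName}.
new() ->
    {1, #{}}.

%% Sending an alarm makes it active. Assumption: a second alarm for the
%% same fault replaces the first, so each fault has at most one active
%% alarm. Every alarm gets an index greater than all earlier ones.
send_alarm(Name, FaultId, {Next, Alarms}) ->
    {Next + 1, Alarms#{FaultId => {Next, Name}}}.

%% Clearing removes the alarm and reports a clear_alarm event, or
%% reports that the fault had no active alarm.
clear_alarm(FaultId, {Next, Alarms} = State) ->
    case maps:is_key(FaultId, Alarms) of
        true  -> {{clear_alarm, FaultId}, {Next, maps:remove(FaultId, Alarms)}};
        false -> {not_active, State}
    end.

%% Names of the active alarms, ordered by index (oldest first).
names({_Next, Alarms}) ->
    [Name || {_Index, Name} <- lists:sort(maps:values(Alarms))].
```

In this model, sending disk_80_percent_filled and then disk_90_percent_filled with the same fault identifier leaves only the second alarm active, and clearing that fault identifier empties the list.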
The basic EVA server is a global server to which all events and alarms are sent. The server updates its tables (for example, the active alarm list) and sends the event or alarm to the alarm_handler process that runs on the same node as the global server. alarm_handler is a gen_event process defined in the SASL application.
Before a client can send an event or alarm, the name of the event must be registered in EVA. To register an event, a client calls register_event/2. The parameters of this function are the name of the event and whether the event should be logged by default or not. A manager can decide to change this value later. To register an alarm, a client calls register_alarm/4. The parameters of this function are the name and logging parameters as for events, and the class and default severity of the alarm.
EVA stores the definitions of events and alarms in the Mnesia tables eventTable and alarmTable respectively. Since an alarm is a special form of an event, each alarm is present in both of these tables. The active alarm list is stored in the Mnesia table alarm. The records for all these tables are defined in the header file eva.hrl, available in the include directory in the distribution.
All registered events are stored in the eventTable. It has the following attributes:
name
log
generated
The event is uniquely identified by its name, which is an atom.
The log attribute is a boolean flag that tells whether this event should be stored in some log when it is generated. This attribute is writable.
The generated attribute is a counter that counts how many times the event has been generated.
The alarmTable extends the eventTable, and has the following attributes:
name
class
severity
The alarm is uniquely identified by its name, which is an atom. Note that each alarm is present in the eventTable as well.
The class attribute categorizes the alarm, and is defined when the alarm is registered. The classes are as defined in X.733, ITU Alarm Reporting Function:
communications. An alarm of this class is principally associated with the procedures or processes required to convey information from one point to another.
qos. An alarm of this class is principally associated with a degradation in the quality of service.
processing. An alarm of this class is principally associated with a software or processing fault.
equipment. An alarm of this class is principally associated with an equipment fault.
environmental. An alarm of this class is principally associated with a condition relating to an enclosure in which the equipment resides.
The severity parameter defines five severity levels, which provide an indication of how it is perceived that the capability of the managed object has been affected. The severity levels that represent service-affecting conditions are, ordered from most severe to least severe, critical, major, minor and warning. The levels used are as defined in X.733, ITU Alarm Reporting Function:
indeterminate. The Indeterminate severity level indicates that the severity level cannot be determined.
critical. The Critical severity level indicates that a service-affecting condition has occurred and an immediate corrective action is required. Such a severity can be reported, for example, when a managed object becomes totally out of service and its capability must be restored.
major. The Major severity level indicates that a service-affecting condition has developed and an urgent corrective action is required. Such a severity can be reported, for example, when there is a severe degradation in the capability of the managed object and its full capability must be restored.
minor. The Minor severity level indicates the existence of a non-service-affecting fault condition and that corrective action should be taken in order to prevent a more serious (for example, service-affecting) fault. Such a severity can be reported, for example, when the detected alarm condition is not currently degrading the capacity of the managed object.
warning. The Warning severity level indicates the detection of a potential or impending service-affecting fault, before any significant effects have been felt. Action should be taken to further diagnose (if necessary) and correct the problem in order to prevent it from becoming a more serious service-affecting fault.
When an alarm is cleared, a clear_alarm event is generated. This event clears the alarm with the fault_id contained in the event. It is not required that the clearing of previously reported alarms is reported. Therefore, a managing system cannot assume that the absence of a clear_alarm event for a fault means that the condition that caused the generation of previous alarms is still present. Managed object definers shall state if, and under which conditions, the clear_alarm event is used.
The active alarm list is stored in the ordered Mnesia table alarm. The corresponding record is sent to the alarm_handler when an alarm is sent. It has the following read-only attributes:
index
fault_id
name
sender
cause
severity
time
extra
A row in the active alarm list is uniquely identified by its fault_id. However, to make the table ordered, the alarms use the integer index as the key into the table. For each new alarm, EVA allocates a new index that is greater than the index of all other active alarms.
The name is the name of the corresponding alarm type, defined in alarmTable.
sender is a term that uniquely identifies the resource that generated the alarm.
cause describes the probable cause of the alarm.
severity is the perceived severity of the alarm.
time is the UTC time the alarm was generated.
extra is any extra information describing the alarm.
When an event is generated, the event record is sent to alarm_handler. It has the following attributes:
name
sender
time
extra
The name is the name of the corresponding event type, defined in eventTable.
sender is a term that uniquely identifies the resource that generated the event.
time is the UTC time the event was generated.
extra is any extra information describing the event.
As an example of how to register and send events and alarms, consider the following code:
```erlang
%%%-----------------------------------------------------------------
%%% Resource code
%%%-----------------------------------------------------------------
reg() ->
    eva:register_event(boardRemoved, true),
    eva:register_event(boardInserted, false),
    eva:register_alarm(boardFailure, true, equipment, minor).

remove_board(No) ->
    eva:send_event(boardRemoved, {board, No}, []).

insert_board(No, BoardName, BoardType) ->
    eva:send_event(boardInserted, {board, No}, {BoardName, BoardType}).

board_on_fire(No) ->
    FaultId = eva:get_fault_id(),
    %% Cause = fire, ExtraParams = []
    eva:send_alarm(boardFailure, FaultId, {board, No}, fire, []),
    FaultId.
```
Two events and one alarm are defined. Board removal is an event that is logged by default, and board insertion is an event that is not logged by default. The alarm boardFailure is a minor alarm of class equipment that is logged by default.
When the application detects that board N is on fire, board_on_fire(N) is called. This function is responsible for sending the alarm. It gets a new fault identifier for the fault and calls eva:send_alarm/5, pointing out the faulty board (N) and suggesting that the probable cause of the equipment trouble is fire. The board_on_fire function returns the fault identifier for the new alarm. This fault identifier can be used at a later time in a call to eva:clear_alarm(FaultId) to clear the alarm.
The Log Control service contains functions for monitoring logs, and functions for transferring logs to remote hosts, for example management stations. The main purpose of the Log Control service is to provide one entity through which all logs in the system can be controlled by a management station. Regardless of the type of log, all logs are controlled in a similar fashion.
Clients can register their logs in the log server. Management applications can control the logs, and transfer the logs to a remote host.
This service uses a log server that monitors all logs in the system. Each log uses the standard module disk_log for the actual logging.
Each log has an administrative and an operational status, each of which can be either up or down. If the operational status is up, the log is working; if it is down, the log is not working. The administrative status is writable and reflects the desired operational status. Normally the two are the same: if the administrative status is set to up, the operational status will be up as well. However, if the log for some reason does not work, for example because the disk partition is full, the operational status will be down. When the operational status is down, no events are logged in the log.
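The relationship between the two statuses can be written down as a small decision function. This is a sketch only: the module log_status and its functions are hypothetical and are not part of the Log Control API, and it assumes that an administrative status of down always forces the operational status to down.

```erlang
-module(log_status).
-export([operational_status/2, should_log/2]).

%% Hypothetical helper, not part of the Log Control API.
%% AdminStatus is the desired state (up | down). Health is ok or
%% {error, Reason}, as observed by whatever monitors the log.
operational_status(down, _Health)        -> down;
operational_status(up, ok)               -> up;
operational_status(up, {error, _Reason}) -> down.

%% Items are stored only while the operational status is up.
should_log(AdminStatus, Health) ->
    operational_status(AdminStatus, Health) =:= up.
```

For example, a log whose administrative status is up but whose disk partition is full ends up with operational status down, so nothing is stored in it until the error goes away.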
The Log Control service defines two EVA alarms, log_file_error and log_wrap_too_often:
log_file_error. This alarm is generated if a file error occurs when an item is logged. Default severity is critical. The cause for this alarm can be any Reason as returned from file:write in case of error. The alarm is cleared if the file system starts working again. For example, the alarm can be generated if the partition is full, and cleared when space is available.
log_wrap_too_often. This alarm is generated when the log wraps more often than the wrap time. Default severity is major. The cause for this alarm is undefined. The alarm is cleared if the log wraps within the wrap time the next time it wraps.
The following is an example of code that creates a log to be controlled by the generic Log Control function:
```erlang
start() ->
    disk_log:open([{name, "ex_log"},
                   {file, "ex_log/ex_log.LOG"},
                   {type, wrap},
                   {size, {10000, 4}}]),
    log:open("ex_log", ex_log_type, 3600).

test() ->
    %% Log an item
    disk_log:log("ex_log", {1, "log this"}),
    %% Set the administrative status of the log to 'down'
    log:set_admin_status("ex_log", down),
    %% Try to log - this one won't be logged
    disk_log:log("ex_log", {2, "won't be logged"}),
    Logs1 = log:get_logs(),
    %% Set the administrative status of the log to 'up'
    log:set_admin_status("ex_log", up),
    %% Log an item
    disk_log:log("ex_log", {3, "log this"}),
    Logged = disk_log:chunk("ex_log", start),
    {Logs1, Logged}.
```
It is possible to transfer a log to a remote host. When the log is transferred, the log may be filtered, and the log records may be formatted.
As the logs are implemented as disk_log logs, each log consists of several log files. When the log is transferred, it is written to one single file on the remote host. When disk_log is used, the log records are normally not formatted when they are stored in the log, in order to increase log performance. However, a manager will probably need the log in a human-readable format. Thus, when the log is being transferred, each log record may be formatted in a log-specific way. Of course, to further increase performance, the log can be transferred as is, leaving it to the manager to format the log off-line.
The EVA log service uses the generic Log Control service to implement log functionality for events and alarms defined in EVA.
In the rest of this description, the term event refers to both events and alarms as defined in EVA.
This log functionality supports logging of events from EVA. It uses the module disk_log for logging of events. There can be several event logs active at the same time. It is possible to create new event logs dynamically, either from within an application or from a management system. Each log uses a filter function to decide whether an event should be stored in the log or not.
There is a concept of a default log. The default log is used to log any event that has the log flag in eventTable set to true, but for which no log is currently able to store the event (or there is no other log defined to log the event). The usage of the default log is optional.
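The interplay between filter functions and the default log can be sketched as a pure routing function. This is an illustration and not the eva_log implementation: the module log_routing, the function route/3, the tuple layout of the log list and the atom default_log are all hypothetical, and the sketch assumes that a default log is configured.

```erlang
-module(log_routing).
-export([route/3]).

%% Hypothetical model of event routing; not the eva_log implementation.
%% LogFlag is the event's log flag from eventTable.
%% Logs is a list of {LogName, FilterFun, OperationalStatus}.
%% Returns the names of the logs that should store Event.
route(_Event, false, _Logs) ->
    %% The event is not marked for logging at all.
    [];
route(Event, true, Logs) ->
    %% Only logs that are up and whose filter accepts the event qualify.
    case [Name || {Name, Filter, up} <- Logs, Filter(Event)] of
        []      -> [default_log];   % nobody can store it: fall back
        Targets -> Targets
    end.
```

With a single log whose filter accepts alarms, an alarm is routed to that log while it is up, and to the default log once the log goes down or when the filter rejects the item.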
For example, suppose that we want to define an alarm log that logs all alarms in the system. We can do this with the following code:
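```erlang
-module(alarm_log).
-export([alarm_filter/1, make_alarm_log/0]).

%% The alarm record is defined in eva.hrl; the include path below
%% assumes the usual include_lib layout of the EVA distribution.
-include_lib("eva/include/eva.hrl").

%% Accept alarm records and reject everything else.
alarm_filter(Item) when is_record(Item, alarm) -> true;
alarm_filter(_) -> false.

make_alarm_log() ->
    disk_log:open([{name, "alarm_log"},
                   {format, internal},
                   {type, wrap},
                   {size, {10000, 10}}]),
    eva_log:open("alarm_log", {alarm_log, alarm_filter, []}, 36000).
```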
If we set the administrative status of this log to down, and an alarm that should be logged according to its definition in the eventTable is then generated, the alarm is stored in the default log instead of "alarm_log" (provided there are no other logs that are defined to log the alarm).