
httpd

MODULE

httpd

MODULE SUMMARY

An implementation of an HTTP 1.0 compliant Web server, as defined in RFC 1945.

DESCRIPTION

HTTP (Hypertext Transfer Protocol) is an application-level protocol with the lightness and speed necessary for distributed, collaborative and hyper-media information systems. The httpd module handles HTTP 1.0 as described in RFC 1945 with a few exceptions such as gateway and proxy functionality. The same is true for servers written by NCSA and others.

The server implements numerous features such as SSL (Secure Sockets Layer), ESI (Erlang Scripting Interface), CGI (Common Gateway Interface), User Authentication (using Mnesia, dets or a plain text database), Common Logfile Format (with or without disk_log(3) support), URL Aliasing, Action Mappings, Directory Listings and SSI (Server-Side Includes).

The configuration of the server is done using Apache-style run-time configuration directives. The goal is to be plug-in compatible with Apache but with enhanced fault-tolerance, scalability and load-balancing characteristics.

All server functionality has been implemented using a specially crafted server API, EWSAPI (Erlang Web Server API). This API can be used by anyone who wants to enhance the server core functionality, for example with custom logging and authentication.

RUN-TIME CONFIGURATION

All functionality in the server can be configured using Apache-style run-time configuration directives stored in a configuration file. Take a look at the example config files in the conf directory (UNIX: $INETS_ROOT/examples/server_root/conf/, Windows: %INETS_ROOT%\examples\server_root\conf\) of the server root for a complete understanding.

An alphabetical list of all config directives:

EWSAPI MODULES

All server functionality below has been implemented using EWSAPI (Erlang Web Server API) modules. The following modules all have separate manual pages (mod_cgi(3), mod_auth(3), ...):

httpd_core
Core features.
mod_actions
Filetype/method-based script execution.
mod_alias
Aliases and redirects.
mod_auth
User authentication using text files, mnesia or dets
mod_cgi
Invoking of CGI scripts.
mod_dir
Basic directory handling.
mod_esi
Efficient Erlang Scripting.
mod_get
HTTP GET Method
mod_head
HTTP HEAD Method
mod_include
Server-parsed documents.
mod_log
Standard logging in the Common Logfile Format using text files.
mod_disk_log
Standard logging in the Common Logfile Format using disk_log(3).

The Modules config directive can be used to alter the server behavior, that is, to alter the EWSAPI Module Sequence. An example module sequence can be found in the example config directory. If the sequence needs to be altered, read the EWSAPI Module Interaction section below.

EXPORTS

start()
start(ConfigFile) -> ServerRet
start_link()
start_link(ConfigFile) -> ServerRet

Types:

ConfigFile = string()
ServerRet = {ok,Pid} | ignore | {error,EReason} | {stop,SReason}
Pid = pid()
EReason = {already_started, Pid} | term()
SReason = string()

start/1 and start_link/1 start a server as specified in the given ConfigFile. The ConfigFile supports a number of config directives specified below.

start/0 and start_link/0 start a server as specified in a hard-wired config file, that is start("/var/tmp/server_root/conf/8888.conf"). Before using start/0 or start_link/0, copy the example server root (UNIX: $INETS_ROOT/examples/server_root/, Windows: %INETS_ROOT%\examples\server_root\) to a specific installation directory (UNIX: /var/tmp/, Windows: X:\var\tmp\) and you have a server running in no time.

If you copy the example server root to the specific installation directory it is furthermore easy to start an SSL enabled server, that is start("/var/tmp/server_root/conf/ssl.conf").
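
As a minimal sketch of how the ServerRet values listed above might be handled, the function below starts the server from the hard-wired example config file and treats an already running server as success. The module name httpd_start_example and the function start_default/0 are hypothetical names chosen for illustration, and the sketch assumes that the example server root has been copied to /var/tmp/ as described above.

```erlang
-module(httpd_start_example).
-export([start_default/0]).

%% Starts a server from the example config file named in the text.
%% Returns {ok,Pid} if the server was started or was already running;
%% any other ServerRet value is passed through unchanged.
start_default() ->
    case httpd:start("/var/tmp/server_root/conf/8888.conf") of
        {ok, Pid} ->
            {ok, Pid};
        {error, {already_started, Pid}} ->
            {ok, Pid};
        Other ->
            Other
    end.
```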

restart()
restart(Port) -> ok | {error,Reason}
restart(ConfigFile) -> ok | {error,Reason}
restart(Address,Port) -> ok | {error,Reason}

Types:

Port = integer()
Address = {A,B,C,D} | string() | undefined
ConfigFile = string()
Reason = term()

restart restarts the server and reloads its config file.

The following directives cannot be changed: BindAddress, Port and SocketType. If these need to be changed, a new server should be started instead.

Note!

Before the restart function can be called the server must be blocked. After restart has been called, the server must be unblocked.

stop()
stop(Port) -> ServerRet
stop(ConfigFile) -> ServerRet
stop(Address,Port) -> ServerRet

Types:

Port = integer()
Address = {A,B,C,D} | string() | undefined
ConfigFile = string()
ServerRet = ok | not_started

stop/2 stops the server which listens to the specified Port on Address. stop(integer()) stops a server which listens to a specific Port. stop(string()) extracts BindAddress and Port from the config file and stops the server which listens to the specified Port on Address. stop/0 stops a server which listens to port 8888, that is stop(8888).

block() -> ok | {error,Reason}
block(Port) -> ok | {error,Reason}
block(ConfigFile) -> ok | {error,Reason}
block(Address,Port) -> ok | {error,Reason}
block(Port,Mode) -> ok | {error,Reason}
block(ConfigFile,Mode) -> ok | {error,Reason}
block(Address,Port,Mode) -> ok | {error,Reason}
block(ConfigFile,Mode,Timeout) -> ok | {error,Reason}
block(Address,Port,Mode,Timeout) -> ok | {error,Reason}

Types:

Port = integer()
Address = {A,B,C,D} | string() | undefined
ConfigFile = string()
Mode = disturbing | non_disturbing
Timeout = integer()
Reason = term()

This function is used to block a server. The blocking can be done in two ways, disturbing or non-disturbing.

By performing a disturbing block, the server is blocked forcefully and all ongoing requests are terminated. No new connections are accepted. If a timeout is given, ongoing requests are given this much time to complete before the server is forcefully blocked. In this case no new connections are accepted either.

A non-disturbing block is more graceful. No new connections are accepted, but the ongoing requests are allowed to complete. If a timeout is given, the function waits this long before giving up (the block operation is aborted and the server state is once more not-blocked).

Default mode is disturbing.

Default port is 8888.

unblock() -> ok | {error,Reason}
unblock(Port) -> ok | {error,Reason}
unblock(ConfigFile) -> ok | {error,Reason}
unblock(Address,Port) -> ok | {error,Reason}

Types:

Port = integer()
Address = {A,B,C,D} | string() | undefined
ConfigFile = string()
Reason = term()

Unblocks a server. If the server is already unblocked this is a no-op. If a block is ongoing, then it is aborted (this will have no effect on ongoing requests).
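
The block, restart and unblock steps required for reloading a config file could be combined roughly as follows. This is only a sketch: the module name httpd_reload_example and the function reload/1 are hypothetical, and unblocking the server even when the restart fails is a choice made for this example rather than documented behavior.

```erlang
-module(httpd_reload_example).
-export([reload/1]).

%% Reloads the config file of the server listening on Port.
%% A non-disturbing block is used so that ongoing requests
%% are allowed to complete before the restart.
reload(Port) ->
    case httpd:block(Port, non_disturbing) of
        ok ->
            Result = httpd:restart(Port),
            %% Unblock whether or not the restart succeeded
            %% (an assumption made for this sketch).
            _ = httpd:unblock(Port),
            Result;
        {error, Reason} ->
            {error, {block_failed, Reason}}
    end.
```

A call such as httpd_reload_example:reload(8888) would then address the server on the default port.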

parse_query(QueryString) -> ServerRet

Types:

QueryString = string()
ServerRet = [{Key,Value}]
Key = Value = string()

parse_query/1 parses incoming data to erl and eval scripts (see mod_esi(3)) as defined in the standard URL format: each '+' is converted to a space and each hexadecimal escape (%xx) is decoded.
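
To illustrate how the returned key-value list might be used, the sketch below extracts a single value from a query string. The module name parse_query_example, the function person/1 and the key "person" are hypothetical and serve only as an illustration.

```erlang
-module(parse_query_example).
-export([person/1]).

%% Returns the value stored under the key "person" in QueryString,
%% or undefined if the key is absent.
person(QueryString) ->
    Pairs = httpd:parse_query(QueryString),   % [{Key,Value}]
    case lists:keysearch("person", 1, Pairs) of
        {value, {_Key, Value}} -> Value;
        false -> undefined
    end.
```

For a query string such as "person=jocke&city=a+b" the function is expected to return "jocke".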

EWSAPI MODULE PROGRAMMING

Note!

Writing an EWSAPI module requires considerable Erlang/OTP programming knowledge and is not recommended for the average server user. EWSAPI is best used only to add core functionality, e.g. custom authentication or an RFC 2109 implementation.

Warning!

The current implementation of EWSAPI is under review and feedback is welcomed.

EWSAPI should only be used to add core functionality to the server. In order to generate dynamic content, for example on-the-fly generated HTML, use the standard CGI or ESI facilities instead.

As seen above, the major part of the server functionality has been realized as EWSAPI modules (from now on only called modules). If you intend to write your own server extension, start by examining the standard modules (UNIX: $INETS_ROOT/src/, Windows: %INETS_ROOT%\src\) mod_*.erl and note how they are configured in the example config directory (UNIX: $INETS_ROOT/examples/server_root/conf/, Windows: %INETS_ROOT%\examples\server_root\conf\).

Each module implements do/1 (mandatory), load/2, store/2 and remove/1. The latter functions are needed only when new config directives are to be introduced (See EWSAPI Module Configuration below).

A module can choose to export functions to be used by other modules in the EWSAPI Module Sequence (see the Modules config directive). This should only be done as an exception! The goal is to keep each module self-contained, thus making it easy to alter the EWSAPI Module Sequence without any unnecessary module dependencies.

A module can furthermore use data generated by previous modules in the EWSAPI Module Sequence or generate data to be used by subsequent EWSAPI modules. This is made possible by an internal list of key-value tuples (see EWSAPI Module Interaction below).

Note!

The server executes do/1 (using apply/1) for each module listed in the Modules config directive. do/1 takes the record mod as an argument, as described below. See httpd.hrl (UNIX: $INETS_ROOT/src/httpd.hrl, Windows: %INETS_ROOT%\src\httpd.hrl):

-record(mod,{data=[],
             socket_type=ip_comm,
             socket,
             config_db,
             method,
             request_uri,
             http_version,
             request_line,
             parsed_header=[],
             entity_body}).
    

The fields of the mod record have the following meanings:

data
Type [{InteractionKey,InteractionValue}], used to propagate data between modules (see EWSAPI Module Interaction below). Denoted interaction_data() in function type declarations.
socket_type
Type ip_comm | ssl, that is the socket type.
socket
The actual socket in ip_comm or ssl format depending on the socket_type.
config_db
The config file directives stored as key-value tuples in an ETS-table. Denoted config_db() in function type declarations.
method
Type "GET" | "POST" | "DELETE" | "PUT", that is the HTTP method.
request_uri
The Request-URI as defined in RFC 1945, for example "/cgi-bin/find.pl?person=jocke".
request_line
The Request-Line as defined in RFC 1945, for example "GET /cgi-bin/find.pl?person=jocke HTTP/1.0".
parsed_header
Type [{HeaderKey,HeaderValue}], that is all HTTP header fields stored in a list of key-value tuples. See RFC 1945 for a listing of all header fields, for example {date,"Wed, 15 Oct 1997 14:35:17 GMT"}.
entity_body
The Entity-Body as defined in RFC 1945, for example data sent from a CGI-script using the POST method.

A do/1 function typically uses a restricted set of the mod record's fields to do its work and then returns a term depending on the outcome, that is {proceed,NewData} | {break,NewData} | done, with the following meanings (OldData refers to the data field in the incoming mod record):

{proceed,OldData}
Proceed to next module as nothing happened.
{proceed,[{response,{StatusCode,Response}}|OldData]}
A generated response (Response) should be sent back to the client including a status code (StatusCode) as defined in RFC 1945.
{proceed,[{response,{already_sent,StatusCode,Size}}|OldData]}
A generated response has already manually been sent back to the client, using the socket provided by the mod record (see above), including a valid status code (StatusCode) as defined in RFC 1945 and the size (Size) of the response in bytes.
{proceed,[{status,{StatusCode,PhraseArgs,Reason}}|OldData]}
A generic status message should be sent back to the client (if the next module in the EWSAPI Module Sequence does not think otherwise!) including a status code (StatusCode) as defined in RFC 1945, a term describing how the client will be informed (PhraseArgs) and a reason (Reason) for why it happened. Read more about PhraseArgs in httpd_util:message/3.
{break,NewData}
Has the same semantics as proceed above but with one important exception; No more modules in the EWSAPI Module Sequence are executed. Use with care!
done
No more modules in the EWSAPI Module Sequence are executed and no response should be sent back to the client. If no response is sent back to the client, using the socket provided by the mod record, the client will typically get a "Document contains no data...".
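
The return values above can be illustrated with a minimal, hypothetical module. The sketch below answers GET requests for "/hello" and lets every other request pass through unchanged. The module name mod_hello and the URI are invented for illustration, the record is repeated inline only to keep the sketch self-contained, and the exact layout expected of Response (for example whether it must carry its own header lines) is not specified above, so the string used here is a placeholder.

```erlang
-module(mod_hello).
-export([do/1]).

%% Copied from the record definition above so that the sketch is
%% self-contained; a real module would include httpd.hrl instead.
-record(mod, {data = [],
              socket_type = ip_comm,
              socket,
              config_db,
              method,
              request_uri,
              http_version,
              request_line,
              parsed_header = [],
              entity_body}).

%% Answer GET /hello with status code 200.
do(#mod{method = "GET", request_uri = "/hello", data = OldData}) ->
    Response = "<html><body>Hello</body></html>",   % placeholder content
    {proceed, [{response, {200, Response}} | OldData]};
%% Every other request: proceed to the next module as if nothing happened.
do(#mod{data = OldData}) ->
    {proceed, OldData}.
```

Returning {proceed,...} rather than {break,...} in both clauses leaves later modules in the sequence, for example a logging module, free to run.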

Warning!

Each consecutive module in the EWSAPI Module Sequence can choose to ignore data returned from the previous module either by trashing it or by "enhancing" it.

Keep in mind that there exist numerous utility functions to help you as an EWSAPI module programmer, e.g. nifty lookup of data in ETS-tables/key-value lists and socket utilities. You are well advised to read httpd_util(3) and httpd_socket(3).

EWSAPI MODULE CONFIGURATION

An EWSAPI module can define new config directives thus making it configurable for a server end-user. This is done by implementing load/2 (mandatory), store/2 and remove/1.

The config file is scanned twice (load/2 and store/2) and a cleanup is done (remove/1) during server shutdown. The reason for this is: "A directive A can be dependent upon another directive B which occurs either before or after directive A in the config file". If a directive does not depend upon other directives, store/2 can be left out. Even remove/1 can be left out if neither load/2 nor store/2 opens files or creates ETS-tables etc.

load/2 takes two arguments. The first is a row from the config file, that is a config directive in string format such as "Port 80". The second is a list of key-value tuples (which can be empty!) defining a context. A context is needed because there are directives which define inner contexts, that is directives within directives, such as <Directory>. load/2 is expected to return:

eof
End-of-file found.
ok
Ignore the directive.
{ok,ContextList}
Introduces a new context by adding a tuple to the context list or reverts to a previous context by removing a tuple from the context list. See <Directory> which introduces a new context and </Directory> which reverts to a previous one (Advice: Look at the source code for mod_auth:load/2).
{ok,ContextList,{DirectiveKey,DirectiveValue}}
Introduces a new context (see above) and defines a new config directive, e.g. {port,80}.
{ok,ContextList,[{DirectiveKey,DirectiveValue}]}
Introduces a new context (see above) and defines several new config directives, e.g. [{port,80},{foo,on}].
{error,Reason}
An invalid directive.

A naive example from mod_log.erl:

load([$T,$r,$a,$n,$s,$f,$e,$r,$L,$o,$g,$ |TransferLog],[]) ->
  {ok,[],{transfer_log,httpd_conf:clean(TransferLog)}};
load([$E,$r,$r,$o,$r,$L,$o,$g,$ |ErrorLog],[]) ->
  {ok,[],{error_log,httpd_conf:clean(ErrorLog)}}.
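
The context handling of load/2 is not exercised by the mod_log example. A sketch of how it might look is given below. The module name mod_ctx_example and the directives <Greeting Name>, </Greeting> and GreetingText are hypothetical, and the sketch assumes that each line is passed to load/2 without surrounding whitespace.

```erlang
-module(mod_ctx_example).
-export([load/2]).

%% "<Greeting Name>" introduces a new context.
load("<Greeting " ++ Rest, []) ->
    Name = string:strip(Rest, right, $>),   % drop the closing '>'
    {ok, [{greeting, Name}]};
%% "</Greeting>" reverts to the previous (empty) context.
load("</Greeting>", [{greeting, _Name}]) ->
    {ok, []};
%% "GreetingText ..." is accepted only inside a Greeting context.
load("GreetingText " ++ Text, [{greeting, Name}] = Context) ->
    {ok, Context, {greeting_text, {Name, Text}}};
load("GreetingText " ++ _Text, []) ->
    {error, "GreetingText outside <Greeting> context"}.
```

Storing the context name together with the directive value, as in {greeting_text,{Name,Text}}, is one way to keep directives from different contexts apart.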
    

store/2 takes two arguments. The first being a tuple describing a directive ({DirectiveKey,DirectiveValue}) and the second argument a list of tuples describing all directives ([{DirectiveKey,DirectiveValue}]). This makes it possible for directive A to be dependent upon the value of directive B. store/2 is expected to return:

{ok,{DirectiveKey,NewDirectiveValue}}
Introduces a new value for the specified directive replacing the old one generated by load/2.
{ok,[{DirectiveKey,NewDirectiveValue}]}
Introduces new values for the specified directives replacing the old ones generated by load/2.
{error,Reason}
An invalid directive.

A naive example from mod_log.erl:

store({error_log,ErrorLog},ConfigList) ->
  case create_log(ErrorLog,ConfigList) of
    {ok,ErrorLogStream} ->
      {ok,{error_log,ErrorLogStream}};
    {error,Reason} ->
      {error,Reason}
  end.
    

remove/1 takes the ETS-table representation of the config-file as input. It is up to you to cleanup anything you opened or created in load/2 or store/2. remove/1 is expected to return:

ok
If the cleanup was successful.
{error,Reason}
If the cleanup failed.

A naive example from mod_log.erl:

remove(ConfigDB) ->
  lists:foreach(fun([Stream]) -> file:close(Stream) end,
                ets:match(ConfigDB,{transfer_log,'$1'})),
  lists:foreach(fun([Stream]) -> file:close(Stream) end,
                ets:match(ConfigDB,{error_log,'$1'})),
  ok.
    

Keep in mind that there exist numerous utility functions to help you as an EWSAPI module programmer, e.g. nifty lookup of data in ETS-tables/key-value lists and configuration utilities. You are well advised to read httpd_conf(3) and httpd_util(3).

EWSAPI MODULE INTERACTION

Modules in the EWSAPI Module Sequence use the mod record's data field to propagate responses and status messages, as seen above. This data type can be used in a more versatile fashion. A module can prepare data to be used by subsequent EWSAPI modules, for example the mod_alias module appends the tuple {real_name,string()} to inform subsequent modules about the actual file system location for the current URL.
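
A subsequent module could pick up such interaction data along the following lines. This is a sketch only: the module name mod_realname_example and the function real_name/1 are hypothetical, and the tuple layout is taken from the description above.

```erlang
-module(mod_realname_example).
-export([real_name/1]).

%% Looks up the real_name tuple in the interaction data, that is the
%% data field of the mod record. Returns {ok,Path} or not_found.
real_name(Data) ->
    case lists:keysearch(real_name, 1, Data) of
        {value, {real_name, Path}} -> {ok, Path};
        false -> not_found
    end.
```

Returning not_found instead of crashing keeps the module usable even when mod_alias is absent from the EWSAPI Module Sequence.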

Before altering the EWSAPI Module Sequence you are well advised to observe what types of data each module uses and propagates. Read the "EWSAPI Interaction" section for each module.

An EWSAPI module can furthermore export functions to be used by other EWSAPI modules but also for other purposes, for example mod_alias:path/3 and mod_auth:add_user/5. These functions should be described in the module documentation.

Note!

When designing an EWSAPI module try to make it self-contained, that is avoid being dependent on other modules both concerning exchange of interaction data and the use of exported functions. If you are dependent on other modules do state this clearly in the module documentation!

You are well advised to read httpd_util(3) and httpd_conf(3).

BUGS

If a Web browser connects to an SSL enabled server using a URL not starting with https:// the server will hang due to an ugly bug in the SSLeay package!

SEE ALSO

httpd_core(3), httpd_conf(3), httpd_socket(3), httpd_util(3), inets(6), mod_actions(3), mod_alias(3), mod_auth(3), mod_security(3), mod_cgi(3), mod_dir(3), mod_disk_log(3), mod_esi(3), mod_include(3), mod_log(3)

AUTHORS

Joakim Grebenö - support@erlang.ericsson.se
Torbjörn Törnkvist - support@erlang.ericsson.se
Joe Armstrong - support@erlang.ericsson.se

inets 2.6.5
Copyright © 1991-2002 Ericsson Utvecklings AB