NAME
sge_conf - Grid Engine configuration files
DESCRIPTION
sge_conf defines the global and local Grid Engine configura-
tions and can be shown/modified by qconf(1) using the
-sconf/-mconf options. Only root or the cluster administra-
tor may modify sge_conf.
At its initial start-up, sge_qmaster(8) checks to see if a
valid Grid Engine configuration is available at a well known
location in the Grid Engine internal directory hierarchy.
If so, it loads that configuration information and proceeds.
If not, sge_qmaster(8) writes a generic configuration con-
taining default values to that same location. The Grid
Engine execution daemons sge_execd(8) upon start-up retrieve
their configuration from sge_qmaster(8).
The actual configuration for both sge_qmaster(8) and
sge_execd(8) is a superposition of a so called global confi-
guration and a local configuration being pertinent for the
host on which a master or execution daemon resides. If a
local configuration is available, its entries overwrite the
corresponding entries of the global configuration. Note: The
local configuration does not have to contain all valid con-
figuration entries, but only those which need to be modified
against the global entries.
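The superposition of global and local entries can be illustrated with
a small shell sketch. It assumes, purely for illustration, that each
configuration is available as a text file of "name value" lines; the
function name merge_conf and its file arguments are hypothetical and
not part of Grid Engine.

```shell
# merge_conf GLOBAL_FILE LOCAL_FILE
# Prints the effective configuration: every global entry, with any entry
# that also appears in the local file replaced by the local value.
# Hypothetical helper for illustration; not a Grid Engine command.
merge_conf() {
    awk '
        FILENAME == ARGV[1] { if (NF) loc[$1] = $0; next }  # local entries
        NF {
            if ($1 in loc) { print loc[$1]; delete loc[$1] } else print
        }
        END { for (k in loc) print loc[k] }                 # local-only entries
    ' "$2" "$1"
}
```

Entries that exist only in the global file pass through unchanged,
which mirrors the note that a local configuration only needs to list
the entries it changes.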
FORMAT
The paragraphs that follow provide brief descriptions of the
individual parameters that compose the global and local con-
figurations for a Grid Engine cluster:
execd_spool_dir
The execution daemon spool directory path. A feasible
spool directory requires read/write access permission for
root. The entry in the global configuration for this parame-
ter can be overwritten by execution host local configura-
tions, i.e. each sge_execd(8) may have a private spool
directory with a different path, in which case it needs to
provide read/write permission for the root account of the
corresponding execution host only.
Under execd_spool_dir a directory named after the unqualified
hostname of the execution host is created, and it contains all
information spooled to disk. Thus, it is possible for the
execd_spool_dirs of all execution hosts to physically reference
the same directory path (the root access restrictions mentioned
above need to be met, however).
Because this parameter is set at installation time, changing the
global execd_spool_dir in a running system is not supported. If
the change must be made anyway, all affected execution daemons
have to be restarted. Please make sure that running jobs have
finished before doing so; otherwise, running jobs will be lost.
The default location for the execution daemon spool direc-
tory is <sge_root>/<cell>/spool.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
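As a rough sketch of the directory layout described above, the
following function derives the per-host spool directory from a host
name. The function name execd_spool_path and the fallback values used
when SGE_ROOT or SGE_CELL are unset are assumptions of this example.

```shell
# execd_spool_path HOSTNAME [EXECD_SPOOL_DIR]
# Prints the per-host spool directory: <execd_spool_dir>/<unqualified hostname>.
# When no spool directory is given, the default <sge_root>/<cell>/spool is
# used; SGE_ROOT and SGE_CELL are assumed to be set in the environment
# (the fallbacks below are placeholders for this sketch only).
execd_spool_path() {
    host=${1%%.*}                       # strip the domain part, if any
    base=${2:-${SGE_ROOT:-/opt/sge}/${SGE_CELL:-default}/spool}
    printf '%s/%s\n' "$base" "$host"
}
```

Because the host name is part of the path, several execution hosts
can share one execd_spool_dir without their spool data colliding.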
mailer
mailer is the absolute pathname to the electronic mail
delivery agent on your system. It must accept the following
syntax:
mailer -s <subject-of-mail-message> <recipient>
Each sge_execd(8) may use a private mail agent. Changing
mailer will take immediate effect.
The default for mailer depends on the operating system of
the host on which the Grid Engine master installation was
run. Common values are /bin/mail or /usr/bin/Mail.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
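A site that needs a wrapper around its mail agent could honor the
calling syntax above roughly as follows. This is a sketch only: the
function name site_mailer and the REAL_MAILER variable are
hypothetical, and it assumes that the message body arrives on
standard input, as it does for mail(1).

```shell
# site_mailer -s SUBJECT RECIPIENT
# Checks the calling convention required of a mailer and hands the message
# (assumed to arrive on stdin) to the real delivery agent.
# REAL_MAILER is a hypothetical override used by this sketch.
site_mailer() {
    if [ "$#" -ne 3 ] || [ "$1" != "-s" ]; then
        echo "usage: mailer -s <subject-of-mail-message> <recipient>" >&2
        return 2
    fi
    ${REAL_MAILER:-/bin/mail} -s "$2" "$3"
}
```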
xterm
xterm is the absolute pathname to the X Window System termi-
nal emulator, xterm(1).
Each sge_execd(8) may use a private xterm. Changing
xterm will take immediate effect.
The default for xterm is /usr/bin/X11/xterm.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
load_sensor
A comma separated list of executable shell script paths or
programs to be started by sge_execd(8) and to be used in
order to retrieve site configurable load information (e.g.
free space on a certain disk partition).
Each sge_execd(8) may use a set of private load_sensor pro-
grams or scripts. Changing load_sensor will take effect
after two load report intervals (see load_report_time). A
load sensor will be restarted automatically if the file
modification time of the load sensor executable changes.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
In addition to the load sensors configured via load_sensor,
sge_execd(8) searches for an executable file named qloadsensor
in the execution host's Grid Engine binary directory
path. If such a file is found, it is treated like the con-
figurable load sensors defined in load_sensor. This facility
is intended for pre-installing a default load sensor.
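A load sensor is commonly written as a loop that waits for a request
on standard input and answers with a block of host:name:value lines
framed by "begin" and "end". That protocol is not described in this
section, so the sketch below should be read as an assumption based on
typical Grid Engine load sensor scripts. The load value name
tmp_free_kb and the monitored path are hypothetical.

```shell
# report_tmp_free HOST [PATH]
# Prints one load report block with the free space (in KB) under PATH.
report_tmp_free() {
    free_kb=$(df -Pk "${2:-/tmp}" | awk 'NR == 2 { print $4 }')
    echo "begin"
    echo "$1:tmp_free_kb:$free_kb"
    echo "end"
}

# load_sensor_loop HOST
# Answers every line read from stdin with a report; "quit" ends the loop.
load_sensor_loop() {
    while read -r request; do
        [ "$request" = "quit" ] && break
        report_tmp_free "$1"
    done
}

# Typical use as a standalone script (commented out in this sketch):
# load_sensor_loop "$(hostname)"
```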
prolog
The executable path of a shell script that is started before
execution of Grid Engine jobs with the same environment set-
ting as that for the Grid Engine jobs to be started after-
wards. An optional prefix "user@" specifies the user under
which this procedure is to be started. The procedure's standard
output and the error output stream are written to the
same file used also for the standard output and error output
of each job. This procedure is intended as a means for the
Grid Engine administrator to automate the execution of gen-
eral site specific tasks like the preparation of temporary
file systems with the need for the same context information
as the job. Each sge_execd(8) may use a private prologue
script. Correspondingly, the execution host local configuration
can be overwritten by the queue configuration (see
queue_conf(5)). Changing prolog will take immediate effect.
The default for prolog is the special value NONE, which
prevents the execution of a prologue script.
The following special variables being expanded at runtime
can be used (besides any other strings which have to be
interpreted by the procedure) to constitute a command line:
$host
The name of the host on which the prolog or epilog pro-
cedures are started.
$job_owner
The user name of the job owner.
$job_id
Grid Engine's unique job identification number.
$job_name
The name of the job.
$processors
The processors string as contained in the queue confi-
guration (see queue_conf(5)) of the master queue (the
queue in which the prolog and epilog procedures are
started).
$queue
The cluster queue name of the master queue instance,
i.e. the cluster queue in which the prolog and epilog
procedures are started.
$stdin_path
The pathname of the stdin file. This is always
/dev/null for prolog, pe_start, pe_stop and epilog. In
the job script it is the pathname of the stdin file for
the job. When delegated file staging is enabled, this
path is set to $fs_stdin_tmp_path. When delegated file
staging is not enabled, it is the stdin pathname given
via DRMAA or qsub.
$stdout_path
$stderr_path
The pathname of the stdout/stderr file. This always
points to the output/error file. When delegated file
staging is enabled, this path is set to
$fs_stdout_tmp_path/$fs_stderr_tmp_path. When delegated
file staging is not enabled, it is the stdout/stderr
pathname given via DRMAA or qsub.
$merge_stderr
If merging of stderr and stdout is requested, this flag
is "1", otherwise it is "0". If this flag is 1, stdout
and stderr are merged in one file, the stdout file.
Merging of stderr and stdout can be requested via the
DRMAA job template attribute 'drmaa_join_files' (see
drmaa_attributes(3) ) or the qsub parameter '-j y' (see
qsub(1) ).
$fs_stdin_host
When delegated file staging is requested for the stdin
file, this is the name of the host where the stdin file
has to be copied from before the job is started.
$fs_stdout_host
$fs_stderr_host
When delegated file staging is requested for the
stdout/stderr file, this is the name of the host where
the stdout/stderr file has to be copied to after the
job has run.
$fs_stdin_path
When delegated file staging is requested for the stdin
file, this is the pathname of the stdin file on the
host $fs_stdin_host.
$fs_stdout_path
$fs_stderr_path
When delegated file staging is requested for the
stdout/stderr file, this is the pathname of the
stdout/stderr file on the host
$fs_stdout_host/$fs_stderr_host.
$fs_stdin_tmp_path
When delegated file staging is requested for the stdin
file, this is the destination pathname of the stdin
file on the execution host. The prolog script must copy
the stdin file from $fs_stdin_host:$fs_stdin_path to
localhost:$fs_stdin_tmp_path to establish delegated
file staging of the stdin file.
$fs_stdout_tmp_path
$fs_stderr_tmp_path
When delegated file staging is requested for the
stdout/stderr file, this is the source pathname of the
stdout/stderr file on the execution host. The epilog
script must copy the stdout file from
localhost:$fs_stdout_tmp_path to
$fs_stdout_host:$fs_stdout_path (the stderr file from
localhost:$fs_stderr_tmp_path to
$fs_stderr_host:$fs_stderr_path) to establish delegated
file staging of the stdout/stderr file.
$fs_stdin_file_staging
$fs_stdout_file_staging
$fs_stderr_file_staging
When delegated file staging is requested for the
stdin/stdout/stderr file, the flag is set to "1", oth-
erwise it is set to "0" (see in delegated_file_staging
how to enable delegated file staging).
These three flags correspond to the DRMAA job template
attribute 'drmaa_transfer_files' (see
drmaa_attributes(3) ).
The global configuration entry for this value may be
overwritten by the execution host local configuration.
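The delegated file staging variables can be combined into a prolog
along the following lines. This is a minimal sketch, not a shipped
script: the function name stage_stdin, the script path shown in the
comment and the use of scp as the copy tool are all assumptions.

```shell
# stage_stdin FLAG SRC_HOST SRC_PATH TMP_PATH
# Arguments are expected in the order of a prolog entry such as
#   /path/to/stage_in.sh $fs_stdin_file_staging $fs_stdin_host \
#       $fs_stdin_path $fs_stdin_tmp_path
# (the script path is hypothetical). COPY_CMD defaults to scp, which is an
# assumption of this sketch; any remote copy tool could be substituted.
stage_stdin() {
    if [ "$1" != "1" ]; then
        return 0                        # staging not requested: nothing to do
    fi
    ${COPY_CMD:-scp} "$2:$3" "$4"
}
```

An epilog would perform the reverse copy for the stdout and stderr
files, using the corresponding $fs_stdout_* and $fs_stderr_* values.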
epilog
The executable path of a shell script that is started after
execution of Grid Engine jobs with the same environment setting
as that for the Grid Engine job that has just completed.
An optional prefix "user@" specifies the user under
which this procedure is to be started. The procedure's standard
output and the error output stream are written to the
same file used also for the standard output and error output
of each job. This procedure is intended as a means for the
Grid Engine administrator to automate the execution of gen-
eral site specific tasks like the cleaning up of temporary
file systems with the need for the same context information
as the job. Each sge_execd(8) may use a private epilogue
script. Correspondingly, the execution host local configuration
can be overwritten by the queue configuration (see
queue_conf(5)). Changing epilog will take immediate
effect.
The default for epilog is the special value NONE, which
prevents the execution of an epilogue script. The same
special variables as for prolog can be used to constitute a
command line.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
shell_start_mode
This parameter defines the mechanisms which are used to
actually invoke the job scripts on the execution hosts. The
following values are recognized:
unix_behavior
If a user starts a job shell script under UNIX interac-
tively by invoking it just with the script name the
operating system's executable loader uses the informa-
tion provided in a comment such as `#!/bin/csh' in the
first line of the script to detect which command inter-
preter to start to interpret the script. This mechanism
is used by Grid Engine when starting jobs if
unix_behavior is defined as shell_start_mode.
posix_compliant
POSIX does not consider first script line comments such
as `#!/bin/csh' as being significant. The POSIX standard
for batch queuing systems (P1003.2d) therefore requires
a compliant queuing system to ignore such lines but to
use user specified or configured default command inter-
preters instead. Thus, if shell_start_mode is set to
posix_compliant Grid Engine will either use the command
interpreter indicated by the -S option of the qsub(1)
command or the shell parameter of the queue to be used
(see queue_conf(5) for details).
script_from_stdin
Setting the shell_start_mode parameter either to
posix_compliant or unix_behavior requires you to set
the umask in use for sge_execd(8) such that every user
has read access to the active_jobs directory in the
spool directory of the corresponding execution daemon.
In case you have prolog and epilog scripts configured,
they also need to be readable by any user who may exe-
cute jobs.
If this violates your site's security policies you may
want to set shell_start_mode to script_from_stdin. This
will force Grid Engine to open the job script as well
as the epilogue and prologue scripts for reading into
STDIN as root (if sge_execd(8) was started as root)
before changing to the job owner's user account. The
script is then fed into the STDIN stream of the command
interpreter indicated by the -S option of the qsub(1)
command or the shell parameter of the queue to be used
(see queue_conf(5) for details).
Thus setting shell_start_mode to script_from_stdin also
implies posix_compliant behavior. Note, however, that
feeding scripts into the STDIN stream of a command
interpreter may cause trouble if commands like rsh(1)
are invoked inside a job script as they also process
the STDIN stream of the command interpreter. These
problems can usually be resolved by redirecting the
STDIN channel of those commands to come from /dev/null
(e.g. rsh host date < /dev/null). Note also, that any
command-line options associated with the job are passed
to the executing shell. The shell will only forward
them to the job if they are not recognized as valid
shell options.
Changes to shell_start_mode will take immediate effect. The
default for shell_start_mode is posix_compliant.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
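The difference between the modes can be summarized by a sketch of how
the command interpreter might be selected. The function name
pick_interpreter is hypothetical, and falling back to the configured
shell when a script has no `#!' line under unix_behavior is an
assumption of this example rather than documented behavior.

```shell
# pick_interpreter MODE SCRIPT QSUB_SHELL QUEUE_SHELL
# MODE is a shell_start_mode value; QSUB_SHELL is the value given with
# qsub -S (may be empty); QUEUE_SHELL is the shell parameter of the queue.
pick_interpreter() {
    if [ "$1" = "unix_behavior" ]; then
        first=$(sed -n '1p' "$2")
        case $first in
            '#!'*) printf '%s\n' "${first#'#!'}" | awk '{ print $1 }'; return 0 ;;
        esac
    fi
    # posix_compliant and script_from_stdin ignore the first-line comment.
    printf '%s\n' "${3:-$4}"
}
```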
login_shells
UNIX command interpreters like the Bourne-Shell (see sh(1))
or the C-Shell (see csh(1)) can be used by Grid Engine to
start job scripts. The command interpreters can either be
started as login-shells (i.e. all system and user default
resource files like .login or .profile will be executed when
the command interpreter is started and the environment for
the job will be set up as if the user has just logged in) or
just for command execution (i.e. only shell specific
resource files like .cshrc will be executed and a minimal
default environment is set up by Grid Engine - see qsub(1)).
The parameter login_shells contains a comma separated list
of the executable names of the command interpreters to be
started as login-shells. Shells in this list are only
started as login shells if the parameter shell_start_mode
(see above) is set to posix_compliant.
Changes to login_shells will take immediate effect. The
default for login_shells is sh,csh,tcsh,ksh.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
min_uid
min_uid places a lower bound on user IDs that may use the
cluster. Users whose user ID (as returned by getpwnam(3)) is
less than min_uid will not be allowed to run jobs on the
cluster.
Changes to min_uid will take immediate effect. The default
for min_uid is 0.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
min_gid
This parameter sets the lower bound on group IDs that may
use the cluster. Users whose default group ID (as returned
by getpwnam(3)) is less than min_gid will not be allowed to
run jobs on the cluster.
Changes to min_gid will take immediate effect. The default
for min_gid is 0.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
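The effect of min_uid and min_gid on a given account can be checked
with a sketch like the one below. The function name ids_allowed and
the bounds of 100 in the usage comment are hypothetical.

```shell
# ids_allowed UID GID MIN_UID MIN_GID
# Succeeds when neither ID is below its lower bound.
ids_allowed() {
    [ "$1" -ge "$3" ] && [ "$2" -ge "$4" ]
}

# Example for the invoking user, with hypothetical bounds of 100:
# ids_allowed "$(id -u)" "$(id -g)" 100 100 || echo "would be rejected"
```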
user_lists
The user_lists parameter contains a comma separated list of
so called user access lists as described in access_list(5).
Each user contained in at least one of the enlisted access
lists has access to the cluster. If the user_lists parameter
is set to NONE (the default), any user who is not explicitly
excluded via the xuser_lists parameter described below has
access. If a user is contained both in an access list
enlisted in xuser_lists and user_lists the user is denied
access to the cluster.
Changes to user_lists will take immediate effect.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
xuser_lists
The xuser_lists parameter contains a comma separated list of
so called user access lists as described in access_list(5).
Each user contained in at least one of the enlisted access
lists is denied access to the cluster. If the xuser_lists
parameter is set to NONE (the default) any user has access.
If a user is contained both in an access list enlisted in
xuser_lists and user_lists (see above) the user is denied
access to the cluster.
Changes to xuser_lists will take immediate effect.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
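The interaction of user_lists and xuser_lists amounts to a simple
decision rule, sketched below. The helper names in_list and
has_access are hypothetical, and the access lists are represented as
plain space separated user names instead of access_list(5) objects.

```shell
# in_list ITEM LIST   (LIST is a space separated string)
in_list() {
    case " $2 " in
        *" $1 "*) return 0 ;;
    esac
    return 1
}

# has_access USER ALLOWED DENIED
# ALLOWED / DENIED hold the user names contained in the access lists named
# by user_lists / xuser_lists, or the literal NONE.
has_access() {
    if [ "$3" != "NONE" ] && in_list "$1" "$3"; then
        return 1                        # xuser_lists always wins
    fi
    [ "$2" = "NONE" ] || in_list "$1" "$2"
}
```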
administrator_mail
administrator_mail specifies a comma separated list of the
electronic mail address(es) of the cluster administrator(s)
to whom internally-generated problem reports are sent. The
mail address format depends on your electronic mail system
and how it is configured; consult your system's configura-
tion guide for more information.
Changing administrator_mail takes immediate effect. The
default for administrator_mail is an empty mail list.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
projects
The projects list contains all projects which are granted
access to Grid Engine. Users belonging to none of these projects
cannot use Grid Engine. If users belong to projects in
the projects list and the xprojects list (see below), they
also cannot use the system.
Changing projects takes immediate effect. The default for
projects is none.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
xprojects
The xprojects list contains all projects which are denied
access to Grid Engine. Users belonging to one of these projects
cannot use Grid Engine. If users belong to projects in
the projects list (see above) and the xprojects list, they
also cannot use the system.
Changing xprojects takes immediate effect. The default for
xprojects is none.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local
configuration.
load_report_time
System load is reported periodically by the execution dae-
mons to sge_qmaster(8). The parameter load_report_time
defines the time interval between load reports.
Each sge_execd(8) may use a different load report time.
Changing load_report_time will take immediate effect.
Note: Be careful when modifying load_report_time. Reporting
load too frequently might block sge_qmaster(8) especially if
the number of execution hosts is large. Moreover, since the
system load typically increases and decreases smoothly, fre-
quent load reports hardly offer any benefit.
The default for load_report_time is 40 seconds.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
reschedule_unknown
Determines whether jobs on hosts in unknown state are
rescheduled and thus sent to other hosts. Hosts are
registered as unknown if sge_qmaster(8) cannot establish contact
to the sge_execd(8) on those hosts (see max_unheard).
Likely reasons are a breakdown of the host or a breakdown of
the network connection in between, but also sge_execd(8) may
not be executing on such hosts.
In any case, Grid Engine can reschedule jobs running on such
hosts to another system. reschedule_unknown controls the
time which Grid Engine will wait before jobs are rescheduled
after a host became unknown. The time format specification
is hh:mm:ss. If the special value 00:00:00 is set, then jobs
will not be rescheduled from this host.
Rescheduling is only initiated for jobs which have activated
the rerun flag (see the -r y option of qsub(1) and the rerun
option of queue_conf(5)). Parallel jobs are only
rescheduled if the host on which their master task executes
is in unknown state. Checkpointing jobs will only be
rescheduled if the "when" option of the corresponding checkpointing
environment contains an appropriate flag (see
checkpoint(5)). Interactive jobs (see qsh(1), qrsh(1),
qtcsh(1)) are not rescheduled.
The default for reschedule_unknown is 00:00:00.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
max_unheard
If sge_qmaster(8) could not contact or was not contacted by
the execution daemon of a host for max_unheard seconds, all
queues residing on that particular host are set to status
unknown. sge_qmaster(8), at least, should be contacted by
the execution daemons in order to get the load reports.
Thus, max_unheard should be greater than the
load_report_time (see above).
Changing max_unheard takes immediate effect. The default
for max_unheard is 2 minutes 30 seconds.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
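The recommendation that max_unheard exceed load_report_time can be
verified with a short sketch. It assumes that both values are written
in the hh:mm:ss notation used for reschedule_unknown; the function
names hms_to_seconds and unheard_ok are hypothetical.

```shell
# hms_to_seconds HH:MM:SS
# Converts a time specification to seconds (awk avoids octal pitfalls
# with leading zeros in shell arithmetic).
hms_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# unheard_ok MAX_UNHEARD LOAD_REPORT_TIME   (both as hh:mm:ss)
# Succeeds when max_unheard is greater than load_report_time.
unheard_ok() {
    [ "$(hms_to_seconds "$1")" -gt "$(hms_to_seconds "$2")" ]
}
```

With the defaults stated in this page (2 minutes 30 seconds and 40
seconds), unheard_ok 00:02:30 00:00:40 succeeds.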
loglevel
This parameter specifies the level of detail that Grid
Engine components such as sge_qmaster(8) or sge_execd(8) use
to produce informative, warning or error messages which are
logged to the messages files in the master and execution
daemon spool directories (see the description of the
execd_spool_dir parameter above). The following message lev-
els are available:
log_err
All error events being recognized are logged.
log_warning
All error events being recognized and all detected
signs of potentially erroneous behavior are logged.
log_info
All error events being recognized, all detected signs
of potentially erroneous behavior and a variety of
informative messages are logged.
Changing loglevel will take immediate effect.
The default for loglevel is log_info.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
max_aj_instances
This parameter defines the maximum number of array tasks to
be scheduled to run simultaneously per array job. An
instance of an array task will be created within the master
daemon when it gets a start order from the scheduler. The
instance will be destroyed when the array task finishes.
Thus the parameter provides control mainly over the memory
consumption of array jobs in the master and scheduler dae-
mon. It is most useful for very large clusters and very
large array jobs. The default for this parameter is 2000.
The value 0 will deactivate this limit and will allow the
scheduler to start as many array job tasks as suitable
resources are available in the cluster.
Changing max_aj_instances will take immediate effect.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
max_aj_tasks
This parameter defines the maximum number of array job tasks
within an array job. sge_qmaster(8) will reject all array
job submissions which request more than max_aj_tasks array
job tasks. The default for this parameter is 75000. The
value 0 will deactivate this limit.
Changing max_aj_tasks will take immediate effect.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
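Whether a planned array job stays within max_aj_tasks can be
estimated before submission, as in the sketch below. It assumes the
n[-m[:s]] task range notation of the qsub -t option; the function
names array_task_count and within_max_aj_tasks are hypothetical.

```shell
# array_task_count RANGE
# RANGE uses the n[-m[:s]] notation (an assumption taken from qsub -t).
array_task_count() {
    printf '%s\n' "$1" | awk -F'[-:]' '{
        first = $1 + 0
        last  = (NF >= 2 ? $2 : $1) + 0
        step  = (NF >= 3 ? $3 : 1) + 0
        print int((last - first) / step) + 1
    }'
}

# within_max_aj_tasks RANGE [MAX]
# MAX defaults to 75000; a MAX of 0 disables the limit.
within_max_aj_tasks() {
    max=${2:-75000}
    [ "$max" -eq 0 ] || [ "$(array_task_count "$1")" -le "$max" ]
}
```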
max_u_jobs
The number of active (not finished) jobs which each Grid
Engine user can have in the system simultaneously is con-
trolled by this parameter. A value greater than 0 defines
the limit. The default value 0 means "unlimited". If the
max_u_jobs limit is exceeded by a job submission then the
submission command exits with exit status 25 and an
appropriate error message.
Changing max_u_jobs will take immediate effect.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
max_jobs
The number of active (not finished) jobs simultaneously
allowed in Grid Engine is controlled by this parameter. A
value greater than 0 defines the limit. The default value 0
means "unlimited". If the max_jobs limit is exceeded by a
job submission then the submission command exits with exit
status 25 and an appropriate error message.
Changing max_jobs will take immediate effect.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
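A submission script can react to exit status 25 by waiting and
trying again, for example as sketched below. The function name
submit_with_retry and the SUBMIT_CMD, MAX_TRIES and RETRY_DELAY
variables are hypothetical knobs of this example, not Grid Engine
settings.

```shell
# submit_with_retry ARGS...
# Runs the submit command with ARGS and retries while it reports exit
# status 25 (job limit exceeded). Any other failure is returned at once.
submit_with_retry() {
    tries=0
    while [ "$tries" -lt "${MAX_TRIES:-5}" ]; do
        if ${SUBMIT_CMD:-qsub} "$@"; then
            return 0
        else
            status=$?
        fi
        if [ "$status" -ne 25 ]; then
            return "$status"            # some other failure: give up
        fi
        tries=$((tries + 1))
        sleep "${RETRY_DELAY:-60}"
    done
    return 25
}
```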
enforce_project
If set to true, users are required to request a project
whenever submitting a job. See the -P option to qsub(1) for
details.
Changing enforce_project will take immediate effect. The
default for enforce_project is false.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
enforce_user
If set to true, a user(5) must exist to allow for job sub-
mission. Jobs are rejected if no corresponding user exists.
If set to auto, a user(5) object for the submitting user
will automatically be created during job submission, if one
does not already exist. The auto_user_oticket,
auto_user_fshare, auto_user_default_project, and
auto_user_delete_time configuration parameters will be used
as default attributes of the new user(5) object.
Changing enforce_user will take immediate effect. The
default for enforce_user is false.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
auto_user_oticket
The number of override tickets to assign to automatically
created user(5) objects. User objects are created automati-
cally if the enforce_user attribute is set to auto.
Changing auto_user_oticket will affect any newly created
user objects, but will not change user objects created in
the past.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
auto_user_fshare
The number of functional shares to assign to automatically
created user(5) objects. User objects are created automati-
cally if the enforce_user attribute is set to auto.
Changing auto_user_fshare will affect any newly created user
objects, but will not change user objects created in the
past.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
auto_user_default_project
The default project to assign to automatically created
user(5) objects. User objects are created automatically if
the enforce_user attribute is set to auto.
Changing auto_user_default_project will affect any newly
created user objects, but will not change user objects
created in the past.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
auto_user_delete_time
The number of seconds of inactivity after which automati-
cally created user(5) objects will be deleted. User objects
are created automatically if the enforce_user attribute is
set to auto. If the user has no active or pending jobs for
the specified amount of time, the object will automatically
be deleted. A value of 0 can be used to indicate that the
automatically created user object is permanent and should
not be automatically deleted.
Changing auto_user_delete_time will affect the deletion time
for all users with active jobs.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
set_token_cmd
This parameter is only present if your Grid Engine system is
licensed to support AFS.
set_token_cmd points to a command which sets and extends AFS
tokens for Grid Engine jobs. In the standard Grid Engine AFS
distribution, it is supplied as a script which expects two
command line parameters. It reads the token from STDIN,
extends the token's expiration time and sets the token:
<set_token_cmd> <user> <token_extend_after_seconds>
As a shell script this command will call the programs:
- SetToken
- forge
which are provided by your distributor as source code. The
script looks as follows:
--------------------------------
#!/bin/sh
# set_token_cmd
forge -u $1 -t $2 | SetToken
--------------------------------
Since it is necessary for forge to read the secret AFS
server key, a site might wish to replace the set_token_cmd
script by a command, which connects to a self written daemon
at the AFS server. The token must be forged at the AFS
server and returned to the local machine, where SetToken is
executed.
Changing set_token_cmd will take immediate effect. The
default for set_token_cmd is none.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
pag_cmd
This parameter is only present if your Grid Engine system is
licensed to support AFS.
The path to your pagsh is specified via this parameter. The
sge_shepherd(8) process and the job run in a pagsh. Please
ask your AFS administrator for details.
Changing pag_cmd will take immediate effect. The default
for pag_cmd is none.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
token_extend_time
This parameter is only present if your Grid Engine system is
licensed to support AFS.
token_extend_time is the time period for which AFS tokens are periodically
extended. Grid Engine will call the token extension 30
minutes before the tokens expire until jobs have finished
and the corresponding tokens are no longer required.
Changing token_extend_time will take immediate effect. The
default for token_extend_time is 24:0:0, i.e. 24 hours.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
gid_range
The gid_range is a comma separated list of range expressions
of the form n-m (n as well as m being integer numbers
greater than 99), where m is an abbreviation for m-m. These
numbers are used in sge_execd(8) to identify processes
belonging to the same job.
Each sge_execd(8) may use a separate set of group IDs for
this purpose. All numbers in the group ID range have to be
unused supplementary group IDs on the system where the
sge_execd(8) is started.
Changing gid_range will take immediate effect. There is no
default for gid_range. The administrator will have to assign
a value for gid_range during installation of Grid Engine.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
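The range syntax of gid_range can be expanded and checked with a
sketch such as the following. The function name expand_gid_range is
hypothetical; the check only covers the "greater than 99" rule stated
above and does not test whether the IDs are actually unused on the
host.

```shell
# expand_gid_range LIST
# LIST is a comma separated list of n-m ranges or single numbers.
# Prints the group IDs on one line, or fails on a malformed range.
expand_gid_range() {
    printf '%s\n' "$1" | awk -F, '{
        out = ""
        for (i = 1; i <= NF; i++) {
            n = split($i, r, "-")
            lo = r[1] + 0
            hi = (n == 2 ? r[2] : r[1]) + 0
            if (n > 2 || lo <= 99 || hi <= 99 || lo > hi) {
                print "invalid range: " $i > "/dev/stderr"
                exit 1
            }
            for (g = lo; g <= hi; g++) out = out (out == "" ? "" : " ") g
        }
        print out
    }'
}
```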
qmaster_params
A list of additional parameters can be passed to the Grid
Engine qmaster. The following values are recognized:
ENABLE_FORCED_QDEL
If this parameter is set, non-administrative users can
force deletion of their own jobs via the -f option of
qdel(1). Without this parameter, forced deletion of
jobs is only allowed by the Grid Engine manager or
operator.
Note: Forced deletion for jobs is executed differently
depending on whether users are Grid Engine administra-
tors or not. In case of administrative users, the jobs
are removed from the internal database of Grid Engine
immediately. For regular users, the equivalent of a
normal qdel(1) is executed first, and deletion is
forced only if the normal cancellation was unsuccess-
ful.
FORBID_RESCHEDULE
If this parameter is set, re-queuing of jobs cannot be
initiated by the job script which is under control of
the user. Without this parameter jobs returning the
value 99 are rescheduled. This can be used to cause the
job to be restarted at a different machine, for
instance if there are not enough resources on the
current one.
FORBID_APPERROR
If this parameter is set, the application cannot set
itself to error state. Without this parameter jobs
returning the value 100 are set to error state (and
therefore can be manually rescheduled by clearing the
error state). This can be used to set the job to error
state when a starting condition of the application is
not fulfilled before the application itself has been
started, or when a clean up procedure (e.g. in the epi-
log) decides that it is necessary to run the job again,
by returning 100 in the prolog, pe_start, job script,
pe_stop or epilog script.
DISABLE_AUTO_RESCHEDULING
If set to "true" or "1", the reschedule_unknown parame-
ter is not taken into account.
MAX_DYN_EC
Sets the max number of dynamic event clients (as used
by qsub -sync y and by Grid Engine DRMAA API library
sessions). The default is set to 99. The number of
dynamic event clients should not be bigger than half of
the number of file descriptors the system has. The file
descriptors are shared among the connections to all exec
hosts, all event clients, and file handles that the
qmaster needs.
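The sizing rule for MAX_DYN_EC can be written as a simple
check. The sketch below assumes that the file descriptor
limit is already known and passed in as fd_limit. How that
number is obtained on a given system is left open, and the
function name is hypothetical.

```python
def max_dyn_ec_is_reasonable(max_dyn_ec, fd_limit):
    """Check the rule of thumb stated above.

    Returns True if max_dyn_ec is not bigger than half of
    fd_limit. fd_limit is an assumed input, not something
    Grid Engine reports under this name.
    """
    return max_dyn_ec * 2 <= fd_limit


print(max_dyn_ec_is_reasonable(99, 1024))   # True
print(max_dyn_ec_is_reasonable(600, 1024))  # False
```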
MONITOR_TIME
Specifies the time interval when the monitoring infor-
mation should be printed. The monitoring is disabled by
default and can be enabled by specifying an interval.
The monitoring is per thread and is written to the mes-
sages file or displayed by the "qping -f" command line
tool. Example: MONITOR_TIME=0:0:10 generates and prints
the monitoring information approximately every 10
seconds. The specified time is a guideline only and not
a fixed interval. The interval that is actually used is
printed. In this example, the interval could be any-
thing between 9 seconds and 20 seconds.
LOG_MONITOR_MESSAGE
Monitoring information is logged into the messages
files by default. This information can be accessed via
qping(1). If monitoring is always enabled, the messages
files can become quite large. This switch disables
logging into the messages files, making qping -f
the only source of monitoring data.
PROF_SIGNAL
Profiling provides the user with the possibility to get
system measurements. This can be useful for debugging
or optimisation of the system. The profiling output
will be done within the messages file.
Enables the profiling for qmaster signal thread. (e.g.
PROF_SIGNAL=true)
PROF_MESSAGE
Enables the profiling for qmaster message thread.
(e.g. PROF_MESSAGE=true)
PROF_DELIVER
Enables the profiling for qmaster event deliver thread.
(e.g. PROF_DELIVER=true)
PROF_TEVENT
Enables the profiling for qmaster timed event thread.
(e.g. PROF_TEVENT=true)
Please note that the cpu utime and stime values contained
in the profiling output are not per thread cpu times. These
cpu usage statistics are per process statistics. So the
printed profiling values for cpu mean "cpu time consumed by
sge_qmaster (all threads) while the reported profiling level
was active".
STREE_SPOOL_INTERVAL
Sets the time interval for spooling the sharetree
usage. The default is set to 00:04:00. The setting
accepts a colon-separated string or seconds. There is no
setting to turn the sharetree spooling off. (e.g.
STREE_SPOOL_INTERVAL=00:02:00)
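To make the two accepted notations concrete, the sketch
below converts either form to seconds. It assumes that the
colon-separated form is hours:minutes:seconds, which matches
the examples in this page but is not stated explicitly for
this parameter. The function name is hypothetical and this
is not the parser used by sge_qmaster(8).

```python
def interval_to_seconds(value):
    """Convert 'HH:MM:SS' or a plain number of seconds.

    Assumes the colon-separated form means
    hours:minutes:seconds.
    """
    parts = value.strip().split(":")
    if len(parts) == 1:
        # Plain seconds, e.g. "240".
        return int(parts[0])
    if len(parts) != 3:
        raise ValueError("expected HH:MM:SS or seconds: %r" % value)
    hours, minutes, seconds = (int(p) for p in parts)
    return hours * 3600 + minutes * 60 + seconds


print(interval_to_seconds("00:04:00"))  # 240
print(interval_to_seconds("120"))       # 120
```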
Changing qmaster_params will take immediate effect. The
default for qmaster_params is none.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
execd_params
Used for passing additional parameters to the
Grid Engine execution daemon. The following values are
recognized:
ACCT_RESERVED_USAGE
If this parameter is set to true, reserved usage is
used for the accounting entries cpu, mem and io instead
of the measured usage.
ENABLE_WINDOMACC
If this parameter is set to true, Windows Domain
accounts (WinDomAcc) are used on Windows hosts. These
accounts require the use of sgepasswd(1) (see also
sgepasswd(5)). If this parameter is set to false or is
not set, local Windows accounts are used. On non-
Windows hosts, this parameter is ignored.
KEEP_ACTIVE
This value should only be set for debugging purposes.
If set to true, the execution daemon will not remove
the spool directory maintained by sge_shepherd(8) for a
job.
PTF_MIN_PRIORITY, PTF_MAX_PRIORITY
The maximum/minimum priority which Grid Engine will
assign to a job. Typically this is a negative/positive
value in the range of -20 (maximum) to 19 (minimum) for
systems which allow setting of priorities with the
nice(2) system call. Other systems may provide dif-
ferent ranges.
The default priority range (varies from system to sys-
tem) is installed either by removing the parameters or
by setting a value of -999.
See the "messages" file of the execution daemon for the
predefined default value on your hosts. The values are
logged during the startup of the execution daemon.
PROF_EXECD
Enables the profiling for the execution daemon. (e.g.
PROF_EXECD=true)
NOTIFY_KILL
The parameter allows you to change the notification
signal for the signal SIGKILL (see -notify option of
qsub(1)). The parameter either accepts signal names
(use the -l option of kill(1)) or the special value
none. If set to none, no notification signal will be
sent. If it is set to TERM, for instance, or another
signal name then this signal will be sent as notifica-
tion signal.
NOTIFY_SUSP
With this parameter it is possible to modify the notif-
ication signal for the signal SIGSTOP (see -notify
parameter of qsub(1)). The parameter either accepts
signal names (use the -l option of kill(1)) or the spe-
cial value none. If set to none, no notification signal
will be sent. If it is set to TSTP, for instance, or
another signal name then this signal will be sent as
notification signal.
SHARETREE_RESERVED_USAGE
If this parameter is set to true, reserved usage is
taken for the Grid Engine share tree consumption
instead of measured usage.
Changing execd_params will take immediate effect. The
default for execd_params is none.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
USE_QSUB_GID
If this parameter is set to true, the primary group id
being active when a job was submitted will be set to
become the primary group id for job execution. If the
parameter is not set, the primary group id as defined
for the job owner in the execution host passwd(5) file
is used.
The feature is only available for jobs submitted via
qsub(1), qrsh(1), qmake(1) and qtcsh(1). Also, it only
works for qrsh(1) jobs (and thus also for qtcsh(1) and
qmake(1)) if rsh and rshd components are used which are
provided with Grid Engine (i.e., the rsh_daemon and
rsh_command parameters may not be changed from the
default).
INHERIT_ENV
This parameter indicates whether the shepherd should
allow the environment inherited by the execution daemon
from the shell that started it to be inherited by the
job it's starting. When true, any environment variable
that is set in the shell which starts the execution
daemon at the time the execution daemon is started will
be set in the environment of any jobs run by that exe-
cution daemon, unless the environment variable is
explicitly overridden, such as PATH or LOGNAME. If set
to false, each job starts with only the environment
variables that are explicitly passed on by the execu-
tion daemon, such as PATH and LOGNAME. The default
value is true.
SET_LIB_PATH
This parameter tells the execution daemon whether to
add the Grid Engine shared library directory to the
library path of executed jobs. If set to true, and
INHERIT_ENV is also set to true, the Grid Engine shared
library directory will be prepended to the library path
which is inherited from the shell which started the
execution daemon. If INHERIT_ENV is set to false, the
library path will contain only the Grid Engine shared
library directory. If set to false, and INHERIT_ENV is
set to true, the library path exported to the job will
be the one inherited from the shell which started the
execution daemon. If INHERIT_ENV is also set to false,
the library path will be empty. After the execution
daemon has set the library path, it may be further
altered by the shell in which the job is executed, or
by the job script itself. The default value for
SET_LIB_PATH is false.
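The four combinations of SET_LIB_PATH and INHERIT_ENV
described above can be summarized in a short sketch. It
models only the resulting library path string. The function
name, the example directory /opt/sge/lib and the use of ":"
as separator are assumptions made for illustration.

```python
def job_library_path(set_lib_path, inherit_env, sge_lib_dir, inherited_path):
    """Model the library path handed to a job.

    sge_lib_dir and inherited_path are hypothetical inputs.
    The ":" separator is an assumption.
    """
    entries = []
    if set_lib_path:
        # The Grid Engine directory comes first.
        entries.append(sge_lib_dir)
    if inherit_env and inherited_path:
        # Keep what the starting shell provided.
        entries.extend(p for p in inherited_path.split(":") if p)
    return ":".join(entries)


inherited = "/usr/lib:/usr/local/lib"
for set_lib in (True, False):
    for inherit in (True, False):
        path = job_library_path(set_lib, inherit, "/opt/sge/lib", inherited)
        print(set_lib, inherit, repr(path))
```

With both parameters false the result is the empty string,
which corresponds to the empty library path mentioned above.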
reporting_params
Used to define the behaviour of reporting modules in the
Grid Engine qmaster. Changes to the reporting_params take
immediate effect. The following values are recognized:
accounting
If this parameter is set to true, the accounting file
is written. The accounting file is a prerequisite for
using the qacct command.
reporting
If this parameter is set to true, the reporting file is
written. The reporting file contains data that can be
used for monitoring and analysis, like job accounting,
job log, host load and consumables, queue status and
consumables and sharetree configuration and usage.
Attention: Depending on the size and load of the clus-
ter, the reporting file can become quite large. Only
activate the reporting file if you have a process run-
ning that will consume the reporting file! See report-
ing(5) for further information about format and con-
tents of the reporting file.
flush_time
Contents of the reporting file are buffered in the Grid
Engine qmaster and flushed at a fixed interval. This
interval can be configured with the flush_time parame-
ter. It is specified as a time value in the format
HH:MM:SS. Sensible values range from a few seconds to
one minute. Setting it too low may slow down the qmas-
ter. Setting it too high will make the qmaster consume
large amounts of memory for buffering data.
accounting_flush_time
Contents of the accounting file are buffered in the
Grid Engine qmaster and flushed at a fixed interval.
This interval can be configured with the
accounting_flush_time parameter. It is specified as a
time value in the format HH:MM:SS. Sensible values
range from a few seconds to one minute. Setting it too
low may slow down the qmaster. Setting it too high will
make the qmaster consume large amounts of memory for
buffering data. Setting it to 00:00:00 will disable
accounting data buffering; as soon as data is gen-
erated, it will be written to the accounting file. If
this parameter is not set, the accounting data flush
interval will default to the value of the flush_time
parameter.
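The relationship between flush_time and
accounting_flush_time can be sketched as follows. The
function names are hypothetical. The sketch only models the
rules stated above: an unset accounting_flush_time falls
back to flush_time, and 00:00:00 means that accounting data
is written without buffering.

```python
def hms_to_seconds(value):
    """Convert a time value in the format HH:MM:SS."""
    hours, minutes, seconds = (int(p) for p in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def accounting_flush_seconds(flush_time, accounting_flush_time=None):
    """Return the accounting flush interval in seconds.

    None stands for "parameter not set" in this sketch.
    A result of 0 means buffering is disabled.
    """
    if accounting_flush_time is None:
        # Not set: default to the value of flush_time.
        return hms_to_seconds(flush_time)
    return hms_to_seconds(accounting_flush_time)


print(accounting_flush_seconds("00:00:15"))              # 15
print(accounting_flush_seconds("00:00:15", "00:00:00"))  # 0
```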
joblog
If this parameter is set to true, the reporting file
will contain job logging information. See reporting(5)
for more information about job logging.
sharelog
The Grid Engine qmaster can dump information about
sharetree configuration and use to the reporting file.
The parameter sharelog sets an interval in which share-
tree information will be dumped. It is set in the for-
mat HH:MM:SS. A value of 00:00:00 configures qmaster
not to dump sharetree information. Intervals of several
minutes up to hours are sensible values for this param-
eter. See reporting(5) for further information about
sharelog.
finished_jobs
Grid Engine stores a certain number of just finished jobs to
provide post mortem status information. The finished_jobs
parameter defines the number of finished jobs being stored.
If this maximum number is reached, the oldest finished job
will be discarded for every new job being added to the fin-
ished job list.
Changing finished_jobs will take immediate effect. The
default for finished_jobs is 0.
This value is a global configuration parameter only. It can-
not be overwritten by the execution host local configura-
tion.
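The discard rule for the finished job list behaves like a
bounded queue. The sketch below uses a Python deque with a
maximum length to show the effect. The job numbers and the
limit of 3 are made up, and this is not how sge_qmaster(8)
stores the list internally.

```python
from collections import deque

# Hypothetical finished_jobs value of 3.
finished = deque(maxlen=3)

# Hypothetical job numbers finishing in this order.
for job_id in [101, 102, 103, 104]:
    # Once the list is full, appending drops the oldest entry.
    finished.append(job_id)

print(list(finished))  # [102, 103, 104]
```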
qlogin_daemon
This parameter specifies the executable that is to be
started on the server side of a qlogin(1) request. Usually
this is the fully qualified pathname of the system's telnet
daemon. If no value is given, a specialized Grid Engine com-
ponent is used.
Changing qlogin_daemon will take immediate effect. The
default for qlogin_daemon is none.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
qlogin_command
This is the command to be executed on the client side of a
qlogin(1) request. Usually this is the fully qualified
pathname of the system's telnet client program. If no value
is given, a specialized Grid Engine component is used. It is
automatically started with the target host and port number
as parameters.
Changing qlogin_command will take immediate effect. The
default for qlogin_command is none.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
rlogin_daemon
This parameter specifies the executable that is to be
started on the server side of a qrsh(1) request without a
command argument to be executed remotely. Usually this is
the fully qualified pathname of the system's rlogin daemon.
If no value is given, a specialized Grid Engine component is
used.
Changing rlogin_daemon will take immediate effect. The
default for rlogin_daemon is none.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
rlogin_command
This is the command to be executed on the client side of a
qrsh(1) request without a command argument to be executed
remotely. Usually this is the fully qualified pathname of
the system's rlogin client program. If no value is given, a
specialized Grid Engine component is used. The command is
automatically started with the target host and port number
as parameters, as required for telnet(1). The Grid Engine
rlogin client has been extended to accept and use the port
number argument. You can only use clients, such as ssh,
which also understand this syntax.
Changing rlogin_command will take immediate effect. The
default for rlogin_command is none.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
rsh_daemon
This parameter specifies the executable that is to be
started on the server side of a qrsh(1) request with a com-
mand argument to be executed remotely. Usually this is the
fully qualified pathname of the system's rsh daemon. If no
value is given, a specialized Grid Engine component is used.
Changing rsh_daemon will take immediate effect. The default
for rsh_daemon is none.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
rsh_command
This is the command to be executed on the client side of a
qrsh(1) request with a command argument to be executed
remotely. Usually this is the fully qualified pathname of
the system's rsh client program. If no value is given, a
specialized Grid Engine component is used. The command is
automatically started with the target host and port number
as parameters, as required for telnet(1), plus the command
with its arguments to be executed remotely. The Grid Engine
rsh client has been extended to accept and use the port
number argument. You can only use clients, such as ssh,
which also understand this syntax.
Changing rsh_command will take immediate effect. The default
for rsh_command is none.
The global configuration entry for this value may be
overwritten by the execution host local configuration.
delegated_file_staging
This flag must be set to "true" when the prolog and epilog
are ready for delegated file staging, so that the DRMAA
attribute 'drmaa_transfer_files' is supported. To establish
delegated file staging, use the variables beginning with
"$fs_..." in prolog and epilog to move the input, output and
error files from one host to the other. When this flag is
set to "false", no file staging is available for the DRMAA
interface. File staging is currently implemented only via
the DRMAA interface. When an error occurs while moving the
input, output and error files, return error code 100 so that
the error handling mechanism can handle the error correctly.
(See also FORBID_APPERROR).
reprioritize
This flag enables or disables the reprioritization of jobs
based on their ticket amount. The reprioritize_interval in
sched_conf(5) takes effect only if reprioritize is set to
true. To turn off job reprioritization, the reprioritize
flag must be set to false and the reprioritize_interval to 0
which is the default.
This value is a global configuration parameter only. It can-
not be overridden by the execution host local configuration.
SEE ALSO
sge_intro(1), csh(1), qconf(1), qsub(1), rsh(1), sh(1),
getpwnam(3), drmaa_attributes(3), queue_conf(5),
sched_conf(5), sge_execd(8), sge_qmaster(8),
sge_shepherd(8), cron(8), Grid Engine Installation and
Administration Guide.
COPYRIGHT
See sge_intro(1) for a full statement of rights and permis-
sions.