This document describes a project that extends N1 Grid Engine with Advance Reservation capabilities. In the specification document "Resource Reservation and Backfilling" Advance Reservation (AR) was defined as:
A reservation (possibly independent of a particular job) that can be requested by a user or administrator and gets created by the scheduler. The reservation causes the associated resources to be blocked for other jobs.
The GRAAP-WG Advance Reservation definition is:
An advance reservation is a possibly limited or restricted delegation of a particular resource capability over a defined time interval, obtained by the requester from the resource owner through a negotiation process.
A possibly better way to explain what AR will add to Grid Engine is the analogy of a flight reservation system: with the Grid Engine 6.0 Resource Reservation (RR) capabilities an administrator can guarantee that passengers will get their flights in the order they arrive at the airport. This is sufficient for last-minute travelers, yet it does not allow adequate travel planning. The purpose of AR is to fill this gap, so that administrators can arrange a passenger's travel in advance based on an allocation schema that is considered by the Grid Engine scheduler.
End users are allowed to create, delete, show and use an AR.
The GUI will provide the same functionality as the CLI interface
JGDI will provide the same functionality as the CLI interface
The clients are used by end users for requesting, deleting, showing and using an AR. The ability to modify an AR is also desired.
The new clients are:
command | description |
---|---|
qrsub | create a new AR |
qralter (desired) | modify AR |
qrdel | delete an AR |
qrstat | view status of ARs |
Enhanced clients:
command | description |
---|---|
qsub | submit a job |
qstat | show the status of jobs and queues |
qmon | submit/delete/show AR |
switch/argument | description |
---|---|
-help | print this help |
-a date_time | start time in [[CC]YY]MMDDhhmm[.SS] |
-e date_time | end time in [[CC]YY]MMDDhhmm[.SS] |
-d time | duration in TIME format |
-w e/v | validate availability of AR request, default e |
-N name | AR name |
-A account_string | AR name in accounting record |
-l resource_list | request the given resources |
-u wc_user | access list |
-q wc_queue_list | reserve in queue(s) |
-now | reserve in queues with qtype interactive |
-pe pe_name slot_range | reserve slot range for parallel jobs |
-ckpt ckpt-name | reserve in queue with ckpt method |
-m b/e/a/n | define mail notification events |
-M user[@host],... | notify these e-mail addresses |
-he yes/no | hard error handling |
-alloc allocation_rule slot_range (desired) | reserve more than one slot |
Most of the options are already specified for qsub and defined in submit(1).
Additional switches are:
Specifies the end time for the Advance Reservation in [[CC]YY]MMDDhhmm[.SS] format (see the -a option). This switch is optional if both the start time (-a option) and the duration (-d option) are given.
Specifies the duration of the Advance Reservation in TIME format. Refer to queue_conf(5) for a format description. This switch is optional if both the start time (-a option) and the end time (-e option) are given.
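The relationship between the -a, -e and -d values can be sketched in Python as follows. This is only an illustration, not Grid Engine code: the function names are hypothetical, the rule for two-digit years and the use of the current year when the year is omitted are assumptions, and only the h:m:s form of the TIME format is handled.

```python
from datetime import datetime, timedelta


def parse_date_time(text, now=None):
    """Parse a [[CC]YY]MMDDhhmm[.SS] string into a datetime."""
    now = now or datetime.now()
    main, _, secs = text.partition(".")
    if not main.isdigit() or len(main) not in (8, 10, 12):
        raise ValueError(f"bad date_time: {text!r}")
    if len(main) == 12:        # CCYYMMDDhhmm
        year = int(main[:4])
    elif len(main) == 10:      # YYMMDDhhmm
        yy = int(main[:2])
        # Assumption: two-digit years below 70 mean 20YY, otherwise 19YY.
        year = (2000 if yy < 70 else 1900) + yy
    else:                      # MMDDhhmm; assumption: use the current year
        year = now.year
    first = len(main) - 8      # index where MMDDhhmm begins
    month, day, hour, minute = (int(main[i:i + 2]) for i in range(first, len(main), 2))
    return datetime(year, month, day, hour, minute, int(secs) if secs else 0)


def parse_duration(text):
    """Parse an h:m:s duration such as '1:0:0' (assumed subset of TIME)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def resolve_window(start, end=None, duration=None):
    """Return (start, end, duration), deriving whichever of end/duration is missing."""
    if end is None and duration is None:
        raise ValueError("need an end time or a duration")
    if end is None:
        end = start + duration
    elif duration is None:
        duration = end - start
    elif start + duration != end:
        raise ValueError("end time and duration disagree")
    if duration <= timedelta(0):
        raise ValueError("reservation must end after it starts")
    return start, end, duration
```

For example, `resolve_window(parse_date_time("12181000"), duration=parse_duration("24:0:10"))` derives the end time from the start time and the duration.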
Specifies the access list for the new Advance Reservation. Only users defined in this list are allowed to request the AR handle for their jobs. By default only the user who requested the AR has access. An access list name is distinguished from a user name by an '@' prefix.
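A minimal sketch of how such a -u argument could be interpreted, assuming a comma-separated value. The helper names and the `memberships` mapping (access list name to set of users) are hypothetical and only serve the illustration.

```python
def split_access_list(wc_user):
    """Split a -u argument into plain user names and '@'-prefixed access lists."""
    users, access_lists = [], []
    for entry in wc_user.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if entry.startswith("@"):
            access_lists.append(entry[1:])   # strip the '@' marker
        else:
            users.append(entry)
    return users, access_lists


def may_use_ar(user, owner, wc_user=None, memberships=None):
    """Decide whether `user` may request the AR handle for a job."""
    if not wc_user:
        # Default from the text above: only the requester has access.
        return user == owner
    users, access_lists = split_access_list(wc_user)
    memberships = memberships or {}
    return user in users or any(user in memberships.get(name, set())
                                for name in access_lists)
```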
Defines or redefines under which circumstances mail is to be sent to the AR owner or to the users defined with the -M option described below. The option arguments have the following meaning:
flag | description |
---|---|
'b' | Mail is sent at the beginning of the AR |
'e' | Mail is sent at the end of the AR |
'a' | Mail is sent when the AR goes into error state or becomes valid again |
'n' | No mail is sent |
Specifies the behaviour when the AR goes into error state. A hard error means as long as the AR is in error state no jobs using the reservation will be scheduled. If soft error is specified the reservation stays usable with the remaining resources.
By default the soft error handling is used.
It's desired to implement IZ 285 that describes a switch to define the allocation rule and how many slots are requested at submission time. The multiplication will also happen but the slots are now separated from the other resources.
Examples:
* Request 2 slots per host, on 2 quad CPU hosts, with two compiler licenses
    -alloc 2 4 -l license=1/2
    -alloc $fill_up 10
Reserve a slot in queue all.q on host1 or host2 or host3
    qrsub -q all.q -l "h=host1|host2|host3" -u $user -a 01121200 -d 1:0:0
    qrsub -q "*@host1,*@host2,*@host3" -u $user -a 01121200 -d 1:0:0
Reserve 4 slots on a host with arch=sol-sparc64
    qrsub -pe alloc_pe_slots 4 -l arch=sol-sparc64 -u $user -a 01121200 -d 1:0:0
Whether qralter will be implemented has not been decided yet.
switch/argument | description |
---|---|
-help | print this help |
-f | force action (jobs referring to AR will be deleted) |
ar_list | delete all ARs given in list |
Jobs referring to an Advance Reservation tagged for deletion will also be removed. The reservation itself is removed only after all jobs referring to it have been removed from the N1 Grid Engine database.
Command Line Switches:
switch/argument | description |
---|---|
-help | print this help |
-ar ar_id | show scheduler advance reservation information |
-u user_list | view only ARs requested by this user |
-explain | explain error |
-xml | print output in XML format |
The string $user is a placeholder for the current user name. An asterisk "*" can be used as a user name wildcard to request that the ARs of all users be displayed.
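The selection by user can be sketched as follows. The record layout (a dict with a `user` key) and the function name are assumptions made for this illustration; shell-style matching via Python's `fnmatch` module stands in for whatever wildcard matching qrstat would actually use.

```python
from fnmatch import fnmatchcase


def filter_ars(ars, user_list, current_user):
    """Keep the ARs whose owner matches one of the entries in user_list."""
    # "$user" is replaced by the name of the calling user.
    patterns = [current_user if p == "$user" else p for p in user_list]
    return [ar for ar in ars
            if any(fnmatchcase(ar["user"], p) for p in patterns)]
```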
Possible reasons are:
The output format for the error reasons is one line per reason.

    % qrstat
    AR-ID   name       user   state start at            end at              duration
    ---------------------------------------------------------------------------------------
        192 project_xy user1  r     12/14/2006 14:47:23 12/14/2006 14:57:33 0:10:10
        193            user2  w     12/18/2006 10:00:00 12/19/2006 10:00:10 24:0:10
    % qrstat -ar 193
    ==============================================================
    ar_number:        193
    submission_time:  Mon Nov 27 17:11:34 2006
    owner:            user1
    acl_list:         user1,user2
    start:            Mon Dec 18 10:00:00 2006
    end:              Tue Dec 19 10:00:10 2006
    duration:         24:0:10
    slots:            host1=2,host2=1
    complex_values:   host1=myapp=2,host2=myapp=1
    ar_name:          ...

switch/argument | description |
---|---|
-ar | request AR with the ar_id |
ar_id | positive integer |
Modifying the AR-ID with qalter will be denied if the job is already running.
switch/argument | description |
---|---|
-j | show scheduler job information |
The following example illustrates the output. In the example, 20 slots are reserved on host brag and one job is running in the AR.

    % qstat
    queuename      qtype resv/used/tot. load_avg arch       states
    ---------------------------------------------------------------------------------
    all.q@brag     BIPC  20/1/20        0.02     darwin-x86
         16 0.55500 Sleeper    rd141302     r     11/28/2006 11:48:26     1

The "Submit Jobs" and "Job Control" masks need to be extended with the AR-ID.
Qmon will get two new masks:
Changing max_advance_reservations will take immediate effect.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
The Advance Reservation Object represents the requested AR for the N1 Grid Engine System. The request is stored in the internal Qmaster database like all other objects (for example, usersets and complexes).
At submit time, some basic values are validated first.
To guarantee that all jobs are removed from the cluster when the AR end time is reached, the DURATION_OFFSET must also be taken into account for Advance Reservations. This means that all jobs submitted to an AR will have a resulting runtime limit of AR duration - DURATION_OFFSET. Jobs requesting a longer runtime will not be scheduled. The AR requester needs to keep this in mind when creating a new AR.
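The rule can be written down as a small Python sketch. The function names are hypothetical, and treating a job without a requested runtime as inheriting the AR-derived limit is this sketch's reading of the paragraph above.

```python
from datetime import timedelta


def ar_runtime_limit(ar_duration, duration_offset):
    """Resulting runtime limit for jobs in an AR: AR duration - DURATION_OFFSET."""
    limit = ar_duration - duration_offset
    if limit <= timedelta(0):
        raise ValueError("AR is shorter than DURATION_OFFSET")
    return limit


def job_can_be_scheduled(requested_h_rt, ar_duration, duration_offset):
    """A job fits if it requests no runtime or at most the resulting limit."""
    limit = ar_runtime_limit(ar_duration, duration_offset)
    # None means no h_rt was requested; the job then simply gets `limit`.
    return requested_h_rt is None or requested_h_rt <= limit
```

With a one-hour AR and a DURATION_OFFSET of 60 seconds, a job requesting a full hour of runtime would not be scheduled, while a job requesting 59 minutes would.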
It is necessary to restrict which users are allowed to create an AR. This is done with a user list called 'arusers'. Only managers or users contained in the 'arusers' user list are allowed to create ARs.
The 'arusers' user list will be created at installation time and the SGE admin user will be added to the list.
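Expressed as code, the permission check is a simple membership test. The function name and the way the manager list and the 'arusers' list are passed in are assumptions of this sketch.

```python
def may_create_ar(user, managers, arusers):
    """Only managers or members of the 'arusers' user list may create ARs."""
    return user in managers or user in arusers
```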
The resource selection will be the same as for a regular sequential or parallel job. First the best suited hosts are selected, then the desired amount of resources is reserved.
For an Advance Reservation the following restrictions and enhancements for selecting the resources are necessary:
For example:

    % qconf -sq all.q | grep xuser_list
    denied_users
    % qconf -su denied_users | grep entries
    entries user3
    % qrsub -u user1,user2,user3 -q all.q -a ....
    Error: No queue instances to reserve

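The example suggests that a queue qualifies for an AR only if none of the users in the AR access list is denied by the queue's xuser_list. A sketch of that reading follows; the data layout (queue name mapped to its xuser_list names, user list name mapped to its entries) and the function name are hypothetical.

```python
def reservable_queues(ar_users, queue_xuser_lists, user_lists):
    """Return the queues in which no AR user is excluded via xuser_list."""
    result = []
    for queue, xuser_lists in queue_xuser_lists.items():
        denied = set()
        for name in xuser_lists:
            denied |= user_lists.get(name, set())
        # Assumption: one denied user is enough to rule the queue out.
        if not denied.intersection(ar_users):
            result.append(queue)
    return result
```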
Conflicts can occur if jobs do not finish by the end of their scheduled time slot. This is the case for jobs without a run time limit (h_rt); jobs that requested h_rt are deleted automatically by Grid Engine when they exceed the requested time.
The solution is to keep jobs without a run time limit separated from AR jobs. Because every job has an implicit slot count that refers to a queue instance, this can be done by not reserving an AR on hosts where jobs without a run time limit are running and, at the same time, by not scheduling such a job on a queue instance with an AR.
The following example illustrates the behavior:

    % qstat -f
    queuename      qtype used/tot. load_avg arch       states
    ----------------------------------------------------------------------------
    all.q@host1    BIPC  1/1       0.01     sol-amd64
       3001 0.55500 Sleeper    rd141302     r     12/14/2006 10:20:47     1
    ----------------------------------------------------------------------------
    all.q@host2    BIPC  0/1       0.09     sol-amd64
    % qrsub -a 12141200 -d 0:30:0 -u user1 -h host1
    denied: Reservation can't be granted
    % qrsub -a 12141200 -d 0:30:0 -u user1 -h host2
    Your reservation 1 has been granted
    % qsub -w e -l h=host2 job_script
    Unable to run job: error: no suitable queues.
    Exiting.

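The two checks behind this behavior can be sketched as follows. Jobs are modeled as dicts with an optional `h_rt` entry, and both function names are hypothetical; the sketch covers only jobs that do not request the AR themselves.

```python
def can_reserve_on(running_jobs):
    """An AR may be placed only where every running job has a run time limit."""
    return all(job.get("h_rt") is not None for job in running_jobs)


def can_schedule_on(job, queue_instance_has_ar):
    """A job without h_rt must stay off queue instances that carry an AR."""
    return job.get("h_rt") is not None or not queue_instance_has_ar
```

In the example above, host1 runs job 3001 without a run time limit, so the reservation is denied there; after the AR is granted on host2, a job without h_rt can no longer be scheduled on host2.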
RQS reservation and resource usage can be ignored in conjunction with AR because quotas do not represent the capacity of a resource. The capacity is defined at the global, queue, or host level and must of course be honored for AR requests. Ignoring RQS therefore does not lead to a resource overload; only the quota share for a particular request may be exceeded while the AR time window is open.
The following examples illustrate the behavior:
An already granted AR could become invalid for several reasons, such as configuration changes on reserved hosts, user set changes, or queue changes. Changes of this kind are foreseeable and must be rejected. A list of foreseeable events is:
The cases that are not foreseeable are the outage of a host or a queue instance error. In such cases the affected AR goes into error state. If error mail sending is enabled, the AR requester and all e-mail addresses specified with the -M option will receive a mail when the AR becomes invalid (at the earliest at AR start time, when the error is detected) and when the AR is satisfied again.
The error handling can be specified at AR submit time. A hard error blocks the faulty AR, and the scheduler does not dispatch the jobs referring to it. With a soft error the AR stays usable with the remaining resources.
When the AR end time is reached, first all jobs referring to the AR are deleted and then the AR itself is deleted. No jobs can request the AR handle any longer.
State | Description |
---|---|
w | waiting - granted but start time not reached |
r | running - start time reached |
x | exited - end time reached and doing cleanup |
d | deleted - manual deletion |
e | error - AR became invalid |
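The states and the hard/soft error handling can be combined into one dispatch decision, sketched below. The enum, the function name, and the explicit `start_reached` flag are assumptions of this sketch and are not taken from the Grid Engine sources.

```python
from enum import Enum


class ARState(Enum):
    WAITING = "w"   # granted but start time not reached
    RUNNING = "r"   # start time reached
    EXITED = "x"    # end time reached and doing cleanup
    DELETED = "d"   # manual deletion
    ERROR = "e"     # AR became invalid


def may_dispatch_ar_job(state, hard_error, start_reached=True):
    """Decide whether jobs referring to the AR may currently be dispatched."""
    if state is ARState.RUNNING:
        return True
    if state is ARState.ERROR:
        # Hard error blocks the AR; soft error keeps the remaining
        # resources usable (assumed only once the start time is reached).
        return start_reached and not hard_error
    return False
```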
This request adds the advance reservations in the specified list. The list elements are fully specified advance reservations. The request is used for implementing the qrsub command.
This request deletes all advance reservations in the specified list. The list elements need only specify the name or ID of the advance reservation to be removed. The request is used for implementing the qrdel command.
This event is sent each time when a new advance reservation has been created. It contains the full advance reservation object, but no usage information.
This event is sent each time when an existing advance reservation is removed. The event contains only the name of the advance reservation to be removed.
This event is sent each time when an existing advance reservation has changed. It contains the full advance reservation object.
TBD
The AR Job Object is not a new component; it is an enhancement of the regular N1GE Job Object to deal with Advance Reservations. All previous 6.1 job properties remain valid and work for both AR and non-AR jobs.
At job submit time some additional verifications are done. If one of the verifications fails the job will be rejected.
The verifications are:
Jobs requesting an AR will only be scheduled if the AR start time is already reached, which means the AR is active and in state running.
Additionally, it must be ensured that the resources requested by the job are all reserved and free. If other jobs are running in the AR and the resources are already in use, the job will not be dispatched.
To be sure that jobs requesting an AR cannot use more resources than reserved by the AR, it is necessary to debit the AR job usage. This is done before the job starts. At job end the usage needs to be undebited.
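The debit and undebit bookkeeping can be sketched with a small class. The class name and the representation of resources as a name-to-amount mapping are hypothetical and chosen only for this illustration.

```python
class ARUsage:
    """Tracks how much of an AR's reserved resources is currently in use."""

    def __init__(self, reserved):
        self.reserved = dict(reserved)
        self.used = {name: 0 for name in self.reserved}

    def fits(self, request):
        """True if every requested resource is reserved and still free."""
        return all(self.used.get(name, 0) + amount <= self.reserved.get(name, 0)
                   for name, amount in request.items())

    def debit(self, request):
        """Book the job's resources before the job starts."""
        if not self.fits(request):
            raise ValueError("request exceeds the free part of the reservation")
        for name, amount in request.items():
            self.used[name] = self.used.get(name, 0) + amount

    def undebit(self, request):
        """Release the job's resources at job end."""
        for name, amount in request.items():
            self.used[name] = self.used.get(name, 0) - amount
```

A job is dispatched only if `fits` succeeds; `debit` is then called before the job starts and `undebit` when it ends.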
All jobs (pending or running) will be deleted when the AR they refer to ends or is deleted.
TBD
After an AR ends, it is necessary to detect whether a resource was reserved but not used for the complete time, only partially used, or not used at all. Thus billing capabilities need to be added by enhancing the current accounting and reporting files. In addition, dbwriter and arco need to be enhanced.
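One possible way to classify the usage of a finished AR is to compare reserved and used slot-seconds, as in the sketch below. The function name, the tuple layout of the job runs, and the three status strings are assumptions of this sketch; times are plain seconds.

```python
def ar_utilization(ar_start, ar_end, reserved_slots, job_runs):
    """Return (reserved, used, status) in slot-seconds for a finished AR.

    job_runs is an iterable of (start, end, slots) tuples.
    """
    reserved = (ar_end - ar_start) * reserved_slots
    used = 0
    for start, end, slots in job_runs:
        # Count only the part of the run that lies inside the AR window.
        overlap = min(end, ar_end) - max(start, ar_start)
        if overlap > 0:
            used += overlap * slots
    if used == 0:
        status = "unused"
    elif used < reserved:
        status = "partially used"
    else:
        status = "fully used"
    return reserved, used, status
```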
The Accounting Component is part of the qmaster.
To get an overview of which finished jobs were running in a previous AR, the job entry in the accounting file needs an additional field for the AR-ID.
TBD