Functional Specification Document for 6.2 Advance Reservation

Roland Dittel, Andreas Haas
12 January 2007
Work in progress

1 Introduction

This document describes a project that extends N1 Grid Engine with Advance Reservation capabilities. In the specification document "Resource Reservation and Backfilling" Advance Reservation (AR) was defined as:

        A reservation (possibly independent of a particular job) that can 
        be requested by a user or administrator and gets created by the 
        scheduler. The reservation causes the associated resources to be
        blocked for other jobs.

The GRAAP-WG Advance Reservation definition is:

        An advance reservation is a possibly limited or restricted delegation
        of a particular resource capability over a defined time interval,
        obtained by the requester from the resource owner through a negotiation
        process.

A possibly better way to explain what AR adds to Grid Engine is the analogy of a flight reservation system: with the Resource Reservation (RR) capabilities of Grid Engine 6.0, an administrator can guarantee that passengers get their flights in the order in which they arrive at the airport. This is sufficient for last-minute travelers, but it does not allow adequate travel planning. The purpose of AR is to fill this gap, so that administrators can arrange a passenger's travel in advance based on an allocation scheme that is considered by the Grid Engine scheduler.

2 Project Overview

2.1 Project Aim

The aim is to enhance Grid Engine with Advance Reservation capabilities. The cornerstones of the enhancement are:

4 Functional Definition

4.3 Diagnostics

4.4 User Experience

4.4.1 Command Line (CLI)

End users are allowed to create, delete, show and use an AR.

4.4.2 Graphical User Interface (GUI)

The GUI will provide the same functionality as the CLI.

4.4.3 JGDI (API)

JGDI will provide the same functionality as the CLI.

5.1 Component Clients

5.1.1 Overview

The clients are used by end users to request, delete, show, and use an AR. Modifying an AR is also desired.

The new clients are:

command            description
qrsub              create a new AR
qralter (desired)  modify AR
qrdel              delete an AR
qrstat             view status of ARs

Enhanced clients:

command  description
qsub     submit a job
qstat    show the status of jobs and queues
qmon     submit/delete/show AR

5.1.2 Functionality

5.1.3 Interfaces

5.1.3.1 qrsub - submit an advance reservation to N1 Grid Engine

5.1.3.1.1 Overview
Command Line Switches:
switch/argument                    description
-help                              print this help
-a date_time                       start time in [[CC]YY]MMDDhhmm[.SS]
-e date_time                       end time in [[CC]YY]MMDDhhmm[.SS]
-d time                            duration in TIME format
-w e/v                             validate availability of AR request, default e
-N name                            AR name
-A account_string                  account string in accounting record
-l resource_list                   request the given resources
-u wc_user                         access list
-q wc_queue_list                   reserve in queue(s)
-now                               reserve in queues with qtype interactive
-pe pe_name slot_range             reserve slot range for parallel jobs
-ckpt ckpt-name                    reserve in queue with ckpt method
-m b/e/a/n                         define mail notification events
-M user[@host],...                 notify these e-mail addresses
-he yes/no                         hard error handling
-alloc allocation_rule slot_range  (desired) reserve more than one slot

5.1.3.1.2 Description
Qrsub submits an Advance Reservation to the N1 Grid Engine queuing system.

Most of the options are already specified for qsub and are described in submit(1).

Additional switches are:

-e date_time
Available for qrsub only.

Specifies the end time for the Advance Reservation in [[CC]YY]MMDDhhmm[.SS] format (see -a option). This switch is optional if both the start time (-a option) and the duration (-d option) are specified.

-d time
Available for qrsub only.

Specifies the duration of the Advance Reservation in TIME format. Refer to queue_conf(5) for a format description. This switch is optional if both the start time (-a option) and the end time (-e option) are specified.
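The following Python sketch shows one way the [[CC]YY]MMDDhhmm[.SS] values of -a and -e could be parsed. The helper name parse_date_time is hypothetical, and two details are assumptions not defined in this specification: an omitted year is taken from a caller-supplied reference year, and a two-digit YY is read as 20YY.

```python
from datetime import datetime


def parse_date_time(value: str, reference_year: int) -> datetime:
    """Parse a [[CC]YY]MMDDhhmm[.SS] string (illustrative sketch only)."""
    main, dot, seconds = value.partition(".")
    if not main.isdigit() or len(main) not in (8, 10, 12):
        raise ValueError(f"invalid date_time: {value!r}")
    if dot and (not seconds.isdigit() or len(seconds) != 2):
        raise ValueError(f"invalid seconds field: {value!r}")

    # The last eight digits are always MMDDhhmm; anything before is the year.
    year_part, rest = main[:-8], main[-8:]
    if len(year_part) == 4:
        year = int(year_part)              # CCYY given explicitly
    elif len(year_part) == 2:
        year = 2000 + int(year_part)       # assumption: YY means 20YY
    else:
        year = reference_year              # assumption: year omitted
    month, day = int(rest[0:2]), int(rest[2:4])
    hour, minute = int(rest[4:6]), int(rest[6:8])
    # datetime() itself rejects out-of-range fields such as month 13.
    return datetime(year, month, day, hour, minute, int(seconds) if dot else 0)
```

With a reference year of 2007, the argument -a 01121200 from the qrsub examples in this section is read as 12 January 2007, 12:00:00.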

-u wc_user
Behavior for qrsub:

Specifies the access list for the new Advance Reservation. Only users defined in this list are allowed to request the AR handle for their jobs. By default, only the user who requested the AR has access. An access list name is distinguished from a user name by prefixing it with an '@' sign.
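A minimal sketch of how a -u argument could be split into user names and '@'-prefixed access list names, and how a job owner could then be checked against it. AccessList, parse_access_list, has_access and the members mapping are hypothetical names used for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class AccessList:
    users: set = field(default_factory=set)      # plain user names
    acl_names: set = field(default_factory=set)  # names given with a '@' prefix


def parse_access_list(arg: str, requester: str) -> AccessList:
    """Split a -u argument such as 'user1,@staff' (illustrative sketch)."""
    result = AccessList()
    entries = [e.strip() for e in arg.split(",") if e.strip()]
    if not entries:
        # Default from this specification: only the requester has access.
        result.users.add(requester)
        return result
    for entry in entries:
        if entry.startswith("@"):
            result.acl_names.add(entry[1:])
        else:
            result.users.add(entry)
    return result


def has_access(user: str, members: dict, access: AccessList) -> bool:
    """members maps an access list name to its user names (hypothetical input)."""
    if user in access.users:
        return True
    return any(user in members.get(name, set()) for name in access.acl_names)
```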

-m b/e/a/n
Defines or redefines under which circumstances mail is sent to the AR owner or to the users defined with the -M option. The option arguments have the following meaning:

flag  description
'b'   Mail is sent at the beginning of the AR
'e'   Mail is sent at the end of the AR
'a'   Mail is sent when the AR goes into error state or becomes valid again
'n'   No mail is sent
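A sketch of parsing the -m argument and mapping the flags to AR events. The event names, the acceptance of comma-separated flags, and the rule that 'n' cannot be combined with other flags are assumptions of this sketch, not behavior fixed by the specification.

```python
MAIL_FLAGS = {"b", "e", "a", "n"}

# Which flag enables mail for which AR event (event names are hypothetical).
EVENT_FLAG = {"start": "b", "end": "e", "error": "a", "valid_again": "a"}


def parse_mail_option(arg: str) -> set:
    """Parse a -m argument such as 'be' or 'b,e' (illustrative sketch)."""
    flags = set(arg.replace(",", ""))
    unknown = flags - MAIL_FLAGS
    if not flags or unknown:
        raise ValueError(f"invalid -m argument: {arg!r}")
    if "n" in flags and len(flags) > 1:
        # Assumption: 'n' (no mail) cannot be combined with other flags.
        raise ValueError("'n' cannot be combined with other mail flags")
    return flags


def should_send_mail(flags: set, event: str) -> bool:
    """True if the parsed flags request mail for the given AR event."""
    return EVENT_FLAG[event] in flags
```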

-he yes/no
Specifies the behavior when the AR goes into error state. With hard error handling, no jobs using the reservation are scheduled as long as the AR is in error state. With soft error handling, the reservation stays usable with the remaining resources.

By default, soft error handling is used.
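The difference between hard and soft error handling can be expressed as a function that returns the reserved slots still usable by AR jobs. This sketch assumes that -he yes selects hard and -he no selects soft handling, and models the reservation as the per-host slot map shown by qrstat -ar (for example host1=2,host2=1); all names are hypothetical.

```python
from enum import Enum


class ErrorHandling(Enum):
    HARD = "yes"   # -he yes
    SOFT = "no"    # -he no (the default)


def usable_slots(reserved: dict, failed_hosts: set, mode: ErrorHandling) -> dict:
    """Reserved slots per host that AR jobs may still be dispatched to (sketch)."""
    if not failed_hosts.intersection(reserved):
        return dict(reserved)          # AR not affected: everything usable
    if mode is ErrorHandling.HARD:
        return {}                      # hard error: no AR job is scheduled
    # Soft error: keep only the resources on hosts that are still available.
    return {h: n for h, n in reserved.items() if h not in failed_hosts}
```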

-alloc allocation_rule slot_range
It is desired to implement IZ 285, which describes a switch to define the allocation rule and the number of slots requested at submission time. The multiplication of resource requests by the slot count still takes place, but the slots are now specified separately from the other resources.

Examples:

* Request 2 slots per host, on 2 quad CPU hosts, with two compiler licenses

    -alloc 2 4 -l license=1/2
* Request 10 slots, as many as possible on one host
    -alloc $fill_up 10
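The two examples can be illustrated with a sketch that distributes a slot count over hosts, following the allocation rule semantics known from parallel environments (sge_pe(5)): a fixed integer means exactly that many slots per selected host, and $fill_up fills each host before moving to the next. The function name and the free-slot input are hypothetical; other rules such as $round_robin are not covered, and per-slot resource multiplication is left out.

```python
from typing import Dict, Optional


def distribute_slots(rule: str, slots: int,
                     free: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Spread `slots` over hosts by allocation rule (illustrative sketch).

    `free` maps host -> free slots, in the order hosts should be tried.
    Returns host -> slots, or None if the request cannot be satisfied.
    """
    result: Dict[str, int] = {}
    remaining = slots
    if rule == "$fill_up":
        # Take as many slots as possible from each host before moving on.
        for host, available in free.items():
            if remaining == 0:
                break
            take = min(available, remaining)
            if take > 0:
                result[host] = take
                remaining -= take
    else:
        per_host = int(rule)  # fixed number of slots on every selected host
        if per_host <= 0 or slots % per_host != 0:
            return None
        for host, available in free.items():
            if remaining == 0:
                break
            if available >= per_host:
                result[host] = per_host
                remaining -= per_host
    return result if remaining == 0 else None
```

On three quad CPU hosts, "-alloc 2 4" selects two hosts with two slots each, while "-alloc $fill_up 10" uses 4, 4 and 2 slots.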

5.1.3.1.3 Examples

Reserve a slot in queue all.q on host1, host2, or host3

qrsub -q all.q -l "h=host1|host2|host3" -u $user -a 01121200 -d 1:0:0
qrsub -q "*@host1,*@host2,*@host3" -u $user -a 01121200 -d 1:0:0

Reserve 4 slots on a host with arch=sol-sparc64

qrsub -pe alloc_pe_slots 4 -l arch=sol-sparc64 -u $user -a 01121200 -d 1:0:0

5.1.3.2 qralter

It has not yet been decided whether qralter will be implemented.

5.1.3.3 qrdel

5.1.3.3.1 Overview
Command Line Switches:
switch/argument  description
-help            print this help
-f               force action (jobs referring to AR will be deleted)
ar_list          delete all ARs given in list

5.1.3.3.2 Description
Qrdel provides a means for an operator, a manager, or a user referenced in the ar_users access list to delete one or more Advance Reservations. The AR identifiers can be either AR-IDs or AR names. Qrdel deletes ARs in the order in which their identifiers are presented.

Jobs referring to an Advance Reservation tagged for deletion are also removed. The reservation itself is removed only after all jobs referring to it have been removed from the N1 Grid Engine database.

-help
Prints a listing of all options.

-f
Forces action for ARs that have jobs referring to them. The jobs are deleted from the list of jobs registered at sge_qmaster(8) even if the sge_execd(8) controlling them does not respond to the delete request by sge_qmaster(8).

ar_list
A list of ARs to be deleted.
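The deletion semantics described above (identifiers resolved as AR-IDs or names, processed in order, and an AR removed only after its last job is gone) are sketched below. The AR class and the helper functions are hypothetical, and treating an all-digit identifier as an AR-ID is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class AR:
    ar_id: int
    name: str
    jobs: set = field(default_factory=set)   # IDs of jobs referring to the AR
    tagged_for_deletion: bool = False


def resolve(ident: str, ars: Dict[int, AR]) -> Optional[AR]:
    """An identifier is an AR-ID if it is all digits, otherwise an AR name."""
    if ident.isdigit():
        return ars.get(int(ident))
    return next((ar for ar in ars.values() if ident and ar.name == ident), None)


def qrdel(idents: List[str], ars: Dict[int, AR]) -> Tuple[List[int], List[str]]:
    """Tag ARs for deletion in the given order (illustrative sketch).

    Returns (IDs removed immediately, identifiers that matched no AR).
    """
    removed, unknown = [], []
    for ident in idents:
        ar = resolve(ident, ars)
        if ar is None:
            unknown.append(ident)
            continue
        ar.tagged_for_deletion = True
        # Jobs referring to the AR would be deleted here; the AR itself
        # is only removed once no job refers to it any longer.
        if not ar.jobs:
            del ars[ar.ar_id]
            removed.append(ar.ar_id)
    return removed, unknown


def job_removed(job_id: int, ar_id: int, ars: Dict[int, AR]) -> bool:
    """Call when a job leaves the database; True if its AR was removed too."""
    ar = ars.get(ar_id)
    if ar is None:
        return False
    ar.jobs.discard(job_id)
    if ar.tagged_for_deletion and not ar.jobs:
        del ars[ar_id]
        return True
    return False
```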

5.1.3.4 qrstat

5.1.3.4.1 Overview

Command Line Switches:

switch/argument  description
-help            print this help
-ar ar_list      show advance reservation information
-u user_list     view only ARs requested by these users
-explain         explain AR error states
-xml             print output in XML format

5.1.3.4.2 Description
Qrstat shows the current status of the granted N1 Grid Engine Advance Reservations. Selection options allow you to get information about specific ARs or users. Without any options qrstat will display an overview of all reservations.

-help
Prints a list of all options.

-ar ar_list
Displays various information for all ARs contained in the ar_list. The ar_list can contain AR-IDs, AR names, or patterns.

-u user_list
Displays information only for those ARs requested by one of the users from the given list.

The string $user is a placeholder for the current user name. An asterisk "*" can be used as a user name wildcard to display the ARs of all users.
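The -u and -ar selections could be implemented as shown in this sketch. Using shell-style wildcard matching (Python's fnmatch.fnmatchcase) for the patterns is an assumption; the AR records are plain dictionaries with the hypothetical keys id, name and owner.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def select_by_users(ars: List[Dict], user_list: List[str],
                    current_user: str) -> List[Dict]:
    """qrstat -u filter: '$user' means the caller, '*' matches any user."""
    patterns = [current_user if u == "$user" else u for u in user_list]
    return [ar for ar in ars
            if any(fnmatchcase(ar["owner"], p) for p in patterns)]


def matches_ar_list(ar: Dict, ar_list: List[str]) -> bool:
    """qrstat -ar filter: each entry may be an AR-ID, an AR name or a pattern."""
    return any(fnmatchcase(str(ar["id"]), p) or fnmatchcase(ar["name"], p)
               for p in ar_list)
```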

-explain
Displays the reason for an Advance Reservation error state.

Possible reasons are:

The output format for the error reasons is one line per reason.

-xml
This option can be used with all other options and changes the output to XML. The schemas used are referenced in the XML output. The output is printed to stdout.

5.1.3.4.3 Examples

% qrstat
AR-ID   name       user         state start at            end at              duration
---------------------------------------------------------------------------------------
    192 project_xy user1        r     12/14/2006 14:47:23 12/14/2006 14:57:33 0:10:10
    193            user2        w     12/18/2006 10:00:00 12/19/2006 10:00:10 24:0:10

% qrstat -ar 193
==============================================================
ar_number:                  193
submission_time:            Mon Nov 27 17:11:34 2006
owner:                      user2
acl_list:                   user1,user2
start:                      Mon Dec 18 10:00:00 2006
end:                        Tue Dec 19 10:00:10 2006
duration:                   24:0:10
slots:                      host1=2,host2=1
complex_values:             host1=myapp=2,host2=myapp=1
ar_name:                    
...
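The duration column appears to use unpadded hours:minutes:seconds, with hours allowed to exceed 24 (24:0:10). A small sketch of that formatting, with a hypothetical function name:

```python
from datetime import datetime


def format_duration(start: datetime, end: datetime) -> str:
    """Format end - start as unpadded hours:minutes:seconds (sketch)."""
    total = int((end - start).total_seconds())
    if total < 0:
        raise ValueError("end time lies before start time")
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes}:{seconds}"
```

Applied to the two rows of the qrstat listing above, it returns 0:10:10 for AR 192 and 24:0:10 for AR 193.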

5.1.3.5 qsub/qalter

5.1.3.5.1 Overview
Command Line Switches:
switch/argument  description
-ar ar_id        request the AR with the given ar_id (a positive integer)

5.1.3.5.2 Description

Selects the Advance Reservation to be used by the job.

Modifying the AR-ID with qalter will be denied if the job is already running.

5.1.3.6 qstat

5.1.3.6.1 Overview
Command Line Switches:
switch/argument  description
-j               show scheduler job information

5.1.3.6.2 Description

  1. qstat -j
    qstat -j prints all job data. Because the job object will now contain the AR-ID requested for the job, 'qstat -j' must show it as well.
  2. qstat -f
    The qstat -f output is often used to search for unused hosts. To show whether a host is reserved and cannot be used by non-AR jobs, the used/tot. field needs to be extended to print the number of reserved slots.

    The following example illustrates the output. In the example, 20 slots are reserved on host brag and one AR job is running in the AR.

    % qstat -f
    queuename                      qtype resv/used/tot. load_avg arch          states
    ---------------------------------------------------------------------------------
    all.q@brag                     BIPC  20/1/20        0.02     darwin-x86
         16 0.55500 Sleeper    rd141302     r     11/28/2006 11:48:26     1
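A sketch of how the resv/used/tot. column and the number of slots still available to non-AR jobs could be derived. It assumes that AR jobs always run within the reserved slots, so only non-AR jobs compete with the reservation for the remaining capacity; the function name is hypothetical.

```python
def slot_summary(total: int, reserved: int, ar_used: int, non_ar_used: int):
    """Build the resv/used/tot. column and the slots left for non-AR jobs.

    Assumption: AR jobs run inside the reserved slots, so only non-AR
    jobs compete with the reservation for the queue instance capacity.
    """
    if ar_used > reserved or reserved + non_ar_used > total:
        raise ValueError("inconsistent slot counts")
    column = f"{reserved}/{ar_used + non_ar_used}/{total}"
    free_for_non_ar = total - reserved - non_ar_used
    return column, free_for_non_ar
```

For the all.q@brag line above, slot_summary(20, 20, 1, 0) yields ("20/1/20", 0).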
    

5.1.3.7 qmon

The "Submit Jobs" and "Job Control" masks need to be enhanced for the AR-ID.

Qmon will get two new masks:

  1. Submit Advance Reservation
    The mask will be similar to the "Submit Jobs" mask. Most of the input parameters are already used by the "Submit Jobs" mask; some others need to be added.
  2. Show Granted Advance Reservations
    The mask will be similar to the "Job Control" mask. The data will be the same as for the qrstat command.

5.1.3.8 qmaster configuration sge_conf(5)

max_advance_reservations
The number of active (not finished) advance reservations simultaneously allowed in N1 Grid Engine is controlled by this parameter. If max_advance_reservations is set to "0", no reservations are allowed. If an AR submission would exceed the max_advance_reservations limit, the submission command exits with exit status 25 and an appropriate error message.

Changing max_advance_reservations will take immediate effect.

This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
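The limit check could look like the following sketch. Only the parameter name, the meaning of 0, and exit status 25 come from this specification; the function and exception names are hypothetical. Because the limit is passed in on every call, a changed value takes effect immediately.

```python
import sys

EXIT_AR_LIMIT = 25  # exit status given by this specification


class ARLimitExceeded(Exception):
    pass


def check_ar_limit(active: int, max_advance_reservations: int) -> None:
    """Reject a new AR if it would exceed the configured limit (sketch).

    With max_advance_reservations == 0 every submission is rejected.
    """
    if active + 1 > max_advance_reservations:
        raise ARLimitExceeded(
            f"max_advance_reservations ({max_advance_reservations}) exceeded")


def submit_exit_status(active: int, max_advance_reservations: int) -> int:
    """Exit status a submission client such as qrsub could return."""
    try:
        check_ar_limit(active, max_advance_reservations)
    except ARLimitExceeded as exc:
        print(f"error: {exc}", file=sys.stderr)
        return EXIT_AR_LIMIT
    return 0
```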

5.1.4 Other Requirements

5.2 Advance Reservation Object

5.2.1 Overview

The Advance Reservation Object represents the requested AR in the N1 Grid Engine system. The request is stored in the internal Qmaster Database like all other objects (for example, usersets and complexes).

AR life time

5.2.2 Functionality

5.2.2.1 AR submit

5.2.2.1.1 Validate AR request

During submission, some basic values are validated first.

  1. dissonant time window values
    If start_time, end_time, and duration are not consistent with each other, the request is rejected. For example, a start time of 12:00, an end time of 13:00, and a duration of 2 hours will be rejected.
  2. reduced granted time window due to DURATION_OFFSET
    The net time window of a granted AR is always reduced due to the fixed scheduling interval. Additionally, the effective job runtime is always longer than the real job runtime because of start/stop overhead and prolog/epilog scripts. To reflect these longer times in the reservation schedule, the parameter "DURATION_OFFSET" (see CR 6283308) was introduced.

    To guarantee that all jobs are removed from the cluster when the AR end time is reached, DURATION_OFFSET must also be considered for Advance Reservations. This means all jobs submitted to an AR have a resulting runtime limit of AR duration - DURATION_OFFSET. Jobs requesting a longer runtime will not be scheduled. The AR requester needs to keep this in mind when creating a new AR.
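Both submit-time checks can be sketched as follows: the time window is completed from any two of start, end and duration and rejected if all three are given but inconsistent, and the runtime limit of AR jobs is the AR duration minus DURATION_OFFSET. Deriving a missing start time is an assumption, the DURATION_OFFSET value is supplied by the caller, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def resolve_time_window(start: Optional[datetime], end: Optional[datetime],
                        duration: Optional[timedelta]
                        ) -> Tuple[datetime, datetime, timedelta]:
    """Complete and check an AR time window from any two of three values."""
    if sum(v is not None for v in (start, end, duration)) < 2:
        raise ValueError("two of start, end and duration are required")
    if start is None:
        start = end - duration          # assumption: start may be derived
    elif end is None:
        end = start + duration
    elif duration is None:
        duration = end - start
    elif end - start != duration:
        raise ValueError("dissonant time window values")
    if duration <= timedelta(0):
        raise ValueError("the time window must have a positive duration")
    return start, end, duration


def job_runtime_limit(duration: timedelta, duration_offset: timedelta) -> timedelta:
    """Runtime limit for jobs in the AR: AR duration - DURATION_OFFSET."""
    limit = duration - duration_offset
    if limit <= timedelta(0):
        raise ValueError("AR duration does not exceed DURATION_OFFSET")
    return limit
```

The rejected example above (12:00 to 13:00 with a duration of 2 hours) fails the consistency check, while 12:00 plus a 1 hour duration resolves to an end time of 13:00.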

5.2.2.1.2 Access Control for creating new AR requests

It is necessary to restrict the users who are allowed to create an AR. This is done by a user list called 'arusers'. Only managers or users contained in the 'arusers' user list are allowed to create ARs.

The 'arusers' user list will be created at installation time and the SGE admin user will be added to the list.
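A sketch of the creation check. The function name is hypothetical, and it assumes, as for other Grid Engine access lists, that an 'arusers' entry prefixed with '@' names a UNIX group.

```python
from typing import Dict, Set


def may_create_ar(user: str, user_groups: Set[str], managers: Set[str],
                  usersets: Dict[str, Set[str]]) -> bool:
    """Only managers and members of the 'arusers' user list may create ARs.

    Assumption: as in other Grid Engine access lists, an entry prefixed
    with '@' names a UNIX group; user_groups holds the user's groups.
    """
    if user in managers:
        return True
    for entry in usersets.get("arusers", set()):
        if entry.startswith("@"):
            if entry[1:] in user_groups:
                return True
        elif entry == user:
            return True
    return False
```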

5.2.2.1.3 Selecting hosts and reserving resources

The resource selection is the same as for a regular sequential or parallel job. First the best-suited hosts are selected, then the desired amount of resources is reserved.

For an Advance Reservation, the following restrictions and enhancements to resource selection are necessary:

  1. AR access list and queue acl_list
    Because every user in the access list requested for the AR with the '-u' option must be able to run a job in the AR, only those queue instances can be reserved to which all users of the requested user list have access. This needs to be validated when the AR is granted: if even one user of the list has no access to a queue instance, that queue instance is not considered for the reservation.

    For example:

    % qconf -sq all.q | grep xuser_list
    xuser_lists           denied_users
    
    % qconf -su denied_users | grep entries
    entries user3
    
    % qrsub -u user1,user2,user3 -q all.q -a ....
    Error: No queue instances to reserve
    
  2. Avoiding conflicts with jobs without a runtime limit
    To make AR behavior predictable, the reserved resources must be free at the AR start time. This can be achieved either by preempting the non-AR jobs in case of a conflict or proactively by avoiding conflicts. Because Grid Engine currently does not support preemption, the latter approach is implemented.

    Conflicts arise if jobs do not finish by the end of their scheduled time slot. This can only happen for jobs without a runtime limit (h_rt), because jobs that request h_rt are deleted automatically by Grid Engine if they exceed the requested time.

    Such conflicts are avoided by keeping jobs without a runtime limit separate from AR jobs. Because every job has an implicit slot count that refers to a queue instance, this can be done by not reserving an AR on hosts where jobs without a runtime limit are running and, at the same time, by not scheduling a job without a runtime limit on a queue instance with an AR.

    The following example illustrates the behavior:

    % qstat -f
    queuename                      qtype used/tot. load_avg arch          states
    ----------------------------------------------------------------------------
    all.q@host1                    BIPC  1/1       0.01     sol-amd64
       3001 0.55500 Sleeper    rd141302     r     12/14/2006 10:20:47     1
    ----------------------------------------------------------------------------
    all.q@host2                    BIPC  0/1       0.09     sol-amd64
    
    % qrsub -a 12141200 -d 0:30:0 -u user1 -l h=host1
    denied: Reservation can't be granted
    
    % qrsub -a 12141200 -d 0:30:0 -u user1 -l h=host2
    Your reservation 1 has been granted
    
    % qsub -w e -l h=host2 job_script
    Unable to run job: error: no suitable queues.
    Exiting.
    
  3. AR and Resource Quotas
    The interaction between AR and resource quotas raises the question whether quotas must be honored for AR jobs. Honoring AR reservations in resource quotas causes problems such as too many reserved resources or inefficient AR time frames. Therefore, we have decided not to count AR reservations and the resource usage of AR jobs against resource quotas.

    We are able to ignore resource quotas (RQS) for AR reservations and AR job usage because quotas do not represent the capacity of a resource. The capacity is defined at the global/queue/host level and must of course be honored for AR requests. Ignoring RQS therefore cannot lead to a resource overload; only the quota share of a particular request may be exceeded while the AR time window is open.

    The following example illustrates the behavior:

    • resource quota: limit users user1 to slots=10
    • AR: start time now for user1 with slots=10.

    1. user1 can use 20 slots (10 with and 10 without AR)
    2. if all slots are used qquota output will show 10 of 10 used slots (the 10 AR slots are not booked)
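Restrictions 1 and 2 above can be sketched as predicates over a simplified queue instance. The QueueInstance fields are hypothetical; they mirror the queue_conf(5) user_lists and xuser_lists parameters, where a user listed in both is denied. For simplicity the sketch applies the runtime-limit rule per queue instance rather than per host.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class QueueInstance:
    name: str
    user_lists: Set[str] = field(default_factory=set)   # if set: only members allowed
    xuser_lists: Set[str] = field(default_factory=set)  # members are denied
    jobs_without_h_rt: int = 0     # running jobs that have no runtime limit
    has_ar: bool = False           # queue instance holds a reservation


def user_has_access(user: str, qi: QueueInstance,
                    usersets: Dict[str, Set[str]]) -> bool:
    def member(lists: Set[str]) -> bool:
        return any(user in usersets.get(name, set()) for name in lists)
    if member(qi.xuser_lists):          # xuser_lists takes precedence
        return False
    return not qi.user_lists or member(qi.user_lists)


def can_reserve(qi: QueueInstance, ar_users: Iterable[str],
                usersets: Dict[str, Set[str]]) -> bool:
    # Restriction 1: every user of the AR access list must have queue access.
    if not all(user_has_access(u, qi, usersets) for u in ar_users):
        return False
    # Restriction 2: no job without a runtime limit may be running there.
    return qi.jobs_without_h_rt == 0


def can_schedule_non_ar_job(qi: QueueInstance, job_has_h_rt: bool) -> bool:
    # Restriction 2, other direction: unlimited jobs avoid AR queue instances.
    return job_has_h_rt or not qi.has_ar
```

Run against the all.q example under restriction 1, can_reserve rejects the request as soon as user3 is part of the AR access list.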

5.2.2.2 AR open time window

5.2.2.2.1 AR error state

An already granted AR could become invalid for several reasons, such as configuration changes on reserved hosts, user set changes, or queue changes. All of these foreseeable events need to be rejected. A list of foreseeable events is:

The events that cannot be foreseen are host outages and queue instance errors. In such cases, the affected AR goes into error state. If error mail is enabled, the AR requester and all e-mail addresses specified with the -M option receive mail when the AR becomes invalid (at the earliest at AR start time, when the error is detected) and when the AR is valid again.

The error handling can be specified at AR submit time. A hard error blocks the faulty AR, and the scheduler does not dispatch the jobs referring to it. With soft error handling, the AR stays usable with the remaining resources.

5.2.2.3 AR cleanup

When the AR end time is reached, all jobs referring to the AR are deleted first, and then the AR itself is deleted. After that, no job can request the AR handle any longer.

5.2.3 Interfaces

5.2.3.1 AR States

State  Description
w      waiting - granted but start time not reached
r      running - start time reached
x      exited - end time reached and doing cleanup
d      deleted - manual deletion
e      error - AR became invalid
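A sketch of how the displayed state could be derived from the current time and two flags. The precedence of the states is an assumption of this sketch (in particular, an error is only reported once the start time is reached, since errors are detected at the earliest at AR start time); the names are hypothetical.

```python
from datetime import datetime
from enum import Enum


class ARState(Enum):
    WAITING = "w"
    RUNNING = "r"
    EXITED = "x"
    DELETED = "d"
    ERROR = "e"


def ar_state(now: datetime, start: datetime, end: datetime,
             deleted: bool = False, in_error: bool = False) -> ARState:
    """Derive the displayed AR state (sketch; the precedence is an assumption)."""
    if deleted:
        return ARState.DELETED
    if now >= end:
        return ARState.EXITED
    if now < start:
        return ARState.WAITING      # errors are detected at start time at the earliest
    if in_error:
        return ARState.ERROR
    return ARState.RUNNING
```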

5.2.3.2 Additional GDI Requests

This request adds the advance reservations in the specified list. The list elements are fully specified advance reservations. The request is used to implement the qrsub command.

This request deletes all advance reservations in the specified list. The list elements only need to specify the name or ID of the advance reservation to be removed. The request is used to implement the qrdel command.

5.2.3.3 Additional Event Client requests

This event is sent each time a new advance reservation has been created. It contains the full advance reservation object, but no usage information.

This event is sent each time an existing advance reservation is removed. The event contains only the name of the advance reservation to be removed.

This event is sent each time an existing advance reservation has changed. It contains the full advance reservation object.
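An event client could keep a local copy of all ARs by applying the three events above, as in this sketch. The event kind strings are hypothetical, and keying the copy by AR-ID assumes that the identifier carried by the delete event is the AR-ID; the text above only says the event contains the AR name.

```python
from typing import Any, Dict


class ARMirror:
    """Event-client-side copy of the AR list (illustrative sketch).

    The kinds "add", "mod" and "del" are hypothetical names for the three
    events. Assumption: ARs are keyed by AR-ID and the delete event
    carries that identifier.
    """

    def __init__(self) -> None:
        self.ars: Dict[int, Dict[str, Any]] = {}

    def handle(self, kind: str, payload: Any) -> None:
        if kind in ("add", "mod"):
            # Both events carry the full AR object, so a new or changed AR
            # simply replaces whatever was stored under its key.
            self.ars[payload["ar_id"]] = dict(payload)
        elif kind == "del":
            self.ars.pop(payload, None)     # payload is only the identifier
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
```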

5.2.3.4 AR Cull Specification

TBD

5.2.4 Other Requirements

5.3 AR Job Object

5.3.1 Overview

The AR Job Object is not a new component; it is an enhancement of the regular N1GE Job Object to handle Advance Reservations. All job properties from 6.1 remain valid and work for both AR and non-AR jobs.

AR job life time

5.3.2 Functionality

5.3.2.1 Job Submission

5.3.2.1.1 Validate AR request

At job submit time some additional verifications are done. If one of the verifications fails the job will be rejected.

The verifications are:

  1. AR-ID is valid
    Verify that the requested AR-ID has the correct syntax and the AR exists.
  2. AR access list
    Ensure the job submit user has access to the selected AR.
  3. job run time
    Ensure the runtime requested with h_rt is a valid time frame. First, the runtime must be smaller than the AR duration; second, it must be smaller than the AR end time minus the current time.
  4. job resource requests
    A job can only request the resources already reserved by the AR. Jobs cannot use more resources, or other resources, than those requested for the AR.
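The four verifications can be sketched as a single function that raises an error when a check fails. ARInfo and validate_ar_job are hypothetical names, the AR access list is assumed to be already resolved into user names, and skipping check 3 for jobs without h_rt is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional, Set


@dataclass
class ARInfo:
    ar_id: int
    start: datetime
    end: datetime
    users: Set[str]              # users allowed to use the AR (already resolved)
    resources: Dict[str, float]  # reserved amount per resource name


def validate_ar_job(ar_id_arg: str, user: str, h_rt: Optional[timedelta],
                    requests: Dict[str, float], now: datetime,
                    ars: Dict[int, ARInfo]) -> ARInfo:
    """Apply the four submit-time checks; raise ValueError on failure."""
    # 1. AR-ID is valid: positive integer syntax and the AR exists.
    if not ar_id_arg.isdigit() or int(ar_id_arg) <= 0:
        raise ValueError(f"invalid AR-ID syntax: {ar_id_arg!r}")
    ar = ars.get(int(ar_id_arg))
    if ar is None:
        raise ValueError(f"AR {ar_id_arg} does not exist")
    # 2. The submitting user must be in the AR access list.
    if user not in ar.users:
        raise ValueError(f"user {user} has no access to AR {ar.ar_id}")
    # 3. h_rt must fit the AR window (assumption: skipped if no h_rt given).
    if h_rt is not None and (h_rt >= ar.end - ar.start or h_rt >= ar.end - now):
        raise ValueError("requested h_rt does not fit into the AR time window")
    # 4. Only resources reserved by the AR, in at most the reserved amount.
    for name, amount in requests.items():
        if amount > ar.resources.get(name, 0):
            raise ValueError(f"resource {name} is not reserved by the AR")
    return ar
```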

5.3.2.2 Job Scheduling

Jobs requesting an AR are only scheduled if the AR start time has been reached, which means the AR is active and in state running.

Additionally, it must be ensured that the resources requested by the job are all reserved and free. If other jobs using the AR are running and the resources are already in use, the job is not dispatched.

5.3.2.3 Job Execution

To ensure that jobs requesting an AR cannot use more resources than reserved by the AR, the usage of AR jobs must be debited against the reservation. This is done before the job starts. When the job ends, the usage is undebited.

All jobs (pending or running) are deleted when the AR referred to by the job ends or is deleted.
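Scheduling and execution both rely on the usage debited against an AR. This sketch keeps a per-AR pool of reserved and used amounts: a job is dispatched only if the AR is in state running (r) and its request fits into the free reserved amounts, the usage is debited before the job starts, and it is undebited at job end. The class and method names are hypothetical.

```python
from typing import Dict


class ARResourcePool:
    """Resources reserved by one AR and their current debit (sketch)."""

    def __init__(self, reserved: Dict[str, int]) -> None:
        self.reserved = dict(reserved)
        self.used = {name: 0 for name in reserved}

    def fits(self, request: Dict[str, int]) -> bool:
        # Every requested resource must be reserved and still free.
        return all(name in self.reserved
                   and self.used[name] + amount <= self.reserved[name]
                   for name, amount in request.items())

    def can_dispatch(self, ar_state: str, request: Dict[str, int]) -> bool:
        # Jobs are only dispatched into an AR in state running ("r").
        return ar_state == "r" and self.fits(request)

    def debit(self, request: Dict[str, int]) -> None:
        """Book the job's usage against the AR before the job starts."""
        if not self.fits(request):
            raise ValueError("request exceeds the free reserved resources")
        for name, amount in request.items():
            self.used[name] += amount

    def undebit(self, request: Dict[str, int]) -> None:
        """Release the job's usage again when the job ends."""
        for name, amount in request.items():
            if self.used.get(name, 0) < amount:
                raise ValueError(f"undebit of {name} below zero")
        for name, amount in request.items():
            self.used[name] -= amount
```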

5.3.3 Interfaces

5.3.3.1 AR Cull Enhancements

TBD

5.3.4 Other Requirements

5.4 Component Accounting

5.4.1 Overview

After an AR has ended, it is necessary to detect whether a reserved resource was used for the complete reservation time, only partially, or not at all. Thus, billing capabilities need to be added by enhancing the current accounting and reporting files. In addition, dbwriter and ARCo need to be enhanced.

The Accounting Component is part of the qmaster.

5.4.2 Functionality

5.4.3 Interfaces

5.4.3.1 accounting file

To show which finished jobs ran in a previous AR, the job entry in the accounting file needs an additional field for the AR-ID.
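With the AR-ID available in each accounting record, the usage of a finished AR could be classified as in this sketch, which compares reserved and used slot-seconds. The record fields and the slot-seconds measure are assumptions chosen for illustration; the final record layout is not defined here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class JobRecord:
    job_id: int
    ar_id: Optional[int]  # the proposed new accounting field; None for non-AR jobs
    slots: int
    wallclock: int        # job wallclock time in seconds


def ar_usage(ar_id: int, reserved_slots: int, duration_s: int,
             records: Iterable[JobRecord]) -> Tuple[int, int, str]:
    """Return (used, unused, verdict) in slot-seconds for one AR (sketch)."""
    reserved = reserved_slots * duration_s
    used = sum(r.slots * r.wallclock for r in records if r.ar_id == ar_id)
    if used == 0:
        verdict = "not used"
    elif used < reserved:
        verdict = "partially used"
    else:
        verdict = "fully used"
    return used, max(reserved - used, 0), verdict
```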

5.4.3.2 reporting file

TBD

5.4.4 Other Requirements