NAME
     sge_shepherd - Grid Engine single job controlling agent

SYNOPSIS
     sge_shepherd

DESCRIPTION
     sge_shepherd provides the parent process functionality for a
     single  Grid Engine job.  The parent functionality is neces-
     sary on UNIX systems to retrieve resource usage  information
     (see  getrusage(2))  after  a job has finished. In addition,
     the sge_shepherd forwards signals to the job,  such  as  the
     signals  for  suspension, enabling, termination and the Grid
     Engine checkpointing signal (see sge_ckpt(1) for details).
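
     The parent-process role described above can be illustrated
     with a minimal Python sketch. It is not sge_shepherd's
     implementation: the helper name run_and_measure is
     hypothetical, and the sketch only shows that a parent which
     waits for its child can then read the child's accumulated
     resource usage through getrusage(2). Signal forwarding is
     left out.

```python
import resource
import subprocess
import sys


def run_and_measure(argv):
    """Run argv as a child process, wait for it, and return
    (exit_status, cpu_seconds) for that child.

    Hypothetical helper for illustration only.
    """
    # RUSAGE_CHILDREN covers children that have terminated and been waited for.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    completed = subprocess.run(argv)  # blocks until the child has exited
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    # User plus system CPU time added by the child that just finished.
    cpu_seconds = (after.ru_utime - before.ru_utime) + (
        after.ru_stime - before.ru_stime
    )
    return completed.returncode, cpu_seconds


if __name__ == "__main__":
    status, cpu = run_and_measure([sys.executable, "-c", "raise SystemExit(3)"])
    print("exit status:", status, "cpu seconds:", cpu)
```

     The usage figures only become available once the child has
     been waited for, which is why a dedicated parent process is
     needed for each job.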

     The sge_shepherd receives information about the job to be
     started from sge_execd(8). During the execution of the job
     it starts up to 5 child processes. First, a prolog script
     is run if this feature is enabled by the prolog parameter
     in the cluster configuration (see sge_conf(5)). Next, a
     parallel environment startup procedure is run if the job
     is a parallel job (see sge_pe(5) for more information).
     After that, the job itself is run, followed by a parallel
     environment shutdown procedure for parallel jobs, and
     finally an epilog script if requested by the epilog
     parameter in the cluster configuration. The prolog and
     epilog scripts as well as the parallel environment startup
     and shutdown procedures are to be provided by the Grid
     Engine administrator and are intended for site-specific
     actions to be taken before and after execution of the
     actual user job.
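
     The order of the child processes described above can be
     summarized in a short Python sketch. The function name
     planned_stages and its parameters are hypothetical; treating
     None as "not configured" is an assumption made for
     illustration and is not how sge_conf(5) expresses it.

```python
def planned_stages(prolog=None, epilog=None, parallel=False):
    """Return the ordered list of child processes for one job.

    prolog and epilog are script paths, or None when not configured
    (an assumption made for this sketch). parallel is True for a
    parallel job.
    """
    stages = []
    if prolog:
        stages.append("prolog")
    if parallel:
        stages.append("parallel start")
    stages.append("job")  # the job itself is always run
    if parallel:
        stages.append("parallel stop")
    if epilog:
        stages.append("epilog")
    return stages


if __name__ == "__main__":
    print(planned_stages(prolog="/site/prolog.sh", epilog="/site/epilog.sh",
                         parallel=True))
```

     A serial job without prolog or epilog yields a single
     stage, and a parallel job with both scripts configured
     yields all five.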

     After the job has finished and the epilog script has been
     processed, sge_shepherd retrieves resource usage statistics
     about the job, places them in a job-specific subdirectory
     of the sge_execd(8) spool directory for reporting through
     sge_execd(8), and exits.

     sge_shepherd also places an exit status file  in  the  spool
     directory.  This  exit  status  can  be viewed with qacct -j
     JobId  (see  qacct(1));  it  is  not  the  exit  status   of
     sge_shepherd  itself  but  of one of the methods executed by
     sge_shepherd. This exit status can have several meanings,
     depending on the method in which an error occurred (if
     any). The
     possible methods are: prolog, parallel start, job,  parallel
     stop,  epilog,  suspend, restart, terminate, clean, migrate,
     and checkpoint.

     The following exit values are returned:

     0      All methods: Operation was executed successfully.

     99     Job script, prolog and epilog: When FORBID_RESCHEDULE
            is  not  set  in the configuration (see sge_conf(5)),
            the job gets requeued.  Otherwise see "Other".

     100    Job script, prolog and epilog:  When  FORBID_APPERROR
            is  not  set  in the configuration (see sge_conf(5)),
            the job gets requeued.  Otherwise see "Other".

     Other  Job script: This  is  the  exit  status  of  the  job
            itself.  No  action  is  taken  upon this exit status
            because the meaning of this exit status is not known.
            Prolog, epilog and parallel start: The queue  is  set
            to error state and the job is requeued.
            Parallel stop: The queue is set to error  state,  but
            the  job  is not requeued. It is assumed that the job
            itself ran successfully and only the clean up  script
            failed.
            Suspend, restart, terminate, clean, and migrate:
            Always successful.
            Checkpoint: Success, except for kernel checkpointing,
            where it means that the checkpoint was not successful
            and did not happen (but migration will still be
            carried out by Grid Engine).
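
     The exit value rules above can be restated as a small
     Python lookup. This is a sketch of the rules as listed
     here, not code taken from Grid Engine: the function name,
     its parameters and the returned strings are hypothetical,
     and the two boolean flags stand in for FORBID_RESCHEDULE
     and FORBID_APPERROR from sge_conf(5).

```python
JOB_LIKE = {"job", "prolog", "epilog"}
ALWAYS_OK = {"suspend", "restart", "terminate", "clean", "migrate"}


def interpret_exit_status(method, status, forbid_reschedule=False,
                          forbid_apperror=False, kernel_checkpoint=False):
    """Map (method, exit status) to the action described in the list above.

    All names and return strings are illustrative assumptions.
    """
    if status == 0:
        return "success"

    # 99 and 100 only have a special meaning for job script, prolog and epilog.
    if method in JOB_LIKE:
        if status == 99 and not forbid_reschedule:
            return "job requeued"
        if status == 100 and not forbid_apperror:
            return "job requeued"

    # Everything below is the "Other" case.
    if method == "job":
        return "no action"
    if method in {"prolog", "epilog", "parallel start"}:
        return "queue set to error state, job requeued"
    if method == "parallel stop":
        return "queue set to error state, job not requeued"
    if method in ALWAYS_OK:
        return "success"
    if method == "checkpoint":
        return "checkpoint did not happen" if kernel_checkpoint else "success"
    raise ValueError("unknown method: " + repr(method))


if __name__ == "__main__":
    print(interpret_exit_status("job", 99))
    print(interpret_exit_status("prolog", 1))
```

     Note that 99 and 100 fall through to the "Other" rules when
     the corresponding flag is set, mirroring the "Otherwise see
     Other" wording in the list.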

RESTRICTIONS
     sge_shepherd should not be invoked  manually,  but  only  by
     sge_execd(8).

FILES
     <execd_spool>/job_dir/<job_id>     job specific directory

SEE ALSO
     sge_intro(1), sge_conf(5), sge_execd(8).

COPYRIGHT
     See sge_intro(1) for a full statement of rights and
     permissions.