PBS Job Batch Submission

The PBSBatch class is a tool for launching many jobs simultaneously.

The basic steps are:

  1. Instantiating a PBS object that will be used to submit the jobs.

  2. Creating a list of BatchJob objects that hold the name of the job and a list of the commands to run.

  3. Setting up the job directory with the appropriate input files.

  4. Giving the PBS object and the list of BatchJob objects to the PBSBatch constructor and then calling one of the launch methods.
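The four steps above can be sketched roughly as follows. This is only an illustration: the helper name run_batch, the case names, and the echo command in each job body are hypothetical placeholders. How the PBS object is built depends on your cluster, so it is left to the caller. The import is deferred into the function only so that the snippet can be loaded on a machine without pbs4py.

```python
def run_batch(pbs, case_names, max_jobs_at_a_time=20):
    """Build one BatchJob per case name and submit them as a batch.

    `pbs` is an already-configured PBS object (step 1, done by the caller).
    """
    # Import path taken from the class reference in this document.
    from pbs4py.pbs_batch import BatchJob, PBSBatch

    # Step 2: each job has a name and a list of shell commands.
    # The echo command is a placeholder for a real solver invocation.
    jobs = [BatchJob(name, [f"echo 'running {name}' > {name}.log"])
            for name in case_names]

    # Step 3: create one directory per job (input files would be copied here).
    batch = PBSBatch(pbs, jobs)
    batch.create_directories()

    # Step 4: submit, keeping at most max_jobs_at_a_time active in the queue.
    batch.launch_jobs_with_limit(max_jobs_at_a_time=max_jobs_at_a_time)
    return jobs
```

A call such as run_batch(pbs, ['case_a', 'case_b'], max_jobs_at_a_time=1) would then run the two cases one after the other.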

Setting up the Job Directories

By default jobs are launched in directories with the same name as the job. This prevents concurrent jobs in the batch from overwriting each other’s output files.

To set up a job, these directories can be created and populated with code like this:

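import os

batch = PBSBatch(pbs, jobs)

# create one directory per job, named after the job
batch.create_directories()
common_inputs_to_copy = ['fun3d.nml', '*.cfg']

# copy the shared inputs into each job directory; cp runs through the shell, so '*.cfg' is expanded
for job in jobs:
    for pattern in common_inputs_to_copy:
        os.system(f'cp {pattern} {job.name}')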
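If you would rather not shell out to cp, the same setup can be approximated with the standard library alone. The sketch below is not part of pbs4py: the function name populate_job_directories and its parameters are assumptions made for illustration. It also creates the job directories itself, so it does not rely on create_directories() having been called.

```python
import glob
import os
import shutil


def populate_job_directories(job_names, patterns, source_dir=".", dest_root="."):
    """Copy every file matching `patterns` into one directory per job name.

    Returns the list of destination paths that were written.
    """
    copied = []
    for name in job_names:
        job_dir = os.path.join(dest_root, name)
        os.makedirs(job_dir, exist_ok=True)  # harmless if it already exists
        for pattern in patterns:
            # glob expands wildcards such as '*.cfg' without a shell
            for path in sorted(glob.glob(os.path.join(source_dir, pattern))):
                if os.path.isfile(path):
                    copied.append(shutil.copy(path, job_dir))
    return copied
```

With the objects from the snippet above, it could be called as populate_job_directories([job.name for job in jobs], ['fun3d.nml', '*.cfg']).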

Launch Methods

The batch jobs can be submitted with two different methods of the PBSBatch class.

launch_jobs_with_limit() will launch every job in the list, but it only allows a set number of jobs to be active in the queue system (queued, running, or held) at a time. This is the preferred launch method if you have many jobs and, as a courtesy to your fellow HPC users, don’t want to submit hundreds of jobs to the queue at once.

launch_all_jobs() will launch every job in the list. Its optional argument controls whether the method waits for the jobs to finish before returning or returns immediately after all of the jobs are submitted to the queue.
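To make the difference between the two methods concrete, the throttling idea behind a limited launch can be sketched as a simple polling loop. This is an illustration of the technique, not pbs4py's actual implementation: the function launch_with_limit and its submit, get_state, and sleep callables are hypothetical stand-ins for qsub, qstat, and time.sleep.

```python
import time

# States counted as "active in the queue system": queued, running, held.
ACTIVE_STATES = {"Q", "R", "H"}


def launch_with_limit(job_names, submit, get_state,
                      max_jobs_at_a_time=20, check_frequency_in_secs=30.0,
                      sleep=time.sleep):
    """Submit all jobs while keeping at most max_jobs_at_a_time active.

    submit(name) returns a job id; get_state(job_id) returns a state letter.
    Returns the job ids in submission order once every job has left the queue.
    """
    pending = list(job_names)
    submitted_ids = []
    while True:
        active = sum(get_state(job_id) in ACTIVE_STATES
                     for job_id in submitted_ids)
        # Top the queue back up to the limit.
        while pending and active < max_jobs_at_a_time:
            submitted_ids.append(submit(pending.pop(0)))
            active += 1
        if not pending and active == 0:
            return submitted_ids
        sleep(check_frequency_in_secs)
```

Launching everything at once corresponds to the same loop with a limit at least as large as the number of jobs.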

Batch Job Class

class pbs4py.pbs_batch.BatchJob(name, body)

Class for individual PBS jobs within a batch of jobs

Can be used as a context manager to enter/exit a directory with the job’s name
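The enter/exit behaviour described here follows a common pattern, which can be sketched with contextlib. The helper job_directory below is a hypothetical stand-in written for illustration and is not the BatchJob implementation. It assumes the directory already exists.

```python
import os
from contextlib import contextmanager


@contextmanager
def job_directory(name):
    """Change into the directory `name`, then return to the starting directory."""
    start = os.getcwd()
    os.chdir(name)
    try:
        yield
    finally:
        # Restore the original working directory even if the body raises.
        os.chdir(start)
```

With a real BatchJob, the equivalent usage would presumably be "with job:" followed by the code that should run inside the job’s directory.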

name

Name of the job.

Type

str

body

list of commands to run in PBS job

Type

List[str]

id

PBS job identifier returned by qsub

Type

str

get_pbs_job_state()

Get the job’s status after it has been submitted. Returns the entry of job_state in the qstat information, e.g., ‘Q’, ‘R’, ‘F’, ‘H’, etc.

Return type

str
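As an illustration of what "the entry of job_state in the qstat information" means, the sketch below pulls that entry out of text in the style of "qstat -f" output. The function parse_job_state and the sample text are assumptions made for this example. pbs4py may obtain the value differently.

```python
import re
from typing import Optional

# Matches a line such as "    job_state = R" in full qstat output.
_JOB_STATE = re.compile(r"^\s*job_state\s*=\s*(\S+)", re.MULTILINE)


def parse_job_state(qstat_full_output: str) -> Optional[str]:
    """Return the job_state letter, or None if the entry is missing."""
    match = _JOB_STATE.search(qstat_full_output)
    return match.group(1) if match else None


# Made-up sample in the style of `qstat -f <job id>` output.
sample = """Job Id: 1234.pbsserver
    Job_Name = case_a
    job_state = R
    queue = normal
"""
state = parse_job_state(sample)
```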

PBSBatch Class

class pbs4py.pbs_batch.PBSBatch(pbs, jobs, use_separate_directories=True)

Batch of PBS jobs. Assumes all jobs require the same job request size. By default, separate directories with the job’s name will be used to separate output files.

Parameters
  • pbs (PBS) – PBS handler that will be used to submit the jobs

  • jobs (List[BatchJob]) – List of BatchJob objects that will be run

  • use_separate_directories (bool) – whether to run each job in a separate directory with the job’s name

create_directories()

Create the set of directories with the jobs’ names

launch_all_jobs(wait_for_jobs_to_finish=False, check_frequency_in_secs=30)

Launch all of the jobs in the list. Stores the PBS job id in the job objects.

Parameters
  • wait_for_jobs_to_finish (bool) – If True, the jobs will be submitted, and this function will not return until all of the jobs are finished.

  • check_frequency_in_secs (float) – Time interval to wait before checking if all jobs are done. Only relevant if wait_for_jobs_to_finish is True.

launch_jobs_with_limit(max_jobs_at_a_time=20, check_frequency_in_secs=30)

The “courteous” version of launch_all_jobs(wait_for_jobs_to_finish=True): a limit is set on the maximum number of jobs running or in the queue at a time, since some people may not like it if you submit 1000 jobs at once.

Parameters
  • max_jobs_at_a_time (int) – Limit for number of jobs to have queued, running, or held at a time

  • check_frequency_in_secs (float) – Time interval to wait before checking the jobs’ statuses.

wait_for_all_jobs_to_finish(check_frequency_in_secs=30)

A blocking check for all the jobs in the batch to finish. Can be paired with launch_all_jobs.

Parameters
  • check_frequency_in_secs (float) – How often to check and print the jobs’ states
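The blocking check can be pictured as a loop that polls each job’s state, prints the states, and sleeps between polls. The sketch below illustrates that pattern and is not the library’s code. The function wait_for_jobs and its get_state and sleep callables are hypothetical, and treating the state ‘F’ as "finished" is an assumption.

```python
import time


def wait_for_jobs(job_ids, get_state, check_frequency_in_secs=30.0,
                  sleep=time.sleep):
    """Block until every job reports state 'F'. Returns the final states."""
    while True:
        states = {job_id: get_state(job_id) for job_id in job_ids}
        print(states)  # report progress on every check
        if all(state == "F" for state in states.values()):
            return states
        sleep(check_frequency_in_secs)
```

Paired with a non-blocking launch, this lets other work happen between submitting the jobs and waiting for them.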