Submitting batch jobs

A simple example

Consider this job script (job1.sh), which runs Matlab on 1 CPU core:

#!/bin/bash
#$ -l h_rt=01:00:00
#$ -l h_data=4G
#$ -N job_name
#$ -cwd
#$ -o stdout.$JOB_ID
#$ -e stderr.$JOB_ID

source /u/local/Modules/default/init/modules.sh
module load matlab/R2020a

matlab -nodisplay -nojvm -nosplash -singleCompThread < svd.m

Since this is our first job script example, we will explain each component in detail. In later sections, we will consider other job types. You will find that submitting other job types is very similar to submitting this basic one.
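The `#$` directive syntax used here is that of Grid Engine, where a job script is submitted with `qsub` and monitored with `qstat`. The following is a minimal sketch, assuming those standard Grid Engine commands are available on the login node. `submit_job` is a hypothetical helper name, and `DRY_RUN` is a hypothetical switch that only prints the command (useful where no scheduler is installed). `-terse` is the qsub option that prints just the job ID.

```shell
#!/bin/bash
# Hypothetical helper (not part of the scheduler): submit a Grid Engine job
# script with qsub, report its job ID, then list your jobs with qstat.
submit_job() {
    local script="$1"
    # DRY_RUN=1 is a hypothetical switch that only prints the qsub command,
    # e.g. for testing on a machine without the scheduler installed.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "qsub -terse $script"
        return 0
    fi
    local job_id
    # -terse makes qsub print only the job ID instead of its usual message.
    job_id=$(qsub -terse "$script") || return 1
    echo "Submitted $script as job $job_id"
    # Show the state of your queued and running jobs.
    qstat -u "$USER"
}

# Usage on a login node:
#   submit_job job1.sh
```

Capturing the job ID this way lets you predict the output file name (stdout.<job ID>) for a script like job1.sh.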

Job parameters

    Option       Purpose
    ---------    ----------------------------------
    -l h_rt      wall-clock time limit
    -l h_data    memory size
    -N           job name (optional)
    -cwd         use the current working directory
    -o           file name of standard output
    -e           file name of standard error
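To bash, every `#$` line is just a comment; only the job scheduler reads it as an option. When reviewing a script, it can help to print only those directive lines. A minimal sketch using grep and sed, with a hypothetical helper name, `show_directives`:

```shell
#!/bin/bash
# Hypothetical helper: print the scheduler directives (lines starting with "#$")
# of a job script, with the "#$" prefix and following spaces removed.
show_directives() {
    # ^#\$ matches a literal "#$" at the start of a line;
    # sed then strips that prefix and any spaces after it.
    grep '^#\$' "$1" | sed 's/^#\$[[:space:]]*//'
}

# Usage:
#   show_directives job1.sh
```

For job1.sh this lists one option per line (-l h_rt=01:00:00, -l h_data=4G, -N job_name, and so on), which makes typos in the parameter block easy to spot.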

Additional comments:

  • This job script is a bash (shell) script (indicated by the first line, #!/bin/bash), so the body of the script has to be written in bash syntax.

  • Without -cwd, the job runs in, and writes its output files to, your top-level $HOME directory. Use -cwd to run in the current working directory (the directory from which the job is submitted) and to keep your top-level $HOME clean.

  • A batch job does not have a “screen” attached to it, so all “screen output” (standard output) of the program goes to the file named by -o. Similarly, standard error goes to the file named by -e.

  • The order of these job parameters does not matter.

  • The standard output and standard error files may be combined into one file (-e is then omitted):

    #$ -j y
    #$ -o stdout.$JOB_ID
    
  • The $JOB_ID variable makes the stdout/stderr file names unique for different jobs.

  • In the job parameter block (the lines prefixed by #$), the following environment variables are supported: $JOB_ID, $TASK_ID (see “job array”), $JOB_NAME, $HOSTNAME, $USER and $HOME.

  • By default, the job is allocated one CPU core to run the computation. That’s why we also force Matlab to run in single-thread mode with the option -singleCompThread. A job that uses more computing resources than it requested may be terminated without notification.

  • A batch job does not have a “screen” attached to it. That’s why we turn off Matlab’s GUI (with -nodisplay -nojvm -nosplash) to reduce memory consumption. Otherwise the memory request (h_data) would have to be unnecessarily large.
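Because everything a batch job prints ends up in its output file, it can be useful to begin the body of a job script by printing a short header that records which job produced the file. A minimal sketch, assuming the scheduler exports JOB_ID and JOB_NAME into the job's environment (Grid Engine does); the function name `log_job_header` is hypothetical, and the fallbacks let it also run outside a batch job:

```shell
#!/bin/bash
# Hypothetical helper: print a one-line header identifying the job, so the
# -o output file records which job produced it and where it ran.
# Grid Engine exports JOB_ID and JOB_NAME into the job's environment;
# the ":-unknown" defaults apply when the function is run outside a job.
log_job_header() {
    echo "Job ${JOB_ID:-unknown} (${JOB_NAME:-unknown}) on $(uname -n), started $(date)"
}

# In a job script, call it just before the main computation, e.g.:
#   log_job_header
#   matlab -nodisplay -nojvm -nosplash -singleCompThread < svd.m
```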

Common mistakes

  1. Adding spaces between items in a resource list, e.g.

    #$ -l h_rt=1:00:00, h_data=4G         # <-- this is wrong!!
    

    In this case, because of the syntax error, the specified h_data value is not picked up by the job scheduler. Instead, the default of 1G is used, which may or may not be too small for the job.

  2. Omitting the unit for memory size, e.g.

    #$ -l h_rt=1:00:00,h_data=4
    

    In this case, the job will request just 4 bytes of memory. Most likely it will fail.

  3. Failing to initialize Modules, or failing to load the required module, resulting in “command not found” error messages.

    When a job starts, it runs in a non-login shell in which Modules is not initialized. Consequently, the job script must initialize Modules itself (line 9 in the example above, source /u/local/Modules/default/init/modules.sh) before any module load command.
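The mistakes above are easy to check for mechanically before submitting. A rough sketch: `check_job_script` is a hypothetical helper (not a scheduler command) that only pattern-matches the script text with grep, so it can miss variants or raise false alarms. Its Modules check simply looks for `modules.sh`, the file sourced in the example script; other sites may use a different initialization file.

```shell
#!/bin/bash
# Hypothetical pre-submission check for three common job-script mistakes.
# It only pattern-matches text; it does not validate against the scheduler.
check_job_script() {
    local script="$1" problems=0

    # Mistake 1: a space after a comma in a "#$ -l" resource list.
    if grep -qE '^#\$[[:space:]]*-l[[:space:]].*,[[:space:]]' "$script"; then
        echo "WARNING: space after a comma in a -l resource list"
        problems=$((problems + 1))
    fi

    # Mistake 2: h_data given as a bare number with no unit (e.g. h_data=4).
    if grep -qE '^#\$.*h_data=[0-9.]+([,[:space:]]|$)' "$script"; then
        echo "WARNING: h_data has no unit (use e.g. 4G, not 4)"
        problems=$((problems + 1))
    fi

    # Mistake 3: "module load" used without sourcing modules.sh anywhere.
    if grep -qE '^[[:space:]]*module[[:space:]]+load' "$script" &&
       ! grep -qE 'modules\.sh' "$script"; then
        echo "WARNING: module load is used but Modules is never initialized"
        problems=$((problems + 1))
    fi

    # Exit status 0 only if no problem was found.
    [ "$problems" -eq 0 ]
}

# Usage:
#   check_job_script job1.sh && echo "no common mistakes found"
```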

Quiz

Exercise 1

What is missing in this job script (quiz1.sh), and why is it necessary?

#!/bin/bash
#$ -l h_rt=1:00:00,h_data=4G
#$ -cwd
#$ -o stdout.$JOB_ID
#$ -j y

module load matlab/R2020a
matlab -nodisplay -nojvm  -nosplash -singleCompThread  < svd.m

Exercise 2

How would you fix the job parameter lines (lines 2 to 5) in this job script (quiz2.sh)?

#!/bin/bash
$# -l h_rt=1:00:00,h_data=4G
$# -cwd
$# -o stdout.$JOB_ID
$# -j y

module load matlab/R2020a
matlab -nodisplay -nojvm  -nosplash -singleCompThread  < svd.m