Getting Started

This is a step-by-step tutorial on understanding job scheduling and using the job scheduler to submit computing jobs on the Hoffman2 Cluster.

Users access the Hoffman2 Cluster’s computing power through the job scheduler, Univa Grid Engine (UGE), by submitting computing jobs in either batch mode or interactive mode. Before submitting jobs, you should have a good idea of the available computing resources (CPU, memory size, and software packages). In practice, such understanding can help you reduce unnecessary wait time, avoid common mistakes, and know where to look when things do not work.

Prerequisites

You should already have an account on the Hoffman2 Cluster. If not, see this page for how to request an account.

Connection to Hoffman2 Cluster

The most common way to connect to the Hoffman2 Cluster is by running the secure shell (ssh) client from a text terminal, such as the built-in Terminal program on macOS, or the built-in PowerShell terminal on Windows.
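
For example, a minimal connection command looks like the following (the user name joebruin is a placeholder; replace it with your own Hoffman2 account name):

    # Connect to the cluster's login host via ssh.
    ssh joebruin@hoffman2.idre.ucla.edu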

Text editing

You will need to edit your job script before submitting it to the job scheduler. The easiest way is to edit the job script (a text file) directly on the Hoffman2 Cluster, using a text editor such as nano, vim or emacs. Alternatively, and perhaps less preferably, you can edit the files on your local computer and upload them to the Hoffman2 Cluster. If you edit the files on Windows, be aware of the difference in end-of-line characters between Windows and Linux.
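
For example, one way to detect and fix Windows-style (CRLF) line endings once a file is on the cluster is sketched below; the file name myjob.sh is a placeholder, and the dos2unix utility may or may not be available depending on the system's setup:

    file myjob.sh       # reports "with CRLF line terminators" for Windows-style files
    dos2unix myjob.sh   # converts CRLF line endings to LF in place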

The freeway analogy

Think of Hoffman2 Cluster like a freeway: many lanes (the compute nodes) filled with many cars (the user jobs).

[Image: a multi-lane freeway filled with cars. Source: https://en.wikipedia.org/]

Observations from the freeway analogy:

  • If everyone is going at full speed, the freeway can support a tremendous amount of “flow rate”

  • Getting onto the freeway might take some time (ramp, merge, etc.)

  • If someone blocks a lane, other cars are affected

  • If someone blocks a few lanes, even more cars are affected

  • Unlike your own driveway, one needs to follow certain rules when using the freeway

Login nodes vs. compute nodes

Key point: Use the compute nodes via the job scheduler as much as possible.

  • The login nodes have limited CPU/memory. They are not for running intensive computations/tasks (including compiling large software packages).

  • Examples of appropriate login-node use include editing source files, submitting jobs, and checking job status.

  • Use the compute nodes for compute-intensive tasks

  • We will cover how to access the compute nodes via the job scheduler in detail (a brief preview follows below).

  • Recall the freeway analogy: your convenience may negatively affect others.

See also: A word about running a persistent session
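
As a preview, a minimal sketch of requesting an interactive session on a compute node through the scheduler (the memory and runtime values are illustrative, not a recommendation):

    # Ask the scheduler for a shell on a compute node instead of
    # computing on a login node.
    qrsh -l h_data=4G,h_rt=2:00:00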

Free account vs. High priority access

Everyone affiliated with UCLA can get an account on the Hoffman2 Cluster. Research groups can purchase compute nodes for high-priority access, or additional storage beyond the standard $HOME directory (see: File system). For details about purchasing and pricing, please see: https://idre.ucla.edu/service-pricing-ordering-information

File system

You have access to several directories for different purposes:

Directory                     Environment variable                               Purposes                           Life span
----------------------------  -------------------------------------------------  ---------------------------------  -------------------------------------------------------
home                          $HOME                                              40GB, home directory               same as your account
scratch                       $SCRATCH                                           2TB, temporary I/O                 at least 2 weeks (sometimes longer, but not guaranteed)
work                          $TMPDIR                                            100+GB, node-local temporary I/O   runtime of a job
Purchased storage (optional)  see the symbolic link in your $HOME, if available  project space                      monthly/annual renewal

In general, the $HOME directory is for storing your source code, scripts, documents and maybe some data files (be aware of the 40GB space limitation). The $SCRATCH directory is good for running jobs, but you need to copy the useful output away before the files are automatically purged. The $TMPDIR, local to a compute node, may be useful for certain programs that can take advantage of very fast disk I/O.

Examples of the directory names:

  • $HOME: /u/home/b/bruin

  • $SCRATCH: /u/scratch/b/bruin

  • $TMPDIR: /work/1234.1.pod_smp.q (different for different jobs, on different compute nodes)

  • Purchased storage: /u/project/PI_name/...

Note

It is advisable to use the environment variable names, such as $SCRATCH, in your job scripts instead of “hard-wiring” the full paths.
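
A minimal sketch of this practice inside a job script (the directory name myrun and the file name results.txt are placeholders):

    # Use $SCRATCH rather than a hard-wired path such as /u/scratch/b/bruin.
    mkdir -p $SCRATCH/myrun
    cd $SCRATCH/myrun
    # ... run the computation here ...
    cp results.txt $HOME/    # copy useful output home before scratch is purged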

The role of the job script

Key points:

  • The job scheduler does not run your computations automatically.

  • The job scheduler is about requesting computing resources (e.g. CPU, memory, runtime)

Once the request is granted, a job is dispatched to the allocated CPU core(s)/memory/compute nodes to run. The user is responsible for specifying how the job is run, typically in a job script.

Typically a job script consists of two parts:

  1. The requested computing resources (e.g. how many CPU cores, how much memory, and for how long)

  2. How the computation is run on the granted computing resources (CPU/memory)

All of this information can be written into one job script (a shell script).

Some of the information may be provided via the command line, but we recommend writing everything into a job script (so it is self-documenting).
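
For example, a minimal sketch of a UGE job script showing the two parts described above (the program ./my_program and all resource values are illustrative):

    #!/bin/bash
    #$ -cwd                        # run the job from the current working directory
    #$ -o joblog.$JOB_ID           # file to collect the job's standard output
    #$ -j y                        # merge standard error into standard output
    #$ -l h_data=2G,h_rt=1:00:00   # part 1: requested resources (memory, runtime)

    # part 2: how the computation is run on the granted resources
    echo "Running on $(hostname)"
    ./my_program input.txt > output.txt

A script like this would then be submitted with qsub, e.g. qsub myjob.sh (the file name is again a placeholder).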

Elements of Job Scheduling

  • memory size (h_data)

  • time limit (h_rt)

  • working directory

  • standard input and output

  • job number (ID)

  • task number (ID)

  • job script as a shell script (e.g. bash)

  • Single CPU or multiple CPUs

  • Single compute node vs. multiple compute nodes

  • Requested resources vs. Available resources, wait time

  • Other parameters/options

  • High priority (or not)

  • Exclusive (or not)

  • GPU computing

  • Understanding error messages and troubleshooting
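
Several of these elements appear as environment variables inside a running job. As one illustration, a minimal sketch of an array job, where each task receives its own task ID (the task range 1-4 and the resource values are arbitrary):

    #!/bin/bash
    #$ -cwd
    #$ -l h_data=1G,h_rt=0:10:00
    #$ -t 1-4                      # run 4 tasks of this job

    # All tasks share the same $JOB_ID but get different $SGE_TASK_ID values.
    echo "job $JOB_ID, task $SGE_TASK_ID"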