5.2 Slurm Batch Jobs - Single GPU
Batch jobs are non-interactive and the preferred way of using the LRZ AI Systems.
In a batch job, the allocation of resources and job submission are done in a single step.
If no resources are available, the job queues until the requested allocation is possible.
Slurm
SLURM is a resource manager designed for multi-user systems. If the requested resources are not immediately available, the job is placed in a queue until the allocation can be granted. You can check the queue using squeue, or view only your own jobs with squeue --me. SLURM schedules jobs using policies such as fair-share to balance access among users. Batch jobs are executed automatically once scheduled.
SLURM provides a set of command-line tools, often referred to as s-commands, such as sinfo, salloc, srun, sbatch, and squeue, which are used to submit, allocate, run, and monitor jobs on the system. For more details, see the official SLURM documentation.
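For orientation, a few typical calls look like this (the script name is a placeholder):
sinfo                   # list partitions and the state of their nodes
sbatch example.sbatch   # submit a batch script (described below)
squeue --me             # show only your own pending and running jobs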
Slurm Essentials
The sbatch command submits jobs that are described in a file with a special format. This file is usually referred to as a batch script.
Once the script is created, it is submitted as:
sbatch example.sbatch
An example of a batch script is depicted next.
#!/bin/bash
#SBATCH -p lrz-v100x2    # Select partition (use sinfo)
#SBATCH --gres=gpu:1     # Request 1 GPU
#SBATCH -o log_%j.out    # File to store standard output
#SBATCH -e log_%j.err    # File to store standard error

echo "Start on $(hostname) at $(date)"   # Runs outside of srun
srun command                             # Run the actual, GPU-enabled command with srun
The first part of a batch script is the preamble, which includes lines starting with #! and #SBATCH. This section defines the resource allocation required to run the job, such as the partition, number of GPUs, and runtime limits.
In addition, two important #SBATCH options specify where to redirect the job’s output and error messages. Since batch jobs are non-interactive, there is no terminal or shell to display output. Instead, the standard output and error streams must be written to files. In our example, we use log_%j.out and log_%j.err, where %j is automatically replaced by the Slurm job ID.
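Once the job has finished, these files can be inspected directly; the job ID below is purely illustrative:
cat log_123456.out   # standard output of job 123456
cat log_123456.err   # standard error of job 123456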
Following the preamble, the actual job commands are listed.
In this example, the script runs two commands sequentially.
The first command (echo) is not run with srun. It is executed directly by the SLURM batch script on the first node of the allocation. Since it is outside of SLURM’s job step management, it does not benefit from features like resource binding or tracking. This is fine for simple shell operations such as logging or environment setup.
The second command is run with srun, which means it is executed as a managed SLURM job step. This launches the command in a parallel context, typically across all nodes of the allocation (unless specified otherwise). If the allocation includes only a single node, srun will still create a parallel job, but limited to that one node.
Running with srun also initializes per-task environment variables such as SLURM_PROCID, SLURM_LOCALID, and SLURM_NTASKS, which correspond to the RANK, LOCAL_RANK, and WORLD_SIZE values often required by distributed frameworks (e.g., PyTorch, TensorFlow, MPI). These variables help coordinate parallel computation by assigning each process a role and identity within the job.
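As a minimal sketch, the following job step prints these per-task variables so you can verify what each process sees (with the default of one task, a single line is printed; with more tasks, one line per task):
srun bash -c 'echo "host=$(hostname) rank=$SLURM_PROCID local_rank=$SLURM_LOCALID world_size=$SLURM_NTASKS"'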
Batch Jobs with Enroot Containers
Non-Parallel Jobs
To run non-parallel containerized jobs with SLURM using Enroot, you typically work with a pre-existing container image.
This involves two separate steps in your batch script:
Creating a container from the container image
Running the desired command inside the created container
The following script illustrates this approach:
#!/bin/bash
#SBATCH -p lrz-v100x2    # Select partition (use sinfo)
#SBATCH --gres=gpu:1     # Request 1 GPU
#SBATCH -o log_%j.out    # File to store standard output
#SBATCH -e log_%j.err    # File to store standard error

enroot create <NAME>.sqsh
enroot start NAME command
The option --name CNAME in enroot create would assign the container the name CNAME and create it on the first node of your allocation.
The line enroot start NAME command then executes the given command on the first node of the allocation, inside the created container.
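As a concrete sketch, assuming a hypothetical image file pytorch.sqsh, the container name mycontainer, and a hypothetical training script, the two steps could look like this:
enroot create --name mycontainer pytorch.sqsh   # create the container from the image file
enroot start mycontainer python3 train.py       # run the command inside the container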
As of Ubuntu 22.04, the Enroot command-line interface can no longer start a job directly from a container image without first creating the container.
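If no .sqsh image file is available yet, one way to obtain one is to import it from a registry beforehand; the image shown is the one used in the examples below, and the resulting file name may differ on your system:
enroot import docker://nvcr.io#nvidia/pytorch:23.07-py3   # creates an image file such as nvidia+pytorch+23.07-py3.sqsh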
Parallel Jobs
One Container
To apply a single container image across all commands in your job, it is recommended to use the --container-image option in the batch script preamble.
Although srun is not explicitly used before each command, all commands in the script are executed as part of a parallel job.
Keep in mind that invoking srun within this context will fail, as it is not available inside the scope of an already parallel job.
#!/bin/bash
#SBATCH -p lrz-v100x2    # Select partition (use sinfo)
#SBATCH --gres=gpu:1     # Request 1 GPU
#SBATCH -o log_%j.out    # File to store standard output
#SBATCH -e log_%j.err    # File to store standard error
#SBATCH --container-image="docker://nvcr.io/nvidia/pytorch:23.07-py3"

command1
command2
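As a hypothetical illustration, command1 and command2 could be simple checks that the GPU is visible inside the PyTorch container:
nvidia-smi                                                     # command1: list the allocated GPU
python3 -c "import torch; print(torch.cuda.is_available())"    # command2: confirm PyTorch can use it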
One Container per Job Step
For containerized parallel jobs, even when allocating just a single node, we recommend using the Pyxis plugin for SLURM to manage container execution.
The following example illustrates a typical workflow:
command1 is executed in a container created from the image nvcr.io#nvidia/pytorch:23.07-py3 (on each allocated node).
After completion, command2 is run in a new container based on the image nvcr.io#nvidia/tensorflow:22.12-py3 (on each allocated node).
Finally, command3 and command4 are executed consecutively within the same container, created from the image nvcr.io#nvidia/tensorflow:22.12-py3.
Note: each call to srun results in the creation of a fresh container instance, even if the same image is used.
#!/bin/bash
#SBATCH -p lrz-v100x2    # Select partition (use sinfo)
#SBATCH --gres=gpu:1     # Request 1 GPU
#SBATCH -o log_%j.out    # File to store standard output
#SBATCH -e log_%j.err    # File to store standard error

srun --container-image=nvcr.io#nvidia/pytorch:23.07-py3 command1
srun --container-image=nvcr.io#nvidia/tensorflow:22.12-py3 command2
srun --container-image=nvcr.io#nvidia/tensorflow:22.12-py3 bash -c "command3 ; command4"
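If container state needs to persist between job steps, Pyxis also offers a --container-name option that reuses a named container instead of creating a fresh one each time; a minimal sketch, assuming the installed Pyxis version supports this option:
srun --container-image=nvcr.io#nvidia/pytorch:23.07-py3 --container-name=pt command1
srun --container-name=pt command2   # reuses the container named "pt" created in the previous step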