|Use case||Job arrays||Multiple serial workers with srun --multi-prog||Multiple serial workers with mpiexec||Multiple parallel workers with srun||Multiple parallel workers with mpiexec (on one node)||pexec||R and redis|
|Long running parallel jobs with large number of nodes for each worker||yes||no||no||no||no||no||no|
|Parallel workers||yes||no||no||yes||yes||no||yes, with redisexec|
|Serial workers||no||yes||yes||yes||yes||yes||yes (R or python)|
|Number of workers = number of allocated cores||no||yes||yes||yes||yes||yes||yes|
|Number of workers > number of allocated cores||no||no||no||yes||no||yes||yes|
|Number of workers >> number of allocated cores||no||no||no||no||no||yes||yes|
|(very) unbalanced Workers||yes|
Depending on the method, the following environment variables can be used to distinguish between tasks:
- SLURM_ARRAY_TASK_ID: for Array jobs
- SLURM_PROCID: The MPI rank (or relative process ID) of the current process (with srun)
- SLURM_LOCALID: Node local task ID for the process within a job (with srun)
- SLURM_STEPID: The step ID of the current job (with srun)
- PMI_RANK: The MPI rank (or relative process ID) of the current process with Intel MPI (with mpiexec)
- SUBJOB within pexec
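A wrapper script can read whichever of these variables its launch method provides. The following is a minimal sketch; the helper name `task_index` and the order in which the variables are checked are assumptions made for illustration, not part of SLURM or lrztools.

```shell
#!/bin/bash
# Print the index of the current task, whichever launch method set it.
# The function name and the precedence order are assumptions of this sketch.
task_index() {
    if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
        echo "$SLURM_ARRAY_TASK_ID"      # job array
    elif [ -n "${SLURM_PROCID:-}" ]; then
        echo "$SLURM_PROCID"             # started with srun
    elif [ -n "${PMI_RANK:-}" ]; then
        echo "$PMI_RANK"                 # started with mpiexec (Intel MPI)
    elif [ -n "${SUBJOB:-}" ]; then
        echo "$SUBJOB"                   # started by pexec
    else
        echo 0                           # plain serial run, e.g. while testing
    fi
}

echo "this is task $(task_index)"
```

When job arrays are combined with srun, both SLURM_ARRAY_TASK_ID and SLURM_PROCID are set, so a real wrapper would probably need both values rather than only the first one found.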
Job arrays
Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily; job arrays with many tasks can be submitted in milliseconds. All jobs must have the same initial options (e.g. size, time limit, etc.). Job arrays have additional environment variables set:
- SLURM_ARRAY_JOB_ID will be set to the first job ID of the array.
- SLURM_ARRAY_TASK_ID will be set to the job array index value.
- SLURM_ARRAY_TASK_COUNT will be set to the number of tasks in the job array.
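As an illustration of how these variables can be used, the following sketch splits a fixed number of work units evenly over the tasks of a job array. The value of `TOTAL_UNITS`, the array range and the final `echo` (standing in for a real program) are assumptions; site-specific options such as cluster or partition are omitted.

```shell
#!/bin/bash
#SBATCH --job-name=array_sketch
#SBATCH --array=0-9
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Sketch: split TOTAL_UNITS work units evenly over the tasks of a job array.
# TOTAL_UNITS and the echo standing in for a real program are assumptions.
TOTAL_UNITS=128

# Defaults allow a dry run outside of SLURM (a single task with index 0).
TASK_ID=${SLURM_ARRAY_TASK_ID:-0}
TASK_COUNT=${SLURM_ARRAY_TASK_COUNT:-1}

# Contiguous block of units for this task; assumes array indices 0..TASK_COUNT-1.
FIRST=$(( TASK_ID * TOTAL_UNITS / TASK_COUNT ))
LAST=$(( (TASK_ID + 1) * TOTAL_UNITS / TASK_COUNT - 1 ))

echo "array job ${SLURM_ARRAY_JOB_ID:-none}, task $TASK_ID: units $FIRST to $LAST"
```

The integer arithmetic distributes any remainder over the tasks, so the blocks differ in size by at most one unit.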
Combining job arrays with the other methods below is possible (e.g. with "Multiple parallel workers with srun" or "srun --multi-prog").
Multiple serial workers with srun --multi-prog
Run a job with different programs and different arguments for each task. In this case, the executable program specified is actually a configuration file specifying the executable and arguments for each task. The number of work tasks is limited by the number of SLURM tasks. Each line of the configuration file contains the following fields:
- Task rank: One or more task ranks to use this configuration. Multiple values may be comma separated. Ranges may be indicated with two numbers separated with a '-' with the smaller number first (e.g. "0-4" and not "4-0"). To indicate all tasks not otherwise specified, specify a rank of '*' as the last line of the file.
- Executable: The name of the program to execute
- Arguments: The expression "%t" will be replaced with the task's number (SLURM_PROCID). The expression "%o" will be replaced with the task's offset within this range (e.g. a configured task rank value of "1-5" would have offset values of "0-4"). Single quotes may be used to avoid having the enclosed values interpreted.
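The three fields described above can be combined as in the following sketch, which writes a configuration file for 8 tasks and passes it to srun. The file name `multi.conf` and the commands are placeholders; outside of a SLURM allocation the script only prints the file.

```shell
#!/bin/bash
#SBATCH --ntasks=8
#SBATCH --time=00:10:00

# Sketch: write a multi-prog configuration for 8 tasks and launch it.
# The file name multi.conf and the commands are placeholders for this example.
cat > multi.conf <<'EOF'
# task rank   executable   arguments
0             echo         coordinator:%t
1-3           echo         task:%t offset:%o
4,6           echo         task:%t
*             hostname
EOF

# Launch only inside a SLURM allocation; otherwise just show the file.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    srun --multi-prog multi.conf
else
    cat multi.conf
fi
```

With this file, tasks 5 and 7 are not listed explicitly and therefore fall through to the final `*` line.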
Multiple serial workers with mpiexec
a) only a few commands
b) many commands using PMI_RANK
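For the case of many commands, one possible approach is a wrapper that mpiexec starts once per rank and that picks its own command from a list by using PMI_RANK. This is a sketch under assumptions: the names `wrapper.sh`, `commands.txt` and `command_for_rank` are hypothetical, and the list is assumed to hold one serial command per line.

```shell
#!/bin/bash
# Sketch of a wrapper that mpiexec starts once per rank, for example:
#   mpiexec -n 4 ./wrapper.sh commands.txt
# The file names are placeholders; the list holds one serial command per line.

# Print line (rank + 1) of the command list, since ranks start at 0.
command_for_rank() {
    sed -n "$(( $2 + 1 ))p" "$1"
}

LIST=${1:-commands.txt}
RANK=${PMI_RANK:-0}        # PMI_RANK is set by mpiexec with Intel MPI

if [ -f "$LIST" ]; then
    CMD=$(command_for_rank "$LIST" "$RANK")
    echo "rank $RANK runs: $CMD"
    bash -c "$CMD"
else
    echo "command list $LIST not found" >&2
fi
```

The number of ranks passed to mpiexec should match the number of lines in the list; a rank without a matching line gets an empty command and does nothing.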
Multiple parallel workers with srun
srun can be used as a resource manager, which also works with OpenMP threading. The following script runs multiple job steps in parallel within an allocated set of nodes. Currently, we recommend issuing a small sleep between the submission of the tasks. The sum of the nodes involved in the job steps should not be larger than the number of allocated nodes of the job. Here we provide an example for a script where 128 work units have to be performed and up to 10 workers are running in parallel, each submitting one sub-job of 2 nodes per unit at a time. Therefore we need to allocate 20 nodes for the whole job. In this example, each sub-job uses 8 tasks-per-node (thus 16 tasks per worker in total) and 6 cpus-per-task for OpenMP threading. For that we export, as usual, the value of OMP_NUM_THREADS and invoke srun with the -c $SLURM_CPUS_PER_TASK option. Remove both (or set cpus-per-task to 1) and adjust ntasks-per-node for running without OpenMP.
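The setup described above (128 work units, 10 workers, 2 nodes per sub-job) could look roughly like the following sketch. It is a reconstruction under assumptions, not the original script: `./my_program`, the output file names, the time limit and the 1 second delay are placeholders, and outside of a SLURM allocation the script only prints the srun commands.

```shell
#!/bin/bash
#SBATCH --nodes=20
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=6
#SBATCH --time=08:00:00

# Sketch only: ./my_program, the output names and the 1 s delay are placeholders.
WORK_UNITS=128
NODES_PER_UNIT=2
NODES=${SLURM_JOB_NUM_NODES:-20}
TASKS_PER_NODE=8                            # keep in sync with --ntasks-per-node
CPUS_PER_TASK=${SLURM_CPUS_PER_TASK:-6}     # default only for dry runs
WORKERS=$(( NODES / NODES_PER_UNIT ))
TASKS_PER_UNIT=$(( NODES_PER_UNIT * TASKS_PER_NODE ))

export OMP_NUM_THREADS=$CPUS_PER_TASK

# Run one work unit as a job step; outside SLURM only print the command.
run_unit() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        srun -N "$NODES_PER_UNIT" -n "$TASKS_PER_UNIT" -c "$CPUS_PER_TASK" \
             ./my_program "$1" > "unit_$1.out" 2>&1
    else
        echo "srun -N $NODES_PER_UNIT -n $TASKS_PER_UNIT -c $CPUS_PER_TASK ./my_program $1"
    fi
}

STARTED=0
for UNIT in $(seq 1 "$WORK_UNITS"); do
    # Block while all workers are busy.
    while [ $(jobs -rp | wc -l) -ge "$WORKERS" ]; do
        sleep 1
    done
    run_unit "$UNIT" &        # run_unit, and with it srun, goes to the background
    STARTED=$(( STARTED + 1 ))
    # Small pause between submissions, as recommended above.
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        sleep 1
    fi
done

wait    # do not leave the batch script before all job steps have finished
echo "started $STARTED work units with up to $WORKERS workers"
```

Because a new unit is started as soon as any worker becomes free, this pattern also copes with work units of very different run times.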
Here the important points to note are that we background each srun with the "&", and then tell the shell to wait for all child processes to finish before exiting.
If the sub-jobs are well balanced you can, of course, do the following:
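One simple possibility for balanced sub-jobs, sketched here under the same assumptions as before (20 nodes, 2 nodes per sub-job, placeholder program `./my_program`), is to start the job steps in batches and wait for each batch to finish. Outside of a SLURM allocation the script only prints what it would start.

```shell
#!/bin/bash
#SBATCH --nodes=20
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=6

# Sketch for balanced sub-jobs: start WORKERS job steps, wait for all of them,
# then start the next batch. ./my_program is a placeholder.
WORK_UNITS=128
WORKERS=10
BATCHES=0
UNIT=1
while [ "$UNIT" -le "$WORK_UNITS" ]; do
    SLOT=0
    while [ "$SLOT" -lt "$WORKERS" ] && [ "$UNIT" -le "$WORK_UNITS" ]; do
        if [ -n "${SLURM_JOB_ID:-}" ]; then
            srun -N 2 -n 16 -c 6 ./my_program "$UNIT" > "unit_$UNIT.out" 2>&1 &
        else
            echo "would start unit $UNIT" &
        fi
        UNIT=$(( UNIT + 1 ))
        SLOT=$(( SLOT + 1 ))
    done
    wait                      # the whole batch must finish before the next starts
    BATCHES=$(( BATCHES + 1 ))
done
echo "processed $WORK_UNITS units in $BATCHES batches"
```

This avoids polling for free workers, at the cost of idle nodes whenever one sub-job in a batch runs longer than the others.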
Multiple parallel workers with mpiexec (within a node)
It is not possible to run more than one MPI program concurrently using the normal startup. However, within a node you can start an arbitrary number of mpiexec using the communication within the shared memory. Typically you have to specify the processor list for pinning to avoid overlap of the particular programs (However, you can do this if you need).
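The following sketch shows one way to start several mpiexec instances on a single node with explicit pinning via the Intel MPI variable I_MPI_PIN_PROCESSOR_LIST. The core count of 48, the instance size of 12 ranks, the helper `core_range` and the program `./my_mpi_program` are assumptions for illustration; outside of a SLURM allocation the script only prints the core ranges.

```shell
#!/bin/bash
#SBATCH --nodes=1

# Sketch: run several small MPI programs side by side on one node with Intel MPI.
# CORES_PER_NODE, the instance size and ./my_mpi_program are assumptions.
CORES_PER_NODE=48
RANKS_PER_INSTANCE=12
INSTANCES=$(( CORES_PER_NODE / RANKS_PER_INSTANCE ))

# Core range "first-last" for instance number $1 (counting from 0).
core_range() {
    first=$(( $1 * RANKS_PER_INSTANCE ))
    echo "${first}-$(( first + RANKS_PER_INSTANCE - 1 ))"
}

I=0
while [ "$I" -lt "$INSTANCES" ]; do
    CORES=$(core_range "$I")
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        # Pin each instance to its own cores so the programs do not overlap.
        mpiexec -n "$RANKS_PER_INSTANCE" -genv I_MPI_PIN_PROCESSOR_LIST "$CORES" \
                ./my_mpi_program > "instance_$I.out" 2>&1 &
    else
        echo "instance $I would be pinned to cores $CORES"
    fi
    I=$(( I + 1 ))
done
wait
```

Each instance gets a disjoint block of cores, so the number of instances times the ranks per instance should not exceed the cores available on the node.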
Here the important points to note are that we background each mpiexec with the "&", and then tell the shell to wait for all child processes to finish before exiting.
If the tasks are not run in the background, they will run one after the other. If the memory is not divided, the first srun will take the entire allocation and prevent the others from starting, which also causes the calls to mpiexec to execute sequentially.
If you want several of these mpiexec calls, you can pack the second part of the script above into a shell script, make it executable, and execute it on each of your allocated nodes with srun (as described in the previous section).
Serial commands with pexec from lrztools
pexec takes a configuration file with serial commands. The number of worker tasks may be much larger than the number of allocated cores. Within the script or wrapper, the environment variable $SUBJOB may be used to distinguish between tasks. The next free core will take the next task in the list.
Using R and redis
A simple worker queue for R functions using redis as database
Redisexec can be used in single-node mode (default) or in MPI mode (used if 'nodespertask'>1 or 'forcempi'=1). In MPI mode, redisexec will automatically split up the SLURM host file to create MPI groups of size 'nodespertask' (one of redisexec's arguments). Currently, only Intel MPI is supported.
For more documentation, please see the GitHub page.