MATLAB Parallel Server (MPS)
MPS (formerly known as MATLAB Distributed Computing Server, MDCS, in R2018b and older releases) extends the functionality of the Parallel Computing Toolbox (PCT) by allowing parallel jobs to run across multiple compute nodes. MPS jobs are handled as common parallel jobs, using the Slurm queueing system in the background. They can be submitted both from the login nodes of the Linux Cluster and from the user's remote computer (e.g., laptop or desktop PC). Both ways are briefly described below. Please also consult the MPS User Guide for use on CoolMUC-2 for detailed information and MATLAB Parallel Server for general information.
Please note: This MATLAB product is available only to particular TUM users.
Submit MPS Job to the Linux Cluster
In order to run your parallel MATLAB code with PCT + MPS, follow the steps described in the tables in the following sections.
I want to submit MPS jobs from a MATLAB session on the Linux Cluster Login Node
Step | Comment |
---|---|
1. Log in to one of the Linux Cluster login nodes, load a MATLAB module and start MATLAB | |
Getting Started: MATLAB Modules | |
2. Job configuration for CoolMUC-2 | |
>> configCluster(cluster_name, partition_name); | Run the cluster configuration. This step is mandatory. Otherwise, MATLAB will use its default cluster settings ('local' cluster), which will not work! Both the name of the cluster (e.g. cm2) and the name of the partition (= queue, e.g. cm2_std) have to be passed to configCluster(). Please check the requirements of your job, i.e., the number of tasks (workers) and tasks per node, and then set the correct cluster and partition. A consolidated sketch of steps 2-4 is shown below the table. |
>> ch = parcluster; | Create a cluster object and return the cluster object handle. |
>> % Job walltime in format hh:mm:ss => --time=00:30:00 in Slurm >> ch.AdditionalProperties.WallTime = '00:30:00'; >> % MPI tasks per node => --ntasks-per-node=28 in Slurm >> ch.AdditionalProperties.ProcsPerNode = 28; >> % additional settings: disable multi-threading and set the memory requirement >> ch.AdditionalProperties.AdditionalSubmitArgs = '--cpus-per-task=1 --mem=55G'; | Define job parameters (members of the cluster object). MPS translates all settings to the corresponding Slurm flags (needed by the sbatch command, see the documentation of the Slurm Workload Manager at LRZ). Please note that only the most important Slurm flags are provided by the cluster object; the following parameters can/must be adjusted by the user. Further Slurm flags can be added to the cluster object as a space-separated string using the field AdditionalSubmitArgs. There are pre-defined parameters which may not be changed. |
>> jobdir = fullfile(getenv('SCRATCH'), 'MdcsDataLocation/coolmuc', version('-release')); >> if ~exist(jobdir, 'dir'), mkdir(jobdir); end >> ch.JobStorageLocation = jobdir; | MPS stores both the results of the user code (MATLAB's "mat" file format) and the job output to the file system. By default, the HOME directory is used. For performance and capacity reasons, we highly recommend using the SCRATCH partition. NOTE: Depending on the use case, the output might exceed the maximum size of a mat file. The job will finish successfully; however, the data will be lost. Hence, we also recommend that the user code writes all data directly to the file system. |
>> ch.saveProfile; | Save settings. |
3. Submit MPS job to Slurm workload manager | |
>> job = ch.batch(@myfunction, n_arg_out, {arg_in_1, ..., arg_in_n}, 'Pool', np); | Submit the job, which will run the user code 'myfunction.m', by calling the batch function as a member of the cluster object. The input/output arguments are as follows: Input: @myfunction ... function handle to myfunction n_arg_out ..... number of output arguments of myfunction arg_in_# ...... list of input arguments of myfunction 'Pool', np .... key-value pair defining the size of the parallel pool = number of workers Output: job ........... job object providing all job information and member functions to control the job IMPORTANT: In addition to the np workers, one extra task is started for the job. Example: a job with 14 tasks per node and 28 workers in total therefore needs 29 tasks, so a third compute node running only a single task will be involved in the job. That results in inefficient resource usage and probably longer waiting times! |
4. Basic job control functions | |
>> job.State >> job.cancel >> myresults = job.fetchOutputs | job.State shows the current state ('queued', 'running', 'finished', 'failed'); equivalent to "squeue --clusters=cm2 --users=$USER" on the Linux command line. job.cancel cancels the job, i.e. removes it from the Slurm queue. job.fetchOutputs obtains all results (return values) of myfunction. |
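For orientation, here is a minimal sketch that consolidates steps 2-4 in one interactive session on a login node. The user function mysimulation and its argument are placeholders for your own code; cluster, partition, walltime and resource settings are examples and must be adapted to your job.

>> configCluster('cm2', 'cm2_std');                 % step 2: choose cluster and partition
>> ch = parcluster;
>> ch.AdditionalProperties.WallTime = '00:30:00';   % job walltime (--time)
>> ch.AdditionalProperties.ProcsPerNode = 28;       % tasks per node (--ntasks-per-node)
>> ch.AdditionalProperties.AdditionalSubmitArgs = '--cpus-per-task=1 --mem=55G';
>> jobdir = fullfile(getenv('SCRATCH'), 'MdcsDataLocation/coolmuc', version('-release'));
>> if ~exist(jobdir, 'dir'), mkdir(jobdir); end
>> ch.JobStorageLocation = jobdir;                  % keep job data on SCRATCH
>> ch.saveProfile;
>> % step 3: run the (hypothetical) function mysimulation(100) on a pool of 27 workers;
>> % 27 workers + 1 additional task = 28 tasks, i.e. exactly one CoolMUC-2 node
>> job = ch.batch(@mysimulation, 1, {100}, 'Pool', 27);
>> % step 4: monitor the job and collect the result once it has finished
>> job.State
>> result = job.fetchOutputs;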
I want to submit MPS jobs from a MATLAB session on my Laptop or Desktop PC
Step | Comment |
---|---|
1. Prerequisites | |
1a. MATLAB | |
Install MATLAB on your computer. | The MATLAB release has to match one of the releases supported by LRZ. Please refer to the next step. |
1b. Download the LRZ MPS configuration | |
| Using a configuration file whose release does not match the installed MATLAB release will cause MPS jobs to fail. |
1c. Extract the zip archive and install the files | |
For example, on a Linux terminal: > unzip matlab-RYYYYx.mps.remote.zip > cp -r matlab-RYYYYx.mps.remote/* <MATLAB_PATH>/toolbox/local/ | The zip file matlab-RYYYYx.mps.remote.zip contains a directory whose contents have to be copied to <MATLAB_PATH>/toolbox/local/. MATLAB_PATH refers to the base directory of your MATLAB installation. A quick check of the installation is sketched after this table. |
2. Job configuration for CoolMUC-2 | |
Please follow the instructions described in step 2 of the previous table. | |
3. Submit MPS job to Slurm workload manager | |
Please follow the instructions described in step 3 of the previous table. After execution of the batch command you will be asked for your login credentials, which are used to establish an SSH connection to the cluster. Then, the job will be transferred to the cluster and submitted via Slurm. | |
4. Basic job control functions | |
Please refer to step 4 in the previous table. | Now you are working remotely on the Linux Cluster: execute the job control functions inside your local MATLAB installation; the commands will be transferred to the cluster via SSH (in the background). |
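As a quick sanity check (not part of the official instructions), you can verify from your local MATLAB session that the files installed in step 1c are found on the MATLAB path before configuring the cluster:

>> % should print the path to configCluster.m inside <MATLAB_PATH>/toolbox/local/
>> which configCluster
>> % afterwards continue exactly as in steps 2 and 3 of the previous table
>> configCluster('cm2_tiny', 'cm2_tiny');
>> ch = parcluster;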
MPS Examples
The following table shows two examples, using either the spmd environment or a parfor loop. For convenience, the MATLAB file "job_config.m" summarizes all configuration steps and submits the job to Slurm. Using this example, you may test working with MPS both on a login node and on your remote computer. Hypothetical sketches of such a wrapper and of such example functions are shown below the table. Start MATLAB and run job_config, for example:
>> myfunction = 'myfunction_spmd';
>> % or
>> myfunction = 'myfunction_parfor';
>>
>> cluster_name = 'cm2_tiny';
>> partition_name = 'cm2_tiny';
>> walltime = '00:30:00';
>> tasks_per_node = 28;
>> num_worker = 16;
>>
>> [job,ch] = job_config(myfunction, cluster_name, partition_name, walltime, tasks_per_node, num_worker);
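The wrapper job_config.m itself is part of the LRZ example material and is not reproduced here. Purely as an illustration of what such a wrapper might look like, based only on the configuration steps above (the actual file may differ; the user function is assumed to take no input and return one output):

function [job, ch] = job_config(myfunction, cluster_name, partition_name, walltime, tasks_per_node, num_worker)
% Hypothetical sketch of a wrapper that bundles the configuration steps described above.
configCluster(cluster_name, partition_name);
ch = parcluster;
ch.AdditionalProperties.WallTime = walltime;
ch.AdditionalProperties.ProcsPerNode = tasks_per_node;
jobdir = fullfile(getenv('SCRATCH'), 'MdcsDataLocation/coolmuc', version('-release'));
if ~exist(jobdir, 'dir'), mkdir(jobdir); end
ch.JobStorageLocation = jobdir;
ch.saveProfile;
% submit the user function (assumption: no input arguments, one output argument)
job = ch.batch(str2func(myfunction), 1, {}, 'Pool', num_worker);
end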
Configuration script | Implementation of user-defined function |
---|---|
spmd example | |
parfor example | |
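The example implementations referenced above are distributed with the LRZ material and are not listed here. Purely for illustration, minimal functions of this kind could look as follows, each saved in its own .m file (the names match the calls above; the actual LRZ examples may differ):

function result = myfunction_spmd()
% Minimal spmd sketch: each worker sums its interleaved share of 1:n,
% the partial sums are combined on the client.
n = 1e5;
spmd
    part   = labindex:numlabs:n;   % this worker's share of the indices
    localS = sum(part);            % partial sum on this worker
end
result = sum([localS{:}]);          % gather the Composite and add the partial sums
end

function result = myfunction_parfor()
% Minimal parfor sketch: the loop iterations are distributed over the workers
% of the parallel pool opened via batch(..., 'Pool', np).
n = 1e5;
result = 0;
parfor i = 1:n
    result = result + i^2;          % reduction variable, summed across workers
end
end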