CoolMUC-2: Open issues after the Cluster Hardware and Software Upgrade

After the maintenance of CoolMUC-2, the system is still in test operation. Quite a number of frequently used applications are working OK, but there are still open issues with respect to the stability of the system as well as the availability of the software stack. Main issues:

  • Software: The provisioning of the LRZ software stack is still work in progress. Changes might be applied at short notice.

Please check the following tables for details on particular software packages and system issues.


Software Packages from the Spack stack that do not yet work or have restrictions

Each entry below lists the package, its module name, the status per version (OK / warning / broken), a priority where assigned, and comments.

ANSYS Mechanical (module: ansys)
  • Version 2020.R1: OK
  • Version 2019.R3: OK
  • Version 2019.R2: OK
  • All older versions: broken; they are not able to run due to library dependency errors.

All ANSYS Mechanical tests were based on Intel MPI 2018 (provided through Spack). Working modules published, documentation updated.
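
For illustration only, a distributed ANSYS Mechanical (MAPDL) batch run with the Spack-provided Intel MPI 2018 might be sketched as follows. The solver executable name (assumed here to be ansys201 for version 2020.R1), the input file name and the resource requests are assumptions; please check `module av ansys` and the module help for the exact names used on CoolMUC-2.

    #!/bin/bash
    #SBATCH --job-name=mapdl_test
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=28            # 28 cores per CoolMUC-2 node
    # Cluster/partition directives omitted; see the SLURM notes further down.

    module load ansys                       # Spack-provided module (version 2020.R1 or newer)
    # -b: batch mode, -dis: distributed solve, -np: number of MPI ranks; job.dat is a placeholder
    ansys201 -b -dis -np ${SLURM_NTASKS} -i job.dat -o job.out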

ANSYS Fluent (module: fluent)
  • Version 2020.R1: OK (with Intel MPI 2018)
  • Version 2019.R3: OK (with Intel MPI 2018)
  • Version 2019.R2: OK (with Intel MPI 2018)
  • Version 2019.R1: OK (with Intel MPI 2018)
  • Version 19.2: OK, with a warning; works only with IBM MPI: "-mpi=ibmmpi -pib.dapl", see the documentation and module comments
  • Version 19.1: OK, with a warning; works only with IBM MPI: "-mpi=ibmmpi -pib.dapl", see the documentation and module comments

The earlier ANSYS Fluent versions 19.1 and 19.2 are still provided, but they are no longer recommended and will be retired soon.
Working modules of ANSYS Fluent published on the module system. ANSYS Fluent documentation updated.
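
For the 19.x versions, the IBM MPI options quoted above are passed on the Fluent command line. The following is only a sketch under assumptions (solver type 3ddp, a journal file run.jou, a single-node run, and the module name as listed by the module system); the authoritative call is given in the LRZ Fluent documentation and the module comments.

    module load fluent                      # module and version as listed by `module av fluent`
    # -g: run without GUI, -t: number of processes, -i: journal file (run.jou is a placeholder)
    fluent 3ddp -g -t${SLURM_NTASKS} -mpi=ibmmpi -pib.dapl -i run.jou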

ANSYS CFX (module: cfx)
  • Version 2020.R1: OK (with Intel MPI 2018)
  • Version 2019.R3: OK (with Intel MPI 2018)
  • Version 2019.R2: OK (with Intel MPI 2018)
  • Version 2019.R1: OK (with Intel MPI 2018)
  • Version 19.2: OK (with Intel MPI 2018, although the software originally uses Intel MPI 2017; to be retired soon!)
  • Version 19.1: OK (with Intel MPI 2018, although the software originally uses Intel MPI 2017; to be retired soon!)

Working modules of ANSYS CFX published on the module system. ANSYS CFX documentation updated.
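
As a sketch only, a distributed CFX solver run with Intel MPI 2018 might look as follows. The definition file, the host list and the start-method string are assumptions (the exact string can differ between CFX versions); please follow the LRZ CFX documentation for the supported call.

    module load cfx                         # Spack-provided module
    # case.def and the host list are placeholders; build the list from your SLURM allocation,
    # e.g. "node1*28,node2*28"
    cfx5solve -def case.def -par-dist "node1*28,node2*28" \
              -start-method "Intel MPI Distributed Parallel"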

ANSYS ICEM/CFD (module: icem): OK; tested.

ANSYS EM (module: ansysedt): broken
  This concerns HFSS and Maxwell 2D/3D: SLES15 is not a supported OS, so these tools are almost guaranteed not to work on CM2.
  HFSS and Maxwell, Version 2019.R3, work under SLURM on SLES12 on CoolMUC-3.

ANSYS Ensight (module: ensight): OK; tested.

ANSYS Workbench (module: wb): OK; please use it on the RVS only (which still runs SLES12).

COMSOL (module: comsol): OK

Tested Version 5.5 (best chance of getting it to run):

https://www.comsol.de/system-requirements (SLES 15 is supported)
https://www.comsol.com/support/knowledgebase/1001 (Slurm is supported)

Internal MPI support: Intel(R) MPI Library, Version 2018 Update 2 Build 20180125 (id: 18157)

External MPI support: via the -mpiroot flag

  • GUI on the login nodes works (OK)
  • Single-node shared memory works (OK)
  • For >= 2 nodes (see the job script sketch below):
    • after module load intel-mpi/2018.4.274
      comsol batch -mpibootstrap slurm -mpifabrics shm:ofa -inputfile micromixer_clean.mph -outputfile out_cm2.mph -study std1 -tmpdir $TMPDIR -mpidebug 10
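
Wrapped into a batch script, the multi-node call above might look like the following sketch. The node count, wall time and cluster/partition choices are placeholders (see the SLURM notes further down); the comsol command line itself is taken from the test above.

    #!/bin/bash
    #SBATCH --job-name=comsol_test
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=28            # 28 cores per CoolMUC-2 node
    #SBATCH --time=02:00:00

    module load comsol
    module load intel-mpi/2018.4.274

    # Command line as tested above; micromixer_clean.mph is the example input used in the test
    comsol batch -mpibootstrap slurm -mpifabrics shm:ofa \
           -inputfile micromixer_clean.mph -outputfile out_cm2.mph \
           -study std1 -tmpdir $TMPDIR -mpidebug 10
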
OpenFOAM (module: openfoam): OK; Spack-provided modules are installed in Spack 20.1.1.
Paraview (module: paraview): OK; a new Spack module is installed in Spack 20.1.1.
CP2k (module: cp2k, priority 2): the ELPA version in the build is not supported; as an intermediate remedy, use `PREFERRED_DIAG_LIBRARY SL` in the `GLOBAL` section of your input (see the snippet below).
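
For illustration, the remedy amounts to adding one keyword to the GLOBAL section of the CP2K input file; the project name and run type below are placeholders.

    &GLOBAL
      PROJECT my_project                  ! placeholder project name
      RUN_TYPE ENERGY                     ! placeholder run type
      ! fall back to ScaLAPACK until the ELPA issue in the Spack build is resolved
      PREFERRED_DIAG_LIBRARY SL
    &END GLOBAL
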
Molden (module: molden): OK; fixed with `spack/staging/20.1.1`.
Cube Analysis Tool (module: cube): OK; fixed with `spack/staging/20.1.1`.

MSC Nastran (module: mscnastran): OK
  • Version 20182: OK, tested
  • Version 20190: OK, tested
  • Version 20191: OK, tested
  • Version 20200: OK, tested

Scalable Molecular Dynamics (module: namd2)
NetCDF (module: netcdf): OK; a new Spack module is installed in Spack 20.1.1.
Quantum Espresso (module: quantum-espresso): OK; fixed with `spack/staging/20.1.1`.
Scalasca Analysis Toolkit (module: scalasca): OK; fixed with `spack/staging/20.1.1`.

Siemens PLM StarCCM+ (modules: starccm, starccm_dp)
  • Version 2020.1.1: OK; fixed for Intel MPI 2018 ("-mpi intel"), works with "-mpi openmpi -fabric ofi"
  • Version 2020.1: OK; fixed for Intel MPI 2018 ("-mpi intel"), works with "-mpi openmpi -fabric ofi"
  • Version 2019.3.1: OK; fixed for Intel MPI 2018 ("-mpi intel"), works with "-mpi openmpi -fabric ofi"
  • Version 2019.2.1: OK; tested and working with Intel MPI 2018 ("-mpi intel"), works with "-mpi openmpi -fabric ibv"

Working modules of StarCCM+ published on the module system. StarCCM+ documentation updated.
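
As a sketch only: the MPI selection flags listed above are passed on the starccm+ command line. The simulation file name and the single-node resource handling are assumptions; please follow the LRZ StarCCM+ documentation for the supported multi-node call.

    module load starccm                     # or starccm_dp for the double-precision build
    # -batch run: run the simulation without GUI, -np: number of processes; my_case.sim is a placeholder
    starccm+ -batch run -np ${SLURM_NTASKS} -mpi intel my_case.sim
    # Alternative MPI selection noted above:
    # starccm+ -batch run -np ${SLURM_NTASKS} -mpi openmpi -fabric ofi my_case.sim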

Intel MPI legacy versions (module: intel-mpi/2018): OK

Spack module provided and successfully tested.

As long as cgroups are deactivated (see the corresponding entry in the system services table below), Out-of-Memory errors at the end of Intel MPI 2018 jobs should no longer occur.
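
A minimal usage sketch, assuming an MPI executable ./my_app built against this library; the exact module version string should be taken from `module av intel-mpi`.

    module load intel-mpi/2018.4.274        # legacy Intel MPI module from the Spack stack
    mpiexec -n ${SLURM_NTASKS} ./my_app     # ./my_app is a placeholder for your MPI program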

Matlab (modules: matlab/R2019a_Update5-generic, matlab/R2019b-generic): OK
  • Matlab itself works (batch jobs, interactive mode)
  • Slurm jobs submitted from MPS: fixed
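
A minimal batch sketch, assuming a script my_script.m in the submit directory and one of the modules listed above; the -batch switch is available from MATLAB R2019a onwards.

    module load matlab/R2019b-generic
    # run my_script.m non-interactively; MATLAB exits when the script finishes
    matlab -batch "my_script"
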
GNU Compiler 9.3.0 (module: gcc/9.3.0(-nv)): OK

The g++ compiler requires libiconv.so for linking (the module is marked with -g++=broken). If you encounter this problem, use the new module with the -fixed suffix. This will be resolved in the release version of the Spack software stack.

System Services that do not work or have operational restrictions

Each entry below lists the item, its current status, and comments.

dssusrinfo command
  Status: fully available since 13:00


SLURM control of resource usage
  Status: workaround applied
  cgroups have for now been deactivated, since they appear to have too many side effects (spurious out-of-memory kills).

Filesystem issues
  Status: correction applied to the Infiniband setup
  We currently believe that job failures caused by modules being unable to load their dependencies (due to non-availability of the GPFS filesystem) should be resolved.

salloc does not work
  Status: available
  Please use lxlogin1, ..., lxlogin4 to submit interactive jobs on cm2_inter!
  Please use lxlogin8 to submit interactive jobs on mpp3_inter!
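
For illustration, an interactive allocation on cm2_inter might be requested as follows from one of the lxlogin1, ..., lxlogin4 nodes. The node count, wall time and any additional cluster selection flag required by the LRZ multi-cluster setup are assumptions; please consult the LRZ documentation on interactive jobs.

    # run on lxlogin1 ... lxlogin4; request 1 node for 30 minutes in the cm2_inter partition
    salloc --partition=cm2_inter --nodes=1 --time=00:30:00
    # inside the allocation, start your program with srun (./my_app is a placeholder)
    srun ./my_app
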
Sporadic job crashes due to node failure
  Status: available
  The node failure problems should now be resolved.
  However, you now need to submit the affected jobs to the new, separate SLURM cluster cm2_tiny:
  • The specification of the partition is not mandatory.
  • The specification of "qos" is no longer needed.
  Use the following directives:
#SBATCH --cluster=cm2_tiny
#SBATCH --partition=cm2_tiny

General advice

Although jobs on the cluster cm2_tiny work without setting the partition name, we highly recommend defining both the cluster name and the partition name in all job scripts, as in the sketch below.
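
A minimal job script sketch for cm2_tiny: the cluster and partition directives are the ones given above, while the resource requests, wall time, setup module and executable name are placeholders or assumptions.

    #!/bin/bash
    #SBATCH --job-name=my_job
    #SBATCH --cluster=cm2_tiny              # recommended: always set the cluster ...
    #SBATCH --partition=cm2_tiny            # ... and the partition explicitly
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=28            # 28 cores per CoolMUC-2 node
    #SBATCH --time=01:00:00

    module load slurm_setup                 # assumption: LRZ job setup module; check the LRZ docs
    srun ./my_app                           # ./my_app is a placeholder for your executable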

ANSYS Mechanical with User Fortran fails
  Status: resolved in spack/staging/20.1.1
  Issue fixed and spack/staging/20.1.1 rolled out.