Writing good support requests is not only good for the support team, it is also better for you!
We receive a large number of LRZ Support Requests every day, so the time needed to understand and analyze each problem is an important factor. The easier it is to understand your intention and the issue you observed, the faster we can provide you with a helpful answer. Below is a list of good practices.
Please take a minute to read this document to improve your communication with us at LRZ HPC Application Support. Thank you.
Never send support requests to staff members directly
Always enter your LRZ Support Requests via the webpage of the LRZ Service Desk. Staff members will pick them up there and will get back to you shortly. Requests on the LRZ Service Desk are tracked and have higher visibility. Some service requests require staff with a certain specialization, and some staff members work in support only part time. A staff member you email directly might be busy with other work, tied up in a week-long HPC course, or on vacation for the next 4 weeks.
So sending your request to the LRZ Service Desk makes sure that somebody will pick it up shortly.
If you submit an LRZ Service Request, be responsive!
Is it a good idea to submit an LRZ support request and then go on a business trip or vacation for 2 weeks? Most likely not.
The LRZ Service Desk and the CXS Application Support Group are staffed by engaged and highly skilled scientists who like to help you as quickly as possible. Even if you follow the advice on this page about writing a good and understandable LRZ Service Request, there may be open questions, something to test on your side, or additional material that needs to be submitted to permit further investigation. So please be prepared to respond to additional questions from our experts. If we do not receive a response from you over a longer period of time, you have to expect that your LRZ Service Request will be closed. You would then have no answer to your question, and you would only have caused unnecessary effort and frustration.
Provide us with a descriptive explanation of your issue
Give your request a descriptive subject
Your subject line should be descriptive and self-explanatory.
Example of a subject that is not very descriptive (but that we nevertheless receive frequently):
It does not work!
Something like "Problem with MPI" is not a very good support request subject either, since it could be valid for basically every support request we get. The support staff is a team. The subjects are the first thing that we see. We would like to be able to classify the support requests according to subjects before even opening the support request. And we would like to assign the support request to the most knowledgeable expert for your issue.
So it is good practice to mention the following information already in the subject:
- What is the name of your application
- If it is your own program, mention at least the scientific domain of your application like CFD, FEM, Astro, Geo, Life Science, etc.
- On which HPC system / Linux Cluster you are experiencing the issue
So a better example of a descriptive support request subject could read like this:
DLR TAU code: Issue on CoolMUC-2 with HDF5 library
Provide the most important information directly in your LRZ Service Request
Most of the LRZ Service Requests we receive do not contain enough information to get started with the analysis of the issue. You can substantially speed up the work on your reported issue by providing the following information right away in your first LRZ Service Request:
- Is the observed issue reproducible?
- Name of the application
- If the application has been installed by yourself, please provide the installation path
- Name of the LRZ HPC system you are currently working on (SuperMUC-NG, CoolMUC-2/3, TeraMem, RVS, HPC Cloud, R-Studio,...)
- Please provide us with a copy of your SLURM script as an attachment to your support request
- Please provide us with the path of the SLURM script submission folder (where you call "sbatch ...") and, if different, the path of your working directory with the input files of your job
- Please provide us with the received error messages as text files, not as images / screenshots, wherever applicable
- In case of license errors for commercial software applications (e.g. Intel, ANSYS, Abaqus, StarCCM+, Comsol,...):
  - Which license server are you accessing?
  - Which type of license would you like to receive (Research, Teaching,...)?
  - Did you have access to this license in the past, or are you newly requesting access to this type of license?
- Name of your user account (UID, whoami, $USER)
- Name of your computer (hostname, $HOSTNAME)
- In case of SSH access issues, please provide us with the full output of the ssh connection attempt with increased verbosity: "ssh -vvv <your_target_host> -l $USER"
Please do not take screenshots of your ssh terminal and send us pictures (jpg, png, tiff…) of what you saw on your monitor! We cannot cut & paste commands or error messages from pictures, which unnecessarily slows down our research on your problem. Your sample output does not need to “look good” at all, and we don’t need to know what fancy ssh or terminal software you have: a simple text-based cut & paste directly into the mail or ticket is what we can work with best.
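The account, host and SSH details requested above can be collected as plain text with a few standard commands. The following is a minimal sketch and not an official LRZ tool: the function names and output file names are hypothetical, and `<your_target_host>` remains a placeholder for the host you cannot reach.

```shell
#!/bin/bash
# Hypothetical helper: write basic account and host details to a text file.
collect_basic_info() {
    outfile=${1:-support_info.txt}
    {
        echo "Date:     $(date)"
        echo "User:     $(whoami)"
        echo "Hostname: $(hostname)"
        echo "Workdir:  $(pwd)"
    } > "$outfile"
}

# Hypothetical helper: save the verbose output of an SSH attempt as text.
# ssh prints its -vvv debug messages and its errors on stderr,
# so only stderr is redirected into the file.
capture_ssh_debug() {
    target_host=$1
    outfile=${2:-ssh_debug.txt}
    ssh -vvv "$target_host" -l "$USER" 2> "$outfile"
}

# Example usage (replace the placeholder before running):
# collect_basic_info support_info.txt
# capture_ssh_debug <your_target_host> ssh_debug.txt
```

Both resulting files are plain text and can be attached to the request as they are.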
New problem – new LRZ Service Request
Please do not use one and the same LRZ Service Request to send us a whole catalogue of unrelated questions. And do not send support requests by replying to unrelated and older (already resolved) issues.
Every reported issue gets a number, and this is the number that you see in the reply. Replying to unrelated issues means that your email gets filed under the wrong thread and risks being overlooked or ending up with the wrong LRZ support staff member. Combining several unrelated issues in one LRZ Service Request will substantially slow down the analysis and delay our answers to you, since several different experts might need to get involved and a single thread cannot be shared with them at the same time.
Do not use us as a "human interface" to the documentation or as simple “Let me Google that for you” assistants
Seriously: have you searched the internet with the exact error message and the name of your application…? Other scientists may have had the very same problem and might have been successful in solving it. By the way: that’s almost always how we start to research, too…
Reporting an issue
Specify your environment
Did you or a colleague compile the code yourselves? Where and how?
Which modules were loaded before the code was executed? Are you using an unmodified LRZ user environment, or are you one of the users who load dozens of modules in their .bashrc or .profile (usually not a good idea, since the user environment differs across HPC systems)? Do you rely heavily on Linux aliases? If you use non-default modules, a modified user environment, aliases, etc. and do not tell us about it, we will waste time debugging in a different environment, or we will not be able to reproduce the reported issue at all.
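One way to answer these questions is to dump the relevant parts of your environment into a text file. The sketch below assumes a bash shell; the function name and the default file name are hypothetical. It should be defined and called in the same interactive shell in which you normally run your code, because aliases and modules loaded from .bashrc or .profile are only visible there.

```shell
#!/bin/bash
# Hypothetical helper: record the current user environment in a text file.
collect_environment() {
    outfile=${1:-environment_info.txt}
    {
        echo "=== Loaded modules ==="
        # "module" is usually a shell function that only exists where an
        # environment modules system is set up. It often prints on stderr,
        # so both output streams are captured.
        if type module > /dev/null 2>&1; then
            module list 2>&1
        else
            echo "module command not available in this shell"
        fi
        echo "=== Aliases ==="
        alias
        echo "=== Environment variables ==="
        env | sort
    } > "$outfile"
}

# Example usage:
# collect_environment environment_info.txt
```

Since the environment variables may contain information you do not want to share, it is worth reading through the file before attaching it.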
Simple cases: Be specific, include commands and errors
Whatever you do, don’t say that “X didn’t work”. Give the exact commands you ran, the environment (see above), and the error messages that were output. The actual error messages mean a lot: include all of the output and do not cut it down because you think that parts of it might not be important for our analysis. It is easy to include it in your LRZ Service Request as an attached text file.
The better you describe the problem the less we have to guess and ask.
Sometimes, just seeing the actual error message is enough to give a useful answer. For all but the simplest cases, you will need to make the problem reproducible, which you should always try anyway. See the following points.
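Capturing the exact command together with its complete output can be done with ordinary shell redirection. The following is a minimal sketch; the function name `run_and_log`, the log file name and the program name in the usage line are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: run a command and keep its complete output as text.
# Usage: run_and_log LOGFILE COMMAND [ARGS...]
run_and_log() {
    logfile=$1
    shift
    {
        echo "Command:   $*"
        echo "Directory: $(pwd)"
    } > "$logfile"
    # Append stdout and stderr together so error messages keep their context.
    "$@" >> "$logfile" 2>&1
    exit_code=$?
    echo "Exit code: $exit_code" >> "$logfile"
    return "$exit_code"
}

# Example usage with a hypothetical program name:
# run_and_log run.log ./my_program input.dat
```

The log then contains the command line, the working directory, everything the program printed, and its exit code in a single attachable text file.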
Complex cases: Create an example which reproduces the problem
Create an example that demonstrates the problem and that we can ideally just copy to our own accounts and run. Otherwise it is very time consuming if the support team needs to write input files and run scripts based on your possibly incomplete description (see also the next point). Make this example available to us, e.g. in a separate folder in your file space, and tell us where it is. We do not search for or read read-protected files or interfere with your current simulation runs without your explicit permission.
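Preparing such a folder could look like the sketch below. It assumes that opening the copy for reading via standard file permissions is acceptable for your data and sufficient for the support team; this page does not say which access mechanism LRZ prefers, and the function name and paths are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: copy an example into a separate folder and make the
# copy readable for other users.
# Usage: share_example SOURCE_DIR DEST_DIR
share_example() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir"
    # Copy the contents of the source folder, including hidden files.
    cp -r "$src_dir"/. "$dest_dir"/
    # a+r: readable for everyone; X: directories (and files that are
    # already executable) also become traversable/executable for everyone.
    chmod -R a+rX "$dest_dir"
    # Print the absolute path so it can be pasted into the request.
    echo "Example folder: $(cd "$dest_dir" && pwd)"
}

# Example usage with hypothetical paths:
# share_example ./my_failing_case "$HOME/support_example"
```

The parent directories of the shared folder must also be traversable for the copy to be reachable, and working on a copy leaves your running simulations untouched.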
Make the example as small and fast as possible
Suppose you run a calculation which crashes after running for one week on thousands of CPU cores. You are tempted to write to support right away with this example, but this is not a good idea. Before you send a support request, first try to reduce the example. Quite probably the crash can be reproduced with a much smaller example (less CPU time and fewer cores, smaller system size or grid or input data). It is so much easier to schedule and debug a problem which crashes after a few seconds than a run which crashes after many hours. Of course this requires some effort from you, but this is what we expect from you in order to create a useful support request. Often, while isolating the problem, the problem and its solution crystallize before you even write the support request.
The XY problem
This is a classic problem in user support. Please read http://xyproblem.info. Often we know the solution but sometimes we don’t know the problem.
In short (quoting from http://xyproblem.info):
- User wants to do X.
- User doesn’t know how to do X, but thinks they can fumble their way to a solution if they can just manage to do Y.
- User doesn’t know how to do Y either.
- User asks for help with Y.
- Others try to help user with Y, but are confused because Y seems like a strange problem to want to solve.
- After much interaction and wasted time, it finally becomes clear that the user really wants help with X, and that Y wasn’t even a suitable solution for X.
To avoid the XY problem, if you struggle with Y but really what you are after is X, please also tell us about X. Tell us what you really want to achieve. Solving Y can take a long time. We have had cases where after enormous effort on Y we realized that the user wanted X and that Y was not the best way to achieve X on the available LRZ HPC resources, while at the same time the problem X could have been solved with a little effort and consulting by using method Z.