2018-06-21 PRACE Workshop: HPC code optimisation workshop

Date:	Thursday, June 21 - Friday, June 22 2018, 9:00-18:00
Location:	LRZ Building, University campus Garching, near Munich, Kursraum (lecture room) 2, H.U.010
Contents:	In the ever-growing complexity of computer architectures, code optimization has become the main route to keep pace with hardware advancements and effectively make use of current and upcoming High Performance Computing systems.

Have you ever asked yourself: Where does the performance of my application lie? What is the maximum speed-up achievable on the architecture I am using? Does my implementation meet the HPC objectives?

In this workshop, we will answer these questions and provide a unique opportunity to learn techniques, methods and solutions for improving code, enabling new hardware features and using the roofline model to visualize the potential benefits of an optimization process.

We will begin with a description of the latest microprocessor architectures and of how developers can efficiently use modern HPC hardware, in particular the vector units (via SIMD programming and AVX-512 optimization) and the memory hierarchy. Attendees are then guided through the optimization process by means of hands-on exercises and learn how to enable vectorization using simple pragmas and more effective techniques, such as changing data layout and alignment. The work is guided by hints from the Intel® compiler reports and by Intel® Advisor. We also provide an N-body code to support the described optimization solutions with practical hands-on work.

The course is a PRACE training event.

Learning Goals

Through a sequence of simple, guided examples of code modernization, attendees will develop an awareness of the features of multi- and many-core architectures that are crucial for writing modern, portable and efficient applications. A special focus will be dedicated to scalar and vector optimizations for the latest Intel® Xeon® Scalable processor, code-named Skylake, which will be used in the upcoming SuperMUC-NG machine at LRZ. The tutorial consists of presentations and demo sessions.
Attendees will be given access to Skylake processors and Intel® tools through VM instances on Google Cloud Platform. The workshop alternates lectures with practical sessions. The outline is as follows:

Day 1
09:00-09:45  Introduction
09:45-10:30  Login to Google cloud machines
10:30-11:00  Coffee Break
11:00-12:00  Code modernization approach
12:00-12:30  Scalar optimization
12:30-13:30  Lunch
13:30-14:30  Compiler autovectorization
14:30-15:00  Data layout from AoS to SoA
15:00-15:30  Coffee Break
15:30-16:00  Memory access optimization
16:00-16:30  SDLT (Intel SIMD Layout Templates)
16:30-17:00  Explicit vectorization
17:00-17:45  Skylake optimization
17:45-18:00  Wrap-up

Day 2
09:00-09:30  Introduction to roofline model
09:30-10:30  Intel Advisor analysis
10:30-11:00  Coffee Break
11:00-12:30  Intel Advisor hands-on
12:30-13:30  Lunch
13:30-14:00  What’s new in Intel Advisor 2019
14:00-15:00  Introduction to MKL
15:00-15:30  Coffee Break
15:30-16:30  Hands-on MKL
16:30-17:00  What’s new in Intel Parallel Studio 2019
17:00-17:30  Open discussion and feedback
17:30-18:00  Wrap-up

Please bring your own laptop (with X11 support and an ssh client installed) for the hands-on sessions! For GUI applications, vncviewer must be installed (https://www.realvnc.com/en/connect/download/viewer/).

About the Lecturers

Fabio Baruffa is a software technical consulting engineer in the Developer Products Division (DPD) of the Software and Services Group (SSG) at Intel. He works in the compiler team and provides customer support in the high performance computing (HPC) area. Before joining Intel, he worked as an HPC application specialist and developer at the largest supercomputing centers in Europe, mainly the Leibniz Supercomputing Center and the Max Planck Computing and Data Facility in Munich, as well as Cineca in Italy. He has been involved in software development, analysis of scientific code and optimization for HPC systems. He holds a PhD in Physics from the University of Regensburg for his research in the area of spintronic devices and quantum computing.

Luigi Iapichino is a scientific computing expert at LRZ and a member of the Intel Parallel Computing Center (IPCC). His main tasks are code modernization for many-core systems and HPC support. He received a PhD in physics from TU München in 2005, working at the Max Planck Institute for Astrophysics. Before moving to LRZ in 2014, he worked at the Universities of Würzburg and Heidelberg, where he was involved in research projects related to computational astrophysics.
Prerequisites:	Attendees should be comfortable with either the C/C++ or Fortran programming language and with basic Linux commands such as make and ssh. Installing a VNC viewer is helpful. No previous experience with vectorization, parallelization or profiling tools is required.
Language:	English
Teachers:	Dr. Fabio Baruffa (Intel), Dr. Luigi Iapichino (LRZ), Zakhar Matveev (Intel)
PRACE-PAGE:	https://events.prace-ri.eu/event/727/
Registration:	https://events.prace-ri.eu/event/727/registration/register
Contact:	Dr. Volker Weinberg (LRZ)
