Compiler Options

| Option | Description |
| --- | --- |
| -O0 | Disables all optimizations. Recommended for program development and debugging. |
| -O1 | Enables optimization for speed while limiting code size (e.g. no loop unrolling). |
| -O2 | Default optimization. Optimizes for speed, including global code scheduling, software pipelining, predication, and speculation. |
| -O3 | -O2 optimizations plus more aggressive optimizations such as prefetching, scalar replacement, and loop transformations. Intended for technical computing applications with loop-intensive code (loop optimizations and data prefetch). |
| -Oi | Inline expansion of intrinsic functions. |
| -xSSE4.2 | On the Westmere fat nodes of SuperMUC Phase 1: generates SSE4.2 instructions. |
| -xAVX | On the Sandy Bridge cores of the thin nodes of SuperMUC: generates Intel Advanced Vector Extensions (AVX). |
| -xCORE-AVX2 | On the Haswell cores of SuperMUC Phase 2 or on CoolMUC2 of the Linux-Cluster: generates Advanced Vector Extensions 2 (AVX2). |
| -xMIC-AVX512 | On the Knights Landing many-core nodes of the Linux-Cluster: may generate Advanced Vector Extensions 512 (AVX-512). |
| -xCORE-AVX512, -xCOMMON-AVX512 | On Skylake nodes: may generate Intel® Advanced Vector Extensions 512 (AVX-512). |
| -xHost | Tells the compiler to generate instructions for the highest instruction set available on the compilation host. |
| -xSANDYBRIDGE, -xHASWELL, -xKNL, -xSKYLAKE-AVX512 | May generate instructions for processors that support the specified Intel® microarchitecture code name. These keywords are only available with Intel compilers version 18.0 and higher. |
| -axcode1,code2 | Tells the compiler to generate multiple, processor-specific auto-dispatch code paths for Intel processors if there is a performance benefit, plus a baseline code path that can run on non-AVX processors. The compiler looks for functions that can take advantage of the specified instruction features and first checks whether a feature-specific version is likely to result in a performance gain. If so, it generates both a feature-specific version and a baseline version of the function; at run time, one of them is chosen depending on the Intel(R) processor in use. The processor-specific path is usually more optimized than the baseline path. May generate Intel(R) Advanced Vector Extensions 2 (AVX2), AVX, SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, and SSE instructions for Intel(R) processors. Example: -axAVX,CORE-AVX2 generates three versions: baseline, Sandy Bridge, and Haswell. |
| -qopt-zmm-usage=low | Tells the compiler that the compiled program is unlikely to benefit from zmm register usage. The compiler avoids zmm registers unless it can prove a gain from using them (default for CORE-AVX512). |
| -qopt-zmm-usage=high | Tells the compiler to generate zmm code without restrictions (default for COMMON-AVX512). |
| -fno-alias | Specifies that aliasing should not be assumed in the program. Allows the compiler to generate faster code. |
| -ftz | Flushes denormal results to zero (default with -O3). |
| -ipo | Enables interprocedural (IP) optimizations, e.g. inline function expansion for calls to functions defined in separate files. |
| -p | Compiles and links for function profiling with gprof. |
| -prof_use | Uses previously collected profiling information during optimization. |
| -g | Produces a symbol table, i.e. line numbers are available for profiling. |
| -openmp | Enables the parallelizer to generate multithreaded code based on OpenMP directives. |
| -parallel | Tells the auto-parallelizer to generate multithreaded code for loops that can be safely executed in parallel. To use this option, you must also specify -O2 or -O3. |
| -opt_report | Generates an optimization report to stderr. |
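
As an illustration of how the options above might be combined, the following sketch shows a small Fortran kernel with candidate build lines in its leading comments. The program, the file name axpy.f90, and the use of the ifort driver are assumptions made for this example. The option spellings are taken from the list above and may differ between compiler versions.

```fortran
! Hypothetical build lines; the ifort driver name and the file name axpy.f90
! are assumptions, the options are those described above:
!   ifort -O0 -g axpy.f90 -o axpy                 (development and debugging)
!   ifort -O2 -xCORE-AVX2 axpy.f90 -o axpy        (targets Haswell nodes)
!   ifort -O2 -axAVX,CORE-AVX2 axpy.f90 -o axpy   (baseline, Sandy Bridge and Haswell paths)
program axpy
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000000
  real(dp), allocatable :: x(:), y(:)
  real(dp) :: a
  integer :: i

  allocate(x(n), y(n))
  a = 2.0_dp
  x = 1.0_dp
  y = 3.0_dp

  ! A simple, vectorizable loop. Which vector instructions are emitted
  ! depends on the -x / -ax setting chosen at compile time.
  do i = 1, n
     y(i) = y(i) + a * x(i)
  end do

  ! Print a checksum so the loop cannot be optimized away.
  print *, 'y(1) =', y(1), '  sum(y) =', sum(y)

  deallocate(x, y)
end program axpy
```

A binary built with a single -x target is tied to processors that support that instruction set, whereas the -ax build keeps a baseline path and therefore suits mixed node types.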

Compiler Directives for the Intel compiler

The following table lists the source code directives supported by the Intel Fortran compiler that help with tuning or debugging applications. Note that in fixed source format the "!" comment symbol in the first column must be replaced with a "c" comment symbol.


| Directive | Meaning |
| --- | --- |
| !DEC$ ivdep | Ignore vector dependencies |
| !DEC$ loop count (N) | Software pipelining hint |
| !DEC$ distribute point | Split large loop |
| !DEC$ unroll [(N)] | Unroll inner loop N times. Compiler heuristics are used if N is omitted. |
| !DEC$ nounroll | Do not unroll loop |
| !DEC$ prefetch A | Prefetch array A |
| !DEC$ noprefetch A | Do not prefetch array A |
| !DEC$ vector [CLAUSE] | Vectorize loop; the CLAUSE values are given below the table |
| !DEC$ novector | Do not vectorize loop |

CLAUSE = { ALWAYS [ASSERT]|ALIGNED|UNALIGNED|TEMPORAL|NONTEMPORAL [(var1 [, var2]...)] }

For further details, please see the compiler documentation.
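
The placement of such directives can be sketched as follows, in free source format with the directive starting in the first column, directly before the loop it applies to. The program and all variable names are hypothetical. The parenthesized unroll count follows the form used in Intel's documentation and should be checked against the documentation of the compiler version in use.

```fortran
program directive_demo
  implicit none
  integer, parameter :: n = 1024
  real :: a(n), b(n), c(n)
  integer :: idx(n), i

  do i = 1, n
     idx(i) = n + 1 - i     ! a permutation, so no index occurs twice
     a(i) = 0.0
     b(i) = real(i)
  end do

  ! Indirect addressing hides from the compiler that the iterations are
  ! independent. ivdep asserts it; this is only safe because idx has no
  ! repeated entries (an assumption guaranteed by the setup loop above).
!DEC$ ivdep
  do i = 1, n
     a(idx(i)) = a(idx(i)) + b(i)
  end do

  ! Ask for the loop to be unrolled four times.
!DEC$ unroll (4)
  do i = 1, n
     c(i) = 2.0 * b(i)
  end do

  ! Very short loop: tell the compiler not to vectorize it.
!DEC$ novector
  do i = 1, 3
     c(i) = c(i) + 1.0
  end do

  ! Checksums keep the loops from being optimized away.
  print *, 'sum(a) =', sum(a), '  sum(c) =', sum(c)
end program directive_demo
```

Because ivdep overrides the compiler's own dependence analysis, it is best reserved for loops where independence is known from the data, as with the permutation used here.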
