Comparison of Compiler Options (Intel vs. PGI vs. GCC)

The following sections discuss the most commonly used compiler switches and extensions implemented by the Intel, PGI, and GNU (gcc, gfortran) compilers, including an overview of the available optimization switches. If you experience difficulties with an optimized build, you may have to progressively switch some of these optimizations off again.


Each entry lists the option for the Intel, PGI, and GNU (gcc, gfortran) compilers, followed by its meaning and comments.

Optimization

Intel: -O[0-3]
PGI: -O[0-3]
gcc, gfortran: -O[0-3]
Meaning: Specifies the code optimization level for applications.
Comments: -O0 specifies no optimization, whereas -O3 specifies the highest optimization level (see the compiler documentation for further details).

Intel: -fast
PGI: t.b.d.
gcc, gfortran: -Ofast
Meaning: Maximizes speed across the entire program.
Comments: For the Intel compiler, this sets the options -ipo, -O3, -no-prec-div, -static, and -xHost.

Intel: -xHost
PGI: t.b.d.
gcc, gfortran: -march=cpu-type
Meaning: Tells the compiler to generate instructions for the highest instruction set available on the compilation host processor.
Comments: If Host is used, the highest instruction set of the compilation host is used.
Host may be replaced by AVX-512, AVX2, AVX, SSE4.1, etc.
Starting with version 19, Host may also be SKYLAKE-AVX512, SKYLAKE, etc.
cpu-type may be replaced by haswell, skylake, etc.
For GCC, -march=skylake-avx512 -mprefer-vector-width=512 -ftree-vectorizer-verbose=5 -fopt-info-vec-missed can be used.

Intel: -opt-streaming-stores [always|never|auto], or use the source code directive !DEC$ VECTOR NONTEMPORAL instead.
PGI: -Mnontemporal
Meaning: Some programs may slow down with -fastsse due to the prefetches used. Adding -Mnontemporal offers a different data movement scheme which may improve performance.
Comments: Worth a try during code tuning. May be especially useful for memory-bound code, since this supports cache bypass for streaming writes.
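
As an illustration of the kind of loop these switches target, here is a minimal sketch in C. The function name triad, the file name triad.c, and the exact compile line are assumptions made for this example; whether the compiler actually vectorizes the loop or uses streaming stores depends on compiler version and target.

```c
/* Sketch only: a memory-bound loop whose destination is written once and
 * not read back, i.e. a candidate for vectorization and streaming stores.
 * One possible build, using flags from the table above (triad.c is a
 * hypothetical file name):
 *   gcc -O3 -march=haswell -fopt-info-vec-missed -c triad.c
 */
#include <stddef.h>

/* Computes a[i] = b[i] + s * c[i] for i = 0 .. n-1. */
void triad(double *a, const double *b, const double *c, double s, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        a[i] = b[i] + s * c[i];
    }
}
```

Comparing run times of such a kernel with and without the streaming-store option is a simple way to judge whether the switch helps a given code.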

Code Transformations, Aliasing and Interprocedural Optimization

Intel: -fno-alias (-fno-fnalias)
PGI: n/a
gcc, gfortran: n/a
Meaning: Assume no aliasing (within functions).
Comments: This may give a considerable performance increase. Beware: check your code yourself for pointer aliasing!

Intel: -unroll[<number>]
PGI: -Munroll[=n:<number>]
gcc, gfortran: -funroll-loops, -funroll-all-loops
Meaning: Unroll loops.
Comments: <number> (optional) gives the maximum number of times for unrolling. 0 disables unrolling; omitting it enables compiler heuristics for unrolling. Note that for the Intel compiler you can instead use a source code directive
!DEC$ UNROLL(<number>)
       do i=1,imax
         ...
in your code, which might be more useful.

Intel: -ip
PGI: -Minline[=option[,option,...]]
gcc, gfortran: -finline-functions
Meaning: Enables interprocedural optimizations for single-file compilation.
Comments: Performs inline function expansion for calls to functions defined within the current source file. For Intel compilers, you can disable the full or partial inlining enabled by this option by also specifying -ip_no_inlining or -ip_no_pinlining. For the PGI compiler, please check the man page and the user's guide for more information on inlining.

Intel: -ipo
PGI: -Minline and -Mextract with suboptions
gcc, gfortran: -flto (-fwhole-program)
Meaning: Enables multifile interprocedural (IP) optimizations (between files).
Comments: Performs inline function expansion for calls to functions defined in separate files. For the Intel compiler, a set of source files must be specified as an argument. For the PGI compiler, an inline library must be explicitly created.
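
The warning about checking for pointer aliasing can be made concrete with a short C sketch. Instead of a global no-aliasing switch, C99 lets you state the promise per function with the restrict qualifier. The names scale and ranges_overlap are hypothetical helpers invented for this example, not part of any compiler or library.

```c
#include <stddef.h>
#include <stdint.h>

/* restrict promises the compiler that dst and src never overlap. This is a
 * per-function, source-level way of stating the no-aliasing assumption. */
void scale(double *restrict dst, const double *restrict src, double f, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = f * src[i];
    }
}

/* Hypothetical helper: returns 1 if the two n-element ranges overlap in
 * memory, 0 otherwise. Calling scale() on overlapping ranges would break
 * the promise made above. */
int ranges_overlap(const double *p, const double *q, size_t n)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t b = (uintptr_t)q;
    uintptr_t len = (uintptr_t)(n * sizeof(double));
    return a < b + len && b < a + len;
}
```

A check like ranges_overlap can be placed in debug builds in front of calls whose arguments come from user input. Once the code is known to be alias-free, the promise is safe to make.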

Linkage Options

Intel: -c
PGI: -c
gcc, gfortran: -c
Meaning: Compile only, do not link.
Comments: This follows conventional usage.

Intel: -Ldir
PGI: -Ldir
gcc, gfortran: -Ldir
Meaning: Look for libraries in dir as well.
Comments: This follows conventional usage.

Intel: -lmylib
PGI: -lmylib
gcc, gfortran: -lmylib
Meaning: Link with library libmylib.{a|so}.
Comments: This follows conventional usage.

Intel: -[no-]heap-arrays
PGI: n/a
gcc, gfortran: n/a
Meaning: Allocate automatic arrays on the heap (Fortran; the default is to allocate on the stack, which may lead to trouble for low stack limits).

Intel: -auto
Meaning: Direct all local variables to be automatic (Fortran).

Intel: n/a
PGI: -g77libs
gcc, gfortran: n/a
Meaning: Add GNU Fortran libraries.
Comments: Needed if g77-built objects are to be linked correctly. The Intel Compiler does not support this.

Source format and Preprocessing

Intel: -FI or -fixed [-72|-80|-132]
PGI: -Mfixed
Meaning: Fixed format source code [with possibly extended width].
Comments: Source file extension .f (Intel: also .ftn .for) automatically assumes fixed form.

Intel: -FR or -free
PGI: -Mfree
Meaning: Free format source code.
Comments: Source file extension .f90 automatically assumes free form.

Intel: -fpp [-P]
PGI: -F
Meaning: Invoke preprocessor (C-style includes).
Comments: Intel Compiler: the optional -P switch puts preprocessing results in output_file instead of compiling it.
Open64 Compiler: -o switch required for preprocessing to output_file.
PGI Compiler: source file must have extension .F, output is put into the matching file with extension .f.

Option: -Dname[=value]
Meaning: Define preprocessor macro.
Comments: This follows conventional usage.

Option: -Idir
Meaning: Look for include files in dir as well.
Comments: This follows conventional usage.
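
A common use of -Dname[=value] is to make a tuning parameter adjustable at build time. The following C sketch assumes a hypothetical macro BLOCK_SIZE with an arbitrary default of 64; neither name nor value comes from the compilers themselves.

```c
/* Build-time tunable: override on the compile line with, for example,
 *   -DBLOCK_SIZE=128
 * BLOCK_SIZE and its default of 64 are hypothetical, chosen for this sketch. */
#ifndef BLOCK_SIZE
#define BLOCK_SIZE 64
#endif

/* Number of BLOCK_SIZE-sized blocks needed to cover n elements. */
unsigned long num_blocks(unsigned long n)
{
    return (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
}
```

The #ifndef guard keeps the source compilable when the macro is not given on the command line.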

Options for Data and I/O

Option: -i{2|4|8}
Meaning: INTEGER and LOGICAL types of unspecified KIND use the indicated number of bytes.
Comments: Default value is 4; -i2 is not available for Open64.

Intel: -r{4|8|16}
PGI: -Mr8
gcc, gfortran: -r{4|8}
Meaning: REAL types of unspecified KIND use the indicated number of bytes.
Comments: Default value is 4. A value of 8 would change all REAL variables to DOUBLE PRECISION. For the PGI compilers, only promotion from 4-byte to 8-byte REAL is available.

Intel: Controlled via an environment run-time option. See the section on Big Endian I/O in the Troubleshooting document.
PGI: -Mbyteswapio, -byteswapio
gcc, gfortran: (probably not available)
Meaning: Do unformatted I/O in big endian instead of little endian.
Comments: PGI Compiler: should enable you to read and write data compatible with Sun and SGI platforms.
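
What a byte-swapping I/O option does to each data item can be sketched in C as follows. The function names are hypothetical, and the sketch covers only 32-bit words. Fortran unformatted files also carry record markers, which it does not handle.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word (big endian <-> little endian). */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one, by inspecting
 * the first byte of a known 16-bit value. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

Applying swap32 twice returns the original value, so the same routine serves for both reading and writing foreign-endian data.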

Diagnostics, Runtime Checking and Debugging

Intel: -g
PGI: -g
gcc, gfortran: -g
Meaning: Include symbols for debugging.
Comments: Use DDT, TotalView, gdb, or idb to debug, or pgdbg for PGI-compiled binaries.

Intel: -traceback
Meaning: Generate traceback.
Comments: Tells the compiler to generate extra information in the object file to provide source file traceback information when a severe error occurs at run time.

Intel: -check all (this option applies to Fortran compilers only)
PGI: -C
gcc, gfortran: (g77 had -ffortran-bounds-check)
Meaning: Run-time checking.
Comments: Full checking may incur a large performance penalty.
Intel Fortran Compiler: The argument "all" switches on all available checks. It can be replaced by:

  • arg_temp_created: checks for copy-in/copy-out of procedure arguments
  • bounds: performs run-time checks on array subscripts and substring references
  • format, output_conversion: performs run-time checks on formatted I/O
  • pointers: performs run-time checks on pointers and allocatables
  • uninit: performs run-time checks on uninitialized variables (except module globals)

Intel: -opt-report -opt-report-level[min|max]
gcc, gfortran: n/a
Meaning: Generate optimization report.
Comments: The Intel compiler writes the report to stderr.

Intel: -list
PGI: -Mlist
gcc, gfortran: n/a
Meaning: Provide source listing.
Comments: The Intel compiler writes the source listing to stdout, while the PGI compiler produces a file myprog.lst from myprog.f.

Parallelization and Vectorization

Intel: -openmp
PGI: -mp
Meaning: Generate multithreaded code from OpenMP directives in the source code.
Comments: If used, this option must also be specified for linkage.

Intel: -openmp-stubs
PGI: n/a
Meaning: Compile OpenMP programs for serial mode; directives are ignored and a stub library for the function calls is linked.
Comments: If used, this option must also be specified for linkage.

Intel: -openmp-report[0|1|2]
PGI: n/a
Meaning: Diagnostic level for OpenMP parallelization.

Intel: -parallel
PGI: -Mconcur[=option[,option]]
Meaning: Perform (shared-memory) auto-parallelization.
Comments: If used, this option must also be specified for linkage. Please refer to the PGI User's Guide, Section 3.1.2, for information on the -Mconcur suboptions.

Intel: -par-report[0|1|2]
PGI: n/a
Meaning: Diagnostic level for automatic parallelization.

Intel: -par-threshold{n}
PGI: n/a
Meaning: Set threshold for auto-parallelization of loops.
Comments:
-par-threshold0 : always parallelize
-par-threshold25 : parallelize if chance of perf. increase is 25%
-par-threshold75 : parallelize if chance of perf. increase is 75% (default)
-par-threshold100 : only parallelize if absolutely sure.
For the PGI compiler, the -Mconcur suboptions (q. v.) allow for finer control of auto-parallelization.

Intel: -vec
PGI: t.b.d.
Meaning: Enables or disables vectorization.

Intel: -simd
PGI: t.b.d.
Meaning: Enables or disables the SIMD vectorization feature of the compiler.

Intel: -vec-report[0-5]
PGI: t.b.d.
Meaning: Controls the diagnostic information reported by the vectorizer.
Comments: 0 specifies that no diagnostic information is reported; for the other levels please consult the compiler documentation.

Intel: -vec-threshold[n]
PGI: t.b.d.
Meaning: Sets a threshold for the vectorization of loops.
Comments:
-vec-threshold0 : always vectorize
-vec-threshold75 : vectorize if chance of perf. increase is 75%
-vec-threshold100 : only vectorize if absolutely sure (default).
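
To show what the OpenMP switches act on, here is a minimal C sketch of a loop parallelized by a directive. The function names dot and max_threads are hypothetical. For GCC the corresponding switch is -fopenmp, which is taken from the GCC documentation rather than from the table above.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Dot product with an OpenMP reduction. When built without OpenMP support
 * the pragma is ignored and the loop runs serially; the result is the same
 * up to floating-point summation order. */
double dot(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Maximum number of threads a parallel region may use; 1 in a serial build. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Because the _OPENMP macro is defined only when OpenMP is enabled, the same source builds both as a threaded and as a serial program, which is convenient when comparing results while debugging.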

Compiler Directives for the Intel compiler

The following table shows the source code directives as supported by the Intel Fortran compiler to help with tuning or debugging applications. Note that for fixed source format the "!" comment symbol in the first column needs to be replaced with a "c" comment symbol.

Directive: !DEC$ ivdep
Meaning: Ignore vector dependencies.

Directive: !DEC$ loop count N
Meaning: Software pipelining hint.

Directive: !DEC$ distribute point
Meaning: Split large loop.

Directive: !DEC$ unroll [(N)]
Meaning: Unroll inner loop N times. Compiler heuristics are used if N is omitted.

Directive: !DEC$ nounroll
Meaning: Do not unroll loop.

Directive: !DEC$ prefetch A
Meaning: Prefetch array A.

Directive: !DEC$ noprefetch A
Meaning: Do not prefetch array A.

Directive: !DEC$ vector [CLAUSE]
Meaning: Vectorize loop, where
CLAUSE = { ALWAYS [ASSERT]|ALIGNED|UNALIGNED|TEMPORAL|NONTEMPORAL [(var1 [, var2]...)] }
For further details please see the compiler documentation.

Directive: !DEC$ novector
Meaning: Do not vectorize loop.
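
The directives above are Fortran directives. As a rough C counterpart of the ivdep directive, the sketch below uses the pragma spellings documented for the Intel C compiler (#pragma ivdep) and for GCC (#pragma GCC ivdep). These spellings and the function name scatter_add are assumptions of this example, not part of the table.

```c
#include <stddef.h>

/* Adds b[i] into a[idx[i]]. The caller guarantees that idx contains no
 * duplicate entries. The compiler cannot prove this, so the pragma passes
 * the guarantee on and lets it ignore the assumed loop-carried dependence. */
void scatter_add(double *a, const double *b, const int *idx, size_t n)
{
#if defined(__INTEL_COMPILER)
    #pragma ivdep
#elif defined(__GNUC__) && !defined(__clang__)
    #pragma GCC ivdep
#endif
    for (size_t i = 0; i < n; ++i) {
        a[idx[i]] += b[i];
    }
}
```

As with the compiler-wide aliasing switches, the directive shifts responsibility to the programmer: if idx does contain duplicates, a vectorized build may produce wrong results.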