GCC can apply many techniques to make the executable program that it generates faster, smaller, or both. These techniques all tend to reduce still further the "word-for-word" correspondence between the C program you write and the machine code that the computer reads. As a result, they can make debugging more difficult, and are usually applied only after a program has been tested and debugged without optimization.
There are two kinds of optimization options. You can apply individual optimization techniques by means of options beginning with -f (for flag), such as -fmerge-constants, which causes the compiler to place identical constants in a common location, even across different source files. You can also use the -O options (-O0, -O1, -O2, and -O3) to set an optimization level that cumulatively enables a number of techniques at once.
18.6.1. The -O Levels
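Each of the -O options represents a number of individual optimization techniques. The -O optimization levels are cumulative: -O2 includes all the optimizations in -O1, and -O3 includes -O2. For complete and detailed descriptions of the different levels, and the many -f optimization options that they represent, see the GCC reference manual. The following list offers a brief description of each level:
-O0: Turns off optimization. This is the default.
-O1 (or simply -O): Tries to reduce code size and execution time without applying techniques that greatly increase compilation time.
-O2: Adds nearly all supported optimizations that do not involve a trade-off between size and speed.
-O3: Adds further, more aggressive optimizations that may increase the size of the generated code.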
The following example illustrates how -O options are used:
$ gcc -Wall -O3 -o circle circle.c circulararea.c -lm
This command uses -O3 to enable the majority of the supported optimization techniques.
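One way to check from inside a program whether an -O level was in effect is to test the macro __OPTIMIZE__, which GCC predefines in all optimizing compilations. The following is only a minimal sketch: the function names optimization_enabled and report_optimization are hypothetical and are not part of the circle example.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if this file was compiled with
 * optimization enabled (GCC predefines __OPTIMIZE__ in that case),
 * and 0 otherwise. */
int optimization_enabled(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Prints a one-line report; intended to be called from main(). */
void report_optimization(void)
{
    printf("optimized build: %s\n",
           optimization_enabled() ? "yes" : "no");
}
```

Compiling the same file once with -O0 and once with -O3 and calling report_optimization() from main() should show the difference. The macro does not reveal which level was chosen, only that one was.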
18.6.2. The -f Flags
GCC's many -f options give you even finer control over optimization. For example, you can set a general optimization level using an -O option, and then turn off a certain technique. An example:
$ gcc -Wall -O3 -fno-inline-functions -o circle circle.c circulararea.c -lm
The options -O3 -fno-inline-functions in this command enable all the optimizations grouped in -O3 except the automatic inlining of functions that are not declared inline.
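A command-line flag such as -fno-inline-functions affects a whole translation unit. For finer control, GCC also accepts the noinline function attribute, which keeps a single function from being inlined. The sketch below assumes hypothetical function names (square, circle_area, and so on) that are not taken from the circle example.

```c
/* A small helper that GCC may choose to inline automatically at
 * higher optimization levels, even though it is not declared inline. */
static double square(double x)
{
    return x * x;
}

/* GCC-specific attribute: this function is never inlined, whatever
 * optimization options are in effect. */
__attribute__((noinline))
static double square_noinline(double x)
{
    return x * x;
}

/* Both functions compute the same area; only the way the call to
 * the helper is compiled may differ. */
double circle_area(double r)
{
    return 3.141592653589793 * square(r);
}

double circle_area_noinline(double r)
{
    return 3.141592653589793 * square_noinline(r);
}
```

Comparing the assembler output of the two functions (for example with gcc -O3 -S) is one way to see whether the call was actually inlined.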
There are also flags to enable many optimizations that are not included in any -O level, such as -funroll-loops; this option unrolls loops whose number of iterations can be determined at compile time or on entry to the loop, replacing them with repetitive, linear code sequences and thus saving jumps and loop-counter operations. A full list of the hundred or so -f options that control GCC's individual optimizations would be too long for this chapter, but the examples in this section offer a hint of the capabilities available. If you need a certain compiler feature, there's a good chance you'll find it in the manual.
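To make the effect of loop unrolling concrete, the sketch below shows a loop with a fixed iteration count next to a hand-written equivalent with no loop at all. This only illustrates the idea: the code GCC actually generates with -funroll-loops depends on the target and the compiler version, and the names N, sum_loop, and sum_unrolled are hypothetical.

```c
#define N 4   /* iteration count known at compile time */

/* Original form: one addition, one counter update, and one
 * conditional jump per iteration. */
int sum_loop(const int a[N])
{
    int sum = 0;
    int i;
    for (i = 0; i < N; ++i)
        sum += a[i];
    return sum;
}

/* Roughly what full unrolling amounts to: the same additions
 * written out as linear code, with no counter and no jumps. */
int sum_unrolled(const int a[N])
{
    return a[0] + a[1] + a[2] + a[3];
}
```

The unrolled form trades code size for fewer branch instructions, which is why unrolling tends to make the executable larger and does not always make it faster.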
18.6.3. Floating-Point Optimization
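Some of the optimization options that are not included in the -O groups pertain to floating-point operations. The C99 floating-point environment supports scientific and mathematical applications with a high degree of numeric accuracy, but for a given application, you might be more interested in speed than in the best floating-point math available. For such cases, the -ffast-math option defines the preprocessor macro __FAST_MATH__, indicating that the compiler makes no claim to conform to IEEE and ISO floating-point math standards. The -ffast-math flag is a group option, which enables the following six individual options:
-fno-math-errno: Math functions need not set errno.
-funsafe-math-optimizations: Permits optimizations that assume valid arguments and results, and that may violate IEEE or ISO rules.
-fno-trapping-math: Assumes that floating-point operations cannot generate user-visible traps.
-ffinite-math-only: Assumes that arguments and results are never NaNs or infinities.
-fno-rounding-math: Assumes the default rounding behavior.
-fno-signaling-nans: Assumes that signaling NaNs do not generate user-visible traps.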
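The reason such options can change results is that floating-point addition is not associative, so a compiler that is allowed to regroup operations may produce a different value. The sketch below tests the __FAST_MATH__ macro and provides two groupings of the same three-term sum; the function names are hypothetical.

```c
/* Returns 1 if this file was compiled with -ffast-math (GCC then
 * defines __FAST_MATH__), and 0 otherwise. */
int fast_math_enabled(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Adds the first two terms first: (a + b) + c. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Adds the last two terms first: a + (b + c). Under strict IEEE
 * double arithmetic this can differ from sum_left(), because the
 * small term may be rounded away when it is added to a large one. */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

Calling both functions with the arguments 1e16, -1e16, and 1.0 is a simple experiment: in sum_left() the two large terms cancel exactly before 1.0 is added, while in sum_right() the 1.0 is first added to -1e16, where it is typically lost to rounding. With -ffast-math, GCC is free to treat the two groupings as interchangeable.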
18.6.4. Architecture-Specific Optimization
For certain system architectures, GCC provides options to produce optimized code for specific members of the processor family, taking into account features such as memory alignment, model-specific CPU instructions, stack structures, increased floating-point precision, prefetching and pipelining, and others. These machine-specific options begin with the prefix -m. If you want to compile your code to make the most of a specific target system, read about the available options in the GCC reference manual.
For several processor types, such as the Sparc, ARM, and RS/6000-PowerPC series, the option -mcpu=cpu generates machine code for the specific CPU type's register set, instruction set, and scheduling behavior. Programs compiled with this option may not run at all on a different model in the same CPU family. The GCC manual lists the available cpu abbreviations for each series.
The option -mtune=cpu is more tolerant. Code generated with -mtune=cpu uses optimized scheduling parameters for the given CPU model, but adheres to the family's common instructions and registers, so that it should still run on a related model.
For the Intel x86 series, the -mcpu=cpu option is the same as -mtune=cpu. The option to enable a model-specific instruction set is -march=cpu. An example:
$ gcc -Wall -O -march=athlon-4 -o circle circle.c circulararea.c -lm
This command line compiles a program for the AMD Athlon XP CPU.
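Machine-specific options also influence the macros that GCC predefines, which lets a program check at compile time which instruction-set extensions it may use. On x86 targets, for example, GCC defines __SSE__ when SSE instructions are enabled, which should be the case for -march=athlon-4. The following sketch assumes an x86 target, and the function name sse_enabled is hypothetical.

```c
/* Returns 1 if the compiler was allowed to generate SSE
 * instructions for this file (GCC defines __SSE__ on x86 targets
 * in that case), and 0 otherwise. */
int sse_enabled(void)
{
#ifdef __SSE__
    return 1;
#else
    return 0;
#endif
}
```

The complete set of predefined macros for a given option combination can be listed with a command such as gcc -march=athlon-4 -dM -E - < /dev/null, assuming the compiler targets x86.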
18.6.5. Why Not Optimize?
Sometimes there are good reasons not to optimize. In general, compiling with optimization takes longer and requires more memory than without optimization. How much more depends on what techniques are applied. Furthermore, the performance gains obtained by a given optimization technique depend on both the given program and the target architecture. If you really need optimum performance, you need to choose the techniques that will work in your specific circumstances.
You can combine both -O and -f optimization options with GCC's -g option to include debugging information in the compiled program, but if you do, the results may be hard to follow in a debugging program; optimization can change the order of operations, and variables defined in the program may not remain associated with one register, or may even be optimized out of existence. For these reasons, many developers find it easier to optimize only after a program has been debugged.
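The following sketch shows the kind of variable that tends to disappear from view in an optimized build. In the first function, the compiler may keep total in a register only, so a debugger such as GDB may report it as optimized out. Declaring the variable volatile, as in the second function, is one possible workaround while debugging, at some cost in speed. The function names are hypothetical.

```c
/* In an optimized build, total may exist only in a register, or be
 * merged into other computations, so a debugger may not be able to
 * display it. */
double scaled_sum(const double *v, int n, double factor)
{
    double total = 0.0;
    int i;
    for (i = 0; i < n; ++i)
        total += v[i];
    return total * factor;
}

/* volatile forces every read and write of total to go through
 * memory, so the variable stays inspectable, but the loop can no
 * longer be optimized as freely. */
double scaled_sum_debuggable(const double *v, int n, double factor)
{
    volatile double total = 0.0;
    int i;
    for (i = 0; i < n; ++i)
        total += v[i];
    return total * factor;
}
```

Both functions return the same result. The volatile qualifier is best removed again once debugging is finished.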
Some optimization options may also conflict with strict conformance to the ISO C standard, such as merging variables declared with const as if they were constants. If standards-conformance is critical, and sometimes it is, there are certain optimizations you may not wish to pursue.
Another issue you may encounter is that some optimization techniques result in nondeterministic code generation. For example, the compiler may use randomness in guessing which branch of a conditional jump will be taken most often. If you are programming real-time applications, you'll probably want to be careful to ensure deterministic behavior.
In any case, if you want to be sure of getting the greatest possible runtime performance, or if you need to know in detail how GCC is arriving at the exact machine code for your C program, you will need to study the detailed optimization options in the GCC manual.