While function inlining (function expansion) decreases execution time by removing the runtime overhead of function calls, inlining can, in many cases, increase code size, code complexity, and compile times.
In the Intel compiler, inline function expansion typically favors relatively small user functions over relatively large ones; however, the compiler can inline a function only if the call-site, the caller, and the callee all meet the conditions listed below:
Call-site:
The call-site is the site of the call to the function that might be inlined.
For each call-site, the following conditions must hold:
The number of actual arguments must match the number of formal arguments of the callee.
The number of return values must match the number of return values of the callee.
The data types of the actual and formal arguments must be compatible.
Caller and callee must be written in the same source language. Cross-language inlining is not permitted.
Caller:
The caller is the function that contains the call-site.
The caller must meet the following conditions:
Is approximately the right size; at most 2000 intermediate statements, counted across all the call-sites being inlined, will be inlined into the caller.
Is static; if the caller is declared as static, it must itself be called; otherwise, the compiler deletes it.
Callee:
The callee is the function being called that might be inlined.
The callee, or target function, must meet the following conditions:
Does not have a variable argument list.
Is not considered unsafe for other reasons.
Is not considered infrequently called because of its name. For example, routines whose names contain any of the following substrings are not inlined: abort, alloca, denied, err, exit, fail, fatal, fault, halt, init, interrupt, invalid, quit, rare, stop, timeout, trace, trap, and warn.
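The three roles and the callee conditions above can be illustrated with a short C sketch. All function names here are hypothetical, and the comments only restate the minimum criteria from this section; whether a given call is actually inlined still depends on the compiler options and heuristics in effect.

```c
#include <stdarg.h>

/* Callee: small, fixed argument list, neutral name.
   It meets the minimum callee criteria listed above. */
static int scale(int x, int factor)
{
    return x * factor;
}

/* Callee with a variable argument list: per the conditions above,
   it is not eligible for inlining. */
static int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}

/* Callee whose name contains the substring "err": per the name rule
   above, it is considered infrequently called and is not inlined. */
static int report_err(int code)
{
    return -code;
}

/* Caller: the function that contains the three call-sites below. */
int process(int value)
{
    if (value < 0) {
        return report_err(value);       /* call-site 1: excluded by name */
    }
    int scaled = scale(value, 2);       /* call-site 2: inlining candidate */
    return sum_ints(2, scaled, value);  /* call-site 3: variadic callee */
}
```

For example, `process(4)` calls `scale` and `sum_ints` and returns 12, while `process(-3)` takes the `report_err` path and returns 3.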
If the minimum criteria identified are met, the compiler picks the routines whose inline expansions will provide the greatest benefit to program performance. This is done using the default heuristics. The inlining heuristics used by the compiler differ based on whether or not you use Profile-Guided Optimizations (PGO): -prof-use (Linux*) or /Qprof-use (Windows*).
When you use PGO with -ip or -ipo (Linux) or /Qip or /Qipo (Windows), the compiler uses the following heuristics:
The default heuristic focuses on the most frequently executed call sites, based on the profile information gathered for the program.
The default inline heuristic will stop inlining when direct recursion is detected.
The default heuristic always inlines very small functions that meet the minimum inline criteria.
By default, the compiler does not inline functions with more than 230 intermediate statements. You can change this value by specifying the following:
| Platform | Command |
|---|---|
| Linux | -Qoption,c,-ip_ninl_max_stats=n |
| Windows | /Qoption,c,-ip_ninl_max_stats=n |
where n is a new value.
There is a higher limit for functions declared by the user as inline or __inline.
See Using Qoption Specifiers and Profile-Guided Optimization Overview.
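As a rough illustration of the PGO heuristics described above, the following C sketch contrasts a directly recursive function with a very small leaf function. The function names are hypothetical, and the comments describe what the documented heuristics would be expected to do, not a guaranteed outcome. The compile line in the leading comment is also an assumption: it supposes the `icc` driver on Linux, a source file named app.c, and an arbitrary limit of 500.

```c
/* Hypothetical compile line (assumed driver, file name, and limit):
 *   icc -ipo -prof-use -Qoption,c,-ip_ninl_max_stats=500 app.c
 */

/* Directly recursive: factorial calls itself, so the default
   heuristic is expected to stop inlining at the recursive call-site. */
static unsigned long factorial(unsigned int n)
{
    return (n <= 1) ? 1UL : n * factorial(n - 1);
}

/* Very small function: a candidate that the default heuristic
   inlines when the minimum inline criteria are met. */
static unsigned long square(unsigned long x)
{
    return x * x;
}

/* Caller containing one call-site for each of the functions above. */
unsigned long squared_factorial(unsigned int n)
{
    return square(factorial(n));
}
```

Both functions are far below the default limit of 230 intermediate statements, so the size limit only becomes relevant for much larger callees.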
When you do not use PGO with -ip or -ipo (Linux) or /Qip or /Qipo (Windows), the compiler uses less aggressive inlining heuristics: it inlines a function if the inline expansion does not increase the size of the final program.
The compiler offers some alternatives to using the Qoption specifiers; see Developer Directed Inline Expansion of User Functions for a summary.
Without profile-guided optimizations, -ip or -ipo (Linux) or /Qip or /Qipo (Windows) apply the following less aggressive inlining heuristics:
Inline a function if the inline expansion will not increase the size of the final program.
Inline a function if it is declared with the inline or __inline keywords.
The compiler uses characteristics of the source code to estimate which function calls are executed most frequently. It applies these estimates to the PGO-based heuristics described previously. This estimate of frequency, based on static characteristics of the source, is not always accurate. Hence, using -ip and -ipo (Linux) or /Qip and /Qipo (Windows) without PGO can result in longer compilation times and even slower application execution.
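The keyword-based heuristic for builds without PGO can be sketched in C as follows. The function names are hypothetical; the example uses the standard `inline` keyword, and the `__inline` spelling mentioned above would be used the same way with compilers that accept it.

```c
/* Declared inline by the user: under the non-PGO heuristics described
   above, this declaration marks the function as an inlining candidate. */
static inline int clamp(int value, int low, int high)
{
    if (value < low) {
        return low;
    }
    if (value > high) {
        return high;
    }
    return value;
}

/* Caller with a single call-site for clamp. */
int clamp_percent(int value)
{
    return clamp(value, 0, 100);
}
```

Here `clamp_percent(150)` returns 100 and `clamp_percent(-5)` returns 0, whether or not the compiler expands the call inline; inlining changes performance characteristics, not results.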