In a recent discussion on Twitter, someone pointed out that optimizers fail to eliminate a function-scope static variable with no uses. This article explores why the optimizer struggles with such code patterns, how static variables are stored and initialized, and how certain C++ keywords can help the optimizer do its job.
Disclaimer: Godbolt links will be used, but instead of inspecting x86 assembly, we will look at LLVM's Intermediate Representation (IR). It is almost always simpler to read IR than assembly:
- Most operations and types are spelled out explicitly.
- The data section is easier to visualize.
- The compiler transformations we're interested in are, more often than not, architecture-independent and happen before the compiler generates assembly. Therefore we can keep a higher level of abstraction (IR) that is easier to reason about.
- We get to see what the compiler does to obey the C++ standard much earlier: all the rules must be captured in the translation from C++ to IR and, from there on, all optimizations are fair game; the standard no longer applies.
Everything necessary about IR will be explained, but if you want to learn more, I presented a tutorial during the EuroLLVM developers conference in 2019.
## The offending code
The original tweet used this example:
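The tweet's exact code is not reproduced here. The following is a rough sketch of the pattern being described. The function names with_static and with_local are hypothetical, and the string contents are arbitrary; only the variable name unused_variable comes from the discussion.

```cpp
#include <string>

// Hypothetical function: an unused function-scope static.
int with_static() {
  // Never read, yet compilers keep the code that constructs it on first call.
  static std::string unused_variable = "a string long enough to avoid SSO";
  return 0;
}

// Hypothetical function: the same unused variable, but as a local.
int with_local() {
  // Never read; the optimizer has much more freedom to delete this.
  std::string unused_variable = "a string long enough to avoid SSO";
  return 0;
}
```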
The author mentions that, in the static case, unused_variable doesn't get optimized away, but in the local variable case, the optimizer does a much better job. Since std::string has a complicated constructor, I'll rewrite this code using the simplest class possible:
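Here is a minimal sketch of such a rewrite. It assumes an empty State with a user-provided, non-constexpr constructor, which is consistent with the constructor call that appears in the IR discussed below. The names State, optimize_me and get_value come from the text. The return value 42 is an arbitrary placeholder.

```cpp
struct State {
  // User-provided and not constexpr: the static below cannot be
  // constant-initialized, so it is zero-initialized and then
  // dynamically initialized the first time get_value runs.
  State() {}
};

int get_value() {
  static State optimize_me;  // function-scope static with no uses
  return 42;                 // placeholder return value
}
```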
It's reasonable to expect that
State optimize_me will be optimized away: it
is a function-scope static variable with no uses. Unfortunately, both GCC and
Clang fail to do so.
First, a type
struct.State is defined:
Our C++ struct has no data members, and yet its equivalent in IR contains an 8-bit integer (i8). What is sizeof(State)? Having seen the IR, the answer is easy to guess: 1 byte[2].
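A quick check of this claim, as a sketch. The standard only guarantees that a complete object has a non-zero size, so that distinct objects get distinct addresses. Exactly 1 byte is what GCC and Clang choose in practice, which matches the i8 in the IR. The helper distinct_addresses is hypothetical.

```cpp
#include <cstddef>

struct State {
  State() {}
};

// Guaranteed by the standard: complete objects never have size zero.
static_assert(sizeof(State) >= 1, "objects occupy at least one byte");

// What mainstream compilers actually pick for an empty struct.
constexpr std::size_t state_size = sizeof(State);

// Two distinct objects must have distinct addresses,
// which is why the size cannot be zero.
bool distinct_addresses() {
  State a;
  State b;
  return &a != &b;
}
```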
Then, a global variable of that type is defined and initialized:
Note that this variable is initialized with the
zeroinitializer keyword. That
means its memory region will be set to zero before the program starts.
But... We haven't initialized our C++ variable at all! We'll talk more about this later.
There is another global variable in our module, with a very similar name:
Note again how this variable is zero-initialized, this time by writing 0 (this is equivalent to zeroinitializer).
This is very mysterious: an 8-bit integer that we never wrote in the original
C++ code. Let's look at the body of our
get_value function to find out more:
The first line loads the mysterious variable and the second line compares it to 0. If the value is zero, the code branches to this block:
A function call with our static variable as its first argument: this is State's constructor! Right after that, we update the value of the mysterious variable by storing 1 to it.
Afterwards, or if the original comparison to zero failed, code execution proceeds to the return statement.
A visual representation can be found in the control flow graph for this function:
This example illustrates the code generated in order to initialize function-scope static variables. The compiler must guarantee that the constructor is called exactly once, during the first time execution passes through the static variable declaration. This is accomplished with the pattern:
- Global counter initialized to 0.
- If counter is zero:
  - Call constructor.
  - Set counter to 1.
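To make the pattern concrete, here is a hand-written, single-threaded C++ approximation of what the compiler emits for get_value. It is a sketch, not actual compiler output. The names optimize_me_storage, optimize_me_guard, get_value_by_hand and constructor_calls are hypothetical. Real compilers on Itanium-ABI platforms call __cxa_guard_acquire and __cxa_guard_release when thread-safe statics are enabled.

```cpp
#include <new>

int constructor_calls = 0;  // instrumentation for this sketch only

struct State {
  State() { ++constructor_calls; }
};

// Raw storage for the static. Namespace-scope objects without an
// initializer are zero-initialized before the program starts,
// mirroring zeroinitializer in the IR.
alignas(State) unsigned char optimize_me_storage[sizeof(State)];

// The "mysterious" 8-bit guard variable, also zero at load time.
unsigned char optimize_me_guard = 0;

int get_value_by_hand() {
  if (optimize_me_guard == 0) {
    // First pass through the declaration: construct in place...
    ::new (static_cast<void*>(optimize_me_storage)) State();
    // ...and record that initialization is done. State's destructor is
    // trivial, so no exit-time destructor registration is needed.
    optimize_me_guard = 1;
  }
  return 42;
}
```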
Is the optimizer able to remove all of the code in that function? Take a look:
Note that the optimizer:
- Deleted the static variable.
- Deleted the call to its constructor.
- Transformed the global counter into a boolean (from i8 to i1).
- Failed to optimize away this global boolean.
Point #4 is a hard problem because the counter will make the first call to
get_value take a different code path from subsequent calls. Furthermore,
the two paths have distinct behaviors: one writes to a global variable, the
other doesn't. To delete the counter:
- The optimizer needs to prove that the change in the counter's value isn't meaningful to the program.
- But it is meaningful, because it affects control flow inside get_value.
- So the optimizer needs to prove that control flow inside get_value isn't meaningful to the program.
- But it is meaningful, because control flow affects the counter's value.
... And now we're stuck in a loop! It's not an unsolvable problem, but it illustrates challenges the optimizer can't overcome right now[3].
This situation gets worse if the constructor isn't as simple as an empty function. Our motivating example, std::string, definitely doesn't have an empty constructor.
## Can we do better?
We can. We can help the compiler by expressing our intent more appropriately. But first, we need to understand how static variables are initialized.
Static variables have what is known as static storage. From cppreference:
> The storage for the object is allocated when the program begins and deallocated when the program ends. Only one instance of the object exists.
In practice, the storage for static variables lives in the data segment of the program; in other words, it is available when the program is loaded. Moreover, the initial contents of that memory region are also specified in the data segment and are available when the program is loaded. But what exactly are those contents, and can we influence them?
## Zero or Constant initialization
Let's look at what cppreference tells us (emphasis mine):
> Variables declared at block scope with the specifier static [...] are initialized the first time control passes through their declaration (unless their initialization is zero- or constant-initialization, which can be performed before the block is first entered). [static local variables]
Intuitively, zero-initialization is what it sounds like: when the program is loaded, that region of memory gets zero initialized. Typically, if other initialization is necessary, like running constructors or evaluating constructor arguments, it will happen at runtime. Not very exciting.
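Here is a small sketch of the two cases, using a hypothetical non-constexpr helper runtime_value. A static with no initializer is simply zero-initialized. A static whose initializer must run code is first zero-initialized and then dynamically initialized on the first call. Compilers are permitted to fold the latter into static initialization as an optimization, but the language does not require it.

```cpp
// Hypothetical helper. It is not constexpr, so calls to it
// are not constant expressions.
int runtime_value() { return 7; }

int zero_initialized() {
  static int z;  // no initializer: zero-initialized, no runtime code needed
  return z;
}

int dynamically_initialized() {
  // Zero-initialized at load time, then set by runtime_value() the first
  // time control passes through this declaration (behind a guard variable).
  static int d = runtime_value();
  return d;
}
```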
Constant-initialization, when possible, happens instead of zero-initialization. The details are complicated, but it essentially boils down to whether you have a constant expression initializing the static variable. Cppreference uses the following notation to explain this idea:
static T object = constexpr;
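In the following sketch (the Config type and the area function are hypothetical), the initializer is made of literals only. That makes it a constant expression, so the static is constant-initialized: its final contents can be written directly into the data segment, and no guard variable is needed.

```cpp
struct Config {
  int width;
  int height;
};

int area() {
  // Constant expression initializer ({640, 480}): constant initialization,
  // so there is no first-call check and no runtime initialization code.
  static Config config = {640, 480};
  return config.width * config.height;
}
```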
The best part is that constant-initialization will typically remove the need for runtime initialization. Let's look at what this looks like in IR.
## Constant initialization to the rescue!
Let's make our example slightly more complicated, disable all optimizations, but have a constexpr constructor:
constexpr functions are a mechanism through which programmers express their desire to have the function evaluated at compile time if the function is called with compile-time constant arguments.[4]
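Here is a minimal illustration with a hypothetical square function. With constant arguments, the compiler can evaluate the call, which static_assert requires. The same function still works with arguments that are only known at runtime.

```cpp
constexpr int square(int x) { return x * x; }

// Constant argument: the compiler evaluates the call, so it can be used
// where a constant expression is required.
static_assert(square(12) == 144, "evaluated at compile time");

// Runtime argument: the same function is simply called at runtime.
int square_at_runtime(int x) { return square(x); }
```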
Because our static variable is now initialized with a constant expression, the IR for this function now becomes much simpler:
To emphasize: this happens with optimizations disabled. It is a built-in mechanism of the language, not a compiler transformation.
What happened here? Constant initialization took place, because we have a constant expression (the constructor is constexpr and it is called with constant arguments) initializing the static variable.
In the non-
constexpr version, the IR global variable corresponding to the C++
static variable was initialized by
zeroinitializer, and inside the
get_value function we had a constructor call wrapped by some boilerplate to
ensure the variable was initialized exactly once. In other words, zero
initialization + runtime initialization took place.
In the constexpr version, all the boilerplate is gone because constant initialization happened instead of zero-initialization + runtime initialization. This is the core idea of this post: if you enable constant initialization, unnecessary code disappears.
The generated assembly contains the already-initialized variable in the data segment of the program:
With optimizations enabled, the static variable will be completely removed.
## Don't let slow code compile
C++20 adds a new keyword, constinit, to ensure a variable only has constant initialization; otherwise, the program is ill-formed. For example, the following code does not compile (note the absence of a constexpr constructor):
This is desirable because it prevents inefficient code from compiling. If we make the constructor constexpr, the program is now legal and uses efficient constant initialization.
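The following sketch shows the compiling variant, using a hypothetical Counter type and next_id function. It checks the __cpp_constinit feature-test macro so that it also builds as C++17. In that case the keyword is unavailable, and constant initialization still happens but is not enforced.

```cpp
struct Counter {
  // constexpr constructor: Counter(10) is a constant expression.
  // Removing constexpr here makes the constinit declaration below
  // ill-formed under C++20, which is exactly the point.
  constexpr explicit Counter(int start) : value(start) {}
  int value;
};

int next_id() {
#if defined(__cpp_constinit)
  static constinit Counter counter(10);  // C++20: constant init is enforced
#else
  static Counter counter(10);  // pre-C++20: constant init, not enforced
#endif
  return counter.value++;
}
```

Note that constinit does not imply const: the variable can still be modified, as next_id does.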
## But we can't constexpr all the things
The original example dealt with a std::string static variable, which may perform dynamic memory allocation, something that is not allowed in constexpr contexts. This restriction is lifted in C++20, and most methods of std::string are made constexpr thanks to Louis Dionne's paper. No compilers implement this at the time of writing, but you can check GCC's and Clang's progress.
Edit (2020-03-21): As Jason Turner pointed out on Twitter, allocation, while allowed in C++20 constexpr contexts, still needs to be freed in the same constexpr context that allocated it. This implies that big constexpr strings (those that need heap allocation) are not going to be allowed.
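The following sketch illustrates the rule. It assumes a C++20 standard library with constexpr std::string, detected through the __cpp_lib_constexpr_string feature-test macro; on older setups the function simply runs at runtime. The macro STRING_CONSTEXPR and the function greeting_length are hypothetical.

```cpp
#include <cstddef>
#include <string>

#if defined(__cpp_lib_constexpr_string) && __cpp_lib_constexpr_string >= 201907L
#define STRING_CONSTEXPR constexpr  // C++20 library with constexpr std::string
#else
#define STRING_CONSTEXPR  // older library: plain runtime function
#endif

// The string is long enough to likely need heap storage. The allocation is
// freed when s is destroyed, before the function returns, so under C++20
// the whole call can be a constant expression.
STRING_CONSTEXPR std::size_t greeting_length() {
  std::string s = "a string long enough to need heap storage";
  return s.size();
}

#if defined(__cpp_lib_constexpr_string) && __cpp_lib_constexpr_string >= 201907L
// Fine: the allocation does not outlive the constant evaluation.
static_assert(greeting_length() == 41);
// Ill-formed: the allocation would have to persist into the running program.
// constexpr std::string banner = "a string long enough to need heap storage";
#endif
```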
Without entering the discussion of when/if static variables should be used, it's important to be aware of the price that is paid for their correct initialization. In most cases, the programmer can completely avoid this price by using constant initialization (usually in the form of constexpr constructors).
Furthermore, by expressing their intent properly to the compiler, it's possible to ensure a compilation error when code changes trigger inefficient initialization; this is accomplished by marking the static variable as constinit. A lot of new features in the C++ language are driven by the desire to allow programmers to communicate intent to the compiler (and to other programmers).
I also hope to have shown that using the LLVM IR makes it simpler to explore architecture-agnostic missed optimizations. In the case explored here, there is no reason why a static variable should be optimized away when targeting x86, but not when targeting ARM, for instance.
1. We're compiling the code without support for thread-safe static initialization to keep things simple. However, most of our conclusions still hold if we enable thread-safe statics.
2. If you're curious why, the creator of C++ answers it on his website.
3. There are other challenges. For example, if this function gets inlined elsewhere, we will have multiple functions accessing the same global variable, and the compiler will struggle to reason about it. Note also that we don't have to consider other translation units: the global backing the static variable has internal linkage, that is, it can only be accessed from the translation unit in which it is defined; this is represented by the internal keyword in IR.
4. A stronger form of this is the consteval keyword. When applied to a function, it is a compile-time error if the function is not evaluated at compile time. It is a useful mechanism to ensure that an expensive function is never evaluated during program execution.
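The following sketch illustrates footnote 4 with a hypothetical sum_to function. It checks the __cpp_consteval feature-test macro so that it still compiles before C++20, though without the enforcement. The macro COMPILE_TIME_ONLY is also hypothetical.

```cpp
#if defined(__cpp_consteval)
#define COMPILE_TIME_ONLY consteval
#else
#define COMPILE_TIME_ONLY constexpr  // fallback: no enforcement before C++20
#endif

// Hypothetical "expensive" function: sums 1..n with a loop.
COMPILE_TIME_ONLY long long sum_to(int n) {
  long long total = 0;
  for (int i = 1; i <= n; ++i) total += i;
  return total;
}

// OK: the argument is a constant, so the compiler evaluates the call.
constexpr long long kSum = sum_to(1000);
static_assert(kSum == 500500, "computed at compile time");

// With consteval, this would not compile, because n is only known at runtime:
// long long at_runtime(int n) { return sum_to(n); }
```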