constinit, constexpr
2020-03-06
In a recent discussion over Twitter, it was pointed out that optimizers failed to eliminate a function-scope static variable with no uses. This article explores why the optimizer struggles with such code patterns, how static variables are stored and initialized, and also how certain C++ keywords can help the optimizer do its job.
Disclaimer: instead of inspecting x86 assembly for compiler output, we will look at LLVM’s Intermediate Representation (IR). It is almost always simpler to use IR.
Everything necessary about IR will be explained, but if you want to learn more, I presented a tutorial during an LLVM developer’s conference in 2019.
The original tweet used this example:
#include <string>
int static_version() {
static const std::string unused_variable{"static"};
return 42;
}
int local_version() {
const std::string unused_variable{"local"};
return 42;
}
In the static case, unused_variable doesn’t get optimized away, but, in the local variable case, it does.
std::string has a complicated constructor, so let’s rewrite this code using the simplest class possible:
struct State {
State() {}
};
int get_value(){
static State optimize_me;
return 42;
}
It’s reasonable to expect that State optimize_me will be optimized away: it is a function-scope static variable with no uses. Unfortunately, both GCC and Clang fail to do so.
To understand what’s going on here, let’s look at the IR produced by Clang without optimizations [1].
First, a type struct.State is defined:
%struct.State = type { i8 }
Our C++ struct has no data members, and yet its equivalent in IR contains an 8-bit integer (i8). What is sizeof(State)? Having seen the IR, the answer is easy to guess: 1 byte [2].
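The size of an empty class can also be checked directly in C++. The sketch below is only an illustration: the helper have_distinct_addresses is a hypothetical name that does not appear in the original example. It asserts the portable guarantee (a non-zero size) rather than the exact value of 1, which is what mainstream compilers such as GCC and Clang choose in practice.

```cpp
struct State {
    State() {}
};

// Every complete object occupies at least one byte, so an empty class cannot
// have size zero. Mainstream ABIs pick exactly one byte.
static_assert(sizeof(State) >= 1, "an empty class still has a non-zero size");

// Hypothetical helper: two distinct objects of the same type must have
// distinct addresses, which is the reason the size cannot be zero.
inline bool have_distinct_addresses() {
    State first;
    State second;
    return &first != &second;
}
```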
Then, a global variable of that type is defined and initialized:
@_ZZ10get_value2vE11optimize_me = internal global %struct.State zeroinitializer
Note that this variable is initialized with the zeroinitializer keyword. That means its memory region will be set to zero before the program starts. But we haven’t initialized our C++ variable at all! We’ll talk more about this later.
There is another global variable in our module, with a very similar name:
@_ZGVZ10get_value2vE11optimize_me = internal global i8 0
Note again how this variable is zero initialized, this time by writing i8 0 (this is equivalent to zeroinitializer).
This is very mysterious: an 8-bit integer that we never wrote in the original C++ code. Let’s look at the body of our get_value function to find out more:
define i32 @_Z10get_valuev() {
%1 = load i8, i8* @_ZGVZ10get_value2vE11optimize_me
%2 = icmp eq i8 %1, 0
br i1 %2, label %3, label %4
The first line loads the mysterious variable and the second line compares it to 0. If the value is zero, the code branches to this block:
3:
call void @_ZN5StateC2Ev(%struct.State* @_ZZ10get_value2vE11optimize_me)
store i8 1, i8* @_ZGVZ10get_value2vE11optimize_me
br label %4
This is a function call with our static variable as its first argument: a call to State’s constructor! It is followed by an update to the value of the mysterious variable with a store i8 1.
Afterwards, or if the original comparison to zero failed, code execution proceeds to return 42:
4:
ret i32 42
}
A visual representation can be found in the control flow graph for this function:
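This example illustrates the code generated in order to initialize function-scope static variables. The compiler must guarantee that the constructor is called exactly once: the first time execution passes through the static variable declaration. This is accomplished with a hidden flag (the mysterious 8-bit variable above): every call checks the flag, and only if it is still zero does the code run the constructor and set the flag to 1.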
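The same pattern can be written by hand in C++, which may make the IR easier to follow. This is a rough sketch under stated assumptions, not what any compiler literally emits: the names optimize_me_storage, optimize_me_guard, get_value_by_hand and constructor_calls are all hypothetical, the constructor counts its calls only so the effect can be observed, and, like the IR above, the code is not thread-safe.

```cpp
#include <new>

// Hypothetical counter, present only so the "exactly once" behavior is observable.
static int constructor_calls = 0;

struct State {
    State() { ++constructor_calls; }
};

// Zero-initialized raw storage for the object, playing the role of the
// zeroinitializer global in the IR.
alignas(State) static unsigned char optimize_me_storage[sizeof(State)];

// Zero-initialized flag, playing the role of the mysterious i8 0 global.
static unsigned char optimize_me_guard = 0;

int get_value_by_hand() {
    if (optimize_me_guard == 0) {                                 // load, compare, branch
        ::new (static_cast<void*>(optimize_me_storage)) State();  // constructor call
        optimize_me_guard = 1;                                    // store i8 1
    }
    return 42;
}
```

Calling get_value_by_hand several times runs the constructor only on the first call; later calls take the short path straight to the return.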
Is the optimizer able to remove all of the code in that function? Take a look:
@_ZGVZ9get_valuevE11optimize_me = internal unnamed_addr global i1 false
define i32 @_Z9get_valuev() {
%1 = load i1, i1* @_ZGVZ9get_valuevE11optimize_me
br i1 %1, label %3, label %2
2:
store i1 true, i1* @_ZGVZ9get_valuevE11optimize_me
br label %3
3:
ret i32 42
}
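Comparing this with the unoptimized IR, note that the optimizer:
1. Removed the call to State’s constructor.
2. Removed the static variable itself, which no longer had any uses.
3. Shrank the mysterious variable, which we will call the counter, from 8 bits to 1 bit (i8 to i1).
4. Failed to remove the counter and the code that checks and updates it.
Point #4 is a hard problem because the counter will make the first call to get_value take a different code path from subsequent calls. Furthermore, the two paths have distinct behaviors: one writes to a global variable, the other doesn’t. To delete the counter:
1. The optimizer must show that it makes no difference which of the two paths is taken by a call to get_value.
2. To show that, it must show that the write performed by the first call to get_value isn’t meaningful to the program.
3. To show that, it must show that the counter can be deleted.
… And now we’re stuck in a loop! It’s not an unsolvable problem, but it illustrates challenges the optimizer can’t overcome right now [3].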
This situation gets worse if the constructor call isn’t as simple as an empty function. Our motivating example, std::string, definitely doesn’t have a simple constructor.
We can help the compiler by expressing our intent more appropriately. But first, we need to understand how static variables are initialized.
Static variables have what is known as static storage:
The storage for the object is allocated when the program begins and deallocated when the program ends. Only one instance of the object exists. [storage duration]
In practice, the storage for static variables is in the data segment of the program, in other words, the storage is available when the program is loaded. Moreover, the initial contents of that memory region are also specified in the data segment and available when the program is loaded; but what are those contents and can we influence them?
Let’s look at what cppreference tells us (emphasis mine):
Variables declared at block scope with the specifier static […] are initialized the first time control passes through their declaration (unless their initialization is zero- or constant-initialization, which can be performed before the block is first entered). [static local variables]
Zero-initialization is what it sounds like: when the program is loaded, that region of memory gets zero initialized. If other initialization is necessary, like running constructors or evaluating constructor arguments, it will happen at runtime. Not very exciting.
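As a small illustration of zero-initialization, the sketch below declares static variables without any initializer and reads them back. The names Counters and read_zero_initialized_statics are hypothetical and exist only for this example.

```cpp
// Hypothetical aggregate used only for this illustration.
struct Counters {
    int hits;
    int misses;
};

int read_zero_initialized_statics() {
    static int calls;          // no initializer: the storage is zeroed at load time
    static Counters counters;  // no initializer: every member starts out as zero
    return calls + counters.hits + counters.misses;
}
```

Neither variable needs a runtime check here, because there is no constructor to run: the zeroed memory is already the final initial value.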
Constant-initialization, when possible, happens instead of zero-initialization. As with anything C++, the details are difficult to pin down, but it comes down to having a constant expression initializing the static variable. Cppreference uses the following notation to explain this idea:
static T object = constexpr;
The interesting observation is that constant-initialization removes the need for runtime initialization.
Let’s make our State class more complicated, disable all optimizations, but have a constexpr constructor:
struct State {
constexpr State(char c1, char c2, char c3)
: value1{c1}, value2{c2}, value3{c3} {}
char value1;
char value2;
char value3;
};
int get_value() {
static State optimize_me(1, 2, 3);
return 42;
}
The constexpr specifier marks functions that can be evaluated at compile time if they are called with compile-time constant arguments [4].
Because the static variable is initialized with a constant expression, the IR for this function is simpler:
@_ZZ9get_valuevE11optimize_me = internal global %struct.State { i8 1, i8 2, i8 3 }
define i32 @_Z9get_valuev() {
ret i32 42
}
To emphasize: this happens with no optimizations. It is a built-in mechanism of the language, not a compiler transformation. See for yourself!
What happened here? Constant initialization took place, because we have a constant expression initializing the optimize_me variable.
In the non-constexpr version, the variable in IR corresponding to the C++ static variable was initialized by zeroinitializer, and inside the get_value function we had a constructor call wrapped in boilerplate, ensuring the variable was initialized once. In other words, zero initialization + runtime initialization took place.
In the constexpr version, all the boilerplate is gone because constant initialization happened instead of zero-initialization + runtime initialization. This is the core idea here: enabling constant initialization causes simpler code to be generated. It’s not an optimization, it’s different code generation.
The generated assembly contains the already-initialized variable in the data segment of the program:
.data
get_value()::optimize_me:
.byte 1
.byte 2
.byte 3
.size get_value()::optimize_me, 3
With optimizations enabled, the static variable will be completely removed.
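One way to convince yourself that a constructor call really is a constant expression is to force the compiler to evaluate it. The sketch below reuses the State class from above; the names compile_time_state, sum and sum_at_runtime are hypothetical additions for illustration only.

```cpp
struct State {
    constexpr State(char c1, char c2, char c3)
        : value1{c1}, value2{c2}, value3{c3} {}
    char value1;
    char value2;
    char value3;
};

// Hypothetical helper: usable both at compile time and at run time.
constexpr int sum(const State& s) {
    return s.value1 + s.value2 + s.value3;
}

// A constexpr object must be initialized by a constant expression, so this
// line only compiles because the constructor is constexpr.
constexpr State compile_time_state(1, 2, 3);
static_assert(sum(compile_time_state) == 6, "evaluated by the compiler");

// The same function called with values known only at run time.
int sum_at_runtime(char a, char b, char c) {
    State s(a, b, c);
    return sum(s);
}
```

Removing constexpr from the constructor makes the declaration of compile_time_state, and therefore the static_assert, fail to compile.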
C++20 adds a new keyword, constinit, which requires that a variable be constant-initialized; otherwise the program is ill-formed. For example, the following code does not compile:
struct State {
State(char c1, char c2, char c3) : value1{c1}, value2{c2}, value3{c3} {}
char value1;
char value2;
char value3;
};
int get_value(){
constinit static State optimize_me(1,2,3);
return 42;
}
Note the absence of a constexpr constructor. This is desirable because it prevents inefficient code from compiling. If we make State’s constructor constexpr, the program is now legal and uses efficient constant initialization.
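For completeness, here is a sketch of the version that does compile. The macro STATE_CONSTINIT and the function name get_value_checked are assumptions made for this example: the macro expands to constinit only when the compiler reports support through the __cpp_constinit feature-test macro, so the snippet still builds (without the check) in pre-C++20 modes. In real C++20 code you would simply write constinit.

```cpp
// Assumption for this sketch: fall back to nothing when constinit is unavailable.
#if defined(__cpp_constinit)
#define STATE_CONSTINIT constinit
#else
#define STATE_CONSTINIT
#endif

struct State {
    constexpr State(char c1, char c2, char c3)
        : value1{c1}, value2{c2}, value3{c3} {}
    char value1;
    char value2;
    char value3;
};

int get_value_checked() {
    // With constinit, this line is rejected unless the initializer is a
    // constant expression, which the constexpr constructor makes possible.
    STATE_CONSTINIT static State optimize_me(1, 2, 3);
    return optimize_me.value1 + optimize_me.value2 + optimize_me.value3;
}
```

Unlike the original get_value, this function reads the variable so that the initial values are visible to the caller.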
constexpr All The Things
The original example dealt with a std::string static variable, whose constructor may perform dynamic memory allocation, which is not allowed in constexpr contexts. This restriction is lifted in C++20, and most methods of std::string are made constexpr thanks to Louis Dionne’s paper.
Edit (2020-03-21): As Jason Turner pointed out on Twitter, constexpr dynamic allocation, while allowed in C++20, still needs to be freed in the same constexpr context that allocated it. This implies that big constexpr strings are not going to be allowed.
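The kind of allocation that is allowed can be sketched as follows: a std::string is created, used and destroyed within a single compile-time evaluation, and only a plain number escapes. The names TRANSIENT_CONSTEXPR and greeting_length are hypothetical, and the feature-test guard is an assumption added so that the snippet still builds as an ordinary runtime function where the standard library does not yet provide constexpr std::string.

```cpp
#include <cstddef>
#include <string>

// Assumption for this sketch: only use constexpr when the library supports it.
#if defined(__cpp_lib_constexpr_string) && __cpp_lib_constexpr_string >= 201907L
#define TRANSIENT_CONSTEXPR constexpr
#else
#define TRANSIENT_CONSTEXPR inline
#endif

// The string may allocate, but it is destroyed before the function returns,
// so no allocation outlives the evaluation.
TRANSIENT_CONSTEXPR std::size_t greeting_length() {
    std::string greeting = "hello, ";
    greeting += "world";
    return greeting.size();
}

#if defined(__cpp_lib_constexpr_string) && __cpp_lib_constexpr_string >= 201907L
static_assert(greeting_length() == 12, "computed during compilation");
#endif
```

By contrast, a constexpr or constinit std::string variable holding text that needs a heap allocation would keep that allocation alive past the evaluation, which is the case the edit above rules out.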
Understanding how static variable initialization works enables us to understand its associated costs and how to avoid them through constant initialization. Furthermore, by expressing our intent to the compiler, it’s possible to ensure a compilation error when code changes trigger inefficient initialization; this is accomplished by marking the static variable as constinit.
Using LLVM IR to inspect code makes it simpler to explore architecture-agnostic missed optimizations. In the case explored here, there is no reason why a static variable should be optimized away when targeting x86, but not when targeting ARM, for instance.
[1] We’re compiling the code without support for thread-safe static initialization to keep things simple. However, most of our conclusions still hold if we enable thread-safe statics.
[2] If you’re curious why, the creator of C++ answers it on his website.
[3] Other challenges are possible. For example, if this function gets inlined elsewhere, we will have multiple functions accessing the same global variable and the compiler will struggle to reason about this. Note also that we don’t have to consider other translation units because static variables have internal linkage, that is, they can only be accessed from the translation unit in which they are defined; this is represented by the internal keyword in IR.
[4] Note the emphasis on “can”: it doesn’t mean it will be. An intuitive rule here is that the compiler will do it if compile-time evaluation of the function is required to perform semantic analysis of the program, or if the standard requires it, as is the case for constant initialization. There is a stronger keyword, consteval, that, when applied to a function, makes it a compile-time error to not evaluate calls at compile time.