
LLVM’s IR Structure and Global Symbols

2022-03-20

(This is an early draft)

Modules

One LLVM IR file (.ll) represents an LLVM IR Module, a top-level entity encapsulating all other data structures in the IR. There are four such data structures:

  1. A structure describing the target architecture and platform.
  2. Global Symbols:
    1. Global Variables
    2. Functions
  3. Metadata: debug information, optimization hints, etc.

We will focus on global symbols (variables and functions).

Global Symbols

Global symbols are top-level Values visible to the entire Module. Their names always start with the @ symbol, for example: @x, @__foo and @main.

Unlike register names, the name of a global symbol may have semantic meaning in the program; in other words, global symbols have linkage. For example, a global symbol may have external linkage, which means its name is visible to other Modules. Renaming such a symbol would be illegal: code in other Modules may refer to it by that name.

Global symbols define memory regions allocated at compilation time. For this reason, the Value of a global symbol has a pointer type.

For example, if we declare a global variable of type i32 called x, the type of the Value @x is ptr. To access the underlying integer, we must first load from that address.
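C++ hides this indirection, but it becomes visible if we spell the address out. The sketch below is an analogy rather than actual compiler output; the names x and read_x are hypothetical, and the IR in the comments is approximate (real clang output also carries attributes such as alignment).

```cpp
// Illustrative C++ analogue of a global i32 and the load needed to read it.
// `x` and `read_x` are hypothetical names chosen for this sketch.
int x = 42;  // roughly: @x = global i32 42

int read_x() {
    int *addr = &x;  // like the Value @x itself: an address (type ptr)
    return *addr;    // like: %v = load i32, ptr @x
}
```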

There are two kinds of global symbols: global variables and functions.

Global Variables

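As global symbols, global variables have a name and linkage. A global variable definition additionally requires a type and a constant initial Value:

@gv1 = global float 1.0

In this example, we have a global symbol that:

  1. is named @gv1;
  2. has external linkage, which is the default when no linkage keyword is written;
  3. holds a Value of type float;
  4. is initialized to 1.0.

Writing the external keyword explicitly means something slightly different: it declares a variable that is defined in another Module, and such a declaration cannot have an initializer:

@gv2 = external global float

From here on, we will be omitting linkage for all global symbols.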

Recall that, because all global symbols define a memory region, the Value @gv1 has a pointer type. As such, to read or write the Value in that memory location we use loads and stores:

%1 = load float, ptr @gv1
store float 2.0, ptr @gv1
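In C++, the load and the store are implicit in ordinary reads of and assignments to a global. In this sketch (the function name read_then_write is hypothetical), reading gv1 plays the role of the load and assigning to it plays the role of the store; the IR in the comments is approximate.

```cpp
// Hypothetical C++ source whose global accesses mirror the IR above.
float gv1 = 1.0f;  // roughly: @gv1 = global float 1.0

// Returns the old value of gv1, then overwrites it with 2.0.
float read_then_write() {
    float old = gv1;  // roughly: %1 = load float, ptr @gv1
    gv1 = 2.0f;       // roughly: store float 2.0, ptr @gv1
    return old;
}
```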

There is one other important variation of global variables: we may replace the global keyword with constant:

@gv1 = constant float 1.0

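This means that the contents of this memory region never change: storing to it is undefined behavior, so the optimizer can assume no such stores exist. For example, it may replace a load from @gv1 with the constant 1.0.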
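A namespace-scope const variable is one way to get such a global from C++. In C++ these variables have internal linkage, so clang would typically emit something like an internal constant (if the variable is emitted at all). The sketch below uses hypothetical names; whether the load is actually folded away depends on the optimization level.

```cpp
// Hypothetical example of a read-only global, analogous to
//   @gv_const = constant float 1.0   (modulo linkage and alignment)
const float gv_const = 1.0f;

float scaled(float v) {
    // gv_const can never be stored to, so an optimizer may replace this
    // read with the literal 1.0f instead of loading from memory.
    return v * gv_const;
}

// Writing `gv_const = 2.0f;` anywhere would be rejected by the C++ compiler.
```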

Global Variables: examples from C++ to LLVM IR

Let’s compile some C++ global declarations and look at the corresponding IR global variable:

int just_int;
// @just_int = dso_local global i32 0, align 4

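The keyword dso_local indicates, roughly, that the symbol will resolve to a definition within the same linkage unit (the same executable or shared library), so it is not going to be “patched in” at runtime by the dynamic loader. This information is useful for the optimizer: it can access the variable directly instead of through an indirection such as the global offset table.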

Note that, while we didn’t explicitly initialize the C++ variable, it is zero-initialized in IR. C++ requires zero-initialization for variables with static storage duration, such as this global, so the C++ to IR translation makes it explicit.
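The zero-initialization rule covers every variable with static storage duration that lacks an initializer, not just int. A small check, using hypothetical names apart from just_int:

```cpp
// Variables with static storage duration and no initializer are
// zero-initialized, which is why clang emits `i32 0` as the initial Value.
int just_int;        // roughly: @just_int = dso_local global i32 0, align 4
double just_double;  // also zero-initialized (0.0)
int just_array[3];   // every element starts as 0

int sum_of_zeros() {
    return just_int + just_array[0] + just_array[1] + just_array[2];
}
```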

Finally, there is alignment information: the address of this variable is guaranteed to be a multiple of 4.
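The alignment guarantee can be observed from C++. On common targets such as x86-64 and AArch64, alignof(int) is 4, matching the `align 4` above. The helper is_aligned_to is a hypothetical name for this sketch.

```cpp
#include <cstdint>

int just_int;  // roughly: @just_int = dso_local global i32 0, align 4

// Returns true if the address `p` is a multiple of `alignment`.
bool is_aligned_to(const void *p, std::uintptr_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```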

extern int extern_int;
// @extern_int = external global i32, align 4

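If we make our variable extern, a few things change:

  1. The linkage is now spelled out as external: this line is a declaration of a variable defined in some other Module.
  2. There is no initial Value, since a declaration does not allocate any storage.
  3. The dso_local keyword is gone, because the definition may live in a different linkage unit, such as a dynamic library.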
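An extern declaration only promises that a definition exists somewhere, usually in another translation unit that the linker combines with this one. To keep the sketch below self-contained, the definition lives in the same file (the function name and the value 7 are hypothetical). Because of that, clang would emit a definition with an initializer here rather than the external declaration shown above.

```cpp
// Declaration: no storage is allocated here and there is no initializer.
// On its own this corresponds to: @extern_int = external global i32, align 4
extern int extern_int;

int read_extern() {
    return extern_int;  // a load through the declared symbol's address
}

// Definition: normally in a different source file. Since it appears in
// this translation unit, the IR global gets a definition and initializer.
int extern_int = 7;
```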

Let’s look at more examples:

const int const_int = 1;
// @_ZL9const_int = internal constant i32 1

static int static_int = 2;
// @_ZL10static_int = internal global i32 2

static const int static_const_int = 3;
// @_ZL16static_const_int = internal constant i32 3

Compare these static variables to what happens with a class static variable:

class MyClass {
public:
    static int static_class_member;
    // @_ZN7MyClass19static_class_memberE = external global i32, align 4

    static const int static_const_class_member;
    // @_ZN7MyClass25static_const_class_memberE = external constant i32, align 4
};
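The in-class lines above only declare the members, which is why both appear as external. A program that uses them must also define them, usually in exactly one source file. The sketch below adds such definitions (the initial values 10 and 20 and the helper sum_members are hypothetical); with them present, one would expect clang to emit definitions with initializers instead of external declarations, with exact attributes depending on the target.

```cpp
class MyClass {
public:
    static int static_class_member;              // declaration only
    static const int static_const_class_member;  // declaration only
};

// Out-of-class definitions: these allocate the storage, so the IR
// globals are no longer external declarations.
int MyClass::static_class_member = 10;
const int MyClass::static_const_class_member = 20;

int sum_members() {
    return MyClass::static_class_member + MyClass::static_const_class_member;
}
```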

You can see these in action in Godbolt.

Functions

A function declaration in LLVM IR has the following syntax:

declare i64 @foo(i64, ptr)

A function definition is very similar to a declaration, but it uses a different keyword (define), gives names to the parameters and includes the body of the function:

define i64 @foo(i64 %val, ptr %myptr) {
  %temp = load i64, ptr %myptr
  %mul = mul i64 %val, %temp
  ret i64 %mul
}

This function loads an i64 Value from %myptr, multiplies it by %val and returns the result (ret instruction).
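A C++ function along the following lines is one plausible source for an IR function shaped like @foo. This is a hypothetical sketch: `extern "C"` is used only so the symbol name stays foo instead of being mangled, and clang's real output would also include parameter and function attributes.

```cpp
#include <cstdint>

// Same shape as @foo: load through the pointer, multiply, return.
extern "C" std::int64_t foo(std::int64_t val, std::int64_t *myptr) {
    std::int64_t temp = *myptr;  // roughly: %temp = load i64, ptr %myptr
    return val * temp;           // roughly: %mul = mul i64 %val, %temp
                                 //          ret i64 %mul
}
```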

What is the type of @foo? Like all global symbols, it defines a memory region and therefore its type is a pointer type (ptr).
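C++ exposes the same fact through function pointers: a function's name yields its address, which can be stored and called through. In this hypothetical sketch, fp holds what in IR would simply be the ptr Value @foo, and call_through_pointer is an illustrative helper.

```cpp
#include <cstdint>

// A C++ counterpart of @foo.
std::int64_t foo(std::int64_t val, std::int64_t *myptr) {
    return val * *myptr;
}

// Taking the function's address is like using the Value @foo (type ptr).
using FooPtr = std::int64_t (*)(std::int64_t, std::int64_t *);
FooPtr fp = &foo;

std::int64_t call_through_pointer(std::int64_t val, std::int64_t *myptr) {
    return fp(val, myptr);  // an indirect call through the stored address
}
```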

Further Reading

It is a useful exercise to read the LLVM documentation on some of the topics discussed:

Up Next

Now that we understand the core concepts in LLVM, have discussed global symbols and have explored some basic instructions, we are ready to dig into the biggest piece of the puzzle: function bodies.