
LLVM’s IR Structure and Global Symbols

2022-03-20

(This is an early draft)

Modules

One LLVM IR file (.ll) represents an LLVM IR Module, the top-level entity that encapsulates all other data structures in the IR. A Module contains:

  1. A structure describing the target architecture and platform.
  2. Global Symbols:
    1. Global Variables
    2. Functions
  3. Metadata: debug information, optimization hints, etc.

We will focus on global symbols (variables and functions).

Global Symbols

Global symbols are top-level Values visible to the entire Module. Their names always start with the @ symbol, for example: @x, @__foo and @main.

Unlike registers, the name of a global symbol may carry semantic meaning in the program; in other words, global symbols have linkage. For example, a global symbol may have external linkage, which means its name is visible to other Modules. Such a symbol cannot be renamed, because code in other Modules may refer to it by that name.

Global symbols define memory regions allocated at compilation time. For this reason, the Value of a global symbol is the address of its region and therefore has a pointer type.

For example, if we declare a global variable of type i32 called x, the type of the Value @x is ptr. To access the underlying integer, we must first load from that address.

There are two kinds of global symbols: global variables and functions.

Global Variables

As global symbols, global variables have a name and linkage. Additionally, they require a type and a constant initial Value:

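@gv1 = internal global float 1.0

In this example, we have a global symbol that:

  1. is named @gv1;
  2. has internal linkage, so its name is not visible to other Modules;
  3. holds a Value of type float;
  4. is initialized with the constant 1.0.

The external keyword is only written on declarations, which have no initializer (for example, @gv2 = external global float). External linkage is the default, so a definition with external linkage simply omits the keyword: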

@gv1 = global float 1.0

From here on, we will omit linkage for all global symbols.

Recall that, because all global symbols define a memory region, the Value @gv1 has a pointer type. As such, to read or write the Value in that memory location we use loads and stores:

%1 = load float, ptr @gv1
store float 2.0, ptr @gv1

There is one other important variation of global variables: we may replace the global keyword with constant:

@gv1 = constant float 1.0

This marks the memory region as never modified: storing to it is undefined behavior, so the optimizer may assume no such stores exist and, for example, replace loads from @gv1 with the initial Value 1.0.

Functions

A function declaration in LLVM IR has the following syntax:

declare i64 @foo(i64, ptr)

A function definition is very similar to the declaration, but we use a different keyword (define), give names to the parameters, and include the body of the function:

define i64 @foo(i64 %val, ptr %myptr) {
  %temp = load i64, ptr %myptr
  %mul = mul i64 %val, %temp
  ret i64 %mul
}

This function loads an i64 Value from %myptr, multiplies it by %val and returns the result with the ret instruction.

What is the type of @foo? Like all global symbols, it defines a memory region and therefore its type is a pointer type (ptr).

Further Reading

It is a useful exercise to read the LLVM documentation on some of the topics discussed:

Up Next

Now that we have covered the core concepts of LLVM, discussed global symbols, and explored some basic instructions, we are ready to dig into the biggest piece of the puzzle: function bodies.