2022-03-20
(This is an early draft)
One LLVM IR file (.ll) represents an LLVM IR Module, a top-level entity encapsulating all other data structures in the IR. There are four such data structures:
We will focus on global symbols (variables and functions).
Global symbols are top-level Values visible to the entire Module. Their names always start with the @ symbol, for example: @x, @__foo and @main.
Unlike registers, the name of a global symbol may have semantic meaning in the program; in other words, global symbols have linkage. For example, a global symbol may have external linkage, which means its name is visible to other Modules. Renaming such a symbol would be illegal: doing so could invalidate code in other Modules.
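As a rough sketch of how linkage affects what may be done with a symbol, consider two global variables (their syntax is covered further below). The names @shared_counter and @scratch are made up for illustration, and internal is a linkage kind not otherwise used in this post; it makes a symbol private to its Module.

```llvm
; External linkage (the default): other Modules may refer to this name,
; so it has to be preserved as-is.
@shared_counter = global i32 0

; Internal linkage: only this Module can see the symbol, so it could be
; renamed, or removed if unused, without affecting any other Module.
@scratch = internal global i32 0
```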
Global symbols define memory regions allocated at compilation time. For this reason, the Value of a global symbol has a pointer type.
For example, if we declare a global variable of type i32 called x, the type of the Value @x is ptr. To access the underlying integer, we must first load from that address.
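A minimal sketch of that situation might look like the following. The initial value 42 and the function @read_x are assumptions chosen for illustration; only @x and its i32 type come from the text above.

```llvm
; @x names four bytes of storage; the Value @x itself has type ptr.
@x = global i32 42

define i32 @read_x() {
  ; Load the i32 stored at the address @x.
  %v = load i32, ptr @x
  ret i32 %v
}
```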
There are two kinds of global symbols: global variables and functions.
As global symbols, global variables have a name and linkage. Additionally, they require a type and a constant initial Value:
@gv1 = external global float 1.0
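In this example, we have a global symbol that is named gv1, has external linkage, holds a Value of type float, and has the initial Value float 1.0.
External linkage is the default and should be left implicit on a definition: LLVM's assembler reads an explicit external keyword as a declaration, which cannot carry an initializer, so the line above is best read as notation. The form that actually assembles is: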
@gv1 = global float 1.0
From here on, we will be omitting linkage for all global symbols.
Recall that, because all global symbols define a memory region, the Value @gv1 has a pointer type. As such, to read or write the Value in that memory location we use loads and stores:
%1 = load float, ptr @gv1
store float 2.0, ptr @gv1
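Loads and stores are instructions, so in a complete Module they live inside a function body. The sketch below puts both to work on @gv1; the function @double_gv1 and the fmul instruction are illustrative additions, not part of the original example.

```llvm
@gv1 = global float 1.0

define void @double_gv1() {
  ; Read the current contents of the memory region named by @gv1.
  %old = load float, ptr @gv1
  ; Compute the new value in a register.
  %new = fmul float %old, 2.0
  ; Write the result back to the same memory region.
  store float %new, ptr @gv1
  ret void
}
```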
There is one other important variation of global variables: we may replace the global keyword with the constant keyword:
@gv1 = constant float 1.0
This means that stores to this memory region are illegal and the optimizer can assume they do not exist.
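To see why this matters, here is a small sketch with a hypothetical reader function, @read_gv1. Because no store can legally change @gv1, every load from it must produce the initial Value.

```llvm
@gv1 = constant float 1.0

define float @read_gv1() {
  ; No legal store can modify @gv1, so this load always yields 1.0.
  %v = load float, ptr @gv1
  ret float %v
}
```

An optimizer is therefore free to replace the load with the constant itself, reducing the body to something like ret float 1.0. With the global keyword that rewrite would need further proof that no store intervenes.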
A function declaration in LLVM IR has the following syntax:
declare i64 @foo(i64, ptr)
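It consists of the declare keyword, the return type (i64), the name of the function (foo), and the types of its parameters (i64, ptr).
A function definition is very similar to the declaration, but we use a different keyword (define), provide names for the parameters and include the body of the function: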
define i64 @foo(i64 %val, ptr %myptr) {
  %temp = load i64, ptr %myptr
  %mul = mul i64 %val, %temp
  ret i64 %mul
}
This function loads an i64 Value from %myptr, multiplies it by %val and returns the result (ret instruction).
What is the type of @foo? Like all global symbols, it defines a memory region and therefore its type is a pointer type (ptr).
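One way to convince ourselves of this is to use @foo wherever a ptr constant is expected. In the sketch below, the global @foo_addr is a hypothetical name introduced only for illustration.

```llvm
declare i64 @foo(i64, ptr)

; The Value @foo is a ptr, so it can serve as the initial Value
; of a global variable whose type is ptr.
@foo_addr = global ptr @foo
```

Loading from @foo_addr would then yield the address of the function, just as loading from any other ptr-typed global variable yields the pointer stored in it.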
It is a useful exercise to read the LLVM documentation on some of the topics discussed:
Now that we understand the core concepts in LLVM, have discussed global symbols, and have explored some basic instructions, we are ready to dig into the biggest piece of the puzzle: function bodies.