There are three key abstractions on top of which LLVM IR is built: values, registers and memory.
In LLVM IR, a Value is a piece of data, and data is described by a type. For example, the Value 42 of type 32-bit integer is written i32 42. This notion is so important that we will be writing Value with a special font to emphasize that this definition is being used.
There are two places where Values may live: in a register or in memory.
A register is an entity that holds exactly one Value. Values are placed into registers through instructions (more on this later). Once a register is defined, its Value - and also its type - never changes.
A register will have a “size” big enough to hold its Value regardless of the Value’s type; for example, a register may hold a single integer or even an entire array.
Registers have names, and we use their name to access the underlying Value. Any name starting with the % symbol is the name of a register. For example, %0, %hi and %___ are all register names.
The exact name of a register carries no semantic meaning in the program; registers may be renamed at will.
When working with LLVM IR, we have access to infinitely many registers.
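To make the rules about registers concrete, here is a minimal sketch. The function name @registers_demo and its parameter %x are made up for illustration. Each register is defined exactly once, and its type is fixed by the instruction that defines it.

```llvm
; Hypothetical function, for illustration only.
define i32 @registers_demo(i32 %x) {
entry:
  ; %sum is defined here, once; from now on it holds an i32 Value.
  %sum = add i32 %x, 1
  ; Names carry no meaning: %___ is as good a name as any other.
  %___ = mul i32 %sum, 2
  ; A register can hold an aggregate Value, here an array of two i32s.
  %pair = insertvalue [2 x i32] undef, i32 %___, 0
  ; Read element 0 of the array back into a plain i32 register.
  %first = extractvalue [2 x i32] %pair, 0
  ret i32 %first
}
```

Renaming %sum to, say, %a everywhere it appears would give an equivalent program.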
Memory is a sequence of bytes, each of which has an address. Addresses, also known as pointers, are Values and therefore may be placed into a register. Values are typically moved from or to memory using loads or stores.
In this characterization, memory is just a sequence of bytes. Memory does not hold information about the types of Values that were previously stored in it; it is how we use memory addresses that gives meaning (a type) to a sequence of bytes. We will come back to this when we talk about instructions.
Note the difference in the definition of registers and memory: registers have names but not addresses (registers are not memory locations). Memory does not have names, only addresses.
This is a core principle, so excuse the repetition: to access a Value inside a register, we use the register’s name. To access a Value in memory, we use its memory address, which may be placed into a register.
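This principle can be sketched in a few lines of IR. It is only an illustration: the function name @memory_demo and the parameter %p are hypothetical, and the caller is assumed to pass an address that is valid for a 4-byte access.

```llvm
; Hypothetical function: %p is a register whose Value is a memory address.
define i32 @memory_demo(ptr %p) {
entry:
  ; Memory is reached through the address held in %p, never by a name.
  store i32 7, ptr %p
  ; The Value read from memory lands in a new register, accessed by name.
  %v = load i32, ptr %p
  ret i32 %v
}
```

Note that %p itself is a register: it has a name, and the Value it holds happens to be an address.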
With Values, registers, and memory defined, we’re now ready to talk about instructions.
An instruction is an operation that may have Values as input, may define a register as output, and may modify state in a program (like writing Values to memory). Each instruction has semantics describing the expected input, the produced output and the changes it makes to the program state (“side effects”).
Here’s an example instruction:
%result = add i32 10, %two
It adds the Value i32 10 and the Value inside register %two, and defines (creates) a new register %result to hold the resulting Value.
LLVM’s type system is very strict: the add instruction requires both operands to be Values of the same type. This is statically checked, and the IR is invalid otherwise.
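As a sketch of what this strictness means in practice, the hypothetical function below (the name @add_demo and every register other than %result and %two are invented for this example) mixes 32-bit and 64-bit integers. The Values have to be converted explicitly before they can be added.

```llvm
; Hypothetical function wrapping the add instruction from the text.
define i32 @add_demo(i32 %two) {
entry:
  %result = add i32 10, %two
  ; Widen the i32 Value to i64 so that both operands of the next add match.
  %wide = zext i32 %result to i64
  ; Using %wide in an i32 add would be rejected, for example:
  ;   %bad = add i32 %result, %wide
  %sum = add i64 %wide, 1
  ; Narrow back to i32 to match the function's return type.
  %back = trunc i64 %sum to i32
  ret i32 %back
}
```

The zext and trunc instructions used here are ordinary LLVM conversion instructions; they are not covered in this post.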
Instructions can also interact with memory:
%address = alloca i32
store i32 %result, ptr %address
The alloca i32 instruction allocates enough memory to contain an i32 Value. It returns a Value corresponding to the address of that memory location, and that Value is placed in the register named %address. What is the type of this Value? It is a pointer type: ptr.
The second instruction, store i32, does not produce a Value. It takes the memory address in the register %address, an integer in the register %result, and stores the integer into that memory location.
Recall this paragraph from our memory definition:
Memory does not hold information about the types of Values that were previously stored in it; it is how we use memory addresses that gives meaning (a type) to a sequence of bytes.
In the case of the store i32 instruction, it interprets the input address as a memory region containing a Value of type i32. In other words, the store instruction gave meaning (a type) to that address.
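One way to see that the type belongs to the access and not to the memory is to read the same bytes back with a different type. The sketch below is illustrative only; the function name @reinterpret_demo and the registers %bits and %as_float are hypothetical.

```llvm
; Hypothetical function: the same four bytes are used with two different types.
define float @reinterpret_demo(i32 %bits) {
entry:
  %address = alloca i32
  ; This store treats the bytes at %address as an i32.
  store i32 %bits, ptr %address
  ; This load treats the very same bytes as a float.
  %as_float = load float, ptr %address
  ret float %as_float
}
```

Nothing in memory records that an i32 was stored there; the load simply reinterprets the bytes as a float.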
If you’re using a version of LLVM prior to April 2022, you may see pointer types that carry a “base type” with them, like i32*. These are being phased out; soon there will only be ptr.
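For comparison, here is a sketch of the same store in both spellings. The older form appears only as a comment, since a single file cannot mix the two; the function name @pointer_spelling_demo is made up for this example.

```llvm
; Hypothetical function comparing the two pointer spellings.
define void @pointer_spelling_demo(i32 %result) {
entry:
  %address = alloca i32
  ; Older spelling, where the pointer type names a base type:
  ;   store i32 %result, i32* %address
  ; Current spelling, with the single ptr type:
  store i32 %result, ptr %address
  ret void
}
```

In both spellings the store itself states that an i32 is being written; the base type on the pointer added no information the instruction did not already carry.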
In the next post, we will see how a program - functions and global variables - is structured in LLVM’s IR!