2022-03-20
There are three key abstractions on top of which LLVM IR is built: values, registers and memory.
In LLVM IR, a Value is a piece of data, and data is described by a type. For example, the Value 42 of type 32-bit integer is written i32 42.
This notion is so important that we will be writing Value with a special font to emphasize that this definition is being used.
There are two places where Values may live: in a register or in memory.
A register is an entity that holds exactly one Value; Values are placed into registers through instructions (more on this later). Once a register is defined, its Value - and also its type - never changes.
A register will have a “size” big enough to hold its Value regardless of the Value’s type; for example, a register may hold a single integer or even an entire array.
Registers have names, and we use their name to access the underlying Value. Any name starting with the % symbol is the name of a register. For example:
%0, %hi, %___
are all register names.
The exact name of a register carries no semantic meaning in the program; registers may be renamed at will.
When working with LLVM IR, we have access to infinitely many registers.
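Here is a small sketch of these register properties. The function wrapper, its parameter and every register name (@registers_demo, %x, %doubled, %pair.0, %pair, %second) are made up for illustration; the define line is only scaffolding so that the snippet is complete IR.

```llvm
; Illustrative sketch: all names here are hypothetical.
define i32 @registers_demo(i32 %x) {
  ; %doubled is defined exactly once and holds a single i32 Value.
  %doubled = add i32 %x, %x
  ; A register may also hold an entire array. Each insertvalue defines
  ; a NEW register; the previous one is never modified.
  %pair.0 = insertvalue [2 x i32] poison, i32 %x, 0
  %pair = insertvalue [2 x i32] %pair.0, i32 %doubled, 1
  ; Read the second element back out of the array-typed register.
  %second = extractvalue [2 x i32] %pair, 1
  ret i32 %second
}
```

Renaming %doubled to %0 or %anything_else everywhere it appears would leave the program's meaning unchanged.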
Memory is a sequence of bytes, each of which has an address.
Addresses, also known as pointers, are Values and therefore may be placed into a register.
Values are typically moved from or to memory using loads or stores.
In this characterization, memory is just a sequence of bytes. Memory does not hold information about the types of Values that were previously stored in it; it is how we use memory addresses that gives meaning (a type) to a sequence of bytes. We will come back to this when we talk about instructions.
Note the difference in the definition of registers and memory: registers have names but not addresses (registers are not memory locations). Memory does not have names, only addresses.
This is a core principle, so excuse the repetition: to access a Value inside a register, we use the register’s name. To access a Value in memory, we use its memory address, which may be placed into a register.
Having defined Values, registers, and memory, we’re now ready to talk about instructions.
An instruction is an operation that may have Values as input, may define a register as output, and may modify state in a program (like writing Values to memory). Each instruction has semantics describing the expected input, the produced output and changes it makes to the program state (“side effects”).
Here’s an example instruction:
%result = add i32 10, %two
It adds the Value i32 10 and the Value inside register %two, and defines (creates) a new register %result to hold the resulting Value.
LLVM’s type system is very strict: the add instruction requires both operands to be Values of the same type. This is statically checked, and the IR is invalid otherwise.
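To make the type rule concrete, here is a hedged sketch. The wrapper @add_demo and the registers %wide and %bad are hypothetical names introduced only for this example; the rejected instructions are left as comments so the snippet remains valid IR.

```llvm
; Illustrative sketch: @add_demo, %wide and %bad are hypothetical names.
define i32 @add_demo(i32 %two) {
  ; Valid: both operands are i32 Values.
  %result = add i32 10, %two

  ; Invalid if uncommented: %wide would hold an i64 Value, but the add
  ; below is declared over i32, so the IR would be rejected.
  ;   %wide = zext i32 %two to i64
  ;   %bad = add i32 %result, %wide

  ret i32 %result
}
```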
Instructions can also interact with memory:
%address = alloca i32
store i32 %result, ptr %address
The alloca i32 instruction allocates enough memory to contain an i32 Value. It returns a Value corresponding to the address of that memory location, and that Value is placed in the register named %address. What is the type of this Value? It is a pointer type: ptr.
The second instruction, store i32, does not produce a Value. It takes the memory address in the register %address, an integer in the register %result, and stores the integer into that memory location.
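The opposite direction, moving a Value from memory back into a register, uses a load. The sketch below reuses the instructions above inside a hypothetical wrapper function (@store_then_load) and adds a made-up register name, %reloaded.

```llvm
; Illustrative sketch: @store_then_load and %reloaded are hypothetical names.
define i32 @store_then_load(i32 %two) {
  %result = add i32 10, %two
  ; Reserve memory for one i32; %address holds a ptr Value.
  %address = alloca i32
  ; Memory is reached through the address, which lives in a register.
  store i32 %result, ptr %address
  ; The load defines a new register holding the i32 read from that address.
  %reloaded = load i32, ptr %address
  ret i32 %reloaded
}
```

Note that %reloaded and %result are two distinct registers that happen to hold the same Value.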
Recall this paragraph from our memory definition:
Memory does not hold information about the types of Values that were previously stored in it; it is how we use memory addresses that gives meaning (a type) to a sequence of bytes.
In the case of the store i32 instruction, it interprets the input address as a memory region containing a Value of type i32. In other words, the store instruction gave meaning (a type) to that address.
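As a sketch of this idea, the same address can be given a different meaning by a different instruction. The names @reinterpret_demo and %as_float are hypothetical, and the constant was picked for this example because its bit pattern (0x3F800000) is the 32-bit floating point encoding of 1.0.

```llvm
; Illustrative sketch: @reinterpret_demo and %as_float are hypothetical names.
define float @reinterpret_demo() {
  %address = alloca i32
  ; The store treats the four bytes at %address as an i32.
  store i32 1065353216, ptr %address    ; bit pattern 0x3F800000
  ; The load treats the very same bytes as a float. Memory kept no record
  ; of the earlier i32; the type comes from the instruction.
  %as_float = load float, ptr %address
  ret float %as_float
}
```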
If you’re using a version of LLVM prior to April 2022, you may see pointer types that carry a “base type” with them, like i32*. These are being phased out; soon there will only be ptr.
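For comparison, here is roughly how the earlier store reads in each spelling. The older form is shown only as a comment, since a single file cannot mix the two spellings; the wrapper @pointer_spelling is a hypothetical name.

```llvm
; Illustrative sketch: @pointer_spelling is a hypothetical name.
define void @pointer_spelling(i32 %result) {
  %address = alloca i32
  ; Older spelling, with a base type attached to the pointer:
  ;   store i32 %result, i32* %address
  ; Newer spelling, with a single pointer type:
  store i32 %result, ptr %address
  ret void
}
```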
In the next post, we will see how a program - functions and global variables - is structured in LLVM’s IR!