A Git Introduction With No Commands

2020-05-03

Git is a ubiquitous tool in software engineering, but it is difficult to find an introduction to the mental model behind it. Git users often repeat a set of memorized commands that work in some situations, and turn to colleagues when the cheat sheet fails.

Having understood Git’s conceptual model, I am able to use the tool more effectively than before, employing it in ways that I didn’t know were possible: by framing a problem as an operation on the commit history graph, I can find the Git command I need to solve it.

In this post, we’ll go over Git’s conceptual model without mentioning a single command-line operation. Once the model is clear, we’ll look at the daily actions performed by developers and map those actions to Git concepts.

Saving Snapshots of the Project

A version control system is a program that keeps track of the state of a repository as it evolves through time. It allows us to go back and forth between states, to record new states, and to inspect the history of the repository.

In Git, saving a new state of the repository consists of:

  1. Making changes to files or adding new ones.
  2. Specifying which changes should be recorded by adding those changes to the staging area.
  3. Performing a commit operation.

Staging

The staging area consists of a set of changes that will be included by the next commit operation. It partitions the repository into three categories of files:

During development, we are editing files, staging changes, and finally doing a commit operation:

Nothing stops us from editing a file, staging it and editing the file again; this effectively creates a new kind of file that has both staged and unstaged changes. It’s up to us to decide what we want the next commit to include: if it should include the new changes, then we have to stage them too.
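To make the edit, stage, edit-again scenario concrete, here is a minimal sketch in Python. It is a toy model and not how Git is implemented: the three dictionaries, the file name README and the functions stage and commit are all made up for illustration, and the model tracks whole-file contents only.

```python
# Toy model (not Git's implementation): each area maps a file name to its content.
committed = {"README": "v1"}   # snapshot recorded by the last commit
staged = dict(committed)       # staging area, initially identical to the last commit
working = dict(committed)      # the files as we are currently editing them


def stage(path):
    """Copy the current working version of a file into the staging area."""
    staged[path] = working[path]


def commit():
    """Record whatever is in the staging area as the new snapshot."""
    global committed
    committed = dict(staged)


working["README"] = "v2"   # edit the file
stage("README")            # stage that edit
working["README"] = "v3"   # edit again; this second change is not staged

commit()
print(committed["README"])  # v2
print(working["README"])    # v3
```

The commit records the staged version only. The second edit stays in the working files until we stage it too.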

Git also allows us to stage only some of the changes in a file. In fact, the mental model is that we stage changes made to a file, not the file itself. Staging only a subset of the changes made to a file is helpful when they don’t all logically belong in the same commit.

Commit Definition

So what is a commit operation?

It is the act of taking a snapshot of the entire repository and storing it in an internal data structure. A commit operation creates a commit object, which consists of:

  1. A pointer to that snapshot.
  2. The author’s name and email.
  3. A commit message.
  4. A pointer to the commit that came directly before this commit.
  5. Some other metadata.

By pointer we mean a hash of the object; it is common to refer to a commit by its hash.

Note: if any of the items above is changed, the commit hash will change too!
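The note above can be illustrated with a small Python sketch. It assumes a simplified commit object with four fields, and it hashes a made-up text serialization with SHA-1. The Commit class, its field names and the sample values are hypothetical, and Git's real object format is different.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Commit:
    snapshot: str          # pointer (hash) to the snapshot
    author: str            # author's name and email
    message: str           # commit message
    parent: Optional[str]  # hash of the preceding commit, None for the first one

    def hash(self) -> str:
        # Hypothetical serialization: join the fields and hash the result.
        payload = "\n".join(
            [self.snapshot, self.author, self.message, self.parent or ""]
        )
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


first = Commit("snap-a", "Ada <ada@example.com>", "Initial commit", None)
reworded = replace(first, message="Initial import")  # same commit, new message

print(first.hash() != reworded.hash())  # True
```

Because the parent's hash is part of what gets hashed, changing an old commit also changes the hash of every commit that descends from it.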

Unfortunately, the verb “commit” and the noun “commit” are spelled the same way in English. When we use it as a verb, we mean the act of performing a commit operation, whereas the noun refers to the commit object (or its hash).

How a Sequence of Commits Forms a Graph

Because a commit stores a reference to the preceding commit (in other words, because a commit has a parent), the repository can be represented as a directed acyclic graph: nodes are commits, and a directed edge (commit2, commit1) indicates that commit1 is a parent of commit2.

For ease of representation, I’m using names for the commits in the pictures, but commit1 and commit2 actually represent the hash of the respective commits.
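As a rough sketch, the graph can be modeled in Python as a mapping from each commit to its parent. This assumes every commit has at most one parent, as described so far. The name commit3 and the history function are hypothetical additions for the example.

```python
# Each key is a commit; each value is its parent (None for the first commit).
# In a real repository these names would be commit hashes.
parents = {
    "commit1": None,
    "commit2": "commit1",
    "commit3": "commit2",
}


def history(commit):
    """Follow parent edges from a commit back to the first commit."""
    chain = []
    while commit is not None:
        chain.append(commit)
        commit = parents[commit]
    return chain


print(history("commit3"))  # ['commit3', 'commit2', 'commit1']
```

Edges only point backwards in time, so starting from any commit we can reach all of its ancestors but none of its descendants.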

Branches: a Name and a Pointer

The concept of a branch is what allows us to navigate through important states of a repository. A branch in Git is a pair (name, pointer to a commit).

In this example, we have two branches named feature1 and master, both pointing to commit commit1, and a branch named feature2 pointing to commit2.

Note: there is nothing special about the branch named master. When you create a repository from scratch, you need a name for the starting branch – master is the default and few repositories bother renaming it.
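Since a branch is just a (name, pointer) pair, a set of branches can be sketched as a Python dictionary. The branch name experiment below is made up for the example. The other names come from the example above.

```python
# Branch name -> commit it points to (commit names stand in for hashes).
branches = {
    "master": "commit1",
    "feature1": "commit1",
    "feature2": "commit2",
}

# Creating a branch only adds a new name pointing at an existing commit.
branches["experiment"] = branches["feature2"]

# Several branches may point to the same commit.
at_commit1 = sorted(name for name, commit in branches.items() if commit == "commit1")
print(at_commit1)  # ['feature1', 'master']
```

In this model, creating or deleting a branch never touches the commits themselves. Only the table of names changes.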

You Are Where Your HEAD Is

Since we’re jumping around the history of the repository all the time, how do we know which snapshot we’re looking at? This information is tracked by a special pointer called HEAD. Most of the time, HEAD points to a branch:

In this example, we are looking at the repository as defined by branch feature2, which points to commit2.

When we add a new commit, we advance the branch pointed to by HEAD:
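Here is a minimal Python sketch of that step, assuming HEAD stores a branch name and reusing the branches from the earlier example. The commit name commit3 and the add_commit function are hypothetical.

```python
commits = {"commit1": None, "commit2": "commit1"}  # commit -> parent
branches = {"master": "commit1", "feature1": "commit1", "feature2": "commit2"}
head = "feature2"  # HEAD points to a branch, not directly to a commit


def add_commit(new_commit):
    """Create a commit on top of the current branch and advance that branch."""
    commits[new_commit] = branches[head]  # the parent is where the branch pointed
    branches[head] = new_commit           # only the branch under HEAD moves


add_commit("commit3")
print(branches["feature2"])  # commit3
print(commits["commit3"])    # commit2
print(branches["master"])    # commit1
```

HEAD itself does not change here: it still names feature2. The branch moves, and the other branches stay where they were.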