I Built My Own Version of Git

For years, I used Git the way most developers do—confident enough to get by, but not confident enough to explain why it works.

I could commit, push, and rebase my way through projects, but Git always felt a little… magical.

So I decided to remove the magic.

I built a simplified version of Git in Python.

The Goal

I didn’t want to rebuild Git completely. That would take months (or years). Instead, I focused on the core idea:

Can I build a system that tracks file history using hashes?

That led to a small CLI tool with just a handful of commands:

tig init
tig add file.txt
tig commit -m "message"
tig log
tig checkout <hash>

Simple on the surface—but surprisingly deep once you start building it.

The Big Idea: Content-Addressable Storage

The first breakthrough came when I stopped thinking in terms of files and started thinking in terms of content.

In a normal filesystem, you store something like:

file.txt → file contents

But in Git (and in my version), you store:

<hash> → file contents

Here’s the core function that made everything possible:

import hashlib

def hash_object(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

That’s it.

Every file, every commit, every snapshot—everything is identified by its hash.

This leads to a powerful property:

If two files have the same content, they have the same hash.

Which means:

No duplication
Built-in integrity checking
Deterministic state
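To make those three properties concrete, here is a minimal sketch using an in-memory dictionary as the object store. The names store, put, and verify are made up for illustration and are not taken from my tool; only hash_object mirrors the function above.

```python
import hashlib

def hash_object(data: bytes) -> str:
    """Return the SHA-1 hex digest that serves as the object's ID."""
    return hashlib.sha1(data).hexdigest()

# Hypothetical in-memory object store: hash -> content.
store = {}

def put(data: bytes) -> str:
    """Store content under its hash; identical content reuses the same key."""
    oid = hash_object(data)
    store[oid] = data
    return oid

def verify(oid: str) -> bool:
    """Integrity check: re-hash the stored bytes and compare with the key."""
    return hash_object(store[oid]) == oid

first = put(b"hello\n")
second = put(b"hello\n")   # same content, same hash, no second copy
other = put(b"hello!\n")   # one extra byte gives a different hash
```

After this runs, first and second are the same ID and the store holds only two entries. If the bytes behind an ID are ever modified, verify returns False for that ID, which is the integrity check in its simplest form.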

Storing Files as “Blobs”

When I implemented add, I realized something interesting: Git doesn’t care about filenames at first—it cares about content.

Here’s the simplified version of what happens:

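def add(file_path):
    # Read raw bytes so binary files hash correctly
    with open(file_path, "rb") as f:
        content = f.read()

    # Prefix the type, then store the object under its hash
    # (write_object is a helper not shown here)
    data = b"blob\n" + content
    blob_hash = write_object(data)
    # The full command then records file_path -> blob_hash in the index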

A couple of subtle but important details here:

I prefix the content with "blob\n"
The hash is calculated on the entire structure

That means the same content stored as a different type (like a commit) will produce a different hash.

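This is how Git keeps object types from colliding. Real Git uses a slightly richer header: the type, the content size, and a null byte, all hashed together with the content.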
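The write_object helper isn't shown in this post, so here is one way it could look. This is a sketch under assumptions: the objects_dir parameter, the flat one-file-per-hash layout, and the read_typed_object helper are all illustrative choices. Real Git also compresses objects and spreads them across subdirectories, which this skips.

```python
import hashlib
from pathlib import Path
from typing import Tuple

def write_object(data: bytes, objects_dir: Path) -> str:
    """Hash the full payload (type prefix included) and store it under that hash."""
    oid = hashlib.sha1(data).hexdigest()
    objects_dir.mkdir(parents=True, exist_ok=True)
    path = objects_dir / oid
    if not path.exists():          # identical payloads are written only once
        path.write_bytes(data)
    return oid

def read_typed_object(oid: str, objects_dir: Path) -> Tuple[str, bytes]:
    """Hypothetical reader: split the stored payload into (type, content)."""
    data = (objects_dir / oid).read_bytes()
    kind, _, content = data.partition(b"\n")   # split on the first newline only
    return kind.decode(), content
```

Writing b"blob\n" + payload and b"commit\n" + payload with the same payload produces two different hashes and two separate files, because the prefix is part of what gets hashed.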

The Hidden Layer: The Index (Staging Area)

One of the most misunderstood parts of Git is the staging area.

When I built my own version, it finally clicked.

I stored it as a simple JSON file:

{
  "file.txt": "a1b2c3..."
}

Every time you run:

tig add file.txt

you’re not committing; you’re updating this index.

That separation is crucial.

It means:

You can stage multiple files
You control exactly what goes into a commit
Commits become predictable snapshots
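Updating a JSON index like the one above takes only a few lines. The following is a rough sketch, not the exact code from my tool: load_index, stage, and the index_path parameter are names chosen for illustration.

```python
import json
from pathlib import Path

def load_index(index_path: Path) -> dict:
    """Return the staged {path: blob hash} mapping, or {} if nothing is staged."""
    if index_path.exists():
        return json.loads(index_path.read_text())
    return {}

def stage(index_path: Path, file_path: str, blob_hash: str) -> dict:
    """Record (or overwrite) the blob hash staged for a single path."""
    index = load_index(index_path)
    index[file_path] = blob_hash
    # sort_keys keeps the file stable regardless of staging order
    index_path.write_text(json.dumps(index, indent=2, sort_keys=True))
    return index
```

Staging the same path twice simply overwrites its entry, so the index always reflects the most recently added version of each file and nothing else.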

Commits Are Just Objects

Before this project, I thought commits were something special.

They’re not.

They’re just structured data.

Here’s what my commit object looks like:

{
  "tree": "abc123",
  "parent": "def456",
  "message": "first commit",
  "timestamp": 1710000000
}

And here’s the key insight:

A commit doesn’t store files—it points to a tree, which points to blobs.

That indirection is what makes Git so powerful.
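Here is a sketch of that commit → tree → blob chain, using an in-memory dictionary in place of the on-disk store. The serialization is an assumption on my part for this example ("tree\n" or "commit\n" followed by JSON), and write_tree and write_commit are illustrative names.

```python
import hashlib
import json
import time
from typing import Optional

objects = {}  # hypothetical in-memory store: hash -> raw payload

def write_object(data: bytes) -> str:
    oid = hashlib.sha1(data).hexdigest()
    objects[oid] = data
    return oid

def write_tree(index: dict) -> str:
    """Snapshot the staged {path: blob hash} mapping as a tree object."""
    body = json.dumps(index, sort_keys=True).encode()
    return write_object(b"tree\n" + body)

def write_commit(tree: str, parent: Optional[str], message: str,
                 timestamp: Optional[int] = None) -> str:
    """Create a commit that points at a tree and, optionally, a parent commit."""
    commit = {
        "tree": tree,
        "parent": parent,
        "message": message,
        "timestamp": int(time.time()) if timestamp is None else timestamp,
    }
    body = json.dumps(commit, sort_keys=True).encode()
    return write_object(b"commit\n" + body)
```

Because the parent hash is part of the commit's own payload, changing any earlier commit would change the hash of every commit after it. Two commits can also share one tree, and two trees can share a blob, without anything being stored twice.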

Rebuilding History

The log command ended up being one of my favorites to implement.

It’s just a loop:

while current:
    commit = read_object(current)
    print(commit["message"])
    current = commit["parent"]

That’s it.

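In my version, history is just a linked list of commits. (Real Git generalizes this to a graph, because a merge commit has more than one parent.)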

No database server. No complex indexing.

Just pointers.
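For a version of that loop you can run on its own, here is a sketch that walks a hand-built chain of commits. The commits dictionary and the short IDs "c1" to "c3" stand in for real hashes and stored objects, and walk_history is an illustrative name.

```python
from typing import Dict, Iterator, Optional

def walk_history(head: Optional[str], commits: Dict[str, dict]) -> Iterator[dict]:
    """Yield commits from newest to oldest by following parent pointers."""
    current = head
    while current:                       # the first commit has parent None
        commit = commits[current]
        yield commit
        current = commit.get("parent")

# Stand-in data: in the real tool each key would be a SHA-1 hash.
commits = {
    "c3": {"message": "third", "parent": "c2"},
    "c2": {"message": "second", "parent": "c1"},
    "c1": {"message": "first", "parent": None},
}

messages = [c["message"] for c in walk_history("c3", commits)]
```

Starting from "c3", messages comes out newest first, and the walk stops at the first commit because its parent is None.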

The Moment It Clicked

The most satisfying moment came when I implemented checkout.

I took a commit hash, walked to its tree, loaded each blob, and rewrote the files on disk.

And suddenly:

I could travel through time.

Not metaphorically—literally.

I could restore my project to any previous state using nothing but hashes.

That’s when Git stopped feeling like a tool and started feeling like a system.
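The checkout steps can be sketched in a few lines. This assumes the same illustrative layout as before (a "type\n" header followed by JSON for commits and trees, raw bytes for blobs) and an in-memory objects dictionary. The function signatures are mine for this example, not necessarily what the tool uses.

```python
import json
from pathlib import Path

def read_object(oid: str, objects: dict) -> bytes:
    """Return an object's body with its type header stripped."""
    _, _, body = objects[oid].partition(b"\n")
    return body

def checkout(commit_hash: str, objects: dict, workdir: Path) -> None:
    """Rewrite the files in workdir to match the snapshot a commit points to."""
    commit = json.loads(read_object(commit_hash, objects))
    tree = json.loads(read_object(commit["tree"], objects))
    for rel_path, blob_hash in tree.items():
        target = workdir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(read_object(blob_hash, objects))
```

One simplification to be aware of: this only writes the files listed in the target tree. It does not delete files that exist on disk but are missing from that snapshot, which a fuller implementation would need to handle.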

What I Didn’t Build (and Why It Matters)

My version is intentionally simple. It doesn’t include:

Branching
Merging
Diffs
Remote repositories

And that’s important.

Because it highlights something:

The core of Git is surprisingly small.

Everything else—branches, merges, rebases—is built on top of:

content-addressable storage
immutable objects
commit chains

What I Learned

Building this changed how I think about version control.

1. Git is a database, not just a tool

It stores objects keyed by their hash, not files keyed by their name.

2. Hashing is the foundation

Everything depends on deterministic hashing.

3. Simplicity scales

The core model is simple—but incredibly powerful.

Final Thoughts

I didn’t build a production-ready replacement for Git.

But I built something more valuable:

Understanding.

And now when I run:

git commit

I know exactly what’s happening under the hood.

If You Want to Try This

I highly recommend building your own version.

Start small:

Store files as hashes
Build a commit object
Walk the history

You don’t need thousands of lines of code.

You just need the right mental model.

What’s Next

If I keep going, I want to explore:

Branching (just pointers!)
Merging (this gets complicated fast)
A better CLI experience

But even if I stopped here, this project was worth it.

Because now Git isn’t magic anymore.

It’s just really elegant engineering.

The code is available in the repository on my GitHub: https://github.com/plattnotpratt/tig-repo-clone