For years, I used Git the way most developers do—confident enough to get by, but not confident enough to explain why it works.
I could commit, push, and rebase my way through projects, but Git always felt a little… magical.
So I decided to remove the magic.
I built a simplified version of Git in Python.
The Goal
I didn’t want to rebuild Git completely. That would take months (or years). Instead, I focused on the core idea:
Can I build a system that tracks file history using hashes?
That led to a small CLI tool with just a handful of commands:
tig init
tig add file.txt
tig commit -m "message"
tig log
tig checkout <hash>
Simple on the surface—but surprisingly deep once you start building it.
The Big Idea: Content-Addressable Storage
The first breakthrough came when I stopped thinking in terms of files and started thinking in terms of content.
In a normal filesystem, you store something like:
file.txt
But in Git (and in my version), you store:
<hash> → file contents
Here’s the core function that made everything possible:
import hashlib

def hash_object(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()
That’s it.
Every file, every commit, every snapshot—everything is identified by its hash.
This leads to a powerful property:
If two files have the same content, they have the same hash.
Which means:
- No duplication
- Built-in integrity checking
- Deterministic state
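To make those three properties concrete, here is a minimal sketch. It is not the code from tig: the in-memory `store` dictionary and the `put` and `verify` helpers are hypothetical names used only for illustration.

```python
import hashlib


def hash_object(data: bytes) -> str:
    """Return the SHA-1 hex digest of the given bytes."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical in-memory object store: hash -> content.
store = {}


def put(data: bytes) -> str:
    """Store content under its own hash; identical content shares one entry."""
    key = hash_object(data)
    store[key] = data
    return key


def verify(key: str) -> bool:
    """Integrity check: re-hash the stored bytes and compare with the key."""
    return hash_object(store[key]) == key


first = put(b"hello world\n")
second = put(b"hello world\n")   # same content, so same key
third = put(b"hello world!\n")   # one byte differs, so a different key

print(first == second)  # True
print(len(store))       # 2
print(verify(first))    # True
```

Storing the same bytes twice leaves a single entry, and any change to the stored bytes would make `verify` fail, because the key would no longer match the content.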
Storing Files as “Blobs”
When I implemented add, I realized something interesting: Git doesn’t care about filenames at first—it cares about content.
Here’s the simplified version of what happens:
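def add(file_path):
    # Read raw bytes so text and binary files are handled the same way
    with open(file_path, "rb") as f:
        content = f.read()
    # Prefix the object type, then store the whole thing under its hash
    data = b"blob\n" + content
    blob_hash = write_object(data)
    # (the full command then records blob_hash in the index)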
A couple of subtle but important details here:
- I prefix the content with "blob\n"
- The hash is calculated on the entire structure, prefix included
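That means the same content stored as a different type (like a commit) will produce a different hash.
This is how Git keeps object types from colliding. (Real Git uses a slightly richer header, the type followed by the content length and a null byte, but the idea is the same.)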
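The `write_object` helper is used above but never shown, so here is one way it could look. This is a sketch under assumptions: the `objects_dir` parameter, the one-file-per-hash layout, and the matching `read_object` are my guesses, not necessarily what tig does. Real Git also compresses objects and spreads them across subdirectories, which this skips.

```python
import hashlib
import tempfile
from pathlib import Path


def write_object(data: bytes, objects_dir: Path) -> str:
    """Hash the full payload (type prefix included) and store it under that hash."""
    object_hash = hashlib.sha1(data).hexdigest()
    objects_dir.mkdir(parents=True, exist_ok=True)
    (objects_dir / object_hash).write_bytes(data)
    return object_hash


def read_object(object_hash: str, objects_dir: Path):
    """Load a stored payload and split it into (type, content)."""
    data = (objects_dir / object_hash).read_bytes()
    kind, _, content = data.partition(b"\n")  # split at the first newline only
    return kind.decode(), content


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"
    content = b"hello\n"
    blob_hash = write_object(b"blob\n" + content, objects)
    other_hash = write_object(b"commit\n" + content, objects)

    print(blob_hash != other_hash)          # True
    print(read_object(blob_hash, objects))  # ('blob', b'hello\n')
```

The same six bytes of content end up under two different hashes because the type prefix is part of what gets hashed.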
The Hidden Layer: The Index (Staging Area)
One of the most misunderstood parts of Git is the staging area.
When I built my own version, it finally clicked.
I stored it as a simple JSON file:
{
  "file.txt": "a1b2c3..."
}
Every time you run:
tig add file.txt
You’re not committing—you’re updating this index.
That separation is crucial.
It means:
- You can stage multiple files
- You control exactly what goes into a commit
- Commits become predictable snapshots
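Here is a rough sketch of what updating that index might involve. The function names `load_index` and `stage`, the `index.json` filename, and keying entries by bare filename are all assumptions made for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def load_index(index_path: Path) -> dict:
    """Return the staged {path: blob_hash} mapping, or {} if nothing is staged yet."""
    if not index_path.exists():
        return {}
    return json.loads(index_path.read_text())


def stage(file_path: Path, index_path: Path) -> str:
    """Record the file's current blob hash in the index without creating a commit."""
    blob_hash = hashlib.sha1(b"blob\n" + file_path.read_bytes()).hexdigest()
    index = load_index(index_path)
    index[file_path.name] = blob_hash  # assumption: flat layout, keyed by filename
    index_path.write_text(json.dumps(index, indent=2, sort_keys=True))
    return blob_hash


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    index_file = root / "index.json"
    tracked = root / "file.txt"

    tracked.write_bytes(b"v1\n")
    first = stage(tracked, index_file)
    tracked.write_bytes(b"v2\n")
    second = stage(tracked, index_file)  # re-adding overwrites the entry
    staged = load_index(index_file)

print(staged == {"file.txt": second})  # True
print(first != second)                 # True
```

Nothing here touches history. The index only remembers which blob each path should point to when the next commit is made.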
Commits Are Just Objects
Before this project, I thought commits were something special.
They’re not.
They’re just structured data.
Here’s what my commit object looks like:
{
  "tree": "abc123",
  "parent": "def456",
  "message": "first commit",
  "timestamp": 1710000000
}
And here’s the key insight:
A commit doesn’t store files—it points to a tree, which points to blobs.
That indirection is what makes Git so powerful.
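A small sketch of that chain of pointers, using an in-memory store. Storing the tree as a JSON mapping of path to blob hash is an assumption on my part, and `write_tree` and `write_commit` are hypothetical helper names.

```python
import hashlib
import json
import time

objects = {}  # hypothetical in-memory object store: hash -> raw bytes


def write_object(data: bytes) -> str:
    """Store bytes under their SHA-1 and return the hash."""
    object_hash = hashlib.sha1(data).hexdigest()
    objects[object_hash] = data
    return object_hash


def write_tree(index: dict) -> str:
    """Snapshot the index ({path: blob_hash}) as a tree object."""
    body = json.dumps(index, sort_keys=True).encode()
    return write_object(b"tree\n" + body)


def write_commit(tree_hash, parent, message, timestamp=None) -> str:
    """Create a commit object that points at a tree and at its parent commit."""
    commit = {
        "tree": tree_hash,
        "parent": parent,
        "message": message,
        "timestamp": int(time.time()) if timestamp is None else timestamp,
    }
    body = json.dumps(commit, sort_keys=True).encode()
    return write_object(b"commit\n" + body)


blob_hash = write_object(b"blob\n" + b"hello\n")
tree_hash = write_tree({"file.txt": blob_hash})
first = write_commit(tree_hash, None, "first commit", timestamp=1710000000)
second = write_commit(tree_hash, first, "second commit", timestamp=1710000060)

print(len(objects))  # 4
```

Two commits, but only four objects in total: both commits point at the same tree, so an unchanged snapshot costs nothing extra to store.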
Rebuilding History
The log command ended up being one of my favorites to implement.
It’s just a loop:
while current:
    commit = read_object(current)
    print(commit["message"])
    # The root commit has no parent, which ends the loop
    current = commit["parent"]
That’s it.
Git history is just a linked list of commits (at least until merges give a commit more than one parent).
No query engine. No complex indexing.
Just pointers.
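If you want to try the walk without a real object store, a self-contained variation could look like the following. The `iter_history` generator and the shortened hashes are made up for the example, and it assumes the root commit stores `None` as its parent.

```python
def iter_history(commits: dict, head):
    """Yield (hash, commit) pairs from head back to the root commit."""
    current = head
    while current is not None:
        commit = commits[current]
        yield current, commit
        current = commit["parent"]


# Hypothetical, already-parsed commits keyed by shortened hashes.
commits = {
    "c3": {"parent": "c2", "message": "add tests"},
    "c2": {"parent": "c1", "message": "fix typo"},
    "c1": {"parent": None, "message": "first commit"},
}

messages = [commit["message"] for _, commit in iter_history(commits, "c3")]
print(messages)  # ['add tests', 'fix typo', 'first commit']
```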
The Moment It Clicked
The most satisfying moment came when I implemented checkout.
I took a commit hash, walked to its tree, loaded each blob, and rewrote the files on disk.
And suddenly:
I could travel through time.
Not metaphorically—literally.
I could restore my project to any previous state using nothing but hashes.
That’s when Git stopped feeling like a tool and started feeling like a system.
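For the curious, here is a sketch of those steps: commit, then tree, then blobs, then files on disk. It assumes trees and commits are stored as JSON behind a type prefix. The `snapshot` helper exists only to set up the demo, and this version never deletes files that are missing from the target tree.

```python
import hashlib
import json
import tempfile
from pathlib import Path

objects = {}  # hypothetical in-memory object store: hash -> raw bytes


def write_object(data: bytes) -> str:
    object_hash = hashlib.sha1(data).hexdigest()
    objects[object_hash] = data
    return object_hash


def read_object(object_hash: str, expected_kind: str) -> bytes:
    """Return an object's body after checking its type prefix."""
    kind, _, body = objects[object_hash].partition(b"\n")
    if kind.decode() != expected_kind:
        raise ValueError(f"{object_hash} is a {kind.decode()}, not a {expected_kind}")
    return body


def checkout(commit_hash: str, worktree: Path) -> None:
    """Rewrite the files in worktree to match the snapshot behind a commit."""
    commit = json.loads(read_object(commit_hash, "commit"))
    tree = json.loads(read_object(commit["tree"], "tree"))
    for relative_path, blob_hash in tree.items():
        target = worktree / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(read_object(blob_hash, "blob"))


def snapshot(files: dict, parent, message: str) -> str:
    """Demo helper: store blobs, a tree, and a commit in one step."""
    tree = {path: write_object(b"blob\n" + content) for path, content in files.items()}
    tree_hash = write_object(b"tree\n" + json.dumps(tree, sort_keys=True).encode())
    commit = {"tree": tree_hash, "parent": parent, "message": message}
    return write_object(b"commit\n" + json.dumps(commit, sort_keys=True).encode())


old = snapshot({"file.txt": b"version 1\n"}, None, "first commit")
new = snapshot({"file.txt": b"version 2\n"}, old, "second commit")

with tempfile.TemporaryDirectory() as tmp:
    worktree = Path(tmp)
    checkout(new, worktree)
    latest = (worktree / "file.txt").read_bytes()
    checkout(old, worktree)
    restored = (worktree / "file.txt").read_bytes()

print(latest)    # b'version 2\n'
print(restored)  # b'version 1\n'
```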
What I Didn’t Build (and Why It Matters)
My version is intentionally simple. It doesn’t include:
- Branching
- Merging
- Diffs
- Remote repositories
And that’s important.
Because it highlights something:
The core of Git is surprisingly small.
Everything else—branches, merges, rebases—is built on top of:
- content-addressable storage
- immutable objects
- commit chains
What I Learned
Building this changed how I think about version control.
1. Git is a database, not just a tool
It’s storing objects, not files.
2. Hashing is the foundation
Everything depends on deterministic hashing.
3. Simplicity scales
The core model is simple—but incredibly powerful.
Final Thoughts
I didn’t build a production-ready replacement for Git.
But I built something more valuable:
Understanding.
And now when I run:
git commit
I know exactly what’s happening under the hood.
If You Want to Try This
I highly recommend building your own version.
Start small:
- Store files as hashes
- Build a commit object
- Walk the history
You don’t need thousands of lines of code.
You just need the right mental model.
What’s Next
If I keep going, I want to explore:
- Branching (just pointers!)
- Merging (this gets complicated fast)
- A better CLI experience
But even if I stopped here, this project was worth it.
Because now Git isn’t magic anymore.
It’s just really elegant engineering.
To check out the code, see the repository on my GitHub: https://github.com/plattnotpratt/tig-repo-clone