Create your own ChatGPT from scratch

Published on Mar 7, 2026 in LLM from scratch  

In a previous article, I explained how a generative LLM works.

Now it’s time to create your own LLM.

Rather than offering you a step-by-step tutorial, I created UnderstandableGPT, a GPT-2 compatible implementation that is as readable as possible and extensively commented.

With the explanations in my previous article, which gives you a theoretical perspective, and this code, which gives you a practical one, you should easily understand how it works.

After that, I advise you to look at other implementations of GPT-2; you will gain several benefits from doing so.

You will see that everyone implements the same system in their own way. Because I prioritized readability, I kept the query, key, and value projections separate, which most other implementations do not. There are also many ways to apply the causal mask; I chose the one that seemed simplest to understand.
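To make that separation concrete, here is a minimal single-head sketch in NumPy. This is not the actual UnderstandableGPT code; the function and variable names are my own, and a real GPT-2 block adds multiple heads, learned biases, and an output projection.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention with separate Q, K, V projections.

    x: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: (d_model, d_head) projection matrices, kept separate for readability.
    """
    q = x @ Wq                                   # queries
    k = x @ Wk                                   # keys
    v = x @ Wv                                   # values
    d_head = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(d_head)         # (seq_len, seq_len) attention logits

    # Causal mask: position i may only attend to positions <= i,
    # so the strictly upper-triangular entries are set to -inf.
    seq_len = x.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable row-wise softmax; exp(-inf) becomes 0.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v                           # (seq_len, d_head)
```

A quick way to check the mask is doing its job: perturb the last token of the input and verify that the outputs for all earlier positions do not change.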

To stay compatible with GPT-2 weights, many implementations retain the original variable names. However, these names are obscure and make the code difficult to read. I preferred clearly understandable variable names, together with a conversion script for compatibility.
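To illustrate what such a conversion script can look like, here is a small sketch. The names on the left follow the original GPT-2 checkpoint conventions (`wte`, `h.0.attn.c_attn`, and so on); the readable names on the right are hypothetical examples of my own, not the actual names used in UnderstandableGPT.

```python
import re

# Rules mapping original GPT-2 parameter-name prefixes to readable ones.
# The readable names are illustrative; any consistent scheme would work.
NAME_RULES = [
    (r"^wte$", "token_embedding"),
    (r"^wpe$", "position_embedding"),
    (r"^h\.(\d+)\.ln_1", r"blocks.\1.attention_norm"),
    (r"^h\.(\d+)\.attn\.c_attn", r"blocks.\1.attention.qkv_projection"),
    (r"^h\.(\d+)\.attn\.c_proj", r"blocks.\1.attention.output_projection"),
    (r"^h\.(\d+)\.ln_2", r"blocks.\1.feed_forward_norm"),
    (r"^h\.(\d+)\.mlp\.c_fc", r"blocks.\1.feed_forward.expand"),
    (r"^h\.(\d+)\.mlp\.c_proj", r"blocks.\1.feed_forward.contract"),
    (r"^ln_f$", "final_norm"),
]

def convert_name(original):
    """Translate an original GPT-2 parameter name into a readable one.

    Names that match no rule are returned unchanged, so unknown
    parameters are passed through rather than silently dropped.
    """
    for pattern, readable in NAME_RULES:
        if re.match(pattern, original):
            return re.sub(pattern, readable, original)
    return original
```

Applied to a whole state dict, this lets the readable implementation load the published weights while keeping its own naming: for example, `h.0.attn.c_attn.weight` becomes `blocks.0.attention.qkv_projection.weight`.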

If you want to be able to read AI research papers, I advise you to master these original names, because they are commonly used without explanation, which makes such papers less accessible. In reality, AI research papers are often quite simple; it is the terminology that makes them seem esoteric.

Here are some popular implementations:

The original OpenAI version uses TensorFlow. PyTorch dominates today's generative AI, but in the GPT-2 era, TensorFlow was the standard. I find it less elegant than PyTorch, but it is not that complicated to read.

The Transformers library version by Hugging Face uses PyTorch but is very much inspired by the TensorFlow version.

You can also look at Karpathy's minGPT, which may be more accessible. Karpathy also created nanoGPT and, more recently, nanochat.

There are many implementations of GPT-2; feel free to find others. As you become more familiar with the subject, you will notice that many implementations claiming to be GPT-2 compatible are not. But that's okay: there are always interesting things to learn from their code.

Once you are familiar with the different implementations of GPT-2, you will be able to easily create your own. This is definitely an exercise I recommend, because you learn best by doing.

Don’t miss my upcoming posts — hit the follow button on my LinkedIn profile