Large Language Models (LLMs) such as ChatGPT, Gemini, and Claude have been around for a while now, and I believe all of us have already used at least one of them. As of this writing, ChatGPT is already powered by the fourth generation of the GPT family, named GPT-4. But do you know what GPT actually is, and what the underlying neural network architecture looks like? In this article we are going to talk about GPT models, especially GPT-1, GPT-2, and GPT-3. I will also demonstrate how to code them from scratch with PyTorch so that you can get a better understanding of the structure of these models.
A Brief History of GPT
Before we get into GPT, we first need to understand the original Transformer architecture. Generally speaking, a Transformer consists of two main components: the Encoder and the Decoder. The former is responsible for understanding the input sequence, whereas the latter generates another sequence based on that input. For example, in a question answering task the decoder produces an answer to the input question, while in a machine translation task it generates the translation of the input sentence.
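To make the encoder-decoder split a bit more concrete, here is a minimal sketch using PyTorch's built-in nn.Transformer module. This is not the from-scratch implementation we will build later in this article; the layer counts, dimensions, and random tensors below are arbitrary placeholders just to show how the two components fit together.

```python
import torch
import torch.nn as nn

# Arbitrary placeholder sizes for illustration only.
EMBED_DIM  = 512   # model (embedding) dimension
SRC_LEN    = 10    # length of the input (source) sequence
TGT_LEN    = 7     # length of the output (target) sequence
BATCH_SIZE = 1

# nn.Transformer bundles an encoder stack and a decoder stack together.
transformer = nn.Transformer(
    d_model=EMBED_DIM,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    batch_first=True,
)

# Pretend these are already-embedded token sequences.
src = torch.randn(BATCH_SIZE, SRC_LEN, EMBED_DIM)  # e.g. the question or source sentence
tgt = torch.randn(BATCH_SIZE, TGT_LEN, EMBED_DIM)  # e.g. the answer or translation generated so far

# The encoder reads src; the decoder attends to the encoder output while producing tgt.
out = transformer(src, tgt)
print(out.shape)  # torch.Size([1, 7, 512]) — same shape as the target sequence
```

Notice that the output always follows the shape of the target sequence: the encoder only summarizes the input, while the decoder is the part that actually produces the new sequence.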