Building LLMs from Scratch: Lecture 6: Creating Input-Target Pairs for Large Language Models | by Naveen Pandey | Jul, 2025


Hello everyone, in this article we will cover creating input-target pairs for training large language models (LLMs). In the previous lecture, we looked at tokenization using Byte Pair Encoding. Today, we’ll learn how to create the input-output pairs that are essential for training LLMs.
Before we dive into the code, let’s understand what input-target pairs mean for LLMs. Unlike traditional machine learning tasks where input and output are clearly defined (like classifying cats vs dogs), LLMs use a specific technique to create these pairs.
Let’s say we have this sentence: “LLMs learn to predict one word at a time”
Here’s how we create input-target pairs:
| Iteration | Input | Target |
| --- | --- | --- |
| 1 | LLMs | learn |
| 2 | LLMs learn | to |
| 3 | LLMs learn to | predict |
| 4 | LLMs learn to predict | one |
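The table above can be reproduced with a few lines of code. This is a minimal sketch using plain whitespace tokenization for illustration; a real LLM pipeline would operate on BPE token IDs rather than words:

```python
# Illustrative only: split on whitespace instead of real BPE tokenization.
text = "LLMs learn to predict one word at a time"
tokens = text.split()

pairs = []
for i in range(1, 5):
    context = tokens[:i]   # what the model sees (the input)
    target = tokens[i]     # the next word it should predict (the target)
    pairs.append((" ".join(context), target))

for context, target in pairs:
    print(f"{context!r} -> {target!r}")
```

Each iteration extends the input by one token, and the token immediately after the input becomes the new target, exactly as in the table.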
The key points to remember:
- The input is the text the LLM sees so far
- The target is the next word the LLM should predict
- In each iteration, the previous target becomes part of the input for the next prediction
