What Are Tokens? (And Why Do They Matter?)

Imagine You’re Ordering a Pizza…

Picture this: You call your favorite pizza place and order one large margherita pizza. Easy, right? But instead of taking your order as a single request, the restaurant breaks it down into smaller parts:

  1. “One”
  2. “Large”
  3. “Margherita”
  4. “Pizza”

Now imagine a restaurant that takes things even further:

  1. “O”
  2. “ne”
  3. “La”
  4. “rge”
  5. “Mar”
  6. “ghe”
  7. “rita”
  8. “Piz”
  9. “za”

This might sound ridiculous for pizza, but in AI, breaking down text into smaller pieces—called tokens—is essential for how models process language.

Tokens: The Building Blocks of AI Language Models

AI doesn’t read text the way humans do. Instead, it slices and dices words into tokens, which can be whole words, subwords, or even individual characters.

Example: Tokenization in Action

Consider the sentence:
“AI is awesome!”

Depending on the tokenizer, it might be split like this:

  • Whole Word Tokenization: ["AI", "is", "awesome!"]
  • Subword Tokenization: ["AI", "is", "awe", "some", "!"]
  • Character Tokenization: ["A", "I", " ", "i", "s", " ", "a", "w", "e", "s", "o", "m", "e", "!"]
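The three splits above can be sketched in a few lines of Python. The subword step here uses a tiny hand-made vocabulary just for illustration; real tokenizers (BPE, WordPiece, and friends) learn their subword pieces from data rather than from a hard-coded table:

```python
text = "AI is awesome!"

# Whole-word tokenization: split on whitespace (punctuation stays attached).
whole_words = text.split()                      # ['AI', 'is', 'awesome!']

# Character tokenization: every character, including spaces, is a token.
characters = list(text)                         # 14 tokens for this sentence

# Toy subword tokenization: break known words into smaller pieces using a
# tiny hand-made vocabulary. This table is purely illustrative.
toy_vocab = {"awesome!": ["awe", "some", "!"]}
subwords = []
for word in whole_words:
    subwords.extend(toy_vocab.get(word, [word]))

print(whole_words)   # ['AI', 'is', 'awesome!']
print(subwords)      # ['AI', 'is', 'awe', 'some', '!']
print(len(characters))  # 14
```

Notice the trade-off already visible here: whole words give the fewest tokens, characters give the most, and subwords sit in between.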

Each model’s builders choose a tokenization scheme for its corpus. The choice depends on the type of material being processed and the model’s goals: some approaches work better for code, others for conversational text, and some for scientific literature.

Why Does This Matter?

AI models are trained on trillions of tokens, not words. More tokens mean higher processing costs, more memory usage, and longer training times. When you hear about a model trained on “14.8 trillion tokens,” remember: tokens are smaller than words, so the count is more like counting syllables in every book than counting the words themselves.
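To get a feel for that scale, here is a back-of-envelope sketch. Both constants (tokens per word and bytes per token ID) are rough illustrative assumptions, not any model’s real numbers:

```python
# Back-of-envelope scale check. Both constants below are illustrative
# assumptions, not real figures from any particular model.
tokens = 14.8e12            # "14.8 trillion tokens" from the text
tokens_per_word = 1.3       # rough rule of thumb for English prose
bytes_per_token_id = 2      # a 16-bit ID covers a ~65k-entry vocabulary

approx_words = tokens / tokens_per_word
storage_tb = tokens * bytes_per_token_id / 1e12

print(f"~{approx_words:.1e} words")          # roughly 11 trillion words
print(f"~{storage_tb:.0f} TB of token IDs")  # ~30 TB just to store the IDs
```

Even under these generous assumptions, the corpus is tens of terabytes of raw token IDs, which is why token counts drive cost and training time.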

The Takeaway: Think of Tokens Like Lego Bricks

Just like Lego bricks combine to build something bigger, tokens are the fundamental pieces of AI language understanding. The way text is broken down affects how well AI can understand, generate, and process language.

Want to go deeper? In the next blog, we’ll cover how embeddings take these tokens and turn them into mathematical gold. Stay tuned! 🚀


Next Up: NVIDIA, DeepSeek, and the future of LLMs