What Are Embeddings? (And How They Revolutionized AI Thinking)

Imagine Walking Into a Library…

You walk into a massive library with millions of books, but there’s a catch—there’s no cataloging system. No categories, no sections, no alphabetical order. Just a chaotic pile of books. Finding anything would be a nightmare, right?

Now imagine the librarian steps in and magically organizes everything based on meaning. Books about space travel are placed near astronaut biographies, and AI research papers sit next to machine learning textbooks—even if their titles are completely different. That’s what embeddings do for AI.

What Are Embeddings?

Embeddings are a way to represent words, phrases, or even entire documents as mathematical vectors. Instead of treating words as isolated entities, embeddings allow AI to understand relationships between them.

Example: Words on a Map

Imagine plotting words on a two-dimensional map. Similar words (like “king” and “queen”) would be close together, while unrelated words (like “apple” and “galaxy”) would be far apart. In reality, embeddings exist in hundreds or thousands of dimensions, but the idea is the same:

  • “King” → [2.1, 4.3, -1.2, ...]
  • “Queen” → [2.0, 4.1, -1.0, ...]
  • “Apple” → [-3.2, 1.5, 2.8, ...]
  • “Galaxy” → [5.5, -2.0, 3.1, ...]

The closer two vectors are (by distance, or by the angle between them), the more similar the meaning.
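That closeness can be made precise with cosine similarity, the angle-based measure most embedding systems use. Here is a minimal sketch using the toy vectors above (the numbers are illustrative, not from a real model):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (illustrative values from the text above).
king = [2.1, 4.3, -1.2]
queen = [2.0, 4.1, -1.0]
apple = [-3.2, 1.5, 2.8]

print(cosine_similarity(king, queen))  # close to 1.0: similar meaning
print(cosine_similarity(king, apple))  # much lower: unrelated
```

Real embedding vectors have hundreds or thousands of components, but the computation is exactly the same.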

Why Do Embeddings Matter?

Embeddings revolutionized AI by allowing models to:

  • Understand context: “Bank” (as in money) vs. “Bank” (as in river). The surrounding words shift the embedding.
  • Find similar concepts: AI can retrieve related documents, recommend products, and even generate human-like text based on meaning rather than just matching words.
  • Reduce computation: Instead of handling a huge, sparse vocabulary (one slot per unique word), embeddings compress words into short, dense vectors, making computations much faster.
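The "find similar concepts" point is the heart of semantic search: embed the query, embed the documents, and rank by similarity. A toy sketch (the vectors are hand-picked for illustration, not produced by a real model):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Pretend these came from an embedding model.
docs = {
    "astronaut biography": [2.0, 4.0, -1.0],
    "machine learning textbook": [-1.0, 0.5, 3.0],
    "space travel history": [2.2, 3.8, -0.8],
}
query = [2.1, 4.1, -1.1]  # imagine this embeds the query "rocket missions"

# Rank documents by similarity to the query, best match first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked[0])  # the space-related documents rank highest
```

No keyword in "rocket missions" appears in any title; the match comes purely from vector proximity, which is exactly why the librarian analogy works.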

OpenAI’s Embedding Models

OpenAI offers embedding models, such as text-embedding-3-small and text-embedding-3-large, that can be used to create embeddings for unstructured content. These generate vectors of 1536 and 3072 dimensions respectively, and both accept a dimensions parameter that shortens the vectors when efficiency matters more than accuracy. OpenAI's embeddings power applications such as semantic search, text classification, and recommendation systems. You can learn more in OpenAI's embeddings documentation.
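In code, fetching an embedding is a single call to the v1 OpenAI Python client's client.embeddings.create. The helper below shows the request and response shape; the stub client stands in for the real one so the sketch runs without an API key (the stub's numbers are made up):

```python
# Stub objects mirroring the shape of the OpenAI client's response.
class _StubData:
    def __init__(self, embedding):
        self.embedding = embedding

class _StubResponse:
    def __init__(self, embedding):
        self.data = [_StubData(embedding)]

class _StubEmbeddings:
    def create(self, model, input):
        # A real client would call OpenAI's API here; we return fixed numbers.
        return _StubResponse([0.1, 0.2, 0.3])

class StubClient:
    embeddings = _StubEmbeddings()

def get_embedding(client, text, model="text-embedding-3-small"):
    """Return the embedding vector for `text` using an OpenAI-style client.

    In production you would use the real client:
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
    """
    resp = client.embeddings.create(model=model, input=text)
    return resp.data[0].embedding

print(get_embedding(StubClient(), "space travel"))
```

Swap StubClient for the real OpenAI() client and the same get_embedding function returns a genuine 1536-dimensional vector.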

Who Invented Embeddings?

The concept of word embeddings was first popularized by Tomas Mikolov and his team at Google with the development of Word2Vec in 2013. However, the idea of representing words as numerical vectors dates back further, with foundational work by Geoffrey Hinton on distributed representations in the 1980s. More recent approaches like GloVe (from Stanford) and fastText (from Facebook AI) have further refined embedding techniques.

Analogy: The Money of Knowledge

Earlier, I compared tokens to Lego bricks—but embeddings? They’re the currency of AI. Before embeddings, knowledge was like a barter system—you had to manually connect pieces of information. Now, embeddings allow AI to quantify meaning and trade knowledge efficiently, just like money standardized value in economies.

The Takeaway: Why Should You Care?

Embeddings are the reason AI can recommend the perfect song, find relevant search results, and generate coherent text. They take raw, unstructured data and transform it into something mathematically structured, making AI systems incredibly powerful.

Coming up next: Breaking Down 14.8 Trillion Tokens—Why the Numbers Are So Big (And What They Really Mean). Stay tuned! 🚀

