
What Are Embeddings? A Plain-English Guide for RAG

Chapter 3 of 8
Builder · ~18 min

An embedding is just a list of numbers that captures the meaning of a sentence. Similar sentences have similar numbers.

After this chapter, you’ll be able to explain what embeddings are and why they work, and use them to find semantically similar text.


In the last chapter, you split documents into chunks. Now you need a way to search those chunks — not by keywords, but by meaning.

If someone asks “How do I get a refund?”, your system needs to find the chunk about “return policy and money-back guarantees” even though the words are completely different. Keyword search would fail. You need semantic search.

To search by meaning, you need a way to represent meaning as something a computer can work with. That’s what embeddings do.


An embedding is a list of numbers — typically a few hundred to a few thousand decimal numbers — that represents the meaning of a piece of text. [src: alammar_word2vec]

"How do I get a refund?" → [0.12, -0.05, 0.89, 0.23, -0.41, ... +379 more]

That’s it. A vector is just an ordered list of numbers. An embedding is a vector that captures meaning.

Imagine mapping every sentence in English onto a giant map:

  • “How do I get a refund?” and “What’s your return policy?” are plotted close together — they mean similar things
  • “How do I get a refund?” and “Best pasta recipes” are far apart — unrelated topics
  • “King” and “Queen” are near each other; “King” and “Potato” are far apart

An embedding model creates this map automatically. It takes any text and assigns it coordinates. Similar meaning = nearby coordinates. Different meaning = far apart.

PLAIN ENGLISH
An embedding turns text into a list of numbers that captures its meaning. Similar text gets similar numbers, so a computer can measure how related two pieces of text are.

The catch: instead of a 2D map, embeddings work in hundreds or thousands of dimensions. You can’t visualise that many dimensions (nobody can), but the math works exactly the same as on a 2D map.
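The map metaphor can be made concrete with a toy 2D version. The coordinates below are invented for illustration (a real embedding model assigns hundreds of dimensions, and these numbers are not model output):

```python
import math

# Hypothetical 2D "meaning coordinates" -- invented for illustration,
# NOT output from a real embedding model.
points = {
    "How do I get a refund?":     (0.90, 0.10),
    "What's your return policy?": (0.85, 0.15),
    "Best pasta recipes":         (0.05, 0.95),
}

def distance(a, b):
    """Euclidean distance between two 2D points on the 'meaning map'."""
    return math.dist(a, b)

refund = points["How do I get a refund?"]
print(distance(refund, points["What's your return policy?"]))  # small  -> similar meaning
print(distance(refund, points["Best pasta recipes"]))          # large  -> unrelated topic
```

Swap 2 coordinates for 768 and the idea is unchanged: similar meaning means nearby points.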


Once you have two vectors, you need to measure how close they are. The standard measure is cosine similarity.

Cosine similarity measures the angle between two vectors:

  • 1.0 — identical direction → same meaning
  • 0.0 — perpendicular → unrelated
  • -1.0 — opposite direction → opposite meaning

Why the angle and not the distance? Because we care about direction (meaning), not magnitude (length). A short sentence and a long paragraph about the same topic should be similar, even though their vectors have different magnitudes.
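Both the three reference values and the magnitude argument can be checked in a few lines of Python. This is a minimal sketch of the formula itself; real pipelines would use numpy or the vector database’s built-in similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v = [0.12, -0.05, 0.89, 0.23]
longer = [x * 3 for x in v]  # same direction, three times the magnitude

print(cosine_similarity(v, v))            # ~1.0 -> identical direction
print(cosine_similarity(v, longer))       # ~1.0 -> magnitude doesn't change the angle
print(cosine_similarity([1, 0], [0, 1]))  # 0.0  -> perpendicular, unrelated
print(cosine_similarity([1, 0], [-1, 0])) # -1.0 -> opposite direction
```

The second line is the whole point of using the angle: scaling a vector (a longer text about the same topic) leaves its similarity untouched.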

[Diagram: two sentences with similar meaning produce vectors with high cosine similarity.]

Each number in the vector captures one aspect of meaning. No single number maps to a concept you’d recognise (like “is this about food?”). Instead, the meaning is distributed across all dimensions working together.

Think of it like describing a colour. You could say a colour is rgb(255, 165, 0). No single number tells you it’s orange — but all three numbers together do. Embeddings work the same way, just with 768 numbers instead of 3.


The plot below shows 20 sentences projected from high-dimensional space onto 2D. Notice how related sentences cluster together naturally. Click any point to see its 3 most similar sentences, or type your own sentence to see where it lands.


Not all embedding models are equal. A bigger model captures more nuance but costs more to run. Here are the common choices:

Model                    Dimensions   Quality     Speed       Cost
all-MiniLM-L6-v2         384          Good        Very Fast   Free (runs in browser)
text-embedding-3-small   1536         Very Good   Fast        $0.02 / 1M tokens
text-embedding-3-large   3072         Excellent   Medium      $0.13 / 1M tokens

For learning and prototyping, all-MiniLM-L6-v2 is perfect — it runs entirely in your browser using Transformers.js, with zero API costs. That’s what the Playground uses. [src: transformersjs_docs]

When you embed your chunks with model A, you must search with model A. Different models create different vector spaces — they’re like different maps of the same city. A location from Google Maps doesn’t work on an Apple Maps grid. [src: chromadb_docs]

This is one of the most common RAG bugs. The Explorer above avoids it by embedding every point with the same model.

WATCH OUT
Mixing embedding models is the number one silent RAG bug. Your similarity scores will look normal but return irrelevant results. Always use the same model for indexing and querying.

“Embeddings understand language the way humans do.”

No — embedding models are trained to be useful, not to understand. They learn that certain words appear in similar contexts and group them together. The model doesn’t “know” that a refund and a return are related — it learned that documents about refunds and documents about returns tend to contain similar surrounding words. The result is practically useful similarity, but the mechanism is statistical, not semantic.

“Higher dimensions always mean better.”

Not necessarily. A 384-dimension model can outperform a 3072-dimension model on specific tasks if it was trained on more relevant data. Always check the MTEB leaderboard for your specific use case rather than picking by dimension count alone.

“You need an internet connection or API to use embeddings.”

This is false, and it’s why this course’s Playground works entirely offline. Models like all-MiniLM-L6-v2 run entirely in your browser via Transformers.js — no server, no API key, no internet required after the first load.

“All embedding models can be swapped interchangeably.”

The most dangerous misconception in RAG. Once you’ve embedded your document chunks with model A, every query must also be embedded with model A. Switching models mid-way requires re-embedding your entire document collection. Plan your model choice before you start ingesting at scale.


Before you pick an embedding model, know how quality is measured. The MTEB (Massive Text Embedding Benchmark) is the standard benchmark. It tests models across 56 datasets and 8 task types.

For RAG specifically, the most important task types are:

  • Retrieval — given a query, find the most relevant passages. This is your primary use case.
  • Semantic Textual Similarity (STS) — how well does the model judge if two sentences mean the same thing?

A model with a high retrieval score on MTEB will almost certainly work well for your RAG pipeline. Models like BAAI/bge-large-en-v1.5 and text-embedding-3-small consistently rank near the top for retrieval quality.


Here’s where the RAG pipeline stands after this chapter:

  1. Ingest — Load your documents (Chapter 2)
  2. Chunk — Split into manageable pieces (Chapter 2)
  3. Embed — Convert each chunk into a vector (This chapter)
  4. Store — Save vectors in a database (Chapter 4)
  5. Retrieve — Find relevant chunks by similarity (Chapter 5)
  6. Generate — Answer using retrieved context (Chapter 6)

You’ve now completed steps 1–3. Your text has been transformed from raw words into searchable meaning-vectors. Next, you need somewhere to store them and search them efficiently.
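Steps 3 and 5 can be sketched in miniature. Assume an embedding model has already turned three chunks and one query into vectors (the 3-number vectors below are invented stand-ins for real model output); retrieval is then just “rank chunks by cosine similarity”:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Invented 3-dimensional vectors standing in for real embedding-model output.
chunks = {
    "Our return policy offers a full money-back guarantee.": [0.9, 0.1, 0.2],
    "Boil the pasta for eleven minutes.":                    [0.1, 0.9, 0.1],
    "Shipping takes 3-5 business days.":                     [0.5, 0.2, 0.7],
}
query_vector = [0.88, 0.15, 0.25]  # invented vector for "How do I get a refund?"

# Retrieve: rank every chunk by similarity to the query, best first.
ranked = sorted(chunks, key=lambda c: cosine(query_vector, chunks[c]), reverse=True)
print(ranked[0])  # the return-policy chunk wins, despite sharing no keywords
```

With millions of chunks, this brute-force loop becomes the bottleneck, which is exactly the problem vector databases solve in the next chapter.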


In this chapter, you learned to:

  1. Convert any text into a vector (a list of numbers that captures meaning)
  2. Measure similarity between vectors using cosine similarity
  3. Use an embedding model to find semantically related text
  4. Explain why using the same embedding model for indexing and querying matters

Next up: you have vectors, but where do you store millions of them and search them fast? That’s what vector databases do.


Q1

What does cosine similarity measure?

Q2

Why must you use the same embedding model for chunks and queries?




Sources: