
Lab 2: LlamaIndex Comparison

HANDS-ON LAB · ~25 minutes · Beginner-friendly

In Lab 1 you built a RAG pipeline with LangChain. Now you build the exact same thing with LlamaIndex. Same input, same output, different framework.

Why bother? Because these are the two most popular RAG frameworks, and you will encounter both in the wild. Understanding how they think differently about the same problem makes you a better engineer — not just a better copy-paster.


Two philosophies

LangChain and LlamaIndex solve the same problem, but they come at it from different angles.

LangChain thinks in chains. You pick components (a loader, a splitter, an embedder, a retriever, an LLM) and wire them together step by step. You control every connection. This gives you maximum flexibility, but you write more glue code.

LlamaIndex thinks in indexes. You point it at your data, and it builds a searchable index. Querying that index is one function call. LlamaIndex handles the chunking, embedding, and retrieval internally. You can customize each step, but the defaults work well out of the box.

The short version: LangChain is a toolkit. LlamaIndex is an engine. Both get you to the same destination.


First, install the packages:

pip install llama-index llama-index-embeddings-huggingface llama-index-vector-stores-chroma chromadb
  • llama-index — The core framework.
  • llama-index-embeddings-huggingface — Lets you use free, local embedding models.
  • llama-index-vector-stores-chroma — Connects LlamaIndex to ChromaDB.
  • chromadb — Same vector database from Lab 1, so you can compare apples to apples.

Let's walk through each step of the pipeline and see how both frameworks do the same thing.

Step 1: Load documents

LangChain:

from langchain_community.document_loaders import TextLoader
loader = TextLoader("notes.txt")
documents = loader.load()

LlamaIndex:

from llama_index.core import SimpleDirectoryReader
# Reads all files in a directory, or specify a single file
documents = SimpleDirectoryReader(
    input_files=["notes.txt"]
).load_data()

Difference: LangChain has specific loaders for each file type (TextLoader, PyPDFLoader, etc.). LlamaIndex’s SimpleDirectoryReader auto-detects file types. Drop a folder of mixed PDFs, text files, and markdown, and it handles them all.


Step 2: Split into chunks

LangChain:

from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)
chunks = splitter.split_documents(documents)

LlamaIndex:

from llama_index.core.node_parser import SentenceSplitter
splitter = SentenceSplitter(
    chunk_size=500,
    chunk_overlap=50
)
nodes = splitter.get_nodes_from_documents(documents)

Difference: LangChain calls them “chunks” or “documents.” LlamaIndex calls them “nodes.” A node is a chunk with extra features — it knows about its parent document and its relationship to other nodes. This matters when you build more advanced pipelines later.

In practice, both produce the same result here: your text split into overlapping pieces.


Step 3: Embed and store

LangChain:

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    persist_directory="./langchain_store"
)

LlamaIndex:

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext, VectorStoreIndex
import chromadb
# Set up ChromaDB
chroma_client = chromadb.PersistentClient(path="./llamaindex_store")
chroma_collection = chroma_client.get_or_create_collection("my_docs")
# Connect LlamaIndex to ChromaDB
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Set up the embedding model
embed_model = HuggingFaceEmbedding(model_name="all-MiniLM-L6-v2")
# Build the index (embeds and stores automatically)
index = VectorStoreIndex(
    nodes,
    storage_context=storage_context,
    embed_model=embed_model
)

Difference: This is where the two frameworks diverge the most. LangChain gives you a vector store object directly. LlamaIndex wraps it in an “index” — a higher-level abstraction that manages storage, embedding, and retrieval together.

LlamaIndex’s setup is more verbose here, but the index object you get back is more powerful. It handles caching, persistence, and query optimization internally.


Step 4: Query

LangChain:

results = vectorstore.similarity_search("What is the main topic?", k=3)
for doc in results:
    print(doc.page_content)

LlamaIndex:

query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What is the main topic?")
print(response)

Difference: This is where LlamaIndex shines. One call to query() does retrieval and generation. It finds the relevant nodes, constructs the prompt, calls the LLM, and returns a formatted answer — all in one line.

With LangChain, you need to build the RetrievalQA chain yourself (as you did in Lab 1). More control, more code.

If you just want the raw retrieved chunks without generation in LlamaIndex:

retriever = index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve("What is the main topic?")
for node in nodes:
    print(f"Score: {node.score:.4f}")
    print(f"Text: {node.text[:200]}")
    print()

Step 5: Generate answers with an LLM

LangChain (from Lab 1):

from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFacePipeline
from transformers import pipeline
pipe = pipeline("text2text-generation", model="google/flan-t5-base", max_new_tokens=256)
llm = HuggingFacePipeline(pipeline=pipe)
prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="""Use the following context to answer the question.
If you don't know the answer, say "I don't have enough information."
Context: {context}
Question: {question}
Answer:"""
)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    chain_type_kwargs={"prompt": prompt_template},
    return_source_documents=True
)
response = qa_chain.invoke({"query": "What is the main topic?"})
print(response["result"])

LlamaIndex:

from llama_index.core import Settings
from llama_index.core.llms import CustomLLM, LLMMetadata, CompletionResponse
from transformers import pipeline as hf_pipeline
class LocalLLM(CustomLLM):
    """Wrapper to use a local Hugging Face model with LlamaIndex."""
    pipe: object = None

    class Config:
        arbitrary_types_allowed = True

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name="flan-t5-base")

    def complete(self, prompt: str, **kwargs) -> CompletionResponse:
        output = self.pipe(prompt, max_new_tokens=256)[0]["generated_text"]
        return CompletionResponse(text=output)

    def stream_complete(self, prompt: str, **kwargs):
        raise NotImplementedError("Streaming not supported")

pipe = hf_pipeline("text2text-generation", model="google/flan-t5-base")
llm = LocalLLM(pipe=pipe)
Settings.llm = llm
Settings.embed_model = embed_model
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What is the main topic?")
print(response)
print("\nSources:")
for node in response.source_nodes:
    print(f" - {node.text[:100]}...")

Difference: LlamaIndex’s query engine handles prompt construction and source tracking automatically. LangChain makes you build the prompt template and chain yourself. Both get the same result.


When to choose LangChain:
  • You need fine-grained control over every step of the pipeline
  • You are building something unusual that does not fit standard RAG patterns
  • You want to mix and match components from different providers easily
  • You need agents that use RAG as one tool among many
  • You want the larger ecosystem — LangChain has more integrations

When to choose LlamaIndex:
  • You want to get a working pipeline fast with minimal code
  • Your use case is document Q&A and you do not need exotic customization
  • You want built-in evaluation tools (LlamaIndex has them natively)
  • You are building multi-document systems where relationships between documents matter
  • You value sensible defaults over manual configuration

When to skip both:
  • You are building a simple proof of concept — raw ChromaDB + a few lines of code may be all you need
  • You are an experienced ML engineer who wants no abstraction overhead

| Feature | LangChain | LlamaIndex |
| --- | --- | --- |
| Philosophy | Toolkit — assemble your own chain | Engine — give it data, get answers |
| API style | Explicit, step-by-step | High-level, convention-over-configuration |
| Chunking | Manual (you pick the splitter) | Built-in node parsers with defaults |
| Retrieval | You build the retriever | Built into the query engine |
| Generation | You build the chain | One-line query |
| Flexibility | Very high — swap any component | High, but opinionated defaults |
| Learning curve | Steeper — more concepts to learn | Gentler — works out of the box |
| Abstraction level | Low to medium | Medium to high |
| Best for | Custom pipelines, agents | Document Q&A, fast prototyping |
| Community size | Larger | Slightly smaller but growing fast |
| Evaluation tools | Via third-party (RAGAS, etc.) | Built-in evaluation module |

Neither framework is “better.” They are different tools for different situations. The best engineers know both and pick the right one for the job.

The concepts underneath — chunking, embedding, vector search, prompt engineering — are the same regardless of framework. That is why this course teaches the concepts first and the frameworks second. Frameworks change. The fundamentals do not.


Next up: Lab 3: Add a Re-ranker — Take your retrieval quality to the next level by adding a cross-encoder re-ranker.