Lab 5: Deploy to Hugging Face Spaces
You have built a RAG pipeline. You have evaluated it. Now it is time to ship it.
In this lab, you will deploy your RAG pipeline as a live web application on Hugging Face Spaces. Anyone with a link can upload a document, ask questions, and get answers — no setup required on their end. This is how you go from “it works on my laptop” to “here is the link.”
Hugging Face Spaces is free for CPU-based apps. Gradio gives you a web interface with about 50 lines of Python. Together, they are the fastest path from working code to a shareable demo.
Prerequisites
Before starting this lab, you need:
- Python 3.9+ installed
- A Hugging Face account (free to create at huggingface.co/join)
- An OpenAI API key (or any LLM API key for the generation step)
- Git installed
- Familiarity with the RAG pipeline from Labs 1 through 3
Step 1: Structure Your App
Your deployed app needs to do four things in a single script:
- Accept a document (file upload)
- Chunk and embed the document
- Search for relevant chunks when the user asks a question
- Generate an answer using an LLM with the retrieved chunks as context
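Stripped of libraries, the four steps above can be sketched in plain Python. This is a toy illustration only: the stub names (`chunk`, `embed`, `search`, `generate`) and the letter-counting "similarity" are hypothetical stand-ins, not LangChain APIs; the real app uses a text splitter, a sentence-transformer model, ChromaDB, and an LLM for these stages.

```python
# Toy sketch of the four RAG stages. Every name here is a hypothetical
# stub for illustration, not a real library call.

def chunk(text, size=500, overlap=50):
    """Split text into overlapping, fixed-size character chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]

def embed(chunk_text):
    """Stand-in 'embedding': counts of a few common letters.
    A real app uses an embedding model instead."""
    return [chunk_text.count(ch) for ch in "etaoins"]

def search(question, chunks, k=3):
    """Rank chunks by a toy dot-product similarity to the question."""
    qv = embed(question)
    score = lambda c: sum(a * b for a, b in zip(embed(c), qv))
    return sorted(chunks, key=score, reverse=True)[:k]

def generate(question, context):
    """Placeholder for the LLM call: a real app sends the question plus
    the retrieved chunks to a chat model and returns its answer."""
    return f"[answer to {question!r} grounded in {len(context)} chunks]"

doc = "retrieval augmented generation " * 100
top = search("what is retrieval?", chunk(doc), k=3)
print(generate("what is retrieval?", top))
```

The data flow is the same in the real app; only each stage's implementation changes.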
All of this goes into one file: app.py. Hugging Face Spaces runs this file automatically.
Here is the project structure:
```
my-rag-app/
├── app.py             # The entire application
├── requirements.txt   # Dependencies
└── README.md          # Hugging Face Space metadata (auto-generated)
```

Step 2: Create requirements.txt
Create a requirements.txt file with the dependencies your app needs:
```
gradio>=4.0.0
langchain>=0.1.0
langchain-community>=0.0.10
langchain-openai>=0.0.5
chromadb>=0.4.0
sentence-transformers>=2.2.0
```

Step 3: Build the Gradio App
This is the complete app.py. Read through it — every section maps to a step in the RAG pipeline you already know:
```python
import os

import gradio as gr
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

# --- Configuration ---
CHUNK_SIZE = 500
CHUNK_OVERLAP = 50
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
TOP_K = 3

# --- Global state ---
vector_store = None
qa_chain = None


def process_document(file, openai_key):
    """Ingest a document: chunk it, embed it, store it."""
    global vector_store, qa_chain

    if not openai_key or not openai_key.strip():
        return "Please enter your OpenAI API key."

    if file is None:
        return "Please upload a file."

    # Read the uploaded file
    with open(file.name, "r", encoding="utf-8", errors="ignore") as f:
        text = f.read()

    if not text.strip():
        return "The uploaded file is empty."

    # Step 1: Chunk the text
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE,
        chunk_overlap=CHUNK_OVERLAP,
    )
    chunks = splitter.split_text(text)

    # Step 2: Embed and store in ChromaDB
    embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL)
    vector_store = Chroma.from_texts(
        texts=chunks,
        embedding=embeddings,
    )

    # Step 3: Create the QA chain
    llm = ChatOpenAI(
        model_name="gpt-3.5-turbo",
        temperature=0.1,
        openai_api_key=openai_key,
    )
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vector_store.as_retriever(search_kwargs={"k": TOP_K}),
        return_source_documents=True,
    )

    return f"Document processed: {len(chunks)} chunks created and embedded."


def ask_question(question):
    """Retrieve relevant chunks and generate an answer."""
    if qa_chain is None:
        return "Please upload a document first.", ""

    if not question.strip():
        return "Please enter a question.", ""

    result = qa_chain.invoke({"query": question})
    answer = result["result"]

    # Format source chunks for display
    sources = []
    for i, doc in enumerate(result["source_documents"]):
        preview = doc.page_content[:200]
        sources.append(f"**Chunk {i + 1}:**\n{preview}...")

    sources_text = "\n\n---\n\n".join(sources)
    return answer, sources_text


# --- Gradio Interface ---
with gr.Blocks(
    title="RAG Chatbot",
    theme=gr.themes.Soft(primary_hue="blue"),
) as app:
    gr.Markdown("# RAG Chatbot")
    gr.Markdown(
        "Upload a `.txt` or `.md` file, then ask questions about it. "
        "Your document is chunked, embedded, and searched in real time."
    )

    with gr.Row():
        with gr.Column(scale=1):
            gr.Markdown("### 1. Setup")
            api_key_input = gr.Textbox(
                label="OpenAI API Key",
                type="password",
                placeholder="sk-...",
            )
            file_input = gr.File(
                label="Upload Document (.txt or .md)",
                file_types=[".txt", ".md"],
            )
            process_btn = gr.Button("Process Document", variant="primary")
            status_output = gr.Textbox(label="Status", interactive=False)

        with gr.Column(scale=2):
            gr.Markdown("### 2. Ask Questions")
            question_input = gr.Textbox(
                label="Your Question",
                placeholder="What is this document about?",
                lines=2,
            )
            ask_btn = gr.Button("Ask", variant="primary")
            answer_output = gr.Markdown(label="Answer")
            gr.Markdown("**Retrieved Chunks:**")
            sources_output = gr.Markdown()

    # Wire up the buttons
    process_btn.click(
        fn=process_document,
        inputs=[file_input, api_key_input],
        outputs=[status_output],
    )
    ask_btn.click(
        fn=ask_question,
        inputs=[question_input],
        outputs=[answer_output, sources_output],
    )
    question_input.submit(
        fn=ask_question,
        inputs=[question_input],
        outputs=[answer_output, sources_output],
    )

# Launch the app
app.launch()
```

Test it locally first:
```bash
python app.py
```

This opens a Gradio interface in your browser at http://localhost:7860. Upload a text file, enter your API key, process the document, and ask a question. Make sure it works before deploying.
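While testing locally, it can help to sanity-check the chunking settings before uploading a large document. The helper below is a rough sketch, not part of the app: `estimate_chunks` is a hypothetical name, and it gives only a character-based upper bound, since RecursiveCharacterTextSplitter splits on separators and will usually produce slightly different counts.

```python
# Rough upper bound on chunk count for a character-based splitter.
# estimate_chunks is a hypothetical helper for intuition, mirroring the
# CHUNK_SIZE / CHUNK_OVERLAP constants in app.py.
CHUNK_SIZE = 500
CHUNK_OVERLAP = 50

def estimate_chunks(doc_length: int, size: int = CHUNK_SIZE,
                    overlap: int = CHUNK_OVERLAP) -> int:
    """Each chunk after the first advances by (size - overlap) characters."""
    if doc_length <= size:
        return 1
    step = size - overlap
    return 1 + -(-(doc_length - size) // step)  # ceiling division

print(estimate_chunks(10000))  # a 10,000-character document → about 23 chunks
```

If the status message after "Process Document" reports a wildly different number, your chunking configuration is probably not what you intended.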
Step 4: Create a Hugging Face Space
Go to huggingface.co/new-space and fill in:
| Field | Value |
|---|---|
| Space name | my-rag-chatbot (or whatever you like) |
| License | MIT |
| SDK | Gradio |
| Visibility | Public (so you can share the link) |
Click Create Space. Hugging Face gives you a Git repository URL.
Add your API key as a Secret
Do not hardcode your API key in the code. Instead:
- Go to your Space’s Settings tab
- Scroll to Repository secrets
- Add a secret: Name = `OPENAI_API_KEY`, Value = your key
Then update the process_document function to read from the environment as a fallback:
```python
def process_document(file, openai_key):
    # Use the provided key, or fall back to the Space secret
    key = openai_key.strip() if openai_key else os.environ.get("OPENAI_API_KEY", "")
    if not key:
        return "Please enter your OpenAI API key."
    # ... rest of the function uses 'key' instead of 'openai_key'
```

Step 5: Push and Deploy
Clone your new Space, copy your files in, and push:
```bash
# Clone the Space repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/my-rag-chatbot
cd my-rag-chatbot

# Copy your app files
cp /path/to/your/app.py .
cp /path/to/your/requirements.txt .

# Commit and push
git add app.py requirements.txt
git commit -m "Initial RAG chatbot deployment"
git push
```

Hugging Face automatically detects the push, installs dependencies from requirements.txt, and starts your app. The build takes 2 to 5 minutes.
Watch the build logs in the Logs tab of your Space. Common issues:
| Error | Fix |
|---|---|
| `ModuleNotFoundError` | Add the missing package to requirements.txt |
| Build timeout | Reduce dependencies or use lighter models |
| Out of memory | The free CPU tier has 16GB RAM — all-MiniLM-L6-v2 fits easily, larger models may not |
Once the build completes, your app is live at:
```
https://huggingface.co/spaces/YOUR_USERNAME/my-rag-chatbot
```

Step 6: Share Your App
Your app is now a public URL. Anyone can use it.
Here is what to do with it:
Share the link directly. Send it to friends, colleagues, or post it in the LearnRAG Discord #show-your-project channel.
Embed it in a portfolio. Hugging Face Spaces can be embedded as iframes:
```html
<iframe
  src="https://YOUR_USERNAME-my-rag-chatbot.hf.space"
  width="100%"
  height="600"
  frameborder="0"
></iframe>
```

Add it to your GitHub README. Link to the live demo alongside your source code. Recruiters and hiring managers click live demos far more often than they clone repositories.
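The `hf.space` address in the iframe follows a predictable pattern: your username and Space name joined with a hyphen, lowercased. The helper below is a hypothetical sketch of that pattern (the `space_embed_url` name is ours, and the assumption that `.` and `_` are normalized to `-` in the subdomain is ours too; double-check the exact URL in your Space's "Embed this Space" menu).

```python
def space_embed_url(username: str, space_name: str) -> str:
    """Sketch of the direct-app URL pattern for a Space:
    https://{username}-{space_name}.hf.space, lowercased,
    with '.' and '_' assumed to normalize to '-'."""
    slug = f"{username}-{space_name}".lower().replace(".", "-").replace("_", "-")
    return f"https://{slug}.hf.space"

print(space_embed_url("your_username", "my-rag-chatbot"))
# → https://your-username-my-rag-chatbot.hf.space
```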
Iterate in public. Every git push triggers a rebuild. Add features, improve your chunking strategy, swap in a re-ranker — your live app updates automatically.
What You Built
In this lab you:
- Structured a complete RAG app as a single Python file ready for deployment
- Built a Gradio interface with file upload, document processing, and question answering
- Deployed to Hugging Face Spaces with zero infrastructure management
- Configured secrets so your API key is not exposed in code
- Shipped a shareable link that anyone can use
This is the full loop: learn, build, evaluate, deploy. You now have a live RAG application on the internet that you built from scratch.