
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela
Facebook AI Research · University College London · New York University

Abstract

Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures.

Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome these issues, but have so far been investigated only for extractive downstream tasks.

We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) — models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever.

We compare two RAG formulations: RAG-Sequence, which conditions on the same retrieved passages across the whole generated sequence, and RAG-Token, which can use different passages per token. We find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
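Following the paper's notation, with retriever p_η and generator p_θ, the two formulations differ in where they marginalize over the top-k retrieved documents z:

```latex
% RAG-Sequence: one retrieved document is used for the entire output sequence
p_{\text{RAG-Seq}}(y \mid x) \approx \sum_{z \in \text{top-}k\left(p_\eta(\cdot \mid x)\right)} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta\!\left(y_i \mid x, z, y_{1:i-1}\right)

% RAG-Token: a (possibly different) document is marginalized over at each token
p_{\text{RAG-Tok}}(y \mid x) \approx \prod_{i=1}^{N} \sum_{z \in \text{top-}k\left(p_\eta(\cdot \mid x)\right)} p_\eta(z \mid x) \, p_\theta\!\left(y_i \mid x, z, y_{1:i-1}\right)
```

RAG-Sequence commits to one document per answer; RAG-Token can draw on different documents for different parts of the output.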

1. Introduction

Knowledge-intensive NLP tasks, such as question answering, fact verification, and dialogue, require models to access large amounts of world knowledge. Traditional approaches relied on external knowledge bases or information retrieval systems. Recent large language models like GPT-2 and BART have shown the ability to store significant factual knowledge in their parameters.

However, parametric-only models have several limitations:

- Their ability to access and precisely manipulate the knowledge stored in their parameters is limited.
- They cannot provide provenance for their predictions.
- Their world knowledge cannot easily be inspected, updated, or revised.

RAG addresses these limitations by combining the generative capabilities of pre-trained language models with the precision of neural information retrieval. At query time, the model retrieves relevant passages from an external corpus and conditions its generation on both the input and the retrieved documents.
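As a rough illustration of this retrieve-then-generate flow, here is a minimal sketch. The bag-of-words `embed` function and the prompt-assembly "generator" are toy stand-ins for the paper's DPR retriever and BART generator, not the actual models:

```python
import numpy as np

# Toy stand-in for a dense encoder (a real system would use a trained
# neural encoder such as DPR's BERT): hash each word into a fixed-size
# normalized bag-of-words vector.
def embed(text, dim=64):
    v = np.zeros(dim)
    for w in text.lower().split():
        v[sum(map(ord, w.strip(".,?!"))) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

corpus = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "BART is a pre-trained seq2seq transformer.",
]
index = np.stack([embed(d) for d in corpus])  # the dense vector index

def retrieve(query, k=2):
    scores = index @ embed(query)              # inner-product (MIPS) search
    top = np.argsort(-scores)[:k]
    return [(corpus[i], float(scores[i])) for i in top]

def rag_generate(query):
    # Condition generation on the input *and* the retrieved passages;
    # here "generation" is just prompt assembly, standing in for BART.
    passages = [p for p, _ in retrieve(query)]
    return "context: " + " ".join(passages) + " question: " + query

print(rag_generate("What is the capital of France?"))
```

A real implementation would replace `embed` with the trained question/document encoders and `rag_generate` with a seq2seq decoder that attends over the retrieved text.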

2. Architecture

The RAG architecture consists of two main components:

Retriever: A bi-encoder model that maps both questions and documents to dense vector representations. We use DPR (Dense Passage Retrieval) which uses two BERT models — one for encoding questions and one for encoding documents.
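The bi-encoder structure can be sketched as follows, with untrained random projections standing in for the two BERT encoders. Because the projections are untrained, the scores here are structural placeholders rather than meaningful relevance estimates; the point is the shape of the computation — documents are encoded once offline, and at query time relevance is an inner product between the two embedding spaces:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM_IN, DIM_OUT = 32, 16

# Two separate encoders, as in DPR's bi-encoder. In the real model these
# are two BERT networks; here each is a fixed random projection over a
# bag-of-words input (a hypothetical stand-in).
W_q = rng.standard_normal((DIM_OUT, DIM_IN))   # question encoder
W_d = rng.standard_normal((DIM_OUT, DIM_IN))   # document encoder

def bow(text, dim=DIM_IN):
    v = np.zeros(dim)
    for w in text.lower().split():
        v[sum(map(ord, w.strip(".,?!"))) % dim] += 1.0
    return v

def encode_question(q):
    return W_q @ bow(q)

def encode_document(d):
    return W_d @ bow(d)

docs = ["Paris is the capital of France.", "BART is a seq2seq model."]
doc_matrix = np.stack([encode_document(d) for d in docs])  # built offline, once

def score(question):
    # Relevance = inner product between question and document embeddings.
    return doc_matrix @ encode_question(question)
```

The asymmetry matters: document embeddings are precomputed and indexed, so only the question encoder runs at query time.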

Generator: A pre-trained seq2seq transformer (BART-large) that generates output tokens conditioned on the input and retrieved documents. The retriever and generator are jointly fine-tuned, allowing the model to learn what to retrieve and how to use retrieved information.
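Numerically, the joint objective marginalizes p_η(z|x)·p_θ(y|x,z) over the top-k retrieved documents; because the document posterior is a softmax over retrieval scores, the loss gradient reaches the query encoder as well as the generator. All numbers below are hypothetical:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy quantities for k = 3 retrieved documents (hypothetical values):
retrieval_scores = np.array([2.0, 0.5, -1.0])  # q·d inner products from the retriever
p_z = softmax(retrieval_scores)                 # p_eta(z | x): document posterior
p_y_given_z = np.array([0.9, 0.2, 0.05])        # p_theta(y | x, z): generator likelihoods

# RAG-Sequence marginal likelihood: sum over documents of p(z|x) * p(y|x,z).
p_y = float(p_z @ p_y_given_z)
loss = -np.log(p_y)                             # negative marginal log-likelihood

# Because p_z is a differentiable function of the retrieval scores,
# minimizing this loss also trains the query encoder. (In the paper the
# document index stays fixed; only the query encoder and the BART
# generator are fine-tuned.)
print(round(loss, 4))
```

Documents that both rank highly and help the generator produce the target output receive a larger share of the gradient, which is how the model "learns what to retrieve".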

3. Experimental Results

We evaluate RAG on four knowledge-intensive open-domain question answering benchmarks:

- Natural Questions (NQ)
- TriviaQA (TQA)
- WebQuestions (WQ)
- CuratedTrec (CT)

RAG outperforms all existing approaches on Natural Questions and achieves competitive results on the other benchmarks. Notably, RAG generates more diverse and factual responses compared to purely parametric baselines. The key contribution is showing that retrieval-augmented generation can be applied as a general-purpose architecture.