What is Retrieval-Augmented Generation?
Retrieval-Augmented Generation (RAG) is an AI framework that combines the generative capabilities of large language models with external knowledge retrieval. Instead of relying solely on knowledge encoded in its trained parameters, a RAG system fetches relevant information from a knowledge base to inform and ground its responses.
How RAG Works
- Query Processing: The user's query is received and encoded for search
- Retrieval: Relevant documents are fetched from the knowledge base
- Augmentation: The retrieved context is added to the prompt
- Generation: The LLM generates a response using both the query and the context
- Response: The grounded answer is returned to the user
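The five steps above can be sketched end to end. Everything here is a stand-in: the knowledge base is a toy dictionary, the retriever ranks by simple word overlap, and generate() is a placeholder for a real LLM call.

```python
# Minimal RAG pipeline sketch. All components are illustrative stand-ins.

KNOWLEDGE_BASE = {
    "doc1": "RAG combines retrieval with generation.",
    "doc2": "Vector databases store embeddings for similarity search.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    def overlap(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    ranked = sorted(KNOWLEDGE_BASE.values(), key=overlap, reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    """Placeholder for an LLM call; a real system would query a model here."""
    return f"[answer grounded in prompt of {len(prompt)} chars]"

def rag_answer(query: str) -> str:
    context = retrieve(query)                                   # Retrieval
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"  # Augmentation
    return generate(prompt)                                     # Generation

print(rag_answer("What does RAG combine?"))
```

In a production system, retrieve() would query a vector or keyword index and generate() would call a hosted or local model, but the control flow stays the same.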
RAG Architecture
┌─────────┐     ┌─────────────┐     ┌─────────┐
│  Query  │────►│  Retriever  │────►│   LLM   │
└─────────┘     └──────┬──────┘     └────┬────┘
                       │                 │
                ┌──────▼──────┐          │
                │  Knowledge  │          │
                │    Base     │          │
                └─────────────┘          ▼
                                    ┌──────────┐
                                    │ Response │
                                    └──────────┘
Benefits
- Reduces hallucinations
- Enables access to current information
- Provides source attribution
- No retraining needed for new data
- Cost-effective compared to fine-tuning
RAG Components
Retriever
- Dense retrieval (embeddings)
- Sparse retrieval (BM25)
- Hybrid approaches
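One common hybrid approach is Reciprocal Rank Fusion (RRF), which merges the ranked lists produced by a sparse retriever (e.g. BM25) and a dense retriever without needing comparable scores. The document names and the two input rankings below are invented for illustration.

```python
# Reciprocal Rank Fusion: merge several ranked lists into one.
# Each document earns 1 / (k + rank + 1) from each list it appears in.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

sparse = ["doc_a", "doc_b", "doc_c"]   # e.g. BM25 order
dense  = ["doc_b", "doc_c", "doc_a"]   # e.g. embedding-similarity order
print(rrf([sparse, dense]))            # → ['doc_b', 'doc_a', 'doc_c']
```

The constant k = 60 is the value commonly used in practice; it damps the influence of top ranks so no single retriever dominates.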
Knowledge Base
- Vector databases
- Document stores
- APIs and databases
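The core of a vector database is nearest-neighbor search over embeddings. This toy in-memory store shows that idea with exact cosine similarity; real systems (FAISS, pgvector, and similar) use approximate indexes for scale, and the 2-dimensional vectors here are made up for the example.

```python
# A toy in-memory "vector database": exact cosine-similarity search.
import math

class TinyVectorStore:
    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vec: list[float]) -> None:
        self.items.append((doc_id, vec))

    def search(self, query: list[float], k: int = 1) -> list[str]:
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0
        ranked = sorted(self.items, key=lambda it: cosine(query, it[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]

store = TinyVectorStore()
store.add("cats", [1.0, 0.0])
store.add("dogs", [0.0, 1.0])
print(store.search([0.9, 0.1]))  # → ['cats']
```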
Reranker (optional)
- Re-scores candidates to improve retrieval relevance
- Typically cross-encoder models
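Reranking is a second stage: a fast retriever returns candidates, then a more expensive model re-scores them. In the sketch below, cross_encoder_score() is a stand-in for a real cross-encoder, which would jointly encode each (query, document) pair with a trained model; the candidate texts are invented.

```python
# Retrieve-then-rerank sketch with a stand-in scoring function.

def cross_encoder_score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words present in the doc."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def rerank(query: str, candidates: list[str], k: int = 2) -> list[str]:
    scored = sorted(candidates, key=lambda d: cross_encoder_score(query, d),
                    reverse=True)
    return scored[:k]

candidates = [  # e.g. top hits from a fast first-stage retriever
    "rag systems retrieve documents",
    "the weather is sunny today",
    "retrieve relevant documents for the query",
]
print(rerank("retrieve relevant documents", candidates))
```

Because the cross-encoder only sees the handful of first-stage candidates, its higher cost is paid on a few pairs rather than the whole corpus.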
Best Practices
- Chunk documents appropriately
- Use quality embeddings
- Implement relevance filtering
- Consider hybrid retrieval
- Monitor retrieval quality
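For the first practice above, a common baseline is fixed-size chunking with overlap, so a sentence split across a boundary still appears whole in at least one chunk. The sizes here (5-word chunks, 2-word overlap) are deliberately tiny for illustration; production values depend on the embedding model and document type.

```python
# Fixed-size word chunking with overlap. Sizes are toy values.

def chunk(text: str, size: int = 5, overlap: int = 2) -> list[str]:
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

doc = "one two three four five six seven eight"
for c in chunk(doc):
    print(c)
# one two three four five
# four five six seven eight
```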