Polaris ML/AI Training

RAG & Retrieval

Retrieval-Augmented Generation, vector databases, embeddings, and chunking strategies for grounding LLMs in external knowledge.

27 concepts · 7 questions · 4 projects

Overview

Retrieval-Augmented Generation (RAG) is a technique that enhances large language models by grounding their responses in external knowledge sources. Instead of relying solely on what the model learned during training, RAG retrieves relevant documents at inference time and includes them as context.

The RAG pipeline typically involves three stages: indexing (chunking documents and creating vector embeddings), retrieval (finding the most relevant chunks for a given query using similarity search), and generation (feeding the retrieved context to an LLM to produce a grounded answer).
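The sketch below illustrates those three stages end to end, assuming the sentence-transformers library for embeddings; the model name, the fixed-size chunker, and the call_llm helper are illustrative stand-ins rather than part of any particular framework.

```python
# Minimal three-stage RAG sketch: index -> retrieve -> generate.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# --- Indexing: chunk documents and embed each chunk ---
def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size character chunking with overlap."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

documents = ["RAG grounds LLM answers in retrieved external context ..."]
chunks = [c for doc in documents for c in chunk(doc)]
index = model.encode(chunks, normalize_embeddings=True)  # shape: (n_chunks, dim)

# --- Retrieval: embed the query and rank chunks by cosine similarity ---
def retrieve(query: str, k: int = 3) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = index @ q                      # cosine similarity (vectors are unit-normalized)
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

# --- Generation: feed the retrieved context to an LLM ---
def call_llm(prompt: str) -> str:
    # Hypothetical placeholder for a real LLM client call.
    return f"[LLM answer grounded in:]\n{prompt}"

def answer(query: str) -> str:
    context = "\n\n".join(retrieve(query))
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)
```

In practice the in-memory NumPy index would be replaced by a vector database and the placeholder call by a real LLM client, but the stage boundaries stay the same.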

Key concepts include chunking strategies, embedding models, vector databases (Pinecone, ChromaDB, Weaviate), hybrid search (combining dense and sparse retrieval), re-ranking, and evaluation metrics like faithfulness and answer relevance. RAG is now a foundational pattern for building knowledge-grounded AI applications.
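Hybrid search in particular is easy to sketch: blend a dense (embedding) score with a sparse (lexical) score after normalizing both. The term-overlap scorer below is a deliberately crude stand-in for a real sparse ranker such as BM25, and the alpha weight is an assumed tuning knob.

```python
# Minimal hybrid-search sketch: blend normalized dense and sparse scores.
import numpy as np

def sparse_score(query: str, chunk: str) -> float:
    """Fraction of query terms present in the chunk (crude lexical signal)."""
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / max(len(q_terms), 1)

def minmax(x: np.ndarray) -> np.ndarray:
    """Scale scores to [0, 1] so dense and sparse signals are comparable."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def hybrid_rank(query: str, chunks: list[str], dense_scores: np.ndarray,
                alpha: float = 0.5, k: int = 3) -> list[str]:
    """alpha=1.0 is pure dense retrieval, alpha=0.0 is pure sparse."""
    sparse = np.array([sparse_score(query, c) for c in chunks])
    combined = alpha * minmax(dense_scores) + (1 - alpha) * minmax(sparse)
    top = np.argsort(combined)[::-1][:k]
    return [chunks[i] for i in top]
```

A re-ranking stage would then re-score just these top-k candidates with a heavier model (for example a cross-encoder) before they are passed to the generator.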

ML Concepts

Deep-Dive Concepts (from Projects)