What is Retrieval-Augmented Generation (RAG)?

**RAG (Retrieval-Augmented Generation)** combines an LLM with a **retrieval system**: it retrieves relevant information from a knowledge base and supplies it to the LLM as context, so the LLM can generate accurate, grounded answers. It is a key technique for building LLM applications on custom data.

## What RAG does

```text
RAG → augment an LLM's generation with RETRIEVED relevant information:
  1. RETRIEVE → search a knowledge base (your documents/data) for info relevant to the query
  2. AUGMENT → add the retrieved info to the LLM's prompt as CONTEXT
  3. GENERATE → the LLM answers using the provided context (grounded in your data)
→ gives the LLM relevant, up-to-date, specific knowledge it wasn't trained on
```
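The three steps above can be sketched as a tiny pipeline. This is a minimal illustration under stated assumptions, not a production design: `retrieve` ranks documents by the words they share with the query (a toy stand-in for real semantic search), and `llm` is any hypothetical callable that takes a prompt string and returns text; no real model API is assumed.

```python
import re
from typing import Callable


def _words(text: str) -> set[str]:
    """Lowercased set of words; punctuation is ignored."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """1. RETRIEVE: rank documents by how many words they share with the query.
    (Toy stand-in for semantic search over embeddings.)"""
    q = _words(query)
    ranked = sorted(knowledge_base, key=lambda doc: len(q & _words(doc)), reverse=True)
    return ranked[:k]


def augment(query: str, context: list[str]) -> str:
    """2. AUGMENT: insert the retrieved text into the prompt as context."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Using this context:\n{joined}\n\nAnswer: {query}"


def rag_answer(query: str, knowledge_base: list[str], llm: Callable[[str], str]) -> str:
    """3. GENERATE: the LLM (any prompt -> text callable) answers from the augmented prompt."""
    return llm(augment(query, retrieve(query, knowledge_base)))


if __name__ == "__main__":
    kb = [
        "The office opens at 9am.",
        "Refunds take five business days.",
        "Parking is free on weekends.",
    ]
    # Hypothetical "LLM" that just echoes its prompt, so the augmented prompt is visible.
    echo_llm = lambda prompt: prompt
    print(rag_answer("When does the office open?", kb, echo_llm))
```

Passing the LLM in as a plain callable keeps the retrieve and augment steps testable without any model; swapping in a real client only changes what `llm` does.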

## How RAG typically works

```text
→ index your data: split documents into chunks → create EMBEDDINGS → store in a VECTOR DATABASE
→ at query time: embed the query → find the most SIMILAR chunks (semantic search) →
  retrieve them
→ build a prompt: 'Using this context: [retrieved chunks], answer: [query]'
→ the LLM generates an answer grounded in the retrieved context
```
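Indexing starts with splitting documents into chunks. A minimal sketch, assuming fixed-size character windows with some overlap; the default `chunk_size` and `overlap` values are illustrative, not recommendations, and real systems often split on sentences, paragraphs, or tokens instead. Overlap reduces the chance that a fact straddling a chunk boundary is cut off in every chunk that contains part of it.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks; consecutive chunks share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new chunk starts after the previous one
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this chunk already reaches the end of the text
            break
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=2)` returns `["abcd", "cdef", "efgh", "ghij"]`. Each chunk would then be embedded and stored in the vector database.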

## Why RAG is valuable

```text
✓ Use your OWN/CURRENT data → answer questions about documents the LLM wasn't trained on
  (private docs, recent info, specific knowledge)
✓ Reduce HALLUCINATION → grounding answers in retrieved facts → more accurate, less made-up
✓ Up-to-date → retrieve current info (vs the model's fixed training cutoff)
✓ CITATIONS → can show sources (the retrieved chunks) → trust/verification
✓ CHEAPER/EASIER than fine-tuning for adding knowledge (update the index, not the model)
→ a key pattern for building LLM apps over custom data
```
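To get the CITATIONS benefit, the retrieved chunks can be numbered in the prompt and the model asked to cite those numbers. A minimal sketch, assuming retrieval returns `(source_id, text)` pairs; the instruction wording and the `[n]` citation format are illustrative choices, and whether a given model actually follows them depends on the model. The "say you don't know" instruction is one common way to discourage answers the context does not support.

```python
import re


def build_cited_prompt(query: str, chunks: list[tuple[str, str]]) -> str:
    """Number each retrieved (source_id, text) chunk so the model can cite it as [n]."""
    context = "\n".join(
        f"[{i}] ({source}) {text}" for i, (source, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using ONLY the context below. "
        "Cite the numbered sources you used, e.g. [1]. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


def cited_sources(answer: str, chunks: list[tuple[str, str]]) -> list[str]:
    """Map [n] markers in the model's answer back to source IDs, ignoring invalid numbers."""
    used = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return [chunks[n - 1][0] for n in sorted(used) if 1 <= n <= len(chunks)]


if __name__ == "__main__":
    chunks = [
        ("handbook.pdf", "The office opens at 9am."),
        ("faq.md", "Refunds take five business days."),
    ]
    print(build_cited_prompt("When does the office open?", chunks))
    print(cited_sources("It opens at 9am [1].", chunks))
```

Resolving the markers back to source IDs lets the application show the user which documents an answer came from, which is what makes verification possible.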

## Why this matters

Understanding RAG is valuable because it is **a key technique for building practical LLM applications on custom data**, which makes it increasingly important AI knowledge for developers.

RAG, which pairs an LLM with a retrieval system so that relevant information is retrieved and supplied as context for grounded generation, is a foundational pattern for real-world LLM applications.

Understanding **what RAG does** (retrieving relevant information from a knowledge base, augmenting the LLM's prompt with it as context, and generating answers grounded in that data) explains how RAG gives LLMs access to knowledge they were not trained on.

Understanding **how RAG typically works** gives you the practical architecture. First, index your data: split documents into chunks, create embeddings, and store them in a vector database. Then, at query time, embed the query, find similar chunks through semantic search, and build a prompt for the LLM that includes the retrieved context. This is where RAG connects to embeddings and vector databases.
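The embed, store, and similarity-search steps can be sketched with an in-memory store. Assumptions: `embed` is a toy bag-of-words vector standing in for a real embedding model, so it matches shared words rather than meaning (a real embedding would also match paraphrases); `InMemoryVectorStore` is a hypothetical class, not the API of any vector database. Cosine similarity is used as the similarity score, a common choice for embeddings.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a sparse word-count vector.
    A real RAG system would call an embedding model here to capture meaning."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: keeps (chunk, vector) pairs in a list."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        # Index time: embed each chunk once and store it alongside its text.
        self._items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        # Query time: embed the query, score every stored chunk, return the top-k.
        q = embed(query)
        scored = [(chunk, cosine_similarity(q, vec)) for chunk, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    for doc in ["The office opens at 9am.", "Refunds take five business days.", "Parking is free on weekends."]:
        store.add(doc)
    print(store.search("how many days do refunds take", k=1))
```

This scores every chunk on each query; real vector databases typically use approximate nearest-neighbor indexes to avoid that linear scan at scale.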

Understanding **why RAG is valuable** is the key insight. It lets LLMs use **your own and current data**, answering questions about private documents, recent information, and specific knowledge the LLM was not trained on. It **reduces hallucination** by grounding answers in retrieved facts, addressing a critical LLM limitation. It provides **up-to-date information** instead of relying on the model's fixed training cutoff. It enables **citations**, showing sources so answers can be trusted and verified. And it is cheaper and easier than fine-tuning for adding knowledge.
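The up-to-date and cheaper-than-fine-tuning points share one property: new knowledge goes into the index, not into the model's weights. A minimal sketch, assuming a hypothetical `KnowledgeBase` keyed by document ID, with word-overlap retrieval standing in for vector search; replacing a document immediately changes what retrieval returns, with no retraining.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased set of words; punctuation is ignored."""
    return set(re.findall(r"\w+", text.lower()))


class KnowledgeBase:
    """Hypothetical minimal document store keyed by document ID."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Add a new document or replace an outdated one; the LLM itself is never touched.
        self.docs[doc_id] = text

    def retrieve(self, query: str, k: int = 1) -> list[tuple[str, str]]:
        # Rank (doc_id, text) pairs by words shared with the query (stand-in for vector search).
        q = _words(query)
        ranked = sorted(self.docs.items(), key=lambda item: len(q & _words(item[1])), reverse=True)
        return ranked[:k]


if __name__ == "__main__":
    kb = KnowledgeBase()
    kb.upsert("pricing", "The Pro plan costs 10 dollars per month.")
    kb.upsert("hours", "Support is available 24/7.")
    kb.upsert("pricing", "The Pro plan costs 12 dollars per month.")  # price changed
    print(kb.retrieve("How much does the Pro plan cost?"))
```

After the second `upsert`, the query retrieves only the new price, so the next generated answer reflects it.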

These benefits make RAG the go-to technique for building LLM applications on custom data, which is a very common requirement.

RAG is one of the most important practical patterns for LLM applications. Because it grounds answers in your own and current data, reduces hallucination, and enables citations, it meets a very common need, and understanding it is becoming essential for developers building AI features.