Glossary

RAG (Retrieval-Augmented Generation)

In one sentence

RAG (Retrieval-Augmented Generation) is an AI architecture in which the system first retrieves relevant context from an external knowledge source (documents, databases, or APIs) and passes it to the language model before it generates a response, so the answer is grounded in specific data rather than only in what the model learned during training.

1. How RAG works in three steps

A RAG system runs three steps for every query: (1) retrieval, which searches a knowledge source for content relevant to the question; (2) augmentation, which passes the retrieved content to the large language model (LLM) as additional context; and (3) generation, in which the LLM produces an answer grounded in that context. The user sees a single response, but the system performed three steps to produce it.
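The three steps can be sketched in Python as below. This is a minimal illustration, not a production design: the word-overlap scorer stands in for a real retriever, and `generate` is a hypothetical placeholder where an actual LLM client call would go. All names (`Document`, `retrieve`, `augment`, `generate`, `answer`) are invented for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase word set; punctuation is dropped."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Step 1 (retrieval): rank documents by words shared with the query (toy scorer)."""
    terms = tokenize(query)
    scored = [(len(terms & tokenize(doc.text)), doc) for doc in corpus]
    # Drop documents with no overlap, then keep the best k.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def augment(query: str, docs: list[Document]) -> str:
    """Step 2 (augmentation): put the retrieved text into the prompt as context."""
    context = "\n".join(f"[{doc.doc_id}] {doc.text}" for doc in docs)
    return (
        "Answer using only the context below and cite document ids.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


def generate(prompt: str) -> str:
    """Step 3 (generation): hypothetical placeholder for a real LLM client call."""
    return f"(model output for a {len(prompt)}-character prompt)"


def answer(query: str, corpus: list[Document]) -> str:
    """Run retrieval, augmentation, and generation for one query."""
    docs = retrieve(query, corpus)
    prompt = augment(query, docs)
    return generate(prompt)
```

Keeping the steps as separate functions makes it easy to swap the toy scorer for keyword, semantic, or hybrid retrieval without touching the augmentation or generation code.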

2. Why RAG instead of fine-tuning

Fine-tuning teaches a model new patterns by updating its weights. RAG leaves the model unchanged and supplies new information at query time. For fast-changing data (today's pipeline, this week's support tickets, the latest customer conversation), RAG is the better fit, because fine-tuning would require retraining every time the data changed. For lasting behavioral changes (tone, output format, domain-specific reasoning), fine-tuning is usually the better choice.

3. Retrieval methods: keyword, semantic, hybrid

Keyword retrieval (for example BM25) ranks documents by the query terms they contain, weighting rare terms more heavily than common ones. Semantic retrieval uses embeddings to find conceptually related content even when the wording differs. Hybrid retrieval combines both: keyword matching catches exact terms, while semantic matching catches paraphrases. Most production RAG systems use hybrid retrieval because pure semantic retrieval often misses queries built around specific names, numbers, or IDs.
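A hybrid retriever can be sketched by running both rankers and merging their orderings. The sketch below assumes the embedding vectors are computed elsewhere and passed in (in the test they are small hand-written vectors), implements Okapi BM25 directly for the keyword side, and merges the two rankings with reciprocal rank fusion (RRF). RRF is one common fusion method, swapped in here for illustration; the text above does not say how production systems combine the two, and `rrf_k=60` is simply a frequently used default. Function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Keyword side: Okapi BM25 with a non-negative IDF."""
    if not docs:
        return []
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Number of documents each term appears in.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u: list[float], v: list[float]) -> float:
    """Semantic side: cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def ranking(scores: list[float]) -> list[int]:
    """Document indices, best score first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hybrid_rank(query: str, query_vec: list[float], docs: list[str],
                doc_vecs: list[list[float]], rrf_k: int = 60) -> list[int]:
    """Merge keyword and semantic rankings with reciprocal rank fusion."""
    keyword = bm25_scores(query, docs)
    # Only documents matching at least one query term count as keyword hits.
    keyword_order = [i for i in ranking(keyword) if keyword[i] > 0]
    semantic_order = ranking([cosine(query_vec, vec) for vec in doc_vecs])
    fused: dict[int, float] = {}
    for order in (keyword_order, semantic_order):
        for position, idx in enumerate(order):
            fused[idx] = fused.get(idx, 0.0) + 1 / (rrf_k + position + 1)
    return sorted(fused, key=lambda i: fused[i], reverse=True)
```

With three documents where only the first contains the ID "INV-4471", query vector `[0.0, 1.0]`, and document vectors `[[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]`, the ID-bearing document ranks last on the embeddings alone but first after fusion, because BM25 matches the exact ID.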

4. The freshness advantage

RAG's biggest practical advantage is freshness. A fine-tuned model knows only what was in its training data. A RAG system knows whatever its knowledge source contains at query time, provided retrieval reads from the source directly or from an index that is kept in sync with it. Ask "what was the latest message in the Acme thread?" and a RAG system retrieves the actual message; a model without retrieval can only describe what such a message might look like.
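The freshness point comes down to where new data lives. In the sketch below, a hypothetical in-memory `MessageStore` stands in for the live knowledge source; a message added to it is returned by the very next query, with no change to any model. A real deployment would query the source system or a continuously synced index instead, and the class and function names here are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Message:
    thread: str
    sent_at: datetime
    text: str


@dataclass
class MessageStore:
    """Hypothetical stand-in for a live knowledge source such as a chat workspace."""

    messages: list[Message] = field(default_factory=list)

    def add(self, message: Message) -> None:
        # New data is visible to the next query at once; no model is retrained.
        self.messages.append(message)

    def latest_in_thread(self, thread: str) -> Optional[Message]:
        in_thread = [m for m in self.messages if m.thread == thread]
        return max(in_thread, key=lambda m: m.sent_at, default=None)


def build_context(store: MessageStore, thread: str) -> str:
    """Retrieve the newest message and format it as context for the model."""
    latest = store.latest_in_thread(thread)
    if latest is None:
        return f"No messages found in the {thread} thread."
    return f"Latest message in {thread} ({latest.sent_at:%Y-%m-%d %H:%M}): {latest.text}"
```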

5. RAG and permissions

RAG retrieval must respect access control. If a user does not have permission to read a document, the RAG system must not retrieve it for that user's query — otherwise sensitive information leaks through the answer even if the source is technically locked down. Permission-aware retrieval is what separates production-grade enterprise RAG from a demo.
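Permission-aware retrieval can be sketched as a filter applied before ranking. The sketch assumes a simple group-based access list on each document (`allowed_groups`, a hypothetical field copied from the source system) and a toy word-overlap scorer; real systems map each source's native permission model, which is usually richer than this.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # hypothetical ACL copied from the source system


@dataclass(frozen=True)
class User:
    user_id: str
    groups: frozenset[str]


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def can_read(user: User, doc: Doc) -> bool:
    """Assumed rule: readable if the user shares at least one group with the doc's ACL."""
    return not user.groups.isdisjoint(doc.allowed_groups)


def permitted_search(user: User, query: str, docs: list[Doc], k: int = 3) -> list[Doc]:
    """Filter by permission first, then rank, so forbidden text never reaches the prompt."""
    visible = [doc for doc in docs if can_read(user, doc)]
    terms = tokenize(query)
    scored = [(len(terms & tokenize(doc.text)), doc) for doc in visible]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]
```

Filtering before taking the top k matters: filtering afterwards can leave the user with fewer results than they are entitled to, and filtering only after generation would let forbidden text reach the prompt, which is exactly the leak described above.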

How this works in Cogito

Cogito uses permission-aware RAG over every connected business system. When a sales rep asks about a customer account, Cogito retrieves the relevant Slack threads, Notion pages, Intercom conversations, and Linear issues — but only the ones that specific user is allowed to see, inherited live from each source system. The retrieval respects native permissions; the generation cites every source with a deep link back to it.

Last reviewed May 13, 2026