RAG — Retrieval Augmented Generation — is the most important technical concept for GEO outside of static training data. It determines whether an AI system can retrieve and use your content at the moment a user asks a question.
What is RAG?
When a user submits a prompt, the system first searches a data source for relevant fragments. Those fragments are added as additional context. The model generates an answer based on both training knowledge and retrieved information.
Click through the six phases to see what happens with a concrete energy
question:
1. The user asks a question
Input phase
A user types a question into an AI assistant. Unlike a search engine, the user doesn’t need to think in keywords — they write the way they would speak. This is what the system has to interpret in step 2.
Which energy supplier currently offers the best rate for a small business in Flanders?
2. The question becomes a vector
Embedding phase
The system converts the question into a mathematical representation — a long sequence of numbers that captures the meaning of the sentence. Questions with similar meaning get similar vectors, even if they use different words. This makes semantic search possible.
[ 0.184, -0.291, 0.503, 0.072, -0.118, 0.447, -0.226, 0.339, 0.061, -0.412, 0.288, 0.156, -0.073, 0.394, … + about 1500 more dimensions ]
3. The system retrieves candidates
Retrieval phase
The vector of the question is compared against vectors of thousands of documents in the database. The system surfaces the most similar candidates. At this point these are still raw matches — relevance is estimated by vector distance, not by factual quality.
Engie — rates for small professionals
0.851
Luminus — SME energy contracts
0.823
VREG — comparison module 2026
0.798
TotalEnergies — pro offer Belgium
0.776
Old forum thread from 2021
0.692
4. Candidates are re-ranked
Ranking phase
A second filtering layer. The system weighs additional signals: how recent is the source, what is its authority, how specifically does the text answer the question. The top 3 to 5 make it to the next phase. Many candidates that looked relevant on a vector level fall away here.
VREG — comparison module 2026
recent + neutral
Engie — rates for small professionals
specific + recent
Luminus — SME energy contracts
specific
TotalEnergies — pro offer Belgium
commercial
Old forum thread from 2021
outdated
5. The context is assembled
Context phase
The selected fragments are packaged together with the original question into one prompt. This is what the language model finally sees: your question plus the external knowledge just retrieved. The model doesn’t know the answer on its own — it is handed the building blocks.
SYSTEM: Answer based on the following sources. CONTEXT: [1] VREG comparison module (Apr 2026): ... [2] Engie rates for small professionals: ... [3] Luminus SME energy contracts: ... USER: Which energy supplier currently offers the best rate...
6. The model generates an answer
Generation phase
Only now does the language model itself come into play. It synthesises the retrieved fragments into a readable answer and — in citation-first systems like Perplexity — adds explicit source references. For GEO this is the moment that reveals whether your content was retrieved, how deeply it is described, and where it sits in the answer.
For small businesses in Flanders, Engie currently offers one of the sharpest fixed rates, followed by Luminus and TotalEnergies. [1][2][3]
The conditions vary per supplier — Engie has a fixed price for three years, Luminus offers a green option with no surcharge, TotalEnergies has a discount for multi-product customers.
How does RAG select which content gets retrieved?
RAG systems work on the basis of semantic similarity. The user’s question is converted into a mathematical representation, and the data source is searched for fragments with a similar representation.
Practical consequence: content written from the user's question — not from internal terminology — is retrieved more often.
Writing RAG-friendly content
- Use the language of your audience, not internal jargon
- Answer one specific question per section or page
- Give direct, factual answers — no detours
- Use clear subheadings that mirror the question
- Add FAQ sections with explicit question-answer structure
- Avoid long introductions that delay the core of the answer
Related in the hub
→ Want to know how training data and live crawl work together? Read 2.3.