To understand GEO, you need to understand how a large language model (LLM) decides what it tells you. It draws on two fundamental sources: what it learned during training, and what it retrieves at the moment of your query.
Source 1: Training data
An LLM is trained on a vast amount of text, and that training has a cut-off date. Content that has been consistently online for a long time and is widely cited is more likely to be part of the training data, and therefore part of the model's base knowledge.
Implication: authority and presence over time count. GEO is not a sprint.
Source 2: Live retrieval (RAG)
Many modern AI systems are extended with a retrieval component. When you ask a question, the system fetches relevant documents and passes them to the model as context. ChatGPT with web search, Perplexity and Google AI Overviews all rely on a form of retrieval-augmented generation (RAG).
Practical implication: fresh, well-indexed content can be retrieved and used, even after the training cut-off.
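To make the retrieval step concrete, here is a minimal sketch in Python. It is not how any particular product works: the `DOCUMENTS` index, the `example.com` URLs and the word-overlap scoring are all assumptions made for illustration. Real systems use a search index or embeddings, but the principle is the same: only content that matches the question reaches the model.

```python
import re
from collections import Counter

# Hypothetical mini-index standing in for crawled, indexed web pages.
DOCUMENTS = {
    "https://example.com/geo-basics": "GEO means optimising content so generative AI systems can find and cite it.",
    "https://example.com/rag": "Retrieval augmented generation fetches documents at query time and passes them to the model as context.",
    "https://example.com/recipes": "A simple pancake recipe needs flour, eggs and milk.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, document):
    """Count how many query words also occur in the document (naive relevance)."""
    query_words = Counter(tokenize(query))
    doc_words = Counter(tokenize(document))
    return sum(min(count, doc_words[word]) for word, count in query_words.items())


def retrieve(query, documents, top_k=2):
    """Return up to top_k (url, text) pairs that share at least one word with the query."""
    ranked = sorted(documents.items(), key=lambda item: score(query, item[1]), reverse=True)
    return [(url, text) for url, text in ranked[:top_k] if score(query, text) > 0]


def build_prompt(query, documents, top_k=2):
    """Assemble the text a RAG system would hand to the model: sources first, question last."""
    hits = retrieve(query, documents, top_k)
    context = "\n".join(f"[{url}] {text}" for url, text in hits)
    return f"Answer using only the sources below.\n\nSources:\n{context}\n\nQuestion: {query}"
```

Calling `build_prompt("How does retrieval augmented generation fetch documents?", DOCUMENTS)` produces a prompt that contains the RAG page but not the pancake recipe. A page that never gets retrieved cannot be quoted in the answer.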
What does this mean in practice?
- Write content that gives direct answers to specific questions
- Ensure technical accessibility: fast load times, correct HTML, no blocks for crawlers
- Build long-term authority through consistent publishing and external mentions
- Use structured data to make context explicit
- Be present on platforms that AI systems use as sources
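Two of these points can be checked or produced with a few lines of code. The sketch below uses Python's standard library to test whether a robots.txt file blocks a given crawler, and to generate schema.org FAQ markup as JSON-LD. The crawler name `ExampleAIBot`, the sample robots.txt and the helper names are hypothetical. Look up the real user agents of the AI crawlers you care about before relying on the result.

```python
import json
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one AI crawler and allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def faq_jsonld(question, answer):
    """Wrap one question and answer in schema.org FAQPage markup as a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
        ],
    }
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )
```

With the sample file, `crawler_allowed(SAMPLE_ROBOTS_TXT, "ExampleAIBot", "https://example.com/guide")` returns `False`, while any other crawler is allowed. The output of `faq_jsonld(...)` goes in the page's HTML, where it states the question and the answer explicitly instead of leaving a parser to infer them.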
Related in the hub
→ Want to understand how RAG actually works? Read on in 2.2 — RAG explained.