Every measurement methodology has limitations. GEO measurement has more than most. Anyone who doesn’t acknowledge that is selling false certainty. This page is the honest counterpart to the rest of this cluster: what measurement structurally cannot tell you, and how to deal with that.
AI answers are not deterministic
The same question, asked at two different moments to the same model, can produce two different answers. Sometimes subtly different, sometimes substantially. That is not a measurement error — that’s how language models work. No GEO measurement gives you “the one truth”. Every measurement is a sample from a distribution.
Implication: repeat measurements multiple times and report averages with spread. One measurement moment is never enough basis for a decision.
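As a minimal sketch of what "averages with spread" can look like in practice, the snippet below summarises repeated runs of one prompt set. The function name and the run values are hypothetical, and the sample standard deviation is only one of several reasonable ways to express spread.

```python
from statistics import mean, stdev


def summarise_runs(mention_rates):
    """Return (mean, sample standard deviation) across repeated runs."""
    if len(mention_rates) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return mean(mention_rates), stdev(mention_rates)


# Hypothetical data: share of prompts in which the brand was mentioned, per run.
runs = [0.35, 0.31, 0.38, 0.33, 0.36]
avg, spread = summarise_runs(runs)
print(f"mention rate: {avg:.1%} +/- {spread:.1%} (n={len(runs)} runs)")
```

Reporting the number of runs alongside the average makes it visible how thin the basis for a figure is.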
You don't know who's asking
AI models increasingly personalise answers based on what they know about the user: previous conversations, location, preferences, history. Your measurement, taken in a neutral test environment, doesn’t capture this. The actual answers your audience receives can systematically differ from what you measure.
Implication: measure deliberately in different contexts. Test with and without conversation history. Vary geographic signals. Treat your measurement as indicative, not definitive.
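One way to make "different contexts" concrete is to write the test matrix down explicitly. The sketch below is an assumption about how such a matrix could be built. The dimension names and values are hypothetical, and whether your test setup can actually vary history or location depends on the platform.

```python
from itertools import product

# Hypothetical context dimensions; adapt to what your setup can really vary.
HISTORIES = ["none", "with_prior_conversation"]
LOCATIONS = ["NL", "US", "DE"]


def build_context_grid(prompts, histories=HISTORIES, locations=LOCATIONS):
    """Return one test case per combination of prompt, history and location."""
    return [
        {"prompt": p, "history": h, "location": loc}
        for p, h, loc in product(prompts, histories, locations)
    ]


grid = build_context_grid(["best crm for small teams", "crm pricing comparison"])
print(len(grid), "test cases")
```

Keeping the grid explicit also documents which contexts were never tested.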
Models change without warning
A model update can fundamentally shift your visibility from one day to the next. Not because you changed something, but because the provider has adjusted the model, expanded training data, or modified retrieval logic. A drop in your measurement could be a signal about your content — or about an external model change you have no visibility into.
Implication: track releases and changes from the major AI platforms. When a sudden swing occurs, first check whether a general model change has happened before drawing conclusions about your own content.
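The check described here can be sketched as a lookup against a manually maintained change log. Everything below is hypothetical: the log structure, the platform names, the dates and the 14-day lookback window are assumptions for illustration, not data about any real platform.

```python
from datetime import date, timedelta


def releases_before(swing_date, release_log, lookback_days=14):
    """Return logged platform changes in the lookback window before a swing."""
    window = timedelta(days=lookback_days)
    return [
        entry for entry in release_log
        if timedelta(0) <= swing_date - entry["date"] <= window
    ]


# Hypothetical, manually maintained log of known platform changes.
release_log = [
    {"platform": "platform_a", "date": date(2025, 3, 1), "note": "model update"},
    {"platform": "platform_b", "date": date(2025, 1, 10), "note": "retrieval change"},
]

candidates = releases_before(date(2025, 3, 8), release_log)
print([entry["platform"] for entry in candidates])
```

An empty result does not prove the swing is about your content, since providers also make changes they never announce.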
Prompt selection determines the result
You measure on the prompts you choose. But you don’t know which prompts your actual audience asks. A prompt set built from a marketing perspective rarely covers the full reality of user behaviour. Every prompt set is an assumption about how people ask — and that assumption is always partly wrong.
Implication: update your prompt set periodically based on new insights. Combine internal knowledge with external sources on search behaviour. Accept that you’ll have blind spots.
Not all platforms are equally measurable
Perplexity shows sources explicitly and is relatively easy to measure. ChatGPT often gives no source attribution and requires more interpretive work. Google AI Overviews sit embedded in search results and require yet another methodology. Your measurement reflects the platforms you can measure — not necessarily the platforms where your audience actually is.
Implication: document which platforms you measure and which you don’t. Be cautious about extrapolating to platforms outside your measurement.
Measurement works backwards, not forwards
What you measure is what has already happened. You cannot predict what a new AI system will say tomorrow, or what an upcoming model version will change. GEO measurement gives you a picture of the current state, not a prediction of the next.
Implication: use measurement for course correction, not for strategic prediction. Your strategy should be robust across models and time — not optimised for one snapshot.
The measurement data is itself incomplete
Even within a platform you measure, you only capture a fraction of reality. The prompts you test are a sample from an infinite set. The runs you do are a sample from infinite variation. Every figure you report is an estimate with an uncertainty margin.
Implication: report results with spread, not as absolute truth. On a modest sample, a drop from 35% to 32% usually falls within measurement noise. A drop from 35% to 18% is a signal.
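Whether a drop counts as noise depends on how many observations sit behind each figure. The sketch below uses a standard two-proportion comparison with a normal approximation. The sample size of 100 prompt runs per measurement is an assumption, as is treating the two measurements as independent samples.

```python
from math import sqrt


def is_signal(before, after, n, z=1.96):
    """True if the change exceeds z standard errors of the difference.

    Assumes two independent samples of n observations each and uses the
    normal approximation for the difference between two proportions.
    """
    se_diff = sqrt(before * (1 - before) / n + after * (1 - after) / n)
    return abs(before - after) > z * se_diff


n = 100  # hypothetical number of prompt runs per measurement
small_drop = is_signal(0.35, 0.32, n)
large_drop = is_signal(0.35, 0.18, n)
print(f"35% -> 32%: signal={small_drop}")
print(f"35% -> 18%: signal={large_drop}")
```

With far larger samples the same three-point drop can become distinguishable from noise, which is why the sample size belongs next to every reported figure.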
Why this is still useful
All these limitations don’t make GEO measurement worthless — they make it realistic. A measurement that honestly tells you what it does and doesn’t know is strategically more reliable than a dashboard that offers false certainty. Use GEO measurement to spot trends, recognise patterns, and validate interventions. Don’t use it as if it were GA4 for AI.
A methodology that doesn't name its own limits isn't a methodology. It's marketing.