How to Open a Closed AI Citation Gate

Q: How can you test whether an LLM actually used a cited source?

Run a controlled removal test. Get the model to cite the source, then re-ask with that URL removed from (or swapped out of) the retrieval context and check whether the substance of the answer changes. If the answer survives unchanged, the citation was not causal: it was attached after the fact to a conclusion the model had already reached. This is the Gate 5 faithfulness check, and under stress-test conditions a large share of citations do not survive it.

By Manuel Hürlimann for GaryOwl.com | Published: June 21, 2026 | Updated: June 21, 2026
Expertise: Digital Authority Engineering | AI citation pipeline diagnostics | Source-Trust Mechanics
Time to read: ~47 minutes · ~8,855 words
Series: Operative Article 7 — DAE Glossary

When an AI assistant names your competitors but never you, the problem is rarely your writing — it is which of the five AI citation gates is closed. This article shows how to tell which gate is shut, why better content usually cannot reopen it, and what actually does.

📌 Where to Start

This guide to opening a closed AI citation gate is organized so you can jump to your symptom:

If you read one section: “Gate 1: Opening a Parametric Gate” — why the hardest closed gate cannot be opened with better content, and what actually moves it.

If your brand is simply never mentioned: Start with “Gate 1,” then “Gates 2–4: The Openable Gates” to rule out the cheaper fixes first.

If you are cited but it never seems to help: Read “Gate 5: Overcoming Citation Bias” — the gate where reputation outweighs on-page work.

If you want to measure, not guess: Read “The Measurement Layer” — how each gate maps to a Generative Citation Score dimension, and which ones you can measure from the outside.

TL;DR — Key Takeaways

The five gates of AI citation are not equal, and treating them as if they were is the most common way AI-visibility budgets are wasted. Three gates — retrievability, extraction, and consensus — open with technical hygiene any publisher can do and any tool can measure. Two do not. Gate 1, whether the model has a parametric representation of your entity at all, is set by training-data frequency, not content quality: you cannot optimize your way out of a resolution failure with better content. Gate 5, whether a citation reflects genuine grounding or is rationalized after the fact, is where reputation outweighs prose, because a large share of citations are not causally faithful.

The operational rule that follows is to match the lever to the gate: close the hygiene gates efficiently, and reserve patient, compounding effort — independent description, entity-backbone presence, original data others must cite — for the two gates competitors cannot copy their way past. The honest and the effective strategy keep converging on the same answer: earn it.

📌 Key Insights — What This Article Establishes

1. Gate 1 is an identity problem, not a content problem. Parametric recall is frequency-gated (Kandpal et al., ICML 2023; Mallen et al., ACL 2023) [Tier A] and dense retrieval inherits the same long-tail entity bias (Sciavolino et al., EMNLP 2021) [Tier A] — so a better page does not move it.

2. Schema does not raise AI citations (Ahrefs controlled study, May 2026) [Tier E]. Its real, narrower role is entity disambiguation through a separate pipeline that must actively retrieve the structured data (Pons et al., ISWC 2024) [Tier A] — not a citation lever, not a runtime switch.

3. Being cited is not being used. Up to 57% of citations are post-rationalized under adversarial conditions (Wallat et al., ICTIR 2025) [Tier A], which is why Gate 5 rewards reputation over on-page work.

4. LLMs show a heightened citation-popularity bias (Algaba et al., NAACL 2025) [Tier A]: prevalence is read as authority, so repetition substitutes for faithful grounding.

5. The one on-page lever that moves Gate 5 is supplying data the model cannot get elsewhere, making the citation causally necessary rather than decorative.

📌 Glossary: Key DAE Terms in This Article

Generative Citation Score (GCS) · Five Gates of AI Citation · Parametric vs. Retrieval Channel · Entity Coherence · Root-Source Positioning

The Question Article 6 Left Open

Article 6 gave the diagnosis a coordinate system. It described AI citation as a sequence of five gates across two channels — the parametric channel, where your entity is encoded in the model’s weights at training time, and the retrieval channel, where your content is fetched live at query time — and it established the rule that a citation requires at least one channel to clear all five gates. That model answers the question “where did my citation die?” What it deliberately did not answer is the one every reader asks next: my gate is closed — what now?

The instinct is to reach for the same lever every time: write more, write better, add schema, refresh the date. That instinct is right for some gates and actively wasteful for others, and the difference is not obvious from the outside. A brand that is never named in answers and a brand that is cited but never benefits look like the same problem — “we lack AI visibility” — but they are failures at opposite ends of the pipeline, and the fix for one does nothing for the other. Spending a content budget on the wrong gate is the quiet, common way these projects fail.

This article sorts the five gates into two groups. Gates 2, 3, and 4 — retrievability, extraction, consensus — are openable by hygiene: crawlability, clean structure, indexing, corroboration, all levers a publisher controls and existing tools measure. They matter, but they are not where durable advantage lives. Gate 1 and Gate 5 are the opposite. Gate 1 asks whether the model can resolve your entity at all — governed by training-data frequency, not today’s page. Gate 5 asks whether a citation was causally used or merely attached after the fact — where reputation, not prose, is what survives. Both reward being the kind of source others already point to, and neither yields to a content refresh.

So Gate 1 and Gate 5 each get a full treatment; Gates 2 through 4 get one compact section. Each section closes with a “Measure with” note pointing to the Generative Citation Score dimension that tracks it.

The five gates at a glance: where a citation can fail and what opens each gate.

Gate	What it asks	The lever that works
Gate 1 — Identity earned, not content	Is your entity in the model at all?	Entity-backbone presence (Wikipedia/Wikidata) + independent, repeated description
Gate 2 — Retrievability hygiene	Crawled, indexed, reachable at all?	Permit AI crawlers; serve raw HTML, not JS-only; stable URLs
Gate 3 — Extraction hygiene	Does a fetched page yield clean, liftable claims?	Clear headings, fact-dense passages that survive being lifted out of context — not schema
Gate 4 — Consensus hygiene	Corroborated by independent sources?	Genuine cross-source presence — referenced and corroborated by others
Gate 5 — Faithfulness earned, not content	Was the citation causally used, or rationalized after?	Reputation + supply data the model can’t get elsewhere, so the citation is necessary

Gate 1: Opening a Parametric Gate

Of the five gates, Gate 1 is the one most people misdiagnose, because its failure looks like every other failure. You ask an AI assistant about your category; it answers fluently, names three or four competitors, and never mentions you — not as the best option, not as a runner-up, not even as a candidate it considered and rejected. The natural conclusion is “my content isn’t good enough,” and the natural response is to write more of it. For a Gate 1 failure, that response is close to useless, and understanding why is the single most valuable thing in this article.

Gate 1 is the question of whether the model has a stable internal representation of your entity at all — whether, when the query is parsed and routed, “your brand” exists as something the system can resolve, disambiguate, and reason about. This is the parametric channel: knowledge compressed into the model’s weights during training, not fetched live from the web. And the uncomfortable property of the parametric channel is that it is mostly fixed at training time. A page you publish today does not edit weights that were frozen months ago.

How much any given answer leans on those frozen weights rather than on live retrieval varies sharply by engine — and that split is now measured rather than assumed. Across 55,936 queries, one study found Grok returned no cited source in 82% of responses and Gemini in 38% (Source Coverage and Citation Bias, 2025) [Tier A*]; another measured the average number of web pages consulted per query at roughly 0.4 for GPT-4o using search as an optional tool against 8.5 for Gemini and 8.6 for Google’s AI Overviews (Characterizing Web Search in the Age of Generative AI, 2025) [Tier A*]. Google’s own FACTS benchmark shows the same gap inside one model: Gemini 3 Pro scores 76.4 on closed-book parametric recall versus 83.8 when allowed to search (FACTS, 2025) [Tier B, conflict of interest disclosed]. No single “percent parametric” figure holds across the industry, but the practical lesson is concrete: whether Gate 1 or the retrieval gates dominate your problem depends on which assistant you are losing in.

📌 The core asymmetry

Gates 2–4 ask whether your content is good enough to be retrieved, extracted, and corroborated. Gate 1 asks whether your entity exists in the model’s memory in the first place. The first is a content problem. The second is an identity problem — and you cannot optimize your way out of a resolution failure with better content.

Why content optimization does not open Gate 1

The evidence that parametric recall is governed by training-data frequency rather than content quality is unusually strong, because it comes from controlled academic work rather than vendor observation. Nikhil Kandpal and colleagues, in “Large Language Models Struggle to Learn Long-Tail Knowledge” [Tier A] (ICML 2023), linked a model’s ability to answer a factual question to the number of pretraining documents relevant to that question, identified through entity linking of the training corpora. The relationship held across model sizes up to 176 billion parameters, and the authors estimated that closing the long-tail gap by scale alone would require models orders of magnitude larger. In other words: if the web does not already talk about your entity often, the model does not reliably know it — and making one more excellent page does not change the statistic that matters.

Alex Mallen, Akari Asai and colleagues reached the same conclusion from a different direction in “When Not to Trust Language Models” [Tier A] (ACL 2023). Using PopQA — a benchmark of 14,000 questions built from Wikidata triples, with entity popularity measured by Wikipedia page views — they found that accuracy tracks entity popularity, and that scaling the model up does not appreciably improve recall of the long tail. Popular entities are remembered; unpopular ones are not; and the cliff between the two is not bridged by a bigger model or a better-written page. A more recent preprint, “How Knowledge Popularity Influences and Enhances LLM Knowledge Boundary Perception” [Tier A*] (peer-review pending), finds the same popularity effect across recent models, with relation popularity an even stronger predictor of recall than entity popularity.
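Both studies rest on the same analysis shape: bucket probe questions by how much exposure the entity has (relevant pretraining documents for Kandpal, Wikipedia page views for Mallen) and compare accuracy across buckets. A minimal sketch, assuming you already hold per-question exposure counts and correctness labels from your own probes; the function names and the power-of-ten buckets are illustrative choices, not either paper's code.

```python
from collections import defaultdict


def exposure_bucket(exposure: float, base: int = 10) -> int:
    """Power-of-`base` bucket index; exposure below 1 gets its own bucket, -1."""
    if exposure < 1:
        return -1
    bucket, threshold = 0, base
    while exposure >= threshold:  # repeated multiplication avoids log() rounding errors
        bucket += 1
        threshold *= base
    return bucket


def accuracy_by_exposure(records, base: int = 10) -> dict:
    """records: iterable of (exposure, answered_correctly) pairs from your own probes."""
    totals, hits = defaultdict(int), defaultdict(int)
    for exposure, correct in records:
        b = exposure_bucket(exposure, base)
        totals[b] += 1
        hits[b] += int(bool(correct))
    return {b: hits[b] / totals[b] for b in sorted(totals)}
```

A rate that climbs from the low buckets to the high ones is the frequency-gating pattern both papers report. With only a handful of questions per bucket, each rate should be read with the small-sample caution discussed in the measurement section.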

The retrieval side of Gate 1 shows the same bias. Christopher Sciavolino and colleagues, in “Simple Entity-Centric Questions Challenge Dense Retrievers” [Tier A] (EMNLP 2021), showed that the dense vector retrievers underpinning modern RAG underperform older sparse methods like BM25 on entity-centric questions, generalizing only to common entities. Even the retrieval channel has a long-tail entity problem: the mechanism meant to rescue you from a thin parametric memory struggles on precisely the entities that need rescuing. The system is trying to resolve which entity you are, not merely which page ranks — and an unresolvable entity is invisible to it. Targeted data augmentation can push a dense retriever past BM25 on long-tail questions (RPDR, EMNLP 2025) [Tier A], so this is a strong default, not a law.
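The contrast is easier to see with the sparse baseline in hand. Below is a textbook Okapi BM25 scorer (Lucene-style IDF; k1 = 1.5 and b = 0.75 are common defaults, not values taken from Sciavolino et al.). Because IDF grows as a term gets rarer, an exact match on an unusual entity name carries heavy weight, which is the property the paper finds dense retrievers lacking on long-tail entities. The token “owlbrand” in the check below is a hypothetical rare name.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms (low df) get a large IDF, so an exact rare-name match dominates.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)  # length normalization
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

With three equally long documents that all mention “citation” but only one mentioning “owlbrand”, the query “owlbrand citation” ranks that one document first by a wide margin: the shared term contributes almost nothing, the rare one almost everything.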

Put together, these three results define the wall. Parametric recall is frequency-gated (Kandpal, Mallen); dense retrieval inherits the same entity bias (Sciavolino); and none of it is responsive to the quality of a single new page. This is why a Gate 1 failure is an identity problem, not a content problem.

Content quality is not irrelevant to Gate 1 — but its only path there is indirect and slow: excellent work can earn the third-party description that, at the next training cycle, raises the frequency the model learns from. What it cannot do is move that frequency directly, from your own page, today.

Recent work makes the wall starker still. If even direct weight surgery cannot cleanly install a fact, then publishing certainly cannot. Wanying Ren and colleagues, in “Revisiting Parameter-Based Knowledge Editing in Large Language Models: Theoretical Limits and Empirical Evidence” [Tier A*] (peer-review pending), show that localized parameter edits propagate along fragile directions in the representation space and consistently degrade the model’s core capabilities rather than neatly overwriting a single association. The parametric store resists even deliberate, internal rewriting; it is not a surface that outside content can reach at all.

The “slot,” and what it is not

It is tempting to picture this as the model having — or lacking — a dedicated internal “slot” for your entity. The metaphor is useful, but the mechanistic reality is messier. Interpretability work shows factual associations do live in localizable places — Kevin Meng and colleagues’ ROME [Tier A] (NeurIPS 2022) traced recall to mid-layer feed-forward modules — but the strong reading of one clean slot per entity does not hold: encoding is heterogeneous and distributed across overlapping neurons, and recent work finds it even more distributed at frontier and mixture-of-experts scale (Hernandez et al., ICLR 2024; MoE localization, 2026) [Tier A/A*]. Treat “slot” as a working metaphor. The practical claim that survives is modest and solid: whether the model has a usable internal representation of your entity is set largely by training-time exposure, and you do not control it at runtime by editing a page.

What actually opens Gate 1

If the lever is not content quality, what is it? Become the kind of entity the training corpus and grounding layer already describe — slower and less glamorous than publishing. Three moves matter, in rough order of leverage.

Get into the entity backbone. The corpora that shape entity knowledge are the structured, cross-referenced ones — Wikipedia, Wikidata, and the knowledge graphs built from them. The “WildHallucinations” benchmark [Tier B*] (Zhao et al., 2024) found that across 7,919 real-world entities, the 52 percent with no Wikipedia page drew consistently more hallucination. Knowledge-graph-aligned pretraining (ERNIE, KEPLER, KELM) [Tier A] points the same way. The honest caveat: this is association evidence, not proof that adding a Wikidata entry flips a switch in an already-deployed model.
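A first, cheap check on the entity backbone is whether Wikidata already holds an item for you, and which namesakes compete for the same label. The sketch below uses Wikidata's public `wbsearchentities` API action; the helper names and the User-Agent string are placeholders (Wikimedia asks API clients to send a descriptive User-Agent with contact details). Only `fetch_candidates` touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_url(name: str, language: str = "en", limit: int = 10) -> str:
    """Build a wbsearchentities query for items whose label or alias matches `name`."""
    params = {"action": "wbsearchentities", "search": name, "language": language, "format": "json", "limit": limit}
    return f"{WIKIDATA_API}?{urlencode(params)}"


def candidates(payload: dict) -> list:
    """Reduce the API response to (item id, label, description) tuples."""
    return [(hit.get("id", ""), hit.get("label", ""), hit.get("description", "")) for hit in payload.get("search", [])]


def fetch_candidates(name: str, language: str = "en") -> list:
    # Placeholder User-Agent: replace with your own tool name and contact address.
    req = Request(search_url(name, language), headers={"User-Agent": "entity-backbone-check/0.1 (you@example.com)"})
    with urlopen(req, timeout=10) as resp:
        return candidates(json.load(resp))
```

An empty result means there is no item to anchor disambiguation; a long list of same-named people, companies, and places is the name-collision problem from the living-lab illustration, in machine-readable form.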

Earn frequency through independent description. Because recall is frequency-gated, the durable move is to be described independently and often: genuine coverage, academic or industry citation, third-party reference. The honest strategy and the effective strategy keep converging.

Use schema for what it can actually do — which is narrower than the industry claims. Two claims must not be confused. First, schema does not raise AI citations: a controlled difference-in-differences study by Ahrefs (May 2026) [Tier E, COI disclosed] tracked 1,885 pages adding JSON-LD against ~4,000 controls and found no meaningful lift, with independent crawler analysis (Vercel/MERJ, 2025) [Tier B] confirming most AI crawlers do not privilege hidden structured data at retrieval. Second, schema may help an entity become resolvable: sameAs links can help a knowledge graph disambiguate you (Pons et al., ISWC 2024) [Tier A] — but only when the data is actively retrieved and injected, not merely present. Markup is an eligibility signal for a separate pipeline, not a citation lever and not a runtime switch on a deployed model.
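For that narrow disambiguation role, the markup itself is small. A minimal sketch that emits a schema.org `Organization` block with `sameAs` links; every URL is a hypothetical placeholder, and per the studies above, shipping it should be expected to help entity resolution at best, not to raise citations.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list) -> str:
    """Return a JSON-LD <script> block declaring an Organization and its sameAs identities."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs lists profiles that identify the same entity elsewhere
        # (e.g. a Wikidata item), so a knowledge graph can link them.
        "sameAs": list(same_as),
    }
    # Escape "</" so no string value can close the <script> element early.
    body = json.dumps(data, indent=2, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Usage: `organization_jsonld("Example Co", "https://www.example.com/", ["https://www.wikidata.org/wiki/Q_EXAMPLE"])`. The block only helps if a grounding layer actually retrieves and uses it, which is the condition Pons et al. attach.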

A living-lab illustration

The entity-resolution problem is easiest to see in a name collision — and “Hürlimann” is a crowded one, mapping to a Swiss brewery, a tractor brand, and a former federal councillor long before anyone writing about AI citation mechanics. Cross-domain collisions like those mostly resolve by context. The one that bites is in-field: in our own living-lab checks, asking about “Hürlimann” in an AI context tended to surface a more academically visible researcher who shares the surname — not from a content gap, but because parametric recall tracks entity frequency (Kandpal; Mallen). That is the cleaner, more dangerous Gate 1 failure: a more-described entity in your own domain wins the slot, and no on-page writing changes which entity the model resolves. The fix is the slow one — independent description, entity-backbone presence, disambiguating markup a grounding layer can use — applied to GaryOwl.com as the series’ standing living lab.

📌 Measure with

SubQueryCov (Sub-Query Coverage) — the share of a query’s fan-out sub-queries for which your content holds a semantically matching passage — is the externally measurable proxy for Gate 1’s downstream reach, anchored in Soyeong Jeong and colleagues’ Adaptive-RAG [Tier A] (NAACL 2024) and Harsh Trivedi and colleagues’ IRCoT [Tier A] (ACL 2023). For the resolution layer itself, the practical diagnostic is an anchoring probe: ask the model to name the entities in your category with retrieval disabled, and observe whether you appear at all. If you do not surface in the parametric channel and you do not cover the fan-out, Gate 1 is closed — and content is not the remedy.
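Both diagnostics reduce to counts. A minimal sketch, assuming you already have embedding vectors for the fan-out sub-queries and for your page's passages (from any embedding model), plus a set of responses collected with retrieval disabled; the 0.75 similarity threshold and plain substring alias matching are illustrative assumptions to tune, not published constants. Each function returns (successes, trials) rather than a bare rate, so the counts can feed a small-sample correction.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sub_query_coverage(sub_query_vecs, passage_vecs, threshold: float = 0.75) -> Tuple[int, int]:
    """(covered, total): sub-queries with at least one passage at or above `threshold`."""
    covered = sum(1 for q in sub_query_vecs if any(cosine(q, p) >= threshold for p in passage_vecs))
    return covered, len(sub_query_vecs)


def anchoring_probe(responses: List[str], aliases: List[str]) -> Tuple[int, int]:
    """(hits, total): retrieval-disabled responses that name the entity under any alias.
    Substring matching is naive: a short alias can match inside unrelated words."""
    needles = [a.lower() for a in aliases]
    hits = sum(1 for r in responses if any(n in r.lower() for n in needles))
    return hits, len(responses)
```

Zero hits across repeated retrieval-disabled samples, together with low coverage, is the closed-Gate-1 signature described above.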

Gate 5: Overcoming Citation Bias

If Gate 1 is the gate where your entity does not exist, Gate 5 is the gate where it exists, gets cited — and the citation does not mean what you think it means. This is the subtlest failure in the pipeline, because from the outside it looks like success. Your URL appears in the answer’s sources. A tracking tool logs a citation. And yet nothing moves: the citation was decorative, attached to a conclusion the model had already reached, rather than a source it actually reasoned from. Gate 5 is where you learn that “being cited” and “being used” are different events.

The gate has two parts. The first is survival: does your content make it into the final answer as a citation at all, or get dropped during synthesis? The second is faithfulness: when it is cited, was it causally used — did the cited passage actually produce the claim — or was it rationalized after the fact? The second is the load-bearing one, and it is where reputation quietly overtakes prose.

Why “cited” does not mean “used”

The central evidence here is Jonas Wallat and colleagues’ “Correctness is not Faithfulness in Retrieval Augmented Generation Attributions” [Tier A] (ICTIR 2025). It separates two things the industry conflates: a citation can be correct (the document supports the claim) without being faithful (the document actually produced the claim). Their finding is that up to 57 percent of citations lack faithfulness: the model reaches an answer, then looks for a document to attach to it, a pattern the authors call post-rationalization. The figure needs its context: it comes from stress-test conditions on a single large model, not a survey of every system. The honest reading is not that most citations everywhere are unreliable, but that the pattern is common enough that a citation cannot be assumed to reflect causal use. A recent preprint, “Cited but Not Verified” (Onweller et al.), compiles hallucination rates of 11 to 57 percent across deployed models [Tier A*] (peer-review pending).

This sits on an older baseline: Nelson Liu, Tianyi Zhang and Percy Liang, in “Evaluating Verifiability in Generative Search Engines” [Tier A] (EMNLP 2023), found only about half of generated sentences fully supported by their citations — a 2023-era figure, but the direction has held across three years and multiple groups. And the pattern survives even with grounding switched on: verifying generated legal citations against a graph of millions of court decisions, Ovcharov (2026) [Tier B*] still found a meaningful share hallucinated across five commercial systems. A citation on screen is not evidence that the source was used.
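Liu et al. score verifiability with two rates: citation recall (the share of sentences fully supported by their citations) and citation precision (the share of individual citations that support their sentence). A minimal sketch over annotated sentences; the `AnnotatedSentence` record is a hypothetical format. Both rates measure correctness in Wallat's sense (does the source support the claim), not faithfulness (did the source produce it).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnnotatedSentence:
    """One generated sentence with its support judgements (hypothetical format)."""
    fully_supported: bool  # do its citations, taken together, support all of it? (False if uncited)
    citation_supports: List[bool] = field(default_factory=list)  # one flag per attached citation


def citation_recall(sentences: List[AnnotatedSentence]) -> float:
    """Share of sentences whose citations fully support them."""
    if not sentences:
        return 0.0
    return sum(s.fully_supported for s in sentences) / len(sentences)


def citation_precision(sentences: List[AnnotatedSentence]) -> float:
    """Share of individual citations that support the sentence they are attached to."""
    flags = [flag for s in sentences for flag in s.citation_supports]
    return sum(flags) / len(flags) if flags else 0.0
```

A system can score well on both and still post-rationalize, which is why the withdrawal test below exists as a separate check.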

📌 The Gate 5 trap

A dashboard that counts citations is measuring survival, not faithfulness. You can raise your citation count and change nothing about whether models actually reason from your content. The lever that works is not “be more citable” — it is being the source the model already trusts enough to reach for, which is a reputation property, not a page property.

Why repetition gets mistaken for authority

The reason reputation dominates this gate is structural: the systems amplify whatever is already prevalent. Andres Algaba and colleagues [Tier A] (NAACL 2025) showed that when LLMs suggest references they reproduce human citation patterns with a heightened bias toward already-highly-cited work — a Matthew effect that persists after controlling for year, venue, author count, and title length, reproduced in a 2025 preprint across 274,951 generated references (arXiv 2504.02767) [Tier A*]. Prevalence is read as authority, and the pipeline has no signal for whether that prevalence was earned or manufactured. Repetition is a substitute for, not a sign of, faithful grounding.

This differs from classic search, where repetition is legitimate: PageRank treats an inbound link as a vote, a structural computation hard to manufacture at scale. A language model has no such computation — it conflates corpus frequency with authority. AuthorityBench [Tier B*] separated the two constructs and found models perceive true authority only weakly, and that adding the page’s own text degrades their authority judgment.
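The “vote” computation is short enough to write out. This is textbook power-iteration PageRank with a damping factor of 0.85, where pages without outlinks spread their rank evenly; it is the classic formulation, not any search engine's production ranking. What it makes concrete: a page's score is computed from who links to it, so repeating a phrase across your own text moves nothing.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 100, tol: float = 1e-10) -> dict:
    """Power-iteration PageRank over {page: [pages it links to]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Pages with no outlinks ("dangling") spread their rank evenly over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)  # each outlink is one vote
                for t in targets:
                    new[t] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank
```

In a three-page graph where `a` and `b` both link to `c` and `c` links back to `a`, `c` ends up highest and `b`, which nobody links to, lowest.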

What actually opens Gate 5

Long term — build the reputation the parametric channel rewards. Because models reach for sources they already trust, and that trust is built from independent, repeated description, the durable Gate 5 move is the same earned-media work that opens Gate 1: be cited, covered, and referenced across sources you do not control.

📌 Honest limitation — the weakest claim in this article

“Reputation beats on-page content at the citation stage” is the most intuitive claim here and the least directly proven — a directional hypothesis, not a settled fact. It is supported indirectly by the popularity bias above, but the correlation is confounded, and some vendor data cuts the other way: Yext’s 2025 analysis reported 86 percent of AI citations came from brand-controlled sources. The evidence is genuinely mixed; what survives is the narrower claim that prevalence and independent description are read as authority — not a guarantee that reputation outranks every on-page lever. It earns its place because the alternative — out-publishing incumbents on a closed gate — has a documented record of not working.

Short term — make the retrieval-channel citation hard to drop and hard to fake. The one content-side lever that genuinely matters is to supply something the model cannot produce without you. A unique statistic, an original dataset, or a figure that exists nowhere else makes the citation causally necessary: a number the model can only get from your page cannot be bolted on after the fact. Generic restatements of common facts do the opposite; they are maximally substitutable and therefore maximally decorative.
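Whether a page offers such a figure can be checked mechanically. A minimal sketch, assuming you have the text of your page and of the competing sources that surface for the same query: it extracts numeric tokens with a deliberately naive regular expression (no unit normalization, so “41.7%” and “41.7 percent” do not match) and keeps the ones found nowhere else. If a later answer repeats one of them, that is evidence, not proof, that your page was causally used.

```python
import re

# Naive numeric token: digits with optional . or , groups and an optional percent sign.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")


def figures(text: str) -> set:
    return set(NUMBER.findall(text))


def unique_figures(your_page: str, other_sources: list) -> set:
    """Figures on your page that appear in none of the other sources."""
    seen_elsewhere = set().union(*(figures(s) for s in other_sources))
    return figures(your_page) - seen_elsewhere


def answer_uses_unique_figure(answer: str, unique: set) -> bool:
    """True if the answer repeats a figure that only your page supplies."""
    return bool(figures(answer) & unique)
```

An empty `unique_figures` result is the warning sign: everything the page says numerically can be sourced elsewhere, so any citation it earns is substitutable.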

📌 Measure with

Faithfulness, the share of your received citations genuinely grounded in your content rather than attached after the fact, is the Gate 5 dimension, anchored in Wallat (ICTIR 2025) and ReDeEP [Tier A] (ICLR 2025). The practical diagnostic is a controlled withdrawal test: take your page out of the retrieval context, re-ask, and see whether the claim it was cited for still appears in the answer. If it does, the citation was never causal. One caution: a reference-free faithfulness check can only score precision and therefore quietly rewards citing less, so read any single faithfulness number as one side of a two-sided quantity (Santillana, 2026) [Tier A*].
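The withdrawal test is a small harness around whatever model you can query. A minimal sketch, assuming a hypothetical `generate(query, docs)` wrapper that returns the answer text for a given retrieval context; token-level Jaccard similarity stands in for “same substance” and is deliberately crude, so an entailment model or human judgment should replace it for anything you publish. Because sampling varies, it repeats the comparison and returns counts.

```python
from typing import Callable, List, Sequence, Tuple


def token_jaccard(a: str, b: str) -> float:
    """Crude 'same substance' proxy: overlap of lower-cased word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def withdrawal_test(
    generate: Callable[[str, Sequence[str]], str],  # hypothetical model wrapper
    query: str,
    docs: List[str],
    your_doc: str,
    similarity: Callable[[str, str], float] = token_jaccard,
    threshold: float = 0.8,
    trials: int = 5,
) -> Tuple[int, int]:
    """(survived, trials): how often the answer kept its substance without your_doc.
    A high survival count means the citation to your_doc was likely decorative."""
    reduced = [d for d in docs if d != your_doc]
    survived = 0
    for _ in range(trials):
        with_doc = generate(query, docs)
        without_doc = generate(query, reduced)
        if similarity(with_doc, without_doc) >= threshold:
            survived += 1
    return survived, trials
```

Returning (survived, trials) rather than a verdict keeps the result honest at small sample sizes: 5 of 5 survivals is a strong hint of decoration, 2 of 5 is not yet evidence either way.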

Gates 2–4: The Openable Gates

The two gates above are hard because you cannot reach them with a content budget. The remaining three are the opposite: they respond directly to technical hygiene and corroboration, the levers a publisher already controls. They still close — often — but when they do, the fix is known, the tooling exists, and the right move is to do it and move on rather than to theorize. This section is deliberately compact for that reason: these gates do not need a new apparatus, they need a checklist.

Gate 2 — Retrievability. The question is whether your content appears in the candidate set at all: is it crawled, indexed, and reachable by the systems that fetch live? This is where the most common silent failure lives: a site that blocks AI crawlers in robots.txt, or renders its content only through client-side JavaScript that the crawlers never execute.

A subtler variant happens one layer earlier. Depending on bot-management configuration, a CDN or web-application firewall (Cloudflare, AWS WAF, Fastly, Sucuri) can challenge or 403 the AI user-agents at the edge, before the request ever reaches your server or your robots.txt (Cloudflare, “Block AI bots”) [Tier B]. Because Googlebot is usually allowed by default, Search Console still reports clean indexing, so the site looks healthy while assistants cannot read it at all. The diagnostic is a server-log check: look for GPTBot, OAI-SearchBot, PerplexityBot and ClaudeBot, and treat their outright absence, rather than a run of 200s, as the signature of an upstream block. Google-Extended does not belong on that list: it is a robots.txt control token, not a separate crawler, so it never appears in access logs and has to be checked in robots.txt instead. The retrieval layer cannot rank what it cannot fetch.

The remedies are unglamorous and well understood: permit the named AI user-agents, serve core content and structure in raw HTML rather than behind JavaScript, keep URLs stable, and confirm indexing in the channels that matter. The Agent-Readiness audit against The Website Specification (the open standard maintained by Joost de Valk) [Tier B] is the concrete G2/G3 checklist here: stable URLs, llms.txt, AI-crawler robots.txt policy, per-page machine-readable formats, HTTP Link headers. It doubles as living-lab dogfooding: a brand whose thesis is “engineer content to be citable” should be the most agent-readable site in its niche.
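The log check and a robots.txt policy check both fit in a few lines of standard-library Python. A minimal sketch, assuming access logs in the common combined format (user agent as the last quoted field) and the robots.txt text in hand; the agent list mirrors the crawler names above, and the function names are illustrative. The robots check uses `urllib.robotparser` and adds Google-Extended, because that token only ever exists in robots.txt. Neither check sees an edge block directly, which is why an agent missing from origin logs is the cue to inspect CDN or WAF settings.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# Crawler user-agent substrings that appear in HTTP requests.
LOG_AGENTS = ("GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot")
# Google-Extended is a robots.txt token only; it never shows up in access logs.
ROBOTS_TOKENS = LOG_AGENTS + ("Google-Extended",)

# Combined log format: ... "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(r'"[A-Z]+ [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"')


def tally_ai_agents(log_lines, agents=LOG_AGENTS):
    """Count HTTP status codes per AI crawler seen in the log."""
    counts = {agent: Counter() for agent in agents}
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        ua = match.group("ua").lower()
        for agent in agents:
            if agent.lower() in ua:
                counts[agent][match.group("status")] += 1
    return counts


def missing_agents(counts):
    """Agents with no requests at all: the pattern an upstream edge block leaves behind."""
    return [agent for agent, statuses in counts.items() if not statuses]


def robots_policy(robots_txt, url, tokens=ROBOTS_TOKENS):
    """Whether each token may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}
```

Run the tally over a week or more of logs before concluding anything: a crawler that simply has not visited yet looks identical to one blocked at the edge over a short window.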

Gate 3 — Extraction. The question is whether a retrieved page yields clean, well-structured, extractable claims. Article 5 (“Where Structure Actually Works”) covers this gate in full; the short version is that extractable structure (clear headings, fact-dense passages, claims that survive being lifted out of context) is what lets a candidate become a usable source. Note the boundary drawn in the Gate 1 section: this is about structure that aids extraction, not schema markup as a citation lever. The two are different mechanisms, and conflating them is the mistake that section warns against.
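One way to audit liftability is to split the raw HTML into heading-scoped passages and flag those that lean on earlier text. A minimal sketch using the standard-library `html.parser`; the list of dangling openers (“This”, “It”, “As noted”, and so on) is a hypothetical heuristic for passages that will not stand alone once lifted, not a published test. It reads raw HTML only, which is also what a non-rendering crawler sees.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3"}
# Hypothetical heuristic: openers that usually point back at earlier text.
DANGLING_OPENERS = ("this ", "that ", "these ", "those ", "it ", "they ", "as noted", "as mentioned", "see above")


class PassageSplitter(HTMLParser):
    """Collect <p> text, tagged with the most recent h1-h3 heading."""

    def __init__(self):
        super().__init__()
        self.passages = []  # list of (heading, paragraph text)
        self._heading = ""
        self._open = None   # tag currently being captured
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS or tag == "p":
            self._open, self._buf = tag, []

    def handle_data(self, data):
        if self._open:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._open:
            return  # inline tags such as <strong> do not end a capture
        text = " ".join("".join(self._buf).split())
        if tag == "p":
            self.passages.append((self._heading, text))
        else:
            self._heading = text
        self._open = None


def split_passages(html: str):
    parser = PassageSplitter()
    parser.feed(html)
    parser.close()
    return parser.passages


def dangling_passages(passages):
    """Passages whose first words suggest they depend on preceding context."""
    return [(h, t) for h, t in passages if t.lower().startswith(DANGLING_OPENERS)]
```

A flagged passage is a rewrite candidate: restate the subject so the claim still makes sense when it is the only text a model extracts.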

Gate 4 — Consensus. The question is whether your content agrees with what other sources say — whether it survives the reranking step that treats multi-source corroboration as a proxy for credibility. A claim that is credible but isolated can be filtered here; a claim corroborated across independent sources passes. The lever is genuine cross-source presence: being referenced, reviewed, and corroborated by parties other than yourself. This is the openable face of the same consensus machinery the source-trust companion examines critically — useful to you when the corroboration is real, dangerous to the ecosystem when it is manufactured.
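A first-pass corroboration count is the number of distinct hosts, other than your own, that carry the claim. A minimal sketch, assuming you have already collected the URLs that state it; the function names are illustrative. It strips only a leading “www.” and does not consult the Public Suffix List, so subdomains of one outlet count separately, and it cannot tell genuine independence from syndication or a network of sites under common control. Treat the result as an upper bound.

```python
from urllib.parse import urlsplit


def host(url: str) -> str:
    """Lower-cased hostname with a leading 'www.' removed (no Public Suffix List)."""
    name = (urlsplit(url).hostname or "").lower()
    return name[4:] if name.startswith("www.") else name


def independent_hosts(source_urls, own_domains):
    """Distinct hosts carrying the claim, excluding your own domains and their subdomains."""
    own = [d.lower() for d in own_domains]
    hosts = {host(u) for u in source_urls}
    return sorted(h for h in hosts if h and not any(h == d or h.endswith("." + d) for d in own))
```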

📌 The three modulators (how these gates flex)

Three cross-cutting dials modify how Gates 2–4 behave, and they are best understood as adjustments to these gates rather than as gates of their own. Temporal freshness raises retrievability and consensus weight for time-sensitive queries — recency is read as relevance, so stale content is quietly demoted at G2/G4. Extractability is the continuous dial behind Gate 3: the same content extracted more cleanly clears the gate more reliably. Consensus pressure is the dial behind Gate 4: how heavily a given query rewards agreement with the mainstream versus tolerating a well-supported outlier. None of the three is a separate strategy; each is a reason the same gate opens more or less easily depending on the query and the moment.

What unites these three gates is that they are diagnosable and fixable from the publisher side, which is exactly why they are also the wrong place to look for durable advantage. Everyone can run the same audit and ship the same fixes; hygiene is table stakes, not a moat. The disciplined posture is to close these gates efficiently with existing instruments and reserve real strategic investment for the two gates — entity identity and citation faithfulness — that competitors cannot copy their way past.

📌 Measure with

Two Generative Citation Score dimensions follow your content through these gates: RetrievalToCit (of the times your content was retrieved for relevant queries, how often it was actually cited) and RefSurvival (whether a reference that entered the context survived into the final answer, which also covers the survival half of Gate 5). Both are anchored in the citation-generation literature, ALCE (Gao et al., EMNLP 2023) [Tier A] and Self-RAG (Asai et al., ICLR 2024) [Tier A], but both require retrieval traces or pipeline observability to measure directly, which most publishers do not have. The externally measurable substitutes are the practical ones: confirm crawler access and raw-HTML rendering for G2, compare your extractability against cited competitors for G3, and check cross-source corroboration for G4. Existing low-cost tools (Search Console, Bing Webmaster Tools, a basic retrieval check) cover most of this without a custom pipeline.
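Where traces do exist (your own RAG deployment, a vendor export, or a test harness), both rates are simple ratios over the same records with different denominators. A minimal sketch over a hypothetical trace format; each function returns (successes, trials) so the counts can go straight into the small-sample correction used by the measurement layer.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class TraceRecord:
    """One query's trace for one of your URLs (hypothetical format)."""
    retrieved: bool   # appeared in the candidate set
    in_context: bool  # passed to the generator after reranking
    cited: bool       # appeared as a citation in the final answer


def retrieval_to_cit(traces: Iterable[TraceRecord]) -> Tuple[int, int]:
    """Of the traces where you were retrieved, how many ended in a citation."""
    pool = [t for t in traces if t.retrieved]
    return sum(t.cited for t in pool), len(pool)


def ref_survival(traces: Iterable[TraceRecord]) -> Tuple[int, int]:
    """Of the traces where you entered the context, how many survived into the answer."""
    pool = [t for t in traces if t.in_context]
    return sum(t.cited for t in pool), len(pool)
```

The gap between the two denominators is itself diagnostic: many retrievals but few context entries points at reranking (Gate 4), while many context entries but few citations points at synthesis (Gate 5 survival).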

The Measurement Layer: A Generative Citation Score

Every gate above closes with the same instruction — Measure with — and those instructions are not loose. They are the six dimensions of a single diagnostic instrument, the Generative Citation Score (GCS). The point of naming it is to replace “we lack AI visibility” with a sentence that can be acted on: which gate is closed, by how much, and is it moving? The score does not tell you to write more; it tells you where to spend.

The construction is deliberately conservative. For each dimension, you observe a number of trials and count successes — sub-queries your content covers, retrievals that converted to citations, citations that proved faithful — giving a success rate. Because the number of trials is usually small, the raw rate is misleading, so each dimension reports a Wilson score interval’s lower bound rather than the bare fraction: a one-line, century-old correction (Wilson, 1927) that keeps a “3 out of 4” from being mistaken for settled evidence. The small-sample discipline matters, and the case against pretending otherwise is made directly in recent work arguing against naive large-sample assumptions in LLM evaluation with few datapoints (Bowyer, Aitchison & Ivanova, ICML 2025) [Tier A]. Higher is better: a higher GCS means more citable.
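The correction really is short. A minimal sketch of the Wilson score interval's lower bound at z = 1.96 (about 95 percent two-sided); returning 0.0 for zero trials is a convention chosen here, not part of Wilson's formula.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a binomial proportion."""
    if trials == 0:
        return 0.0  # convention: no trials, no credit
    p = successes / trials
    z2 = z * z
    centre = p + z2 / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials))
    return (centre - margin) / (1 + z2 / trials)
```

With this function, 3 of 4 and 30 of 40 share a raw rate of 0.75, but their lower bounds come out near 0.30 and 0.60 respectively: the same fraction, very different evidence.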

Dimension	What it measures	Anchor
SubQueryCov	Share of a query’s fan-out sub-queries your content semantically covers (Gate 1 reach)	Jeong et al., Adaptive-RAG (NAACL 2024); Trivedi et al., IRCoT (ACL 2023) [A]
RetrievalToCit	Of retrievals where you appeared in the candidate set, the share that became citations (Gate 2→4)	Gao et al., ALCE (EMNLP 2023) [A]
RefSurvival	Whether a reference that entered the context survived into the final answer (Gate 5 survival)	Asai et al., Self-RAG (ICLR 2024) [A]
Faithfulness	Of citations you receive, the share causally grounded in your content (Gate 5 faithfulness)	Wallat et al. (ICTIR 2025); Sun et al., ReDeEP (ICLR 2025) [A]
ToolInclusion	Whether your tool/endpoint is discovered and its result used in agentic systems (Gate 2, tool substrate)	Schick et al., Toolformer (NeurIPS 2023) [A]
BridgeCentrality	Your node’s bridge position in the topic’s co-citation graph (Gate 4 reranking / Matthew effect)	Algaba et al. (NAACL 2025) [A]
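SubQueryCov only fits the successes-over-trials form once you fix a rule for when a sub-query counts as covered. The sketch below uses plain token overlap as a stand-in for the semantic similarity the table's definition calls for. The tokenizer, the 0.5 threshold, and the function names are all assumptions. A real run would swap in an embedding model and a threshold validated on labelled query/passage pairs.

```python
import re


def _tokens(text: str) -> set:
    """Lower-cased word tokens; a deliberately crude stand-in for semantic matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def covers(sub_query: str, passage: str, threshold: float = 0.5) -> bool:
    """A sub-query counts as covered if enough of its tokens appear in one passage."""
    query_tokens = _tokens(sub_query)
    if not query_tokens:
        return False
    return len(query_tokens & _tokens(passage)) / len(query_tokens) >= threshold


def sub_query_coverage(sub_queries, passages, threshold: float = 0.5):
    """Return (covered, total) for the fan-out sub-queries of one query."""
    covered = sum(
        1 for sq in sub_queries if any(covers(sq, p, threshold) for p in passages)
    )
    return covered, len(sub_queries)
```

The `(covered, total)` pair is what goes into the Wilson lower bound. Keeping the matching rule separate from the bound makes it easy to replace the crude matcher without touching the statistics.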

Two things about this instrument have to be stated plainly, because the temptation to overclaim a metric is exactly the failure this series exists to resist.

📌 The weights are deliberately uncalibrated

A GCS aggregates its six dimensions with weights — and those weights are not yet set. Calibrating them honestly requires a labelled record of real outcomes: content scored on the six dimensions, paired with whether it was actually cited by AI systems for relevant queries. That record does not yet exist at the scale calibration needs, so any specific weight vector today would be a guess dressed as a result. The weights stay open, the calibration is named as future work, and any prior anyone proposes — including a faithfulness-dominant one — is a hypothesis to test, not a fact to publish. (This is why this article contains no example score: a fabricated number would contradict the whole point of the instrument.)
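In code, keeping the weights open can be as simple as refusing to aggregate until someone supplies weights explicitly. The same report can state which dimensions were actually measured. This is a minimal sketch. It assumes each dimension arrives either as a Wilson-bounded value in [0, 1] or as None when it was not measured. The function and field names are hypothetical, and no default weights are provided on purpose.

```python
from typing import Dict, Mapping, Optional

DIMENSIONS = (
    "SubQueryCov", "RetrievalToCit", "RefSurvival",
    "Faithfulness", "ToolInclusion", "BridgeCentrality",
)


def gcs_report(
    scores: Mapping[str, Optional[float]],
    weights: Optional[Mapping[str, float]] = None,
) -> Dict[str, object]:
    """Report measured vs. unmeasured dimensions; aggregate only with explicit weights."""
    measured = {d: scores[d] for d in DIMENSIONS if scores.get(d) is not None}
    report: Dict[str, object] = {
        "measured": sorted(measured),
        "not_measured": [d for d in DIMENSIONS if d not in measured],
        "aggregate": None,  # stays None: no weight vector is assumed
    }
    if weights is None:
        return report
    used = {d: w for d, w in weights.items() if d in measured and w > 0}
    if not used:
        raise ValueError("weights must cover at least one measured dimension")
    total = sum(used.values())
    # Renormalize over the measured dimensions only, and say so in the report.
    report["aggregate"] = sum(measured[d] * w for d, w in used.items()) / total
    report["weighted_over"] = sorted(used)
    return report
```

Renormalizing over measured dimensions is one design choice among several. Whichever one you pick, the report carries its own `measured` and `weighted_over` lists, so a partial score cannot pass for full pipeline visibility.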

📌 Two honest methodological seams

Not all six are externally measurable. Three dimensions — RetrievalToCit, RefSurvival, and ToolInclusion — require retrieval traces or agent/tool access that a publisher observing from the outside does not have. The externally measurable subset is SubQueryCov, Faithfulness (on observed citations), and BridgeCentrality (from public co-citation data). An honest GCS reports which dimensions it actually measured rather than implying full pipeline visibility. One partial, recent exception is worth naming precisely, because it narrows this gap without closing it. Since February 2026, the AI Performance report in Bing Webmaster Tools [Tier B] exposes first-party citation counts and the internal grounding queries Copilot reformulates to retrieve sources — a genuine window onto the citation side of RetrievalToCit and onto the fan-out behind SubQueryCov. But it is bounded in ways that matter: it covers only the Microsoft surface (Copilot, Bing AI summaries, and undisclosed partner integrations), its grounding queries are sampled rather than exhaustive, its surfaces are aggregated so Copilot cannot be separated from Bing summaries, it counts citation frequency rather than faithfulness, it reports citations without the retrieved-but-not-cited denominator, and it shows nothing for ChatGPT, Perplexity, or Gemini. It informs these dimensions for one channel; it does not measure them.

BridgeCentrality does not fit the success-rate model. Betweenness centrality is a continuous graph quantity, not a count of successes over trials, so it does not slot into the Wilson construction the way the others do. The clean fix is to declare it explicitly: either treat it as a Freeman-normalized value on a 0–1 scale outside the Wilson framework, or define an explicit threshold (success = node above a centrality cutoff) before applying the bound. Either is defensible; pretending the seam is not there is not.
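Both options named above can be made explicit in a few lines. The sketch below computes betweenness with Brandes' algorithm on an unweighted, undirected co-citation graph. That is an assumption, since real co-citation graphs may be weighted. It then applies Freeman's normalization so values fall on a 0–1 scale, and shows the threshold variant. The edge lists, the node name, the 0.2 cutoff, and the choice of one trial per topic graph are hypothetical. The cutoff has to be declared before any Wilson bound is applied.

```python
from collections import deque


def freeman_betweenness(edges):
    """Freeman-normalized betweenness (0-1) for an unweighted, undirected graph.

    Uses Brandes' algorithm. Each source-target pair is visited from both ends, so
    dividing by (n - 1)(n - 2) matches Freeman's undirected maximum of (n - 1)(n - 2) / 2.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    nodes = list(adj)
    raw = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        order, preds = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search counts shortest paths from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):  # accumulate pair dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                raw[w] += delta[w]
    n = len(nodes)
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: raw[v] * scale for v in nodes}


def bridge_successes(topic_graphs, node, cutoff=0.2):
    """Threshold variant: one trial per topic graph, success if node exceeds the cutoff."""
    successes = sum(
        1 for edges in topic_graphs if freeman_betweenness(edges).get(node, 0.0) > cutoff
    )
    return successes, len(topic_graphs)
```

`freeman_betweenness` corresponds to the first option, a continuous value kept outside the Wilson framework. `bridge_successes` corresponds to the second, and its `(successes, trials)` pair can be passed to a Wilson lower bound like any other dimension.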

There is also a limit the score shares with the whole pipeline, carried over from the source-trust companion and worth repeating where the metric is defined: a GCS measures whether you are cited, not whether the citation was earned. It cannot, today, certify that a high score reflects genuine authority rather than manufactured prevalence. That is not a flaw to hide; it is the boundary of what the instrument honestly claims, and stating it is what separates a diagnostic from a vanity metric.

Honest Limitations

This article makes operational claims about a system that is moving quickly and only partly observable from the outside. Several of those claims rest on evidence that is strong but bounded, and naming the bounds is the difference between a diagnostic and a sales pitch.

The “slot” is a metaphor, and the mechanistic evidence is sub-frontier. The claim that an entity either is or is not parametrically resolvable is grounded in long-tail recall results that are robust. The picture of how it is stored — the interpretability work behind the “slot” — is suggestive rather than settled, and almost all of it was run on models in the 1.5B–13B range. Whether it generalizes cleanly to frontier-scale models is plausible but unproven; treat the storage story as illustration, not proof.

“Reputation beats content” at Gate 5 is a directional hypothesis. As flagged where it appears, this is the least directly proven claim in the article. It is consistent with the popularity-bias evidence and with the documented failure of out-publishing incumbents, but there is no head-to-head peer-reviewed result establishing it, and the supporting correlation is confounded. It earns its place as a well-motivated bet, not a fact.

The schema findings are one controlled vendor study. The result that schema does not raise AI citations comes from a single difference-in-differences experiment (Ahrefs, May 2026) [Tier E] with a disclosed conflict of interest and a short observation window. It is the best controlled evidence currently available and it is corroborated in mechanism by independent crawler analysis, but it has not been independently replicated, and a critique that the window is too short to detect a slow knowledge-graph effect is itself unresolved. Read it as directional.

The faithfulness figures are condition-specific. The 57% post-rationalization rate is from adversarial conditions on a single 104B model; the 51.5% support rate is from 2023-era systems. Both establish that the citation-to-claim link is weaker than it looks. Neither should be quoted as a universal, current rate.

The pipeline mechanics are dated to mid-2026 and will drift. Specific facts this article leans on — that major AI crawlers fetch HTML but do not execute JavaScript, that ChatGPT Search draws heavily on the Bing index, that AI Mode fans queries out — are accurate as of writing and are exactly the kind of detail that changes without notice. Re-verify them before acting on a months-old reading.

The measurement layer is defined, not yet calibrated. The GCS is a construct with named dimensions and an honest accounting of which are externally measurable. Its weights are uncalibrated, its BridgeCentrality dimension has an unresolved normalization seam, and it certifies citation, not earned authority. It is offered as a diagnostic frame to build on, not a finished score.

What Comes Next

The through-line of this article is a single discipline: match the lever to the gate. Spend hygiene effort on the hygiene gates and stop there; reserve the patient, compounding work — independent description, entity-backbone presence, original data others must cite — for the two gates that competitors cannot copy their way past. The reason that division holds is the reason it is defensible: the openable gates are open to everyone, so they confer no advantage, while the earned gates are slow precisely because they cannot be manufactured.

Two threads continue from here. The first is calibration. The GCS becomes more than a frame the moment there is a labelled record pairing content features with real citation outcomes — the by-product of measuring, engagement by engagement, whether a brand was actually cited. That record is the asset; it is also what would let the uncalibrated weights finally be set, and it is being accumulated rather than assumed. The second is the platform dimension: the same content fares differently across ChatGPT, Perplexity, Google AI Mode, Gemini, and Claude, and that asymmetry deserves its own treatment rather than a guess from thin data here.

A third thread is governance, and it is no longer hypothetical. As the optimization market matures, the open question is who can audit it. Yizhu Wen and colleagues, in “Position: Generative Engine Optimization Creates Underexamined Risks, Governance Must Target Concentration, Disclosure, and Academic Blind Spots” [Tier B*] (peer-review pending), formalize the GEO pipeline and locate three structural risks in it — concentrated influence where contestability is low, undisclosed commercial influence embedded in the evidence and framing of answers, and blind spots created by the visibility gap between offline academic setups and deployed systems. The earned-versus-optimized distinction this article draws is, read at the system level, exactly the distinction those governance questions turn on.

📌 Where the line sits — what is shared and what is not

Everything diagnostic in this article is open by design: the five gates, the openable-versus-earned distinction, and the definition of the Generative Citation Score with its six dimensions and its honest caveats. What is not published is the data-tuned scoring recipe — the calibrated weights that only a real citation-outcome dataset can produce. That boundary is deliberate and it follows from the defensibility logic of the whole project: a method protected by secrecy is weakly protected against anyone with more reach, whereas an advantage that comes from accumulated data, independence, and track record stays valuable even in full view. The framework is open for use with attribution; the calibration earned from real outcomes is the part that compounds.

The honest close, then, is the same one the series keeps arriving at from every direction. You cannot optimize your way into an identity the model never learned, and you cannot fake your way into a citation the model actually reasons from. Both are earned. The work of authority is to become the kind of source that opens those gates on the merits — and to measure, plainly, whether it is working.

Frequently Asked Questions

My brand is never mentioned by AI. Will more content fix it?
Probably not, if the problem is Gate 1. If the model has no parametric representation of your entity, a better or longer page does not change the training-data frequency that determines recall. The fix is to become an entity the training corpus and grounding layer already describe — independent coverage, knowledge-graph presence, disambiguating markup a retrieval system can actually use — which is slower than publishing and works for a different reason.

I added schema markup. Why didn’t my citations go up?
Because schema is not a citation lever. The best controlled evidence available (Ahrefs, May 2026) found no meaningful lift in AI citations from adding JSON-LD, and independent crawler analysis shows most AI crawlers do not privilege hidden structured data at retrieval time. Schema’s legitimate role is helping a knowledge graph disambiguate your entity — a separate pipeline, and only when the data is actively retrieved and injected, not merely present.

I am cited but it doesn’t seem to help. What’s going on?
You may be hitting Gate 5. A citation can be attached to a conclusion the model already reached rather than one it reasoned from, which is post-rationalization. The test is adversarial: remove or swap your URL out of the retrieval context and check whether the substance of the answer survives without it. If it does, the citation was never causal. The remedy is to supply something the model cannot produce without you, such as a unique statistic or original data, so the citation becomes necessary.

Which gate should I fix first?
Diagnose before spending. Rule out the cheap hygiene gates (2–4) with existing tools — crawler access, raw-HTML rendering, indexing, cross-source corroboration. If those are clean and you still do not surface, the closed gate is Gate 1 or Gate 5, and the remedy is earned rather than optimized. Spending content budget on the wrong gate is the usual way these projects fail.

What is a Generative Citation Score and can I compute one?
It is a six-dimension diagnostic of how citable your content is. Five dimensions are success rates reported as Wilson lower bounds. Bridge centrality is a continuous graph measure that needs its own declared normalization or threshold. You can measure three of the six from the outside (sub-query coverage, faithfulness on observed citations, and bridge centrality from public co-citation data); the other three need pipeline access most publishers lack. The aggregate weights are deliberately uncalibrated until real citation-outcome data exists, so treat it as a diagnostic frame, not a finished number.

What is the difference between entity authority and domain authority for AI citations?
Domain authority is a site-level link metric; entity authority is whether the model recognizes the thing itself. For AI citation they behave differently: a citation begins at Gate 1, whether the model has a parametric representation of your entity at all, which is set by how often the training corpus and grounding layer describe you — not by your domain’s link profile. A low-authority domain that is a clearly described, consistently named entity can be recalled where a high-authority domain that the model has never resolved cannot.

How can you test whether an LLM actually used a cited source?
Run an adversarial removal test. Get the model to cite the source, then re-ask with that URL removed or swapped out of the retrieval context and check whether the substance of the answer changes. If the answer survives unchanged, the citation was post-rationalized rather than causal — attached to a conclusion already reached. This is the Gate 5 faithfulness check, and under adversarial conditions a large share of citations do not survive it.
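The removal test can be scripted wherever you control the retrieval context, such as your own RAG setup or a test harness. It cannot be scripted against a consumer assistant whose retrieval you cannot edit. This is a minimal sketch that assumes a hypothetical `answer_fn(question, documents)` wrapper around whatever pipeline you can drive; it is not the API of any specific product. Answer similarity is measured with a crude character-level ratio from Python's difflib, standing in for a proper claim-level comparison, and the 0.9 “unchanged” threshold is an assumption.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: (question, documents in the retrieval context) -> answer text.
AnswerFn = Callable[[str, Sequence[str]], str]


def answer_unchanged(a: str, b: str, threshold: float = 0.9) -> bool:
    """Crude proxy: treat answers as unchanged if their character-level similarity is high."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def removal_test(answer_fn: AnswerFn, question: str, documents: List[str],
                 cited_index: int, swap_in: str = "") -> bool:
    """Return True if the citation looks causal, i.e. the answer changes without it.

    The cited document is removed, or replaced by `swap_in` when one is given.
    """
    baseline = answer_fn(question, documents)
    altered = documents[:cited_index] + documents[cited_index + 1:]
    if swap_in:
        altered.insert(cited_index, swap_in)
    return not answer_unchanged(baseline, answer_fn(question, altered))


def faithfulness_counts(trials: List[Tuple[str, List[str], int]],
                        answer_fn: AnswerFn) -> Tuple[int, int]:
    """(causal, total) over observed citations, ready for a Wilson lower bound."""
    causal = sum(1 for q, docs, idx in trials if removal_test(answer_fn, q, docs, idx))
    return causal, len(trials)
```

Feeding the `(causal, total)` counts into a Wilson lower bound gives the Faithfulness dimension for the citations you were able to test.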

Sources & Methodology

This article rests on external, published evidence, tiered by strength. Peer-reviewed academic work governs; vendor studies are flagged for conflict of interest and treated as directional. The framework, the gate-mapping, and the Generative Citation Score are the original DAE contribution; every empirical claim is attributed.

[Tier A] Kandpal, N., Deng, H., Roberts, A., Wallace, E., & Raffel, C. (2023). “Large Language Models Struggle to Learn Long-Tail Knowledge.” ICML 2023 (PMLR 202). arxiv.org/abs/2211.08411 (Accessed: June 21, 2026)

[Tier A] Mallen, A., Asai, A., Zhong, V., Das, R., Khashabi, D., & Hajishirzi, H. (2023). “When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories.” ACL 2023. arxiv.org/abs/2212.10511 (Accessed: June 21, 2026)

[Tier A] Sciavolino, C., Zhong, Z., Lee, J., & Chen, D. (2021). “Simple Entity-Centric Questions Challenge Dense Retrievers.” EMNLP 2021. arxiv.org/abs/2109.08535 (Accessed: June 21, 2026)

[Tier A] De Cao, N., Izacard, G., Riedel, S., & Petroni, F. (2021). “Autoregressive Entity Retrieval” (GENRE). ICLR 2021 (Spotlight). arxiv.org/abs/2010.00904 (Accessed: June 21, 2026)

[Tier A] Meng, K., Bau, D., Andonian, A., & Belinkov, Y. (2022). “Locating and Editing Factual Associations in GPT” (ROME). NeurIPS 2022. arxiv.org/abs/2202.05262 (Accessed: June 21, 2026)

[Tier A] Geva, M., Schuster, R., Berant, J., & Levy, O. (2021). “Transformer Feed-Forward Layers Are Key-Value Memories.” EMNLP 2021. arxiv.org/abs/2012.14913 (Accessed: June 21, 2026)

[Tier A] Hernandez, E., Sharma, A. S., Haklay, T., et al. (2024). “Linearity of Relation Decoding in Transformer Language Models.” ICLR 2024. arxiv.org/abs/2308.09124 (Accessed: June 21, 2026)

[Tier A] Pons, M., Bilalli, B., & Queralt, A. (2024). “Knowledge Graphs for Enhancing Large Language Models in Entity Disambiguation.” ISWC 2024 (LNCS 15231). arxiv.org/abs/2505.02737 (Accessed: June 21, 2026)

[Tier A] Wallat, J., Heuss, M., de Rijke, M., & Anand, A. (2025). “Correctness is not Faithfulness in Retrieval Augmented Generation Attributions.” ICTIR 2025. DOI 10.1145/3731120.3744592; arxiv.org/abs/2412.18004 (Accessed: June 21, 2026)

[Tier A] Liu, N. F., Zhang, T., & Liang, P. (2023). “Evaluating Verifiability in Generative Search Engines.” Findings of EMNLP 2023. arxiv.org/abs/2304.09848 (Accessed: June 21, 2026) — figures reflect 2023-era systems; treat as a historical baseline.

[Tier A] Algaba, A., Mazijn, C., Holst, V., Tori, F., Wenmackers, S., & Ginis, V. (2025). “Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias.” Findings of NAACL 2025. arxiv.org/abs/2405.15739 (Accessed: June 21, 2026)

[Tier A] Gao, T., Yen, H., Yu, J., & Chen, D. (2023). “Enabling Large Language Models to Generate Text with Citations” (ALCE). EMNLP 2023. arxiv.org/abs/2305.14627 (Accessed: June 21, 2026)

[Tier A] Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2024). “Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection.” ICLR 2024. arxiv.org/abs/2310.11511 (Accessed: June 21, 2026)

[Tier A] Sun, Z., et al. (2025). “ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability.” ICLR 2025. arxiv.org/abs/2410.11414 (Accessed: June 21, 2026)

[Tier A] Schick, T., Dwivedi-Yu, J., Dessì, R., et al. (2023). “Toolformer: Language Models Can Teach Themselves to Use Tools.” NeurIPS 2023. arxiv.org/abs/2302.04761 (Accessed: June 21, 2026)

[Tier A] Jeong, S., Baek, J., Cho, S., Hwang, S. J., & Park, J. C. (2024). “Adaptive-RAG: Learning to Adapt Retrieval-Augmented LLMs through Question Complexity.” NAACL 2024. arxiv.org/abs/2403.14403 (Accessed: June 21, 2026)

[Tier A] Trivedi, H., Balasubramanian, N., Khot, T., & Sabharwal, A. (2023). “Interleaving Retrieval with Chain-of-Thought Reasoning” (IRCoT). ACL 2023. arxiv.org/abs/2212.10509 (Accessed: June 21, 2026)

[Tier A] Bowyer, S., Aitchison, L., & Ivanova, D. R. (2025). “Position: Don’t Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints.” ICML 2025 (Spotlight). arxiv.org/abs/2503.01747 (Accessed: June 21, 2026)

[Tier B*] Zhao, W., et al. (2024). “WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries.” arXiv preprint (under review; treated as directional). arxiv.org/abs/2407.17468 (Accessed: June 21, 2026)

[Tier B] KG-aligned pretraining line — ERNIE (Zhang et al., 2019, arxiv.org/abs/1905.07129); KEPLER (Wang et al., 2021); KELM (Agarwal et al., 2021). Cited collectively for the claim that KG/Wikipedia-aligned entity information in pretraining improves factual recall and disambiguation. (Accessed: June 21, 2026)

[Tier B] Vercel / MERJ (2025). “The Rise of the AI Crawler” — analysis of 500M+ crawler fetches finding AI crawlers fetch HTML but do not execute JavaScript. vercel.com/blog/the-rise-of-the-ai-crawler (Accessed: June 21, 2026)

[Tier B] Microsoft / Bing Webmaster Tools (2026). “Introducing AI Performance in Bing Webmaster Tools (Public Preview)” — first-party citation counts and reformulated grounding queries across Microsoft Copilot, Bing AI summaries, and undisclosed partner integrations; grounding queries sampled, surfaces aggregated, frequency not faithfulness, excludes ChatGPT/Perplexity/Gemini. blogs.bing.com/webmaster/February-2026/Introducing-AI-Performance-in-Bing-Webmaster-Tools-Public-Preview (Accessed: June 21, 2026)

[Tier D] The Website Specification — Agent Readiness category (Joost de Valk, CC BY 4.0). specification.website/spec/agent-readiness/ (Accessed: June 21, 2026)

[Tier E] Linehan, L., & Guan, X. / Ahrefs (2026). “We Tracked 1,885 Pages Adding Schema. AI Citations Barely Moved.” ahrefs.com/blog/schema-ai-citations (Accessed: June 21, 2026)

[Tier E] Yext (2025). “86% of AI Citations Come from Brand-Managed Sources” — analysis of 6.8M AI citations across ChatGPT, Gemini, and Perplexity. yext.com/research/article/ai-citations-user-locations-query-context (Accessed: June 21, 2026)

[Tier A*] Onweller et al. (2026). “Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents.” arXiv preprint. arxiv.org/abs/2605.06635 (Accessed: June 21, 2026)

[Tier A*] “Source Coverage and Citation Bias in LLM-based vs. Traditional Search Engines” (2025). arXiv preprint (HKUST-Guangzhou & Rutgers). arxiv.org/abs/2512.09483 (Accessed: June 21, 2026)

[Tier A*] “Characterizing Web Search in the Age of Generative AI” (2025). arXiv preprint (Ruhr University Bochum / MPI-SWS). arxiv.org/abs/2510.11560 (Accessed: June 21, 2026)

[Tier B, conflict of interest disclosed] FACTS Benchmark Suite / Google DeepMind (2025). Leaderboard and technical report. arxiv.org/abs/2512.10791 (Accessed: June 21, 2026)

[Tier A*] “How Knowledge Popularity Influences and Enhances LLM Knowledge Boundary Perception” (2025). arXiv preprint. arxiv.org/abs/2505.17537 (Accessed: June 21, 2026)

[Tier A*] “How Deep Do LLMs Internalize Scientific Literature and Citation Practices?” (2025). arXiv preprint. arxiv.org/abs/2504.02767 (Accessed: June 21, 2026)

[Tier A] Zhang, Y., Zhang, S., Zhao, J., & Zhao, C. (2025). “RPDR: A Round-trip Prediction-Based Data Augmentation Framework for Long-Tail Question Answering.” EMNLP 2025 (ACL Anthology 2025.emnlp-main.1119). arxiv.org/abs/2602.17366 (Accessed: June 21, 2026)

[Tier A*] “Knowledge Localization in Mixture-of-Experts LLMs” (2026). arXiv preprint. arxiv.org/abs/2603.17102 (Accessed: June 21, 2026)

[Tier A*] “Tracing Relational Knowledge Recall in LLMs” (2026). arXiv preprint. arxiv.org/abs/2604.19934 (Accessed: June 21, 2026)

[Tier B] Cloudflare (2026). “Block AI bots” — bot-management documentation: a managed rule can challenge or block AI-crawler user-agents (GPTBot, ClaudeBot, and others, including verified bots) at the edge, above robots.txt. developers.cloudflare.com/bots/additional-configurations/block-ai-bots — primary platform documentation for the CDN/WAF edge-block mechanism. (Accessed: June 21, 2026)

[Tier A*] Ren, W., Song, X., Wang, F., He, G., & Sun, A. (2026). “Revisiting Parameter-Based Knowledge Editing in Large Language Models: Theoretical Limits and Empirical Evidence” — localized parameter edits propagate along fragile directions (dimensional Collapse Hypothesis) and consistently damage core LLM capabilities; reinforces that the parametric store is not surgically rewritable. arxiv.org/abs/2606.00570 (peer-review pending) (Accessed: June 21, 2026)

[Tier B*] Yao, Z., Zhang, H., & Bi, K. (2026). “AuthorityBench: Benchmarking LLM Authority Perception for Reliable Retrieval-Augmented Generation” — separates PageRank-based source authority (DomainAuth, 10,000 domains) from popularity-based entity authority (EntityAuth, 22,000 entities); adding webpage text consistently degrades authority judgment, indicating authority is distinct from textual style. arxiv.org/abs/2603.25092 (peer-review pending) (Accessed: June 21, 2026)

[Tier B*] Ovcharov, V. (2026). “Citation Grounding: Detecting and Reducing LLM Citation Hallucinations via Legal Citation Graphs” — verifies generated citations against a graph of 100.8 million Ukrainian court decisions; 13–21% of citations hallucinated across five commercial systems even with grounding enabled. arxiv.org/abs/2606.00898 (peer-review pending) (Accessed: June 21, 2026)

[Tier A*] Santillana, J. S. (2026). “Precision Is Not Faithfulness: Coverage-Aware Evaluation of Grounded Generation with a Complete Oracle” — reference-free faithfulness metrics measure only precision and therefore reward abstention; adds a coverage/recall dimension via a complete oracle. arxiv.org/abs/2606.09376 (peer-review pending) (Accessed: June 21, 2026)

[Tier B*] Wen, Y., et al. (2026). “Position: Generative Engine Optimization Creates Underexamined Risks, Governance Must Target Concentration, Disclosure, and Academic Blind Spots” — formalizes the GEO pipeline and flags concentrated influence, undisclosed commercial influence, and academic-industry blind spots. arxiv.org/abs/2606.12439 (peer-review pending) (Accessed: June 21, 2026)

[Tier A*] Shi, K., Sun, W., Zhang, Z., Sun, L., Chawla, N. V., & Ye, Y. (2026). “CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era.” — benchmarks citation hallucination across commercially deployed models via a multi-agent verification pipeline; the source compiled by Onweller et al. for the 11–57% range. arXiv:2602.23452 (v3, 1 May 2026; v1, 26 Feb 2026 listed Yuan et al. as first author). arxiv.org/abs/2602.23452 (peer-review pending) (Accessed: June 21, 2026)

Tier letters per the Article 1 standard: [Tier A] peer-reviewed academic research · [Tier B] primary platform statement or large-scale/credible study pending or outside peer review · [Tier C] independent meta-analysis aggregating ≥10 external sources · [Tier D] reputable journalism or industry study with documented methodology, not vendor-self-published · [Tier E] vendor study (self-published, COI disclosed inline). A starred marker — [Tier A*] or [Tier B*] — denotes work of that tier’s methodological quality whose peer review is still pending; the star is dropped on acceptance or independent replication (Living-Tier). Single-experiment figures are treated as directional. Mechanistic-interpretability evidence is sub-frontier scale (≤13B). Pipeline-mechanics facts are time-stamped to late June 2026 and should be re-verified.

Update Log

V1.0 (June 21, 2026): First publication. State of knowledge as of late June 2026.

About the Author

Manuel Hürlimann is the creator of Digital Authority Engineering (DAE) — the systematic discipline of building machine-verifiable expertise that AI systems recognize, cite, and recommend. Based in Switzerland, he works as a consultant and lecturer at the intersection of AI search behavior, citation analysis, and brand authority. Through the Authority Intelligence Lab at GaryOwl.com, he publishes original research on how AI systems select, evaluate, and cite sources — applying every principle to GaryOwl.com itself as a living lab. This article is the operational sequel to the Five-Gate × Two-Channel model, turning diagnosis into a gate-by-gate remediation method.

Connect: GaryOwl.com · LinkedIn · manuel@octyl.io

Framework Disclosure: The DAE framework is independently developed and not affiliated with any vendor whose products or research are evaluated in this article. The author has no equity, employment, or paid-advisory relationship with Google, OpenAI, Anthropic, Microsoft, Perplexity, Ahrefs, SE Ranking, Semrush, or Muck Rack as of publication date. Where industry or vendor-published research is used (Tier D / Tier E), the conflict of interest is disclosed inline and again in Sources & Methodology; when Tier-A and lower-tier evidence conflict, Tier-A governs. The DAE framework is applied to GaryOwl.com itself as a living lab. The framework is open for use with attribution; the calibration earned from real citation-outcome data is proprietary. Validation is ongoing and published transparently; no guarantees implied. AI behavior varies by model and platform, and the findings here are explicitly time-stamped to late June 2026 because this space changes weekly.

GaryOwl.com – Authority Intelligence Lab
“You cannot optimize your way into an identity the model never learned, and you cannot fake your way into a citation the model actually reasons from. Both are earned.” — Manuel Hürlimann, Digital Authority Engineering