The Five-Gate × Two-Channel Coordinate System for Agentic-RAG Authority
By Manuel Hürlimann for GaryOwl.com | Published: May 30, 2026 | Updated: May 30, 2026
Expertise: Digital Authority Engineering | Agentic-RAG Architecture | AI Citation Pipeline Diagnostics
Time to read: 56 minutes · ~13,900 words
Series: Operative Article 6 — DAE Glossary
📌 Reading Guide
If you read one section: “The Five Gates” — the gate definitions and the dual-channel principle are the diagnostic instrument.
If you are a content strategist: Start with “The Six-Step Triage Protocol” and the “GummySearch Case Study”, then read the gate sections relevant to your current bottleneck.
If you already know the classical Five-Gate model: Skip to “Three Forces Driving This Framework” and the new boxes in §G1b (Fan-Out Inflection) and §G2 (Asymmetric Bot Blocking) for what is new in the agentic-era reframing.
If you are skeptical of yet another agentic-RAG framework: Read “Honest Limitations” and “Independent Industry Validation” first. The framework’s structural claims are externally triangulated by practitioners who arrived at the same diagnosis without knowledge of DAE.
📌 Core Definition
The Five-Gate × Two-Channel Coordinate System is a behavioral model of how agentic AI search systems decide which content survives long enough to be cited. Five Gates (G1a Resolution/Routing, G1b Fan-Out Planning, G2 Retrievability, G3 Credibility Filter, G4 Consensus Pool, G5 Generation-Time Citation × Faithfulness) operate over Two Channels (Parametric, Retrieval). Each Gate is mapped onto one or more of the six DAE Authority Types (#63–#68), modulated by four Cross-Cutting Modifiers (#69 Temporal, #70 Platform, #71 Consensus, #72 Reflection-Iteration).
The five gates are a diagnostic model: a synthesis abstracted from peer-reviewed studies, patent literature and practitioner observation, not a verified blueprint of any one vendor’s pipeline. Within this model, a piece of content must clear every gate from Gate 1 through Gate 5 on at least one channel to appear as a citation in a generated answer. Closing any single gate strongly suppresses the citation, so opening four out of five is not sufficient. Citation suppression in production AI systems is dominated by cliff-shaped failure modes, not by smooth ranking gradients.
TL;DR — Key Takeaways
Agentic RAG is no longer single-shot retrieval. Production pipelines from Google AI Mode to ChatGPT Search route through five sequential gates that decide which content survives long enough to be cited. The visible symptom of failure — your name does not appear in the answer — is identical across all five failure modes. The diagnostic work is to determine which gate is closed before spending budget on remedies that target the wrong gate.
Relative to the standard Five-Gate model, this article — the architectural backbone of the DAE framework’s Agentic-AEO layer — integrates four agentic-era updates: Gate 1 splits into G1a Resolution/Routing and G1b Fan-Out Planning (5–20 sub-queries per query); Gate 5 becomes two-dimensional (Survival × Faithfulness), since up to 57% [Tier A] of citations can be post-rationalized rather than causal; Tool/Endpoint Authority joins #67 Structural Authority as MCP scales to 97 million [Tier D] monthly SDK downloads; and a new #72 Reflection-Iteration Modifier extends the Article-4 modifier set (#69 Temporal, #70 Platform, #71 Consensus). Each is detailed in the Key Insights below.
Each core claim is triangulated across three independent sources, every GCS sub-metric is Wilson-bounded, and every industry source carries an explicit conflict-of-interest disclosure.
📌 Key Insights — What This Article Establishes
1. Agentic RAG is a planner → router → tool-mediated retrieval → critic → synthesis loop. One user query produces 5–20 internal sub-queries (King 2026 [Tier E] (COI: iPullRank Founder/CEO); Trivedi IRCoT, ACL 2023 [Tier A]; Jeong Adaptive-RAG, NAACL 2024 [Tier A]). Gate 1 therefore has two distinct sub-stages.
2. Tool surfaces — MCP servers, OpenAPI endpoints, function-callable APIs — are first-class anchorable substrates within the Retrieval channel. (Throughout, the two channels are the parallel paths a citation can travel: the Parametric channel — training-time, encoded in model weights — and the Retrieval channel — query-time, fetched from a live index. They pass and fail independently; this is the second axis of the coordinate system, developed in full below.) The MCP donation to the Linux Foundation’s Agentic AI Foundation on 9 December 2025 cements this as production infrastructure (March 2026 adoption snapshot: 97M monthly SDK downloads, 10K+ active servers).
3. Correctness ≠ Faithfulness. Up to 57% of citations on Cohere Command-R+ / NaturalQuestions are post-rationalized (Wallat et al., ICTIR 2025). Citation survival alone is no longer a sufficient KPI.
4. The dominant production reality as of mid-2026 is single-LLM-multi-prompt, not multi-agent — though planner/executor and specialist-agent architectures are emerging. Three independent sources triangulate: King (iPullRank, 2026) (COI: iPullRank Founder/CEO), Anthropic “Building effective agents” [Tier E] (Schluntz & Zhang, Dec 19, 2024), and Singh Survey §3.4 [Tier C] (arXiv:2501.09136v4, 2026).
5. The Generative Citation Score (GCS) is six-dimensional and Wilson-bounded. Default weights are deliberately unset and will be empirically calibrated in Article 7. The metric-construction methodology is validated by Aggarwal et al. (GEO, KDD 2024) [Tier A].
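Assuming each GCS sub-metric can be expressed as a proportion (successes out of trials), a Wilson score interval gives bounds that stay sensible at small sample sizes. The following is a minimal Python sketch, not the article's reference implementation: the function name `wilson_interval`, the default z-value of 1.96 (roughly 95% confidence), and the example counts are assumptions made for illustration.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        # No observations: the proportion is unconstrained.
        return (0.0, 1.0)
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return (max(0.0, center - half_width), min(1.0, center + half_width))


# Hypothetical counts: a page was cited in 7 of 20 sampled answers.
low, high = wilson_interval(7, 20)
print(f"observed 7/20, Wilson bounds: [{low:.3f}, {high:.3f}]")
```

Reporting the interval rather than the point estimate makes it harder to over-read a sub-metric that rests on a few dozen prompts.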
📌 Evidence Tiers Used in This Article
- [Tier A] Peer-reviewed academic research
- [Tier B] Large-scale industry dataset (>100K samples, vendor-independent)
- [Tier C] Independent Meta-Analysis (aggregates ≥ 10 external sources, transparent methodology, vendor affiliation disclosed)
- [Tier D] Industry study with documented methodology, not vendor-self-published
- [Tier E] Vendor study (self-published, regardless of sample size or methodology quality); COI disclosed inline
- [Tier DAE] Framework term (synthesized from empirical sources, attributed to DAE)
Triangulation principle (applied throughout): every core claim must be supported by three independent sources (or be flagged explicitly as industry consensus without peer-reviewed support).
Source-hierarchy principle: Peer-reviewed primary sources outweigh vendor reports. When the two conflict, the peer-reviewed evidence governs and the vendor figure appears with a COI flag.
Vendor sources include Conflict-of-Interest (COI) disclosures — commercial or affiliation-based interests that may influence findings — in the Sources section. This article follows the DAE Tier System established in Operative Article 1.
📌 First Publication — Original DAE Contributions in This Article
The following constructs appear for the first time in DAE-framework literature in this article:
- The Five-Gate × Two-Channel Coordinate System (the integrated matrix itself)
- The Gate 1 split into G1a (Resolution/Routing) + G1b (Fan-Out Planning)
- The Two-Dimensional Gate 5 (Survival × Faithfulness)
- The Tool/Endpoint Authority sub-type within #67 Structural Authority
- The #72 Reflection-Iteration Modifier as the fourth Cross-Cutting Modifier (extending the Article-4 set of #69 Temporal, #70 Platform, #71 Consensus)
- The Six-Step Triage Protocol
- The Six-Dimensional Generative Citation Score (GCS)
Disclaimer: These constructs are first published here, on this site, by this author. The synthesis draws on Mike King’s Beyond RAG (iPullRank, 20 May 2026), the Quality-Gate-Audit conducted in late May 2026, and the same-week integration of Bettinga’s LinkedIn double-gate analysis (which first publicly surfaced the substrate problem; the robots.txt facts are independently re-verified first-hand here), Landwehr’s Peec-AI Fan-Out-Inflection observation (COI: Peec AI CPO/CMO), and Cummins/Ramp’s marketing-incentives-to-AI-agents experiment (COI: vendor self-published). No claim is made that the underlying ideas (gates, cascades, credibility filters, fanout planning) are novel — every gate in this article rests on prior peer-reviewed work, cited inline. The novelty is the integration into a single 5×2 system with operational metrics and the Article-4-aligned modifier extension.
📌 Key DAE Terms in This Article
Gate — A near-binary filter. A closed gate strongly suppresses citation probability even when other gates are open — probabilistic effects (parametric leakage, hallucination) leave a non-zero floor, not a hard zero.
Channel — One of two parallel paths a citation can travel: Parametric (training-time, model weights) or Retrieval (query-time, live index — including text substrates and tool/endpoint substrates).
Gate 1a — Resolution/Routing — Entity disambiguation and channel routing (text vs. tool).
Gate 1b — Fan-Out Planning — Sub-query decomposition; 5–20 sub-queries per user query.
Tool/Endpoint Authority — Sub-type of #67 Structural Authority. The capacity to be invoked as a tool, not merely cited as prose.
Faithfulness Axis (G5) — Causal use of a cited source vs. post-hoc justification (Wallat ICTIR 2025).
Reflection-Iteration (#72) — Number of critic-driven re-retrieval cycles before synthesis. New cross-cutting modifier, agentic-specific.
Generative Citation Score (GCS) — Six-dimensional Wilson-bounded metric for quantifying which gate is closed (SubQueryCov, RetrievalToCit, RefSurvival, Faithfulness, ToolInclusion, BridgeCentrality).
Dual-Assignment Gate — A finding that a single authority type acts simultaneously through more than one gate. #66 Network Authority is the only authority type so classified, assigned to G1 + G2 + G4.
What You Need from the Previous Articles
This article assumes you have read Article 4, Six Types of Authority AI Systems Actually Measure, and Article 5, Where Structure Actually Works. If you have not:
- From Article 4 you need the six DAE authority types: #63 Entity, #64 Topical, #65 Content, #66 Network, #67 Structural, #68 Reputational — plus the three modifier dimensions: #69 Temporal, #70 Platform, #71 Consensus. This article extends this set with a fourth agentic-specific modifier, #72 Reflection-Iteration.
- From Article 5 you need its core reframing of Structural Authority (#67): it is not one decision but a four-stage cascade — parsing quality, parsing robustness, retrieval granularity, and markup preservation — whose effects are multiplicative. Most brands get the HTML right and lose on the other three, so optimizing the wrong stage wastes budget. This article generalizes that four-stage logic to the full taxonomy.
Everything else builds from these two foundations.
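To see why multiplicative effects make stage selection matter, consider a small arithmetic sketch. The stage names come from the Article 5 summary above; the per-stage pass rates are hypothetical numbers chosen only for illustration, and treating each stage as a single independent pass rate is a simplifying assumption.

```python
import math

# Hypothetical per-stage pass rates (illustrative values, not measurements).
stages = {
    "parsing_quality": 0.95,
    "parsing_robustness": 0.60,
    "retrieval_granularity": 0.50,
    "markup_preservation": 0.70,
}


def cascade_yield(pass_rates: dict[str, float]) -> float:
    """Overall yield when stage effects multiply."""
    return math.prod(pass_rates.values())


def yield_if_improved(pass_rates: dict[str, float], stage: str, new_rate: float) -> float:
    """Overall yield after raising a single stage to new_rate."""
    return cascade_yield({**pass_rates, stage: new_rate})


baseline = cascade_yield(stages)
# Perfect the already-strong stage versus lift the weakest one.
polish_strong = yield_if_improved(stages, "parsing_quality", 1.00)
fix_weakest = yield_if_improved(stages, "retrieval_granularity", 0.75)
print(f"baseline={baseline:.4f} polish_strong={polish_strong:.4f} fix_weakest={fix_weakest:.4f}")
```

Under these assumed numbers, lifting the weakest stage moves the overall yield far more than perfecting the strongest one, which is the budget argument in miniature.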
Three Forces Driving This Framework
The Five-Gate × Two-Channel matrix in this article is the synthesis of three converging developments between late 2024 and mid-2026. Each shaped a specific structural decision in the framework.
(1) Mike King’s “Beyond RAG: Why Every AI Search Platform Is Now Agentic and What That Means for Your Content” [Tier E] (COI: King is iPullRank Founder/CEO) (iPullRank, 20 May 2026) provided the strongest single industry synthesis of the agentic-RAG production stack to date, triangulated internally against ReAct (Yao et al., ICLR 2023) [Tier A], Toolformer (Schick et al., NeurIPS 2023) [Tier A], IRCoT (Trivedi et al., ACL 2023) [Tier A], and Self-RAG (Asai et al., ICLR 2024 Oral) [Tier A]. King’s synthesis established that Gate 1 must be split into G1a (Resolution/Routing) and G1b (Fan-Out Planning) — the planner generates 5–20 internal sub-queries per user query, and a brand resolved at G1a can still lose four of five sub-retrievals at G1b. The architectural framing King derives from practice is independently established in the peer-reviewed-grade literature: Nowaczyk (“Architectures for Building Agentic AI”, Springer Nature, forthcoming; arXiv:2512.09458, 10 Dec 2025) [Tier B] argues that reliability in agentic systems is first and foremost an architectural property — emerging from componentisation, schema-validated interfaces, and control/assurance loops. This lifts the agentic-stack claim above a single vendor synthesis and supplies the component vocabulary (planner, tool router, verifier, supervisor) that the five gates operationalize.
(2) Two peer-reviewed papers reframed Gate 5 specifically: Wallat, Heuss, de Rijke & Anand (ICTIR 2025 Best Paper Honorable Mention, DOI 10.1145/3731120.3744592, arXiv:2412.18004) [Tier A] established that up to 57% of citations on Cohere Command-R+ / NaturalQuestions are post-rationalized rather than causally grounded — Faithfulness must be measured as a separate axis from Survival. Saxena, Bommireddy, Padia & Gaur (arXiv:2509.21557 v2, Dec 2025, submitted to NeurIPS 2025 LLM Eval Workshop) [Tier C] quantified the G-Cite vs. P-Cite trade-off across ALCE, LongBench-Cite, REASONS, and FEVER. Gate 5 in this framework is therefore two-dimensional (Survival × Faithfulness), not one-dimensional.
(3) The Linux Foundation’s announcement of the Agentic AI Foundation (AAIF) [Tier D] on 9 December 2025 made Tool/Endpoint Authority an operational reality: by the March 2026 adoption snapshot (corroborated independently by Pento.ai, Truto.one, DigitalApplied, and BraivIQ), the MCP SDK had reached 97 million monthly downloads across Python and TypeScript, with 10,000+ active public servers. Founders Anthropic (donating MCP), OpenAI (donating AGENTS.md), and Block (donating goose) plus eight platinum members (AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, OpenAI) cement this as an interop standard, not a vendor protocol. Tool/Endpoint Authority is therefore a first-class sub-type of #67 Structural Authority in this framework, not an afterthought.
A fourth structural decision follows from the agentic-pipeline reality: the Article-4 modifier set (#69 Temporal, #70 Platform, #71 Consensus) is extended with one agentic-specific modifier, #72 Reflection-Iteration, capturing the Self-RAG / Singh-§3 / HiPRAG / King family of behaviors where the planner re-issues retrieval rounds based on draft critique.
What this framework does NOT change relative to Article 4’s authority taxonomy: The Dual-Channel Principle (Parametric vs. Retrieval) is preserved as the mechanistic decomposition, anchored to Sun et al. ReDeEP ICLR 2025 (Knowledge FFNs vs. Copying Heads). Tool surfaces are a substrate type within the Retrieval channel, not a third channel. The triangulation Algaba NAACL 2025 + Sun ReDeEP ICLR 2025 + Wallat ICTIR 2025 anchors this distinction in peer-reviewed mechanistic interpretability work.
The Dual-Channel Principle
A modern AI system can cite your content through two architecturally distinct channels: the parametric channel (training-time: your brand surfaces from the model’s weights, with no document retrieved) and the retrieval channel (query-time: your URL is fetched live and cited). Treating “citation” as one phenomenon is the most common mistake in this entire field.
The two paths in detail:
Parametric channel (training-time). The model was trained on a corpus that included a representation of your content or your brand. At inference time the model can produce your name without ever retrieving a document — purely from its weights. Algaba and colleagues at Vrije Universiteit Brussel showed this directly in their April 2025 follow-up study (arXiv:2504.02767) [Tier C] (preprint): prompting GPT-4o for references on 10,000 focal papers produced 274,951 LLM-generated references with structural and bibliometric properties closely matching the human citation graph — strong evidence that citation networks are internalized parametrically and reproducible without retrieval. The mechanistic substrate of this channel is the late-layer Knowledge FFNs that inject parametric knowledge into the residual stream (Sun et al., ReDeEP, ICLR 2025) [Tier A].
Retrieval channel (query-time). The system performs a search at inference time — through Bing, Google, a vector database, a domain-specific index, or a tool/endpoint surface — and produces a citation by attributing one or more sentences in the answer to the retrieved evidence. This is the path Perplexity, ChatGPT Search, Google AI Mode, and most “AI Overviews”-style features take. The mechanistic substrate is the set of Copying Heads (attention heads with positive OV-matrix eigenvalues) that propagate retrieved-context tokens through the residual stream (Sun et al., ReDeEP, ICLR 2025).
The Retrieval channel itself has two substrate types:
- Text substrate — web pages, vector embeddings, BM25 lexical indexes, document chunks. The classical RAG payload.
- Tool/Endpoint substrate — MCP servers, REST APIs, function-callable schemas, code interpreters, structured-data endpoints. Per Mike King (iPullRank, May 2026) (COI: iPullRank Founder/CEO): “When a tool exists, the router calls the tool instead of citing prose.” (In practice this is a strong tendency, not an absolute rule — many systems run tool calls and text retrieval in parallel, or fall back between them by latency and cost.) This substrate type became operationally significant with the MCP donation to the Linux Foundation on 9 December 2025; by the March 2026 adoption snapshot, MCP had reached 97 million monthly SDK downloads with 10,000+ active servers, validated independently by Lumer et al. (ScaleMCP, arXiv:2505.06416) [Tier E] (COI: PwC co-affiliated) and Pento.ai’s “A Year of MCP” retrospective (independent industry analysis).
Tool substrate is governed by Tool/Endpoint Authority as a sub-type of #67 Structural Authority. It is not a separate channel — it lives entirely within the Retrieval channel. This preserves the mechanistic mapping (Parametric/Retrieval = FFN/Copying-Head substrate) and admits tool surfaces as a first-class anchorable target without adding axes the underlying transformer architecture does not differentiate.
These two channels — Parametric and Retrieval — can pass and fail independently. The most painful diagnostic pattern in our consulting practice in 2025–2026 has been the following: a brand is named correctly by GPT-4o in 70% of relevant prompts (parametric channel open) but is never linked in Perplexity or ChatGPT Search (retrieval channel closed, usually at Gate 2). The owner thinks “we’re doing fine” because the name appears; in fact half the user journey is invisibly broken.
The framework axiom — the Dual-Channel Principle — is:
A citation requires at least one of the two channels to pass all five gates. Treating channel-pass on one path as evidence of channel-pass on the other is a category error.
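The axiom can be written down as a small boolean check. This is a deliberately simplified sketch: gates are modeled as strictly open or closed even though the framework treats them as near-binary, and the gate states in the example are hypothetical. The names `first_closed_gate`, `channel_passes`, and `citation_possible` are illustrative, not part of the framework.

```python
from typing import Optional

# Gate order as defined in the framework (Gate 1 is split into G1a and G1b).
GATES = ("G1a", "G1b", "G2", "G3", "G4", "G5")


def first_closed_gate(gate_states: dict[str, bool]) -> Optional[str]:
    """Return the earliest closed gate on a channel, or None if all are open."""
    for gate in GATES:
        if not gate_states.get(gate, False):  # a missing entry counts as closed
            return gate
    return None


def channel_passes(gate_states: dict[str, bool]) -> bool:
    """A channel passes only if no gate on it is closed."""
    return first_closed_gate(gate_states) is None


def citation_possible(channels: dict[str, dict[str, bool]]) -> bool:
    """Dual-Channel Principle: at least one channel must clear every gate."""
    return any(channel_passes(states) for states in channels.values())


# Hypothetical brand: named from model weights, but never fetched live.
brand = {
    "parametric": dict.fromkeys(GATES, True),
    "retrieval": {**dict.fromkeys(GATES, True), "G2": False},
}
for name, states in brand.items():
    print(name, "closed at:", first_closed_gate(states))
print("citation possible on some channel:", citation_possible(brand))
```

The per-channel report is the point of the exercise: a pass on one channel says nothing about the state of the other.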
The Five Gates
The framework retains the five-gate skeleton from the classical Five-Gate model with two agentic-era modifications: Gate 1 splits into G1a (Resolution/Routing) and G1b (Fan-Out Planning); Gate 5 acquires a second axis (Faithfulness alongside Survival).
Gate 1 — Query Triage
G1a — Resolution/Routing
G1a (Resolution/Routing) — the question this sub-gate asks: Has the model (a) resolved this query to one or more entities/topics it has parametric or retrieval anchors for, and (b) decided which channel(s) and which retrieval surface(s) to route this query to?
What “closed” looks like: The model produces an answer about your topic without ever generating your brand or URL as a candidate token — even before any retrieval or ranking step. Or: the model produces a candidate but routes the query to a channel/surface where your content is not present.
The mechanistic-interpretability literature has, over the last eighteen months, given us an unusually concrete picture of where this gate lives inside a transformer. Sun and colleagues, in their ICLR 2025 paper ReDeEP, decomposed RAG hallucination into two attributable substrates: Knowledge FFNs (later-layer feed-forward modules that inject parametric knowledge into the residual stream) and Copying Heads (attention heads that, identified by positive eigenvalues of their OV matrix, transfer information from context tokens into the residual stream). Hallucinations occurred when Knowledge FFNs over-added parametric knowledge while Copying Heads failed to retain external context. Their AARF method (Add Attention Reduce FFN) is an inference-time intervention that increases Copying-Head contribution and dampens Knowledge-FFN contribution, reducing hallucinated content without retraining.
Park & Kim (EMNLP 2025 Main, pp. 29766–29785) [Tier A] extended this with the SIPS metric (Semantic-Informed Parametric Signal), which measures the divergence between hidden states before and after the FFN layer using a semantic-entropy probe rather than ReDeEP’s Jensen–Shannon divergence on raw activations. Augenstein’s ECIR 2025 keynote (arXiv:2603.09654, March 2026) [Tier A] frames the underlying open problem: the interplay between parametric and contextual knowledge is still underexplored, and “when contextual knowledge should overwrite parametric knowledge” is itself a research question.
The routing component of G1a depends on a separate mechanism. Patent evidence: US20240362093A1 [Tier C] (published patent application, not yet granted) documents Google’s “Custom Corpus” routing patent, which describes selecting between query-time corpora based on classifier output. Singh Survey Section 4.1 (arXiv:2501.09136 v4, April 2026) [Tier C] formalizes this as the Single-Agent Router architecture.
📌 Box — Diagnostic Pattern: Title-Tag Loss (LinkedIn Posts)
LinkedIn generates post-page titles from a fixed template, not from a per-post, author-chosen <title>. In Bettinga’s German-locale view the template renders as <title>Posten | LinkedIn</title> (or “Beitrag von [Name]” in the feed variant); other locales and crawlers see the equivalent template in their own language, sometimes in the slightly richer form “[Name] on LinkedIn: [opening words]”. Either way the title is machine-generated boilerplate, not a deliberate, topic-specific page title, so it gives a model only a weak, generic Gate-1 anchor for which post this is and what it is about, far below what a dedicated article page with a hand-crafted title provides. (This title-pattern observation comes from Bettinga’s German-language analysis and, unlike the robots.txt facts below, is not independently re-verified first-hand here; the displayed string is locale-dependent.)
Newsletter pages under /pulse/, by contrast, derive the <title> from the article headline (schematically <title>[Newsletter Article Headline] | LinkedIn</title>), giving a genuine Gate-1 anchor that ordinary posts lack. This matters differently per channel: where Gate 2 is open (OAI-SearchBot, Googlebot; see §G2 below), the weak post-level title anchor is the relevant limit, a constraint on anchoring rather than an absolute wall, since LinkedIn content does still surface on those channels; where Gate 2 is closed (the blocked training and live-fetch crawlers), the policy block dominates regardless of title quality. The net is a double-gate weakness specific to LinkedIn-as-substrate: limited Gate-1 anchoring on posts, plus Gate-2 policy blocking for the AI crawlers LinkedIn disallows.
Diagnostic implication: When a brand publishes only on LinkedIn (no own site), it is structurally disadvantaged in AI Search regardless of content quality, not because the content is poor but because two gates are closed simultaneously at the platform level. Indirect discovery (Googlebot access, reposts, mirrored or cached copies) can still leak some signal, so the effect is strong suppression rather than literal invisibility.
Sources: robots.txt directives — LinkedIn robots.txt, first-hand verified 28 May 2026 (primary source). Title-tag / SERP-pattern observation and the original public surfacing of this double-gate analysis — Juliane Bettinga (SEO consultant & Co-Founder @SEOSOON), LinkedIn-Post May 2026 (COI: SEO consultancy); the title-tag pattern is itself independently verifiable.
Practical implication for content owners. A G1a failure cannot be fixed by adding more pages. It can only be addressed by changing the training-corpus signal (entity disambiguation, schema markup that survives ingestion, Wikipedia/Wikidata presence, citations from already-indexed corpora) or by changing the retrieval signal such that Copying Heads have something to copy from.
G1a is the gate #63 Entity Authority primarily lives at. It is also one of three gates that #66 Network Authority lives at — see the dual-assignment discussion in the Six-Authority Mapping section.
G1b — Fan-Out Planning
G1b (Fan-Out Planning) — the question this sub-gate asks: Has the planner decomposed the user query into a set of internal sub-queries, and does at least one of those sub-queries semantically match content I have published?
What “closed” looks like: The user query is resolved correctly at G1a (your brand is anchored), but the planner produces 5–20 sub-queries that all miss the angle, sub-topic, or framing your content covers. You are anchored but un-retrievable at the sub-query level.
This sub-stage is implicit in the classical Gate 1 and is made explicit here because the empirical evidence for fanout-planning as a separate behavioral stage has become unambiguous:
- Jeong et al., Adaptive-RAG (NAACL 2024 Long, pp. 7036–7050) [Tier A] established a three-class query-complexity classifier (no-retrieval / single-step / multi-step). Multi-step queries trigger fanout — fanout is therefore complexity-conditional, not universal.
- Trivedi et al., IRCoT (ACL 2023 Long, arXiv:2212.10509 v2) [Tier A] interleaves retrieval with chain-of-thought reasoning and demonstrates “11–21 recall points under a fixed-budget optimal recall setup” and “up to 15 F1 points… in downstream few-shot QA performance” on HotpotQA, 2WikiMultihopQA, MuSiQue, and IIRC. Fanout-with-iterative-retrieval is the production pattern.
- King “Beyond RAG” (iPullRank, May 2026) (COI: iPullRank Founder/CEO): “Every modern AI search platform fans out one user query into multiple internal sub-queries before any retrieval happens. If your content only matches the surface query, you lose at the planner stage.”
Three independent sources triangulate (academia / academia / industry practitioner). Triangulation met.
📌 Box — Empirical Anchor: ChatGPT Fanout-Query Inflection (8 May 2026)
Industry telemetry from Peec AI (May 2026) provides the first dated, real-world signal of the Gate-1b fanout-planning mechanism. Reddit’s citation share in ChatGPT shifted from a baseline ~2.6% to over 11% within a single week around 8 May 2026. Per Malte Landwehr (CPO/CMO, Peec AI), the cause was a change in ChatGPT’s fanout-planning behavior: the planner began appending “reddit” to a substantially larger share of generated sub-queries, shifting which sources surfaced at G2 retrieval. A concurrent rise of the third-party site GummySearch (0.005% → 0.1% in the same week — see Case Study in the Six-Authority section below) is consistent with the same shift.
Lily Ray (Founder, Algorythmic; VP SEO & AI Search, Amsive) confirmed the interpretation in the comment thread: “When fanout queries contain the term ‘reddit’, these pages rank in addition to Reddit. Makes a ton of sense. I also imagine it might not work forever.”
Diagnostic implication: Gate-1b fanout-planning behavior is non-stationary. The same brand can score open at G1b in one platform-week and closed in the next, without any change to the brand’s own content. Monitoring G1b is continuous, not one-shot.
Sources: Landwehr, M. (26 May 2026). “How to Become a Top Source in ChatGPT with Recycled Reddit Content.” LinkedIn-Article. Tier E (Peec AI proprietary telemetry, vendor-self-published; not externally replicated). Lily Ray (Algorythmic / Amsive), comment thread on the same article — Tier D (independent practitioner comment).
Practical implication for content owners. G1b cannot be optimized by writing one canonical “best answer” page. It demands sub-topic breadth — multiple semantically distinct pages or sections, each addressing a plausible fanout angle. Topic coverage maps that pre-empt the planner’s likely sub-queries (see also Article 4‘s discussion of #64 Topical Authority) are the operational instrument.
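A topic coverage map can be checked mechanically once a list of plausible sub-queries exists. The sketch below is a rough illustration under stated assumptions: the sub-queries and page titles are invented, token-overlap (Jaccard) similarity stands in for the semantic matching a real planner and retriever would use, and the 0.3 threshold is arbitrary.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (a crude stand-in for semantic similarity)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def sub_query_coverage(sub_queries: list[str], pages: list[str], threshold: float = 0.3):
    """Return (share of sub-queries matched by at least one page, unmatched sub-queries)."""
    if not sub_queries:
        return 0.0, []
    uncovered = [q for q in sub_queries
                 if not any(jaccard(q, page) >= threshold for page in pages)]
    return 1 - len(uncovered) / len(sub_queries), uncovered


# Invented fan-out for a hypothetical user query about CRM tools.
sub_queries = ["crm pricing comparison", "crm onboarding checklist", "crm reddit reviews"]
pages = ["crm pricing comparison guide", "crm onboarding checklist template"]
coverage, gaps = sub_query_coverage(sub_queries, pages)
print(f"coverage={coverage:.2f}, gaps={gaps}")
```

The list of unmatched sub-queries is the useful output: each entry is a candidate angle for a new page or section.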
Gate 2 — Retrievability
Gate 2 (Retrievability) — the question this gate asks: When the live system executes its retrieval step for each sub-query, does this document (or this tool endpoint) appear in the candidate set?
What “closed” looks like: Your content exists, is well-anchored at G1a, the planner produces fanout sub-queries that semantically match your content — and yet your URL does not appear in the candidate set because (a) the live RAG index does not crawl your domain, (b) your bot policy blocks the AI fetcher, (c) your URL does not rank in the underlying lexical/dense retriever, or (d) the substrate the planner queries (e.g. a tool surface) does not include you.
Gate 2 is the most boringly mechanical gate and therefore the easiest to misdiagnose. Three findings frame it.
First, modern production stacks are not single-stage retrievers but multi-stage retrieve-then-rerank pipelines (Gao et al., RAG Survey, arXiv:2312.10997 [Tier C]; Gao et al., Modular RAG, arXiv:2407.21059 [Tier C]; Barnett et al., CAIN 2024, Seven Failure Points [Tier A]). Gate 2 itself decomposes into a bi-encoder first stage (top-k retrieval, typically k = 100) and a cross-encoder rerank to top-10. Cohere’s Rerank 4 (Dec 2025) [Tier E; vendor-independent benchmark by Agentset is Tier D] documents this stack and reports an overall +170 ELO improvement over Rerank 3.5 (Pro 1627 vs. v3.5 1457), with up to +400 ELO on business-domain-specific tasks and +300 / +140 ELO on Business/Finance for the Fast variant. Use the Agentset numbers in any decision-relevant context.
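The two-stage shape can be sketched in a few lines. This is a toy illustration, not a model of any vendor's stack: the two scoring functions are simple stand-ins for a bi-encoder and a cross-encoder, the documents are invented, and `k` and `top_n` are shrunk from the typical 100 and 10 so the cut-off is visible on three documents.

```python
from typing import Callable, Sequence

Scorer = Callable[[str, str], float]


def token_overlap(query: str, doc: str) -> float:
    """Cheap first-stage stand-in: number of shared lowercase tokens."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def phrase_aware(query: str, doc: str) -> float:
    """Costlier second-stage stand-in: overlap plus a bonus for an exact phrase match."""
    bonus = 2.0 if query.lower() in doc.lower() else 0.0
    return token_overlap(query, doc) + bonus


def retrieve_then_rerank(query: str, docs: Sequence[str],
                         first_stage: Scorer = token_overlap,
                         reranker: Scorer = phrase_aware,
                         k: int = 100, top_n: int = 10) -> list[str]:
    # Stage 1: keep only the k best candidates under the cheap scorer.
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:k]
    # Stage 2: reorder the survivors with the costlier scorer and truncate.
    return sorted(candidates, key=lambda d: reranker(query, d), reverse=True)[:top_n]


docs = [
    "crm for startups best practices and tips",
    "best crm for startups",
    "startups guide",
]
print(retrieve_then_rerank("best crm for startups", docs, k=2, top_n=2))
```

A document that misses the first-stage cut never reaches the reranker, however well it would have scored there, which is why a Gate-2 diagnosis has to look at both stages.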
Second, the strongest 2026 evidence for what closes Gate 2 in real production AI search comes from the Trustpilot / Seer Interactive study released 12 May 2026 via PR Newswire [Tier D] (COI: Trustpilot-commissioned, Seer-executed): methodology of 804,491 AI responses across ChatGPT, Gemini, Perplexity, and Google AI Mode; 15,783 unique prompts; 1,926 brands segmented into four cohorts (T0=437 / T1=497 / T2=497 / T3=495). The headline of that study is about a Reputational-Authority effect — but the mechanism is a Gate-2 mechanism: the study itself attributes ~99% of Trustpilot citations to the Trustpilot domain ranking organically in retrieval, not to the AI model seeking Trustpilot out by name (Trustpilot’s “3Rs” framing: Recency, Relevance, Ranking; Moz domain authority 94/100 as of 8 May 2026). The result is published as a brand-cohort progression: T0 = 1% citation rate, T1 = 53.5%, T2 = intermediate, T3 = 75.3%. Tier D, COI explicitly disclosed; triangulated by 5W Public Relations Q1 2026 Citation Source Audit (Torossian, 11 May 2026, PR Newswire) reporting a ~3× citation multiplier for brands present across G2/Capterra/Trustpilot/Yelp.
Industry-vertical caveat for the Trustpilot finding. The Trustpilot/Seer methodology states only that the study covered “a range of products and services” with 15,000+ prompts; specific industry verticals are not disclosed in the PR Newswire methodology block. The 1% → 75.3% magnitude is therefore best read as applying to industries where Trustpilot is the dominant consumer-facing review platform (retail/e-commerce, travel, financial services, and consumer hospitality), the verticals where Trustpilot’s 361 million review base concentrates. The mechanism (review-platform presence as a Gate-2 lever via organic-search ranking) generalizes to other verticals through their respective dominant review platforms, but the magnitude does not transfer 1:1. Practitioners should read the magnitude as an industry-conditional anchor: in B2B SaaS the analog is G2 / Capterra / TrustRadius; in healthcare it is condition-specific (Healthgrades, Vitals, ZocDoc, Jameda in DACH); in local services it is Google Reviews and Yelp. Re-running the cohort design on those platforms would be required to establish the magnitude per vertical. The framework’s position is that the structural finding (review-platform presence opens Gate 2 via organic ranking) is robust; the 1%/53.5%/75.3% numerical anchors apply to consumer-facing brands and should not be quoted as universal benchmarks.
📌 Box — Diagnostic Pattern: Asymmetric Bot Blocking (LinkedIn Case)
LinkedIn’s robots.txt (verified first-hand against the live file, 28 May 2026) blocks the major AI crawlers with a full Disallow: / rule, among them GPTBot, ChatGPT-User, Google-Extended, anthropic-ai, ClaudeBot, Claude-Web, Claude-User, Claude-SearchBot, cohere-ai, Google-CloudVertexBot, PerplexityBot, and Perplexity-User, and adds a catch-all User-agent: * → Disallow: / that blocks any crawler not explicitly listed (roughly two dozen AI and scraper agents are fully disallowed in total). One consequential exception qualifies the picture: OAI-SearchBot (OpenAI’s search-indexing crawler, distinct from the blocked training crawler GPTBot and the blocked live-fetch agent ChatGPT-User) is not globally blocked; it receives only the same path-level restrictions as Googlebot. The generic Googlebot is likewise not globally disallowed and retains access to /posts/ and /pulse/ (LinkedIn Newsletter), whereas Google-Extended (Gemini training/grounding) is blocked. The precise net effect at Gate 2 is therefore narrower than a blanket block: AI training and live-fetch access is policy-blocked, but the two search-index channels that feed ChatGPT-search and Google’s AI surfaces remain open at the robots level, subject to the same path limits as any conventional search engine.
The asymmetry explains a measured citation pattern: LinkedIn ranks #7 in cross-platform citation share (llmpulse.ai data-studies [Tier E], May 2026: 4.43% of all citations), but the citations concentrate in Google AI Mode and AI Overviews (Googlebot-routed), while ChatGPT, Gemini, Claude, and Perplexity have LinkedIn nearly invisible in their citation pools.
Diagnostic implication: A Gate-2 audit cannot be reduced to “is the site crawlable?” It must be bot-specific. The same domain can be open for one channel and closed for four others.
Source: Juliane Bettinga (SEO-Expertin & Co-Founder @SEOSOON), LinkedIn-Post May 2026 — Tier D (industry analysis with documented methodology). Data anchor: llmpulse.ai/data-studies/top-cited-domains — Tier D.
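A bot-specific Gate-2 check of the kind described in this box can be sketched with Python's standard-library `urllib.robotparser`. The robots.txt text below is a simplified, hypothetical file written for illustration; it is not a copy of LinkedIn's live file, and `example.com`, `SAMPLE_ROBOTS_TXT`, `AI_BOTS`, and `audit_gate2` are assumed names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, simplified for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /private/

User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

# User-agent tokens to audit (PerplexityBot is not listed above on purpose,
# so it falls through to the catch-all rule).
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot", "Googlebot"]


def audit_gate2(robots_txt: str, url: str, bots: list[str]) -> dict[str, bool]:
    """Return, per user-agent token, whether robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


report = audit_gate2(SAMPLE_ROBOTS_TXT, "https://example.com/posts/some-article", AI_BOTS)
for bot, allowed in report.items():
    print(f"{bot:15s} {'open' if allowed else 'closed'}")
```

The sketch only reads robots-level policy. It says nothing about whether a crawler actually honors the file, or about blocks applied elsewhere (for example at a firewall or CDN), so it is a first-pass audit rather than a full Gate-2 diagnosis.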
Tool-Surface Sub-Section (#67 Sub-Type)
Gate 2 has, since Q4 2025, acquired a second retrieval substrate: tool/endpoint surfaces. When a tool exists for a query class, the router (G1a) increasingly calls the tool instead of dispatching a text retrieval. This makes Tool/Endpoint Authority — a sub-type of #67 Structural Authority — a Gate-2-relevant capability for any brand whose content could be exposed as an endpoint rather than as prose.
The empirical anchor for tool-substrate prevalence is the Model Context Protocol (MCP). Anthropic open-sourced MCP in November 2024 and donated it to the Linux Foundation’s Agentic AI Foundation (AAIF) on 9 December 2025 (Linux Foundation press release; Anthropic news; modelcontextprotocol.io blog). AAIF co-founders: Anthropic donates MCP, OpenAI donates AGENTS.md, Block donates goose (per Paperclipped industry reporting). Platinum members: AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, OpenAI.
By the March 2026 adoption snapshot (corroborated independently by four industry sources: Pento.ai’s “A Year of MCP” retrospective [Tier E], Truto.one’s “What is MCP? 2026 Guide for SaaS PMs”, DigitalApplied.com’s “MCP Adoption Statistics 2026”, and BraivIQ’s infrastructure analysis), the MCP SDK had reached 97 million monthly downloads across Python and TypeScript (SDK downloads — including CI runs and transitive dependency installs, not unique users), with 10,000+ active public servers. Compare to launch (November 2024): approximately 2 million downloads per month. The donation date (9 December 2025) preceded the 97M/10K snapshot by approximately three months; the snapshot does not date to the donation.
Lumer et al. ScaleMCP (arXiv:2505.06416) (COI: PricewaterhouseCoopers U.S.A. co-affiliated) provides the academic-format stress-test: “5,000 financial metric MCP servers, across 10 LLM models, 5 embedding models, and 5 retriever types.”
Triangulation for Tool/Endpoint Authority as a #67 sub-type: Schick Toolformer (NeurIPS 2023) + Lumer ScaleMCP (Tier E, COI: PwC) + Linux Foundation AAIF press release (Tier D) + Pento.ai retrospective (Tier D) — academic / empirical / industry-standard / independent industry analysis. Triangulation met.
Practical implication for content owners. If your content can be answered by a tool call (price, availability, calculation, structured data), the question is no longer “is my page indexed?” but “is my endpoint discoverable in the MCP registry the agent’s router queries?” This is a different operational discipline than classical SEO — closer to API product management than to content marketing.
Gate 3 — Credibility Filter
Gate 3 (Credibility Filter) — the question this gate asks: Given the retrieved candidates (text passages or tool outputs), will the generator treat this evidence as credibility-worthy — or will it down-weight it before generation?
What “closed” looks like: Your URL or tool output appears in the candidate set but is filtered out (or down-weighted into invisibility) before generation, because the model has learned that documents of your type, domain class, structural quality, or formatting style are low-credibility.
Three peer-reviewed anchors:
- Pan et al. “Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation” (arXiv:2404.06809, EMNLP 2024) [Tier A] — the CAG framework demonstrated that explicit credibility signaling during generation allows LLMs to discriminate among retrieved documents and significantly outperform vanilla RAG, particularly under noisy/misinformation contexts. The credibility filter is therefore not merely an emergent property; it is a trainable behavior, and production systems are increasingly trained on it.
- Tan et al. HtmlRAG (WWW 2025) [Tier A] — chunkability, hierarchical heading structure, and lead-with-answer prose measurably increase extraction probability. Plain-text dumps from HTML perform worse than structured HTML for the same content.
- Aggarwal et al. GEO: Generative Engine Optimization (KDD ’24), DOI 10.1145/3637528.3671900 [Tier A] — GEO-bench experiments document +41% visibility for Statistics Addition and +115% visibility for Cite-Sources Addition on position-5 content. Quotation Addition contributes +28%. These are content-authority operations that act at Gate 3.
This is the gate at which #67 Structural Authority lives almost exclusively, and at which #65 Content Authority lives partly.
The format question — Markdown vs HTML in the million-token era
A 2026-specific reframing of the Gate-3 extractability question is whether Markdown serving increases citation probability over HTML. The popular practitioner narrative (serve Markdown to AI crawlers via Accept: text/markdown content negotiation, save tokens, get cited more) is debated by four independent lines of 2025–2026 evidence — and the evidence is methodologically heterogeneous rather than convergent.
First, Tan et al. HtmlRAG (WWW 2025) — already cited above — provides the peer-reviewed retrieval-modeling perspective. Their finding: structured HTML with hierarchical headings, semantic tags, and chunkable structure outperforms plain-text dumps for retrieval-modeling. Markdown sits structurally closer to plain text than to richly-structured HTML. This is a retrieval-stage finding (Gate 2 to Gate 3 transition), not a citation-outcome finding.
Second, the Profound controlled A/B experiment (February 2026) (COI: Profound sells Agent Analytics; documented methodology) — 381 pages across 6 websites, 3-week measurement window, Profound Agent Analytics tracking OpenAI/Anthropic/Perplexity/Meta/DuckAssistBot bots, identical bot-detection logic across treatment and control — found a marginal ~16% mean lift driven almost entirely by high-traffic outlier pages and ~1 extra median visit per page, not statistically significant. Profound’s stated conclusion: “If Markdown were a game-changer, we would have seen it at this scale. We didn’t.”
Third, Thariq Shihipar (Engineering Lead, Claude Code, Anthropic) — “The Unreasonable Effectiveness of HTML” (thariqs.github.io/html-effectiveness/, 8 May 2026) [Tier E] (personal site of Anthropic engineer; 4.4 million views in 16 hours) reframes the original Markdown-default rationale. The token-economy argument that made Markdown the obvious choice in the GPT-4 era (8K–32K context windows, every token billed) has been structurally obsoleted: Claude Opus 4.7 and Sonnet 4.6 now run 1M-token context windows [Tier E] (Anthropic Docs, May 2026), GPT-5.5 supports 1M tokens (OpenAI, 24 April 2026), Gemini 3.1 Pro exceeds 2M. The few-bytes-of-overhead-per-paragraph cost that made Markdown the default is, in Shihipar’s words, now noise. HTML’s richer structural expressiveness — interactive elements, semantic depth, machine-parseable visual hierarchy — wins for both human inspection and machine extraction. Shihipar’s argument is about output format, not crawl-time format; the token-economy point transfers.
Fourth, Grace Cummins / Ramp (April 30, 2026) — “We Tested Marketing Incentives to AI Agents” (builders.ramp.com/post/marketing-to-ai-agents) [Tier E] (COI: vendor self-published on builders.ramp.com; documented methodology; reads as counter-evidence to the Profound/Tan/Shihipar consensus). Ramp ran a three-variant test across ~50 marketing pages: pure Markdown, stripped semantic HTML, and schema-injected HTML. Their headline finding: “Markdown was the only format that reliably surfaced in LLM responses.” Schema-markup, which Ramp had expected to win (literally designed for machines), performed worst.
Targeting confound — important caveat to Ramp’s headline. The three Ramp variants did not share bot-targeting rules. Markdown was served broadly (Cloudflare “AI Assistant” category OR unverified bots with low bot scores); stripped HTML and schema were served only to verified “AI Search” or “AI Assistant” bots. Compounding the issue, Ramp’s own diagnostic Finding #1 documents that Cloudflare’s “AI Search” label does not include ChatGPT, Claude, or Perplexity — those three are classified as “AI Assistants.” A strict “AI Search”-only rule misses the three largest LLM platforms. Ramp acknowledges this directly: “Part of this may reflect a targeting issue.” The cleanest reading: Markdown served to a broader bot set produced more downstream LLM responses than HTML served to a narrower bot set. That is not the same as “Markdown is causally better at producing citations than HTML.”
Synthesis (honest). The four-source evidence base is methodologically heterogeneous: Tan (peer-reviewed, retrieval-modeling, HTML > plain text); Profound (clean A/B, no significant Markdown effect); Shihipar (Anthropic engineering, token-economy argument obsoleted by 1M-context-window era); Ramp (Markdown won, but targeting-confounded). Three of four data points argue against a strong “Markdown is the citation lever” causal claim. The fourth (Ramp) is the strongest pro-Markdown data point but is methodologically not directly comparable to Profound’s clean A/B. The framework’s current synthesis: Markdown serving is a plausible Gate-2 hygiene optimization (token-cost, parser-friendliness), but the citation-causal evidence is mixed, and Tan WWW 2025 retains the peer-reviewed status for the retrieval-modeling layer. Practitioners adopting Markdown content negotiation should expect at most marginal gains, not multipliers.
Practical implication for content owners. Content negotiation via Accept: text/markdown is at most a hygiene-tier optimization — useful for token-cost reduction in API-billed agentic crawls, possibly marginal lift in narrow bot populations (Ramp), but not load-bearing for Gate 3 pass rates per the controlled-A/B (Profound) and peer-reviewed retrieval-modeling evidence (Tan). Investments in HTML structural quality (heading hierarchy, lead-with-answer prose, machine-parseable lists/tables, inline statistics and source-citations per Aggarwal GEO operations) are the empirically more robust Gate-3 levers.
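For readers who want to see what Accept: text/markdown content negotiation involves mechanically, here is a minimal sketch of the server-side decision. It is a simplified illustration, not a complete implementation of HTTP content negotiation: media-type parameters other than q are ignored, and the names `OFFERED`, `parse_accept`, `quality_for`, and `choose_representation` are assumptions introduced here.

```python
# Representations this hypothetical server can produce, in server preference order.
OFFERED = ("text/html", "text/markdown")


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media, *params = [p.strip() for p in part.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media.lower(), q))
    return ranges


def quality_for(offered: str, ranges: list[tuple[str, float]]) -> float:
    """Return the q-value of the most specific range matching the offered type."""
    offered_type = offered.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media, q in ranges:
        if media == offered:
            specificity = 2
        elif media == f"{offered_type}/*":
            specificity = 1
        elif media == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_representation(accept_header: str) -> str:
    """Pick the offered type with the highest q; ties go to the server's first preference."""
    ranges = parse_accept(accept_header)
    if not ranges:
        return OFFERED[0]
    scored = [(quality_for(o, ranges), -i, o) for i, o in enumerate(OFFERED)]
    q, _, best = max(scored)
    return best if q > 0 else OFFERED[0]


print(choose_representation("text/markdown, text/html;q=0.9"))
print(choose_representation("text/html,application/xhtml+xml,*/*;q=0.8"))
```

Defaulting to HTML on ties and on empty headers is a deliberate choice in this sketch: it keeps the structurally richer representation as the baseline and serves Markdown only to clients that explicitly ask for it.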
Two related industry findings address the Schema-markup question that practitioners ask first: Search Atlas (Dec 2024, domain-level correlational) [Tier E] (COI: SEO-tool vendor self-published) found no correlation between schema coverage and LLM citation rate. Ahrefs [Tier E] (May 2026, 1,885 pages adding JSON-LD Aug 2025 – Mar 2026, page-level difference-in-differences) (COI: SEO-tool vendor self-published) found Google AI Overviews −4.6%, Google AI Mode +2.4% (n.s.), ChatGPT +2.2% (n.s.). Ramp’s Variant C (schema-heavy) (COI: vendor self-published) also underperformed Markdown and stripped HTML in their three-variant test. Three independent industry findings (all Tier E with disclosed COI) converge: schema markup is a hygiene factor, not a Gate-3 lever — and certainly not a remedy for a closed Gate 1 or Gate 2. Microsoft/Canel (SMX München March 2025, paraphrased via Schwartz, Search Engine Land 20 March 2025; cross-confirmed via David Mihm LinkedIn coverage) (third-party trade-press reporting of vendor statement) is the counter-example: Bing/Copilot uses schema for entity-graph signaling, which is a Gate-1 mechanism, not a Gate-3 lever.
Gate 4 — Consensus Pool & Pairwise Re-rank
Gate 4 (Consensus Pool) — the question this gate asks: Across the (typically 3–10) candidate documents that have survived Gate 3, do multiple of them agree, and is your document part of the agreeing set?
What “closed” looks like: Your URL is retrieved and is credibility-acceptable but lies off the consensus axis. The generator produces an answer dominated by the consensus and either omits your URL or cites it only as a contrast.
Gate 4 is where most of the heavy 2026 industry-published data lives, and where most of the noise around “generative engine optimization” originates. The framework’s job here is to separate the peer-reviewed mechanism (which is real and Tier A) from the vendor anecdote (which is real-but-COI-flagged).
Peer-reviewed mechanism (Tier A). Yang & Menczer (ACM WebSci 2025, DOI 10.1145/3717867.3717903; arXiv:2304.00228 v3, Feb 2025) [Tier A] — “Accuracy and Political Bias of News Source Credibility Ratings by Large Language Models” found that “LLMs exhibit a high level of agreement among themselves (average Spearman’s ρ = 0.79)” when rating news-source credibility across 9 LLMs and 3 providers. Schuster et al. (arXiv:2601.03746, Jan 2026) [Tier C] (preprint) — “Whose Facts Win? LLM Source Preferences under Knowledge Conflicts” — showed that multi-source agreement is the dominant signal for which sentence-level claim a generator decides to attribute. Naser (arXiv:2603.03299, March 2026) [Tier B; 69,557 citation instances across 10 commercial LLMs in four academic domains] found that “multi-model consensus (with more than 3 LLMs citing the same work) yields 95.6% accuracy, a 5.8-fold improvement” over baseline.
The Pairwise-Rerank sub-mechanism within G4 is documented by Google patent US20250124067A1 [Tier C] (published patent application, not yet granted) — Pairwise Ranking Prompting. King (iPullRank, May 2026) (COI: iPullRank Founder/CEO): “Your content is being compared head-to-head against every other surviving candidate. Most production stacks now use an LLM-as-judge cross-encoder for this step.”
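To make the head-to-head comparison concrete, the sketch below ranks candidates by round-robin win counting. This aggregation scheme is a swapped-in illustration, not the method claimed in the patent application or any vendor's production stack. The `judge` callable stands in for an LLM-as-judge call, and `toy_judge`, which simply prefers the passage containing more digits, is a hypothetical placeholder so the example runs offline.

```python
from itertools import combinations
from typing import Callable


def pairwise_rerank(candidates: list[str], judge: Callable[[str, str], str]) -> list[str]:
    """Rank candidates by wins across all head-to-head comparisons."""
    wins = {c: 0.0 for c in candidates}
    for a, b in combinations(candidates, 2):
        # Ask in both presentation orders to dampen position bias.
        first = judge(a, b)
        second = judge(b, a)
        if first == second:
            wins[first] += 1.0
        else:
            # The two orders disagree: score the pair as a tie.
            wins[a] += 0.5
            wins[b] += 0.5
    # sorted() is stable, so equal scores keep their original order.
    return sorted(candidates, key=lambda c: wins[c], reverse=True)


def toy_judge(a: str, b: str) -> str:
    """Hypothetical stand-in for an LLM judge: prefer the passage with more digits."""
    def digits(text: str) -> int:
        return sum(ch.isdigit() for ch in text)
    return a if digits(a) >= digits(b) else b


candidates = [
    "Our tool is great.",
    "Adoption grew 41% in 2025 across 1,200 teams.",
    "Rated 4.8 by users.",
]
ranked = pairwise_rerank(candidates, toy_judge)
print(ranked)
```

The number of judge calls grows quadratically with the candidate count, which is one reason a step like this is plausible only on the small surviving pool rather than on the full retrieval set.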
Industry-published trend evidence (Tier D, vendor-commissioned, COI-flagged). The Trustpilot/Seer Interactive March 2026 study (PR Newswire, 12 May 2026) (Trustpilot-commissioned, executed by Seer Interactive — methodology disclosed, COI explicit) reports that review-and-trust websites are “the second most cited source type, accounting for 14% of all citations in AI responses.” Direction triangulated by 5W Q1 2026 Citation Source Audit reporting a ~3× citation multiplier for brands across G2/Capterra/Trustpilot/Yelp; the absolute magnitudes remain provisional pending peer-reviewed replication.
This is #68 Reputational Authority acting at Gate 4. It is also one of the three places where #66 Network Authority acts — see the dual-assignment discussion in the Six-Authority Mapping section.
Gate 5 — Generation-Time Citation × Faithfulness (the two-axis gate)
Gate 5 is two-dimensional. A citation must both survive into the generated answer (Survival) and reflect causal use of the source rather than post-rationalization (Faithfulness) — Wallat et al. (ICTIR 2025) found up to 57% of citations are post-rationalized, which forced the second axis.
Questions the gate asks:
- Survival: Once an answer has been drafted, will the generator attach a citation marker to your URL — or to a different surviving candidate — or to nothing at all?
- Faithfulness: If a citation marker is attached, does it reflect causal use of the cited source — or is it post-rationalized?
Classical Five-Gate treatments model Gate 5 as one-dimensional (Survival only). The agentic-era framework here makes it two-dimensional. The second axis was forced by the peer-reviewed evidence that surfaced in Q4 2025 / Q1 2026.
The Faithfulness finding (Wallat et al., ICTIR 2025). Wallat, Heuss, de Rijke & Anand (ICTIR 2025, DOI 10.1145/3731120.3744592, arXiv:2412.18004; Best Paper Honorable Mention, ACM SIGIR-affiliated conference) established four desiderata for trustworthy citations: Correctness, Faithfulness, Appropriateness, Comprehensiveness. Verbatim faithfulness definition: “the model’s reliance on cited documents is genuine, reflecting actual reference use rather than superficial alignment with prior beliefs, which we call post-rationalization.” Experimental anchor: Cohere Command-R+ (104B parameters, 128k context, 4-bit quantized) on NaturalQuestions (1,444 questions, Top-5 BM25+ColBERTv2 retrieval): up to 57% of citations lack faithfulness in the relevant-but-uncited-document adversarial condition (273 of 476 recovery cases). At random-adversarial baseline the rate was 12% (116 of 936); for “cited for different reasons” the rate was 55% (290 of 525). Tier A; ACM Best Paper Honorable Mention verified via uva.nl/IRLab announcement (19 July 2025) and ACM conferences best-paper-awards listing.
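The three percentages quoted above follow directly from the case counts given in the same paragraph. A short arithmetic check, using only those counts (the dictionary keys and the `rate_percent` helper are labels introduced here):

```python
# Case counts as quoted in the paragraph above: (unfaithful citations, total cases).
WALLAT_COUNTS = {
    "relevant-but-uncited adversarial": (273, 476),
    "random-adversarial baseline": (116, 936),
    "cited for different reasons": (290, 525),
}


def rate_percent(hits: int, total: int) -> float:
    """Share of cases, in percent, rounded to one decimal place."""
    return round(100 * hits / total, 1)


rates = {name: rate_percent(hits, total) for name, (hits, total) in WALLAT_COUNTS.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

The rounded values (57.4%, 12.4%, 55.2%) agree with the 57% / 12% / 55% figures cited in the text.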
Million-token-era caveat. The Wallat experiment used Command-R+ with a 128K context window, the 2024-era standard. The conceptual finding — that Faithfulness is a separate axis from Survival, that post-rationalization is a measurable failure mode — is architectural and transfers to 2026-era models (Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro), all of which now run 1M-token windows (Anthropic Docs, May 2026; OpenAI GPT-5.5 release, 24 April 2026). The specific 57% magnitude on million-token-era models is an open empirical question; one mechanistic prediction is that post-rationalization rates drop as Copying Heads operate under less context-budget pressure, but this prediction has not yet been directly tested. Article 7’s cross-model calibration will include 1M-context-window models specifically to measure this.
The Survival trade-off (Saxena et al., arXiv:2509.21557 v2, Dec 2025). Saxena, Bommireddy, Padia & Gaur (arXiv preprint v2, submitted to the NeurIPS 2025 LLM Evaluation Workshop; workshop submission, accepted-papers list not externally verified as of 28 May 2026) introduced a formal distinction between two citation paradigms:
- G-Cite (Generation-Time Citation): model produces answer text and citation markers in a single decoding pass. Citation decisions are local — based only on what has been written so far and currently-attended evidence.
- P-Cite (Post-hoc Citation): model first drafts the answer, then a second pass adds or verifies citations across the complete draft.
Empirical trade-off (as reported in Table 2 ALCE and Table 4 FEVER of arXiv:2509.21557 v2):
| Benchmark / metric | G-Cite | P-Cite |
|---|---|---|
| ALCE: Coverage | 0.372 | **0.748** |
| ALCE: Citation Correctness | 0.205 | **0.422** |
| LongBench-Cite: Coverage | 0.65 | **0.78** |
| Human-Eval: Answer Correctness | 0.69 | **0.78** |
| Human-Eval: Citation Hallucination | 0.41 | **0.37** |
| FEVER: Citation Correctness | **0.937** | 0.75 |
| FEVER: Coverage | 0.272 | **0.74** |
| ALCE: Latency (s) | 17.237 | **6.077** (Zero-Shot: 2.925) |
Bold values indicate the better performer per row (higher is better, except for Citation Hallucination and Latency, where lower is better). Source: Saxena et al., arXiv:2509.21557 v2, Tables 2–4.
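Because the metrics in the table above do not all point the same way, it can help to make the direction of each metric explicit before reading off a winner. The sketch below transcribes the table values; the `higher_is_better` flags are an assumption of this sketch (hallucination and latency are treated as lower-is-better), and the Zero-Shot latency figure is left out because it is not a G-Cite/P-Cite comparison.

```python
# (metric, G-Cite, P-Cite, higher_is_better) transcribed from the table above.
ROWS = [
    ("ALCE Coverage", 0.372, 0.748, True),
    ("ALCE Citation Correctness", 0.205, 0.422, True),
    ("LongBench-Cite Coverage", 0.65, 0.78, True),
    ("Human-Eval Answer Correctness", 0.69, 0.78, True),
    ("Human-Eval Citation Hallucination", 0.41, 0.37, False),
    ("FEVER Citation Correctness", 0.937, 0.75, True),
    ("FEVER Coverage", 0.272, 0.74, True),
    ("ALCE Latency (s)", 17.237, 6.077, False),
]


def better(g_cite: float, p_cite: float, higher_is_better: bool) -> str:
    """Name the paradigm that performs better on a single metric."""
    if g_cite == p_cite:
        return "tie"
    g_wins = (g_cite > p_cite) == higher_is_better
    return "G-Cite" if g_wins else "P-Cite"


winners = {metric: better(g, p, hib) for metric, g, p, hib in ROWS}
for metric, winner in winners.items():
    print(f"{metric:36s} {winner}")
```

Under these direction assumptions, G-Cite leads only on FEVER Citation Correctness, which is consistent with the paper's recommendation to reserve it for precision-critical claim verification.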
Paper Finding 1 (verbatim): “On ALCE, the advanced P-Cite achieve 75% coverage with 42% correctness, substantially outperforming the advanced G-Cite which reaches 37% coverage and 21% correctness.” Headline recommendation: “We recommend a retrieval-centric, P-Cite-first approach for high-stakes applications, reserving G-Cite for precision-critical settings such as strict claim verification.”
Flag: arXiv:2509.21557 is Tier C (preprint, workshop submission, main-conference peer-review not applicable). Treat magnitudes as directionally robust but quantitatively provisional. Tier-A replication would change the recommendation’s confidence interval, not its direction.
Mechanistic anchor (Sun et al., ReDeEP). The two-axis structure of G5 has a neural correlate: Wallat-style Faithfulness failures correspond to over-active Knowledge FFNs (parametric over-injection), while Saxena-style Survival failures correspond to under-active Copying Heads (retrieved-context under-propagation). Sun et al. (ICLR 2025) thus provides the parametric-vs-retrieval substrate for both G5 axes.
Triangulation Gate 5 (Survival × Faithfulness): Wallat ICTIR 2025 (Tier A — Faithfulness axis) + Saxena arXiv:2509.21557 (Tier C — Survival trade-off across G-Cite/P-Cite) + Sun ReDeEP ICLR 2025 (Tier A — mechanistic substrate). Three independent methodologies. Triangulation met (with Saxena tier-caveat).
The Mapping: Six Authority Types in the Five-Gate × Two-Channel System
This is the heart of the framework. Each authority type from Article 4 maps onto a primary gate (the gate the authority type predominantly acts through) and, where evidence supports, secondary gates. Network Authority (#66) is the one deliberate exception: it acts at three gates simultaneously (dual-assignment finding).
| Authority Type | Primary | Secondary | Channel | Evidence anchor |
|---|---|---|---|---|
| #63 Entity | G1a | — | Parametric | Tier A — Algaba 2025; Sun 2025 |
| #64 Topical | G3 | G1b, G4 | Both | Tier A — Aggarwal 2024; Tan 2025 |
| #65 Content | G3 | G5 | Retrieval | Tier A / C — Aggarwal 2024; Saxena 2025 |
| #66 Network | G4 | G1a + G2 dual-assignment | Both | Tier A — Algaba 2025 (×2); Yang & Menczer 2025 |
| #67 Structural incl. Tool/Endpoint | G2 | G3 | Retrieval | Tier A + B — Tan 2025; Lumer 2025 |
| #68 Reputational | G4 | G5 | Retrieval | Tier A + D — Yang & Menczer 2025; Seer/Trustpilot 2026 (COI); 5W 2026 |
Gate labels: G1a = Resolution/Routing · G1b = Fan-Out Planning · G2 = Retrievability · G3 = Credibility Filter · G4 = Consensus Pool · G5 = Generation × Faithfulness.
The 6×2 Matrix
| Gate | Parametric Channel | Retrieval Channel |
|---|---|---|
| G1a Resolution/Routing | Entity #63, Network #66 | Network #66 (graph-recall features) |
| G1b Fan-Out Planning | (planner uses parametric anchors) | Topical #64 (sub-query coverage) |
| G2 Retrievability | (n/a — parametric is not “retrieved”) | Structural #67 incl. Tool/Endpoint sub-type, Network #66 |
| G3 Credibility Filter | Topical #64 (via training) | Topical #64, Content #65, Structural #67 |
| G4 Consensus Pool | (Network #66, indirect) | Network #66, Reputational #68 |
| G5 Generation × Faithfulness | (rare; mostly retrieval) | Content #65, Reputational #68 |
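For diagnostic tooling, the matrix above can be carried as a small lookup structure. The sketch below is one possible encoding, not a canonical schema: it transcribes only the direct cell entries, and the parenthetical cells (for example the indirect Network #66 entry in the G4 parametric cell) are deliberately omitted. The names `MATRIX`, `levers_for`, and `gates_for` are assumptions.

```python
# Gate -> channel -> authority types, transcribed from the 6x2 matrix above.
# Parenthetical (indirect or n/a) cells are encoded as empty sets.
MATRIX: dict[str, dict[str, set[str]]] = {
    "G1a": {"parametric": {"Entity #63", "Network #66"}, "retrieval": {"Network #66"}},
    "G1b": {"parametric": set(), "retrieval": {"Topical #64"}},
    "G2": {"parametric": set(), "retrieval": {"Structural #67", "Network #66"}},
    "G3": {
        "parametric": {"Topical #64"},
        "retrieval": {"Topical #64", "Content #65", "Structural #67"},
    },
    "G4": {"parametric": set(), "retrieval": {"Network #66", "Reputational #68"}},
    "G5": {"parametric": set(), "retrieval": {"Content #65", "Reputational #68"}},
}


def levers_for(gate: str, channel: str) -> set[str]:
    """Authority types that act directly at a gate on a given channel."""
    return MATRIX[gate][channel]


def gates_for(authority: str) -> list[str]:
    """Gates (in pipeline order) where an authority type appears on any channel."""
    return [
        gate
        for gate, channels in MATRIX.items()
        if any(authority in types for types in channels.values())
    ]


print(gates_for("Network #66"))
print(sorted(levers_for("G3", "retrieval")))
```

A lookup like `gates_for` turns a diagnosed closed gate into a shortlist of authority types to work on, and the reverse lookup shows which gates an investment in one authority type can plausibly move.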
Four Observations About This Mapping
- Network Authority is the only authority type that appears in three different cells. This is not a bug — it is the framework’s recognition that citation-graph signal is the only authority modality that operates simultaneously in training, retrieval, and consensus. Every other authority type has a primary gate and at most one secondary gate. The triangulation: Algaba et al. NAACL 2025 Findings (Tier A) + Algaba et al. arXiv:2504.02767 follow-up (Tier B — 274K samples, vendor-independent academic team) + Yang & Menczer arXiv:2304.00228 (Tier A). No single peer-reviewed paper tests all three gates simultaneously — the dual-assignment is a theoretically motivated reclassification supported by three independent peer-reviewed lines of evidence. This caveat appears explicitly in Honest Limitations §1 below.
- Gate 2 has no parametric column. This is structural: you cannot be “retrieved” by your own training weights; retrieval is by definition a query-time operation. Authority types that act on Gate 2 (#67 Structural including Tool/Endpoint sub-type, #66 Network’s retrieval-channel manifestation) must work through the live index or the live tool registry.
- Gates 4 and 5 are where vendor-published “AI citation studies” overwhelmingly cluster. This is because vendor research can measure citation outcomes (G5) and pool composition (G4) without instrumenting the model. It cannot easily measure G1a–G3 from outside, which is why almost all peer-reviewed mechanistic work concentrates on G1a–G3 (Sun, Pan, Wallat, Augenstein) and almost all industry research on outcomes concentrates on G4–G5 (Trustpilot, 5W, Ahrefs, Search Atlas). A separate methodology-side strand of industry research, exemplified by Graphite’s “Demystifying Randomness in AI” (Druck & Smith, 2026) [Tier E] (COI: Graphite sells AEO services), sits orthogonal to that classification: it does not measure citation outcomes but rather how visibility can be measured at all (Wilson-Score confidence intervals on n=10, Sequential Sampling reducing sample needs by 51%, API-vs-Logged-Out cosine similarity of 0.48). The framework cites Graphite as a methodology reference in the GCS construction, not as outcome evidence. Practitioners need all three literatures (peer-reviewed mechanistic, vendor outcome, and measurement-methodology) because they describe complementary parts of the same pipeline.
- Existing citation authority operates as a content-surfacing gatekeeper — independent of format. Grace Cummins / Ramp (“We Tested Marketing Incentives to AI Agents”, 30 April 2026) (COI: vendor self-published on builders.ramp.com; documented methodology, ~50 pages × 1,300+ bot visits × 32-day tracking) documented an empirically clean version of the framework’s #66 Network Authority claim. Their headline finding, separate from the format-variant question: “Pages with higher existing AI citation volumes were far more likely to surface our embedded content. Pages with low existing citation volume got zero incentive mentions, regardless of format.” Ramp names this “a concept of ‘agent trust’ that’s analogous to domain authority in traditional SEO, but the signals are different.” Operationally this is the dual-assignment of #66 Network Authority across G1a (resolution/routing) and G2 (retrievability) and G4 (consensus): a page that the model has already learned to trust is a page whose new content gets a probabilistic head-start at every gate it must pass. This is the strongest 2026 industry-side validation of the dual-assignment finding.
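The third observation mentions Wilson-score confidence intervals on n=10 samples. As a reminder of why small-sample visibility measurements need them, here is a minimal sketch of the standard Wilson score interval for a binomial proportion. It is the textbook formula, not Graphite's implementation, and the "3 citations in 10 runs" input is a made-up illustration.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical measurement: a brand is cited in 3 of 10 repeated prompt runs.
low, high = wilson_interval(3, 10)
print(f"observed 30%, plausible range {low:.1%} to {high:.1%}")
```

With only ten runs the interval spans roughly 11% to 60%, so a single ten-sample reading cannot distinguish a marginal citation presence from a dominant one.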
📌 Box — Case Study: Five Gates Opening Simultaneously (GummySearch, May 2026)
In the same week as the Reddit-fanout inflection at ChatGPT (around 8 May 2026 — see §G1b Box above), an obscure third-party site — GummySearch, originally a Reddit search-and-analytics tool — rose from 0.005% to 0.1% of all ChatGPT citations. GummySearch had stopped accepting new customers on 30 November 2025 (the creator failed to negotiate a Reddit API deal at $35k MRR), but its bot-accessible landing pages remained crawlable and indexed.
The landing pages (e.g. /best-clothing-brands-on-reddit/) exhibit a textbook multi-authority stack:
| Authority Type | How GummySearch landing pages satisfy it |
|---|---|
| #64 Topical | narrow deep topic (“Best clothing brands on Reddit”), 306 reviews from 45 subreddits |
| #65 Content | 10+ verbatim quotes per ranked brand, with original Reddit-user attribution and dates |
| #66 Network | piggyback on Reddit’s citation-graph position via on-top-of-Reddit data layer |
| #67 Structural | listicle #1–#6 ranking, “By Brand / By Product” toggle, machine-parseable |
| #68 Reputational | star ratings + user-quote provenance |
| #70 Platform Modifier | Reddit’s platform authority transferred via the on-top-of-Reddit layer |
All five Authority Types and the Platform Modifier light up simultaneously, without GummySearch having any of its own brand-, backlink-, or content-investment budget. The visibility lift is structural, not editorial.
Landwehr’s reverse-engineering (verbatim): “Almost everything on these landing pages is perfect for ChatGPT while it is in its ‘I Love Reddit’ phase.” Four mechanisms identified: listicle format; the term “reddit” mentioned 5+ times per page; subreddit names (r/BuyItForLife) prominent; every score backed by 10+ real-user quotes.
Diagnostic implication: Multi-Authority stacking on a single page is an achievable strategy, but per Lily Ray’s caveat (“might not work forever”), sustainability depends on a specific fanout configuration that can shift platform-side at any time. The lift is real; the moat is structurally fragile.
Source: Malte Landwehr (CPO/CMO, Peec AI), LinkedIn-Article 26 May 2026 — Tier E (vendor-self-published, Peec AI proprietary telemetry).
Cross-Cutting Modifiers
Four dimensions are not gates but bend gate-pass probabilities across multiple gates. This framework inherits the three Article-4 modifiers (#69 Temporal, #70 Platform, #71 Consensus) without renaming or restructuring, and adds one agentic-pipeline-specific modifier (#72 Reflection-Iteration). “Authority Density” and “Multimodal Surface” — sometimes proposed elsewhere as standalone modifiers — are not maintained as separate dimensions here; they are sub-concepts under #66 Network Authority and #67 Structural Authority respectively.
#69 Temporal Modifier (Freshness)
- Affects: G1b (planner may inject year-tokens into sub-queries), G2 (retrieval indices prefer recent content), G4 (consensus pools rotate), G5 (generation prefers fresh-dated citations).
- Evidence tier: A + D. The causal anchor is Yubo Fang et al. (SIGIR APIR 2025) [Tier A] — seven LLM models tested in a controlled experiment where only the date of identical passages was changed; texts with newer dates rose by up to 95 ranking positions; up to 25% of all relevance decisions flipped solely due to date changes. Industry corroboration: Trustpilot’s “3Rs” framework (Recency, Relevance, Ranking); Ahrefs 17M-citation study finding AI-cited content is 25.7% fresher than organic Google results; Qwairy’s finding that AI systems inject the current year into 28.1% of all sub-queries even when users don’t specify it.
- Mechanism is by design, not emergence: ChatGPT’s production configuration contains use_freshness_scoring_profile: true (Metehan Yesilyurt, October 2025 discovery via prompt-injection leak; Tier D).
- Operational path: Quarterly content refresh cycles for all key pages. Systematic updating of data points and year references. Content age monitoring as a KPI.
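Content age monitoring as a KPI can start very small. The sketch below flags pages whose last substantive update is older than a chosen threshold; the 90-day default mirrors the quarterly refresh cycle mentioned above but is otherwise an arbitrary assumption, and the URLs, dates, and the `stale_pages` name are hypothetical.

```python
from datetime import date


def stale_pages(
    last_updated: dict[str, date], today: date, max_age_days: int = 90
) -> list[tuple[str, int]]:
    """Return (url, age_in_days) for pages older than max_age_days, oldest first."""
    ages = {url: (today - updated).days for url, updated in last_updated.items()}
    overdue = [(url, age) for url, age in ages.items() if age > max_age_days]
    return sorted(overdue, key=lambda item: item[1], reverse=True)


# Hypothetical inventory of key pages and their last substantive update.
inventory = {
    "/pricing": date(2026, 1, 15),
    "/guide": date(2025, 11, 30),
    "/blog/launch": date(2026, 5, 1),
}
overdue = stale_pages(inventory, today=date(2026, 5, 30))
for url, age in overdue:
    print(f"{url}: {age} days since last update")
```

The dates should come from substantive revisions (updated data points, year references), not from cosmetic re-saves, otherwise the KPI measures publishing activity rather than freshness.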
#70 Platform Modifier (Inherited Trust)
- Affects: G1a (some platforms have privileged entity-resolution), G2 (some platforms have privileged bot access — see Bettinga LinkedIn case), G4 (platform-trust transfers to documents hosted on the platform).
- Evidence tier: D. Semrush 100M-citation study: Reddit appeared in ~60% of ChatGPT answers (before September 2025), Wikipedia at ~55%. Profound 680M-citation analysis: only 11% of domains are cited by both ChatGPT and Perplexity; only 7 of the top 50 domains appear across all three major platforms. Writesonic (2.4M domains): 67.4% of all cited domains appear on exactly one AI platform.
- Platform Authority is systemically unstable. Reddit citations on ChatGPT collapsed from ~60% to ~10% in September 2025; recovered from 2.6% to >11% on 8 May 2026 via the fanout-planning shift documented in §G1b. Same source, opposite directions, within nine months.
- Operational path: Multi-platform presence strategy prioritized by AI platform preferences. Monthly Cross-AI Coverage tracking. YouTube, Reddit, and LinkedIn as citation entry points, with the explicit caveat that LinkedIn’s robots.txt closes Gate 2 for most AI training and live-fetch crawlers while leaving the search-index crawlers open (see §G2 Bettinga Box A).
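The platform-overlap statistics cited above (for example, the share of domains that appear on exactly one AI platform) can be recomputed on one's own tracking data with a few lines. The sketch below assumes the citation logs have already been reduced to one set of cited domains per platform; the platform names, the domains, and the `platform_spread` name are placeholders.

```python
from collections import Counter


def platform_spread(citations: dict[str, set[str]]) -> dict[int, float]:
    """Map k -> share of cited domains that appear on exactly k platforms."""
    per_domain = Counter()
    for domains in citations.values():
        per_domain.update(domains)  # each platform counts a domain at most once
    total = len(per_domain)
    if total == 0:
        return {}
    spread = Counter(per_domain.values())
    return {k: spread[k] / total for k in sorted(spread)}


# Hypothetical monthly snapshot: cited domains per platform.
snapshot = {
    "chatgpt": {"a.com", "b.com", "c.com"},
    "perplexity": {"b.com", "d.com"},
    "google_ai_mode": {"b.com", "c.com", "e.com"},
}
shares = platform_spread(snapshot)
print(shares)
```

Tracking the share for k=1 month over month gives a simple Cross-AI Coverage signal: a rising value means the citation pools are drifting apart, which raises the cost of a single-platform strategy.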
#71 Consensus Modifier (Cross-Source Corroboration)
- Affects: G4 (definitionally), G5 (citation attachment converges on consensus sources).
- Evidence tier: A. Yang & Menczer arXiv:2304.00228 — Spearman ρ = 0.79 cross-LLM agreement; Schuster et al. arXiv:2601.03746 — multi-source agreement as dominant attribution signal; Naser arXiv:2603.03299 — multi-model consensus yields 95.6% citation accuracy, a 5.8-fold improvement.
- Limitations: Consensus is a property of the pool, not of your document. You can write the most accurate document in the world and still fail G4 if it lies off the consensus axis. The remedy is not to write more; it is to seed consensus (third-party coverage, citations from already-consensus sources, structured presence on consensus platforms — Wikipedia, Reddit, G2, Trustpilot, YouTube).
#72 Reflection-Iteration Modifier (Agentic-Specific)
- Affects: G1b (re-planning), G2 (re-retrieval), G3 (re-filtering), G5 (re-citation). Operates across the whole agentic loop.
- Evidence tier: A + B + C. Asai et al. Self-RAG (ICLR 2024 Oral): the model emits reflection tokens (Retrieve / IsRel / IsSup / IsUse) and decides on the fly whether to re-retrieve. Singh Survey §3 (arXiv:2501.09136 v4) catalogues Reflection as one of four Agentic Design Patterns. Wu et al. HiPRAG (arXiv:2510.07794) [Tier C; preprint, self-declared “under review”, venue unconfirmed] reports a reduction in over-retrieval from 27% to 2.3% and a 29% under-retrieval floor, with overall accuracy of 65.4% (in-domain) / 67.2% (out-of-domain) on a process-level reward-shaped pipeline.
- Mechanism: The agent’s critic emits a re-retrieval signal when the candidate-set quality falls below a threshold. Each iteration is an opportunity for your content to re-enter the candidate set — and an opportunity for it to be filtered out again.
- Why a modifier and not a gate: Reflection-Iteration is not a separate decision point in the pipeline; it is a count of how many times the existing Gates 1b–5 are re-traversed. A brand can be invisible at iteration 1 and become visible at iteration 3 (or vice versa). The modifier captures iteration-stability rather than single-pass survival.
- Operational path: Audit candidate-set membership across iterations, not only at iteration 1. Brands whose visibility is iteration-1-only are structurally fragile to changes in critic thresholds.
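The iteration audit described in the operational path can be sketched as follows. It assumes you can log the candidate set at each reflection iteration (which most hosted platforms do not expose, so this applies mainly to self-built or introspectable pipelines); the function names and URLs are illustrative.

```python
def membership_by_iteration(url: str, candidate_sets: list[set[str]]) -> list[bool]:
    """For each iteration's candidate set, record whether `url` is present."""
    return [url in candidates for candidates in candidate_sets]


def iteration_stability(membership: list[bool]) -> float:
    """Share of iterations in which the URL stayed in the candidate set."""
    return sum(membership) / len(membership) if membership else 0.0


def is_iteration1_only(membership: list[bool]) -> bool:
    """True when the URL is present at iteration 1 and absent afterwards."""
    return bool(membership) and membership[0] and not any(membership[1:])


# Hypothetical log of three iterations for one prompt.
RUNS = [
    {"ours.example/a", "x.example/1"},
    {"x.example/1", "y.example/2"},
    {"y.example/2"},
]
flags = membership_by_iteration("ours.example/a", RUNS)
print(flags, iteration_stability(flags), is_iteration1_only(flags))
```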
Triangulation #72 Reflection-Iteration: Asai Self-RAG (Tier A) + Singh §3 (Tier C) + HiPRAG (Tier C) + King “Beyond RAG” (Tier E, COI: iPullRank). Triangulation met.
Production Reality: Single-LLM-Multi-Prompt, Not Multi-Agent
An important course-correction is required for any practitioner reading this article: the production architecture of agentic AI search in 2026 is not a constellation of communicating specialized agents. It is, overwhelmingly, a single large language model running tight loops with different prompts at each stage, plus tool calling.
Three independent sources triangulate this finding:
- Mike King (“Beyond RAG”, iPullRank, May 2026) (COI: iPullRank Founder/CEO): “Most production systems are not literal multi-agent constellations. They are a single LLM running tight loops with different prompts at each stage, plus tool calling. The ‘multi-agent’ framing is a presentation layer, not the underlying architecture.”
- Anthropic, “Building effective agents” (Schluntz & Zhang, anthropic.com/research/building-effective-agents, 19 December 2024) [Tier E]: “Consistently, the most successful implementations weren’t using complex frameworks or specialized libraries. Instead, they were building with simple, composable patterns.” And: “For many applications, however, optimizing single LLM calls with retrieval and in-context examples is usually enough.”
- Singh Survey §3.4 (arXiv:2501.09136 v4, April 2026): “While multi-agent collaboration offers significant potential, it is a less predictable design pattern compared to more mature workflows like Reflection and Tool Use.”
Singh’s full taxonomy in v4 deserves precise restatement, because earlier framings often simplified it: two macro-classes of Agentic RAG (Single-Agent, Multi-Agent), six concrete architecture patterns within them (Router, Multi-Agent Collaboration, Hierarchical, Corrective, Adaptive, Graph-Based), plus four cross-cutting Agentic Design Patterns (Reflection, Planning, Tool Use, Multi-Agent Collaboration — the last appears in both layers, a Singh-specific convention). For agentic-RAG diagnosis, the macro-class is “Single-Agent” for virtually all production deployments observed in 2026, with Reflection and Tool Use as the dominant cross-cutting patterns and Multi-Agent Collaboration as the least predictable.
Three sectors triangulate: practitioner-vendor (King) + vendor-engineering (Anthropic) + academic-survey (Singh §3.4). Triangulation met.
Practical implication. Diagnostic effort that targets imagined multi-agent architectures (e.g., “which agent rejected my brand?”) is wasted. The correct unit of diagnosis is the prompt-stage within a single-LLM loop: G1a-prompt, G1b-prompt, G2-retrieval-prompt, G3-rerank-prompt, G4-consensus-prompt, G5-citation-prompt. The Triage Protocol below operationalizes this stage-by-stage diagnosis.
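To make the "prompt-stage within a single-LLM loop" unit concrete, here is a deliberately simplified sketch. The prompt templates, the `llm` callable, and the stub model are all hypothetical; no vendor's pipeline is known to look exactly like this. The point is only that one model is called repeatedly with a different prompt per stage, and that a per-stage trace is what a diagnosis would inspect.

```python
from typing import Callable

# Hypothetical prompt templates, one per stage named in the text.
STAGES = [
    ("G1a", "Resolve the entities in this query: {state}"),
    ("G1b", "Plan fan-out sub-queries for: {state}"),
    ("G2", "List candidate sources for: {state}"),
    ("G3", "Rerank these sources by credibility: {state}"),
    ("G4", "Keep claims corroborated across sources: {state}"),
    ("G5", "Write the answer with citations from: {state}"),
]


def run_single_llm_loop(query: str, llm: Callable[[str], str]) -> list[tuple[str, str]]:
    """Feed each stage's output into the next stage's prompt; return the trace."""
    state, trace = query, []
    for gate, template in STAGES:
        state = llm(template.format(state=state))
        trace.append((gate, state))
    return trace


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a short tag instead of text."""
    return f"[{len(prompt)} chars seen]"


for gate, output in run_single_llm_loop("best crm for startups", stub_llm):
    print(gate, output)
```

A real system would add tool calls and a reflection branch that jumps back to an earlier stage; both are omitted here to keep the stage-by-stage structure visible.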
Operationalization — The Generative Citation Score (GCS)
The Generative Citation Score (GCS) is a six-dimensional, Wilson-bounded metric — one dimension per gate-component (SubQueryCov, RetrievalToCit, RefSurvival, Faithfulness, ToolInclusion, BridgeCentrality) — generalizing the classical one-dimensional gate-closure score. The construction methodology follows Aggarwal et al. (GEO: Generative Engine Optimization, KDD ’24, DOI 10.1145/3637528.3671900), who established the legitimacy of user-defined visibility metrics for generative engines.
📌 The Six-Dimensional GCS — Definition
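For each dimension $d \in \{\text{SubQueryCov}, \text{RetrievalToCit}, \text{RefSurvival}, \text{Faithfulness}, \text{ToolInclusion}, \text{BridgeCentrality}\}$, let $n_d$ = sample size and $k_d$ = observed positive events.
Step 1 — Point estimate:
$$\hat{p}_d = \frac{k_d}{n_d}$$
Step 2 — Wilson lower bound (95% CI):
$$\mathrm{WilsonLower}(\hat{p}_d) = \frac{\hat{p}_d + \dfrac{z^2}{2n_d} - z\sqrt{\dfrac{\hat{p}_d(1-\hat{p}_d)}{n_d} + \dfrac{z^2}{4n_d^2}}}{1 + \dfrac{z^2}{n_d}}$$
Step 3 — Per-dimension GCS (closeness):
$$\mathrm{GCS}_d = 1 - \mathrm{WilsonLower}(\hat{p}_d)$$
(high = gate closed; low = gate open)
Step 4 — Composite GCS total:
$$\mathrm{GCS}_{\text{total}} = \mathbf{w} \cdot \left[\, \mathrm{GCS}_{\text{SubQ}},\ \mathrm{GCS}_{\text{RtoC}},\ \mathrm{GCS}_{\text{RefS}},\ \mathrm{GCS}_{\text{Faith}},\ \mathrm{GCS}_{\text{Tool}},\ \mathrm{GCS}_{\text{Bridge}} \,\right]^{T}$$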
where z = 1.96 for a 95% confidence interval, and w is a six-component weight vector. The default weight vector is deliberately unset. Empirical calibration is an open task scheduled for Article 7.
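Steps 1 to 4 translate fairly directly into code. The sketch below is one possible reading, not a reference implementation: the counts are made up, and the equal weights are purely a placeholder, since the weight vector is deliberately unset in this article.

```python
import math

Z_95 = 1.96  # z-value for a 95% confidence interval, as in the definition

DIMENSIONS = ["SubQueryCov", "RetrievalToCit", "RefSurvival",
              "Faithfulness", "ToolInclusion", "BridgeCentrality"]


def wilson_lower(k: int, n: int, z: float = Z_95) -> float:
    """Step 2: lower bound of the Wilson score interval for k positives in n trials."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p_hat = k / n  # Step 1: point estimate
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


def gcs_dimension(k: int, n: int) -> float:
    """Step 3: closeness score (high = gate closed, low = gate open)."""
    return 1.0 - wilson_lower(k, n)


def gcs_total(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Step 4: weighted composite over the six dimensions."""
    return sum(weights[d] * scores[d] for d in DIMENSIONS)


# Made-up counts (k positives out of n = 200 prompts), for illustration only.
COUNTS = {"SubQueryCov": 120, "RetrievalToCit": 40, "RefSurvival": 90,
          "Faithfulness": 70, "ToolInclusion": 10, "BridgeCentrality": 30}
scores = {d: gcs_dimension(COUNTS[d], 200) for d in DIMENSIONS}

# Placeholder only: equal weights are an assumption, not the article's calibration.
EQUAL_WEIGHTS = {d: 1 / 6 for d in DIMENSIONS}
print({d: round(s, 3) for d, s in scores.items()})
print(round(gcs_total(scores, EQUAL_WEIGHTS), 3))
```

Because the Wilson lower bound is conservative, a small sample pushes each per-dimension GCS toward "closed"; that is the intended behavior of the construction, not a bug in the sketch.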
Dimension-Level Tier-A Anchors (per-dimension three-source triangulation)
| GCS Component | What it measures | Tier-A Anchor |
|---|---|---|
| SubQueryCov | Share of fanout sub-queries semantically matching content | Jeong Adaptive-RAG NAACL 2024; Trivedi IRCoT ACL 2023 |
| RetrievalToCit | Retrieve → cite conversion rate | ALCE (Gao et al., arXiv:2305.14627, ACL 2023) |
| RefSurvival | Reference survival across iteration cycles | Asai Self-RAG ICLR 2024 (Oral) |
| Faithfulness | Causal vs. post-rationalized citation | Wallat ICTIR 2025 (Best Paper Honorable Mention); Sun ReDeEP ICLR 2025 |
| ToolInclusion | Brand endpoint inclusion in tool registry | Schick Toolformer NeurIPS 2023; Lumer ScaleMCP (Tier E, COI: PwC) |
| BridgeCentrality | Citation-graph betweenness as bridge node | Algaba NAACL 2025 (Network Authority); multi-hop QA literature |
Sampling Parameters
- Minimum n = 200 prompts per dimension per content item under diagnosis
- Sampling window ≥ 9 days to absorb day-of-week and index-refresh variance (Sielinski R., arXiv:2603.08924, March 2026) [Tier C]
- Bootstrap 95% CI on top of the Wilson interval when n is small (200 ≤ n < 500)
- Flag the diagnostic as inconclusive when the bootstrap CI of the test item overlaps the control item’s CI by > 50%
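The bootstrap and overlap rules above could be implemented along these lines. Two assumptions are worth flagging: the bootstrap is a plain percentile bootstrap over binary outcomes, and "overlaps by > 50%" is interpreted as the overlapping length divided by the test item's CI width. The text does not pin down either choice.

```python
import random


def bootstrap_ci(outcomes: list[int], n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap CI for the mean of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    means = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = max(lo_idx, int((1 - alpha / 2) * n_resamples) - 1)
    return means[lo_idx], means[hi_idx]


def overlap_fraction(test_ci: tuple[float, float],
                     control_ci: tuple[float, float]) -> float:
    """Overlap length relative to the test CI's width (an assumed definition)."""
    lo = max(test_ci[0], control_ci[0])
    hi = min(test_ci[1], control_ci[1])
    width = test_ci[1] - test_ci[0]
    if width <= 0:  # degenerate interval: either inside the control CI or not
        return 1.0 if lo <= hi else 0.0
    return max(0.0, hi - lo) / width


def is_inconclusive(test_ci, control_ci, threshold: float = 0.5) -> bool:
    """Flag the diagnostic when the CIs overlap by more than `threshold`."""
    return overlap_fraction(test_ci, control_ci) > threshold


# Made-up outcomes: 60 positives out of 200 prompts.
OUTCOMES = [1] * 60 + [0] * 140
print(bootstrap_ci(OUTCOMES))
```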
Triangulation for GCS construction methodology: King 6 Operations Metrics (iPullRank, May 2026) (COI: iPullRank Founder/CEO — used as practitioner-side anchor, not as load-bearing evidence) + Wilson Score (DAE-internal, derived from Wilson 1927; Brown, Cai & DasGupta 2001; Cao arXiv:1809.07694) + Aggarwal GEO KDD 2024 (Tier A; user-defined visibility metric framework). Three sectors: practitioner-operations, statistical-method, peer-reviewed-methodology. Triangulation met for the construction-method choice; the load-bearing peer-reviewed anchor is Aggarwal KDD 2024 (Tier A) plus the Wilson-statistics-methodology layer.
Empirical validation of the Wilson-Score approach in practice: Druck & Smith / Graphite (“Demystifying Randomness in AI”, 2026) (COI: Graphite is an AEO agency selling visibility-measurement services) applied Wilson-Score binomial confidence intervals plus Sequential Sampling to over 200,000 LLM responses across gpt-5.2-chat-latest, ChatGPT-Logged-Out, and Gemini-Logged-Out conditions on 200 entity-comparison prompts × 400 responses. Their empirical findings on the Wilson-Score-plus-Sequential-Sampling workflow: Visibility is estimable with n=10 at Mean Absolute Error ~5.6% across entities; Sequential Sampling reduces required responses from a fixed 60 to an average of 29.4 (a 51% efficiency gain) without loss of CI tightness; the median ratio of observed-to-expected variance is ~1.02, confirming that independent API calls are statistically independent draws from the same distribution. The Graphite paper is not peer-reviewed and the author team has commercial COI (the firm sells AEO measurement), but the methodology is statistically sound — Bowyer, Aitchison & Ivanova (ICML 2025) explicitly validate Wilson-Score intervals as the recommended approach for binary LLM evaluations at small-n.
Two operational consequences for GCS users:
- n=10 per dimension is sufficient for diagnostic GCS reads at MAE ~5–10%, dramatically lowering the measurement-cost barrier.
- The API-vs-Logged-Out cosine similarity of 0.48 reported by Graphite means GCS measured purely on API calls is at best a parallel-reality estimate of the user-facing reality — practitioners running GCS should report which condition they sampled and ideally sample both.
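For readers unfamiliar with sequential sampling, the generic idea is to keep drawing responses until the Wilson interval is tight enough, instead of committing to a fixed sample size. The sketch below illustrates that idea only; the stopping rule, the 0.25 target width, and the `draw` callable are assumptions, and Graphite's actual procedure is not reproduced here.

```python
import itertools
import math
from typing import Callable


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Two-sided Wilson score interval for k positives in n trials."""
    p_hat = k / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def sequential_sample(draw: Callable[[], bool], min_n: int = 10, max_n: int = 60,
                      target_width: float = 0.25) -> tuple[int, int]:
    """Draw one response at a time; stop once the interval is narrow enough.

    `draw` is a hypothetical callable that runs one prompt and returns True
    when the brand appears in the response. Returns (positives, samples used).
    """
    k = 0
    for n in range(1, max_n + 1):
        k += 1 if draw() else 0
        if n >= min_n:
            lo, hi = wilson_interval(k, n)
            if hi - lo <= target_width:
                return k, n
    return k, max_n


# Toy source: the brand appears in every second response.
_flips = itertools.cycle([True, False])
print(sequential_sample(lambda: next(_flips)))
```

Entities with visibility near 0% or 100% stop early under this rule, while those near 50% need the most samples, which is where an average saving over a fixed budget would come from.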
What GCS does and does not do. GCS measures whether a gate dimension is closed and with what confidence. It does not, by itself, tell you why. The Six-Step Triage Protocol below uses GCS as input.
The Diagnostic Tool — Six-Step Triage Protocol
The classical Five-Gate protocol covers five steps; this framework’s six-step version adds Faithfulness (Step 6) and refactors Steps 3–4 to reflect the G1a/G1b split.
Step 1 — Establish Baseline + GCStotal
Build a prompt set of n ≥ 200 prompts representative of your buyer’s journey (informational, comparative, transactional, branded). Sample over ≥ 9 days. Run the same set across at least four platforms (ChatGPT, Gemini, Perplexity, Google AI Mode). Compute GCS per dimension. If GCStotal < 0.3 with tight CI, you have no citation problem at the level you’re measuring. If GCStotal > 0.7 with tight CI, you have a problem and the next five steps localize it.
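A baseline run needs a sampling plan before any scoring happens. The following sketch shows one way to lay out the constraints from Step 1 (at least 200 prompts, all four intent types, at least 9 days, four platforms). The round-robin day assignment and the placeholder prompts are assumptions for illustration.

```python
from collections import Counter

INTENTS = ("informational", "comparative", "transactional", "branded")
PLATFORMS = ("ChatGPT", "Gemini", "Perplexity", "Google AI Mode")
MIN_PROMPTS, MIN_DAYS = 200, 9


def build_sampling_plan(prompts: list[tuple[str, str]],
                        days: int = MIN_DAYS) -> list[dict]:
    """Spread (intent, text) prompts round-robin over `days`; run each on every platform."""
    if len(prompts) < MIN_PROMPTS or days < MIN_DAYS:
        raise ValueError(f"need >= {MIN_PROMPTS} prompts over >= {MIN_DAYS} days")
    missing = set(INTENTS) - {intent for intent, _ in prompts}
    if missing:
        raise ValueError(f"no prompts for intents: {sorted(missing)}")
    return [
        {"day": i % days + 1, "platform": platform, "intent": intent, "prompt": text}
        for i, (intent, text) in enumerate(prompts)
        for platform in PLATFORMS
    ]


# Placeholder prompt set: 50 prompts per intent.
PROMPTS = [(intent, f"{intent} prompt #{i}") for intent in INTENTS for i in range(50)]
plan = build_sampling_plan(PROMPTS)
print(len(plan), sorted(Counter(row["day"] for row in plan).items()))
```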
Step 2 — Sub-Query Coverage Audit (G1b)
For each prompt in your set, inspect (where the platform exposes it) the planner’s fanout sub-queries. In platforms that do not expose the fanout (the majority), use Lily-Ray-style reverse engineering: prompt the platform multiple times with controlled variations of the surface query and observe which sub-topic angles produce citations. If your content matches the surface query but no sub-query, G1b is closed. Remediation: expand topical coverage (multiple semantically distinct pages per topic cluster); see #64 Topical Authority.
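Once you have a list of observed or inferred sub-queries, the coverage count for this step can be sketched as below. Term overlap is used here as a crude stand-in for semantic matching (an embedding-based similarity would be the more realistic choice), and the 0.5 threshold, page texts, and sub-queries are all illustrative assumptions.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(sub_query: str, page_text: str, min_overlap: float = 0.5) -> bool:
    """True when at least `min_overlap` of the sub-query's terms occur on the page."""
    q = tokens(sub_query)
    return bool(q) and len(q & tokens(page_text)) / len(q) >= min_overlap


def sub_query_coverage(sub_queries: list[str], pages: list[str]) -> tuple[int, int]:
    """Return (k, n): sub-queries matched by at least one page, and total sub-queries."""
    k = sum(any(matches(sq, page) for page in pages) for sq in sub_queries)
    return k, len(sub_queries)


# Hypothetical content and fan-out sub-queries.
PAGE_TEXTS = [
    "Best audience research tools for Reddit, compared by price",
    "How to export Reddit threads to CSV",
]
SUB_QUERIES = [
    "reddit audience research tools",
    "crm software pricing 2026",
    "export reddit threads csv",
    "social listening alternatives",
]
print(sub_query_coverage(SUB_QUERIES, PAGE_TEXTS))
```

The resulting (k, n) pair is the input the SubQueryCov dimension expects.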
Step 3 — Anchoring Probe (G1a)
Prompt the model directly: “List the top 10 brands in [your category]. Do not search the web.” If your brand appears so rarely that GCS(G1a) > 0.5 (recall that a high GCS means a closed gate), Gate 1a is closed. Remediation lives in Entity Authority (#63) and Network Authority (#66) work: Wikipedia/Wikidata, structured entity data, cross-domain co-occurrence.
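Scoring the anchoring probe comes down to counting how often the brand shows up across repeated responses. A minimal sketch, assuming you have already collected the raw response texts (the brand names and responses below are invented):

```python
import re

PROBE_TEMPLATE = "List the top 10 brands in {category}. Do not search the web."


def build_probe(category: str) -> str:
    """Fill the probe wording from Step 3 with a concrete category."""
    return PROBE_TEMPLATE.format(category=category)


def brand_mentioned(response: str, brand: str) -> bool:
    """Case-insensitive whole-word match; brands ending in symbols need a custom rule."""
    return re.search(rf"\b{re.escape(brand)}\b", response, flags=re.IGNORECASE) is not None


def anchoring_counts(responses: list[str], brand: str) -> tuple[int, int]:
    """Return (k, n): responses naming the brand, and total responses."""
    return sum(brand_mentioned(r, brand) for r in responses), len(responses)


# Invented responses to the same probe, sampled four times.
RESPONSES = [
    "1. Acme 2. Globex 3. Initech",
    "Top picks: Globex, Initech",
    "acme leads, then Globex",
    "Acmeology and Globex",
]
print(build_probe("project management software"))
print(anchoring_counts(RESPONSES, "Acme"))
```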
Step 4 — Retrievability Probe (G2)
Run the same prompts in a system you can introspect (Perplexity with source-list, or a self-built RAG using Bing/Google APIs, or a self-built MCP-enabled agent). Are your URLs (or your tool endpoints) in the candidate set at all? If not — Gate 2 is closed. Remediation is technical SEO, llms.txt, bot-policy review, and indexability — not content marketing. For tool-substrate brands: audit MCP registry presence, OpenAPI discoverability, and function-call schema completeness.
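The bot-policy part of this step can be checked offline with Python's standard-library robots.txt parser. The robots.txt content below is a hypothetical example, not any real site's policy; it shows the difference between a fully blocked crawler, a path-restricted one, and one that falls through to the default rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /private/

User-agent: *
Allow: /
"""


def bot_access(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Report, per user agent, whether the policy allows fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


AGENTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot"]
print(bot_access(ROBOTS_TXT, "https://example.com/blog/post", AGENTS))
print(bot_access(ROBOTS_TXT, "https://example.com/private/draft", AGENTS))
```

Note that robots.txt only expresses policy. Whether a given crawler honors it, and whether the page is actually indexed, still has to be verified in the candidate set itself.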
Step 5 — Credibility + Consensus Differential (G3 / G4)
When your URL is in the candidate set but does not appear in the answer:
- If competitor URLs appear in similar candidate sets and are cited, while yours is not → Gate 3 (credibility) is closed. Remediation: Structural Authority (#67) — schema, citations, structure — plus Content Authority (#65) — extractability, lead-with-answers, fact density. Aggarwal-GEO operations: +41% Statistics Addition, +115% Cite-Sources Addition, +28% Quotation Addition on position-5 content.
- If competitor URLs appear in the answer but draw from a different cluster of co-cited sources → Gate 4 (consensus) is closed. Remediation: Reputational (#68) + Network (#66) — third-party coverage, presence on consensus platforms (Wikipedia, Reddit, G2, Trustpilot, YouTube).
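The two branches above can be expressed as a small decision function. This is a sketch under a strong assumption: "a different cluster of co-cited sources" is operationalized as low Jaccard overlap between the sources that co-occur with your URL and the sources cited alongside the competitor, with a 0.3 threshold chosen arbitrarily for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap in [0, 1]; defined as 0 when both sets are empty."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def diagnose_step5(in_candidate_set: bool, we_are_cited: bool, competitor_cited: bool,
                   our_cocited: set[str], competitor_cocited: set[str],
                   same_cluster_threshold: float = 0.3) -> str:
    """Localize a non-citation to G3 or G4, following the Step 5 differential."""
    if not in_candidate_set:
        return "G2: return to Step 4"
    if we_are_cited:
        return "G3/G4 open"
    if not competitor_cited:
        return "inconclusive: no cited competitor to compare against"
    if jaccard(our_cocited, competitor_cocited) < same_cluster_threshold:
        return "G4 closed (consensus)"
    return "G3 closed (credibility)"


# Hypothetical co-citation neighbourhoods.
OURS = {"vendor-blog.example", "press.example"}
THEIRS = {"wikipedia.org", "reddit.com", "g2.com"}
print(diagnose_step5(True, False, True, OURS, THEIRS))
```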
Step 6 — Faithfulness Check (G5)
Even when your URL does appear in the answer with a citation marker, audit whether the citation is causal or post-rationalized. Use the Wallat-style adversarial probe: replace your URL in the retrieval context with a semantically similar but distinct URL, regenerate, and check whether the citation marker moves. If the model still attributes the claim to your URL despite the URL being absent from the retrieval context, the citation is post-rationalized — your brand is technically “cited” but functionally not driving the answer.
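The swap probe requires a pipeline where you control the retrieval context, so the sketch below takes a hypothetical `generate` callable (retrieval-context URLs in, cited URLs out) rather than calling any real API. The two stub generators exist only to show the two outcomes the probe distinguishes.

```python
from typing import Callable

# Hypothetical interface: retrieval-context URLs in, set of cited URLs out.
Generate = Callable[[list[str]], set[str]]


def faithfulness_probe(generate: Generate, context: list[str],
                       our_url: str, decoy_url: str) -> str:
    """Swap our URL for a similar decoy and see whether the citation follows the context."""
    if our_url not in generate(context):
        return "not_cited"
    swapped = [decoy_url if url == our_url else url for url in context]
    if our_url in generate(swapped):
        # Still cited although absent from the context: post-rationalized.
        return "post_rationalized"
    return "consistent_with_causal"


def grounded_stub(context: list[str]) -> set[str]:
    """Toy generator that only cites what is in its context (first two URLs)."""
    return set(context[:2])


def parametric_stub(context: list[str]) -> set[str]:
    """Toy generator that cites our URL from memory, regardless of context."""
    return {"ours.example/report"}


CONTEXT = ["ours.example/report", "other.example/a", "third.example/b"]
for stub in (grounded_stub, parametric_stub):
    print(stub.__name__,
          faithfulness_probe(stub, CONTEXT, "ours.example/report", "decoy.example/report"))
```

A single swap is weak evidence either way; in practice the probe would be repeated across prompts and the outcome rate fed into the Faithfulness dimension.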
Investment Priority Matrix
| Closed Gate | First investment (Tier-A-grounded) | Second investment | Avoid |
|---|---|---|---|
| G1a (Anchoring) | Wikipedia/Wikidata + entity disambiguation | Citation-network seeding (#66) | Schema-only campaigns (Search Atlas + Ahrefs null results) |
| G1b (Fan-Out) | Topical coverage breadth — multiple pages per sub-topic | Anticipate planner’s likely sub-queries | Single “canonical” pages without sub-topic structure |
| G2 (Retrievability) | Indexability, bot policy, llms.txt, technical SEO | MCP/OpenAPI endpoint exposure for tool-substrate | More on-domain prose content |
| G3 (Credibility) | Restructure for extractability (Aggarwal +41% / +115%) | Sourced statistics + quotations in content | Style-only rewrites |
| G4 (Consensus) | Earned third-party presence on consensus platforms | Reviews / review-platform profile build | Brand-controlled “thought leadership” only |
| G5 Survival (Citation) | Use systems that prefer P-Cite (most production) | Multi-model consensus (Naser 95.6%) | One-platform optimization |
| G5 Faithfulness | Make citations causally necessary (unique data the model cannot post-rationalize from priors) | Audit prompt patterns where competitors get post-rationalized citations | Bulk citations of widely-available facts |
Architectural Variations
The Five Gates × Two Channels are universal in shape but their strictness varies across architectures.
- Pure-RAG systems (Perplexity, ChatGPT Search with web tools enabled, Google AI Mode): G1b–G5 dominate; G1a is partially bypassed by aggressive retrieval. Network Authority’s G2 expression is highest here. Tool-substrate negligible.
- Pure-parametric systems (raw GPT-4o without web tools, Claude without document upload): G1a dominates; G1b–G4 are skipped; G5 simplifies to “does the model produce a verifiable URL or hallucinate one?” Naser’s audit found hallucination rates spanning a fivefold range, 11.4–56.8%, depending on model and domain.
- Hybrid systems (GPT-4o + light retrieval, Gemini Deep Research, Claude Projects with Web Search): All five gates active. Most production behavior in 2026 falls here.
- Agentic systems (autonomous deep-research agents, MCP-enabled assistants, Cowork-class file-and-task automation): G1a–G5 are iterated under the Reflection-Iteration Modifier (#72). Tool-substrate is dominant; text-substrate often acts only as fallback when no tool matches. The Singh-Survey taxonomy’s “Adaptive” and “Corrective” patterns describe this architectural family. Empirical citation behavior of agentic stacks is still under-measured outside controlled benchmarks.
Independent Industry Validation
The Five-Gate × Two-Channel structure and the Multi-Authority-Stack diagnosis are framework constructs: they describe failure-mode topology, not a directly falsifiable mechanism. The strongest external evidence that the topology is real comes when independent practitioners arrive at the same diagnosis without knowledge of the framework.
Dana Billingsley (AI Discovery Intelligence; AI Search & AI Visibility specialist), commenting on Malte Landwehr’s GummySearch analysis (LinkedIn-Article, 26 May 2026), described the same phenomenon in different vocabulary:
“What is especially interesting here is not just the Reddit association itself, but the structure of the recommendation environment being created around it. These pages package: socially validated recommendations, buyer-intent phrasing, comparative context, quote-backed reinforcement, explicit recommendation formatting — into something extremely easy for retrieval and synthesis systems to process. It feels less like classic ‘ranking’ behavior and more like AI systems reinforcing environments that already resemble consensus-oriented recommendation layers. That may end up being the more important takeaway long term than the specific Reddit tactic itself.”
— Dana Billingsley, comment on Landwehr LinkedIn-Article, 26 May 2026 — independent specialist comment
Billingsley’s elements map cleanly onto the framework:
| Billingsley’s element | DAE-Framework Mapping |
|---|---|
| “socially validated recommendations” | #68 Reputational Authority |
| “buyer-intent phrasing” | #64 Topical Authority (intent-mapped) |
| “comparative context” | #66 Network Authority (co-citation context) |
| “quote-backed reinforcement” | #65 Content Authority + #71 Consensus Modifier |
| “explicit recommendation formatting” | #67 Structural Authority |
| “consensus-oriented recommendation layers” | Gate 4 (Consensus Pool) + #71 Consensus Modifier |
| “less like classic ‘ranking’ behavior” | Cliff-shaped failure modes (Article 4 §Five Gates) |
Five of six Authority Types, the Consensus Modifier, and the cliff-shape thesis are independently named by Billingsley in her own analytic language, without reference to the DAE framework. This is external triangulation of the framework’s structural claims: Tier D (industry voice), not peer-reviewed, but evidentially independent. The same article’s comment thread contains Lily Ray’s complementary observation about the non-stationary nature of fanout-planning behavior (cited in §G1b above), giving the validation set two independent practitioner observations within one discussion.
Honest Limitations
The framework is operational, not finished. Eight open questions are worth flagging by name, because they are the places where the framework’s confidence is lowest.
Open question 1 — Network Authority dual-assignment is theory-led, not single-paper-confirmed. The reclassification of #66 across G1a + G2 + G4 rests on three independent peer-reviewed lines of evidence (Algaba NAACL Findings 2025; Algaba arXiv:2504.02767 April 2025 follow-up — 274K samples, vendor-independent; Yang & Menczer arXiv:2304.00228) plus the three-source triangulation requirement. No single study simultaneously tests all three gates for #66. A targeted experiment that does so would either confirm or refute the dual-assignment; the framework treats the assignment as the best current synthesis and is open to revision.
Open question 2 — The Six-Dimensional GCS is statistically established but not AI-citation-established. The Wilson interval (1927) and the Wilson-Lower-Bound-for-ranking technique (Cao X. arXiv:1809.07694, 2018) are well-established statistical instruments. The Aggarwal et al. KDD 2024 framework legitimizes user-defined visibility metrics for generative engines. But the specific six-dimensional aggregation in GCS has not been peer-reviewed as the standard metric for agentic-RAG diagnostics. Practitioners should report the underlying p̂, n, and 95% CI per dimension alongside any GCStotal number so downstream readers can verify the calculation. Default weights are deliberately unset and will be empirically calibrated in Article 7.
Open question 3 — #72 Reflection-Iteration Modifier is not isolated in production-RAG pipelines. Asai Self-RAG, Singh §3, HiPRAG, and King “Beyond RAG” all describe Reflection as a real architectural pattern, but no peer-reviewed paper isolates the citation-visibility effect of Reflection-Iteration count in a production RAG pipeline (as opposed to Self-RAG’s controlled benchmark). The modifier is included in this framework because the qualitative evidence is strong; the quantitative magnitude is open.
Open question 4 — Temporal Modifier (#69) magnitude in agentic pipelines is unknown. Yubo Fang et al. (SIGIR APIR 2025) established the causal-direction in classic IR settings. Whether the same magnitudes apply when fanout-planning + reflection-iteration sit between the user query and the retrieval step is empirically untested.
Open question 5 — Saxena G-Cite vs. P-Cite trade-off is Tier C. arXiv:2509.21557 v2 is a workshop preprint whose accepted-papers list could not be externally verified as of 28 May 2026. The numerical findings (37%/75% Coverage, 21%/42% Citation Correctness on ALCE) are directionally robust against the framework’s qualitative claims but should be treated as quantitatively provisional pending main-conference replication.
Open question 6 — Tier-D industry findings have documented methodology but generalizability constraints. Three industry-side anchors in this framework are Tier D rather than peer-reviewed, each with a distinct generalizability boundary that practitioners should hold in mind.
- Bettinga (LinkedIn-robots.txt asymmetry) and Landwehr/Peec AI (8 May 2026 Reddit/GummySearch fanout-inflection): Diagnostic illustrations, not peer-reviewed claims. The robots.txt directives are no longer taken from Bettinga’s post — they are first-hand verified against the live LinkedIn robots.txt (28 May 2026; see Primary Sources), so that part of the analysis rests on a primary source rather than a practitioner report. The Peec AI citation telemetry remains proprietary and not externally replicated.
- Trustpilot / Seer Interactive 1% → 75.3% magnitude: The PR Newswire methodology block states only “a range of products and services” — specific industry verticals are not disclosed. Trustpilot’s 361 million review base concentrates in consumer-facing brands (retail/e-commerce, travel, financial services, hospitality), and the magnitude is empirically established for those verticals. The structural finding (review platforms close Gate 2 via organic-search ranking) generalizes; the magnitude does not transfer 1:1 to verticals where Trustpilot is not the dominant review platform — B2B SaaS (G2/Capterra/TrustRadius), healthcare (Healthgrades/Vitals/ZocDoc/Jameda), local services (Google Reviews/Yelp). Practitioners should treat 1%/53.5%/75.3% as industry-conditional anchors for consumer-facing brands, not as universal benchmarks. Vertical-specific replication on the respective dominant review platforms would be required to establish per-vertical magnitudes.
- Cummins/Ramp marketing-incentives-to-AI-agents (April 2026): Targeting confound between the three variants is acknowledged by the authors; format-effect is not isolable from targeting-effect in this design. The “agent trust” finding (existing-citation-volume as a content-surfacing prerequisite) is robust against the targeting issue and triangulates with #66 Network Authority’s dual-assignment; the format-variant finding (Markdown won) does not survive the targeting confound as a clean causal claim.
Open question 7 — Million-token-era applicability of pre-2026 numerical findings. Several numerical anchors in this article were measured on pre-2026 models with 128K-or-smaller context windows: Wallat ICTIR 2025 (Cohere Command-R+ at 128K), Aggarwal GEO KDD 2024 (GPT-3.5/GPT-4-era), Pan CAG EMNLP 2024 (similar generation), and the bulk of ALCE/LongBench-Cite benchmarks. The conceptual claims — Faithfulness as a separate axis from Survival (Wallat), content-authority operations as Gate-3 levers (Aggarwal), credibility-aware generation as a trainable behavior (Pan) — are architectural and transfer to million-token-era models. The specific magnitudes (Wallat 57%, Aggarwal +41% / +115% / +28%, Pan’s vanilla-RAG performance gap) are pre-2026 snapshots and are not 1:1 transferable to Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro, or other 1M+-context models. The mechanistic prediction is that some failure modes (post-rationalization under context-budget pressure, lead-position bias under retrieval truncation) attenuate at million-token context; other failure modes (consensus-pool dynamics at Gate 4, faithfulness of attribution at Gate 5) may be largely unchanged. Article 7’s cross-model calibration will measure these explicitly. Practitioners should treat the pre-2026 numerical findings as architectural-direction evidence, not as production-model-current magnitudes.
Open question 8 — Per-model variance in Gate-4/Gate-5 behavior is large and mechanistically unexplained. Grace Cummins / Ramp documented (April 2026) (COI: vendor self-published on builders.ramp.com) a striking per-model disparity in how content surfaces across LLMs even when all three platforms’ bots crawl the same content: Perplexity surfaced an embedded incentive vaguely from day 2; Claude surfaced it specifically (brand name, exact amount, tracked URL, step-by-step instructions) starting day 12 with a ~4× step-change at week 3; ChatGPT crawled the page but produced zero surface mentions across the entire 32-day window. The mechanistic explanation lives at Gate 4 (consensus pool composition) and Gate 5 (generation faithfulness filter), but the framework currently has no peer-reviewed mechanistic account of why ChatGPT’s Gate-4 / Gate-5 filter behaves so differently from Claude’s on identical input content. The Claude step-change at week 3 (no external cause identified, no model release) is itself an open phenomenon (plausibly index-refresh dynamics or trust-aggregation crossover, not yet characterized in the literature). Article 7’s multi-platform Cross-AI Coverage benchmark will quantify the dispersion across the five most-cited platforms; the mechanism behind the dispersion remains an open empirical question.
Two additional caveats apply to the entire framework:
- The mapping table is post-hoc-rationalist in the same sense any classification is. We do not claim the model “knows” about gates; we claim the gates are an accurate description of the failure-mode topology as observed from outside.
- The architectures keep moving. GPT-5.2, Gemini 3, and Claude 4.x all shipped substantial behavior changes in Q1–Q2 2026. The gate topology has been stable for ~18 months; specific gate-pass probabilities are not. Re-run GCS against your current platforms quarterly.
What Comes Next
Article 7 (forthcoming) will turn the Six-Step Triage Protocol into a public, replicable methodology with code, prompt templates, and an empirical calibration of the GCS weight vector across five verticals (B2B SaaS, healthcare, finance, legal, consumer retail). Article 7 will also publish the first multi-platform Cross-AI Coverage benchmark for the DAE framework.
The framework’s claim of completeness in this article is structural — the five gates × two channels, the six authority types with the Tool/Endpoint sub-type, and the four cross-cutting modifiers are, to the best of our triangulated evidence, the complete description of where citation outcomes are decided in agentic-RAG systems as of mid-2026. The framework’s claim of operationalization is bounded — GCS is the metric, Article 7 will be the calibration.
Frequently Asked Questions
Q1. Is Tool/Endpoint Authority a separate channel?
No. Tool surfaces are a substrate type within the Retrieval channel, governed by #67 Structural Authority as a sub-type. The Two-Channel structure (Parametric / Retrieval) remains the mechanistic decomposition, anchored to Sun et al. ReDeEP (Knowledge FFNs vs. Copying Heads). Treating tool surfaces as a third channel was considered and rejected because the underlying transformer architecture does not differentiate text-token retrieval from tool-output-token retrieval at the residual-stream level — they go through the same Copying Heads.
Q2. Why are the modifiers aligned to Article 4’s set (#69–#71)?
Because Article 4 is the live, authoritatively-published anchor of the DAE series (3 April 2026) and established the modifier numbering (#69 Temporal, #70 Platform, #71 Consensus). This article adds one new modifier (#72 Reflection-Iteration) that is agentic-pipeline-specific. “Authority Density” and “Multimodal Surface” — sometimes proposed as standalone modifiers — are not maintained here as separate modifiers; they are sub-concepts under #66 Network Authority and #67 Structural Authority respectively.
Q3. Saxena et al. — is it NeurIPS or not?
arXiv:2509.21557 v2 (18 Dec 2025) is submitted to the NeurIPS 2025 LLM Evaluation Workshop. The workshop accepted-papers list could not be externally verified as of 28 May 2026. The framework treats it as Tier C (preprint with claimed workshop submission, no main-track peer review). All numerical claims from the paper are quoted against Tables 2–4 of the v2 PDF; magnitudes are directionally robust, quantitatively provisional.
Q4. Is the 57% post-rationalization rate real?
Yes, but with context. Wallat et al. (ICTIR 2025, Best Paper Honorable Mention) measured 57% as the upper-bound on Cohere Command-R+ / NaturalQuestions in the relevant-but-uncited-document adversarial condition. The random-adversarial baseline was 12%. The figure is not a universal RAG failure rate — it is the failure rate under a specific adversarial probe, on a specific model, on a specific benchmark. The implication is structural (Faithfulness is a separate axis from Survival), not numerical (do not extrapolate “57% of all RAG citations are fake”).
Q5. What is the GummySearch case really showing?
A page that hits five Authority Types and one Modifier simultaneously can ride a temporary platform-specific fanout configuration to substantial citation share (0.005% → 0.1% in one week). The lift is real and structurally explainable. The moat is structurally fragile — it depends on ChatGPT continuing to inject “reddit” into fanout sub-queries at the May 2026 rate. Per Lily Ray’s caveat: “might not work forever.”
Q6. Does LinkedIn-posting help AI Search visibility?
It depends on the channel — and the precise answer is narrower than a blanket no. LinkedIn’s robots.txt fully blocks AI training and live-fetch crawlers (GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, and ~20 others) plus a catch-all Disallow: / — so Claude, Perplexity, and Gemini-grounding have no robots-level path to LinkedIn content. But OpenAI’s OAI-SearchBot and Google’s Googlebot are only path-restricted, not blocked, so ChatGPT-search and Google AI Overviews/AI Mode can technically still index LinkedIn /posts/ and /pulse/. Where those channels are open, the harder constraint is Gate 1, not Gate 2: LinkedIn posts share an identical Title-Tag pattern that prevents G1a anchoring (newsletters under /pulse/ are the partial exception, with individualized titles). For sustained visibility the framework’s recommendation is unchanged: host on your own domain and use LinkedIn as a distribution channel, not as the canonical publication surface.
Q7. Is the MCP adoption really at 97 million monthly downloads?
Per the March 2026 adoption snapshot, yes — corroborated independently by Pento.ai, Truto.one, DigitalApplied, and BraivIQ. The figure is not as of the donation date (9 December 2025); it is approximately three months later. At launch (November 2024), monthly downloads were approximately 2 million. The trajectory is steeper than typical OSS-protocol adoption curves but consistent with the AAIF-member endorsement cascade. Two caveats on interpretation: SDK downloads are not unique users — the figure includes CI/CD runs, mirror traffic, and transitive dependency installs, so it tracks ecosystem momentum rather than an adopter headcount; and server counts vary by method — the official registry is in preview and excludes private enterprise servers, so public tallies range from ~5,800 to ~15,900 depending on whether registry, package-manager, or GitHub-topic signals are counted.
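To make the steepness of that trajectory concrete, the two endpoints can be turned into an implied compound growth rate. This is a back-of-the-envelope sketch: both download figures are the approximate values quoted above, the 16-month span (November 2024 to March 2026) is an assumption about how to count the interval, and constant month-over-month growth is a simplification.

```python
import math


def implied_monthly_growth(start: float, end: float, months: int) -> float:
    """Constant compound monthly growth rate that takes `start` to `end`."""
    if start <= 0 or end <= 0 or months <= 0:
        raise ValueError("start, end and months must all be positive")
    return (end / start) ** (1 / months) - 1


START_DOWNLOADS = 2_000_000    # approximate monthly downloads at launch
END_DOWNLOADS = 97_000_000     # approximate monthly downloads, March 2026 snapshot
MONTHS_ELAPSED = 16            # assumption: November 2024 to March 2026

rate = implied_monthly_growth(START_DOWNLOADS, END_DOWNLOADS, MONTHS_ELAPSED)
doubling_months = math.log(2) / math.log(1 + rate)

print(f"Implied compound growth: {rate:.1%} per month")
print(f"Implied doubling time:   {doubling_months:.1f} months")
```

Under these assumptions the implied rate is roughly 27 to 28 percent per month, which corresponds to downloads doubling about every three months. Because downloads are not unique users, the number describes ecosystem momentum, not adopter growth.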
Q8. Why is the GCS weight vector unset?
Because empirical calibration requires cross-vertical data the framework does not yet have. Each industry vertical (B2B SaaS, healthcare, finance, legal, consumer retail) likely has different gate-criticality profiles. Article 7 will publish the first calibration. In the interim, practitioners should treat each GCS dimension independently and report dimension-level Wilson intervals — not a single composite score.
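Reporting dimension-level Wilson intervals can be sketched as follows. The function implements the standard Wilson score interval for a binomial proportion (Wilson 1927) at a 95% level. The dimension labels and the counts are hypothetical placeholders, since the framework's own dimension names and data are not reproduced here.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical observations: (times the page passed the check, prompts sampled).
observations = {
    "dimension_a": (5, 10),
    "dimension_b": (0, 10),
    "dimension_c": (42, 60),
}

# One interval per dimension, deliberately not collapsed into a composite score.
intervals = {name: wilson_interval(k, n) for name, (k, n) in observations.items()}

for name, (low, high) in intervals.items():
    k, n = observations[name]
    print(f"{name}: {k}/{n} -> 95% CI [{low:.3f}, {high:.3f}]")
```

The Wilson interval is used here instead of the simple normal approximation because it stays inside [0, 1] and behaves sensibly at small sample sizes and at proportions near 0 or 1, which is the typical situation when a page is cited rarely.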
Q9. How does this map onto Article 4’s six authority types?
One-to-one, with one exception. Each of #63–#68 has a primary gate (mostly) and a secondary gate. #66 Network Authority is the dual-assignment exception (G1a + G2 + G4 simultaneously). The mapping table in §“The Mapping: Six Authority Types in the Five-Gate × Two-Channel System” above is the canonical reference.
Sources & Methodology
All sources below are classified by tier and, where applicable, by conflict-of-interest disclosure. Peer-reviewed sources are listed before vendor sources within each tier. Tier definitions appear in the Evidence Tiers box at the start of this article.
Primary Sources (First-Hand Verified)
- LinkedIn robots.txt. https://www.linkedin.com/robots.txt, verified first-hand against the live file on 28 May 2026. Primary, directly reproducible source for the §G2 Gate-2 analysis. Confirmed: full Disallow: / for GPTBot, ChatGPT-User, Google-Extended, anthropic-ai, ClaudeBot, Claude-Web, Claude-User, Claude-SearchBot, cohere-ai, Google-CloudVertexBot, PerplexityBot, Perplexity-User, and ~12 further AI/scraper agents (DuckAssistBot, Meta-ExternalAgent/-Fetcher, CCBot, Bytespider, Diffbot, Quora-Bot, DataForSeoBot, Timpibot, others), plus a catch-all User-agent: * → Disallow: /. Notable exceptions, also confirmed first-hand: OAI-SearchBot and Googlebot are path-restricted but not globally blocked, retaining access to /posts/ and /pulse/. Because LinkedIn disallows automated access to its own robots.txt, the file was retrieved through a normal browser session rather than an automated fetch. (Accessed: May 28, 2026)
[Tier A] Peer-Reviewed Primary Research
- Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). GEO: Generative Engine Optimization. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24), 5–16. DOI: 10.1145/3637528.3671900. arXiv:2311.09735. (Accessed: May 28, 2026)
- Algaba, A., Mazijn, C., Holst, V., Tori, F., Wenmackers, S., & Ginis, V. (2025). Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias. Findings of the Association for Computational Linguistics: NAACL 2025, 6844–6879. aclanthology.org/2025.findings-naacl.381. (Accessed: May 28, 2026)
- Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2024). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. ICLR 2024 (Oral). arXiv:2310.11511. (Accessed: May 28, 2026)
- Augenstein, I. (2025). Understanding the Interplay between LLMs’ Utilisation of Parametric and Contextual Knowledge. ECIR 2025 Keynote. arXiv:2603.09654. (Accessed: May 28, 2026)
- Barnett, S., Kurniawan, S., Thudumu, S., Brannelly, Z., & Abdelrazek, M. (2024). Seven Failure Points When Engineering a Retrieval-Augmented Generation System. CAIN 2024. arXiv:2401.05856. (Accessed: May 28, 2026)
- Fang, Y. et al. (2025). Do Large Language Models Favor Recent Content? SIGIR APIR 2025. DOI: 10.1145/3767695.3769493. (Accessed: May 28, 2026)
- Gao, L. et al. (2023). ALCE: Enabling Large Language Models to Generate Text with Citations. EMNLP 2023. arXiv:2305.14627. (Accessed: May 28, 2026)
- Jeong, S., Baek, J., Cho, S., Hwang, S.J., & Park, J.C. (2024). Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity. NAACL 2024 Long, pp. 7036–7050. aclanthology.org/2024.naacl-long.389. (Accessed: May 28, 2026)
- Mallen, A., Asai, A., Zhong, V., Das, R., Khashabi, D., & Hajishirzi, H. (2023). When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories. ACL 2023 Long Papers, pp. 9802–9822. aclanthology.org/2023.acl-long.546. (Accessed: May 28, 2026)
- Pan, R., Cao, B., Lin, H., Han, X., Zheng, J., Wang, S., Cai, X., & Sun, L. (2024). Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation (CAG framework). EMNLP 2024. arXiv:2404.06809. (Accessed: May 28, 2026)
- Park, S.-J., & Kim, K.-M. (2025). Measuring and Mitigating Media Outlet Name Bias in LLMs. EMNLP 2025 Main, pp. 29766–29785. aclanthology.org/2025.emnlp-main.1513. (Accessed: May 28, 2026)
- Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. NeurIPS 2023. arXiv:2302.04761. (Accessed: May 28, 2026)
- Sun, J. et al. (2025). ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability. ICLR 2025 (Spotlight). arXiv:2410.11414. (Accessed: May 28, 2026)
- Tan, J., Dou, Z. et al. (2025). HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems. WWW 2025. DOI: 10.1145/3696410.3714546. arXiv:2411.02959. (Accessed: May 28, 2026)
- Trivedi, H., Balasubramanian, N., Khot, T., & Sabharwal, A. (2023). Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions (IRCoT). ACL 2023 Long. arXiv:2212.10509 v2. (Accessed: May 28, 2026)
- Wallat, J., Heuss, M., de Rijke, M., & Anand, A. (2025). Correctness is not Faithfulness in RAG Attributions. ICTIR 2025 (Best Paper Honorable Mention; ACM SIGIR-affiliated). DOI: 10.1145/3731120.3744592. arXiv:2412.18004. (Accessed: May 28, 2026)
- Yang, K.-C., & Menczer, F. (2025). Accuracy and Political Bias of News Source Credibility Ratings by Large Language Models. ACM WebSci 2025. DOI: 10.1145/3717867.3717903. arXiv:2304.00228 v3. (Accessed: May 28, 2026)
- Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023. arXiv:2210.03629. (Accessed: May 28, 2026)
[Tier B] Large-Sample Vendor-Independent Datasets (>100K Samples)
- Algaba, A., Holst, V., Tori, F., Mobini, M., Verbeken, B., Wenmackers, S., & Ginis, V. (April 2025). How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices? arXiv:2504.02767. 274,951 GPT-4o-generated references across 10,000 focal papers; Vrije Universiteit Brussel academic team — vendor-independent. Preprint, peer-reviewed venue pending; treated as Tier B based on sample size, vendor independence, and reproducible methodology. (Accessed: May 28, 2026)
- Naser, M. Z. (2026). How LLMs Cite and Why It Matters: A Cross-Model Audit of Reference Fabrication in AI-Assisted Academic Writing. arXiv:2603.03299. 69,557 citation instances × 10 commercial LLMs (~696K total observations); multi-model consensus ≥3 LLMs yields 95.6% accuracy, 5.8× improvement. Academic single-author study (Clemson University) — vendor-independent. (Accessed: May 28, 2026)
[Tier C] Independent Meta-Analyses & Surveys (Aggregating ≥10 External Sources)
- Gao, Y. et al. (2023). Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv:2312.10997. Comprehensive RAG-architecture survey across the academic literature. (Accessed: May 28, 2026)
- Gao, Y. et al. (2024). Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks. arXiv:2407.21059. Modular-RAG meta-synthesis. (Accessed: May 28, 2026)
- Singh, A., Ehtesham, A., Kumar, S., & Khoei, T.T. (2025/2026). Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG. arXiv:2501.09136 v4 (April 2026). Independent academic meta-synthesis. Taxonomy: two macro-classes (Single-Agent, Multi-Agent), six concrete architecture patterns, four cross-cutting Agentic Design Patterns. §3.4 explicitly notes Multi-Agent Collaboration is less predictable than Reflection and Tool Use. (Accessed: May 28, 2026)
[Tier C] (Article-6-specific) Preprints, Patents & Workshop Submissions (Primary Sources Pending Peer Review)
Article-6-specific Tier-C sub-classification for primary sources that are neither peer-reviewed (Tier A), large-sample-vendor-independent (Tier B), independent meta-analyses (Tier C, strict definition), nor vendor-published (Tier D/E). These are academic preprints, patent documents, and workshop submissions pending venue acceptance.
- Cao, X. (2018). Improved Online Wilson Score Interval Method for Community Answer Quality Ranking. arXiv:1809.07694. (Accessed: May 28, 2026)
- Saxena, Y., Bommireddy, R., Padia, A., & Gaur, M. (Sep 2025; v2 18 Dec 2025). Generation-Time vs. Post-hoc Citation: A Holistic Evaluation of LLM Attribution. arXiv:2509.21557 v2. Submitted to NeurIPS 2025 LLM Eval Workshop. Workshop acceptance not externally verified as of 28 May 2026. Numerical findings cited verbatim from Tables 2–4 of the v2 PDF.
- Schuster, T., Gautam, V., & Markert, K. (2026). Whose Facts Win? LLM Source Preferences under Knowledge Conflicts. arXiv:2601.03746. (Accessed: May 28, 2026)
- Sielinski, R. (March 2026). Quantifying Uncertainty in AI Visibility: A Statistical Framework for Generative Search Measurement. arXiv:2603.08924. (Accessed: May 28, 2026)
- Wu, Y., Zhang, Z., Wan, C., Zhao, X., He, X., Du, B., & Chen, J. (October 2025). HiPRAG: Hierarchical Process-Reward Optimization for Adaptive Retrieval in RAG. arXiv:2510.07794. Preprint, self-declared “under review”, venue unconfirmed. (Accessed: May 28, 2026)
Patents — granted:
- US11769017B1 — Google. Generative Summaries. Granted patent. patents.google.com/patent/US11769017B1. (Accessed: May 28, 2026)
Patents — published applications (not yet granted):
- US20240362093A1 — Google. Custom Corpus / Routing. Published October 2024. patents.google.com/patent/US20240362093A1. (Accessed: May 28, 2026)
- US20250124067A1 — Google. Pairwise Ranking Prompting. Published 2025 (per the year prefix of the publication number). patents.google.com/patent/US20250124067A1. (Accessed: May 28, 2026)
- US20240289407A1 — Google. Stateful Chat / Memory. Published patent application. patents.google.com/patent/US20240289407A1. (Accessed: May 28, 2026)
- WO2024064249A1 — Google. Promptagator (Few-Shot Dense Retrieval). PCT international application, published. patents.google.com/patent/WO2024064249A1. (Accessed: May 28, 2026)
[Tier D] Industry Study with Documented Methodology (Not Vendor-Self-Published)
- Agentset (2025). Cohere Rerank 4: A real upgrade over 3.5. Independent benchmark, December 2025. agentset.ai/blog/cohere-reranker-v4. Independent industry benchmark of a vendor product (Cohere Rerank 4); Agentset is not testing its own product, satisfying the not-vendor-self-published criterion for this specific test. (Accessed: May 28, 2026)
- Bettinga, J. (May 2026). Hilft LinkedIn wirklich für Sichtbarkeit in AI Search? [English: Does LinkedIn really help visibility in AI Search?] LinkedIn-Post / Carousel (3 slides). COI: SEO consultant & Co-Founder @SEOSOON. The §G2 LinkedIn-as-substrate double-gate analysis originates from this post (German-language LinkedIn) and is credited accordingly. The load-bearing robots.txt directives are not taken from the post; they are independently re-verified first-hand against the live primary source (see Primary Sources above). The title-tag / SERP-pattern observation is drawn from this post. (Accessed: May 28, 2026)
- Billingsley, D. (May 2026). Comment on Landwehr LinkedIn-Article (26 May 2026). AI Discovery Intelligence; AI Search & AI Visibility specialist. Cited verbatim for the framework’s Independent Industry Validation section. (Accessed: May 28, 2026)
- 5W Public Relations (Torossian, R.). Q1 2026 Citation Source Audit. 11 May 2026, PR Newswire. Synthesis of 9 prior industry studies (Similarweb, SEMrush, Profound, Peec AI, SE Ranking, Goodie, Ahrefs, Evertune, Passionfruit). COI: 5W is a PR agency synthesizing third-party data; not vendor-self-published. (Accessed: May 28, 2026)
- Linux Foundation. (9 December 2025). Linux Foundation Announces the Formation of the Agentic AI Foundation. linuxfoundation.org/press. Founding donations: Anthropic (MCP), OpenAI (AGENTS.md), Block (goose). Platinum members: AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, OpenAI. Standards-body announcement, not a vendor-self-published study. (Accessed: May 28, 2026)
- Mihm, D. / Schwartz, B. (20 March 2025). Microsoft Bing/Copilot use schema for its LLMs. Search Engine Land. searchengineland.com/microsoft-bing-copilot-use-schema-for-its-llms-453455 + David Mihm LinkedIn coverage. Third-party SEO-trade-press reporting of vendor (Microsoft/Canel) statement at SMX München. Microsoft/Canel SMX-München statement is a vendor-confirmed paraphrase, not original transcript. (Accessed: May 28, 2026)
- Pento.ai. A Year of MCP: From Internal Experiment to Industry Standard. pento.ai/blog/a-year-of-mcp-2025-review. Independent industry retrospective; corroborating March 2026 adoption snapshot of 97M monthly downloads / 10K+ active servers. Pento.ai is not an MCP vendor — independent analysis. (Accessed: May 28, 2026)
- Ray, L. (May 2026). Comment on Landwehr LinkedIn-Article. Founder of Algorythmic; VP, SEO & AI Search at Amsive. Cited for fanout-planning observation and stability caveat. (Accessed: May 28, 2026)
- Trustpilot / Seer Interactive. “What AI says about you” report. 12 May 2026. PR Newswire and seerinteractive.com/insights. Methodology: 804,491 AI responses across ChatGPT/Gemini/Perplexity/Google AI Mode; 15,783 prompts covering “a range of products and services” (specific industry verticals not disclosed); 1,926 brands; T0–T3 cohort design (n = 437 / 497 / 497 / 495). COI: Trustpilot-commissioned, Seer-executed (third-party agency execution). Industry-vertical caveat (Honest Limitations §6): magnitudes (1% / 53.5% / 75.3%) are robust for consumer-facing brands where Trustpilot is the dominant review platform; they should not be quoted as universal benchmarks for B2B SaaS, healthcare, or local services. (Accessed: May 28, 2026)
[Tier E] Vendor Study (Self-Published, COI Disclosed Inline)
- Ahrefs. (11 May 2026). We Tracked 1,885 Pages Adding Schema. AI Citations Barely Moved. ahrefs.com/blog/schema-ai-citations. COI: Ahrefs is an SEO-tool vendor; published on its own blog. Cited for the schema-effect-on-AI-citations finding. (Accessed: May 28, 2026)
- Anthropic (Schluntz, E., & Zhang, B.). (19 December 2024). Building effective agents. anthropic.com/research/building-effective-agents. COI: Anthropic is an AI vendor. Cited for the explicit production-architecture finding that simple composable patterns + single-LLM-multi-prompt outperform complex multi-agent frameworks. (Accessed: May 28, 2026)
- Anthropic Docs. (May 2026). Context windows. platform.claude.com/docs/en/build-with-claude/context-windows. COI: Anthropic platform documentation. Cited for the May 2026 status: Opus 4.7, Opus 4.6, Sonnet 4.6 at 1M-token context; Haiku 4.5 at 200K. (Accessed: May 28, 2026)
- Cohere. (16 December 2025). Introducing Rerank 4. cohere.com/blog/rerank-4. COI: vendor self-report. Disambiguated against the Agentset benchmark [Tier D] in §G2. (Accessed: May 28, 2026)
- Cummins, G. / Ramp. (30 April 2026). We Tested Marketing Incentives to AI Agents. Here’s What Happened. Ramp Builders Blog. builders.ramp.com/post/marketing-to-ai-agents. COI: Published on Ramp’s own builders blog; Ramp is a corporate-card and finance-tools vendor. Methodology documented in detail: 3-variant test (pure Markdown / stripped HTML / schema-injected) across ~50 marketing pages, Cloudflare Workers conditional serving, unique tracked incentives per variant, 32-day measurement window; 1,300+ bot visits over first 2.5 weeks, ~370 agent relays by day 32 with Claude dominant, ChatGPT zero, Perplexity vague-then-branded by day 33. Known limitation acknowledged by authors: targeting confound — Markdown served broadly (AI Assistant OR unverified low-bot-score), HTML+schema served strictly (verified bots only). Format-effect is not isolable from targeting-effect in this design. Cited for: (a) “agent trust” / existing-citation-volume as content-surfacing prerequisite (§Three/Four Observations); (b) per-model variance in Gate-4/Gate-5 behavior (Open Question 8); (c) format-variant evidence with explicit methodological caveat (§G3 Markdown-vs-HTML synthesis); (d) bot-detection diagnostic findings (Cloudflare label mismatch; OpenAI SearchBot caching; DeepSeek Chrome-58 UA spoofing) for §G2 operational layer. (Accessed: May 28, 2026)
- Druck, G., & Smith, E. / Graphite. (2026). Demystifying Randomness in AI. Graphite Five Percent White Paper. graphite.io/five-percent/demystifying-randomness-in-ai. Methodology: 200 entity-comparison prompts × 400 responses across gpt-5.2-chat-latest (OpenAI API), ChatGPT-Logged-Out, and Gemini-Logged-Out conditions; >200,000 LLM responses total; Wilson-Score binomial confidence intervals, Sequential Sampling, McNemar’s test, Z-tests; all experimental data publicly accessible via Google Drive. Key findings used in this article: (1) Visibility estimable with n=10 at MAE ~5.6%; (2) Sequential Sampling reduces required responses by 51% without CI-tightness loss; (3) API-vs-Logged-Out cosine similarity 0.48, meaning API measurements are not a valid proxy for the user-facing reality. COI: Graphite is an AEO agency selling visibility-measurement services; the paper is not peer-reviewed. Author credentials: Druck holds PhD UMass Amherst NLP (1,200+ citations, McCallum lab); Smith is Graphite CEO (MSc UCL, growth marketing). Methodology is statistically rigorous and externally validated by Bowyer, Aitchison & Ivanova (ICML 2025) [Tier A]; all experimental data are publicly available; sample size (200K+ responses) is substantial. Cited in this article as a methodology reference for GCS construction (§GCS Triangulation). Limitations: scope is entity-comparison prompts only; entity-extraction accuracy not formally evaluated; temperature parameter not specified. (Accessed: May 28, 2026)
- King, M. (May 20, 2026). Beyond RAG: Why Every AI Search Platform Is Now Agentic and What That Means for Your Content. iPullRank. ipullrank.com/agentic-rag. COI: King is iPullRank Founder/CEO. Substantive vendor synthesis with per-claim triangulation against peer-reviewed sources (ReAct, Toolformer, IRCoT, Self-RAG); every load-bearing claim from King is independently triangulated against Tier-A evidence in this article. (Accessed: May 28, 2026)
- Nowaczyk, S. (Dec 10, 2025). Architectures for Building Agentic AI (Chapter 3). In Generative and Agentic AI Reliability: Architectures, Challenges, and Trust for Autonomous Systems, Springer Nature (accepted, forthcoming). arXiv:2512.09458v1 [cs.AI], CC BY 4.0. arxiv.org/abs/2512.09458. Center for Applied Intelligent Systems Research, Halmstad University. [Tier B] Peer-review-grade academic anchor for the “reliability is an architectural property” framing; supplies the component vocabulary (planner, tool router, verifier, supervisor) that the five gates operationalize. (Accessed: May 29, 2026)
- Landwehr, M. (26 May 2026). How to Become a Top Source in ChatGPT with Recycled Reddit Content. LinkedIn-Article (peec.ai/blog). COI: Author is CPO/CMO at Peec AI; data source is Peec AI proprietary citation telemetry. Vendor-affiliated; Peec AI sells citation-tracking tools. Methodology documented in-article. Cited for the Reddit-recycle observation in §G1b. (Accessed: May 28, 2026)
- llmpulse.ai. Data Studies: Top Cited Domains. llmpulse.ai/data-studies/top-cited-domains. COI: llmpulse.ai is an AI-citation-analytics vendor; data published on own platform. Citation-share data May 2026: YouTube 26.47%, Reddit 17.39%, Google 15.45%, Instagram 6.78%, Facebook 6.7%, TikTok 4.7%, LinkedIn 4.43%, Apple 2.55%, Wikipedia 2.33%, Trustpilot 1.99%. (Accessed: May 28, 2026)
- Lumer, E. et al. (2025). ScaleMCP: Scaling Tool Selection for Large-Scale Agentic AI Systems. arXiv:2505.06416. COI: PricewaterhouseCoopers U.S.A. co-authored. Cited for 5,000 financial-metric MCP-server stress-test methodology. Although academic preprint format, PwC co-affiliation classifies this as vendor-side under the strict vendor-independence requirement. (Accessed: May 28, 2026)
- Profound. (February 2026). We Ran a Controlled Experiment on Markdown vs. HTML for AI Bots. tryprofound.com/blog/does-markdown-increase-ai-bot-traffic. COI: Profound sells Agent Analytics measurement; published on own blog. Methodology: 381 pages across 6 websites, controlled A/B (192 treatment + 189 control), Profound Agent Analytics, 19 January – 8 February 2026. Result: ~16% mean lift, ~1 median extra visit, statistically not significant. (Accessed: May 28, 2026)
- Search Atlas. (December 2024). The Limits of Schema Markup for AI Search. searchatlas.com/research. COI: Search Atlas is an SEO-tool vendor; published on own research blog. Cited for schema-markup-effect findings. (Accessed: May 28, 2026)
- Shihipar, T. (8 May 2026). Using Claude Code: The Unreasonable Effectiveness of HTML. thariqs.github.io/html-effectiveness/. Personal site of Anthropic engineer (Engineering Lead, Claude Code). 4.4 million views in 16 hours; widely covered (Simon Willison, Lenny’s Newsletter, Hacker News). COI: Anthropic employee; not an Anthropic publication. Cited in §G3 for the million-token-era reframing of the Markdown-vs-HTML format question. (Accessed: May 28, 2026)
Statistical Methodology References
- Wilson, E. B. (1927). Probable Inference, the Law of Succession, and Statistical Inference. Journal of the American Statistical Association 22, 209–212. DOI: 10.1080/01621459.1927.10502953. (Accessed: May 28, 2026)
- Brown, L. D., Cai, T. T., & DasGupta, A. (2001). Interval Estimation for a Binomial Proportion. Statistical Science 16(2), 101–117. DOI: 10.1214/ss/1009213286. (Accessed: May 28, 2026)
Triangulation Audit Results
| Core claim | Source 1 | Source 2 | Source 3 | Triangulated |
|---|---|---|---|---|
| Five-Gate cascade architecture | Gao 2024 [Tier C] | Gao 2023 [Tier C] | Barnett CAIN 2024 [Tier A] | ✅ |
| Dual-Channel (Parametric / Retrieval) | Sun ICLR 2025 [Tier A] | Augenstein 2025 [Tier A] | Pan EMNLP 2024 [Tier A] | ✅ |
| G1a/G1b split (Fan-Out Planning) | Trivedi ACL 2023 [Tier A] | Jeong NAACL 2024 [Tier A] | King [Tier E] (COI) | ✅ |
| G5 two-axis (Survival × Faithfulness) | Wallat ICTIR 2025 [Tier A] | Saxena 2025 [Tier C] | Sun ICLR 2025 [Tier A] | ✅ C-caveat |
| Tool/Endpoint as #67 sub-type | Schick NeurIPS 2023 [Tier A] | Lumer 2025 [Tier E] (COI: PwC) | LF AAIF 2025 [Tier D] + Pento.ai [Tier D] | ✅ |
| #66 dual-assignment (G1a + G2 + G4) | Algaba NAACL 2025 [Tier A] | Algaba 2025 follow-up [Tier B] | Yang & Menczer 2025 [Tier A] | ✅ theory-led |
| Consensus-based citation accuracy | Yang & Menczer 2025 [Tier A] | Naser 2026 [Tier B] | Schuster 2026 [Tier C] | ✅ |
| Single-LLM-multi-prompt dominates production | King 2026 [Tier E] (COI) | Anthropic Dec 2024 [Tier E] | Singh §3.4 [Tier C] | ⚠️ no Tier-A anchor |
| GCS six-dim Wilson construction | King 6 Metrics [Tier E] (COI) | Wilson (internal) | Aggarwal KDD 2024 [Tier A] | ✅ |
| #72 Reflection-Iteration Modifier | Asai ICLR 2024 [Tier A] | Singh §3 [Tier C] | HiPRAG 2025 [Tier C] + King [Tier E] (COI) | ✅ |
| Reputational/review-platform magnitude | Seer/Trustpilot 2026 [Tier D] (COI) | 5W Q1 2026 [Tier D] | — | ⚠️ industry consensus only |
| Schema markup is hygiene, not lever | Search Atlas 2024 [Tier E] (COI) | Ahrefs 2026 [Tier E] (COI) | Ramp 2026 [Tier E] (COI) | ⚠️ 3× vendor convergence, no Tier-A |
| Temporal Modifier (#69) magnitude in agentic pipelines | Fang SIGIR APIR 2025 [Tier A] classic IR setting | Trustpilot “3Rs” [Tier D] | — | ⚠️ agentic magnitude open |
Full citations and DOIs in the Sources & Methodology section above. Tier letters per Article 1 standard: [Tier A] peer-reviewed academic research · [Tier B] large-scale industry dataset (>100K samples, vendor-independent) · [Tier C] independent meta-analysis aggregating ≥10 external sources (plus Article-6-specific extension for preprints/patents/workshop submissions pending peer review) · [Tier D] industry study with documented methodology, not vendor-self-published · [Tier E] vendor study (self-published, COI disclosed inline).
About the Author
Manuel Hürlimann is the creator of Digital Authority Engineering (DAE) — the systematic discipline of building machine-verifiable expertise that AI systems recognize, cite, and recommend. Based in Switzerland, he works as a consultant and lecturer at the intersection of AI search behavior, citation analysis, and brand authority.
Through the Authority Intelligence Lab at GaryOwl.com, he publishes original research on how AI systems select, evaluate, and cite sources — applying every principle to GaryOwl.com itself as a living lab. The Five-Gate × Two-Channel Coordinate System is the architectural backbone of the DAE framework’s Agentic-AEO layer, building on the six-type authority taxonomy (#63–#68) and three modifier dimensions (#69–#71) established in Article 4 and extending it with the agentic-pipeline-specific #72 Reflection-Iteration Modifier.
Connect: GaryOwl.com · LinkedIn · manuel@octyl.io
Framework Disclosure: The DAE framework is independently developed and not affiliated with any vendor whose products are evaluated in this article. The author has no equity, employment, or paid-advisory relationship with Cohere, Anthropic, Block, OpenAI, Google, Microsoft, AWS, Bloomberg, Cloudflare, Trustpilot, Seer Interactive, 5W Public Relations, Ahrefs, Search Atlas, Graphite, iPullRank, PricewaterhouseCoopers, Peec AI, llmpulse.ai, Pento.ai, Truto.one, DigitalApplied, or BraivIQ as of publication date. Where vendor-published research is used (Tier D / Tier E), COI is disclosed inline and again in Sources & Methodology. The framework’s stated preference for peer-reviewed Tier-A evidence under the source-hierarchy principle is consistently applied: when Tier-A and Tier-D evidence conflict, Tier-A governs and Tier-D is reported with its COI flag. The DAE framework is applied to GaryOwl.com itself as a living lab — every framework principle is simultaneously tested on this site. The framework is open for use with attribution. Validation is ongoing and published transparently; no guarantees implied. AI behavior varies by model and platform.
“A citation is not a ranking outcome. It is the outcome of five sequential, binary gates — and four of them can be closed before any ‘ranking’ step is reached. Diagnose the gate, not the rank.” — Manuel Hürlimann, Digital Authority Engineering