Become The Source All LLMs Quote (GEO & AEO)

Good morning.

When a buyer asks ChatGPT or Claude to recommend a business like yours, it does not run one search.

It runs a dozen, pulls pages from across all of them, and names a few sources as the answer. Most operators are invisible for all twelve.

This issue is the build for becoming the source the machine quotes.

A five-phase system, the prompts to run each phase, and a skill that produces the pages for you.

— Sam

IN TODAY’S ISSUE 🤖

Why search split into two engines that don't share rules
How one question becomes a dozen behind the screen
The 30 percent of the work that earns the citation
FORGE: the five-phase build, start to finish
The Protocol: five prompts, one per phase
The feedback loop that tells you why you weren't cited
A skill that writes citation-grade pages for you
What to hand your team, and what to keep

Let’s get into it.

Search Split Into Two Engines

For twenty years there was one game. You ranked on Google, you got clicks, clicks turned into customers. The whole discipline of SEO grew up around winning a position on a page of blue links.

That page is emptying out. Ahrefs put numbers on it: when Google shows an AI Overview, clicks to the top organic result drop by around 58 percent (Ahrefs). Roughly two-thirds of Google searches now end with no click to anyone. Gartner's 2024 forecast had traditional search volume falling 25 percent by 2026, and in hindsight that reads as conservative.

Search did not die. It split into two engines that answer to different rules. The old engine ranks pages and sends traffic. The new engine reads the web, decides what is true, and writes the answer itself, citing a handful of sources along the way. Your buyer reads the answer and never sees the list.

Generative Engine Optimization is the work of becoming one of those cited sources. It is not a replacement for SEO. Google has been clear that AI answers draw from the same index and the same ranking systems as regular search, with no separate path you can buy your way into (Google). Pages that already perform in organic search have the best shot at being quoted. The foundation, roughly 70 percent of the work, is the same. The 30 percent on top is new, and it is where the citation lives.

What Happens When Someone Asks AI

Picture a custom home builder in a competitive metro. For twenty years the work came from referrals, and the website was fifteen thin pages nobody read.

Then buyers started asking ChatGPT "who's a good custom home builder near me," and the answer came back with confidence and named three competitors. The builder was not on the list. They were not even in the running, because the model never retrieved a single one of their pages.

Here is what runs the moment that question gets typed:

The engine takes the one query and fans it out into eight to twelve sub-questions, dispatched at once across the live web and its own indexes (Search Engine Land). "Best custom home builder near me" becomes "what should a custom build cost," "how long do permits take," "what is the process," "who handles ADUs," and a half-dozen more. The engine retrieves pages for each, reranks them, throws most away, and synthesizes what survives into the answer.

A page that was never indexed cannot be retrieved. A page that was retrieved but answers only one of the twelve sub-questions wins one slice and loses the rest. Whoever answers more of the fan-out gets cited more, and the builder with one generic homepage answers almost none of it.
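
To make the fan-out math concrete, here is a minimal sketch that scores how much of a fan-out a site answers. The function name, the page paths, and the mapping of pages to sub-questions are all assumptions for illustration; real engines do not expose their sub-questions, so in practice the list comes from your own research.

```python
def fan_out_coverage(pages: dict[str, set[str]], sub_questions: list[str]) -> float:
    """Share of sub-questions answered by at least one page on the site."""
    if not sub_questions:
        return 0.0
    # Pool every sub-question that any page answers.
    answered = set().union(*pages.values())
    return sum(q in answered for q in sub_questions) / len(sub_questions)


# Hypothetical fan-out, using the four sub-questions quoted above.
sub_questions = [
    "what should a custom build cost",
    "how long do permits take",
    "what is the process",
    "who handles ADUs",
]

# Hypothetical sites: one generic homepage versus three focused pages.
homepage_only = {"/": set()}
rebuilt = {
    "/cost/": {"what should a custom build cost"},
    "/permits/": {"how long do permits take"},
    "/process/": {"what is the process"},
}

print(fan_out_coverage(homepage_only, sub_questions))  # 0.0
print(fan_out_coverage(rebuilt, sub_questions))        # 0.75
```

The number is only a planning aid. It shows which slices of the fan-out have no page at all, which is where the next page should go.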

This is why the businesses winning AI search are not writing one better page. They are building a structure that answers the whole fan-out: every service, every location, every real question a buyer asks before they hire. The builder above rebuilt to over a thousand pages and started getting cited by ChatGPT inside thirty days, in one of the most competitive construction markets in the country. The pages were the moat.

The Thirty Percent That Gets You Cited

The 70 percent that carries over from SEO is the part most operators already half-know: clean structure, fast pages, content organized so a crawler can read it. That groundwork is necessary, and it is not enough on its own. The 30 percent that earns the citation is where the work is now, and it comes down to three levers.

Proprietary data is the lever almost nobody pulls. The models are hungry for information they cannot get anywhere else: your real pricing, your process, your timeframes, the permit data you pulled from the county. It feels mundane to publish, and that is exactly why it works. A competitor can copy your homepage in an afternoon and cannot copy your data at all, and a model citing a specific number reaches for the page that has it.
Structure decides whether you get quoted. Answer the page's core question in the first 70 words, in a passage that reads correctly on its own, because the engine lifts passages rather than whole pages. Name people and credentials consistently, put a real "last updated" date on the page, and add schema at the page level rather than the site level, because the engine parses relationships one page at a time.
Brand mentions matter more than backlinks now. The models lean on the sources they already trust, and that set is concentrated: YouTube passed Reddit as the most-cited domain in early 2026, with Reddit, LinkedIn, and Wikipedia close behind (Adweek). A LinkedIn article and a YouTube video covering the same ground as your pages give the model two more places to find you.
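
The 70-word rule is easy to check before a page ships. The sketch below pulls the opening passage and applies a crude test for whether it leans on earlier context. The function names and the list of "dangling" openers are assumptions of mine, not anything the engines publish, and a human still has to judge whether the passage answers the question.

```python
def lead_passage(page_text: str, limit: int = 70) -> str:
    """Return the first `limit` words of the page, joined by single spaces."""
    return " ".join(page_text.split()[:limit])


# Hypothetical openers that usually mean the passage depends on earlier text.
DANGLING_OPENERS = ("as mentioned", "as noted", "this ", "that ", "these ", "it ")


def reads_alone(passage: str) -> bool:
    """Crude heuristic: flag passages that open by pointing at prior context."""
    return not passage.lower().startswith(DANGLING_OPENERS)


opening = lead_passage("Permit timelines depend on the county and the project type.")
print(reads_alone(opening))                          # True
print(reads_alone("As mentioned above, it varies."))  # False
```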

Two things you can skip with a clear conscience: rewriting your content for machines only, and stressing over llms.txt. Google has said it does not use the file and does not plan to, and only a couple of engines respect it. If you want the file anyway, treat it as cheap insurance: one command to generate, then move on to the work that moves the needle.

FORGE: The Build

The system has five phases. Run them in order and the output is a content structure that answers the fan-out and gets cited.

The name is FORGE, because that is what you are doing to the pages.

Letter	Phase	What it does
F	Find the gap	Map what AI already cites for your buyer's questions, and where the openings are.
O	Organize the architecture	Lay out a hub-and-spoke structure that serves Google and the answer engines at once.
R	Research before you write	Gather sourced facts to disk first, so every page is researched, not invented.
G	Ground it for the engines	Answer-first structure, page-level schema, entity clarity, proprietary data, freshness.
E	Emit and index	Ship, get indexed, then run the feedback loop that tells you why you were skipped.

The hub-and-spoke piece in phase two is the part that scales. Every service becomes a hub, every location or segment a spoke, and each combination earns its own page. That is how fifteen pages become a thousand without writing a thousand things from scratch.

Done well, each page solves a real query and stands on its own. Done badly, it is the same template with the city name swapped, and the engines have learned to ignore that.
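
As a rough sketch of how the page count multiplies, the snippet below crosses services with locations and keeps only the combinations a real buyer would search. The service names, location names, URL pattern, and the `is_real_query` filter are all hypothetical; the filter is where your judgment goes, and it is what keeps the output from being a city-name swap.

```python
from itertools import product


def slugify(text: str) -> str:
    """Lowercase the text and join its words with hyphens."""
    return "-".join(text.lower().split())


def build_page_map(services, locations, is_real_query):
    """One entry per service-by-location pair that passes the filter."""
    pages = []
    for service, location in product(services, locations):
        if is_real_query(service, location):
            pages.append({
                "hub": f"/{slugify(service)}/",
                "url": f"/{slugify(service)}/{slugify(location)}/",
                "query": f"{service} in {location}",
            })
    return pages


# Hypothetical inputs for illustration only.
services = ["Custom Homes", "ADU Construction"]
locations = ["Northside", "Riverside", "Lakeview"]
# Assumed: nobody searches for ADU work in Lakeview, so that page is skipped.
not_searched = {("ADU Construction", "Lakeview")}

pages = build_page_map(services, locations, lambda s, l: (s, l) not in not_searched)
print(len(pages))       # 5
print(pages[0]["url"])  # /custom-homes/northside/
```

Two services and three locations give six candidates and five pages here. The same cross product over a full service list and a full market is how the count reaches the hundreds.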

The Protocol

Five prompts, one per phase. Run them in a single Claude or ChatGPT project so each builds on the last, with web search turned on. Replace the bracketed inputs with your specifics before running.

Step 1: Find The Gap

This prompt maps the fan-out for your buyer and shows you who owns it now. The output is your target list and the exact language your buyers use, which feeds every page after it.

Map the AI-search opportunity for my business and find where I can win.

MY BUSINESS: [what you do, who you serve, the city or market]
MY BUYER: [who hires you and what triggers the search]

Do four things and report each separately.

1. QUERY FAN-OUT. Take the 5 core questions a buyer asks before hiring me. For each, list the 8-12 sub-questions an AI engine would fan it out into. These become my target pages.

2. WHO GETS CITED NOW. For my top 10 buyer queries, search how ChatGPT/Perplexity/Google answer them today and note which businesses or pages get named. Flag queries where no clear source is cited: those are open.

3. THE GAPS. Rank the openings by how winnable they are for a smaller player, easiest first. Favor specific, local, or proprietary-data questions over broad commodity ones.

4. BUYER LANGUAGE. From Reddit, YouTube comments, and forums, pull 15-20 exact phrases my buyers use to describe the problem, the objections, and the outcome they want. Direct quotes only.

Output as four labeled sections I can hand to the next step.

Step 2: Organize The Architecture

This turns the target list into a hub-and-spoke map. The output is a build sheet: which pages are hubs, which are spokes, and how they link.

Design a hub-and-spoke content architecture from the target list above.

Rules:
- Every service is a hub page. Every location or segment is a spoke. Every hub-by-spoke combination that a real buyer would search gets its own page.
- Each page must answer one real query and stand on its own. No pages that only make sense as a template swap.
- Map internal links: every spoke links up to its hub, every hub links down to its spokes, siblings cross-link where a buyer would move between them.

Output:
1. A table of every page: working title, type (hub/spoke), the primary query it answers, and the 3-5 sub-questions it must cover.
2. The internal-linking plan in plain language.
3. A build order, highest-opportunity pages first, using the ranking from Step 1.
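
Outside the prompt, the three linking rules can also be sketched in a few lines, which helps when checking that a build sheet is internally consistent. This is an illustration under one simplifying assumption: every sibling under a hub cross-links to every other, whereas the prompt asks for sibling links only where a buyer would move between them. The function name and URLs are hypothetical.

```python
from collections import defaultdict


def link_plan(spokes_by_hub: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each page URL to the set of URLs it should link to."""
    links = defaultdict(set)
    for hub, spokes in spokes_by_hub.items():
        for spoke in spokes:
            links[spoke].add(hub)   # every spoke links up to its hub
            links[hub].add(spoke)   # every hub links down to its spokes
            # Simplification: link to all siblings under the same hub.
            links[spoke].update(s for s in spokes if s != spoke)
    return dict(links)


# Hypothetical hub with two spokes.
plan = link_plan({
    "/custom-homes/": ["/custom-homes/northside/", "/custom-homes/riverside/"],
})
for page, targets in sorted(plan.items()):
    print(page, "->", sorted(targets))
```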

Step 3: Research Before You Write

This is the rule that separates researched content from invented content. The output is a sourced fact file the writing step pulls from, so no page ships a number you can't defend.

Build a sourced research file for this page: [page title and the query it answers].

Gather facts from real, namable sources only: government and county data, industry bodies, primary documentation, reputable media, and any first-party data I provide here: [paste your pricing, process, timeframes, results, or other proprietary data].

For every fact, record the claim, the source, the URL, and the date.

Hard rule: if a statistic cannot be sourced, do not include it. Never estimate a number to fill a sentence. If the data doesn't exist, say so.

Output the file as a list of sourced facts plus a short "buyer language" section pulled from Step 1. I will hand this file to the writing step.
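
If you keep the research file as structured data rather than free text, the hard rule can be enforced mechanically. The sketch below assumes a record shape of my own choosing (claim, source, URL, date) and drops any candidate missing one of the four. The field names and the sample entries are hypothetical placeholders, not real data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    claim: str
    source: str
    url: str
    retrieved: date


def keep_sourced(candidates: list[dict]) -> list[Fact]:
    """Keep only candidates that carry a claim, a source, a URL, and a date."""
    facts = []
    for c in candidates:
        if all(c.get(key) for key in ("claim", "source", "url", "date")):
            facts.append(
                Fact(c["claim"], c["source"], c["url"], date.fromisoformat(c["date"]))
            )
    return facts


# Hypothetical candidates: the second has no source, so it is dropped.
candidates = [
    {"claim": "[claim with a number]", "source": "[county permit office]",
     "url": "https://example.com/permits", "date": "2026-01-15"},
    {"claim": "[claim with no source]", "source": "", "url": "", "date": ""},
]
print(len(keep_sourced(candidates)))  # 1
```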

Step 4: Ground It For The Engines

This writes the page from the research file in citation-grade structure. The output is the finished page plus its schema, ready to ship.

Write the page for [title], using only the sourced research file above. Do not introduce any fact that isn't in it.

Structure:
- First 70 words: answer the page's core question directly, in a passage that reads correctly on its own if an engine quotes it alone.
- A one-line TL;DR and 3-5 standalone key takeaways.
- Body sections with question-style headings from the sub-questions. Each section's first sentence answers its heading.
- A section leading with my proprietary data.
- An FAQ block mirroring real buyer questions, linking to sibling pages.
- A visible "last updated" date.

Then produce page-level JSON-LD schema (Article plus FAQPage, and LocalBusiness if it's a location page), every field filled from the research file. Name people and credentials consistently in both the copy and the schema.

Write in plain, specific language. No filler, no AI throat-clearing. Read it back and cut any sentence you wouldn't say out loud.
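
For reference, outside the prompt, here is a minimal sketch of what page-level JSON-LD combining Article and FAQPage can look like when generated from a script. The types and property names (`headline`, `author`, `jobTitle`, `dateModified`, `mainEntity`, `acceptedAnswer`) are standard schema.org vocabulary; the function name and the bracketed values are placeholders, and LocalBusiness is left out to keep the example short.

```python
import json


def page_schema(headline, author_name, author_title, date_modified, faqs):
    """Build one JSON-LD graph holding an Article and an FAQPage for a page."""
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Article",
                "headline": headline,
                "author": {
                    "@type": "Person",
                    "name": author_name,
                    "jobTitle": author_title,
                },
                "dateModified": date_modified,
            },
            {
                "@type": "FAQPage",
                "mainEntity": [
                    {
                        "@type": "Question",
                        "name": question,
                        "acceptedAnswer": {"@type": "Answer", "text": answer},
                    }
                    for question, answer in faqs
                ],
            },
        ],
    }


# Placeholder values: every field should come from the research file.
schema = page_schema(
    headline="[page title]",
    author_name="[author name]",
    author_title="[credential]",
    date_modified="2026-01-15",
    faqs=[("[buyer question]", "[answer from the research file]")],
)
print(json.dumps(schema, indent=2))
```

The printed JSON goes inside a `<script type="application/ld+json">` tag on that page only, which is what "page level rather than site level" means in practice.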

Step 5: Emit And Index

Ship the pages, submit the sitemap so they get indexed, then run this on any query you should be winning and aren't. The output is a ranked fix list you feed back into Step 4.

Run the citation feedback loop for this query: [query I should win].

1. Ask ChatGPT, Perplexity, and Google's AI answer to respond to the query. Record who gets cited and what the answer says.
2. If my page isn't cited, ask the engine directly: what would this page need to be a source you'd cite for this query? Capture the exact reasons.
3. Compare the cited pages to mine: what do they answer that I don't, what data do they show, how is their structure different?
4. Output a ranked list of specific fixes, most impactful first, each tied to a concrete change I can make in Step 4. No vague advice.

There is no finish line here. The engines re-index constantly, so re-run this monthly on the queries that matter.
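
A monthly re-run is easier to act on if you keep a record of who was cited each time. The sketch below assumes you log, per query, the set of domains each engine named, and it lists the queries where your domain was cited last run and is missing now. The function name, the log shape, and the example domains are hypothetical.

```python
def dropped_citations(
    previous: dict[str, set[str]],
    current: dict[str, set[str]],
    my_domain: str,
) -> list[str]:
    """Queries where my_domain was cited in the last run but not in this one."""
    return sorted(
        query
        for query, cited in previous.items()
        if my_domain in cited and my_domain not in current.get(query, set())
    )


# Hypothetical logs from two monthly runs.
march = {
    "how long do permits take": {"example.com", "competitor.example"},
    "what should a custom build cost": {"example.com"},
}
june = {
    "how long do permits take": {"competitor.example"},
    "what should a custom build cost": {"example.com"},
}
print(dropped_citations(march, june, "example.com"))
# ['how long do permits take']
```

Each query on that list goes straight back into the feedback prompt above.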

The Skill: citation-engine

The Protocol shows the moves. Running them by hand for a thousand pages is the part nobody finishes. So this issue ships a skill that runs the build for you.

citation-engine is a Claude skill that takes a page, a batch, or a whole hub-and-spoke cluster and runs four waves on it: research the sourced facts to disk, draft the page answer-first from that research, ground it with page-level schema and entity clarity, then QA every claim against the research and flag anything it can't verify.

The rule underneath all four waves is the one from Step 3: a claim comes from sourced research or it does not go in. It will not invent a statistic to finish a sentence, and the QA wave hunts down any that slipped through, plus the FTC and health-finance-legal traps that get pages pulled.
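
To show the idea behind that cross-check, here is a deliberately crude sketch: pull every number out of a draft and flag the ones that never appear in the research file. It is my own illustration of the rule, not the skill's internals, and exact string matching will miss rephrased figures ("half" versus "50%"), so treat it as a first filter before a human read. The sample strings are placeholders.

```python
import re

# Matches integers, comma-grouped thousands, decimals, and a trailing percent sign.
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?%?")


def untraceable_numbers(draft: str, research: str) -> list[str]:
    """Numbers in the draft that do not appear verbatim in the research text."""
    known = set(NUMBER.findall(research))
    return [n for n in NUMBER.findall(draft) if n not in known]


# Hypothetical research note and draft sentence.
research = "Median permit time: 45 days (county permit office, 2026-01-15)."
draft = "Permits take 45 days and 90% of builds finish on time."
print(untraceable_numbers(draft, research))  # ['90%']
```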

Here’s the skill. Copy it, paste it into a plain text file, and save:

---
name: citation-engine
description: >-
  Produce citation-grade pages and articles that get retrieved and quoted by AI
  answer engines (ChatGPT, Perplexity, Google AI Overviews) and rank on Google.
  Runs four waves — research, draft, ground, QA — and never ships a fabricated
  statistic. Use when writing a single page, a batch, or a hub-and-spoke content
  cluster for AEO/GEO, or when auditing an existing draft for citability and
  fabricated claims. Triggers: "write an AEO page," "build a content cluster,"
  "make this citable," "QA this draft for made-up stats."
---

# citation-engine

Builds content that answer engines retrieve and quote, not just content that reads well. The engine runs the same four waves whether you hand it one page or a hundred, and the rule underneath every wave is the same: a claim comes from sourced research or it does not go in.

This skill produces the page. A separate audit skill (`geo-optimization`) scores a finished page for citability. Run this to write; run that to grade.

---

## When to use this

- Writing a new page, article, or programmatic page set for a business that needs to show up in AI answers.
- Building a hub-and-spoke cluster (one service hub, many location or segment spokes) at scale.
- Rewriting thin or generic pages into pages with proprietary data and answer-first structure.
- Running a hard QA pass on an existing draft to find fabricated stats, inflated numbers, and legally risky claims before it ships.

If the user only wants a citability score on existing content, point them to `geo-optimization` instead.

---

## The four waves

Run these in order. Check in with the user between waves unless they pass `--auto`. Each wave writes its output to disk so the next wave (and the user) can trace where every fact came from.

### Wave 1 — Research

Find citable facts before writing a word. This is the wave that separates AI-researched content from AI-generated content.

1. Establish the topic, the audience, and the exact questions a buyer would ask an answer engine to arrive here. Decompose the main query into 8–12 sub-questions (this mirrors how AI search fans a single query out across sub-queries). Each sub-question is a candidate heading or a candidate spoke page.
2. Gather facts from real, namable sources only: government and county data, academic and industry bodies, primary documentation, first-party data the user supplies, and reputable media. Prefer web search or a research tool (Tavily, Exa, Perplexity) when available. Note the source and date for every fact.
3. Pull the language real buyers use from forums, Reddit, and YouTube comments. This becomes the phrasing in headings and body copy, so the page matches how people actually ask.
4. Write everything to `./.citation-research/[slug].md`: each fact, its source, its date, and the sub-questions. Add `.citation-research/` to `.gitignore` if a repo is present.

Hard rule: if a statistic is not in the research file, it does not appear in the draft. Never invent a number to fill a sentence. If the data does not exist, say so in plain language or cut the claim.

See `references/research-rules.md`.

### Wave 2 — Draft

Write from the research file, not from memory. For each page:

1. Answer the page's core question in the first 70 words, in a self-contained passage that reads correctly even if an engine lifts it out of context. This is the single highest-leverage move for getting quoted.
2. Add a one-line TL;DR and a short key-takeaways block near the top.
3. Use question-style headings drawn from the sub-questions in Wave 1. Keep each section's answer self-contained at the passage level, because engines retrieve passages, not whole pages.
4. Lead with the proprietary or non-commodity data the user holds. Pricing, process, timeframes, county-level numbers, first-party results: the things a competitor cannot copy and an engine cannot find elsewhere.
5. Run the humanization rules so the draft reads like a person wrote it. See `references/humanization.md`.
6. Close with an FAQ section that ties back to related pages for internal linking.

### Wave 3 — Ground

Make the page machine-readable for both Google and the answer engines.

1. Add page-level JSON-LD schema, not site-level. Engines parse entity relationships per page. Use the right type (Article, FAQPage, LocalBusiness, Service, Product) and fill every field with real values from the research file. See `references/schema-templates.md`.
2. Mark entities clearly: name people, companies, and credentials consistently, and state what each is on first use.
3. Add a visible "last updated" date. Engines weight fresh evidence, and a dated page signals freshness.
4. Confirm answer-first structure, passage self-containment, and internal links to the hub and sibling spokes. See `references/answer-structure.md`.

### Wave 4 — QA

Audit before anything ships.

1. Cross-check every claim against `./.citation-research/[slug].md`. Flag any number, quote, or fact that is not traceable to the research. Fix or cut it.
2. Run the fabrication and legal pass: unsubstantiated "best/#1/guaranteed" claims, fake scarcity, and YMYL (health, finance, legal) overreach. See `references/qa-checklist.md`.
3. Re-audit once after fixes. Produce a short transparency report: what changed, what was cut, and anything still unverifiable that needs a human decision. Never ship an unverifiable claim silently.

---

## Output

For a single page: the finished markdown (or HTML), the JSON-LD block, and the QA transparency report.

For a batch or cluster: one file per page following the user's content directory and frontmatter conventions, plus a hub-and-spoke map showing which pages link to which. If the project has a `CLAUDE.md` or a brand voice file, read it first and match it.

State the deliverable format up front (markdown, MDX, HTML) so the user knows exactly what they are getting.

---

## A note on accuracy

This engine reduces fabrication and catches risky claims. It is not a lawyer or a compliance review. A human signs off on the final pages, especially for health, finance, and legal topics. The QA report exists to make that review fast, not to replace it.

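Install it in Claude Code by saving the file as SKILL.md inside a folder named citation-engine in your skills directory (~/.claude/skills/ for personal use, or .claude/skills/ inside a project), start a session, and point it at a page: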

/citation-engine write an AEO page answering "how long do building permits take in [county]" for my client, using the proprietary data in this file

It pairs with content scoring on the other side. This skill writes the page; an audit skill grades a finished one. Build with one, check with the other.

The skill file is attached to this issue.

Deploying This Across Your Team

Your role here is architect. The first two phases are yours: find the gaps worth winning and lay out the architecture, the judgment calls about where the business should show up. Those do not get delegated.

Hand the rest down. Phases three through five are a repeatable production line your team or the skill can run page after page: research to disk, write to structure, ship, then work the feedback loop every month. Two roles are worth naming and keeping:

One person owns the data going in. It is the input competitors can't match and the one most likely to be left vague, so it needs a name attached to it.
Another owns the monthly feedback loop. The engines re-index whether or not anyone is watching, and a page cited in March can drop out by June.

The operator who treats this as a one-time build gets a spike and a slow fade. The one who runs it as a standing system compounds, because every page that earns a citation makes the next one easier to place.

Most of your competitors are still optimizing for a page of blue links that sends less traffic every quarter.

The businesses building for the new engine are early enough that a smaller operator with better data and a tighter structure can outrank companies ten times their size, the way the early days of any channel reward the people who move first.

That window does not stay open. A year from now, a thousand well-built pages that answer the fan-out will be the baseline, and the advantage will already belong to whoever got there first.

This issue handed you one engine: a way to turn a content structure into citations. The harder question is which structures across your whole business deserve that treatment, in what order, and how they connect to the rest of your operation, so the work compounds instead of sitting in a corner.

That architecture, the full system rather than the single build, is what we work through in Cortex.

Click here for more details on what Cortex is

Talk soon,
Sam Woods
The Editor