SEO in the LLM era — from search engines to answer engines


For over two decades, the rules of content discovery were clear: optimize for search engines, earn backlinks, climb the rankings, get traffic. SEO was the game, and everyone knew how to play it.

That game is changing. Not because SEO is dead—it isn’t—but because a new player has entered the field. LLMs are reshaping how people find information, and the consequences for content creators are still unfolding.

The Old World: SEO Fundamentals

Traditional SEO boils down to a few well-understood principles: make your content crawlable, make it relevant, and prove it’s authoritative.

The mechanics are familiar. Meta tags tell search engines what a page is about. Sitemaps expose your site structure. Semantic HTML (<article>, <nav>, <header>) helps crawlers understand content hierarchy. Structured data via schema.org provides machine-readable context—product details, article metadata, FAQ blocks.

These fundamentals haven’t disappeared. If anything, they’ve become more important as a foundation. Without clean technical SEO, AI systems have nothing reliable to ingest, understand, or cite. When I migrated this blog to Astro, fixing the SEO basics was a primary motivation—moving from a client-rendered React SPA to static HTML meant search engines could finally see the content without executing JavaScript.

But being crawlable and well-structured is now table stakes. The question is: who’s doing the crawling?

The Shift: From Search Engines to Answer Engines

The way people find information is fragmenting. Google is still dominant, but users increasingly turn to ChatGPT, Perplexity, Claude, and Gemini to answer their questions directly. Instead of browsing ten blue links, they get a synthesized answer—sometimes with citations, sometimes without.

The numbers are striking. A significant share of Google searches now end with zero clicks—the answer appears right on the results page via AI Overviews. Meanwhile, AI-powered search tools are growing fast. Bots like GPTBot, ClaudeBot, and PerplexityBot now account for a meaningful share of crawl traffic on many sites.

This creates a fundamental tension for content creators. You invest effort in creating content; a model ingests it and delivers the answer to users without them ever visiting your site. Your content is valuable enough to be used, but not valuable enough to be visited.

The Alphabet Soup: GEO, AEO, LLMO

As the landscape shifts, new acronyms have emerged to describe different facets of optimization:

  • GEO (Generative Engine Optimization): Creating content that performs well in generative search results—how AI models summarize, synthesize, and cite your content
  • AEO (Answer Engine Optimization): Optimizing for direct answers—featured snippets, voice search, chatbots. Think “position zero”
  • LLMO (Large Language Model Optimization): Structuring content so LLMs understand, trust, and accurately reference it across all contexts—not just search

There’s overlap between these, and the terminology hasn’t settled. Some practitioners argue it’s all just SEO evolving. Others insist each discipline requires distinct strategies. The hiring market seems to be converging on “AI Search Optimization” (AISO) as an umbrella term.

My honest take? It’s too early to declare any of these a mature discipline. The underlying shift is real—content discovery is diversifying beyond traditional search. But much of the specific tactical advice floating around is speculative. The models change faster than the optimization playbooks.

Note

The core philosophy of search has shifted from convincing an algorithm your page is the most relevant result for a keyword, to convincing an AI model your brand is the most authoritative entity for a topic. This is a move from a “link economy” to a “knowledge economy.”

The Zero-Click Problem

Let’s address this directly: if LLMs answer queries without sending traffic to your site, why bother optimizing for them?

It’s a fair concern, and there’s no clean answer yet. But here’s how I think about it:

The concern is real. If an LLM fully answers a user’s question using your content, you get no visit, no ad impression, no conversion opportunity. For businesses that depend on traffic-driven revenue, this is a genuine threat.

But visibility still has value beyond clicks. When an LLM cites your content or mentions your brand as authoritative, that builds credibility. Users who see your name in AI-generated answers may seek you out later. Being the source that AI trusts creates a form of authority that’s harder to game than traditional backlinks.

And the alternative is worse. If you don’t optimize for AI discoverability, your content simply won’t surface at all—not in search results, not in AI answers. At least with optimization, you’re in the conversation.

The honest framing: we’re in a transition period. The value exchange between content creators and AI platforms hasn’t been resolved. But ignoring the shift won’t make it go away.

Traditional SEO vs GEO

Where SEO and GEO Diverge

| Dimension | Traditional SEO | GEO / LLM Optimization |
| --- | --- | --- |
| Discovery | Search engine results pages | AI-generated answers and chat interfaces |
| Ranking Signal | Backlinks, domain authority, keywords | Content quality, factual accuracy, source authority |
| Content Format | Keyword-optimized pages | Clear, structured, quotable prose |
| Success Metric | Rankings, clicks, traffic | Citations, brand mentions, inclusion in answers |
| Technical Foundation | Meta tags, sitemaps, page speed | Structured data, schema markup, llms.txt |

The overlap is significant. Good content, clear structure, and technical soundness serve both worlds. The difference lies in who you’re optimizing for—a ranking algorithm or a language model—and what “success” looks like.

What You Can Do Today

While the landscape is still evolving, some practical steps are worth taking now. These aren’t speculative GEO hacks—they’re solid foundations that serve both traditional search and AI discoverability.

Get the Basics Right

  • Semantic HTML: Use proper heading hierarchy, <article> tags, and meaningful structure. Both search engines and LLMs parse HTML structure to understand content.
  • Meta tags and Open Graph: Title, description, and social preview metadata help all systems understand what your content is about.
  • Sitemaps: Expose your site structure. If crawlers—traditional or AI—can’t find your pages, nothing else matters.
  • Page speed: Fast-loading, accessible content gets crawled more frequently and provides better signals.
  • AI crawler directives: Bots like GPTBot, ClaudeBot, and PerplexityBot respect robots.txt. Make a deliberate decision about which AI crawlers to allow rather than leaving it to defaults—blocking all AI crawlers means your content won’t appear in AI answers, but allowing them without limits means your content trains models for free. There’s no universal right answer here.
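Expressing such a policy takes only a few lines of robots.txt. The sketch below is one possible stance (allow retrieval-oriented crawling, block training-data collection), not a recommendation — which bots you allow depends on your goals:

```txt
# Allow Perplexity's answer-engine crawler
User-agent: PerplexityBot
Allow: /

# Block OpenAI's training-data crawler (example policy)
User-agent: GPTBot
Disallow: /

# Allow Anthropic's crawler
User-agent: ClaudeBot
Allow: /
```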

Add Structured Data with JSON-LD

Schema.org markup gives machines explicit context about your content. The recommended format is JSON-LD (JavaScript Object Notation for Linked Data)—a self-contained <script> block embedded in your page’s <head>. Google explicitly recommends JSON-LD over alternatives like microdata or RDFa, and it’s particularly well-suited for LLMs: because it’s a standalone structured block rather than inline annotations scattered through the DOM, models can parse it cleanly without navigating your markup.


Article schema, FAQ schema, and HowTo schema help both Google’s rich results and LLMs that process structured data during retrieval.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "SEO in the LLM Era",
  "author": {
    "@type": "Person",
    "name": "David Salathé",
    "url": "https://blog.dsalathe.dev"
  },
  "publisher": {
    "@type": "Person",
    "name": "David Salathé"
  },
  "datePublished": "2026-02-11",
  "description": "How content discovery is changing with LLMs",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://blog.dsalathe.dev/seo-in-the-llm-era"
  }
}
</script>

Consider llms.txt

The llms.txt proposal is an emerging convention—a file at the root of your site that helps LLMs understand your site structure, content organization, and how to use your information. Think of it as robots.txt but for language models. It’s still early, but the cost of adding one is low and the signal is clear. This blog serves one.
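Per the proposal, the file is itself markdown: an H1 with the site name, a blockquote summary, and sections of annotated links. A minimal sketch (the URLs and descriptions here are illustrative, not this blog's actual file):

```markdown
# David's Blog

> Posts on software engineering and the modern web.

## Posts

- [SEO in the LLM era](https://blog.dsalathe.dev/seo-in-the-llm-era.md): how content discovery is changing
- [Migrating to Astro](https://blog.dsalathe.dev/astro-migration.md): rebuilding the blog as static HTML

## Optional

- [About](https://blog.dsalathe.dev/about.md): author background
```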

Offer Machine-Readable Feeds

RSS and Atom feeds are an underappreciated channel for AI discoverability. They provide clean, structured, full-text content without navigation chrome or ads—exactly what a model needs to ingest your content accurately. Some AI-powered aggregators and research tools already consume RSS feeds directly. If your site doesn’t have one, you’re leaving a low-effort discoverability channel on the table.
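A feed that includes full article bodies (via the RSS `content:encoded` extension) is far more useful to a machine consumer than one with truncated summaries. A minimal sketch, with placeholder content:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>David's Blog</title>
    <link>https://blog.dsalathe.dev</link>
    <description>Posts on software engineering and the modern web</description>
    <item>
      <title>SEO in the LLM era</title>
      <link>https://blog.dsalathe.dev/seo-in-the-llm-era</link>
      <pubDate>Wed, 11 Feb 2026 00:00:00 GMT</pubDate>
      <content:encoded><![CDATA[<p>Full article HTML goes here…</p>]]></content:encoded>
    </item>
  </channel>
</rss>
```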

Write for Understanding, Not Keywords

This is where traditional SEO advice and GEO advice genuinely converge. Content that’s clear, well-structured, factually accurate, and genuinely useful ranks well in search engines and gets cited by LLMs. Keyword stuffing has been declining in value for years. Writing content that actually answers questions—thoroughly, honestly, with concrete examples—is the strategy that ages best.

What’s Next

The honest answer: nobody knows exactly where this is heading. The models are improving, the platforms are shifting, and the relationship between AI systems and content creators is still being negotiated.

What seems clear is that the days of SEO as the sole content discovery strategy are behind us. Content creators now need to think about discoverability across multiple channels—search engines, AI assistants, social platforms, RSS feeds. The acronyms will keep multiplying (GEO, AEO, LLMO, AISO…), but the underlying principle remains: create content worth finding, and make it easy for both humans and machines to understand.

The fundamentals haven’t changed. Good content, clear structure, technical soundness. What’s changed is who’s reading—and it’s increasingly not just humans clicking through search results.

Next Actions: Making Your Site LLM-Native

The steps above are solid foundations, but the frontier is moving fast. Here are concrete techniques that go further—turning your site from merely crawlable by AI to optimized for AI consumption.

Serve Markdown Versions of Your Pages

LLMs work with text, not DOM trees. When a bot crawls your HTML page, it has to strip away navigation, ads, scripts, and layout chrome to get to the actual content—a lossy process. The cleaner alternative: serve a markdown version of the page directly.

Cloudflare has already productized this with their Markdown for Agents feature. When an AI agent requests a page and signals it wants markdown (via the x-markdown-tokens header), Cloudflare automatically converts the HTML response into clean markdown. The agent can even specify a token limit to get a trimmed version that fits its context window. If your site is on Cloudflare, enabling this is a toggle—no code changes required.

Even without Cloudflare, the principle applies. You could serve an llms-full.txt alongside your llms.txt—the former containing your full content in markdown, the latter acting as a site map. Some sites already expose /page-slug.md mirrors of their HTML pages. The cost is low, and you’re removing friction between your content and the models trying to understand it.
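To make the "lossy process" concrete, here is a deliberately rough build-time sketch of HTML-to-markdown conversion using only the standard library: it keeps headings and paragraphs and drops navigation chrome. A real pipeline would use a proper converter; the class and tag lists are illustrative:

```python
from html.parser import HTMLParser


class ArticleToMarkdown(HTMLParser):
    """Rough HTML -> markdown sketch: keep headings and paragraph
    text, skip everything inside navigation/layout chrome."""

    SKIP = {"nav", "script", "style", "header", "footer", "aside"}
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0   # >0 while inside a chrome element
        self.prefix = ""      # pending markdown heading marker

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.prefix = self.HEADINGS[tag]

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.HEADINGS or tag == "p":
            self.out.append("")  # blank line after each block
            self.prefix = ""

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.out.append(self.prefix + text)
            self.prefix = ""


def to_markdown(html: str) -> str:
    parser = ArticleToMarkdown()
    parser.feed(html)
    return "\n".join(parser.out).strip()
```

Running this over a page with `<nav>` chrome yields just the headings and body text, which is exactly the representation an `llms-full.txt` or `.md` mirror would serve.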

Expose MCP Endpoints

This is where things get interesting. The Model Context Protocol (MCP) is an open standard that lets AI agents interact with external tools and data sources in a structured way. Instead of a bot scraping your HTML and hoping to parse it correctly, an MCP server can expose your content as typed, queryable resources—posts by topic, concepts by status, search by keyword.

Think of it as the difference between screen-scraping a website and calling its API. MCP endpoints let AI agents ask for exactly what they need rather than guessing from markup. For a blog, this could mean exposing a search tool, a “get related posts” resource, or structured article metadata. For a product site, it could mean exposing documentation, API references, or pricing data in a format models can consume natively.

MCP is still early, but the trajectory is clear: AI agents are evolving from passive crawlers to active participants that interact with sites programmatically. Sites that offer MCP endpoints will be first-class citizens in that world.
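A real MCP server would be built on an MCP SDK; to show just the core idea — a typed tool contract plus a dispatcher, instead of scraped HTML — here is a standard-library-only sketch. The post store and the tool names (`search_posts`, `get_post`) are hypothetical:

```python
# Hypothetical in-memory store standing in for a blog's content.
POSTS = [
    {"slug": "seo-in-the-llm-era", "title": "SEO in the LLM era",
     "topics": ["seo", "llm"]},
    {"slug": "astro-migration", "title": "Migrating to Astro",
     "topics": ["astro", "seo"]},
]

# The tool contract an MCP-style server would advertise to agents:
# names, descriptions, and typed parameters.
TOOLS = {
    "search_posts": {
        "description": "Find posts whose title or topics match a query.",
        "params": {"query": "string"},
    },
    "get_post": {
        "description": "Fetch one post's metadata by slug.",
        "params": {"slug": "string"},
    },
}


def handle_call(name: str, args: dict) -> dict:
    """Dispatch a tool call the way an MCP server routes requests."""
    if name == "search_posts":
        q = args["query"].lower()
        hits = [p for p in POSTS
                if q in p["title"].lower() or q in p["topics"]]
        return {"results": hits}
    if name == "get_post":
        matches = [p for p in POSTS if p["slug"] == args["slug"]]
        return {"result": matches[0] if matches else None}
    return {"error": f"unknown tool: {name}"}
```

The point of the contract is that an agent calling `search_posts` gets structured results it can trust, rather than guessing which `<div>` holds the search box.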

On the browser side, WebMCP is a complementary standard co-authored by Google and Microsoft that lets web pages expose JavaScript functions as typed tools via navigator.modelContext—no backend server required. Instead of an agent scraping your HTML, it can call searchPosts(query) or getArticle(slug) directly in the browser. It shipped as an early preview in Chrome 146 Canary. The model is the same: define your site’s capabilities as a structured tool contract, and agents can interact with them reliably instead of guessing from markup.

Adopt Structured Protocols: robots.txt for the AI Age

Beyond llms.txt, several complementary standards are emerging:

  • ai.txt: A proposed convention for declaring AI-specific permissions—which models can use your content, for what purposes, and under what terms. Think of it as a more granular robots.txt that addresses the nuances of AI training vs. inference vs. retrieval.
  • Structured citations metadata: Adding citation or citeAs metadata to your pages makes it easier for models to attribute content correctly when they reference it in answers.
  • Content hashing and provenance: Embedding content signatures helps AI systems verify that the content they ingested hasn’t been tampered with—important as AI-generated misinformation becomes a concern.

Optimize for Retrieval-Augmented Generation (RAG)

Many AI answer engines don’t rely solely on training data—they retrieve fresh content at query time through RAG pipelines. To perform well in RAG:

  • Write self-contained sections with clear headings. RAG systems often retrieve chunks, not full pages—each section should make sense in isolation.
  • Front-load key information. Put the answer in the first paragraph, then elaborate. This mirrors the inverted pyramid style from journalism, and it’s exactly what chunk-based retrieval rewards.
  • Use explicit definitions and factual statements. “X is Y” constructions are easier for models to extract and cite than buried, hedged explanations.
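To see why self-contained sections matter, consider how a simple RAG ingester might chunk a page: one chunk per heading, with the heading kept alongside its body. This is a sketch of a common chunking strategy, not any particular engine's pipeline:

```python
def chunk_by_heading(markdown: str) -> list[dict]:
    """Split a markdown document into retrieval chunks, one per
    section, keeping each heading with its body so the chunk
    makes sense in isolation."""
    chunks, heading, body = [], "", []
    for line in markdown.splitlines():
        if line.startswith("#"):
            if heading or body:  # flush the previous section
                chunks.append({"heading": heading,
                               "text": "\n".join(body).strip()})
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    if heading or body:  # flush the final section
        chunks.append({"heading": heading, "text": "\n".join(body).strip()})
    return chunks
```

A section whose first paragraph already answers the question survives this slicing intact; a section whose answer depends on three paragraphs of earlier context does not.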

Provide API and Feed Alternatives

Go beyond RSS. Consider exposing a JSON API for your content—even a simple static one generated at build time. A /api/posts.json endpoint returning structured metadata (title, description, date, topics, URL) gives AI systems a clean programmatic entry point. This blog already exposes search endpoints for its command palette—the same data could serve AI agents.
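Generating such an endpoint at build time can be a few lines. A sketch — the post metadata and URL are illustrative, not this blog's actual data:

```python
import json

# Hypothetical post metadata, e.g. gathered from frontmatter at build time.
posts = [
    {
        "title": "SEO in the LLM era",
        "description": "How content discovery is changing with LLMs",
        "date": "2026-02-11",
        "topics": ["seo", "llm"],
        "url": "https://blog.dsalathe.dev/seo-in-the-llm-era",
    },
]


def build_posts_json(posts: list[dict]) -> str:
    """Serialize post metadata for a static /api/posts.json endpoint."""
    return json.dumps({"posts": posts}, indent=2, ensure_ascii=False)
```

At build time you would write `build_posts_json(posts)` to `dist/api/posts.json`, and any agent gets clean metadata with a single GET.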

The thread connecting all these techniques is the same: reduce the distance between your content and the model’s understanding of it. Every layer of indirection—HTML parsing, layout stripping, content guessing—is a chance for information loss. The sites that thrive in the LLM era will be the ones that meet AI agents where they are, speaking their language natively.

Final Thoughts

If you’re a content creator, don’t panic about AI eating your traffic. Don’t chase every new acronym either. Focus on what has always worked: write clearly, structure carefully, and be genuinely useful. Add structured data. Make your site technically sound. These foundations serve you regardless of which AI model or search engine is doing the discovering.

The landscape is shifting, but it’s not a cliff—it’s a gradual slope. The creators who adapt will be the ones who stay curious about the changes without abandoning what works.


Related: This blog’s migration to Astro was partly motivated by SEO concerns. For more on the hosting side, see the Cloudflare migration.

Want to discuss? Find me on GitHub or LinkedIn.
