
Technical GEO: How to optimise your website for AI search | SearchScore

The technical changes that make the biggest difference to your AI search visibility. From robots.txt and llms.txt to structured data and schema markup - a complete technical GEO guide.


Most GEO improvements are not about rewriting your content. They are technical changes - how your site is configured, what signals it sends to AI crawlers, and how well machines can understand its structure. This guide covers every technical change that materially improves AI search visibility.

Key Takeaway

Technical GEO requires configuring robots.txt and llms.txt files, along with structured data schemas and optimised content structure, to ensure AI crawlers can accurately parse and cite your website.

In this guide

- Prioritised: what to fix first

- AI crawler permissions in robots.txt

- Creating your llms.txt file

- Schema markup for AI citation

- Structured data implementation

- Platform and performance signals

- Technical GEO checklist

What to fix first

Not all technical GEO changes are equal. Our analysis of 12,000 websites found that three issues account for the majority of AI search invisibility - and two of them take under an hour to fix.

- Unblock AI crawlers (highest impact): robots.txt fix, takes 10 minutes

- Create llms.txt (highest impact): new file, takes 30 minutes

- Add schema markup (high impact): JSON-LD injection, 1 to 4 hours

- Improve page structure (medium impact): semantic HTML cleanup

AI crawler permissions in robots.txt

Your robots.txt file tells web crawlers which parts of your site they can access. The problem is that most robots.txt files were written before AI search engines existed - and many contain blanket rules that accidentally block AI crawlers alongside spam bots.

The major AI crawlers and their user-agent names:

- GPTBot - OpenAI / ChatGPT

- PerplexityBot - Perplexity AI

- ClaudeBot - Anthropic / Claude

- Googlebot - Google AI Overviews (uses standard Googlebot)

- anthropic-ai - Anthropic web crawler

- cohere-ai - Cohere

Common mistake: Using User-agent: * with Disallow: / to block all bots also blocks every AI crawler that does not have its own, more specific User-agent group. This is the single most damaging GEO error, and we see it on 73% of websites we audit.
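The effect of a blanket rule can be checked locally before you deploy a change. The sketch below uses Python's standard `urllib.robotparser` to test a robots.txt string against the crawler names listed above. The helper name `blocked_ai_crawlers` and the example.com URL are illustrative assumptions, and a real crawler's matching logic may differ in detail from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# User-agent names taken from the list above.
AI_CRAWLERS = ("GPTBot", "PerplexityBot", "ClaudeBot", "anthropic-ai", "cohere-ai")


def blocked_ai_crawlers(robots_txt, url="https://example.com/", agents=AI_CRAWLERS):
    """Return the agents that this robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# The blanket rule described above: every bot is blocked.
BLANKET_BLOCK = """\
User-agent: *
Disallow: /
"""

# Same blanket rule, but GPTBot gets its own, more specific group.
WITH_EXCEPTION = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""

print(blocked_ai_crawlers(BLANKET_BLOCK))   # all five crawlers are blocked
print(blocked_ai_crawlers(WITH_EXCEPTION))  # every crawler except GPTBot
```

Running the same function against your live robots.txt content gives a quick list of which AI crawlers are currently shut out.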

To allow all major AI crawlers, add these lines to your robots.txt:

```
# Allow major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: cohere-ai
Allow: /
```

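If you need to block AI training data collection while allowing search, give the training crawler and the retrieval crawler their own User-agent groups with different rules. OpenAI, Anthropic and others publish different bot names for training versus live retrieval, so check each vendor's documentation for the current names.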
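As one example of such a split, OpenAI documents `GPTBot` as its training crawler and `OAI-SearchBot` as the crawler behind ChatGPT search. The sketch below generates matching rules and checks them with Python's `urllib.robotparser`. The `POLICY` dictionary, the `render_robots` helper and the example.com URL are hypothetical, and bot names change over time, so treat them as assumptions to confirm against each vendor's documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: stay out of model training, stay visible in AI search.
POLICY = {
    "GPTBot": "Disallow",      # OpenAI training crawler
    "OAI-SearchBot": "Allow",  # OpenAI search crawler
}


def render_robots(policy):
    """Render one User-agent group per bot, separated by blank lines."""
    groups = [f"User-agent: {agent}\n{rule}: /" for agent, rule in policy.items()]
    return "\n\n".join(groups) + "\n"


robots_txt = render_robots(POLICY)
print(robots_txt)

# Verify the generated rules behave as intended.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("GPTBot", "https://example.com/guide"))         # False
print(parser.can_fetch("OAI-SearchBot", "https://example.com/guide"))  # True
```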

Creating your llms.txt file

llms.txt is a plain text file placed at the root of your website (e.g. yoursite.com/llms.txt) that gives AI language models structured guidance about your site. Think of it as a sitemap for AI - not just where pages are, but what your site is, what your most important content covers, and how an AI should understand your brand.

The format is simple Markdown. A basic llms.txt looks like this:

```markdown
# YourBrand

One-line description of what your website/business does.

## About

[Brief description of who you are, what you do, and who you serve]

## Key pages

- Home: Main landing page
- About: Company background and team
- Blog: Articles and guides

## Key topics

This site covers [your main topic areas]. Our content is written by [credentials].

## Contact

[contact@yoursite.com]
```
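If your key pages already live in a CMS or a spreadsheet, a file like this can be generated rather than hand-written. The following is a minimal sketch, assuming the layout described in the public llms.txt proposal (an H1 title, a blockquote summary, then H2 sections containing Markdown links). The function name `build_llms_txt` and the `/about` and `/blog` URLs are placeholders, not part of any standard tooling.

```python
def build_llms_txt(name, summary, sections):
    """Build llms.txt text from a site name, a summary and link sections.

    sections maps a heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder data mirroring the template above; the URLs are assumptions.
llms_txt = build_llms_txt(
    name="YourBrand",
    summary="One-line description of what your website/business does.",
    sections={
        "Key pages": [
            ("Home", "https://yoursite.com/", "Main landing page"),
            ("About", "https://yoursite.com/about", "Company background and team"),
            ("Blog", "https://yoursite.com/blog", "Articles and guides"),
        ],
    },
)
print(llms_txt)
```

Write the returned string to a file named llms.txt and serve it from the root of your domain.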

Beyond the basics, you can also include a detailed llms-full.txt that contains the complete text of your most important pages - making it trivial for AI models to ingest your content without crawling your full site.

Quick win: Our data shows that 92% of websites have no llms.txt file at all. Simply creating one puts you ahead of almost all of your competitors from an AI search perspective.

Schema markup for AI citation

Schema.org markup is structured data embedded in your HTML that tells machines what your content means. Google has required it for rich results for years - but for GEO, it is even more important. AI engines use schema to verify facts, understand entities, attribute authorship and decide whether to cite your content.

Organisation schema

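Organisation schema establishes your brand as a known entity. It should include your official name, URL, logo, contact details, social media profiles and, where applicable, your Wikipedia or Wikidata URL. This is the foundation of brand authority for AI citation. Note that the schema.org type name uses the US spelling, Organization, so the "@type" value in your markup must be spelled that way.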

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Company Name",
  "url": "https://yoursite.com",
  "logo": "https://yoursite.com/logo.png",
  "sameAs": [
    "https://twitter.com/yourhandle",
    "https://linkedin.com/company/yourcompany",
    "https://en.wikipedia.org/wiki/YourCompany"
  ]
}
```

Article schema

Every blog post and article should have Article schema with a named author, a datePublished value and a publisher reference. This lets AI engines attribute content to a real, identifiable person and organisation, which is a critical E-E-A-T signal.
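A minimal sketch of the Article markup described above, built in Python so it can be generated per post. The property names (`headline`, `author`, `datePublished`, `publisher`, `mainEntityOfPage`) are standard schema.org vocabulary; the function name, the sample author, the date and the URL are placeholder assumptions.

```python
import json
from datetime import date


def article_schema(headline, author_name, published, publisher_name, url):
    """Return Article JSON-LD with a named author, date and publisher."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": published.isoformat(),  # ISO 8601, e.g. 2025-01-15
        "publisher": {"@type": "Organization", "name": publisher_name},
        "mainEntityOfPage": url,
    }


# Placeholder values for illustration only.
article = article_schema(
    headline="Technical GEO: How to optimise your website for AI search",
    author_name="Jane Example",
    published=date(2025, 1, 15),
    publisher_name="Your Company Name",
    url="https://yoursite.com/blog/technical-geo",
)
print(json.dumps(article, indent=2))
```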

FAQPage schema

FAQPage schema is one of the most powerful GEO signals available. AI engines that synthesise answers frequently pull from structured Q&A content - and FAQ schema makes your Q&A pairs directly machine-readable. Add it to any page with questions and answers.
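FAQPage markup has a fixed shape: a `mainEntity` list of `Question` items, each with an `acceptedAnswer`. The sketch below builds it from plain question and answer pairs; the function name `faq_schema` is illustrative, and the sample pair is taken from this guide.

```python
import json


def faq_schema(pairs):
    """Return FAQPage JSON-LD for a list of (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


faq = faq_schema([
    (
        "What is llms.txt?",
        "A plain text file placed at the root of your website that gives "
        "AI language models structured guidance about your site.",
    ),
])
print(json.dumps(faq, indent=2))
```

The answer text in the markup should match the answer that is visible on the page.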

Structured data: the bigger picture

Beyond the core three schema types, consider adding structured data relevant to your business type:

- LocalBusiness - for businesses with a physical location

- Product - for ecommerce and software products

- Person - for author pages and personal brands

- HowTo - for instructional content (AI engines love step-by-step guides)

- BreadcrumbList - helps AI understand your site hierarchy

- WebSite with SearchAction - signals your site as a navigable entity

Implement schema as JSON-LD, typically in the <head> of your pages. It is easier to maintain than inline microdata and is the format Google recommends for structured data.
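JSON-LD is embedded in a `<script type="application/ld+json">` element. A minimal sketch of a helper that renders any schema dictionary into that tag follows; the name `jsonld_script` is hypothetical. It escapes `</` inside the payload so that a value containing `</script>` cannot close the tag early, and the result remains valid JSON because `\/` is a legal JSON escape.

```python
import json


def jsonld_script(data):
    """Render a dict as a JSON-LD script tag ready for the page <head>."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Prevent a literal "</script>" in any value from ending the element.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Your Company Name",
    "url": "https://yoursite.com",
}
print(jsonld_script(organization))
```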

Platform and performance signals

AI crawlers face the same technical barriers as other bots. Slow load times, JavaScript-heavy rendering, broken pagination and inconsistent canonical URLs all reduce how effectively AI engines can parse your content.

- Core Web Vitals - fast LCP and low CLS improve crawl efficiency

- Semantic HTML - use proper heading hierarchy (H1 > H2 > H3), not divs styled to look like headings

- Alt text - all images labelled, helping AI understand visual content context

- Canonical tags - prevent AI engines from indexing duplicate content versions

- XML sitemap - ensure all important pages are discoverable

- HTTPS - a basic trust signal for all search engines, including AI
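Several of the signals above can be spot-checked from a page's raw HTML. The sketch below uses Python's standard `html.parser` to flag skipped heading levels, images without alt text and a missing or duplicated canonical tag. The class and function names are illustrative, the rules are deliberately simplified (for example, an empty alt attribute is legitimate for purely decorative images), and it does not measure performance or rendering.

```python
from html.parser import HTMLParser


class StructureAudit(HTMLParser):
    """Collects heading levels, images without alt text and canonical links."""

    def __init__(self):
        super().__init__()
        self.heading_levels = []
        self.images_missing_alt = 0
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.heading_levels.append(int(tag[1]))
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.images_missing_alt += 1
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonicals.append(attrs.get("href"))


def audit(html):
    """Return a list of human-readable structure issues found in html."""
    parser = StructureAudit()
    parser.feed(html)
    parser.close()
    issues = []
    levels = parser.heading_levels
    # A heading may go at most one level deeper than the one before it.
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            issues.append(f"heading jumps from h{previous} to h{current}")
    if parser.images_missing_alt:
        issues.append(f"{parser.images_missing_alt} image(s) missing alt text")
    if len(parser.canonicals) != 1:
        issues.append(f"expected one canonical link, found {len(parser.canonicals)}")
    return issues


SAMPLE = """
<html><head><link rel="canonical" href="https://yoursite.com/guide"></head>
<body><h1>Guide</h1><h3>Skipped level</h3><img src="chart.png"></body></html>
"""
print(audit(SAMPLE))
```

For the sample page this reports the jump from h1 to h3 and the one image without alt text.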

Technical GEO checklist

- ☐ GPTBot, ClaudeBot and PerplexityBot are not blocked in robots.txt

- ☐ llms.txt file exists at domain root

- ☐ llms.txt includes accurate site description, key pages and key topics

- ☐ Organisation schema implemented on homepage

- ☐ Article schema on all blog posts with named author

- ☐ FAQPage schema on key pages

- ☐ Person schema on author bio pages

- ☐ All schema validated with Google Rich Results Test

- ☐ Canonical tags on all pages

- ☐ XML sitemap submitted to Google Search Console

- ☐ HTTPS active across entire site

- ☐ No JavaScript rendering required to access main content

- ☐ H1 > H2 > H3 heading hierarchy consistent on all pages

- ☐ Image alt text complete


