The technical changes that make the biggest difference to your AI search visibility. From robots.txt and llms.txt to structured data and schema markup - a complete technical GEO guide.
Technical GEO: How to optimise your website for AI search | SearchScore
Most GEO improvements are not about rewriting your content. They are technical changes - how your site is configured, what signals it sends to AI crawlers, and how well machines can understand its structure. This guide covers every technical change that materially improves AI search visibility.
Key Takeaway
Technical GEO requires configuring robots.txt and llms.txt files, along with structured data schemas and optimised content structure, to ensure AI crawlers can accurately parse and cite your website.
- Prioritised: what to fix first
- AI crawler permissions in robots.txt
- Schema markup for AI citation
- Structured data implementation
- Platform and performance signals
Not all technical GEO changes are equal. Our analysis of 12,000 websites found that three issues account for the majority of AI search invisibility - and two of them take under an hour to fix.
- Unblock AI crawlers: robots.txt fix, takes 10 minutes (highest impact)
- Create llms.txt: new file, takes 30 minutes (highest impact)
- Add schema markup: JSON-LD injection, 1 to 4 hours (high impact)
- Improve page structure: semantic HTML cleanup (medium impact)
Your robots.txt file tells web crawlers which parts of your site they can access. The problem is that most robots.txt files were written before AI search engines existed - and many contain blanket rules that accidentally block AI crawlers alongside spam bots.
The major AI crawlers and their user-agent names:
- GPTBot - OpenAI / ChatGPT
- PerplexityBot - Perplexity AI
- ClaudeBot - Anthropic / Claude
- Googlebot - Google AI Overviews (uses standard Googlebot)
- anthropic-ai - Anthropic web crawler
- cohere-ai - Cohere
Common mistake: Using User-agent: * with Disallow: / to block all bots also blocks every AI crawler that does not have its own, more specific group. This is the single most damaging GEO error and we see it on 73% of websites we audit.
To allow all major AI crawlers, add these lines to your robots.txt:
`# Allow major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: cohere-ai
Allow: /`
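One way to confirm that a robots.txt behaves as intended is to parse it locally before deploying. The sketch below uses Python's standard urllib.robotparser module. The helper name blocked_crawlers and the example.com URL are illustrative assumptions, and real crawlers may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the crawler list in this guide.
AI_CRAWLERS = ["GPTBot", "PerplexityBot", "ClaudeBot", "anthropic-ai", "cohere-ai"]


def blocked_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers that the given robots.txt text blocks from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


# A blanket rule blocks every crawler that has no group of its own.
BLANKET_BLOCK = """\
User-agent: *
Disallow: /
"""

# An explicit group for GPTBot lets that one crawler back in.
WITH_GPTBOT_ALLOWED = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""

print(blocked_crawlers(BLANKET_BLOCK))
print(blocked_crawlers(WITH_GPTBOT_ALLOWED))
```

An empty list means none of the listed crawlers is blocked for that URL. Run it against the robots.txt text you plan to publish, and repeat it for a few important URLs, not just the homepage.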
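If you need to block AI training data collection while still appearing in AI search, give each crawler its own group instead of relying on one blanket rule. OpenAI, Anthropic and others use different user-agent names for training crawlers and for live retrieval, so you can disallow one and allow the other.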
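As a hedged illustration of splitting training from retrieval, the sketch below blocks one crawler and allows another, then checks the result with urllib.robotparser. The token OAI-SearchBot is not mentioned elsewhere in this guide: it is assumed here from OpenAI's published crawler names at the time of writing, so check each vendor's current documentation before copying it.

```python
from urllib.robotparser import RobotFileParser

# Assumption: GPTBot is OpenAI's training crawler and OAI-SearchBot is its
# search crawler, as published by OpenAI at the time of writing.
ROBOTS_TXT = """\
# Opt out of model training
User-agent: GPTBot
Disallow: /

# Stay visible in AI search results
User-agent: OAI-SearchBot
Allow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str = "https://example.com/") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("GPTBot", "OAI-SearchBot"):
    print(agent, "allowed" if is_allowed(ROBOTS_TXT, agent) else "blocked")
```

The same pattern should extend to other vendors once you have confirmed which of their user-agent names cover training and which cover retrieval.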
llms.txt is a plain text file placed at the root of your website (e.g. yoursite.com/llms.txt) that gives AI language models structured guidance about your site. Think of it as a sitemap for AI - not just where pages are, but what your site is, what your most important content covers, and how an AI should understand your brand.
The format is simple Markdown. A basic llms.txt looks like this:
`# YourBrand
> One-line description of what your website/business does.
[Brief description of who you are, what you do, and who you serve]
This site covers [your main topic areas]. Our content is written by [credentials].
[contact@yoursite.com]`
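If you maintain many pages, it can be easier to generate llms.txt than to edit it by hand. The following is a minimal sketch, assuming the layout described in the public llms.txt proposal (an H1 title, a blockquote summary, then H2 sections containing link lists). The function name build_llms_txt, the section name and the page URL are hypothetical.

```python
def build_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {name}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}", ""]
        for title, url, note in links:
            # Each entry is a Markdown link followed by a short description.
            lines.append(f"- [{title}]({url}): {note}")
    return "\n".join(lines) + "\n"


# Hypothetical content for illustration only.
EXAMPLE = build_llms_txt(
    "YourBrand",
    "One-line description of what your website/business does.",
    {
        "Key pages": [
            (
                "Technical GEO guide",
                "https://yoursite.com/technical-geo",
                "robots.txt, llms.txt and schema markup",
            ),
        ],
    },
)
print(EXAMPLE)
```

Write the returned string to a file named llms.txt at the root of your site, and regenerate it whenever your key pages change.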
Beyond the basics, you can also include a detailed llms-full.txt that contains the complete text of your most important pages - making it trivial for AI models to ingest your content without crawling your full site.
Quick win: Our data shows that 92% of websites have no llms.txt file at all. Simply creating one puts you ahead of almost all of your competitors from an AI search perspective.
Schema.org markup is structured data embedded in your HTML that tells machines what your content means. Google has required it for rich results for years - but for GEO, it is even more important. AI engines use schema to verify facts, understand entities, attribute authorship and decide whether to cite your content.
Organisation schema (spelled Organization in schema.org markup) establishes your brand as a known entity. It should include your official name, URL, logo, contact details, social media profiles and, where applicable, your Wikipedia or Wikidata URL. This is the foundation of brand authority for AI citation.
`{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Company Name",
  "url": "https://yoursite.com",
  "logo": "https://yoursite.com/logo.png",
  "sameAs": [
    "https://twitter.com/yourhandle",
    "https://linkedin.com/company/yourcompany",
    "https://en.wikipedia.org/wiki/YourCompany"
  ]
}`
Every blog post and article should have Article schema with a named author, a datePublished, and a publisher reference. This is how AI engines attribute content to real, verified people - a critical E-E-A-T signal.
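A minimal sketch of Article markup, generated from post metadata, might look like the following. The property names (headline, author, datePublished, publisher) are standard schema.org vocabulary. The function name article_jsonld and all example values are hypothetical.

```python
import json
from datetime import date


def article_jsonld(
    headline: str,
    author_name: str,
    author_url: str,
    published: date,
    publisher_name: str,
    publisher_logo: str,
) -> str:
    """Build Article JSON-LD with a named author, publish date and publisher."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        # schema.org expects ISO 8601 dates, for example 2024-05-01.
        "datePublished": published.isoformat(),
        "publisher": {
            "@type": "Organization",
            "name": publisher_name,
            "logo": {"@type": "ImageObject", "url": publisher_logo},
        },
    }
    return json.dumps(data, indent=2)


# Hypothetical values for illustration only.
EXAMPLE = article_jsonld(
    "Technical GEO: How to optimise your website for AI search",
    "Jane Doe",
    "https://yoursite.com/authors/jane-doe",
    date(2024, 5, 1),
    "Your Company Name",
    "https://yoursite.com/logo.png",
)
print(EXAMPLE)
```

Pointing the author url at a dedicated author page gives that page a natural home for matching Person schema.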
FAQPage schema is one of the most powerful GEO signals available. AI engines that synthesise answers frequently pull from structured Q&A content - and FAQ schema makes your Q&A pairs directly machine-readable. Add it to any page with questions and answers.
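As a sketch of how Q&A pairs map onto FAQPage markup, the helper below turns a list of question and answer strings into JSON-LD. The mainEntity, Question and acceptedAnswer structure follows schema.org. The function name faq_jsonld and the sample question are assumptions for illustration.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


# Hypothetical Q&A content for illustration only.
EXAMPLE = faq_jsonld(
    [
        (
            "What is llms.txt?",
            "A plain text file at the root of your site that gives AI models "
            "structured guidance about your content.",
        ),
    ]
)
print(EXAMPLE)
```

The questions and answers in the markup should match the text visible on the page, otherwise the structured data describes content a reader cannot see.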
Beyond the core three schema types, consider adding structured data relevant to your business type:
- LocalBusiness - for businesses with a physical location
- Product - for ecommerce and software products
- Person - for author pages and personal brands
- HowTo - for instructional content (AI engines love step-by-step guides)
- BreadcrumbList - helps AI understand your site hierarchy
- WebSite with SearchAction - signals your site as a navigable entity
Implement schema as JSON-LD in the <head> of your pages. It is easier to maintain than inline microdata and is the format preferred by both Google and AI crawlers.
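To place JSON-LD in the <head>, the data is wrapped in a script element with the type application/ld+json. The sketch below shows one way to render that tag from a Python dict. The function name jsonld_script_tag is hypothetical, and the escaping step is a defensive assumption for templates that insert the output without further processing.

```python
import json


def jsonld_script_tag(data: dict) -> str:
    """Wrap a JSON-LD dict in a script tag suitable for the page <head>."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so a string value such as "</script>" cannot end the tag
    # early. "\/" is a valid JSON escape for "/", so the data is unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


DATA = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "description": "Closing tags like </script> are escaped",
}
TAG = jsonld_script_tag(DATA)
print(TAG)
```

Most CMS platforms and SEO plugins emit this tag for you, so a helper like this is mainly useful for custom templates or static site generators.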
AI crawlers face the same technical barriers as other bots. Slow load times, JavaScript-heavy rendering, broken pagination and inconsistent canonical URLs all reduce how effectively AI engines can parse your content.
- Core Web Vitals - fast LCP and low CLS improve crawl efficiency
- Semantic HTML - use proper heading hierarchy (H1 > H2 > H3), not divs styled to look like headings
- Alt text - all images labelled, helping AI understand visual content context
- Canonical tags - prevent AI engines from indexing duplicate content versions
- XML sitemap - ensure all important pages are discoverable
- HTTPS - a basic trust signal for all search engines, including AI
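Two of the items above, heading hierarchy and alt text, can be spot-checked with Python's standard html.parser module. This is a rough sketch and no replacement for a full audit. The class and function names are hypothetical, and it flags empty alt attributes even though those are legitimate for purely decorative images.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class StructureAudit(HTMLParser):
    """Collect skipped heading levels and <img> tags without alt text."""

    def __init__(self) -> None:
        super().__init__()
        self.last_level = 0  # 0 means no heading has been seen yet
        self.issues: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            level = int(tag[1])
            if self.last_level == 0 and level != 1:
                self.issues.append(f"first heading is <{tag}>, expected <h1>")
            elif self.last_level and level > self.last_level + 1:
                self.issues.append(
                    f"<h{self.last_level}> is followed by <{tag}>, skipping a level"
                )
            self.last_level = level
        elif tag == "img":
            attributes = dict(attrs)
            if not attributes.get("alt"):
                self.issues.append(f"img {attributes.get('src', '?')} has no alt text")


def audit_structure(html: str) -> list[str]:
    """Return a list of structural issues found in the HTML string."""
    audit = StructureAudit()
    audit.feed(html)
    audit.close()
    return audit.issues


SAMPLE = '<h1>Guide</h1><h3>Details</h3><img src="a.png"><img src="b.png" alt="Chart">'
for issue in audit_structure(SAMPLE):
    print(issue)
```

Because the parser only sees the HTML it is given, running it on the raw server response also hints at whether your headings exist before any JavaScript runs.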
- ☐ GPTBot, ClaudeBot and PerplexityBot are not blocked in robots.txt
- ☐ llms.txt file exists at domain root
- ☐ llms.txt includes accurate site description, key pages and key topics
- ☐ Organisation schema implemented on homepage
- ☐ Article schema on all blog posts with named author
- ☐ FAQPage schema on key pages
- ☐ Person schema on author bio pages
- ☐ All schema validated with Google Rich Results Test
- ☐ Canonical tags on all pages
- ☐ XML sitemap submitted to Google Search Console
- ☐ HTTPS active across entire site
- ☐ No JavaScript rendering required to access main content
- ☐ H1 > H2 > H3 heading hierarchy consistent on all pages
- ☐ Image alt text complete
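Before running pages through an external validator, a quick local check can catch JSON-LD that does not even parse. The sketch below extracts every application/ld+json block from an HTML string and reports its @type. It is an assumption-laden pre-check using only the standard library, with hypothetical names, and it does not validate required properties the way a dedicated schema testing tool does.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON-LD blocks and any JSON syntax errors."""

    def __init__(self) -> None:
        super().__init__()
        self._buffer = None  # list of text chunks while inside a JSON-LD script
        self.blocks = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            raw = "".join(self._buffer)
            self._buffer = None
            try:
                self.blocks.append(json.loads(raw))
            except json.JSONDecodeError as exc:
                self.errors.append(f"invalid JSON-LD: {exc.msg}")


def jsonld_report(html: str) -> tuple[list[str], list[str]]:
    """Return (schema types found, parse errors) for an HTML string."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    types = [
        block.get("@type", "?")
        for block in extractor.blocks
        if isinstance(block, dict)
    ]
    return types, extractor.errors


SAMPLE = (
    '<head><script type="application/ld+json">'
    '{"@type": "Organization", "name": "Your Company Name"}</script>'
    '<script type="application/ld+json">{broken</script></head>'
)
print(jsonld_report(SAMPLE))
```

A page that reports no types at all, or any parse errors, is worth fixing before you move on to the rest of the checklist.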
- How to create an llms.txt file (step by step) →
- AI bot permissions in robots.txt - complete guide →
- Schema markup for AI citation →
- Structured data types AI engines rely on →
- GEO technical audit checklist →
- llms.txt examples: what good files look like →