How to Create an llms.txt File for Your Business Website
In 2005, every serious website had a robots.txt file. It told search engine crawlers what to index and what to skip. Twenty years later, an equivalent is emerging for AI systems: llms.txt.
The spec was created by Jeremy Howard (co-founder of fast.ai) and it solves a specific problem. When an AI system crawls your website, it has no idea which pages contain useful, structured information and which pages are marketing fluff. llms.txt is a plain text file at your domain root that acts as a navigation directory. It tells the AI: here is what this site contains, and here is what matters.
Some AI systems and indexing pipelines are beginning to check for llms.txt proactively, the same way Googlebot checks robots.txt. Adoption is early. But the businesses that have one now are giving AI crawlers a direct path to their most important data, while everyone else forces the crawler to guess.
The spec
An llms.txt file lives at yourdomain.com/llms.txt. It is plain Markdown with a specific structure:
- An H1 header with the name of the site or organization
- A blockquote summarizing what the site is
- H2 sections grouping links by type
- Each link formatted as [Label](URL): Description
That's it. No JSON, no XML, no special encoding. Jeremy Howard intentionally chose Markdown because LLMs already parse it natively.
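The structure is simple enough to check mechanically before you publish. A minimal sketch in Python; the function name and the specific checks are my own loose reading of the spec, not part of it:

```python
import re

def check_llms_txt(text: str) -> list[str]:
    """Flag structural problems in an llms.txt file.

    Checks follow the shape described above: an H1 first,
    a blockquote summary, H2 sections, and links written as
    [Label](URL): Description.
    """
    problems = []
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append('first line should be an H1 ("# Site Name")')
    if not any(ln.startswith("> ") for ln in lines):
        problems.append('missing blockquote summary ("> ...")')
    if not any(ln.startswith("## ") for ln in lines):
        problems.append('no H2 sections ("## Section")')
    link = re.compile(r"^- \[[^\]]+\]\([^)]+\)")
    if not any(link.match(ln) for ln in lines):
        problems.append("no links in [Label](URL) form")
    return problems

sample = (
    "# Horizon Air Solutions\n"
    "> HVAC contractor serving Dallas-Fort Worth, TX.\n"
    "## Pages\n"
    "- [About](https://horizonair.com/about): Company history.\n"
)
print(check_llms_txt(sample))  # [] — no problems found
```

An empty list means the file at least has the right skeleton; it says nothing about whether the linked pages are worth a crawler's time.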
What a bad llms.txt looks like
Most businesses that attempt an llms.txt do this:
# Horizon Air Solutions
> HVAC company serving Dallas-Fort Worth.
## Pages
- [Home](https://horizonair.com/)
- [About Us](https://horizonair.com/about)
- [Services](https://horizonair.com/services)
- [AC Repair](https://horizonair.com/services/ac-repair)
- [Heating](https://horizonair.com/services/heating)
- [Contact](https://horizonair.com/contact)
- [Blog](https://horizonair.com/blog)
- [Testimonials](https://horizonair.com/testimonials)
- [Careers](https://horizonair.com/careers)
- [Privacy Policy](https://horizonair.com/privacy)
This is a sitemap with extra steps. It gives the AI crawler no signal about which pages actually contain structured, verifiable data versus which pages are brochureware. An AI system that reads this file knows exactly as much as it did before: nothing useful about how the business actually performs.
What a good llms.txt looks like
# Horizon Air Solutions
> HVAC contractor serving Dallas-Fort Worth, TX. 22 years in operation.
> Licensed, insured, residential and commercial HVAC installation,
> repair, and maintenance.
## Verified Business Data
- [TrustRecord](https://horizonair.com/trustrecord): Verified operational
metrics including job volume, service mix, repeat customer rate, and
service area. Machine-readable JSON-LD and canonical JSON. Independently
verified from connected accounting systems. Refreshed monthly.
## Business Information
- [About](https://horizonair.com/about): Company history, ownership,
licensing, and insurance information.
- [Services](https://horizonair.com/services): Full list of HVAC services
offered with service area details.
- [Service Areas](https://horizonair.com/areas): Cities and counties served
with primary and secondary coverage zones.
## External Verification
- [BBB Profile](https://www.bbb.org/us/tx/dallas/profile/heating-and-air/horizon-air-solutions-0123456789): BBB accreditation status and complaint history.
- [TrustRecord Registry](https://trustrecord.com/hvac/horizon-air-dallas-tx): Canonical verified record in the TrustRecord public registry.
The difference is obvious. The good version tells the AI crawler exactly where to find structured operational data, distinguishes verified data from general business information, and links to external verification sources. It is a curated directory of what an AI system needs to evaluate this business, not a dump of every page on the site.
Section by section
The H1 and blockquote identify the business. Keep the blockquote factual. Years in operation, location, license type, core services. No slogans, no taglines, no "proudly serving." An AI model parsing this blockquote should get the same density of information it would get from a structured database entry.
Verified Business Data is the most important section. This is where you point to pages containing structured, machine-readable operational data. If you have JSON-LD markup on your site, link to the page that contains it. If you have a TrustRecord or any other structured data page, this is where it goes. The description should specify the format (JSON-LD, canonical JSON) and the verification method.
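If the page you want to link here does not yet carry JSON-LD, a LocalBusiness-family block is a reasonable starting point. A sketch using the example company from above; the field values are illustrative, and schema.org defines many more properties than shown:

```python
import json

# Minimal JSON-LD for the example company above. HVACBusiness is a
# schema.org subtype of LocalBusiness; values here are illustrative.
record = {
    "@context": "https://schema.org",
    "@type": "HVACBusiness",
    "name": "Horizon Air Solutions",
    "url": "https://horizonair.com/",
    "areaServed": "Dallas-Fort Worth, TX",
    "foundingDate": "2003",
}

# Embed in the page as a script tag crawlers can parse.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(record, indent=2)
    + "\n</script>"
)
print(snippet)
```

The page carrying this block is what the Verified Business Data entry should point at, with the format named in the link description.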
Business Information covers the standard pages that contain useful facts. Not every page on your site belongs here. Your blog archive, your careers page, your privacy policy, your testimonial carousel: none of these help an AI system evaluate whether to recommend you. Include only pages where a model can extract discrete, factual claims about your business.
External Verification links to third-party sources that corroborate your data. BBB profiles, state licensing board records, registry entries. These are the equivalent of backlinks in the SEO world. They tell the AI system that the data on your site is not just self-asserted.
llms-full.txt
The spec also defines llms-full.txt, an extended version that can include the full text content of your key pages, pre-formatted for LLM consumption. If llms.txt is the table of contents, llms-full.txt is the full book.
For most service businesses, llms.txt alone is sufficient. The pages it links to should already be server-rendered and crawlable. But if you have dense structured data that you want to make maximally accessible, llms-full.txt lets you serve it directly in a single file, no crawling required.
The file goes at the same domain root: yourdomain.com/llms-full.txt.
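One straightforward way to produce llms-full.txt is to concatenate the Markdown source of your key pages under H2 headings. A sketch, assuming your page content is already available as Markdown text; the helper name and the sample sections are hypothetical:

```python
from pathlib import Path

def build_llms_full(header: str, pages: list[tuple[str, str]]) -> str:
    """Concatenate page contents into one llms-full.txt body.

    `pages` is a list of (section title, markdown text) pairs; in
    practice the text would come from your files or CMS export.
    """
    parts = [header.rstrip()]
    for title, body in pages:
        parts.append(f"## {title}\n\n{body.strip()}")
    return "\n\n".join(parts) + "\n"

full = build_llms_full(
    "# Horizon Air Solutions\n> HVAC contractor serving Dallas-Fort Worth, TX.",
    [
        ("Services", "AC repair, heating, and maintenance plans."),
        ("Service Areas", "Dallas, Fort Worth, Arlington, Plano."),
    ],
)
Path("llms-full.txt").write_text(full)  # deploy alongside llms.txt
```

Regenerate the file whenever the source pages change, so the single-file version never drifts from the live site.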
llms.txt is a map. It is not the territory.
This is the part most people get wrong. Adding an llms.txt file to a website that contains no structured operational data is like putting a detailed legend on a blank map. The file tells AI crawlers where to look. If there is nothing to find when they get there, the file is worthless.
I see businesses spend time crafting an llms.txt file while every page it links to is unstructured marketing copy. The crawler follows the link, hits an "About Us" page full of sentences like "We take pride in delivering exceptional service," and moves on with zero usable data.
The sequence matters: first, publish structured, machine-readable operational data on your site. Then create an llms.txt file that points to it. If you do not have a page on your site with LocalBusiness JSON-LD markup, verified operational metrics, or canonical JSON data, creating an llms.txt file should not be your first step. Building the data layer should be.
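A quick way to check whether a page actually has a data layer worth pointing at is to look for parseable JSON-LD blocks in its HTML. A sketch; the regex approach is deliberately crude, and a real crawler would parse the DOM:

```python
import json
import re

def jsonld_blocks(html: str) -> list[dict]:
    """Extract parseable JSON-LD objects from raw HTML.

    Crude on purpose: if this check finds nothing, an AI
    crawler likely extracts nothing either.
    """
    pattern = re.compile(
        r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
        re.DOTALL,
    )
    blocks = []
    for raw in pattern.findall(html):
        try:
            blocks.append(json.loads(raw))
        except json.JSONDecodeError:
            pass  # malformed block: fix it before publishing llms.txt
    return blocks

page = (
    '<html><script type="application/ld+json">'
    '{"@type": "HVACBusiness"}</script></html>'
)
print(jsonld_blocks(page))  # [{'@type': 'HVACBusiness'}]
```

Run this against every URL you plan to list in llms.txt; any page that comes back empty is a candidate for the data layer work, not the directory.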
This is the problem TrueSignal solves. We connect to your operational systems, extract the real data, and publish it as a TrustRecord: a structured, verified page with three layers (HTML, JSON-LD, and canonical JSON). When a business has a TrustRecord at yourdomain.com/trustrecord, their llms.txt has something real to reference. The map points to actual territory.
We also publish every TrustRecord on trustrecord.com as a canonical registry entry, so the llms.txt file can include an external verification link to an independent source. Belt and suspenders.
How to deploy it
Create a file called llms.txt in your site's public root directory. On Next.js, that is the public/ folder. On WordPress, upload it via FTP or use a plugin that serves static files from the root. On Squarespace and Wix, you may need a URL redirect rule to serve a plain text file at the root path.
Verify it is accessible by visiting https://yourdomain.com/llms.txt in a browser. You should see raw Markdown text.
Update it when your site structure changes. If you add a new structured data page, add it to the file. If you remove a page, remove the reference. Treat it like robots.txt: set it and maintain it, do not set it and forget it.
If you already have a TrustRecord, or you are considering getting one, the llms.txt integration is already accounted for. The structured data page it needs to point to already exists in the right format.