llms.txt

llms.txt is a Markdown file that gives language models and agent tools a curated entry point for a site. Sites usually publish it at /llms.txt.

Lectito supports the practical parts of the convention:

fetching a site's llms.txt
parsing its sections and links
expanding linked pages into one Markdown context file
crawling a bounded set of pages to generate an llms.txt index

It does not treat llms.txt as access control. Use robots.txt, HTTP authorization, and normal server controls for that.

File Shape

A small file looks like this:

# Example Docs

> Documentation for Example's public API.

Use the current API reference when generated examples disagree with older blog
posts.

## Docs

- [Quick start](https://example.com/docs/quick-start.md): First integration
  steps.
- [API reference](https://example.com/docs/api.md): Endpoint and object
  reference.

## Optional

- [Changelog](https://example.com/docs/changelog.md)

Lectito expects:

one H1 title
an optional blockquote summary
optional notes before the first H2
H2 sections containing Markdown links

The Optional section has special handling. lectito llms expand skips those links by default so the generated context stays smaller.

Fetch

Fetch a site's llms.txt:

lectito llms fetch https://example.com

For bare site URLs, Lectito requests /llms.txt. Explicit URLs are used as given:

lectito llms fetch https://example.com/docs/llms.txt

You can write the result to a file:

lectito llms fetch https://example.com --output llms.txt

Parse

Parse an llms.txt file into JSON:

lectito llms parse llms.txt --pretty

This is useful for checking whether section names, optional links, and notes are being read as expected.

Expand

Expand linked resources into one Markdown file:

lectito llms expand llms.txt --output llms-full.txt

Lectito keeps Markdown resources unchanged. When a linked resource looks like HTML, Lectito extracts the readable article and inserts the extracted Markdown. For remote links, Lectito checks the HTTP Content-Type header before falling back to URL suffixes and simple Markdown markers.

Each resource is separated and labeled:

---
# Source: Quick start
URL: https://example.com/docs/quick-start.md
Notes: First integration steps.
...

Use --include-optional to include the Optional section:

lectito llms expand llms.txt --include-optional --output llms-full.txt

Use --max-links when you want a smaller bundle:

lectito llms expand llms.txt --max-links 10

Generate

Generate an llms.txt file from a seed page:

lectito llms generate https://example.com/docs/ --output llms.txt

The crawler is intentionally bounded. For URL seeds, Lectito follows same-origin links only. For local HTML files, it follows relative local links. Assets such as images, stylesheets, scripts, PDFs, archives, and feeds are skipped.

To write the expanded context at the same time, pass --full:

lectito llms generate https://example.com/docs/ \
  --output llms.txt \
  --full llms-full.txt

--full-output is the same option with a more explicit name.

You can also generate from a sitemap:

lectito llms generate --sitemap https://example.com/sitemap.xml \
  --output llms.txt

Or discover sitemaps from a URL seed:

lectito llms generate https://example.com --discover \
  --output llms.txt

Discovery reads Sitemap: lines from robots.txt. When no sitemap is listed there, Lectito tries /sitemap.xml.

Sitemap indexes are supported. Lectito reads child sitemaps up to --max-sitemaps, then fetches page URLs up to --max-pages:

lectito llms generate --sitemap https://example.com/sitemap.xml \
  --max-sitemaps 10 \
  --max-pages 100 \
  --output llms.txt

Remote sitemap generation keeps sitemap and page URLs on the same origin as the sitemap input. Local sitemap files may list any absolute page URL.

By default, generation fetches up to 25 pages and follows links up to depth 2:

lectito llms generate https://example.com/docs/ \
  --max-pages 10 \
  --max-depth 1

Use --filter for the common path and glob cases. Prefix a pattern with ! to exclude it:

lectito llms generate --sitemap https://example.com/sitemap.xml \
  --filter /docs/ \
  --filter '!/docs/archive/' \
  --filter '!*/drafts/*'

Patterns that start with / match URL paths. Plain path values are prefixes. Path patterns with * or ? are globs. Other glob patterns match the full URL.

Use --delay to wait between page fetches:

lectito llms generate https://example.com/docs/ --delay 250

Remote generation checks robots.txt before fetching page URLs. Lectito keeps the existing browser-like user agent for HTTP requests, but evaluates robots rules as Lectito unless you pass another token:

lectito llms generate https://example.com/docs/ \
  --robots-agent LectitoDocsBot

Use --ignore-robots only when you explicitly want to bypass those checks:

lectito llms generate https://example.com/docs/ --ignore-robots

Only pages that produce readable article content are included. Each accepted page becomes one link in the generated file. Lectito uses the extracted title as the link label, switches to a page's canonical URL when one is available, and uses the extracted excerpt as the link note.

Remote generation also reads Last-Modified response headers. Sitemap generation reads lastmod values. When either value is present, Lectito adds it to the generated note and uses it as a small ranking signal. Ranking favors likely entry points such as docs roots, guides, API references, and pages with useful notes. Archive-like URLs are pushed down.

Set the generated title, summary, or section name when the defaults are too generic:

lectito llms generate https://example.com/docs/ \
  --title "Example Docs" \
  --summary "Public documentation for Example." \
  --section "Guides" \
  --output llms.txt

When To Use It

Use llms.txt when you want agents to start from a small, curated list of important pages. It works well for docs, public APIs, policy pages, and small knowledge bases.

Do not expect every model provider or search engine to read it. The reliable use case is explicit: a developer, tool, or agent asks Lectito to fetch or expand the file.