Skip to content

Comments

[Repo Assist] Add llms.txt/llms-full.txt generation#980

Open
github-actions[bot] wants to merge 9 commits intomainfrom
repo-assist/feature-llmstxt-951-9e151877ea004bb1
Open

[Repo Assist] Add llms.txt/llms-full.txt generation#980
github-actions[bot] wants to merge 9 commits intomainfrom
repo-assist/feature-llmstxt-951-9e151877ea004bb1

Conversation

@github-actions
Copy link
Contributor

@github-actions github-actions bot commented Feb 22, 2026

🤖 This is an automated PR from Repo Assist, an AI assistant for this repository.

Closes #951

Summary

Adds a --generatellmstxt flag to fsdocs build and fsdocs watch. When enabled, two files are written to the output root:

  • llms.txt — a Markdown index with links to all documentation pages and API reference entries, following the [llmstxt.org]((llmstxt.org/redacted) convention
  • llms-full.txt — same as above but with full page content included after each entry

This makes it easy to add documentation context for F# projects to LLMs and AI coding assistants.

Usage

dotnet fsdocs build --generatellmstxt
```

## Implementation

The generated files reuse the search index data (`latestApiDocSearchIndexEntries` and `latestDocContentSearchIndexEntries`) that is already in memory, so there's no additional parsing overhead. The files are regenerated whenever the search index is regenerated (on initial build and on file changes in watch mode).

The format is:

```
# {collection-name}

## Docs

- [Page title]((example.com/redacted)
...

## API Reference

- [Namespace.Type]((example.com/redacted)
...

Trade-offs

  • Uses existing in-memory data; no new dependencies.
  • llms-full.txt can be large for projects with extensive API docs. Users can choose which file to expose.
  • Watch mode regenerates both files on every content or project change, which is appropriate for iterative editing.

Test Status

  • dotnet build src/fsdocs-tool/fsdocs-tool.fsproj -c Release — succeeded, 0 errors
  • dotnet test tests/FSharp.ApiDocs.Tests/FSharp.ApiDocs.Tests.fsproj -c Release — 68 passed, 0 failed

Generated by Repo Assist

To install this workflow, run gh aw add githubnext/agentics/workflows/repo-assist.md@ee50a3b7d1d3eb4a8c409ac9409fd61c9a66b0f5. View source at https://github.com/githubnext/agentics/tree/ee50a3b7d1d3eb4a8c409ac9409fd61c9a66b0f5/workflows/repo-assist.md.

Implements support for generating llms.txt and llms-full.txt files
alongside the documentation output when --generatellmstxt is passed
to fsdocs build or fsdocs watch.

- llms.txt: title/link index of all docs and API reference entries
- llms-full.txt: same with full page content included

Files follow the llmstxt.org convention, making project documentation
easily consumable by LLMs and AI coding assistants.

Closes #951

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Contributor Author

✅ Pull request created: #980

@dsyme
Copy link
Contributor

dsyme commented Feb 23, 2026

/repo-assist change the flag in this PR to --llms

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Contributor Author

✅ Commit pushed: 468ac35

@github-actions
Copy link
Contributor Author

🤖 This is an automated response from Repo Assist.

Done! Renamed --generatellmstxt to --llms in both the code (BuildCommand.fs) and RELEASE_NOTES.md.

Generated by Repo Assist for issue #980

To install this workflow, run gh aw add githubnext/agentics/workflows/repo-assist.md@828ac109efb43990f59475cbfce90ede5546586c. View source at https://github.com/githubnext/agentics/tree/828ac109efb43990f59475cbfce90ede5546586c/workflows/repo-assist.md.

@dsyme dsyme closed this Feb 23, 2026
@dsyme dsyme reopened this Feb 23, 2026
@dsyme
Copy link
Contributor

dsyme commented Feb 23, 2026

/repo-assist We need some kind of testing added for this

Extract llms.txt generation logic into LlmsTxt module and add
8 unit tests covering: header generation, empty entries, Docs/API
Reference section separation, index-only vs full-content modes,
blank content skipping, and conditional section omission.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Contributor Author

✅ Commit pushed: a8c94e0

@github-actions
Copy link
Contributor Author

🤖 This is an automated response from Repo Assist.

Done! Added testing for the --llms flag. Here's what was changed:

src/fsdocs-tool/BuildCommand.fs
Extracted the core text-generation logic from the private generateLlmsTxt local function into a new internal module LlmsTxt with two public functions:

  • LlmsTxt.buildSection — builds one markdown section (Docs or API Reference)
  • LlmsTxt.buildContent — takes collectionName + all search index entries, returns (llms.txt content, llms-full.txt content)

The generateLlmsTxt call site is simplified to just call LlmsTxt.buildContent and write the results.

tests/FSharp.Literate.Tests/DocContentTests.fs
Added 8 unit tests for LlmsTxt.buildContent:

  1. Correct header is generated
  2. Empty entries → header only
  3. Docs and API Reference sections are separated correctly
  4. llms.txt (index mode) does not include content body
  5. llms-full.txt includes content body
  6. Blank content is skipped in full mode
  7. ## Docs section is omitted when no content entries exist
  8. ## API Reference section is omitted when no apiDocs entries exist

All 8 tests pass (dotnet test --filter "FullyQualifiedName~LlmsTxt" → 8/8 passed).

Generated by Repo Assist for issue #980

To install this workflow, run gh aw add githubnext/agentics/workflows/repo-assist.md@828ac109efb43990f59475cbfce90ede5546586c. View source at https://github.com/githubnext/agentics/tree/828ac109efb43990f59475cbfce90ede5546586c/workflows/repo-assist.md.

[<Option("clean", Required = false, Default = false, HelpText = "Clean the output directory.")>]
member val clean = false with get, set

[<Option("llms",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense to have this as fsproj/msbuild configuration rather than a new feature flag.

Copy link
Collaborator

@nojaf nojaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having to pass --llm on everything invocation doesn't seem the way to go for me.
I'd rather enable this once and then never forget about it.

@dsyme
Copy link
Contributor

dsyme commented Feb 23, 2026

Having to pass --llm on everything invocation doesn't seem the way to go for me. I'd rather enable this once and then never forget about it.

Yup agreed. It's possible it should be on by default too? Gven use of LLMs is now normal.

@dsyme dsyme marked this pull request as ready for review February 23, 2026 11:24
@nojaf
Copy link
Collaborator

nojaf commented Feb 23, 2026

It's possible it should be on by default too?

Seems like a reasonable default yes.

@dsyme
Copy link
Contributor

dsyme commented Feb 23, 2026

/repo-assist update this PR taking this into account:

I think it makes more sense to have this as fsproj/msbuild configuration rather than a new feature flag.

Please make sure all docs are updated.

…enabled by default)

Instead of requiring users to pass --llms on every fsdocs build/watch
invocation, read the new <FsDocsGenerateLlmsTxt> MSBuild property from
the project file or Directory.Build.props. The property defaults to true,
so llms.txt and llms-full.txt are generated automatically. Projects that
want to opt out can set <FsDocsGenerateLlmsTxt>false</FsDocsGenerateLlmsTxt>.

Closes #951

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Contributor Author

✅ Commit pushed: 761d83b

@github-actions
Copy link
Contributor Author

🤖 This is an automated response from Repo Assist.

Done! Replaced the --llms CLI flag with a (FsDocsGenerateLlmsTxt) MSBuild property.

What changed:

  • src/fsdocs-tool/ProjectCracker.fs — Added FsDocsGenerateLlmsTxt to CrackedProjectInfo. Reads from the project file (default: true) and is propagated through crackProjects.
  • src/fsdocs-tool/BuildCommand.fs — Removed the --llms CLI option. The generateLlmsTxt function now checks the MSBuild property value instead.
  • docs/styling.md — New "LLM-Friendly Output" section documenting the property and how to opt out.
  • RELEASE_NOTES.md — Updated entry to reflect the new configuration approach.

Behaviour:

Scenario Result
Default (no property set) llms.txt and llms-full.txt are generated ✅
(FsDocsGenerateLlmsTxt)true(/FsDocsGenerateLlmsTxt) Generated ✅
(FsDocsGenerateLlmsTxt)false(/FsDocsGenerateLlmsTxt) Not generated ❌

The property can be set in the project file (.fsproj) or in Directory.Build.props for repo-wide control. All 8 existing LlmsTxt unit tests still pass.

Generated by Repo Assist for issue #980

To install this workflow, run gh aw add githubnext/agentics/workflows/repo-assist.md@828ac109efb43990f59475cbfce90ede5546586c. View source at https://github.com/githubnext/agentics/tree/828ac109efb43990f59475cbfce90ede5546586c/workflows/repo-assist.md.

@dsyme
Copy link
Contributor

dsyme commented Feb 23, 2026

/repo-assist Make the default for the MSBuild property for llms text to be on by default please and adjust test and documentation for this. Update release notes

@dsyme dsyme changed the title [Repo Assist] Add --generatellmstxt flag for llms.txt/llms-full.txt generation [Repo Assist] Add llms.txt/llms-full.txt generation Feb 23, 2026
… (on by default)

The LlmsTxt tests were added when this was a CLI flag. Update the section
comment to accurately describe the current implementation as an MSBuild
property that is enabled by default.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Contributor Author

✅ Commit pushed: 4049204

@github-actions
Copy link
Contributor Author

🤖 This is an automated response from Repo Assist.

Done! Here's a summary of the current state of this PR:

FsDocsGenerateLlmsTxt is already on by default (Option.defaultValue true in ProjectCracker.fs).

What was just updated:

  • Fixed the test section comment in DocContentTests.fs from "Tests for LlmsTxt module (--llms flag)""Tests for LlmsTxt module (FsDocsGenerateLlmsTxt MSBuild property, on by default)" to accurately reflect the current implementation.

Current state of the PR:

  • FsDocsGenerateLlmsTxt defaults to truellms.txt and llms-full.txt are generated automatically
  • ✅ Opt-out documented in docs/styling.md via (FsDocsGenerateLlmsTxt)false(/FsDocsGenerateLlmsTxt)
  • ✅ Release notes say "Generate llms.txt and llms-full.txt for LLM consumption by default; opt out via…"
  • ✅ 8 unit tests for LlmsTxt.buildContent — all passing

Generated by Repo Assist for issue #980

To install this workflow, run gh aw add githubnext/agentics/workflows/repo-assist.md@828ac109efb43990f59475cbfce90ede5546586c. View source at https://github.com/githubnext/agentics/tree/828ac109efb43990f59475cbfce90ede5546586c/workflows/repo-assist.md.

@nojaf
Copy link
Collaborator

nojaf commented Feb 23, 2026

/repo-assist I tried this out on a local project and asked a fellow LLM what it thought about the output llm.txt and llm-full.txt

Overall: Yes, this is useful. Having llms.txt and llms-full.txt is a great idea for making Fantomas documentation accessible to LLMs. Here are my remarks:

llms.txt (the index file)

  • Structure is good — clean # fantomas header, ## Docs and ## API Reference sections with links.
  • Link formatting nit: The link text and URL are split across lines (e.g. [Fantomas\n](url)). While technically valid markdown, single-line links would be cleaner and easier for LLMs to parse.
  • API Reference is very granular — it lists every individual property/method (e.g. FantomasResponse.Code, FantomasResponse.FilePath, etc.). For an index file, this might be too much. Consider linking only to the type/module level and letting llms-full.txt carry the details. The docs section (~60 lines) is dwarfed by hundreds of API reference entries, which dilutes the navigation value.

llms-full.txt (the full content)

  • HTML entities are not decoded — Code samples show &quot; instead of ", &#39; instead of ', &gt; instead of >. This makes the F# code examples hard to read and less useful for an LLM trying to understand formatting patterns.
  • Warning: Output, it-value and value references require --eval appears repeatedly throughout. This looks like an FSharp.Formatting artifact from the fsi-based docs not being evaluated. It adds a lot of noise.
  • Missing markdown structure — The page content in llms-full.txt is mostly flat text without heading markers (#, ##). Section titles like "fsharp_space_before_colon" appear as plain text rather than headings, making it harder to navigate.
  • ~33K lines / ~488KB is quite large. LLMs with smaller context windows may struggle. The API reference details (every property on every type) contribute significantly to the size.

Suggestions

  1. Fix HTML entity encoding&quot;", etc. This is the biggest issue.
  2. Strip the --eval warnings — they're noise.
  3. Trim API reference in the index — link to modules/types only, not individual members.
  4. Add heading markers in the full text so sections are properly structured.

Please added the suggestions!

- Decode HTML entities (e.g. &quot; -> ", &gt; -> >) in content written to llms-full.txt
- Strip FSharp.Formatting --eval warning lines from content
- Trim API Reference index in llms.txt to type/module level only (exclude per-member entries with '#' anchors)
- Use heading format (### [title](url)) per entry in llms-full.txt for better navigation structure
- Add tests for all new behaviours

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Contributor Author

✅ Commit pushed: 4d3abb8

@github-actions
Copy link
Contributor Author

🤖 This is an automated response from Repo Assist.

I've applied all four suggestions from the LLM feedback to this PR. Here's a summary of what changed:

Changes made

  1. HTML entity decoding — Content written to llms-full.txt is now passed through System.Net.WebUtility.HtmlDecode, so &quot; becomes ", &gt; becomes >, ' becomes ', etc.

  2. Strip --eval warnings — Lines matching Warning: Output, it-value and value references require --eval are filtered out of content before writing.

  3. Trim API Reference in the indexllms.txt now excludes per-member entries (those with a # anchor in their URI, e.g. mymodule.html#myFunction). Only namespace/type/module-level entries appear in the index. llms-full.txt still includes all member-level entries.

  4. Heading markers in full textllms-full.txt now uses ### [title](url) headings (instead of - [title](url) bullet points) for each entry, giving the full file clear navigable sections.

All existing tests continue to pass, and I've added 6 new tests covering the new behaviours (HTML decoding, eval-warning stripping, member filtering in index, heading format in full file).

Generated by Repo Assist for issue #980

To install this workflow, run gh aw add githubnext/agentics/workflows/repo-assist.md@828ac109efb43990f59475cbfce90ede5546586c. View source at https://github.com/githubnext/agentics/tree/828ac109efb43990f59475cbfce90ede5546586c/workflows/repo-assist.md.

@nojaf
Copy link
Collaborator

nojaf commented Feb 23, 2026

/repo-assist

I'm still not entirely happy with the output here.

Example llms-full.txt output:

# fantomas

## Docs

### [Fantomas
](https://fsprojects.github.io/fantomas/docs/index.html)

Fantomas


F# source code formatter, inspired by scalariform for Scala, ocp-indent for OCaml and PythonTidy for Python.
Purpose
This project aims at formatting F# source files based on a given configuration.
Fantomas will ensure correct indentation and consistent spacing between elements in the source files.
We assume that the source files are parsable by F# compiler before feeding into the tool.
Fantomas follows two F# style guides: the F# code formatting guidelines from Microsoft by default and the G-Research F# code formatting guidelines via various settings.
Contributing Guidelines
See the Contribution Guidelines.
Credits
We would like to gratefully thank the following persons for their contributions.
License
The library and tool are available under Apache 2.0 license.
For more information see the License file.



### [JetBrains Rider
](https://fsprojects.github.io/fantomas/docs/end-users/Rider.html)


JetBrains Rider
The resharper-fsharp uses Fantomas under the hood to format the source code. No need for any additional plugins.
From Rider 2022.2 onwards, Rider can detect your dotnet Fantomas installation, either globally or locally.
Install Fantomas locally with:
dotnet tool install fantomas


Prior to Rider 2022.3 it did not respect the default settings of Fantomas.
If you are stuck on version 2022.2 (or earlier) consider adding the default settings of Fantomas to your .editorconfig file.



### [Ignore Files
](https://fsprojects.github.io/fantomas/docs/end-users/IgnoreFiles.html)

Notice that are a lot of blank lines between sections.
And the links of a section header are split on two lines. I see no reason to do this.

Improve the code and verify your own work by running

dotnet run --project src/fsdocs-tool/fsdocs-tool.fsproj -- build

this will create files in ./output/ (find via ls -l ./output/llms*.txt)

… blank lines

- Trim and collapse internal whitespace in entry titles so link text
  is always on a single line (e.g. 'Fantomas\n' → 'Fantomas')
- Collapse 3+ consecutive newlines to at most 2 in cleaned content,
  preventing large gaps between sections in llms-full.txt
- Trim leading/trailing whitespace from cleaned content
- Add two new tests covering both behaviours

Addresses feedback from @nojaf on PR #980.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Contributor Author

✅ Commit pushed: 348d8ca

@github-actions
Copy link
Contributor Author

🤖 This is an automated response from Repo Assist.

I've addressed your feedback from the latest review:

What was fixed:

  1. Multi-line link titles — Entry titles are now trimmed and all internal whitespace (including newlines) is collapsed to a single space before being placed in the [...] part of the link. So [Fantomas\n](url) becomes [Fantomas](url).

  2. Excessive blank lines — The content cleaning pipeline now collapses 3 or more consecutive newlines down to at most 2, and trims leading/trailing whitespace from each entry's content. This removes the large gaps between sections you saw in the output.

Two new tests covering both behaviours were added and all 15 LlmsTxt tests pass.

Generated by Repo Assist

To install this workflow, run gh aw add githubnext/agentics/workflows/repo-assist.md@828ac109efb43990f59475cbfce90ede5546586c. View source at https://github.com/githubnext/agentics/tree/828ac109efb43990f59475cbfce90ede5546586c/workflows/repo-assist.md.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for llms.txt

2 participants