[Repo Assist] Fix #748: HTML-encode XML doc text nodes and unresolved cref values #994

Draft
github-actions[bot] wants to merge 1 commit into main from
repo-assist/fix-748-xml-doc-html-encoding-v2-6c850b1f53e8d053

@github-actions
Contributor

🤖 This is an automated pull request from Repo Assist.

Closes #748

Summary

This PR fixes two related HTML-encoding gaps in GenerateModel.fs that could cause broken output or HTML injection when XML documentation comments contain special characters.

Root Cause

In readXmlElementAsHtml, two code paths were appending content to the HTML output without proper HTML encoding:

  1. Text nodes (line 1901): html.Append(text) — if XML doc text contains <, >, or & characters (e.g. in LaTeX math like \[ 1 < 2 < 3 > 0 \]), these would be emitted as raw HTML characters, breaking the document structure.

  2. Unresolved (see cref) values (line 1945): html.Append(cref.Value). Unresolved cross-references fall back to emitting the raw cref string (e.g. T:TheNamespace.GenericClass2`1); the code already contained a commented-out HtmlEncode call for this path.

Fix

  • HTML-encode text nodes: html.Append(HttpUtility.HtmlEncode text)
  • Enable the already-commented-out cref encoding: let crefAsHtml = HttpUtility.HtmlEncode cref.Value
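
The two encoded paths can be sketched as a standalone script. This is a minimal illustration, not the actual GenerateModel.fs code: the helper names `appendText` and `appendUnresolvedCref` and the sample cref value are hypothetical, and only `HttpUtility.HtmlEncode` and the `html.Append` pattern come from the change itself.

```fsharp
open System.Text
open System.Web

// Hypothetical helper: append an XML doc text node, HTML-encoded.
let appendText (html: StringBuilder) (text: string) =
    html.Append(HttpUtility.HtmlEncode text) |> ignore

// Hypothetical helper: append the fallback string for an unresolved cref.
let appendUnresolvedCref (html: StringBuilder) (cref: string) =
    let crefAsHtml = HttpUtility.HtmlEncode cref
    html.Append(crefAsHtml) |> ignore

let textHtml = StringBuilder()
appendText textHtml @"\[ 1 < 2 < 3 > 0 \]"
printfn "%s" (textHtml.ToString())   // \[ 1 &lt; 2 &lt; 3 &gt; 0 \]

let crefHtml = StringBuilder()
appendUnresolvedCref crefHtml "T:Example.Pair<A&B>"   // made-up cref value
printfn "%s" (crefHtml.ToString())   // T:Example.Pair&lt;A&amp;B&gt;
```

Without the `HtmlEncode` call, both strings would reach the output verbatim and the `<` and `&` characters would be parsed as markup.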

Trade-offs

HTML entities in the source XML doc (like &lt;) are decoded by the XML parser before being stored as text node values, so this change re-encodes them for HTML output. Browsers decode HTML entities when building the DOM, and MathJax reads its input from the DOM, so LaTeX math with < and > operators continues to work correctly.

The existing test for LaTeX math content was updated to expect the correctly HTML-encoded output (1 &lt; 2 &lt; 3 &gt; 0 instead of 1 < 2 < 3 > 0).
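
The round trip described above can be sketched in a few lines. This is an illustration under assumptions: `XElement.Parse` stands in for however the XML doc is actually loaded, and `HttpUtility.HtmlDecode` is only an approximation of the entity decoding a browser performs before MathJax reads the DOM.

```fsharp
open System.Web
open System.Xml.Linq

// The XML parser decodes entities when it builds the text node value.
let summary = XElement.Parse(@"<summary>\[ 1 &lt; 2 &lt; 3 &gt; 0 \]</summary>")
let text = summary.Value

// Re-encode for HTML output, as the fix does for text nodes.
let encoded = HttpUtility.HtmlEncode text

// Stand-in for the browser's entity decoding (an approximation).
let seenByMathJax = HttpUtility.HtmlDecode encoded

printfn "%s" text            // \[ 1 < 2 < 3 > 0 \]
printfn "%s" encoded         // \[ 1 &lt; 2 &lt; 3 &gt; 0 \]
printfn "%b" (seenByMathJax = text)   // true
```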

Test Status

  • dotnet build src/FSharp.Formatting.ApiDocs/: succeeded (0 errors)
  • dotnet test tests/FSharp.ApiDocs.Tests/: 68/68 passed (4 skipped, 0 failed)

Generated by Repo Assist

To install this workflow, run gh aw add githubnext/agentics/workflows/repo-assist.md@828ac109efb43990f59475cbfce90ede5546586c. View source at https://github.com/githubnext/agentics/tree/828ac109efb43990f59475cbfce90ede5546586c/workflows/repo-assist.md.

- HTML-encode text nodes in readXmlElementAsHtml to prevent HTML injection
  when XML doc text contains characters like '<', '>', '&'
- HTML-encode unresolved <see cref> values (the commented-out crefAsHtml
  code was already there, just never enabled)
- Update test to expect HTML-encoded output for math expressions with '<' and '>'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor Author

✅ Pull request created: #994

Development

Successfully merging this pull request may close these issues.

When using XML docs, backtick characters break the output