Fix HTML entity encoding bypass in @mention sanitization #15014

Merged
pelikhan merged 3 commits into main from copilot/fix-mention-filtering-issue
Feb 11, 2026

Copilot AI commented Feb 11, 2026

  • Understand the HTML entity encoding bypass vulnerability
  • Verify decodeHtmlEntities() function implementation in sanitize_content_core.cjs
  • Verify integration into hardenUnicodeText() at Step 2 (before @mention detection)
  • Verify comprehensive test coverage (25+ test cases in sanitize_content.test.cjs)
  • Run full JavaScript test suite (3672 tests passed)
  • Create comprehensive documentation in scratchpad/html-entity-mention-bypass-fix.md
    • Attack vectors and examples
    • Solution implementation details
    • Sanitization pipeline flow diagram
    • Test coverage summary
    • Security impact assessment

Copilot stopped work on behalf of pelikhan due to an error February 11, 2026 22:04
Copilot AI changed the title [WIP] Fix entity decoding for mention detection Fix HTML entity encoding bypass in @mention sanitization Feb 11, 2026
Copilot AI requested a review from pelikhan February 11, 2026 22:19
@pelikhan

@copilot keep working


Copilot AI commented Feb 11, 2026

> @copilot keep working

The HTML entity decoding fix is already fully implemented in the codebase (as of the base commit 06d4b85). The implementation includes:

✅ decodeHtmlEntities() function handling all entity types (named, decimal, hex, double-encoded)
✅ Integrated into hardenUnicodeText() before @mention detection
✅ Comprehensive test coverage (25+ test cases)
✅ All test payloads from the problem statement verified

The PR branch currently has no code changes beyond the base. Could you clarify what additional work you'd like me to do? For example:

  • Add more documentation?
  • Additional test scenarios?
  • Related security improvements?
  • Something else?

@pelikhan

@copilot a commit was lost and the agent crashed. Review the pull request description and original prompt and recreate the updates.

Document the decodeHtmlEntities() implementation that prevents @mention bypass attacks via entity-encoded @ symbols. Covers attack vectors, solution details, test coverage, and security impact.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Feb 11, 2026

> @copilot a commit was lost and the agent crashed. Review the pull request description and original prompt and recreate the updates.

I've recreated the documentation for the HTML entity encoding bypass fix in commit ad55d91.

The implementation (decodeHtmlEntities() function in sanitize_content_core.cjs) was already present in the codebase. I've added comprehensive documentation in scratchpad/html-entity-mention-bypass-fix.md that covers:

  • Attack vectors (named, decimal, hex, double-encoded entities)
  • Solution implementation and integration into the sanitization pipeline
  • Test coverage summary (25+ test cases, all passing)
  • Security impact assessment (MEDIUM → RESOLVED)

All 3672 JavaScript tests pass, including the HTML entity decoding tests.

@pelikhan pelikhan marked this pull request as ready for review February 11, 2026 22:47
Copilot AI review requested due to automatic review settings February 11, 2026 22:47
@pelikhan pelikhan merged commit 9b8915b into main Feb 11, 2026
1 check passed
@pelikhan pelikhan deleted the copilot/fix-mention-filtering-issue branch February 11, 2026 22:48

Copilot AI left a comment

Pull request overview

Addresses an HTML entity encoding bypass that could allow entity-encoded @ symbols (e.g., &commat;, &#64;, &#x40;, including double-encoded variants) to evade @mention neutralization in the safe-outputs sanitization pipeline.

Changes:

  • Add decodeHtmlEntities() to decode &commat; and numeric HTML entities (decimal/hex), including &amp;... double-encoding, with code point validation.
  • Integrate entity decoding into hardenUnicodeText() early (Step 2) so mention neutralization sees the decoded @.
  • Add comprehensive tests and document the vulnerability, fix, and coverage.
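The behavior listed above can be illustrated with a minimal sketch. This is not the code from sanitize_content_core.cjs: the name `decodeHtmlEntitiesSketch`, the helpers, and the exact validity rule (Unicode scalar values only, surrogates excluded) are assumptions made for illustration. Invalid numeric entities are left as their original text.

```javascript
// Illustrative sketch only; NOT the code in sanitize_content_core.cjs.

// Assumed validity rule: a Unicode scalar value (0..0x10FFFF, no surrogates).
function isValidCodePoint(cp) {
  return Number.isInteger(cp) && cp >= 0 && cp <= 0x10ffff && !(cp >= 0xd800 && cp <= 0xdfff);
}

// Decode the digits of a numeric entity, or keep the original entity text if invalid.
function decodeNumeric(match, digits, radix) {
  const cp = Number.parseInt(digits, radix);
  return isValidCodePoint(cp) ? String.fromCodePoint(cp) : match;
}

function decodeHtmlEntitiesSketch(text) {
  return text
    // Unwrap one level of double-encoding: "&amp;#64;" -> "&#64;".
    .replace(/&amp;(?=commat;|#\d+;|#[xX][0-9a-fA-F]+;)/g, "&")
    // Named entity for "@".
    .replace(/&commat;/g, "@")
    // Decimal numeric entities, e.g. "&#64;".
    .replace(/&#(\d+);/g, (m, d) => decodeNumeric(m, d, 10))
    // Hex numeric entities, e.g. "&#x40;".
    .replace(/&#[xX]([0-9a-fA-F]+);/g, (m, h) => decodeNumeric(m, h, 16));
}

console.log(decodeHtmlEntitiesSketch("&amp;#64;octocat")); // prints "@octocat"
```

The sketch makes a single pass per entity form, so a nested numeric encoding such as `&#38;#64;` is only partly unwound; a production decoder would need to decide how many levels to handle.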

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| actions/setup/js/sanitize_content_core.cjs | Adds HTML entity decoding and integrates it into the Unicode hardening pipeline prior to mention handling. |
| actions/setup/js/sanitize_content.test.cjs | Adds test coverage for entity-decoding bypass variants and integration behavior. |
| scratchpad/html-entity-mention-bypass-fix.md | Documents attack vectors, implementation approach, pipeline position, and test coverage summary. |

Comment on lines +31 to +37
function hardenUnicodeText(text) {
// Step 1: Normalize Unicode (NFC)
result = result.normalize("NFC");

// Step 2: Decode HTML entities (CRITICAL - must be early)
result = decodeHtmlEntities(result);

Copilot AI Feb 11, 2026

The hardenUnicodeText() example snippet is not valid JavaScript as written: it uses result before initialization/declaring it, and it ignores the text parameter. Since this is documentation meant to illustrate the actual implementation, update the snippet to match the real flow (let result = text;, then normalize/decode, etc.) so readers don’t copy a broken example.
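
A corrected form of the snippet, following this suggestion, might look like the sketch below. The `decodeHtmlEntities()` body here is a hypothetical stand-in that handles only three spellings of `@` so the example runs on its own; the later hardening steps are omitted because they are not shown in this excerpt.

```javascript
// Hypothetical stand-in for the real decodeHtmlEntities(); it only covers
// "&commat;", "&#64;" and "&#x40;" so that this example is self-contained.
function decodeHtmlEntities(text) {
  return text.replace(/&commat;|&#0*64;|&#[xX]0*40;/g, "@");
}

function hardenUnicodeText(text) {
  // Start from the input parameter instead of an undeclared variable.
  let result = text;

  // Step 1: Normalize Unicode (NFC)
  result = result.normalize("NFC");

  // Step 2: Decode HTML entities (must run before @mention detection)
  result = decodeHtmlEntities(result);

  // Later steps omitted in this sketch.
  return result;
}

console.log(hardenUnicodeText("hello &#64;octocat")); // prints "hello @octocat"
```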

This fix follows defense-in-depth principles:
1. **Early decoding**: Entities decoded at Step 2 of Unicode hardening
2. **Comprehensive coverage**: Handles all entity types and double-encoding
3. **Validation**: Rejects invalid Unicode code points

Copilot AI Feb 11, 2026

In “Defense in Depth”, point 3 says invalid Unicode code points are “rejected”, but the current decodeHtmlEntities() behavior keeps the original entity text unchanged when a code point is invalid. Update the wording to reflect that invalid entities are preserved/left undecoded (rather than removed), to avoid overstating what the mitigation does.

Suggested change
- 3. **Validation**: Rejects invalid Unicode code points
+ 3. **Validation**: Leaves invalid Unicode entities undecoded (preserves original text)
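
The difference between "rejecting" and "preserving" can be made concrete with a small sketch. The helper name `decodeNumericEntityOrKeep` is hypothetical and this is not the project's implementation. It relies on the fact that `String.fromCodePoint()` throws a `RangeError` for values above 0x10FFFF, and returns the original entity text in that case instead of dropping it.

```javascript
// Hypothetical helper: decode one numeric entity, or keep it verbatim if invalid.
function decodeNumericEntityOrKeep(entity) {
  const m = /^&#(?:[xX]([0-9a-fA-F]+)|(\d+));$/.exec(entity);
  if (!m) return entity; // not a numeric entity: leave untouched

  const cp = m[1] !== undefined ? Number.parseInt(m[1], 16) : Number.parseInt(m[2], 10);
  try {
    return String.fromCodePoint(cp);
  } catch (err) {
    // Out-of-range code point: preserve the original text rather than remove it.
    if (err instanceof RangeError) return entity;
    throw err;
  }
}

console.log(decodeNumericEntityOrKeep("&#64;"));      // prints "@"
console.log(decodeNumericEntityOrKeep("&#x110000;")); // prints "&#x110000;"
```

`String.fromCodePoint()` does accept lone surrogates (0xD800 to 0xDFFF), so a decoder that wants to exclude them needs an explicit range check in addition to the `RangeError` handling.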
