HTML API: Ensure bookmark exaustion does not error#10616
HTML API: Ensure bookmark exaustion does not error#10616sirreal wants to merge 25 commits intoWordPress:trunkfrom
Conversation
| * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document. | ||
| * |
There was a problem hiding this comment.
bookmark_token() actually throws, but it's not handled here. The annotation may not be appropriate.
| * otherwise might involve messier calling and return conventions. | ||
| */ | ||
| return false; | ||
| } catch ( Exception $e ) { |
There was a problem hiding this comment.
Exhausted bookmarks throw a generic Exception.
This block catches the exceptions thrown by insert_virtual_token().
Test using WordPress PlaygroundThe changes in this pull request can previewed and tested using a WordPress Playground instance. WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser. Some things to be aware of
For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation. |
Co-authored-by: Weston Ruter <westonruter@gmail.com>
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the Core Committers: Use this line as a base for the props when committing in SVN: To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
There was a problem hiding this comment.
Pull request overview
This PR fixes an issue where bookmark exhaustion (typically from deep HTML nesting) causes the HTML Processor to throw an Exception. Instead of throwing exceptions, the processor now returns false when bookmark exhaustion is detected, sets an error state, and stops processing gracefully.
Changes:
- Modified
bookmark_token()to returnfalseinstead of throwing an exception when MAX_BOOKMARKS is exceeded - Updated all callers of
bookmark_token()andinsert_virtual_node()to check for and handlefalsereturn values - Removed outdated
@throwsdocumentation from methods that no longer throw exceptions - Added comprehensive tests to verify the processor handles extreme nesting without throwing exceptions
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
src/wp-includes/html-api/class-wp-html-processor.php |
Core implementation changes: modified bookmark_token() and insert_virtual_node() to return false on failure instead of throwing; added error checks in step(), step_before_html(), step_before_head(), step_after_head(), step_in_table(), and step_in_table_body() methods; updated PHPDoc to remove exception references |
tests/phpunit/tests/html-api/wpHtmlProcessor.php |
Added two comprehensive tests verifying the processor handles bookmark exhaustion gracefully for both regular and virtual tokens; fixed annotation from @group to @ticket |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
dmsnell
left a comment
There was a problem hiding this comment.
I definitely don’t love how pervasive and deep our error-checking has become here; did you attempt to use ->bail()? the existing code is throwing because that part of step() isn’t wrapped in a try/catch, right? what if we continue to throw inside bookmark_token() but trap that and handle aborting the same way we handle all deep control flow issues?
Me neither 😕
I did try using We could consider having
I have that in an earlier commit. It wasn't as pervasive (at b02d679): wordpress-develop/src/wp-includes/html-api/class-wp-html-processor.php Lines 1043 to 1050 in b02d679 wordpress-develop/src/wp-includes/html-api/class-wp-html-processor.php Lines 1157 to 1169 in b02d679 |
Co-authored-by: Weston Ruter <westonruter@gmail.com>
Why did you change it @sirreal? did I ask you to? 🙃 🤦♂️ |
|
No, I was just iterating and exploring. The method actually already had a I can't say I like the general Exception catching either, but it works. I don't have a strong preference, would you prefer that other version @dmsnell? |
westonruter
left a comment
There was a problem hiding this comment.
LGTM, although I defer to the expertise of @dmsnell.
|
@sirreal is there a copy of the exception-catching version available? what do you not like about it, or prefer about adding |
@dmsnell This diff should give an idea
I don't feel strongly one way or the other. Relying on error handling does reduce the checks required. My biggest concern is that it depends on methods being called inside of Some background to consider (all considering
The main thing I disliked was the general wordpress-develop/src/wp-includes/html-api/class-wp-html-processor.php Lines 1043 to 1050 in b02d679 wordpress-develop/src/wp-includes/html-api/class-wp-html-processor.php Lines 1157 to 1169 in b02d679 Perhaps that check should have compared the known message: I also considered introducing another HTML API |
|
The exception-throwing and catching version definitely suits me more, notably because it avoids exporting these details to the callers. I also think this is a middle-path to excluding internal bookmarks, so spreading out the protection throughout the class seems like it’s going to be code we then have to find and remove later (though maybe the recent PHPStan changes will help surface those cases).
If our goal is to never fail under normal usage and the failed bookmarks are only there to prevent infinite loops or memory-abuse, then perhaps we add a test to ensure that setting the bookmark fails and also that it doesn’t throw. |
I've restored the throw/catch version of these change.
The PR includes tests like that. They test the processing behavior without testing the bookmark functions directly. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
dmsnell
left a comment
There was a problem hiding this comment.
As a stop-gap I’m approving this, but there are a number of things I am still wrestling with.
- We can and I believe we should update the
MAX_BOOKMARKSto 10,000 or possibly 6,000 (if we think we should be overly conservative, which I don’t). My approval stands if we want to update the count in this PR. When we limited the bookmarks in the HTML Processor we didn’t know what a good subjective value would be, but I believe that we’ve learned since then that we are safe to open it up to a point where we can parse all reasonable HTML documents on the internet, which is somewhere around 3,000 or 6,000 depending on whether we want to be able to parse clearly spam content or not. - This is highlighting other work you’ve done to separate the namespace of internal and external bookmarks, or of eliminating them.
- The tests are hard to reason about, and they are making active assertions that the processor fails in tasks we want it to succeed in. I know we have to setup the test data in ways that should hit the failure point, but I don’t like knowing the fact that an update to fix the code will cause the tests to fail. This is not the same category for me as “we want to ensure nobody fixes this one piece without also fixing these other pieces” but more like we are writing a test that will bite us later.
Because I don’t want the tests to stand in the way I am giving the approval, but I would be happy as well if we either updated the tests to focus on the behaviors we want to enforce (that user-space bookmarking hits a limit), or even if we just removed the tests for now. I am nervous is all.
But yeah, please feel encouraged to bump the MAX_BOOKMARKS in this before merge unless you know a reason not to.
| WP_HTML_Processor::ERROR_EXCEEDED_MAX_BOOKMARKS, | ||
| $processor->get_last_error(), | ||
| 'Failed to report exceeded-max-bookmarks error.' | ||
| ); |
There was a problem hiding this comment.
this test confuses me.
Not sure if you noticed, but we should have $this->assertEqualsWithDelta( MAX_BOOKMARKS, $reached_tokens, 1 ); available to say "within 1 of the max"
but why the variability? and what does “depending on how the tokens align” mean? it feels flakey or not-fully-understood to me, and I get nervous adding an assertion of that in case we’re testing a happenstance detail and not a contract.
There was a problem hiding this comment.
This test is very tricky, and imperfect as it is. I want to confirm that these don't throw errors during processing, which is the big thing I want to address in this PR.
It would be fine to change this to assert that no assertions are thrown.
This test in particular was difficult because the virtual tokens had another failure path. Consider:
<table><td><table><td> produces:
└─TABLE (real)
└─TBODY (virtual)
└─TR (virtual)
└─TD (real)
└─TABLE (real)
└─TBODY (virtual)
└─TR (virtual)
└─TD (real)
To make this test more robust, it would probably need to process 3 times:
<table><td>…<div><table><td>…<div><div><table><td>…
Since there are pairs of real tokens TD (real) + TABLE (real), running with 0, 1, 2 offsets should ensure that the virtual token lands on the threshold at least once.
When bookmark exhaustion occurs during processing, return `false` instead of throwing an `Exception`. Developed in #10616. Props jonsurrell, westonruter, dmsnell. Fixes #64394. git-svn-id: https://develop.svn.wordpress.org/trunk@61755 602fd350-edb4-49c9-b593-d223f7449a82
When bookmark exhaustion occurs during processing, return `false` instead of throwing an `Exception`. Developed in WordPress/wordpress-develop#10616. Props jonsurrell, westonruter, dmsnell. Fixes #64394. Built from https://develop.svn.wordpress.org/trunk@61755 git-svn-id: http://core.svn.wordpress.org/trunk@61061 1a063a9b-81f0-0310-95a4-ce76da25c4cd
Bookmark exhaustion, typically from deep nesting, can cause the HTML Processor to throw an Exception.
Rather than throwing an exception, return
falsewhen bookmark exhaustion is detected. An error is set on the processor and processing stops.Trac ticket: https://core.trac.wordpress.org/ticket/64394
This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.