Fix invalid url escaping - replaced url parsing from url.parse to new…#178
Fix invalid url escaping - replaced url parsing from url.parse to new…#178dukei wants to merge 2 commits intohttptoolkit:mainfrom
Conversation
|
|
pimterry
left a comment
There was a problem hiding this comment.
Thanks! This makes sense, and I think in general it's probably the right direction to go. I've left some comments though - this is more complicated than it appears.
One other notable case is websockets - much of this same logic also appears in a very similar format in websocket-handler, and that needs to stay in sync with this.
Once this is raedy, I will probably release it as a major bump, since it might change some behaviour in meaningful ways, but that's OK, I think it's about the right time. Given that though, it would probably be sensible to mop up the many other url.parse usages in the rest of the codebase, to clean everything up fully and parse URLs consistently & more accurately everywhere. Is that something you'd be happy to look into? That's not strictly required to merge this as a useful first step, but it would be useful if you're interested.
| clientReq.remoteIpAddress, | ||
| getDnsLookupFunction(this.lookupOptions) | ||
| ); | ||
| ) || ""; |
There was a problem hiding this comment.
Is this correct? Currently it looks like we previously assumed hostname really could be null here, and accepted that (and getClientRelativeHostname preserves that). That's not great - I think in reality it is always set, since req.url is always an absolute URL in our case (we preprocess to ensure this in MockttpServer).
However, setting it to "" isn't very good either and might do all sorts of weird things.
In reality, with new URL() I think hostname is now always set, and that's recognized in the types, so really we should change getClientRelativeHostname here - either to remove the null support entirely so it's always set, or to improve the types so we know that string hostname in => string hostname out.
| } // Otherwise: falsey means don't touch it. | ||
|
|
||
| reqUrl = new URL(`${protocol}//${hostname}${(port ? `:${port}` : '')}${path}`).toString(); | ||
| reqUrl = new URL(`${protocol}//${hostname}${(port ? `:${port}` : '')}${pathname}${search}`).toString(); |
There was a problem hiding this comment.
Unfortunately, there are other differences here where the url.parse output is preferable... One notable example is a simple ? at the end of the path:
console.log(url.parse('https://example.com/?').search);
// '?'
console.log(new URL('https://example.com/?').search)
// ''These might be semantically equivalent in theory, but they're not actually the same in practice (a server will literally receive a different string, and can legitimately do different things based on that if it so chooses). With this change, when proxying a request to that URL through Node, the client will send a slightly different request to the server that it expected. For more context, there's a WHATWG discussion on this that you might find interesting at whatwg/url#779 (I've just added a comment there with my perspective).
This is not great, but of course you do make a good argument about the other differences where url.parse is wrong too! I think maybe we might want to make our own URL parser that wraps the two to use WHATWG parsing as standard but with a few extra checks for details like this that get lost otherwise. What do you think?
cb1a2e0 to
a0be747
Compare
This pull requests replaces url handling in request processing.
REASON
Some urls can change due to legacy parsing.
For example, parsing the following url
https://example.com:8080/validate?param=\\/|+-results in the following:
url.parse:https://example.com:8080/validate?param=%5C/%7C+-new URL:https://example.com:8080/validate?param=\\/|+-It can lead to incompatibilities with some sites.
As stated by NodeJS docs
url.parseis deprecated due to non-standard parsing and error-prone behavior. URL should be used instead.