Python: Fix syntax error when = is used as a format fill character #21274
An example (provided by @redsun82) is the string `f"{x:=^20}"`. Parsing this (with unnamed nodes shown) illustrates the problem:

```
module [0, 0] - [2, 0]
  expression_statement [0, 0] - [0, 11]
    string [0, 0] - [0, 11]
      string_start [0, 0] - [0, 2]
      interpolation [0, 2] - [0, 10]
        "{" [0, 2] - [0, 3]
        expression: named_expression [0, 3] - [0, 9]
          name: identifier [0, 3] - [0, 4]
          ":=" [0, 4] - [0, 6]
          ERROR [0, 6] - [0, 7]
            "^" [0, 6] - [0, 7]
          value: integer [0, 7] - [0, 9]
        "}" [0, 9] - [0, 10]
      string_end [0, 10] - [0, 11]
```

Observe that we've managed to combine the format specifier token `:` and the fill character `=` in a single token (which doesn't match the `:` we expect in the grammar rule), and hence we get a syntax error.

If we change the `=` to some other character (e.g. a `-`), we instead get

```
module [0, 0] - [2, 0]
  expression_statement [0, 0] - [0, 11]
    string [0, 0] - [0, 11]
      string_start [0, 0] - [0, 2]
      interpolation [0, 2] - [0, 10]
        "{" [0, 2] - [0, 3]
        expression: identifier [0, 3] - [0, 4]
        format_specifier: format_specifier [0, 4] - [0, 9]
          ":" [0, 4] - [0, 5]
        "}" [0, 9] - [0, 10]
      string_end [0, 10] - [0, 11]
```

and in particular no syntax error.

To fix this, we want to ensure that the `:` is lexed on its own, and the `token(prec(1, ...))` construction can be used to do exactly this.

Finally, you may wonder why `=` is special here. I think what's going on is that the lexer knows that `:=` is a token on its own (because it's used in the walrus operator), and so it greedily consumes the following `=` with this in mind.
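For reference, here is a small sketch of what the string means at runtime, assuming CPython 3.8 or later (needed for the walrus operator). The variable names are illustrative only. It shows that `:=^20` is a format specifier with `=` as the fill character, and that a genuine walrus inside an f-string has to be parenthesised:

```python
x = "hi"

# ":" starts the format spec, "=" is the fill character,
# "^" requests centre alignment, and 20 is the field width.
centred = f"{x:=^20}"
assert centred == format(x, "=^20")
assert len(centred) == 20
print(centred)  # =========hi=========

# A real assignment expression in an f-string needs parentheses,
# so there is no ambiguity with the format spec above.
shown = f"{(y := 5)}"
print(shown, y)  # 5 5
```

So the parser should produce a `format_specifier` node for this input, never a `named_expression`.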
Pull request overview
This pull request fixes a syntax error that occurred when using `=` as a fill character in f-string format specifiers (e.g., `f"{x:=^20}"`). The issue was caused by the lexer greedily consuming `:=` as a single token (the walrus operator) instead of lexing `:` separately followed by `=`.
Changes:
- Modified the grammar to use `token(prec(1, ':'))` in format specifiers to ensure `:` is lexed independently
- Added test cases for the fixed behavior in both `strings.py` and `template_strings_new.py`
- Regenerated tree-sitter parser artifacts (grammar.json, node-types.json, parser.h, array.h)
- Bumped extractor version from 7.1.7 to 7.1.8
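The effect of the grammar change can be sketched with a toy lexer. This is not tree-sitter's implementation: the `lex` function and the token tables below are hypothetical, and the sketch assumes only that a higher lexical precedence beats a longer match. It also ignores the fact that in the real grammar the higher-precedence `:` applies only where a format specifier is expected, so the walrus operator keeps working elsewhere.

```python
def lex(text, tokens):
    """Toy lexer: at each position prefer higher precedence, then longer match.

    `tokens` maps token text to a precedence value. Characters that match
    no token are emitted one at a time.
    """
    out, i = [], 0
    while i < len(text):
        candidates = [
            (prec, len(tok), tok)
            for tok, prec in tokens.items()
            if text.startswith(tok, i)
        ]
        tok = max(candidates)[2] if candidates else text[i]
        out.append(tok)
        i += len(tok)
    return out


# All tokens at the same precedence: the longest match ":=" wins.
before = {":=": 0, ":": 0, "=": 0, "^": 0}
# Hypothetical stand-in for token(prec(1, ':')): ":" now beats ":=".
after = {**before, ":": 1}

print(lex("x:=^20", before))  # ['x', ':=', '^', '2', '0']
print(lex("x:=^20", after))   # ['x', ':', '=', '^', '2', '0']
```

With equal precedence the `:` and `=` are fused into one token, which is the misparse described above. Raising the precedence of `:` leaves `=` to be read as part of the format specifier.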
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| python/ql/lib/change-notes/2026-02-05-fix-format-fill-character-misparse.md | Documents the fix for format fill character parsing |
| python/extractor/tsg-python/tsp/grammar.js | Core fix: wraps `:` in format_specifier with `token(prec(1, ...))` |
| python/extractor/tsg-python/tsp/src/grammar.json | Regenerated from grammar.js with the format_specifier change |
| python/extractor/tsg-python/tsp/src/node-types.json | Regenerated parser metadata |
| python/extractor/tsg-python/tsp/src/tree_sitter/parser.h | Updated tree-sitter runtime header |
| python/extractor/tsg-python/tsp/src/tree_sitter/array.h | Updated tree-sitter runtime header |
| python/extractor/tests/parser/strings.py | Added test case for f-string with = fill character |
| python/extractor/tests/parser/template_strings_new.py | Added test case for template string with format specifier |
| python/extractor/tests/parser/template_strings_new.expected | Regenerated expected output including new test |
| python/extractor/semmle/util.py | Version bump to 7.1.8 |
if 6:
    t"Implicit concatenation: " t"Hello, {name}!" t" How are you?"
if 7:
    t"With a format specifier: {name:=^20}"
Syntax Error (in Python 3).
See below for a potential fix:
""
if 2:
    f"Hello, {name}!"
if 3:
    f"Value: {value:.2f}, Hex: {value:#x}"
if 4:
    "Just a regular string."
if 5:
    f"Multiple {first} and {second} placeholders."
if 6:
    "Implicit concatenation: " f"Hello, {name}!" " How are you?"
if 7:
    f"With a format specifier: {name:=^20}"
This is pretty funny -- the alert is based on the current analysis, which indeed has a syntax error here (because of the parser issue that this PR fixes).
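One way to confirm that the flagged line is valid Python is to ask CPython's own parser. The following is a minimal check, assuming CPython 3.8 or later and using only the standard `ast` module. It uses an f-string, since template strings need a newer interpreter:

```python
import ast

# Parse the f-string from the test case as a single expression.
tree = ast.parse('f"{name:=^20}"', mode="eval")

joined = tree.body          # ast.JoinedStr for the whole f-string
field = joined.values[0]    # ast.FormattedValue for {name:=^20}

# The interpolated expression is a plain name, not an assignment expression.
assert isinstance(field.value, ast.Name)
assert not isinstance(field.value, ast.NamedExpr)

# The text after the first ":" is kept as the format specifier.
spec_text = field.format_spec.values[0].value
print(spec_text)  # =^20
```

CPython reports no syntax error here and treats `=^20` as the format specifier, which matches the behaviour this PR restores in the extractor.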
redsun82 left a comment
LGTM, thanks for the quick fix!
I understand the tree_sitter C/C++ header changes may come from a tooling version bump. Might it make sense to mark those files as generated too?
Yeah, I really ought to exclude anything in … I'll create a separate PR for this.
The example `f"{x:=^20}"` was provided by @redsun82 from a report by @grahamcracker1234.