[PULP-1200] Fix edge case when creating metadata file#1102
[PULP-1200] Fix edge case when creating metadata file#1102
Conversation
pulp_python/app/utils.py
Outdated
|
|
||
|
|
||
| def extract_wheel_metadata(filename: str) -> bytes | None: | ||
| def extract_non_normalized_pkg_name_with_version( |
There was a problem hiding this comment.
I have not found any function that already handles "de-normalization" so I created this one.
There was a problem hiding this comment.
I don't understand, will packaging.utils.parse_wheel_filename not work here?
There was a problem hiding this comment.
parse_wheel_filename returns a normalized name:
filename
'/var/lib/pulp/tmp/595@0ebca2e8c6ff/tmpk1042c5f/tmpiik0ieydshelf_reader-0.1-py2-none-any.whl'
parse_wheel_filename(filename.rsplit("/", 1)[1])
('tmpiik0ieydshelf-reader', <Version('0.1')>, (), frozenset({<py2-none-any @ 140595441787712>}))
but the path to the metadata file is shelf_reader-0.1.dist-info/METADATA.
Looking at https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode, a distribution name should not contain any - characters, as this character separates components of the filename, so we could split the filename like this:
filename.rsplit("/", 1)[1].split("-")
['tmpiik0ieydshelf_reader', '0.1', 'py2', 'none', 'any.whl']
and then only handle the tmpiik0ieyd prefix. I will change this in my PR.
This docs page also says that "Tools producing wheels should verify that the filename components do not contain -, as the resulting file may not be processed correctly if they do.", but I am not sure if we want to / should do this (if so, then probably in a new PR).
For the rest of your questions:
- I think I answered this above.
- The path to the metadata file should look like
{distribution}-{version}.dist-info/METADATA(https://packaging.python.org/en/latest/specifications/binary-distribution-format/#file-contents). So if the shortest path means "as little nesting as possible", then it makes sense. - Yes. I changed one test to use
setuptools-80.9.0-py3-none-any.whl. It containssetuptools-80.9.0.dist-info/METADATA,setuptools/_vendor/autocommand-2.2.2.dist-info/METADATAetc.
There was a problem hiding this comment.
To point 1: Yes, the name in the metadata file does not have to be normalized. E.g. canonicalwebteam.blog-6.4.3-py3-none-any.whl has Name: canonicalwebteam.blog in its metadata file, but shelf_reader-0.1-py2-none-any.whl has Name: shelf-reader. So the value from metadata file may or may not match the package name in the filepath.
It seems that regex is probably required to handle the prefix (or using the "shortest metadata path").
There was a problem hiding this comment.
So I changed it to behave similarly to pkginfo, because regexes could introduce even more issues.
gerrod3
left a comment
There was a problem hiding this comment.
I'm not sure if I'm comfortable with this regex parsing. I feel there has to be a library we can use to find the file name.
Some questions:
- The name in the metadata file doesn't have to be normalized (I think). Modern packages should be but old ones like some in our fixtures have mismatches. But does the name(path) of the metadata file have to be normalized?
pkg_info.wheeltakes the list of all the files in the wheel and tries to find the shortest path to a METADATA file. Is this the correct method we should copy? Seems kind of crazy, is there always a guaranteed that the main package will have the shortest path for its METADATA?- Do any of our packages in our fixtures have multiple metadata files? Can we find one for the tests?
pulp_python/app/utils.py
Outdated
|
|
||
|
|
||
| def extract_wheel_metadata(filename: str) -> bytes | None: | ||
| def extract_non_normalized_pkg_name_with_version( |
There was a problem hiding this comment.
I don't understand, will packaging.utils.parse_wheel_filename not work here?
fixes #1101
📜 Checklist
See: Pull Request Walkthrough