
[PULP-1200] Fix edge case when creating metadata file #1102

Open
jobselko wants to merge 1 commit into pulp:main from jobselko:fix_1101

@jobselko
Contributor

fixes #1101

📜 Checklist

  • Commits are cleanly separated with meaningful messages (simple features and bug fixes should be squashed to one commit)
  • One or more changelog entries have been added for any significant changes
  • Follows the Pulp policy on AI Usage
  • (For new features) - User documentation and test coverage has been added

See: Pull Request Walkthrough

@jobselko jobselko self-assigned this Feb 12, 2026


def extract_wheel_metadata(filename: str) -> bytes | None:
def extract_non_normalized_pkg_name_with_version(
Contributor Author


I have not found any existing function that handles "de-normalization", so I created this one.

Contributor


I don't understand, will packaging.utils.parse_wheel_filename not work here?

Contributor Author


parse_wheel_filename returns a normalized name:

>>> filename
'/var/lib/pulp/tmp/595@0ebca2e8c6ff/tmpk1042c5f/tmpiik0ieydshelf_reader-0.1-py2-none-any.whl'
>>> parse_wheel_filename(filename.rsplit("/", 1)[1])
('tmpiik0ieydshelf-reader', <Version('0.1')>, (), frozenset({<py2-none-any @ 140595441787712>}))

but the path to the metadata file is shelf_reader-0.1.dist-info/METADATA.
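The following sketch shows why the normalized name cannot simply be turned back into the `.dist-info` directory name. It assumes the name-normalization rule from the Python packaging specs: runs of `-`, `_` and `.` collapse to a single `-`, then the result is lowercased. The helper `canonicalize` is hypothetical; `packaging.utils.canonicalize_name` applies the same rule.

```python
import re

# Name normalization per the packaging "Names and normalization" spec:
# collapse runs of "-", "_" and "." into one "-" and lowercase the result.
_SEPARATORS = re.compile(r"[-_.]+")


def canonicalize(name: str) -> str:
    """Hypothetical helper equivalent to packaging.utils.canonicalize_name."""
    return _SEPARATORS.sub("-", name).lower()


# Different on-disk spellings collapse to the same canonical name, so the
# original spelling (needed for "<name>-<version>.dist-info") cannot be
# recovered from the normalized result alone.
spellings = ["shelf_reader", "shelf-reader", "Shelf.Reader", "shelf__reader"]
print({s: canonicalize(s) for s in spellings})
```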

Looking at https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode, a distribution name should not contain any - characters, because - separates the components of the filename. So we could split the filename like this:

>>> filename.rsplit("/", 1)[1].split("-")
['tmpiik0ieydshelf_reader', '0.1', 'py2', 'none', 'any.whl']

and then only handle the tmpiik0ieyd prefix. I will change this in my PR.
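Here is a minimal sketch of the split-based idea. It assumes the wheel's member list is available and that any temporary prefix is glued to the front of the name, as in `tmpiik0ieydshelf_reader`. The function name `find_metadata_path` and the suffix-matching strategy for the prefix are hypothetical, not the PR's actual code.

```python
from __future__ import annotations

import posixpath


def find_metadata_path(filename: str, members: list[str]) -> str | None:
    """Hypothetical: locate the top-level <name>-<version>.dist-info/METADATA.

    The wheel filename may carry an unknown temporary prefix, so the raw
    name from the filename is only trusted as a *suffix* match.
    """
    stem = posixpath.basename(filename)
    # "-" separates filename components, so the first two components are
    # the (escaped) distribution name and the version.
    raw_name, version = stem.split("-")[:2]
    suffix = f"-{version}.dist-info/METADATA"
    for member in members:
        # Only consider METADATA directly under a top-level .dist-info dir.
        if member.count("/") != 1 or not member.endswith(suffix):
            continue
        dist_name = member[: -len(suffix)]
        if dist_name and raw_name.endswith(dist_name):
            return member
    return None


members = ["shelf_reader/__init__.py", "shelf_reader-0.1.dist-info/METADATA"]
path = "/var/lib/pulp/tmp/595@0ebca2e8c6ff/tmpk1042c5f/tmpiik0ieydshelf_reader-0.1-py2-none-any.whl"
print(find_metadata_path(path, members))  # shelf_reader-0.1.dist-info/METADATA
```

A suffix match can be ambiguous if the archive also contained, say, a `reader-0.1.dist-info` directory. A real implementation would need to decide how to break such ties.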
This docs page also says that "Tools producing wheels should verify that the filename components do not contain -, as the resulting file may not be processed correctly if they do.", but I am not sure whether we want to (or should) enforce this. If so, it probably belongs in a new PR.
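If we did decide to enforce that rule, a check could look roughly like the following. This is a hypothetical heuristic. It assumes the wheel filename layout `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl` and that versions in filenames start with a digit. A stray `-` can still slip through if the pieces happen to look valid.

```python
def has_valid_wheel_components(filename: str) -> bool:
    """Hypothetical check that a wheel filename splits into sane components."""
    if not filename.endswith(".whl"):
        return False
    parts = filename[: -len(".whl")].split("-")
    # Five components, or six when the optional build tag is present.
    if len(parts) not in (5, 6) or not all(parts):
        return False
    # The version (second component) should start with a digit.
    if not parts[1][0].isdigit():
        return False
    # With six components the third is the build tag, which the wheel
    # spec requires to start with a digit.
    return len(parts) == 5 or parts[2][0].isdigit()


print(has_valid_wheel_components("shelf_reader-0.1-py2-none-any.whl"))  # True
print(has_valid_wheel_components("shelf-reader-0.1-py2-none-any.whl"))  # False
```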

For the rest of your questions:

  1. I think I answered this above.
  2. The path to the metadata file should look like {distribution}-{version}.dist-info/METADATA (https://packaging.python.org/en/latest/specifications/binary-distribution-format/#file-contents). So if the shortest path means "as little nesting as possible", then it makes sense.
  3. Yes. I changed one test to use setuptools-80.9.0-py3-none-any.whl. It contains setuptools-80.9.0.dist-info/METADATA, setuptools/_vendor/autocommand-2.2.2.dist-info/METADATA, etc.
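For point 2, here is a sketch of deriving the expected metadata path from a name and version. It assumes the escaping rule the binary distribution format spec gives for new wheels: normalize the name, then use `_` instead of `-`. Older wheels may predate that rule, so a computed path like this may not match what is actually in the archive. The function name is hypothetical.

```python
import re


def expected_metadata_path(name: str, version: str) -> str:
    """Hypothetical: build "{distribution}-{version}.dist-info/METADATA".

    Runs of "-", "_" and "." in the name become "_", and the name is
    lowercased, as the wheel spec describes for the distribution component.
    """
    distribution = re.sub(r"[-_.]+", "_", name).lower()
    return f"{distribution}-{version}.dist-info/METADATA"


print(expected_metadata_path("shelf-reader", "0.1"))
# shelf_reader-0.1.dist-info/METADATA
```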

Contributor Author


To point 1: Yes, the name in the metadata file does not have to be normalized. E.g., canonicalwebteam.blog-6.4.3-py3-none-any.whl has Name: canonicalwebteam.blog in its metadata file, but shelf_reader-0.1-py2-none-any.whl has Name: shelf-reader. So the value from the metadata file may or may not match the package name in the file path.
It seems that a regex is probably required to handle the prefix (or the "shortest metadata path" approach).
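This sketch illustrates the mismatch. It reads the raw `Name` field from METADATA and compares it with the name from the path. It assumes METADATA uses the header-style core metadata format, which the stdlib `email` parser can read. It also assumes names should be compared only after normalization. Both helper names are hypothetical.

```python
import email
import re


def metadata_name(metadata: bytes) -> str:
    """Return the raw (possibly non-normalized) Name field from METADATA."""
    return email.message_from_bytes(metadata)["Name"]


def normalize(name: str) -> str:
    """Collapse runs of -, _ and . into "-" and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def same_project(a: str, b: str) -> bool:
    """Compare two project names after normalization."""
    return normalize(a) == normalize(b)


metadata = b"Metadata-Version: 2.1\nName: shelf-reader\nVersion: 0.1\n"
name = metadata_name(metadata)
print(name)  # shelf-reader
print(name == "shelf_reader", same_project(name, "shelf_reader"))  # False True
```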

Contributor Author


So I changed it to behave similarly to pkginfo, because regexes could introduce even more issues.
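Here is a minimal sketch of the pkginfo-like behaviour discussed in this thread. It collects every `.dist-info/METADATA` member, tries them from least to most nested, and returns the first that contains a `Metadata-Version` header. This mirrors the approach as described here, not pkginfo's exact code. The function name and the in-memory fixture are hypothetical.

```python
from __future__ import annotations

import io
import zipfile


def read_shallowest_metadata(wheel: zipfile.ZipFile) -> bytes | None:
    """Hypothetical: return the least-nested METADATA that looks valid."""
    candidates = [n for n in wheel.namelist() if n.endswith(".dist-info/METADATA")]
    # Fewer "/" means less nesting; ties are broken alphabetically.
    candidates.sort(key=lambda n: (n.count("/"), n))
    for name in candidates:
        data = wheel.read(name)
        # Core metadata files contain a Metadata-Version header.
        if b"Metadata-Version" in data:
            return data
    return None


# Synthetic in-memory wheel with a vendored METADATA, laid out like setuptools.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("setuptools/_vendor/autocommand-2.2.2.dist-info/METADATA",
                "Metadata-Version: 2.1\nName: autocommand\n")
    zf.writestr("setuptools-80.9.0.dist-info/METADATA",
                "Metadata-Version: 2.1\nName: setuptools\n")

with zipfile.ZipFile(buf) as zf:
    print(read_shallowest_metadata(zf).splitlines()[1])  # b'Name: setuptools'
```

The filter here is narrower than a plain "METADATA in path" substring match, so unrelated files that merely mention METADATA in their path are ignored.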

@jobselko jobselko marked this pull request as ready for review February 12, 2026 18:49
@jobselko jobselko requested a review from gerrod3 February 12, 2026 18:50
Contributor

@gerrod3 left a comment


I'm not sure if I'm comfortable with this regex parsing. I feel there has to be a library we can use to find the file name.

Some questions:

  1. The name in the metadata file doesn't have to be normalized (I think). Modern packages should be normalized, but old ones, like some in our fixtures, have mismatches. But does the name (path) of the metadata file have to be normalized?
  2. pkginfo.wheel takes the list of all the files in the wheel and tries to find the shortest path to a METADATA file. Is this the correct method for us to copy? It seems kind of crazy: is it always guaranteed that the main package will have the shortest path to its METADATA?
  3. Do any of our packages in our fixtures have multiple metadata files? Can we find one for the tests?
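For question 3, a quick way to audit the fixtures is to list every `.dist-info/METADATA` member in each wheel. This is a hypothetical helper. It assumes the fixture wheels are local `.whl` files in one directory.

```python
from __future__ import annotations

import pathlib
import zipfile


def metadata_members(wheel_path: str | pathlib.Path) -> list[str]:
    """List every .dist-info/METADATA member inside one wheel file."""
    with zipfile.ZipFile(wheel_path) as zf:
        return sorted(n for n in zf.namelist() if n.endswith(".dist-info/METADATA"))


def wheels_with_multiple_metadata(directory: str | pathlib.Path) -> dict[str, list[str]]:
    """Map wheel filename -> METADATA members, for wheels with more than one."""
    found = {}
    for path in sorted(pathlib.Path(directory).glob("*.whl")):
        members = metadata_members(path)
        if len(members) > 1:
            found[path.name] = members
    return found
```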




@jobselko jobselko marked this pull request as draft February 13, 2026 15:39
@jobselko jobselko marked this pull request as ready for review February 13, 2026 18:10


Development

Successfully merging this pull request may close these issues.

Metadata file does not match the wheel metadata (PEP 658)

2 participants