Fix #2685: Redirect Latest News links to blog.python.org #2933
iampujan wants to merge 3 commits into python:main
- Updates the python.org RSS parser to rewrite pythoninsider.blogspot.com URLs to the canonical blog.python.org host.
- Adds a data migration to fix existing BlogEntry links previously imported from the old Blogger domain.
- Covers the URL domain replacement in test_parser.py.
Pull request overview
This PR fixes issue #2685 by rewriting legacy pythoninsider.blogspot.com URLs to the canonical blog.python.org domain. The fix ensures that "Latest News" links on the Python homepage direct users to blog.python.org instead of the outdated blogspot subdomain.
Changes:
- Added URL rewriting logic in the RSS parser to transform blogspot URLs during feed processing
- Implemented a data migration to update existing database entries with the new URLs
- Added unit test with mocked feed data to verify URL transformation behavior
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| apps/blogs/parser.py | Adds string replacement logic to rewrite pythoninsider.blogspot.com to blog.python.org during RSS feed parsing |
| apps/blogs/tests/test_parser.py | Adds new test with mocked feedparser to verify URL rewriting functionality |
| apps/blogs/migrations/0004_auto_20260222_0834.py | Data migration to update existing BlogEntry URLs in the database from blogspot to blog.python.org |
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
    # Rewrite Blogger domains to canonical python.org domain (Issue #2685)
    url = e["link"].replace(
        "//pythoninsider.blogspot.com",
        "//blog.python.org"
    )
URL rewriting uses a plain string replacement on the full URL. This can accidentally rewrite URLs whose hostname only starts with pythoninsider.blogspot.com (e.g., pythoninsider.blogspot.com.evil.com) and doesn’t explicitly validate the netloc. Consider parsing the URL and rewriting the hostname only when netloc exactly matches pythoninsider.blogspot.com (then reconstruct the URL), which is both safer and more robust.
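One way the suggestion above could look is sketched below. This is only an illustration under stated assumptions, not the PR's code: the function name `rewrite_blog_url` and the two constants are hypothetical, and the sketch deliberately leaves URLs with a port or userinfo in the netloc untouched, since they do not match the legacy host exactly.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical constants for illustration; the PR inlines these strings.
LEGACY_HOST = "pythoninsider.blogspot.com"
CANONICAL_HOST = "blog.python.org"


def rewrite_blog_url(url: str) -> str:
    """Swap the legacy Blogger host for the canonical one; leave other URLs alone."""
    parts = urlsplit(url)
    # Compare the whole netloc (case-insensitively) so that look-alike hosts
    # such as pythoninsider.blogspot.com.evil.com are not rewritten.
    if parts.netloc.lower() != LEGACY_HOST:
        return url
    # Rebuild the URL with only the host replaced; scheme, path, query,
    # and fragment are carried over unchanged.
    return urlunsplit(parts._replace(netloc=CANONICAL_HOST))
```

In the parser, the `replace()` call would then become something like `url = rewrite_blog_url(e["link"])`.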
    url = e["link"].replace(
        "//pythoninsider.blogspot.com",
        "//blog.python.org"
    )
This new multiline replace() call formatting doesn’t match the repo’s enforced ruff-format output (see pyproject.toml [tool.ruff.format] quote-style = "double"). Running ruff format / pre-commit should reformat this block (likely collapsing it to a single line and adjusting commas/parentheses); please apply the formatter so CI lint passes.
Suggested change:

    - url = e["link"].replace(
    -     "//pythoninsider.blogspot.com",
    -     "//blog.python.org"
    - )
    + url = e["link"].replace("//pythoninsider.blogspot.com", "//blog.python.org")
        }
    ]
This new mocked return value literal is not ruff-formatted (e.g., missing trailing commas in a multiline list/dict). Since CI runs pre-commit with ruff-format, please run ruff format (or pre-commit) to avoid formatting-only lint failures.
Suggested change:

    -     }
    - ]
    +     },
    + ],
        BlogEntry.objects.bulk_update(entries_to_update, ['url'])


    class Migration(migrations.Migration):

        dependencies = [
            ('blogs', '0003_alter_relatedblog_creator_and_more'),
This migration file doesn’t follow the repo’s enforced ruff-format style (notably quote-style = "double" in pyproject.toml; e.g., ['url'] and ('blogs', ...) will be reformatted). Please run ruff format / pre-commit so the lint workflow doesn’t fail on formatting-only differences.
Suggested change:

    -     BlogEntry.objects.bulk_update(entries_to_update, ['url'])
    - class Migration(migrations.Migration):
    -     dependencies = [
    -         ('blogs', '0003_alter_relatedblog_creator_and_more'),
    +     BlogEntry.objects.bulk_update(entries_to_update, ["url"])
    + class Migration(migrations.Migration):
    +     dependencies = [
    +         ("blogs", "0003_alter_relatedblog_creator_and_more"),
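For readers without the full migration in front of them, the selection step that feeds `bulk_update` might look roughly like the sketch below. It is a plain-Python illustration, not the migration itself: `Entry` is a hypothetical stand-in for a `BlogEntry` row, `collect_entries_to_update` is an assumed helper name, and the rewrite uses the same plain `replace()` as the parser change under review.

```python
from dataclasses import dataclass

# Same substrings the parser change uses for its replace() call.
LEGACY = "//pythoninsider.blogspot.com"
CANONICAL = "//blog.python.org"


@dataclass
class Entry:
    """Hypothetical stand-in for a BlogEntry row; only `url` matters here."""

    url: str


def collect_entries_to_update(entries):
    """Rewrite legacy URLs in place and return only the entries that changed."""
    entries_to_update = []
    for entry in entries:
        new_url = entry.url.replace(LEGACY, CANONICAL)
        if new_url != entry.url:
            entry.url = new_url
            entries_to_update.append(entry)
    return entries_to_update
```

In a Django data migration, the returned list would be what gets passed to `BlogEntry.objects.bulk_update(entries_to_update, ["url"])`, so rows that already point at blog.python.org are never written.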
Fixes #2685. Adds URL rewriting during RSS parsing so that legacy pythoninsider.blogspot.com URLs point to the canonical blog.python.org domain. Also includes a data migration to update preexisting database entries and a new unit test with a mocked feed.