@patrick91
Member

@patrick91 patrick91 commented Feb 9, 2026

Summary

  • Add an NLTK_DATA env var pointing to /home/app/nltk_data, with ownership set to the app user
  • Same root cause as "Add cache for HF" (#4574): su -p preserves HOME=/root, so NLTK defaults to /root/nltk_data, which the app user cannot write to
  • Docs: https://www.nltk.org/data.html
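
To illustrate the root cause: on POSIX systems, Python's `os.path.expanduser` reads `HOME` from the environment before it consults the password database, so a process running as `app` with `HOME=/root` still expands `~` to `/root`. The sketch below only models that lookup and is not NLTK's code; the helper name `home_as_seen_with` is hypothetical.

```python
import os


def home_as_seen_with(home_value):
    """Return what '~' expands to when HOME is set to home_value (POSIX)."""
    saved = os.environ.get("HOME")
    os.environ["HOME"] = home_value
    try:
        return os.path.expanduser("~")
    finally:
        # Restore the caller's environment.
        if saved is None:
            del os.environ["HOME"]
        else:
            os.environ["HOME"] = saved


# With HOME preserved from root, a "~/nltk_data" default lands under /root,
# regardless of which user the process actually runs as.
print(os.path.join(home_as_seen_with("/root"), "nltk_data"))
```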

Test plan

  • Deploy and trigger similar_talks analysis that downloads NLTK stopwords
  • Verify no PermissionError for /root/nltk_data in logs
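
As an optional pre-check before triggering the analysis, something like the following could be run inside the container as the app user. It is a sketch that uses only the standard library, assumes a POSIX system, and the function name `nltk_data_problems` is hypothetical; it does not replace checking the logs.

```python
import os


def nltk_data_problems(env):
    """Return a list of problems with the NLTK_DATA setting in env."""
    path = env.get("NLTK_DATA")
    if not path:
        return ["NLTK_DATA is not set"]
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    # Downloads need both write and traverse permission on the directory.
    if not os.access(path, os.W_OK | os.X_OK):
        return [f"{path} is not writable by uid {os.getuid()}"]
    return []


if __name__ == "__main__":
    for problem in nltk_data_problems(os.environ):
        print(problem)
```

An empty result means the directory named by NLTK_DATA exists and the current user can write to it.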

The entrypoint uses `su -p app`, which preserves HOME=/root from the
root user. HuggingFace and NLTK default to paths under the home
directory (~/.cache/huggingface and ~/nltk_data), which resolve to
/root/, where the app user cannot write.

Set HF_HOME and NLTK_DATA env vars to writable directories owned by
the app user.
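
The effect of the env var can be sketched with a simplified model of how a download directory gets chosen: the first existing, writable NLTK_DATA entry wins, otherwise the default falls back to a path under HOME. This is an illustration only, not NLTK's implementation (the real downloader also considers other search-path entries), and `pick_download_dir` is a hypothetical name.

```python
import os
import tempfile


def pick_download_dir(env):
    """Simplified model: first writable NLTK_DATA entry, else $HOME/nltk_data."""
    for path in env.get("NLTK_DATA", "").split(os.pathsep):
        if path and os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return os.path.join(env.get("HOME", "/"), "nltk_data")


with tempfile.TemporaryDirectory() as writable:
    # Before the fix: only HOME is set, so the default falls under /root.
    before = pick_download_dir({"HOME": "/root"})
    # After the fix: NLTK_DATA points at a directory the process can write to.
    after = pick_download_dir({"HOME": "/root", "NLTK_DATA": writable})
    fixed = after == writable
```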
@vercel

vercel bot commented Feb 9, 2026

The latest updates on your projects.

Project: pycon · Deployment: Building · Actions: Preview · Updated (UTC): Feb 9, 2026 8:21pm

@claude
Contributor

claude bot commented Feb 9, 2026

The Dockerfile change adds the NLTK_DATA environment variable and creates the directory with ownership set to the app user, fixing permission errors when NLTK downloads the stopwords data.

The fix correctly addresses the root cause: su -p preserves HOME=/root, so without the explicit NLTK_DATA env var, NLTK would default to /root/nltk_data which the app user cannot write to. The solution mirrors the existing HF_HOME pattern.

No issues found.

@patrick91 patrick91 merged commit 6e0957b into main Feb 9, 2026
7 of 8 checks passed
@patrick91 patrick91 deleted the fix/hf-nltk-cache-permissions branch February 9, 2026 20:22
@codecov

codecov bot commented Feb 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.52%. Comparing base (32fc459) to head (4c63602).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4575   +/-   ##
=======================================
  Coverage   92.52%   92.52%           
=======================================
  Files         357      357           
  Lines       10690    10690           
  Branches      812      812           
=======================================
  Hits         9891     9891           
  Misses        687      687           
  Partials      112      112           