feat(dev): incremental update mechanism for vector store #283
… adding `slug` to metadata
…AI_EMBEDDINGS_MODEL
Pull request overview
This PR adds an incremental update path for the jAI Qdrant vector store to keep embeddings in sync with dictionary changes, primarily via a new update script and an automated GitHub Actions workflow triggered after production deployments.
Changes:
- Added `dev/update-vector-store.js` to upsert/delete specific dictionary word slugs in Qdrant.
- Added `.github/workflows/update-vector-store.yml` to detect dictionary changes post-deploy and run incremental updates (plus manual dispatch).
- Refactored `dev/seed-vector-store.js` and updated `dev/README.md` to use `metadata.slug` and document the new workflow; added npm scripts in `package.json`.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| package.json | Adds update:jai and update:jai:ci scripts to run the incremental update utility locally/CI. |
| dev/update-vector-store.js | New script to incrementally delete/upsert vectors by metadata.slug using production API content. |
| dev/seed-vector-store.js | Simplifies seeding by creating Documents directly from API data and storing metadata.slug. |
| dev/README.md | Documents the new incremental update script/workflow and updates seed script behavior/output. |
| .github/workflows/update-vector-store.yml | Automates incremental updates after successful production deployments and supports manual runs. |
Comments suppressed due to low confidence (1)
dev/README.md:190
- Docs say the update script handles “missing or invalid CLI arguments”, but `parseArgs` currently ignores unknown flags and flags missing a value (e.g. `--upsert` with no slugs) and may just exit with “No slugs provided” rather than treating the input as invalid. Either tighten argument validation (warn/error on unknown flags or missing values) or adjust this bullet to describe the current behavior more precisely.
The script includes robust error handling for:
- Missing or invalid CLI arguments (prints usage and exits gracefully)
- Words not found on the production API (404 — warns and continues with remaining slugs)
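The tighter validation suggested in this comment could look roughly like the sketch below. It is not the PR's `parseArgs`: the function name `parseArgsStrict`, the `--delete` flag, and the `--flag value` form with comma-separated slugs are assumptions made for illustration.

```javascript
// Hypothetical stricter parser. Only --upsert is named in the review;
// --delete and the "--flag a,b,c" form are assumptions.
const KNOWN_FLAGS = new Set(["--upsert", "--delete"]);

function parseArgsStrict(argv) {
  const result = { upsert: [], delete: [], errors: [] };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (!KNOWN_FLAGS.has(flag)) {
      result.errors.push(`Unknown argument: ${flag}`);
      continue;
    }
    const value = argv[i + 1];
    // A flag followed by nothing, or by another flag, has no value.
    if (value === undefined || value.startsWith("--")) {
      result.errors.push(`Missing value for ${flag}`);
      continue;
    }
    i++; // consume the value
    const slugs = value.split(",").map((s) => s.trim()).filter(Boolean);
    if (slugs.length === 0) {
      result.errors.push(`Missing value for ${flag}`);
      continue;
    }
    result[flag.slice(2)].push(...slugs); // "--upsert" -> result.upsert
  }
  return result;
}

// Example: a flag with no slugs is reported instead of silently ignored.
const parsed = parseArgsStrict(["--upsert"]);
if (parsed.errors.length > 0) {
  console.error(parsed.errors.join("\n"));
}
```

Collecting errors instead of exiting inside the parser keeps the function easy to test; the caller can decide whether to print usage and set a non-zero exit code.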
    while IFS=$'\t' read -r status file; do
      if [[ "$file" == src/content/dictionary/*.mdx ]]; then
        slug=$(basename "$file" .mdx)

        if [[ "$status" == "D" ]]; then
          DELETE_SLUGS="${DELETE_SLUGS:+$DELETE_SLUGS,}$slug"
        else
          UPSERT_SLUGS="${UPSERT_SLUGS:+$UPSERT_SLUGS,}$slug"
        fi
      fi
    done < <(git diff --name-status HEAD~1 -- src/content/dictionary/)
The change-detection loop assumes git diff --name-status returns exactly two tab-separated columns (status + file). For renames, Git emits three columns (e.g., R100\told\tnew), so this will treat the old path as an upsert and ignore the new path entirely. That can leave stale vectors for the old slug and miss the new slug. Update the parsing to handle rename statuses by recording a delete for the old slug and an upsert for the new slug (and/or use --name-status --find-renames with proper field handling).
Suggested change:

    - while IFS=$'\t' read -r status file; do
    -   if [[ "$file" == src/content/dictionary/*.mdx ]]; then
    -     slug=$(basename "$file" .mdx)
    -     if [[ "$status" == "D" ]]; then
    -       DELETE_SLUGS="${DELETE_SLUGS:+$DELETE_SLUGS,}$slug"
    -     else
    -       UPSERT_SLUGS="${UPSERT_SLUGS:+$UPSERT_SLUGS,}$slug"
    -     fi
    -   fi
    - done < <(git diff --name-status HEAD~1 -- src/content/dictionary/)
    + while IFS=$'\t' read -r status old_path new_path; do
    +   # Handle renames (e.g., "R100<TAB>old<TAB>new") by deleting the old slug
    +   # and upserting the new slug so the vector store stays in sync.
    +   if [[ "$status" == R* ]]; then
    +     # Old path: remove old slug if it was a dictionary entry
    +     if [[ "$old_path" == src/content/dictionary/*.mdx ]]; then
    +       old_slug=$(basename "$old_path" .mdx)
    +       DELETE_SLUGS="${DELETE_SLUGS:+$DELETE_SLUGS,}$old_slug"
    +     fi
    +     # New path: upsert new slug if it is a dictionary entry
    +     if [[ "$new_path" == src/content/dictionary/*.mdx ]]; then
    +       new_slug=$(basename "$new_path" .mdx)
    +       UPSERT_SLUGS="${UPSERT_SLUGS:+$UPSERT_SLUGS,}$new_slug"
    +     fi
    +   else
    +     file="$old_path"
    +     if [[ "$file" == src/content/dictionary/*.mdx ]]; then
    +       slug=$(basename "$file" .mdx)
    +       if [[ "$status" == "D" ]]; then
    +         DELETE_SLUGS="${DELETE_SLUGS:+$DELETE_SLUGS,}$slug"
    +       else
    +         UPSERT_SLUGS="${UPSERT_SLUGS:+$UPSERT_SLUGS,}$slug"
    +       fi
    +     fi
    +   fi
    + done < <(git diff --name-status --find-renames HEAD~1 -- src/content/dictionary/)
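For checking the rename handling in isolation, the same classification can be sketched as a pure function over `git diff --name-status` output. This is only an illustration and not code from the PR: `classifyChanges` and `slugOf` are hypothetical names, and the regex is an approximation of the shell glob plus `basename` used in the workflow.

```javascript
// Sketch only: classifyChanges and slugOf are hypothetical helpers.
// The pattern approximates `src/content/dictionary/*.mdx` + `basename "$f" .mdx`.
const DICTIONARY_FILE = /^src\/content\/dictionary\/(?:.*\/)?([^/]+)\.mdx$/;

function slugOf(path) {
  const match = DICTIONARY_FILE.exec(path ?? "");
  return match ? match[1] : null;
}

function classifyChanges(nameStatusOutput) {
  const upsert = [];
  const remove = [];
  for (const line of nameStatusOutput.split("\n")) {
    if (!line.trim()) continue;
    // Renames have three tab-separated columns: status, old path, new path.
    const [status, firstPath, secondPath] = line.split("\t");
    if (status.startsWith("R")) {
      const oldSlug = slugOf(firstPath);
      const newSlug = slugOf(secondPath);
      if (oldSlug) remove.push(oldSlug); // old slug's vectors become stale
      if (newSlug) upsert.push(newSlug); // new slug needs embeddings
      continue;
    }
    const slug = slugOf(firstPath);
    if (!slug) continue; // not a dictionary entry
    (status === "D" ? remove : upsert).push(slug);
  }
  return { upsert, remove };
}

// Example input mimicking `git diff --name-status --find-renames` output.
const sample = [
  "M\tsrc/content/dictionary/api.mdx",
  "R100\tsrc/content/dictionary/foo.mdx\tsrc/content/dictionary/bar.mdx",
].join("\n");
console.log(classifyChanges(sample));
```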
Will hold off on this for now, since we won't really be renaming files in our word edit flows.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Description
This pull request introduces an incremental update system for the ✨jAI vector store, allowing dictionary changes to be efficiently propagated to Qdrant without a full reseed. The update process is automated via a new GitHub Actions workflow and can be triggered manually or on production deployments tied to specific PR labels. The documentation and scripts have been refactored for clarity and maintainability, and the seeding process now attaches metadata for future incremental updates.
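As a rough sketch of what an incremental update per slug involves (remove the chunks stored under a slug, then re-insert fresh ones for words that still exist), the snippet below builds a Qdrant-style payload filter on `metadata.slug` and an ordered list of steps. The helper names `buildSlugFilter` and `planUpdate` are hypothetical, and the actual script may structure this differently; no Qdrant client is called here.

```javascript
// Sketch under assumptions: helper names are hypothetical; only the
// `metadata.slug` payload key comes from the PR.
function buildSlugFilter(slug) {
  // Qdrant filter shape: match points whose payload key equals the slug.
  return { must: [{ key: "metadata.slug", match: { value: slug } }] };
}

function planUpdate({ upsert = [], remove = [] }) {
  const steps = [];
  for (const slug of remove) {
    steps.push({ action: "delete", slug, filter: buildSlugFilter(slug) });
  }
  for (const slug of upsert) {
    // Delete existing chunks first so re-split content leaves no orphans.
    steps.push({ action: "delete", slug, filter: buildSlugFilter(slug) });
    steps.push({ action: "insert", slug });
  }
  return steps;
}

// Example: one edited word and one removed word.
console.log(JSON.stringify(planUpdate({ upsert: ["api"], remove: ["old"] }), null, 2));
```

Deleting by filter before inserting matters because a word whose content shrinks may produce fewer chunks than before; overwriting by ID alone could leave the extra old chunks behind.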
Incremental Vector Store Update System
- Added a workflow (`.github/workflows/update-vector-store.yml`) to automate incremental updates to the Qdrant vector store. It triggers on production deployments or manual dispatch, gates on PR labels, detects dictionary file changes, and runs the update script only when necessary.
- Added `dev/update-vector-store.js`, a script that upserts or deletes only the changed dictionary words in Qdrant. It uses CLI arguments for slugs, fetches live API data, deletes old chunks by `metadata.slug`, splits content, and updates the vector store with robust error handling.
- Updated `package.json` to add `update:jai` and `update:jai:ci` npm scripts for local and CI/CD usage of the incremental update script.
Seeding and Documentation Improvements
- Refactored the seed script (`dev/seed-vector-store.js`) to create LangChain `Document` objects directly from API data, attach `metadata.slug` for all words, and remove file system dependencies, ensuring compatibility with incremental updates.
- Updated `dev/README.md` to document the new incremental update workflow, CLI usage, error handling, and example outputs. The seed and update processes are clearly differentiated, and the vector store requirements for incremental updates are explained.
Related Issue
Fixes #196
Screenshots/Screencasts
NA
Notes to Reviewer
Add the new env entries to the GitHub secrets and variables under Actions:
- `OPENAI_API_KEY` - secret
- `OPENAI_EMBEDDINGS_MODEL` - variable