feat(dev): incremental update mechanism for vector store #283

Merged: babblebey merged 10 commits into main from sync-vector-db on Mar 4, 2026
babblebey (Member) commented Mar 3, 2026

Description

This pull request introduces an incremental update system for the ✨jAI vector store, allowing dictionary changes to be efficiently propagated to Qdrant without a full reseed. The update process is automated via a new GitHub Actions workflow and can be triggered manually or on production deployments tied to specific PR labels. The documentation and scripts have been refactored for clarity and maintainability, and the seeding process now attaches metadata for future incremental updates.

Incremental Vector Store Update System

  • Added a new GitHub Actions workflow (.github/workflows/update-vector-store.yml) to automate incremental updates to the Qdrant vector store. It triggers on production deployments or manual dispatch, gates on PR labels, detects dictionary file changes, and runs the update script only when necessary.
  • Introduced dev/update-vector-store.js, a script that upserts or deletes only the changed dictionary words in Qdrant. It uses CLI arguments for slugs, fetches live API data, deletes old chunks by metadata.slug, splits content, and updates the vector store with robust error handling.
  • Updated package.json to add update:jai and update:jai:ci npm scripts for local and CI/CD usage of the incremental update script.
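The delete-then-upsert flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in dev/update-vector-store.js: `buildSlugFilter` and `planUpdate` are hypothetical names, and the sketch assumes the chunks carry their slug under the `metadata.slug` payload key, as the description states. The filter object follows Qdrant's standard `must` / `match` filter shape.

```javascript
// Hypothetical helper: build a Qdrant filter matching every chunk stored for one slug.
// Assumes the slug lives under the "metadata.slug" payload key.
function buildSlugFilter(slug) {
  return { must: [{ key: "metadata.slug", match: { value: slug } }] };
}

// Hypothetical planner: turn the changed slugs into an ordered list of operations.
// Removed words only need a delete. Upserted words delete their old chunks first,
// so a shorter definition cannot leave orphaned chunks behind.
function planUpdate({ upsert = [], remove = [] }) {
  const ops = [];
  for (const slug of remove) {
    ops.push({ op: "delete", slug, filter: buildSlugFilter(slug) });
  }
  for (const slug of upsert) {
    ops.push({ op: "delete", slug, filter: buildSlugFilter(slug) });
    ops.push({ op: "upsert", slug });
  }
  return ops;
}
```

Deleting before re-adding keeps the update idempotent: running it twice for the same slug should leave the same set of chunks in the store.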

Seeding and Documentation Improvements

  • Refactored the seeding script (dev/seed-vector-store.js) to create LangChain Document objects directly from API data, attach metadata.slug for all words, and remove file system dependencies, ensuring compatibility with incremental updates.
  • Overhauled dev/README.md to document the new incremental update workflow, CLI usage, error handling, and example outputs. The seed and update processes are clearly differentiated, and the vector store requirements for incremental updates are explained.
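As a rough illustration of attaching metadata.slug at seed time, the mapping from API data to documents might look like the sketch below. The API response shape (`slug`, `title`, `content`) and the `toDocuments` name are assumptions made for this example. The returned plain objects only mirror the `pageContent` and `metadata` fields that LangChain's Document uses; the actual seed script may build them differently.

```javascript
// Hypothetical mapper: one document-shaped object per word returned by the API.
// Assumed input shape: [{ slug, title, content }] (not confirmed by the PR).
function toDocuments(words) {
  return words.map((word) => ({
    pageContent: `${word.title}\n\n${word.content}`,
    // The slug is what later lets an incremental update find and delete this word's chunks.
    metadata: { slug: word.slug },
  }));
}
```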

Related Issue

Fixes #196

Screenshots/Screencasts

NA

Notes to Reviewer

Add the following new entries to the repository's GitHub Actions secrets and variables:

  • OPENAI_API_KEY - secret
  • OPENAI_EMBEDDINGS_MODEL - variable

vercel bot commented Mar 3, 2026

The latest updates on your projects.

Project | Deployment | Actions | Updated (UTC)
jargons-dev | Ready | Preview, Comment | Mar 4, 2026 5:39am

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds an incremental update path for the jAI Qdrant vector store to keep embeddings in sync with dictionary changes, primarily via a new update script and an automated GitHub Actions workflow triggered after production deployments.

Changes:

  • Added dev/update-vector-store.js to upsert/delete specific dictionary word slugs in Qdrant.
  • Added .github/workflows/update-vector-store.yml to detect dictionary changes post-deploy and run incremental updates (plus manual dispatch).
  • Refactored dev/seed-vector-store.js and updated dev/README.md to use metadata.slug and document the new workflow; added npm scripts in package.json.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

File | Description
package.json | Adds update:jai and update:jai:ci scripts to run the incremental update utility locally/CI.
dev/update-vector-store.js | New script to incrementally delete/upsert vectors by metadata.slug using production API content.
dev/seed-vector-store.js | Simplifies seeding by creating Documents directly from API data and storing metadata.slug.
dev/README.md | Documents the new incremental update script/workflow and updates seed script behavior/output.
.github/workflows/update-vector-store.yml | Automates incremental updates after successful production deployments and supports manual runs.
Comments suppressed due to low confidence (1)

dev/README.md:190

  • Docs say the update script handles “missing or invalid CLI arguments”, but parseArgs currently ignores unknown flags and flags missing a value (e.g. --upsert with no slugs) and may just exit with “No slugs provided” rather than treating the input as invalid. Either tighten argument validation (warn/error on unknown flags or missing values) or adjust this bullet to describe the current behavior more precisely.
The script includes robust error handling for:
- Missing or invalid CLI arguments (prints usage and exits gracefully)
- Words not found on the production API (404 — warns and continues with remaining slugs)
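One way to tighten the argument validation mentioned in this comment is sketched below. It is not the script's current parseArgs: only --upsert appears in this thread, so the --delete flag, the comma-separated slug format, and the shape of the returned object are assumptions made for the example.

```javascript
// Flags this sketch accepts. "--delete" is an assumed counterpart to "--upsert".
const KNOWN_FLAGS = new Set(["--upsert", "--delete"]);

// Parse argv (without the node and script entries) into slug lists plus a list of problems.
// Unknown flags and flags without a value are reported instead of being silently ignored.
function parseArgs(argv) {
  const result = { upsert: [], delete: [], errors: [] };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (!KNOWN_FLAGS.has(flag)) {
      result.errors.push(`Unknown argument: ${flag}`);
      continue;
    }
    const value = argv[i + 1];
    if (value === undefined || value.startsWith("--")) {
      result.errors.push(`Missing value for ${flag}`);
      continue;
    }
    i++; // consume the value
    const slugs = value.split(",").map((s) => s.trim()).filter(Boolean);
    if (slugs.length === 0) {
      result.errors.push(`No slugs given for ${flag}`);
      continue;
    }
    result[flag.slice(2)].push(...slugs);
  }
  return result;
}
```

A caller could print usage and exit with a non-zero status whenever `errors` is non-empty, which would make the README bullet accurate as written.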


Comment on lines +108 to +118
    while IFS=$'\t' read -r status file; do
      if [[ "$file" == src/content/dictionary/*.mdx ]]; then
        slug=$(basename "$file" .mdx)

        if [[ "$status" == "D" ]]; then
          DELETE_SLUGS="${DELETE_SLUGS:+$DELETE_SLUGS,}$slug"
        else
          UPSERT_SLUGS="${UPSERT_SLUGS:+$UPSERT_SLUGS,}$slug"
        fi
      fi
    done < <(git diff --name-status HEAD~1 -- src/content/dictionary/)
Copilot AI commented Mar 3, 2026

The change-detection loop assumes git diff --name-status returns exactly two tab-separated columns (status + file). For renames, Git emits three columns (e.g., R100\told\tnew), so read assigns both paths to file and the loop never records a delete for the old slug. That leaves stale vectors for the old slug, and the new slug is picked up only as a side effect of basename returning the last path component. Update the parsing to handle rename statuses explicitly by recording a delete for the old slug and an upsert for the new slug (and/or use --name-status --find-renames with proper field handling).

Suggested change

    - while IFS=$'\t' read -r status file; do
    -   if [[ "$file" == src/content/dictionary/*.mdx ]]; then
    -     slug=$(basename "$file" .mdx)
    -     if [[ "$status" == "D" ]]; then
    -       DELETE_SLUGS="${DELETE_SLUGS:+$DELETE_SLUGS,}$slug"
    -     else
    -       UPSERT_SLUGS="${UPSERT_SLUGS:+$UPSERT_SLUGS,}$slug"
    -     fi
    -   fi
    - done < <(git diff --name-status HEAD~1 -- src/content/dictionary/)
    + while IFS=$'\t' read -r status old_path new_path; do
    +   # Handle renames (e.g., "R100<TAB>old<TAB>new") by deleting the old slug
    +   # and upserting the new slug so the vector store stays in sync.
    +   if [[ "$status" == R* ]]; then
    +     # Old path: remove old slug if it was a dictionary entry
    +     if [[ "$old_path" == src/content/dictionary/*.mdx ]]; then
    +       old_slug=$(basename "$old_path" .mdx)
    +       DELETE_SLUGS="${DELETE_SLUGS:+$DELETE_SLUGS,}$old_slug"
    +     fi
    +     # New path: upsert new slug if it is a dictionary entry
    +     if [[ "$new_path" == src/content/dictionary/*.mdx ]]; then
    +       new_slug=$(basename "$new_path" .mdx)
    +       UPSERT_SLUGS="${UPSERT_SLUGS:+$UPSERT_SLUGS,}$new_slug"
    +     fi
    +   else
    +     file="$old_path"
    +     if [[ "$file" == src/content/dictionary/*.mdx ]]; then
    +       slug=$(basename "$file" .mdx)
    +       if [[ "$status" == "D" ]]; then
    +         DELETE_SLUGS="${DELETE_SLUGS:+$DELETE_SLUGS,}$slug"
    +       else
    +         UPSERT_SLUGS="${UPSERT_SLUGS:+$UPSERT_SLUGS,}$slug"
    +       fi
    +     fi
    +   fi
    + done < <(git diff --name-status --find-renames HEAD~1 -- src/content/dictionary/)
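If the classification were ever moved out of the workflow shell step and into a Node script, the same rename-aware logic could be expressed roughly as below. This is a sketch under assumptions: `collectSlugs` and `slugFor` are hypothetical names, the input is the raw text printed by git diff --name-status, and, unlike the Bash glob, the regular expression only matches .mdx files directly inside src/content/dictionary/.

```javascript
// Matches dictionary entries directly under src/content/dictionary/ and captures the slug.
const DICT_FILE = /^src\/content\/dictionary\/([^/]+)\.mdx$/;

// Return the slug for a dictionary file path, or null for any other path.
function slugFor(path) {
  const match = DICT_FILE.exec(path ?? "");
  return match ? match[1] : null;
}

// Classify each line of `git diff --name-status` output into upserts and deletes.
// Rename lines have three tab-separated fields: status (R<score>), old path, new path.
function collectSlugs(nameStatusOutput) {
  const upsert = new Set();
  const remove = new Set();
  for (const line of nameStatusOutput.split("\n")) {
    if (!line.trim()) continue;
    const [status, firstPath, secondPath] = line.split("\t");
    if (status.startsWith("R")) {
      const oldSlug = slugFor(firstPath);
      const newSlug = slugFor(secondPath);
      if (oldSlug) remove.add(oldSlug);
      if (newSlug) upsert.add(newSlug);
    } else {
      const slug = slugFor(firstPath);
      if (!slug) continue;
      if (status === "D") remove.add(slug);
      else upsert.add(slug);
    }
  }
  return { upsert: [...upsert], remove: [...remove] };
}
```

Using sets avoids duplicate slugs, and keeping the function free of I/O makes the rename case easy to cover with a unit test.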

babblebey (Member, Author) replied

Will hold off on this for now, considering we will not exactly be renaming files in our word edit flows.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@babblebey babblebey merged commit 10d35bf into main Mar 4, 2026
6 checks passed
@babblebey babblebey deleted the sync-vector-db branch March 4, 2026 06:01
Development

Successfully merging this pull request may close these issues.

Vector Store Synchronization on New/Edit Word Action

2 participants