
Fix four bugs in plot_template_data.py causing compute_oracle and pipeline to fail #90

Draft
Copilot wants to merge 2 commits into sess1 from copilot/sub-pr-89

Copilot AI commented Feb 26, 2026

plot_template_data.py had four bugs that together caused the script to fail, either silently or with an exception. The most subtle was that select_variables_and_clean returned an empty table, so all downstream logic operated on no data.

Bugs fixed

  • select_variables_and_clean (root cause): groupby(keys).count() replaced the real effectif values with row counts. Because "Session" is part of the key, every (Session, formation) combination is unique, so every count is 1 and the filter groups[cible] > 1 always produced an empty DataFrame. Replaced with duplicated(subset=keys, keep=False) and an inverted mask, which keeps only rows whose key combination is unique while preserving the actual values:

    # Before (broken)
    groups = df[[*keys, cible]].groupby(keys).count()
    filtered = groups[groups[cible] > 1].reset_index(drop=False)
    mask = filtered.duplicated(subset=keys, keep=False)
    return filtered[~mask][[*keys, cible]], cible
    
    # After
    subset = df[[*keys, cible]]
    mask = subset.duplicated(subset=keys, keep=False)
    return subset[~mask].reset_index(drop=True), cible
  • compute_oracle: .dropna(axis=0) had been commented out with the note # fails, but it is required. Without it, the pivot table retains NaN for formations present in only one year, and mean_absolute_error raises ValueError. Uncommented it.

  • split_train_test: Parameter typo cuble → cible, and the test set was selected with the train_test mask instead of ~train_test.

  • make_pipeline: c != "cible" (string literal) → c != cible (variable), so the target column is correctly excluded from the feature list.
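
To make the root cause easier to see, here is a minimal sketch on made-up data. The column names follow the description above, but the values and the toy DataFrame are assumptions for illustration only and are not taken from plot_template_data.py. It compares what groupby(keys).count() puts in the target column with what the duplicated-based filter keeps.

```python
import pandas as pd

# Hypothetical toy data: names mirror the PR description, values are invented.
df = pd.DataFrame(
    {
        "Session": [2021, 2021, 2022, 2022, 2022],
        "formation": ["A", "B", "A", "B", "B"],
        "effectif": [120, 80, 130, 85, 90],
    }
)
keys = ["Session", "formation"]
cible = "effectif"

# Old approach: count() overwrites the target column with group sizes.
counts = df[[*keys, cible]].groupby(keys).count().reset_index()

# With fully unique keys every count is 1, so the "> 1" filter keeps nothing.
unique_rows = df.iloc[:4]
unique_counts = unique_rows[[*keys, cible]].groupby(keys).count().reset_index()
old_filtered = unique_counts[unique_counts[cible] > 1]

# New approach: flag every row whose key combination occurs more than once,
# then invert the mask to keep the remaining rows with their real values.
subset = df[[*keys, cible]]
mask = subset.duplicated(subset=keys, keep=False)
cleaned = subset[~mask].reset_index(drop=True)

print(counts)        # target column now holds group sizes, not effectif values
print(old_filtered)  # empty when all keys are unique
print(cleaned)       # real effectif values for the non-duplicated keys
```

With keep=False, all members of a duplicated group are flagged, so the inverted mask drops both conflicting rows instead of arbitrarily keeping one of them.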

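The split_train_test and make_pipeline fixes can be illustrated the same way. This is a sketch under stated assumptions: the toy data and the split rule (sessions before 2022 go to training) are hypothetical, since the actual rule used in plot_template_data.py is not shown here.

```python
import pandas as pd

# Hypothetical toy data: names mirror the PR description, values are invented.
df = pd.DataFrame(
    {
        "Session": [2021, 2021, 2022, 2022],
        "formation": ["A", "B", "A", "B"],
        "effectif": [120, 80, 130, 85],
    }
)
cible = "effectif"

# Boolean mask, True for training rows (assumed rule: sessions before 2022).
train_test = df["Session"] < 2022
train = df[train_test]
test = df[~train_test]  # the inverted mask selects the complementary rows

# Comparing against the variable excludes the target column.
features_fixed = [c for c in df.columns if c != cible]
# Comparing against the literal "cible" excludes nothing here,
# because no column is actually named "cible".
features_buggy = [c for c in df.columns if c != "cible"]

print(len(train), len(test))
print(features_fixed)
print(features_buggy)
```

Reusing train_test without the inversion would make the test set identical to the training set, and the string-literal comparison would leave the target among the features, which leaks the answer into the model.
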


Co-authored-by: xadupre <22452781+xadupre@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Address feedback on add datasets teaching PR" to "Fix four bugs in plot_template_data.py causing compute_oracle and pipeline to fail" on Feb 26, 2026