fix: resolve KeyError and fragile array slicing in haplotypes_frequencies_advanced() #982

Open
blankirigaya wants to merge 2 commits into malariagen:master from blankirigaya:fixed-haq_frq

@blankirigaya
Contributor

Summary

Fixes the KeyError: 'label' raised inside haplotypes_frequencies_advanced(),
which makes the function completely unusable, and replaces fragile positional
.to_numpy() slicing with explicit selection by column name.

Closes #<ISSUE_NUMBER>

Changes

In malariagen_data/anoph/hap_frq.py:

  1. Remove the premature set_index("label") call, which moved the
    label column into the index before it could be read as a column
  2. Replace the positional df.to_numpy()[:, :n] slices with
    df[freq_cols].to_numpy() etc., using explicit lists of column names
  3. Do not reintroduce set_index anywhere else: the function returns
    ds_out, not df_haps_sorted, so the index is never needed
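
For reviewers who want to reproduce the failure in isolation, the sketch below shows the pandas behaviour behind item 1. The small frame is a hypothetical stand-in for df_haps_sorted (its columns and values are invented for illustration, not taken from hap_frq.py); only the set_index call mirrors the original code.

```python
import pandas as pd

# Hypothetical stand-in for df_haps_sorted; the real frame is built in hap_frq.py.
df = pd.DataFrame({"frq_a": [0.5, 0.25], "count_a": [2, 1]})
df["label"] = ["H0", "H1"]

# Same call as in the old code: with drop=True the column is moved into the index.
df.set_index(keys="label", drop=True, inplace=True)

try:
    df["label"]  # "label" is no longer a column, so this lookup fails
    raised = False
except KeyError:
    raised = True

# The labels are still reachable, but only through the index.
index_labels = list(df.index)
print(raised, index_labels)
```

Reading the labels before any set_index call, as the new code does, avoids the lookup failure without depending on the index at all.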

# Before
df_haps_sorted["label"] = [...]
df_haps_sorted.set_index(keys="label", drop=True, inplace=True)
ds_out["variant_label"] = "variants", df_haps_sorted["label"]   # KeyError
ds_out["event_frequency"] = ..., df_haps_sorted.to_numpy()[:, :n]
ds_out["event_count"]     = ..., df_haps_sorted.to_numpy()[:, n:2*n]
ds_out["event_nobs"]      = ..., df_haps_sorted.to_numpy()[:, 2*n:-2]

# After
labels = ["H" + str(i) for i in range(len(df_haps_sorted))]
df_haps_sorted["label"] = labels
freq_cols  = [c for c in df_haps_sorted.columns if c.startswith("frq_")]
count_cols = [c for c in df_haps_sorted.columns if c.startswith("count_")]
nobs_cols  = [c for c in df_haps_sorted.columns if c.startswith("nobs_")]
ds_out["variant_label"]   = "variants", df_haps_sorted["label"].values
ds_out["event_frequency"] = ("variants", "cohorts"), df_haps_sorted[freq_cols].to_numpy()
ds_out["event_count"]     = ("variants", "cohorts"), df_haps_sorted[count_cols].to_numpy()
ds_out["event_nobs"]      = ("variants", "cohorts"), df_haps_sorted[nobs_cols].to_numpy()
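
To illustrate why the positional slices are fragile, here is a minimal sketch comparing both approaches on a toy frame. The column names (frq_a, max_af, and so on), the values, and the helper by_prefix are assumptions made for this example; the actual column layout in hap_frq.py may differ.

```python
import numpy as np
import pandas as pd

# Toy frame with an assumed layout: n frequency, n count, n nobs columns, then extras.
df = pd.DataFrame({
    "frq_a": [0.5, 0.25], "frq_b": [0.1, 0.2],
    "count_a": [2, 1], "count_b": [1, 2],
    "nobs_a": [4, 4], "nobs_b": [10, 10],
    "max_af": [0.5, 0.25],
    "label": ["H0", "H1"],
})
n = 2


def by_prefix(frame, prefix):
    """Hypothetical helper: select columns by name prefix and return a NumPy array."""
    cols = [c for c in frame.columns if c.startswith(prefix)]
    return frame[cols].to_numpy()


freq_named = by_prefix(df, "frq_")

# Move "label" to the front, as any upstream reordering of columns might do.
df_moved = df[["label"] + [c for c in df.columns if c != "label"]]

# The positional slice now picks up the label column instead of both frequencies.
positional_moved = df_moved.to_numpy()[:, :n]

# Name-based selection is unaffected by column order and keeps a numeric dtype.
named_moved = by_prefix(df_moved, "frq_")
print(positional_moved.dtype, named_moved.dtype)
```

Selecting by name also means the arrays come from numeric columns only, whereas to_numpy() on a frame that mixes strings and numbers falls back to an object array.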

Files Changed

  • malariagen_data/anoph/hap_frq.py
