
IO refactor #133

Open
ilia-kats wants to merge 4 commits into scverse:main from ilia-kats:io_refactor


@ilia-kats
Collaborator

ilia-kats commented Mar 30, 2026

Switch to AnnData's public API as much as possible.

This now uses read_dispatched to enable custom logic when reading individual modalities. Unfortunately, the same approach was not possible for writing: if we're writing a backed file, only the metadata should be written, while X should remain intact on disk. However, calling write_dispatched on the entire AnnData object first clears its HDF5 group completely, so X is lost. Therefore, we still need to write the individual modalities by hand, element by element.
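To make the ordering problem concrete, here is a toy model in plain Python. It is only a sketch under stated assumptions: nested dicts stand in for HDF5 groups, and write_dispatched_toy and skip_x_in_backed_mode are hypothetical names, not the anndata API. It mimics the behavior described above, where the target group is cleared before any callback runs, so even a callback that deliberately skips X cannot keep it.

```python
from typing import Any, Callable

# Toy model only: nested dicts stand in for HDF5 groups, other values for datasets.
# write_dispatched_toy and skip_x_in_backed_mode are hypothetical, not anndata functions.
Callback = Callable[[dict, str, Any], None]


def write_dispatched_toy(
    parent: dict[str, Any], key: str, elem: dict[str, Any], callback: Callback
) -> None:
    """Mimic the described ordering: clear the target group, then dispatch per element."""
    parent[key] = {}  # the whole group is replaced by an empty one before any callback runs
    group = parent[key]
    for name, value in elem.items():
        callback(group, name, value)


def skip_x_in_backed_mode(group: dict[str, Any], name: str, value: Any) -> None:
    """A callback that tries to leave X untouched, as one would in backed mode."""
    if name == "X":
        return  # skip writing X, hoping the on-disk copy survives
    group[name] = value


store = {"mod": {"rna": {"X": "on-disk matrix", "obs": {"old_col": [1, 2]}}}}
# In backed mode X is not meant to be rewritten, so the in-memory value is a placeholder.
write_dispatched_toy(
    store["mod"], "rna", {"X": None, "obs": {"cell_type": ["a", "b"]}}, skip_x_in_backed_mode
)
print("X" in store["mod"]["rna"])  # False: X was already gone before the callback ran
```

The callback never deletes anything itself; X disappears purely because the clear happens first.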

@codecov

codecov bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.50%. Comparing base (d1177b1) to head (f339957).

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #133      +/-   ##
==========================================
+ Coverage   91.02%   91.50%   +0.48%
==========================================
  Files          11       10       -1
  Lines        1805     1754      -51
==========================================
- Hits         1643     1605      -38
+ Misses        162      149      -13
```
| Files with missing lines | Coverage Δ |
|---|---|
| src/mudata/_core/io.py | 95.08% <100.00%> (+0.09%) ⬆️ |
| src/mudata/_core/mudata.py | 93.18% <100.00%> (-0.01%) ⬇️ |

ilia-kats force-pushed the io_refactor branch 2 times, most recently from 62b142a to da9e2d3 on March 31, 2026 09:35
ilia-kats marked this pull request as ready for review on April 10, 2026 15:59
ilia-kats requested review from gtca and ilan-gold on April 10, 2026 15:59
@ilan-gold
Contributor

@ilia-kats On the subject of HDF5 writes blindly clearing a store: do you think that is a bug? I noticed something similar, but the person on that PR (scverse/anndata#2366 (review)) didn't reply. If it is a bug, I think we should just restrict the clearing to subgroups.

@ilia-kats
Collaborator Author

At least for groups, it definitely makes sense to clear them. For example, the user might have deleted elements of .obsm or columns of .obs, and we don't want the old ones hanging around in that case. The root store, at least for HDF5, is cleared anyway in non-backed mode, since the file is then opened in w mode. The problem is what to do in backed mode, in particular if the AnnData object is not at the root of the store but, e.g., inside a MuData container. The group clearing currently happens in write_elem before any dispatch on groups or callbacks, so it effectively clears the entire AnnData group. That makes sense if we assume the group may previously have contained anything, not necessarily an AnnData object, particularly since write_elem is now part of the public API and may be used by anyone for files that are neither AnnData nor MuData.

I'm not sure what a good solution is. I think the current behavior generally makes sense; X in combination with backed mode is just a very special case.
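The argument for clearing can be illustrated with the same kind of toy model, again only a sketch: plain dicts stand in for an .obsm group, and update_without_clearing and clear_then_write are hypothetical names, not anndata functions. If a writer only overwrites the keys present in memory, an embedding the user deleted stays on disk; clearing first removes it.

```python
from typing import Any

# Toy model only: a dict stands in for an on-disk .obsm group.
# Both function names are hypothetical illustrations, not anndata API.


def update_without_clearing(group: dict[str, Any], elem: dict[str, Any]) -> None:
    """Overwrite only the keys present in the new object; nothing is deleted."""
    for name, value in elem.items():
        group[name] = value


def clear_then_write(group: dict[str, Any], elem: dict[str, Any]) -> None:
    """Drop everything first, so keys removed by the user disappear on disk too."""
    group.clear()
    for name, value in elem.items():
        group[name] = value


on_disk = {"X_pca": [[0.1]], "X_umap": [[0.2]]}
in_memory = {"X_pca": [[0.3]]}  # the user deleted obsm["X_umap"] in memory

stale = dict(on_disk)
update_without_clearing(stale, in_memory)
print(sorted(stale))  # ['X_pca', 'X_umap']: the deleted embedding lingers

fresh = dict(on_disk)
clear_then_write(fresh, in_memory)
print(sorted(fresh))  # ['X_pca']
```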

@ilan-gold
Contributor

But the store is only cleared if you write to /, theoretically. I wouldn't expect writing to a subkey such as mod of the MuData store to clear the entire HDF5 group, i.e., that you write to a subgroup and the whole thing gets deleted.

@ilia-kats
Collaborator Author

When writing to a subkey of mod, you pass the mod group, the modality name k, and the AnnData object to write_elem, in which case this line runs and deletes the entire group mod/k.

@ilan-gold
Contributor

ilan-gold commented Apr 14, 2026

> When writing to a subkey of mod, you give the mod group, the modality name k, and the AnnData object to write_elem, in which case this line runs, deleting the entire group mod/k.

Can't you then skip the X key in a custom write_dispatched implementation, i.e., not call write_elem(store_that_contains_X, "X", X_elem) when in backed mode?

EDIT: OK, I think I see: when you call write_dispatched on the modality, it clears the whole modality group. Is that the problem?

@ilia-kats
Collaborator Author

I'm not sure if you mean what I think you mean, but I tried anndata.experimental.write_dispatched with a custom callback. However, the group is deleted before the callback is called, so I can't just call write_dispatched(mod_group, modname, adata, callback); I have to write each element of the AnnData object individually, which is basically what the current code already does.
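A rough sketch of the element-by-element workaround, again as a dict-based toy model with hypothetical names (write_modality_elementwise, BACKED_SKIP); it is not the actual mudata code. Each element is cleared and rewritten on its own, so X can be left alone in backed mode while stale metadata such as a deleted .obsm entry is still replaced.

```python
from typing import Any

# Toy model only: nested dicts stand in for HDF5 groups.
# BACKED_SKIP and write_modality_elementwise are hypothetical names.
BACKED_SKIP = {"X"}  # elements left untouched on disk when the file is backed


def write_modality_elementwise(
    parent: dict[str, Any], key: str, elem: dict[str, Any], backed: bool
) -> None:
    """Write one modality element by element instead of clearing its whole group."""
    group = parent.setdefault(key, {})  # reuse the existing group, do not wipe it
    for name, value in elem.items():
        if backed and name in BACKED_SKIP:
            continue  # keep the on-disk X exactly as it is
        group.pop(name, None)  # clear only this element, mimicking deleting the old HDF5 member
        group[name] = value  # then write the new version


store = {"mod": {"rna": {"X": "on-disk matrix", "obsm": {"X_pca": 1, "X_umap": 2}}}}
write_modality_elementwise(store["mod"], "rna", {"X": None, "obsm": {"X_pca": 3}}, backed=True)
print(store["mod"]["rna"])  # {'X': 'on-disk matrix', 'obsm': {'X_pca': 3}}
```

The trade-off in this sketch is that the writer itself has to walk the elements of the AnnData object, rather than delegating that to a single write_dispatched call.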
