Skip to content

fix: copy size array before reorder in multi-color scatter plots#4040

Open
Ekin-Kahraman wants to merge 3 commits intoscverse:mainfrom
Ekin-Kahraman:fix/size-array-reorder-4024
Open

fix: copy size array before reorder in multi-color scatter plots#4040
Ekin-Kahraman wants to merge 3 commits intoscverse:mainfrom
Ekin-Kahraman:fix/size-array-reorder-4024

Conversation

@Ekin-Kahraman
Copy link
Copy Markdown
Contributor

Summary

Fixes #4024.

  • Root cause: In embedding(), the per-point size array was reordered in-place (size = np.array(size)[order]) inside the per-color loop. Each iteration's z-order permutation was applied to the already-reordered array from the previous iteration, causing cumulative corruption of marker sizes across subplots.
  • Fix: Use a loop-local variable _size so each iteration reorders from the original size array. This matches the behavior of color_source_vector, color_vector, and coords, which are all freshly derived each iteration.

Changes

  • src/scanpy/plotting/_tools/scatterplots.py: Replace in-place mutation of size with a loop-local _size variable; update all downstream references within the loop body.
  • tests/test_plotting.py: Add test_scatter_size_not_mutated_across_panels regression test that verifies the input size array is not modified after plotting with multiple color keys.

Test plan

  • test_scatter_size_not_mutated_across_panels passes — confirms the size array is preserved
  • Existing scatter/embedding tests continue to pass (no behavioral change for single-panel plots or scalar sizes)

Ekin-Kahraman and others added 2 commits April 8, 2026 02:22
When plotting with multiple color keys and a per-point size array,
the size variable was reordered in-place within the per-color loop.
Each subsequent subplot applied its z-order permutation to the
already-reordered array from the previous iteration, causing
cumulative corruption of marker sizes.

Use a loop-local variable `_size` so each iteration reorders from
the original array.

Closes scverse#4024

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.61%. Comparing base (9bc2c1e) to head (53581e2).
⚠️ Report is 4 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4040      +/-   ##
==========================================
+ Coverage   78.51%   78.61%   +0.10%     
==========================================
  Files         117      117              
  Lines       12753    12726      -27     
==========================================
- Hits        10013    10005       -8     
+ Misses       2740     2721      -19     
Flag Coverage Δ
hatch-test.low-vers 77.91% <100.00%> (+0.11%) ⬆️
hatch-test.pre 78.50% <100.00%> (+13.66%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/scanpy/plotting/_tools/scatterplots.py 83.48% <100.00%> (-0.04%) ⬇️

... and 3 files with indirect coverage changes

Copy link
Copy Markdown

@JhonatanFelix JhonatanFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix!

I think that you could strengthen the regression test on test_scatter_size_not_mutated_across_panels because currently it cheacks that the caller's original sizes array is unchanged, but the bug in #4024 is really that later subplots receive the wrong per-point sizes.

Would it make sense to asset the plotted sizes from the matplotliv artists for each axis, so the test proves that all pannels get the same size mapping?? Another thing is that the issue's reproduction used three panels (["louvain", "n_genes", "n_counts"]), which might exercise the cumulative reorder more clearly than the current 2 panels case

sizes = rng.uniform(10, 200, size=pbmc.n_obs)
sizes_original = sizes.copy()

axes = sc.pl.umap(pbmc, color=["louvain", "n_genes"], size=sizes, show=False)
Copy link
Copy Markdown

@JhonatanFelix JhonatanFelix Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable axes is not getting used afterwards and giving problems with the ruff check, you can remove it and just call directly the sc.pl.umap(...) or call it as _ = sc.pl.umap(...).

edit: I made a small PR just to take off the variable.

Strengthen the regression test per review feedback:
- Use 3 color keys (categorical + two continuous) matching the original
  issue reproduction, exercising the cumulative reorder more clearly.
- Extract actual plotted sizes from each axis's PathCollection and
  verify they match the expected size distribution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Ekin-Kahraman
Copy link
Copy Markdown
Contributor Author

Thanks for the review @JhonatanFelix! Both great suggestions — I've pushed an update that:

  1. Uses 3 color keys (["louvain", "n_genes", "n_counts"]) matching the original issue reproduction, so the test exercises the cumulative reorder across a categorical + two continuous panels.
  2. Extracts the actual plotted sizes from each axis's PathCollection via get_sizes() and asserts that the sorted sizes match the expected array for every panel, not just checking that the caller's input is unchanged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Per-point size array is cumulatively reordered across multi-color subplots

2 participants