Repeated tree updates need a batch-aware apply engine
Summary
While benchmarking Plate against Slate on huge documents, I ran into a large transform cliff in Slate for repeated exact-path node updates.
The original trigger on the Plate side was fixed separately, but the Slate benchmark showed a more general problem:
- repeated exact-path updates on wide sibling arrays are dominated by repeated immutable branch rewrites
- `normalize` is not the main cost in that workload
- path/range ref transforms are not the main cost either
- the main cost is redoing shared ancestor work over and over in the per-op apply path
That turned this from a narrow setNodes observation into a broader batch-engine problem.
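To make the "repeated immutable branch rewrites" cost concrete, here is a minimal, self-contained sketch using plain arrays (not Slate's actual internals): N per-op updates on a width-W sibling array redo the shared-ancestor copy on every op, while a batch stages all writes into one draft and rewrites the ancestor once.

```typescript
// Per-op immutable updates: every op copies the whole width-W sibling
// array, so shared-ancestor work is redone N times: O(N * W) total.
function applyPerOp(children: readonly number[], updates: Array<[number, number]>): readonly number[] {
  let current = children;
  for (const [index, value] of updates) {
    const next = current.slice(); // rewrite the shared ancestor branch per op
    next[index] = value;
    current = next;
  }
  return current;
}

// Batched updates: one mutable draft, one ancestor rewrite, one commit:
// O(N + W) total for the same N writes.
function applyBatch(children: readonly number[], updates: Array<[number, number]>): readonly number[] {
  const draft = children.slice(); // single ancestor rewrite for the whole batch
  for (const [index, value] of updates) {
    draft[index] = value;
  }
  return draft;
}

const base = Array.from({ length: 8 }, (_, i) => i);
const updates: Array<[number, number]> = [[1, 100], [5, 500]];
// Both strategies commit the same state, and neither mutates the
// previously published `base` reference.
console.log(applyPerOp(base, updates));
console.log(applyBatch(base, updates));
console.log(base[1]); // still 1
```

Both functions are observationally equivalent from the outside; the batch engine described below exploits exactly that equivalence.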
Requirements
Any upstream solution needs to preserve a few important semantics:
- `editor.apply(op)` remains the single public/plugin seam
- plugin authors should not need to learn a second override path just to stay correct in batches
- downstream code that calls `apply(op)` and then immediately inspects `editor.children` should still see correct state
- previously published node references should remain immutable
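The last two requirements (read-your-writes through `editor.children`, and immutability of already-published references) can be sketched together. The `MiniEditor` class below is hypothetical, not Slate's implementation; it only illustrates the observable contract:

```typescript
type TextNode = { text: string };

// Hypothetical mini-editor: apply(op) stays the only write seam, and reads
// through `children` must reflect the op immediately, even mid-batch.
class MiniEditor {
  private committed: readonly TextNode[] = [{ text: "a" }, { text: "b" }];
  private draft: TextNode[] | null = null; // staged tree during a batch

  get children(): readonly TextNode[] {
    return this.draft ?? this.committed; // staged state wins while batching
  }

  apply(op: { path: number; text: string }): void {
    if (this.draft === null) this.draft = this.committed.slice();
    this.draft[op.path] = { text: op.text }; // fresh node; old one untouched
  }

  commit(): void {
    if (this.draft !== null) {
      this.committed = this.draft;
      this.draft = null;
    }
  }
}

const editor = new MiniEditor();
const published = editor.children[0]; // reference handed out before the batch
editor.apply({ path: 0, text: "A" });
console.log(editor.children[0].text); // "A" — staged change visible immediately
editor.commit();
console.log(published.text); // "a" — published node was never mutated
```

The point is that batching is invisible to callers: they keep reading `editor.children` after each `apply(op)` and keep holding old node references, and both behave exactly as they did in the per-op engine.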
Current direction
The direction implemented in PR #6039 is:
- `Editor.withBatch(editor, fn)` as an explicit transaction boundary
- `Transforms.applyBatch(editor, ops)` as sugar over batched execution
- a private batch draft for tree changes
- accessor-backed `editor.children`, so committed state and staged state can be separated cleanly
- an optimized exact-path `set_node` path inside the batch executor
- generic batched semantics for other tree operations, even where they are not yet highly optimized
This keeps one public seam (`editor.apply(op)`) while moving batching into the engine underneath it.
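As a design sketch only (the real signatures live in PR #6039), `Transforms.applyBatch` can be read as sugar over `Editor.withBatch`: open one transaction boundary, funnel every op through the single `apply` seam, commit once at the end. The `BatchEditor` interface and mock editor below are hypothetical stand-ins, not Slate's types:

```typescript
// Hypothetical shapes only; the real API is defined in PR #6039.
type Op = { type: "set_node"; path: number[]; newProperties: Record<string, unknown> };

interface BatchEditor {
  apply(op: Op): void;               // the single public/plugin seam
  withBatch(fn: () => void): void;   // explicit transaction boundary
}

// applyBatch as sugar: every op still flows through editor.apply, so
// plugin overrides keep working; only the commit point moves.
function applyBatch(editor: BatchEditor, ops: Op[]): void {
  editor.withBatch(() => {
    for (const op of ops) editor.apply(op);
  });
}

// Minimal mock to show the call pattern.
const applied: Op[] = [];
const editor: BatchEditor = {
  apply: (op) => applied.push(op),
  withBatch: (fn) => fn(), // a real engine would open/commit a draft here
};
applyBatch(editor, [{ type: "set_node", path: [0], newProperties: { bold: true } }]);
console.log(applied.length); // 1
```

The design choice worth noting is that batching is expressed as a boundary around the existing seam rather than as a parallel write API, which is what keeps plugin authors from needing a second override path.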
Benchmarks
Using:
```
yarn node -r ./config/babel/register.cjs ./packages/slate/test/perf/set-nodes-bench.js --blocks=5000 --group-size=50 --repeats=3
```
Current branch results:
- flat exact `set_node`
  - `Transforms.applyBatch(...)`: 6.03 ms
  - `editor.apply(set_node)` loop inside `Editor.withoutNormalizing(...)`: 95.86 ms
  - `Transforms.setNodes(...)` inside `Editor.withoutNormalizing(...)`: 92.84 ms
  - `Transforms.setNodes(...)` with full per-call normalize: 1311.22 ms
- grouped exact `set_node`
  - `Transforms.applyBatch(...)`: 6.88 ms
  - `editor.apply(set_node)` loop inside `Editor.withoutNormalizing(...)`: 13.26 ms
  - `Transforms.setNodes(...)`: 42.21 ms
- mixed exact `set_node` batch plus one tail `insert_node`: 9.91 ms
- pure `insert_node` batch on an empty document
  - `Transforms.applyBatch(...)`: 3152.11 ms
  - replay-ish `editor.apply(insert_node)` loop inside `Editor.withoutNormalizing(...)`: 3709.12 ms
So the batch engine now solves the original repeated exact-path `set_node` cliff and provides the right execution model for broader batching, but further op-family optimizations still need to be justified by benchmarks.
Why I’m rewriting this issue
The original issue body focused on a `set_node` microbenchmark and possible exact-path bulk-update helpers.
At this point the underlying conclusion is clearer:
- the root problem is the per-op apply engine on repeated tree updates
- the durable solution is a batch-aware engine, not a public one-off `setNodesBatch` API
- exact-path `set_node` is the first optimized family, not the entire end state
Tracking