Add formula token cache to reduce repeated parsing overhead #4829

Merged
oleibman merged 9 commits into PHPOffice:master from kemo:perf/formula-ast-parsing on Apr 21, 2026

kemo (Contributor) commented Mar 10, 2026

Summary

  • Adds a static bounded formula token cache (max 1000 entries) to the Calculation engine
  • Before parsing a formula, parseFormula() checks the cache — on hit, returns cached tokens immediately, skipping all regex operations
  • When the cache reaches capacity, it is fully cleared (simple eviction)
  • Provides clearFormulaTokenCache() and getFormulaTokenCacheSize() static methods for testing and memory management
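
The caching behaviour described in the summary can be sketched roughly as follows. This is not the code from the PR: the class name `FormulaTokenCacheSketch`, the `$tokenizeCalls` counter, and the regex tokenizer are hypothetical stand-ins, and storing the new entry right after a full clear is an assumption about how the eviction plays out.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch of a static, bounded token cache; not PhpSpreadsheet code.
final class FormulaTokenCacheSketch
{
    private const MAX_SIZE = 1000; // capacity quoted in the summary

    /** @var array<string, array<int, string>> tokens keyed by formula text */
    private static array $cache = [];

    /** Counts real tokenizer runs so that cache hits are observable. */
    public static int $tokenizeCalls = 0;

    /** @return array<int, string> */
    public static function parse(string $formula): array
    {
        // Cache hit: return the stored tokens without running any regex.
        if (isset(self::$cache[$formula])) {
            return self::$cache[$formula];
        }

        $tokens = self::tokenize($formula);

        // Simple eviction: drop everything once capacity is reached.
        if (count(self::$cache) >= self::MAX_SIZE) {
            self::$cache = [];
        }
        self::$cache[$formula] = $tokens;

        return $tokens;
    }

    public static function clear(): void
    {
        self::$cache = [];
    }

    public static function size(): int
    {
        return count(self::$cache);
    }

    /**
     * Stand-in tokenizer: identifiers, then numbers, then single symbols.
     *
     * @return array<int, string>
     */
    private static function tokenize(string $formula): array
    {
        ++self::$tokenizeCalls;
        preg_match_all('/[A-Za-z_][A-Za-z0-9_.]*|\d+(?:\.\d+)?|\S/', $formula, $matches);

        return $matches[0];
    }
}

// The second call is served from the cache, so the tokenizer runs only once.
$first = FormulaTokenCacheSketch::parse('=SUM(A1:B2)');
$second = FormulaTokenCacheSketch::parse('=SUM(A1:B2)');
```

Clearing the whole array on overflow keeps the hot path to a single `isset()` lookup, at the cost of re-parsing everything after an eviction.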

Changes

  • src/PhpSpreadsheet/Calculation/Calculation.php: Static $formulaTokenCache array, cache lookup in parseFormula(), clear/size methods
  • tests/PhpSpreadsheetTests/Calculation/FormulaTokenCacheTest.php: 8 tests covering correctness, cache reuse, clearing, complex formulas, eviction
  • tests/PhpSpreadsheetTests/Benchmark/FormulaTokenCacheBenchmark.php: Benchmark comparing cache hit vs miss for 1K formula cells and 10K parseFormula calls

Test plan

  • Verify formula parsing produces identical results with caching
  • Verify identical formulas reuse cached tokens
  • Verify the cache can be cleared and rebuilt
  • Verify cache eviction at 1000 entries
  • Run benchmark: vendor/bin/phpunit --group benchmark --filter FormulaTokenCacheBenchmark --stderr

kemo (Contributor, Author) commented Mar 11, 2026

Benchmark Results (formula-ast-parsing)

Targeted benchmark: parse repeated formula tokens (500 distinct patterns).

| Benchmark          | Master  | Branch | Speedup |
|--------------------|---------|--------|---------|
| 10k formulas cold  | 263ms   | 17ms   | 15x     |
| 10k formulas warm  | 261ms   | 1.4ms  | 186x    |
| 50k formulas cold  | 1,366ms | 21ms   | 65x     |
| 50k formulas warm  | 1,345ms | 7ms    | 192x    |
| 100k formulas cold | 2,625ms | 28ms   | 94x     |
| 100k formulas warm | 2,726ms | 14ms   | 195x    |
| Calculate 1k cells | 168ms   | 165ms  | ~same   |
| Calculate 5k cells | 795ms   | 805ms  | ~same   |

Parsing is 15x to 195x faster on the branch. Calculation time is unchanged: the win is entirely in token parsing, not evaluation. Memory use is neutral.
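
One way to read the cold rows: assuming "cold" means the cache starts empty and "warm" means it was already filled by an earlier pass (the comment does not define either term), a run of 10k formulas over 500 distinct patterns only needs 500 real parses even when cold. The sketch below just counts hits and misses under that assumption. `countHitsAndMisses` and the formula strings are made up for illustration, and nothing here measures time.

```php
<?php

declare(strict_types=1);

/**
 * Counts cache hits and misses for a stream of repeating formulas.
 *
 * @param array<string, bool> $cache shared between runs, passed by reference
 * @return array{hits: int, misses: int}
 */
function countHitsAndMisses(int $totalFormulas, int $distinctPatterns, array &$cache): array
{
    $hits = 0;
    $misses = 0;

    for ($i = 0; $i < $totalFormulas; ++$i) {
        // Cycle through a fixed set of distinct formula strings.
        $formula = '=A' . ($i % $distinctPatterns) . '*2';

        if (isset($cache[$formula])) {
            ++$hits;
        } else {
            ++$misses;
            $cache[$formula] = true; // a real cache would store the tokens here
        }
    }

    return ['hits' => $hits, 'misses' => $misses];
}

$cache = [];
$cold = countHitsAndMisses(10000, 500, $cache); // starts with an empty cache
$warm = countHitsAndMisses(10000, 500, $cache); // reuses the filled cache

printf("cold: %d hits, %d misses\n", $cold['hits'], $cold['misses']); // cold: 9500 hits, 500 misses
printf("warm: %d hits, %d misses\n", $warm['hits'], $warm['misses']); // warm: 10000 hits, 0 misses
```

With 500 patterns the 1000-entry bound is never reached, so no eviction happens in either pass.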

oleibman (Collaborator) commented

I strive to avoid the introduction of new static properties. Is it possible to make $formulaTokenCacheMaxSize and $formulaTokenCache instance variables rather than static? And add a routine to set the value of the former? And, arguably, make it default to zero, making its use opt-in (the results are impressive, but I'm nervous)?

kemo (Contributor, Author) commented Mar 24, 2026

Makes sense — I'll refactor both to instance properties on the FormulaParser and default the cache max size to 0 so it's opt-in. Users who want the parsing speedup can explicitly enable it.
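
A minimal sketch of what the requested shape could look like: a per-instance cache that is disabled while the max size is 0. The class `OptInTokenCacheSketch`, the setter name `setMaxSize()`, and the tokenizer are assumptions for illustration and are not taken from the merged code.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch of an opt-in, per-instance token cache; names are illustrative.
final class OptInTokenCacheSketch
{
    /** @var array<string, array<int, string>> tokens keyed by formula text */
    private array $cache = [];

    /** 0 disables caching, so default behaviour is unchanged. */
    private int $maxSize = 0;

    /** Counts real tokenizer runs so that cache hits are observable. */
    public int $tokenizeCalls = 0;

    public function setMaxSize(int $maxSize): void
    {
        $this->maxSize = max(0, $maxSize);
        // Drop entries if the cache no longer fits (including when disabled).
        if (count($this->cache) > $this->maxSize) {
            $this->cache = [];
        }
    }

    /** @return array<int, string> */
    public function parse(string $formula): array
    {
        if ($this->maxSize === 0) {
            return $this->tokenize($formula); // caching is off
        }
        if (isset($this->cache[$formula])) {
            return $this->cache[$formula];
        }

        $tokens = $this->tokenize($formula);
        if (count($this->cache) >= $this->maxSize) {
            $this->cache = []; // full clear on overflow
        }
        $this->cache[$formula] = $tokens;

        return $tokens;
    }

    public function size(): int
    {
        return count($this->cache);
    }

    /** @return array<int, string> */
    private function tokenize(string $formula): array
    {
        ++$this->tokenizeCalls;
        preg_match_all('/[A-Za-z_][A-Za-z0-9_.]*|\d+(?:\.\d+)?|\S/', $formula, $matches);

        return $matches[0];
    }
}

$parser = new OptInTokenCacheSketch();
$parser->parse('=A1+1');
$parser->parse('=A1+1');    // tokenized again: caching is off by default
$parser->setMaxSize(1000);  // explicit opt-in
$parser->parse('=A1+1');    // tokenized once more and stored
$parser->parse('=A1+1');    // served from this instance's cache
```

Because the array lives on the instance, two parser objects never share entries, which avoids the cross-test and cross-workbook leakage that static state can cause.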

oleibman enabled auto-merge April 21, 2026 04:08
oleibman added this pull request to the merge queue Apr 21, 2026
Merged via the queue into PHPOffice:master with commit d6a4e39 Apr 21, 2026
14 checks passed
oleibman (Collaborator) commented

Thank you for your contribution.
