Add optional XMLReader streaming mode for XLSX cell data parsing#4830
Add optional XMLReader streaming mode for XLSX cell data parsing#4830kemo wants to merge 12 commits intoPHPOffice:masterfrom
Conversation
Benchmark Results (xlsx-xmlreader-streaming)Targeted benchmark: read XLSX files of various sizes, including generated large files.
No improvement on these test files. Slight regression on the largest generated file (+5%). Memory usage unchanged. The XMLReader streaming approach adds overhead for files that fit comfortably in memory — the benefit would need to be demonstrated on files large enough to cause memory pressure with the DOM-based parser. |
|
Thank you for the summaries. For purposes of prioritization, I am going to put the PRs with less demonstrable improvement, including this one, in draft status for now. This does not mean that I will not return to them. |
Thank you for going through them, we're pretty heavy users of the library so it is great to give back. |
|
I have restructured your change. What you are trying to do is replace a large block of code with an even larger block of code. Nothing wrong with that per se, but now we have two large blocks of code. It seems like you would be better served if I made it easier to extend the Xlsx Reader class by putting the original block in a protected method, allowing your change to just extend the class with your new methods. Xlsx Reader is desparately in need of refactoring; I might be interested in advancing this change if only because it finally got me to do a small bit of refactoring. I admittedly have made the interface for the new method hideous. I will see if I can do better, but didn't want to delay getting this change out there. The results do not look promising. I am consistently seeing Simple perform in about 80% of the time of Streaming, and neither seems to report a memory delta greater than 0. For that reason, I have taken the new class that I made from your code out of the |
Summary
$useStreamingReaderflag (defaultfalse) to the XLSX readerXMLReaderinstead ofSimpleXMLfor the main cell data parsing loop, processing rows one at a time instead of loading the entire sheet XML into memoryChanges
src/PhpSpreadsheet/Reader/Xlsx.php: AddedsetUseStreamingReader()/getUseStreamingReader(),loadSheetDataWithXmlReader()method with helperstests/PhpSpreadsheetTests/Reader/Xlsx/StreamingReadTest.php: 10 tests (78 assertions) covering all data types, cell-by-cell comparison with SimpleXML, styles, inline stringstests/PhpSpreadsheetTests/Benchmark/XlsxStreamingReaderBenchmark.php: Benchmark with 2000x15 mixed-type XLSX comparing SimpleXML vs XMLReader timing and memoryTest plan
falsevendor/bin/phpunit --group benchmark --filter XlsxStreamingReaderBenchmark --stderr