From OCR-D/ocrd_fileformat#46
@kba:
It would be very useful to have a transformation that extracts any tables from PAGE-XML to CSV.
@bertsky:
Thoughts:
- each TableRegion needs its own CSV, so it's not immediately clear how this fits the page→page converter paradigm, where one input page yields one output file
  (e.g. for page→text, one could simply paste the CSV into the middle of the plaintext, but writing a separate output file per table is probably the better default)
- CSV may already be too coarse (no cells spanning multiple rows or columns, no distinction between header and body cells)
- perhaps better to transfer this to the ocr-fileformat subrepo?
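
To make the first two points concrete, here is a rough sketch of what a one-CSV-per-TableRegion extraction could look like. It is not an existing ocrd_fileformat or ocr-fileformat feature. It assumes that table cells are TextRegion elements directly inside a TableRegion, that each cell carries a `Roles/TableCellRole` element with `rowIndex` and `columnIndex` as in the PAGE 2019 schema, and that the region-level `TextEquiv/Unicode` is filled in. The function name `tables_to_csv` and the helpers are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def children(elem, name):
    """Direct children of elem with the given local name."""
    return [c for c in elem if local(c.tag) == name]


def cell_text(region):
    """Region-level TextEquiv/Unicode text, or '' if absent (assumption:
    the text has been propagated up to the region level)."""
    for equiv in children(region, "TextEquiv"):
        for uni in children(equiv, "Unicode"):
            return uni.text or ""
    return ""


def tables_to_csv(page_xml):
    """Hypothetical helper: return {TableRegion id: CSV string}."""
    root = ET.fromstring(page_xml)
    result = {}
    for table in root.iter():
        if local(table.tag) != "TableRegion":
            continue
        cells = {}
        for region in children(table, "TextRegion"):
            for roles in children(region, "Roles"):
                for role in children(roles, "TableCellRole"):
                    row = int(role.get("rowIndex"))
                    col = int(role.get("columnIndex"))
                    # rowSpan, colSpan and header are ignored here,
                    # which is exactly the information CSV cannot hold.
                    cells[(row, col)] = cell_text(region)
        n_rows = max((r for r, _ in cells), default=-1) + 1
        n_cols = max((c for _, c in cells), default=-1) + 1
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        for r in range(n_rows):
            writer.writerow([cells.get((r, c), "") for c in range(n_cols)])
        result[table.get("id")] = buf.getvalue()
    return result
```

Each returned string could then be written to its own file (say, one named after the table region id), which is where the mismatch with a one-file-per-page converter shows up. Spanning cells simply leave empty fields in the grid and header cells look like any other row, so the output is lossy in the way described above.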