Thank you for this excellent MuPDF wrapper!
One feature that MuPDF does not implement natively is layout-preserving plain text extraction.
Layout mode as standard: https://www.mankier.com/1/pdftotext
Layout mode by default: python -m fitz gettext input.pdf (https://pymupdf.readthedocs.io/en/latest/module.html#text-extraction)
This is how the PyMuPDF fitz module does it:
https://github.com/pymupdf/PyMuPDF/blob/main/fitz/__main__.py#L577
When layout preservation is a must, there is currently no other way than invoking pdftotext from the Go app or, even nastier, calling the fitz Python module from Go.
How hard would it be to add this to go-fitz as well?
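
For reference, the pdftotext workaround mentioned above looks roughly like this. It is only a minimal sketch: it assumes a pdftotext binary is on the PATH, and `pdftotextArgs` and `extractLayoutText` are hypothetical helper names, not part of go-fitz.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// pdftotextArgs builds the arguments for a layout-preserving run.
// "-layout" asks pdftotext to keep the physical layout as best it can;
// the trailing "-" sends the extracted text to stdout instead of a file.
func pdftotextArgs(pdfPath string) []string {
	return []string{"-layout", pdfPath, "-"}
}

// extractLayoutText is a hypothetical helper (not a go-fitz API) that
// shells out to pdftotext and returns the text it prints.
func extractLayoutText(pdfPath string) (string, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command("pdftotext", pdftotextArgs(pdfPath)...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("pdftotext failed: %w: %s", err, stderr.String())
	}
	return stdout.String(), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: layouttext input.pdf")
		return
	}
	text, err := extractLayoutText(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Print(text)
}
```

This works, but it adds an external runtime dependency and spawns a process per document, which is why a native option in go-fitz would be preferable.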