Implement Text Extraction in PyMuPdf Fitz Layout Mode #86

@MarcoWel

Thank you for this excellent MuPDF wrapper!

One feature that MuPDF does not implement natively is layout-preserving plain-text extraction.

This is how the PyMuPDF fitz module does it:
https://github.com/pymupdf/PyMuPDF/blob/main/fitz/__main__.py#L577

When layout preservation is a must, the only options today are invoking pdftotext from the Go app or, even nastier, calling the PyMuPDF fitz Python module from Go.
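
For reference, the pdftotext workaround looks roughly like the sketch below. It assumes the Poppler `pdftotext` binary is installed and on `PATH`; `layoutArgs` and `extractLayoutText` are hypothetical helper names, not part of go-fitz.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// layoutArgs builds the pdftotext arguments: -layout keeps the physical
// layout, -enc sets the output encoding, and "-" sends text to stdout.
func layoutArgs(pdfPath string) []string {
	return []string{"-layout", "-enc", "UTF-8", pdfPath, "-"}
}

// extractLayoutText shells out to pdftotext and returns its stdout.
func extractLayoutText(pdfPath string) (string, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command("pdftotext", layoutArgs(pdfPath)...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("pdftotext failed: %w: %s", err, stderr.String())
	}
	return stdout.String(), nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: layouttext <file.pdf>")
		return
	}
	text, err := extractLayoutText(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(text)
}
```

This works, but it adds an external runtime dependency and a process spawn per document, which is what makes a native option in go-fitz attractive.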

How hard would it be to add this to go-fitz as well?
