Thank you for providing this great tool! I'm writing with two questions about transforming Textract output into PAGE XML.
First, when I use the hosted version of the ocr-fileformat application at digi.bib.uni-mannheim.de and try to transform Textract JSON to PAGE, the resulting XML file is empty.
Here's a sample textract input and resulting page output.
No errors - just an empty file.
For context, the AWS CLI command I used to produce this input was (with Bucket, Name, and Region obfuscated here):
$ aws textract start-document-analysis --document '{"S3Object":{"Bucket":"my-bucket","Name":"my-path/T172-09-0003.tif"}}' --feature-types '["LAYOUT"]' --region my-region
Does anyone have tips for ways to ensure ocr-fileformat can process textract output? (If not, I could ask at the upstream project https://github.com/slub/textract2page/ - but I figured I'd start here.)
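In case it helps narrow this down, here is a minimal sketch for checking what the input JSON actually contains before conversion. It assumes the file is a single Textract response with a top-level "Blocks" array whose entries carry a "BlockType" field; the function names `summarize_blocks` and `summarize_file` are made up for illustration and are not part of ocr-fileformat or textract2page.

```python
import json
from collections import Counter


def summarize_blocks(response: dict) -> Counter:
    """Count Textract blocks by their BlockType (e.g. PAGE, LINE, WORD).

    Assumes `response` is a parsed Textract response with a top-level
    "Blocks" list; a missing list is treated as empty.
    """
    blocks = response.get("Blocks", [])
    return Counter(block.get("BlockType", "UNKNOWN") for block in blocks)


def summarize_file(path: str) -> Counter:
    """Load a Textract JSON file from `path` and summarize its blocks."""
    with open(path, encoding="utf-8") as handle:
        return summarize_blocks(json.load(handle))
```

Calling `summarize_file` on the sample input and getting an empty result would suggest the file does not hold the analysis blocks at all, which would point at the input rather than the converter.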
Second, when I try to process the output locally via Docker, I don't see the textract option. From https://digi.bib.uni-mannheim.de/ocr-fileformat/:
From the version running locally in Docker:
Does anyone have suggestions for getting the textract input option to appear when running locally via Docker?
Thank you!