Challenges processing textract

Thank you for providing this great tool! I'm writing with two questions about transforming Textract output into PAGE XML.

First, when I use the version of the ocr-fileformat application hosted at digi.bib.uni-mannheim.de and try to transform Textract JSON to PAGE, the resulting XML file is empty.

Here's a [sample textract input](https://github.com/user-attachments/files/16741576/T172-09-0003.json) and [resulting page output](https://github.com/user-attachments/files/16741582/T172-09-0003.json.page.xml.zip).

No errors are reported; the output is just an empty file.

For context, the AWS CLI command I used to produce this input was (with Bucket, Name, and Region obfuscated here):

```
$ aws textract start-document-analysis --document '{"S3Object":{"Bucket":"my-bucket","Name":"my-path/T172-09-0003.tif"}}' --feature-types '["LAYOUT"]' --region my-region
```
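For anyone trying to reproduce this, a quick sanity check of the JSON may help narrow things down. The sketch below is a minimal example, assuming the file is either a single Textract response or a list of paginated responses, each with a top-level `Blocks` array. The function name `count_block_types` and the sample data are made up for illustration and are not part of ocr-fileformat or textract2page.

```python
import json
from collections import Counter


def count_block_types(response):
    """Count Textract blocks by BlockType.

    Accepts one response dict or a list of response dicts
    (assumption: paginated results were saved as a list).
    """
    pages = response if isinstance(response, list) else [response]
    counts = Counter()
    for page in pages:
        # A response without "Blocks" contributes nothing to the counts.
        for block in page.get("Blocks", []):
            counts[block.get("BlockType", "UNKNOWN")] += 1
    return counts


def load_and_count(path):
    """Load a JSON file from disk and count its block types."""
    with open(path, encoding="utf-8") as f:
        return count_block_types(json.load(f))


# Made-up sample standing in for a real Textract response.
sample = {
    "JobStatus": "SUCCEEDED",
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE"},
        {"BlockType": "WORD"},
        {"BlockType": "WORD"},
    ],
}
print(count_block_types(sample))
```

Running `load_and_count("T172-09-0003.json")` on the attached file would show whether `PAGE`, `LINE`, and `WORD` blocks are present at all. If the counts come back empty, the problem may lie in how the JSON was saved rather than in the conversion step.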

Does anyone have tips for ways to ensure ocr-fileformat can process textract output? (If not, I could ask at the upstream project https://github.com/slub/textract2page/ - but I figured I'd start here.)

Second, when I try to process the output locally via Docker, I don't see the textract option. From https://digi.bib.uni-mannheim.de/ocr-fileformat/:

> <img width="974" alt="Screenshot 2024-08-25 at 14 52 05" src="https://github.com/user-attachments/assets/15e4212c-9c3b-460e-83f9-bb6b6d38df2f">

From the version running locally in Docker:

> <img width="994" alt="Screenshot 2024-08-25 at 14 52 31" src="https://github.com/user-attachments/assets/f28f22e5-8ddb-42a2-8a6e-5ba2bc221491">

Does anyone have suggestions for getting the textract input option to appear when running locally via Docker?

Thank you!
