Bug
DoclingDocument.add_node_items does not update the caption reference of a table when it copies node items into a new document.
Steps to reproduce
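Convert a document:
from docling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869"  # local path or URL of the document
converter = DocumentConverter()
result = converter.convert(source)
document = result.document  # the converted DoclingDocument used below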
Create a new document containing only the first table:
from docling_core.types.doc import DoclingDocument
new_document = DoclingDocument(name="Table0")
new_document.add_node_items(
    doc=document,
    parent=new_document.body,
    node_items=[document.tables[0]],
)
In the new document, the caption's text reference remains '#/texts/72' instead of being updated to '#/texts/0':
"tables": [
  {
    "self_ref": "#/tables/0",
    "parent": {
      "cref": "#/body"
    },
    "children": [
      {
        "cref": "#/texts/0"
      }
    ],
    "content_layer": "body",
    "label": "table",
    "prov": [
      {
        "page_no": 5,
        "bbox": {
          "l": 133.27708435058594,
          "t": 634.9401245117188,
          "r": 478.2610168457031,
          "b": 542.4001007080078,
          "coord_origin": "BOTTOMLEFT"
        },
        "charspan": [
          0,
          0
        ]
      }
    ],
    "captions": [
      {
        "cref": "#/texts/72"
      }
    ],
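To make the stale reference easy to spot, here is a minimal sketch that scans a document in dictionary form for caption references that no longer resolve. It is plain Python and does not call Docling itself. The helper names `resolve_cref` and `dangling_caption_refs` and the `sample` dictionary are hypothetical, written by hand to mirror the excerpt above. It assumes the new document has been exported to a dict with the same keys as the JSON shown here.

```python
from typing import Any


def resolve_cref(doc: dict[str, Any], cref: str) -> bool:
    """Return True if a '#/texts/0'-style reference resolves inside doc."""
    node: Any = doc
    for part in cref.removeprefix("#/").split("/"):
        if isinstance(node, list):
            # List segments must be valid, in-range indices.
            if not part.isdigit() or int(part) >= len(node):
                return False
            node = node[int(part)]
        elif isinstance(node, dict) and part in node:
            node = node[part]
        else:
            return False
    return True


def dangling_caption_refs(doc: dict[str, Any]) -> list[tuple[str, str]]:
    """List (table self_ref, caption cref) pairs whose caption target is missing."""
    dangling = []
    for table in doc.get("tables", []):
        for caption in table.get("captions", []):
            if not resolve_cref(doc, caption["cref"]):
                dangling.append((table["self_ref"], caption["cref"]))
    return dangling


# Hand-written stand-in for the new document: one copied text, one table
# whose caption still points at the index from the source document.
sample = {
    "body": {"self_ref": "#/body"},
    "texts": [{"self_ref": "#/texts/0"}],
    "tables": [
        {
            "self_ref": "#/tables/0",
            "children": [{"cref": "#/texts/0"}],
            "captions": [{"cref": "#/texts/72"}],
        }
    ],
}

print(dangling_caption_refs(sample))  # [('#/tables/0', '#/texts/72')]
```

This check only catches references that point outside the new document. A caption index that happens to exist but refers to the wrong text item would need a comparison against the source document instead.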
Docling version
2.53.0
Python version
3.12