Add_node_items doesn't update caption reference #2298

@bnemeth

Bug

DoclingDocument.add_node_items doesn't update caption references when it copies items into another document.

Steps to reproduce

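Convert a document:

from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # document per local path or URL
converter = DocumentConverter()
result = converter.convert(source)
document = result.document  # the converted DoclingDocument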

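Create a new document containing only the first table:

from docling_core.types.doc import DoclingDocument

new_document = DoclingDocument(name="Table0")

# Copy only the first table from the converted document into the new one
new_document.add_node_items(doc=document,
                            parent=new_document.body,
                            node_items=[document.tables[0]])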

The table's caption reference remains '#/texts/72' instead of '#/texts/0', even though its child reference was updated to '#/texts/0':

  "tables": [
    {
      "self_ref": "#/tables/0",
      "parent": {
        "cref": "#/body"
      },
      "children": [
        {
          "cref": "#/texts/0"
        }
      ],
      "content_layer": "body",
      "label": "table",
      "prov": [
        {
          "page_no": 5,
          "bbox": {
            "l": 133.27708435058594,
            "t": 634.9401245117188,
            "r": 478.2610168457031,
            "b": 542.4001007080078,
            "coord_origin": "BOTTOMLEFT"
          },
          "charspan": [
            0,
            0
          ]
        }
      ],
      "captions": [
        {
          "cref": "#/texts/72"
        }
      ],
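
The stale reference can also be detected programmatically on the exported dictionary. Below is a minimal sketch, assuming the document was exported with export_to_dict() and follows the layout shown above. The helpers `resolve_cref` and `find_stale_caption_refs` are hypothetical names for illustration, not part of docling.

```python
from typing import Any, Optional


def resolve_cref(doc: dict[str, Any], cref: str) -> Optional[Any]:
    """Resolve a reference such as '#/texts/0' against an exported document dict.

    Returns None when the path does not exist in the document.
    """
    node: Any = doc
    for part in cref.removeprefix("#/").split("/"):
        if isinstance(node, list):
            if not part.isdigit() or int(part) >= len(node):
                return None
            node = node[int(part)]
        elif isinstance(node, dict) and part in node:
            node = node[part]
        else:
            return None
    return node


def find_stale_caption_refs(doc: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (item self_ref, caption cref) pairs whose caption cref does not
    resolve to an item carrying that same self_ref."""
    stale = []
    for key in ("tables", "pictures"):
        for item in doc.get(key, []):
            for cap in item.get("captions", []):
                target = resolve_cref(doc, cap["cref"])
                if not isinstance(target, dict) or target.get("self_ref") != cap["cref"]:
                    stale.append((item["self_ref"], cap["cref"]))
    return stale


# Reduced version of the output above: one text item, one table whose
# caption still points at the index from the source document.
example = {
    "texts": [{"self_ref": "#/texts/0"}],
    "tables": [
        {
            "self_ref": "#/tables/0",
            "children": [{"cref": "#/texts/0"}],
            "captions": [{"cref": "#/texts/72"}],
        }
    ],
}

print(find_stale_caption_refs(example))
```

On the reduced example this reports the table's '#/texts/72' caption as unresolved. Running the same check on new_document.export_to_dict() should flag the table from the steps above.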

Docling version

2.53.0

Python version

3.12
