[Bug]: refresh_ref_docs() / arefresh_ref_docs() drop kwargs after the first document in a batch #21518

@gautamvarmadatla

Bug Description

insert_kwargs and update_kwargs passed to refresh_ref_docs() (and its async counterpart arefresh_ref_docs()) are silently dropped after the first matching document, because the method calls .pop() on the shared update_kwargs dict inside the document loop. The first inserted or updated document receives the expected kwargs; every subsequent document in the same batch receives {}, and no error is raised.
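The pattern described above can be reproduced without llama_index at all. The sketch below is a simplified stand-in, not the library's actual code: refresh_buggy and refresh_fixed are hypothetical names, and reading the kwargs once before the loop is only one possible fix (using .get() instead of .pop() inside the loop would also work).

```python
from typing import Any, Dict, List


def refresh_buggy(docs: List[str], **update_kwargs: Any) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: pops from the shared dict on every iteration."""
    received = []
    for _ in docs:
        # pop() removes the key, so only the first iteration sees the value;
        # later iterations fall back to the {} default.
        received.append(dict(update_kwargs.pop("insert_kwargs", {})))
    return received


def refresh_fixed(docs: List[str], **update_kwargs: Any) -> List[Dict[str, Any]]:
    """One possible fix (an assumption): read the kwargs once, before the loop."""
    insert_kwargs = update_kwargs.pop("insert_kwargs", {})
    received = []
    for _ in docs:
        # Every document gets its own copy of the same kwargs.
        received.append(dict(insert_kwargs))
    return received


docs = ["doc 0", "doc 1", "doc 2"]
buggy = refresh_buggy(docs, insert_kwargs={"my_flag": True})
fixed = refresh_fixed(docs, insert_kwargs={"my_flag": True})
print(buggy)  # [{'my_flag': True}, {}, {}]
print(fixed)  # [{'my_flag': True}, {'my_flag': True}, {'my_flag': True}]
```

The buggy variant matches the behaviour reported here: the first document sees the kwargs and the rest see an empty dict.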

Version

0.14.21

Steps to Reproduce

from typing import Any, List

from llama_index.core import Document, VectorStoreIndex
from llama_index.core.schema import BaseNode, TransformComponent


class RecordKwargs(TransformComponent):
    def __call__(self, nodes: List[BaseNode], **kwargs: Any) -> List[BaseNode]:
        print(f"transform received: {kwargs}")
        return nodes


docs = [Document(text=f"doc {i}") for i in range(3)]
index = VectorStoreIndex([], transformations=[RecordKwargs()])

print("refresh_ref_docs with insert_kwargs={'my_flag': True}:")
# Print the returned list so the script output matches the log below.
print(index.refresh_ref_docs(docs, insert_kwargs={"my_flag": True}))

Relevant Logs/Tracebacks

refresh_ref_docs with insert_kwargs={'my_flag': True}:
transform received: {'my_flag': True}
transform received: {}
transform received: {}
[True, True, True]

Labels: bug, triage
