Bug Description
Summary
TreeSelectLeafRetriever builds a Response with source_nodes=[] in its query path, which drops retrieval provenance and breaks source/citation visibility for consumers that rely on response.source_nodes.
Why This Is a Bug
The retriever traverses real leaf nodes, but the final response explicitly discards all sources. This creates an answer with no traceable origin, even when the underlying index has valid source nodes.
Affected Area
- Package: llama-index-core
- File: llama_index/core/indices/tree/select_leaf_retriever.py
- Relevant line behavior: _query() returns Response(response_str, source_nodes=[]) with an inline TODO: fix source nodes.
CLI isn’t installed in this environment:
Steps to Reproduce
- Set up a clean env at repo root: uv sync
- Run this minimal script:
from llama_index.core import Document, TreeIndex
from llama_index.core.indices.tree.select_leaf_retriever import TreeSelectLeafRetriever
from llama_index.core.schema import QueryBundle

docs = [
    Document(text="Paris is the capital of France."),
    Document(text="Berlin is the capital of Germany."),
]
index = TreeIndex.from_documents(docs)
retriever = TreeSelectLeafRetriever(index=index, child_branch_factor=1)

# Call the retriever's _query() directly to exercise the affected code path
resp = retriever._query(QueryBundle("What is the capital of France?"))
print("response:", str(resp))
print("source_nodes:", len(resp.source_nodes))
- Observe that source_nodes is empty.
Relevant Logs/Tracebacks
Expected Behavior
Response.source_nodes should include the selected leaf node(s) used to synthesize the final answer.
Actual Behavior
Response.source_nodes is always empty for this query path.
User Impact
- Citation features cannot show where answers came from.
- Evaluation/debug workflows lose retriever provenance.
- Downstream integrations expecting source nodes can misbehave or display incomplete output.
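To make the first impact concrete, here is a minimal sketch of how a downstream citation formatter might behave. FakeNode, FakeResponse, and format_citations are hypothetical stand-ins written for this illustration, not llama-index classes; only the source_nodes attribute name is taken from the report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeNode:
    """Hypothetical stand-in for a retrieved node (not a llama-index class)."""
    node_id: str
    text: str


@dataclass
class FakeResponse:
    """Hypothetical stand-in exposing a `source_nodes` attribute."""
    response: str
    source_nodes: List[FakeNode] = field(default_factory=list)


def format_citations(resp: FakeResponse) -> str:
    """Render one citation line per source node, or a fallback if there are none."""
    if not resp.source_nodes:
        return "(no sources available)"
    return "\n".join(
        f"[{i}] {node.node_id}: {node.text}"
        for i, node in enumerate(resp.source_nodes, start=1)
    )


# Empty list, as described under Actual Behavior: nothing can be cited.
empty = FakeResponse("Paris", source_nodes=[])

# Populated list, as described under Expected Behavior: the leaf is visible.
populated = FakeResponse(
    "Paris",
    source_nodes=[FakeNode("leaf-0", "Paris is the capital of France.")],
)

print(format_citations(empty))
print(format_citations(populated))
```

A consumer written this way does not fail loudly on an empty list. It silently shows the fallback, which is why the missing provenance can go unnoticed.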
Proposed Fix
- Track selected leaf nodes during traversal in _query_level() / _query_with_selected_node().
- Return Response(response_str, source_nodes=[...]) instead of an empty list.
- Add regression tests asserting non-empty source_nodes for successful tree leaf selection.
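The idea behind the steps above can be sketched on a toy tree. This is not the llama-index implementation: ToyNode, ToyResponse, select_leaves, query, and the keyword-based pick_france selector (standing in for the LLM-driven child selection) are all hypothetical names chosen for illustration. The sketch only shows the pattern of collecting leaves while descending and handing them to the response.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyNode:
    """Hypothetical tree node; a node without children is a leaf."""
    text: str
    children: List["ToyNode"] = field(default_factory=list)


@dataclass
class ToyResponse:
    """Hypothetical response carrying the leaves that produced the answer."""
    response: str
    source_nodes: List[ToyNode] = field(default_factory=list)


Selector = Callable[[List[ToyNode]], List[ToyNode]]


def select_leaves(node: ToyNode, choose: Selector, selected: List[ToyNode]) -> None:
    """Descend through the chosen children and record every leaf reached."""
    if not node.children:
        selected.append(node)
        return
    for child in choose(node.children):
        select_leaves(child, choose, selected)


def query(root: ToyNode, choose: Selector) -> ToyResponse:
    """Build the answer from the selected leaves and keep them as sources."""
    selected: List[ToyNode] = []
    select_leaves(root, choose, selected)
    answer = " ".join(leaf.text for leaf in selected)
    # Pass the tracked leaves along instead of an empty list.
    return ToyResponse(answer, source_nodes=selected)


def pick_france(children: List[ToyNode]) -> List[ToyNode]:
    """Assumed selector: keep at most one child whose text mentions France."""
    return [c for c in children if "France" in c.text][:1]


paris = ToyNode("Paris is the capital of France.")
berlin = ToyNode("Berlin is the capital of Germany.")
root = ToyNode("capitals", children=[paris, berlin])

resp = query(root, pick_france)

# Regression-style check: a successful leaf selection yields non-empty sources.
assert len(resp.source_nodes) > 0
```

Collecting leaves into a list passed down the recursion keeps the traversal signature small. In the real retriever the equivalent state would need to be threaded through the methods named above, and the node type wrapped in whatever Response.source_nodes expects.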
Validation Checklist
- [...] source_nodes before fix.
- [...] Response.source_nodes.
- [...] llama-index-core/tests.
- uv run make lint passes.
- uv run -- pytest passes for modified package(s).
Definition of Done