When a large PDF (e.g., 500 pages) is provided, its content exceeds the LLM's context window. The LLM therefore cannot process the whole document in a single pass and fails to generate a complete tree structure representing the document hierarchy.
Current Behavior
Tree structure generation works for smaller documents.
For large PDFs, context length is exceeded before the entire document is processed.
The generated tree is incomplete and does not cover the full document.
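To make the failure point concrete, here is a rough sketch that estimates on which page the context budget runs out. The function name `first_overflow_page`, the 4-characters-per-token ratio, and the 128,000-token budget are illustrative assumptions, not values taken from this project. A real check would use the model's own tokenizer.

```python
import math
from typing import Optional, Sequence


def first_overflow_page(
    page_chars: Sequence[int],
    context_tokens: int,
    chars_per_token: float = 4.0,  # assumption: rough heuristic, not a real tokenizer
) -> Optional[int]:
    """Return the 1-based page at which the estimated token total first
    exceeds the context budget, or None if the whole document fits."""
    used = 0
    for page_number, chars in enumerate(page_chars, start=1):
        # Estimate this page's tokens from its character count.
        used += math.ceil(chars / chars_per_token)
        if used > context_tokens:
            return page_number
    return None


# Hypothetical 500-page PDF with about 3,000 characters per page.
overflow = first_overflow_page([3000] * 500, context_tokens=128_000)
print(overflow)
```

Under these assumed numbers, everything after the reported page would never reach the model, which matches the incomplete tree described above.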
Expected Behavior
The system should be able to generate a tree structure for the entire document, regardless of document size.
Limitation
Due to context length restrictions, the current approach requires chunking the document and using vector search/retrieval, which may lose global document structure and hierarchy information.
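One possible direction, sketched below under stated assumptions, is to process the document in consecutive page windows and merge the headings from each window into a single tree, so the hierarchy carries over across window boundaries. This is not the project's current implementation. `Node`, `page_windows`, `build_tree`, and the `extract_headings` callback are hypothetical names; the callback stands in for whatever step (for example, one LLM call per window) returns `(level, title, page)` tuples in reading order.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator

Heading = tuple[int, str, int]  # (level, title, page); level 1 is top-level


@dataclass
class Node:
    title: str
    level: int
    page: int
    children: list["Node"] = field(default_factory=list)


def page_windows(num_pages: int, window: int) -> Iterator[range]:
    """Yield consecutive 1-based page ranges of at most `window` pages."""
    for start in range(1, num_pages + 1, window):
        yield range(start, min(start + window, num_pages + 1))


def build_tree(
    num_pages: int,
    window: int,
    extract_headings: Callable[[range], Iterable[Heading]],  # hypothetical per-window extractor
) -> Node:
    """Merge per-window headings into one tree.

    The stack of open ancestors persists across windows, so a subsection
    found in a later window still attaches to a section opened earlier.
    """
    root = Node("ROOT", 0, 0)
    stack = [root]
    for pages in page_windows(num_pages, window):
        for level, title, page in extract_headings(pages):
            # Close sections at the same or a deeper level; never pop the root.
            while len(stack) > 1 and stack[-1].level >= level:
                stack.pop()
            node = Node(title, level, page)
            stack[-1].children.append(node)
            stack.append(node)
    return root
```

Only one window's text needs to fit in the context at a time, while the stack keeps the global position in the hierarchy. The sketch assumes headings arrive in reading order with consistent levels across windows. Keeping levels consistent between separate LLM calls is the hard part and is not addressed here.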