Dependency Issue: Missing 'typer' module when using unstructured for Word document parsing
Bug Description
When attempting to parse Word documents using the default unstructured parser, the parsing fails with a ModuleNotFoundError: No module named 'typer'.
Stack Trace:
ed at step 'parse_document': Parsing failed with unstructured: Unstructured dependencies not available: No module named 'typer'
Traceback (most recent call last):
  File "/root/workspaces/xagent/src/xagent/providers/pdf_parser/basic.py", line 138, in extract_text_with_unstructured
    from unstructured.partition.docx import partition_docx
  File "/root/workspaces/xagent/.venv/lib/python3.12/site-packages/unstructured/partition/docx.py", line 48, in <module>
    from unstructured.partition.text_type import (
  File "/root/workspaces/xagent/.venv/lib/python3.12/site-packages/unstructured/partition/text_type.py", line 20, in <module>
    from unstructured.nlp.tokenize import pos_tag, sent_tokenize, word_tokenize
  File "/root/workspaces/xagent/.venv/lib/python3.12/site-packages/unstructured/nlp/tokenize.py", line 16, in <module>
    import spacy
  File "/root/workspaces/xagent/.venv/lib/python3.12/site-packages/spacy/__init__.py", line 18, in <module>
    from .cli.info import info  # noqa: F401
  File "/root/workspaces/xagent/.venv/lib/python3.12/site-packages/spacy/cli/__init__.py", line 4, in <module>
    from . import download as download_module  # noqa: F401
  File "/root/workspaces/xagent/.venv/lib/python3.12/site-packages/spacy/cli/download.py", line 8, in <module>
    import typer
ModuleNotFoundError: No module named 'typer'
Root Cause Analysis
The dependency chain is:
- xagent uses unstructured for document parsing
- unstructured.partition.docx imports unstructured.nlp.tokenize (via unstructured.partition.text_type), which imports spacy
- spacy imports its spacy.cli package at import time, and spacy.cli.download imports typer for the command-line interface
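To confirm which link in this chain is missing in a given environment, a small check like the following may help. It is only a sketch: the helper name missing_modules is hypothetical, and it only tests whether each top-level package can be located, not whether it imports cleanly.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be located in this environment.

    Hypothetical helper for illustration; it looks modules up without importing them.
    """
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Top-level packages taken from the dependency chain in this report
    chain = ["unstructured", "spacy", "typer"]
    print("Missing modules:", missing_modules(chain))
```

Running this inside the same virtual environment as xagent should show whether typer is the only package that cannot be found.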
The issue is:
- typer appears to be an optional dependency of spacy and is not installed by default
- unstructured does not declare typer as a required dependency, even though it imports modules that indirectly require it
- When the parser loads the module hierarchy, the import fails because typer is missing
Expected Behavior
unstructured should either:
- Declare typer as a required dependency in its pyproject.toml/setup.py, or
- Avoid importing modules that require optional dependencies from spacy at the top level, or
- Handle the import gracefully with a meaningful error message asking users to install additional dependencies.
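As an illustration of the third option, graceful handling could look roughly like the sketch below. This is not how unstructured or xagent currently behave. The names MissingDependencyError and import_or_explain are hypothetical, and the install hint is an assumption about what would fix the environment.

```python
import importlib


class MissingDependencyError(ImportError):
    """Hypothetical error type carrying an actionable message for the user."""


def import_or_explain(module_name, attr=None, install_hint=""):
    """Import `module_name` (optionally returning `attr`), or raise a clearer error.

    Hypothetical helper: it reports which module was actually missing,
    which may be a transitive dependency such as 'typer'.
    """
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # exc.name is the module that could not be found, not necessarily module_name
        message = f"Cannot import {module_name!r}: missing module {exc.name!r}."
        if install_hint:
            message = f"{message} {install_hint}"
        raise MissingDependencyError(message) from exc
    return getattr(module, attr) if attr else module
```

A caller such as extract_text_with_unstructured could then use, for example, import_or_explain("unstructured.partition.docx", "partition_docx", install_hint="Try: pip install typer"), so the user sees the missing package and a suggested fix instead of a bare traceback.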
Current Workaround:
Select the deepdoc parser in the GUI when uploading Word file(s), instead of the default (unstructured) parser.