bug(dependency): Missing 'typer' module when using unstructured for document parsing #255

@tanbro

Dependency Issue: Missing 'typer' module when using unstructured for Word document parsing

Bug Description

When attempting to parse Word documents using the default unstructured parser, the parsing fails with a ModuleNotFoundError: No module named 'typer'.
Stack Trace:

ed at step 'parse_document': Parsing failed with unstructured: Unstructured dependencies not available: No module named 'typer'
Traceback (most recent call last):
  File "/root/workspaces/xagent/src/xagent/providers/pdf_parser/basic.py", line 138, in extract_text_with_unstructured
    from unstructured.partition.docx import partition_docx
  File "/root/workspaces/xagent/.venv/lib/python3.12/site-packages/unstructured/partition/docx.py", line 48, in <module>
    from unstructured.partition.text_type import (
  File "/root/workspaces/xagent/.venv/lib/python3.12/site-packages/unstructured/partition/text_type.py", line 20, in <module>
    from unstructured.nlp.tokenize import pos_tag, sent_tokenize, word_tokenize
  File "/root/workspaces/xagent/.venv/lib/python3.12/site-packages/unstructured/nlp/tokenize.py", line 16, in <module>
    import spacy
  File "/root/workspaces/xagent/.venv/lib/python3.12/site-packages/spacy/__init__.py", line 18, in <module>
    from .cli.info import info  # noqa: F401
  File "/root/workspaces/xagent/.venv/lib/python3.12/site-packages/spacy/cli/__init__.py", line 4, in <module>
    from . import download as download_module  # noqa: F401
  File "/root/workspaces/xagent/.venv/lib/python3.12/site-packages/spacy/cli/download.py", line 8, in <module>
    import typer
ModuleNotFoundError: No module named 'typer'

Root Cause Analysis

The dependency chain is:

  • xagent uses unstructured for document parsing
  • unstructured imports unstructured.nlp.tokenize which imports spacy
  • spacy imports spacy.cli.download which depends on typer for command-line interface
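To confirm which link of this chain is missing in a given environment, a small check like the following may help. It is only a sketch: the helper name `missing_modules` is hypothetical, and it uses `importlib.util.find_spec` on top-level names so that nothing is actually imported (importing spacy is what triggers the failure above).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located.

    find_spec() only looks the module up on the import path; it does not
    execute the module, so this is safe to run even when spacy itself
    would fail on import.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The chain described above: xagent -> unstructured -> spacy -> typer.
    chain = ["unstructured", "spacy", "typer"]
    missing = missing_modules(chain)
    for name in chain:
        status = "MISSING" if name in missing else "ok"
        print(f"{name}: {status}")
```

Run inside the same virtual environment as xagent. If only typer is reported as missing, the environment matches the situation described in this issue.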

The issue is:

  • typer is an optional dependency for spacy and is not installed by default.
  • However, unstructured does not declare typer as a required dependency, even though it imports modules that indirectly require it.

When the parser tries to load the module hierarchy, it fails because typer is missing.

Expected Behavior

unstructured should do one of the following:

  • Declare typer as a required dependency in its pyproject.toml/setup.py, or
  • Avoid importing modules that require optional dependencies from spacy at the top level, or
  • Handle the import gracefully with a meaningful error message asking users to install the additional dependencies.
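As an illustration of the third option, graceful handling could look roughly like the sketch below. This is not unstructured's or xagent's actual code: `OptionalDependencyError` and `import_optional` are hypothetical names, and the install hint is passed in by the caller rather than assumed.

```python
import importlib


class OptionalDependencyError(ImportError):
    """Raised when a feature needs a module that is not installed (hypothetical)."""


def import_optional(module_name, feature, install_hint):
    """Import `module_name` lazily, or fail with an actionable message.

    ModuleNotFoundError.name holds the module that was actually missing,
    which may be a transitive dependency (for example 'typer') rather than
    `module_name` itself.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise OptionalDependencyError(
            f"{feature} requires the module {exc.name!r}, "
            f"which is not installed. {install_hint}"
        ) from exc
```

A parser function could then call `import_optional("unstructured.partition.docx", "Word document parsing", "Install the missing package and retry.")` at the point of use instead of importing at module level, so the user sees which package to install rather than a bare traceback.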

Current Workaround:

When uploading Word files, choose the deepdoc parser in the GUI instead of the default (unstructured) parser.


Labels: bug