
tabulate numparse silently rounds numeric strings in table cells #583

@lulmer


Bug

When exporting tables to markdown, tabulate is called without disable_numparse=True on the first attempt (markdown.py#L553, document.py#L2268). The disable_numparse=True fallback is only triggered when a ValueError is raised.

By default, tabulate auto-detects numeric strings, parses them as Python floats, and reformats them, silently losing precision. This is particularly problematic for financial documents, where exact decimal values matter.

Root cause: In docling_core/transforms/serializer/markdown.py (MarkdownTableSerializer.serialize), line 553:

try:
    table_text = tabulate(rows[1:], headers=rows[0], tablefmt="github")
except ValueError:
    table_text = tabulate(
        rows[1:],
        headers=rows[0],
        tablefmt="github",
        disable_numparse=True,
    )

The same pattern exists in the legacy path at docling_core/types/doc/document.py line 2268.

The problem is that tabulate successfully parses the numeric strings (no ValueError), but reformats them with reduced precision. The fix is straightforward: always pass disable_numparse=True to tabulate in both call sites, as a document converter should never silently alter source data. Optionally, expose a disable_numparse parameter on MarkdownParams (default True) for users who explicitly want tabulate's numeric alignment.

Steps to reproduce

  1. Create or use a DOCX file containing a table with precise numeric values (e.g. 225.8183, 20896.7184)
  2. Convert the document using docling:
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("document.docx")
md = result.document.export_to_markdown()
print(md)
  3. Observe that numeric values in tables are silently rounded:

| Original cell text | Markdown output | Precision lost     |
|--------------------|-----------------|--------------------|
| 225.8183           | 225.818         | last digit dropped |
| 24797.34           | 24797.3         | last digit dropped |
| 20896.7184         | 20896.7         | 3 digits dropped   |
| 17358.138          | 17358.1         | 2 digits dropped   |

All values are rounded to 6 significant figures, which is tabulate's default float formatting behavior (floatfmt="g").
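The observed outputs can be checked with the standard library alone, assuming tabulate applies Python's general format `"g"` (6 significant digits by default) to cells it has parsed as floats. The snippet below does not call tabulate; it only mimics that formatting step on the values from this report.

```python
# Cell strings taken from the table in this report.
cells = ["225.8183", "24797.34", "20896.7184", "17358.138"]

# Parse each string as a float and format it with "g",
# which keeps at most 6 significant digits by default.
formatted = [format(float(text), "g") for text in cells]

for text, shown in zip(cells, formatted):
    print(f"{text} -> {shown}")
```

Each printed value matches the "Markdown output" column, which is consistent with the loss happening in the float formatting step rather than in parsing.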

Docling version

docling 2.81.0, docling-core 2.70.2

Python version

Python 3.14


Labels: bug
