Bug
When exporting tables to markdown, tabulate is called without disable_numparse=True on the first attempt (markdown.py#L553, document.py#L2268). The disable_numparse=True fallback is only triggered when a ValueError is raised.
By default, tabulate auto-detects numeric strings, parses them as Python float, and reformats them — silently losing precision. This is particularly problematic for financial documents where exact decimal values matter.
Root cause: In docling_core/transforms/serializer/markdown.py (MarkdownTableSerializer.serialize), line 553:
    try:
        table_text = tabulate(rows[1:], headers=rows[0], tablefmt="github")
    except ValueError:
        table_text = tabulate(
            rows[1:],
            headers=rows[0],
            tablefmt="github",
            disable_numparse=True,
        )
The same pattern exists in the legacy path at docling_core/types/doc/document.py line 2268.
The problem is that tabulate successfully parses the numeric strings (no ValueError), but reformats them with reduced precision. The fix is straightforward: always pass disable_numparse=True to tabulate in both call sites, as a document converter should never silently alter source data. Optionally, expose a disable_numparse parameter on MarkdownParams (default True) for users who explicitly want tabulate's numeric alignment.
Steps to reproduce
- Create or use a DOCX file containing a table with precise numeric values (e.g. 225.8183, 20896.7184)
- Convert the document using docling:
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert("document.docx")
    md = result.document.export_to_markdown()
    print(md)
- Observe that numeric values in tables are silently rounded:
| Original cell text | Markdown output | Precision lost |
|---|---|---|
| 225.8183 | 225.818 | last digit dropped |
| 24797.34 | 24797.3 | last digit dropped |
| 20896.7184 | 20896.7 | 3 digits dropped |
| 17358.138 | 17358.1 | 2 digits dropped |
All values are reduced to 6 significant figures, which matches tabulate's default float formatting (the "g" format).
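The numbers in the table can be reproduced without docling or tabulate. The sketch below only approximates the per-value round trip (parse the string as a float, then format it with "g"); tabulate_like_format is a hypothetical name, and tabulate itself decides types per column and applies alignment on top of this.

```python
def tabulate_like_format(cell: str) -> str:
    """Mimic the float round trip: parse the cell, then format with "g".

    Hypothetical helper for illustration only; it is not tabulate's code.
    """
    try:
        return format(float(cell), "g")  # "g" keeps 6 significant figures by default
    except ValueError:
        return cell  # non-numeric text is returned unchanged


cells = ["225.8183", "24797.34", "20896.7184", "17358.138", "n/a"]
rendered = [tabulate_like_format(c) for c in cells]

for before, after in zip(cells, rendered):
    print(f"{before!r} -> {after!r}")
```

No exception is raised for any of the numeric cells, which is why the ValueError fallback in the serializer never runs for them.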
Docling version
docling 2.81.0, docling-core 2.70.2
Python version
Python 3.14