Upgrade AI extraction with confidence signals and optional Python ML post-enrichment#84

Open
0xxy0 wants to merge 11 commits into vas3k:main from 0xxy0:main
0xxy0 commented Mar 31, 2026

Summary

This PR introduces a low-disruption AI/ML upgrade to TaxHacker’s invoice/receipt analysis flow by adding a lightweight post-processing stage and an optional Python-based enrichment hook.

The goal is to improve extraction quality and trustworthiness without changing the primary user workflow or adding significant runtime overhead.


What Changed

1. AI Enrichment Module

  • Added ai/enrichment.ts
  • Performs post-LLM normalization and quality checks:
    • Normalizes money values (total, convertedTotal, item totals)
    • Normalizes currency codes and flags unusual values
    • Normalizes issuedAt date format and warns on invalid/future dates
  • Produces:
    • confidence score (0..1)
    • warnings list
    • usedPythonEnricher flag
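
A minimal sketch of what the normalization and scoring described above could look like. The PR text does not show the contents of ai/enrichment.ts, so every name here (`ExtractedFields`, `normalizeMoney`, `enrichExtraction`) and the scoring rule (subtract 0.2 per warning, clamp at 0) are assumptions for illustration only. The money parser also assumes a dot as the decimal separator.

```typescript
// Hypothetical input shape; the real extraction output may differ.
interface ExtractedFields {
  total?: string | number | null;
  currencyCode?: string | null;
  issuedAt?: string | null;
}

interface EnrichmentResult {
  total: number | null;
  currencyCode: string | null;
  issuedAt: string | null; // normalized to YYYY-MM-DD
  confidence: number; // 0..1
  warnings: string[];
  usedPythonEnricher: boolean;
}

// Parse "1,234.50" or 1234.5 into a number rounded to two decimals.
function normalizeMoney(value: string | number | null | undefined): number | null {
  if (value === null || value === undefined) return null;
  const cleaned = typeof value === "number" ? String(value) : value.replace(/[^0-9.-]/g, "");
  if (cleaned === "") return null;
  const parsed = Number(cleaned);
  return Number.isFinite(parsed) ? Math.round(parsed * 100) / 100 : null;
}

function enrichExtraction(fields: ExtractedFields, now: Date = new Date()): EnrichmentResult {
  const warnings: string[] = [];

  const total = normalizeMoney(fields.total);
  if (fields.total != null && total === null) warnings.push("total is not a valid amount");

  let currencyCode: string | null = null;
  if (fields.currencyCode) {
    currencyCode = fields.currencyCode.trim().toUpperCase();
    // Flag anything that does not look like a three-letter code.
    if (!/^[A-Z]{3}$/.test(currencyCode)) warnings.push(`unusual currency code: ${currencyCode}`);
  }

  let issuedAt: string | null = null;
  if (fields.issuedAt) {
    const parsed = new Date(fields.issuedAt);
    if (Number.isNaN(parsed.getTime())) {
      warnings.push("issuedAt is not a valid date");
    } else {
      issuedAt = parsed.toISOString().slice(0, 10);
      if (parsed.getTime() > now.getTime()) warnings.push("issuedAt is in the future");
    }
  }

  // Assumed heuristic: each warning costs 0.2, never below 0.
  const confidence = Math.round(Math.max(0, 1 - 0.2 * warnings.length) * 100) / 100;
  return { total, currencyCode, issuedAt, confidence, warnings, usedPythonEnricher: false };
}
```

Passing `now` as a parameter keeps the future-date check deterministic in tests.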

2. Optional Python ML Sidecar Enrichment (Safe Fallback)

  • Added optional env-configured hook:
    • TAXHACKER_PYTHON_ENRICHER_CMD
    • TAXHACKER_PYTHON_ENRICHER_ARGS (JSON array)
    • TAXHACKER_PYTHON_ENRICHER_TIMEOUT_MS

Execution behavior:

  • Uses spawn with shell: false for safer execution
  • Bounded timeout and buffered I/O
  • Graceful fallback:
    • On missing config, timeout, error, or invalid JSON
    • System continues with standard extraction (no user-facing failure)
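
The execution behavior above can be sketched roughly as follows. This is not the PR's code: `runEnricher` is a hypothetical helper, and the stdin/stdout JSON exchange is an assumption about the payload contract. It only uses Node's `spawn` from `node:child_process` with `shell: false`, a timer that kills the child, and a single fallback value (`null`) for every failure mode.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: sends `payload` as JSON on stdin and resolves with the
// parsed stdout JSON, or null on spawn error, timeout, non-zero exit, or invalid JSON.
function runEnricher(
  cmd: string,
  args: string[],
  payload: unknown,
  timeoutMs: number,
): Promise<unknown | null> {
  return new Promise((resolve) => {
    let settled = false;
    let timer: ReturnType<typeof setTimeout> | undefined;

    function finish(value: unknown | null): void {
      if (settled) return;
      settled = true;
      if (timer !== undefined) clearTimeout(timer);
      resolve(value);
    }

    let child;
    try {
      // shell: false means args are passed as-is, with no shell interpolation.
      child = spawn(cmd, args, { shell: false });
    } catch {
      return finish(null); // e.g. invalid command string
    }

    timer = setTimeout(() => {
      child.kill();
      finish(null);
    }, timeoutMs);

    const chunks: Buffer[] = [];
    child.stdout.on("data", (chunk: Buffer) => chunks.push(chunk));
    child.stderr.resume(); // drain stderr so the child cannot block on it
    child.on("error", () => finish(null)); // e.g. command not found
    child.on("close", (code) => {
      if (code !== 0) return finish(null);
      try {
        finish(JSON.parse(Buffer.concat(chunks).toString("utf8")));
      } catch {
        finish(null); // stdout was not valid JSON
      }
    });

    child.stdin.on("error", () => {}); // ignore write errors if the child exits early
    child.stdin.end(JSON.stringify(payload));
  });
}
```

A caller would treat `null` as "skip enrichment" and continue with the standard extraction result, which matches the no-user-facing-failure behavior described above.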

3. Analysis Contract Extension

  • Updated AnalysisResult in ai/analyze.ts to include:
    • confidence
    • warnings
    • usedPythonEnricher
  • Existing output and tokensUsed behavior preserved
  • Cached parse results now store enriched output

4. UX Improvements (Unsorted Analyze Flow)

  • Updated components/unsorted/analyze-form.tsx to display:
    • AI confidence badge
    • Warning messages from enrichment
    • Indicator when Python enrichment is active
  • Added components/forms/warning.tsx:
    • Dedicated warning UI component

5. Documentation & Config Updates

  • Updated .env.example with optional Python enrichment variables
  • Updated README.md with:
    • Setup instructions
    • Payload contract for Python post-enrichment

Files Changed

  • ai/enrichment.ts (new)
  • ai/analyze.ts
  • components/unsorted/analyze-form.tsx
  • components/forms/warning.tsx (new)
  • .env.example
  • README.md

Why This Is a Good Fit

  • Complements the existing LLM pipeline (no replacement)
  • Adds meaningful trust and quality signals
  • Keeps runtime overhead controlled and bounded
  • Python integration is optional and non-breaking
  • No schema migration required

Validation

  • npm run build ✅ Successful
  • npm run lint ⚠️ Fails due to pre-existing repo-wide issues (unrelated to this PR)
  • Automated code review feedback addressed iteratively
  • codeql_checker ✅ No alerts found

Risk / Compatibility

  • Backward compatible: the new AnalysisResult fields are additive, and existing output and tokensUsed behavior is preserved
  • If Python enrichment is not configured:
    • Behavior remains unchanged aside from improved normalization and metadata
  • No breaking API changes for current UI flows

Operational Notes

To enable Python enrichment in self-hosted environments, configure:

  • TAXHACKER_PYTHON_ENRICHER_CMD
  • (Optional) TAXHACKER_PYTHON_ENRICHER_ARGS
  • (Optional) TAXHACKER_PYTHON_ENRICHER_TIMEOUT_MS

If unset, the enrichment sidecar is automatically skipped.
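
One way the three variables could be read is sketched below. The variable names come from this PR; the `readEnricherConfig` function, the 5000 ms default, and the choice to disable enrichment on a malformed ARGS value are assumptions, not the PR's actual behavior.

```typescript
interface EnricherConfig {
  cmd: string;
  args: string[];
  timeoutMs: number;
}

const DEFAULT_TIMEOUT_MS = 5000; // assumed default; the PR does not state one

// Returns null when enrichment should be skipped.
function readEnricherConfig(env: Record<string, string | undefined>): EnricherConfig | null {
  const cmd = env.TAXHACKER_PYTHON_ENRICHER_CMD?.trim();
  if (!cmd) return null; // unset or blank: sidecar is skipped

  let args: string[] = [];
  const rawArgs = env.TAXHACKER_PYTHON_ENRICHER_ARGS;
  if (rawArgs) {
    try {
      const parsed: unknown = JSON.parse(rawArgs);
      // ARGS must be a JSON array of strings, e.g. ["enrich.py", "--fast"].
      if (!Array.isArray(parsed) || !parsed.every((a) => typeof a === "string")) return null;
      args = parsed;
    } catch {
      return null; // malformed JSON: treat as not configured
    }
  }

  const rawTimeout = Number(env.TAXHACKER_PYTHON_ENRICHER_TIMEOUT_MS);
  const timeoutMs = Number.isFinite(rawTimeout) && rawTimeout > 0 ? rawTimeout : DEFAULT_TIMEOUT_MS;

  return { cmd, args, timeoutMs };
}
```

In a Node process this would be called as `readEnricherConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.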

Copilot AI and others added 11 commits March 31, 2026 14:48
Agent-Logs-Url: https://github.com/Subhamsinghania18/TaxHacker/sessions/835b4d78-bd7d-4c75-a250-5e7ebfa307d0

Co-authored-by: Subhamsinghania18 <147546813+Subhamsinghania18@users.noreply.github.com>
Upgrade AI extraction with confidence signals and optional Python ML post-enrichment

2 participants