
Invoice Processing Pipeline — How We Get to 100% Accuracy

By Patrick McCurley · Created Mar 20, 2026 · Updated 14 days ago

Courier invoices arrive in many formats — from clean spreadsheets to dense, multi-page PDFs with merged cells and multi-row records. The pipeline tries the cheapest approach first and only escalates when accuracy is measurably insufficient.

End-to-End Flow

Extract

Spreadsheets (CSV, Excel) are parsed natively. Wide CSVs (250+ columns like UPS) have empty columns stripped automatically before being sent to the LLM.
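A minimal sketch of the empty-column stripping step, assuming "empty" means that no data row has a non-blank value in that column. The function name `strip_empty_columns` is illustrative and not taken from the pipeline.

```python
import csv
import io


def strip_empty_columns(csv_text: str) -> list[list[str]]:
    """Drop every column whose data cells are all blank; the header row is kept."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    # A column survives if at least one data row has non-whitespace content in it.
    keep = [
        i for i in range(len(header))
        if any(i < len(row) and row[i].strip() for row in data)
    ]
    # Short rows are padded with "" so every output row has the same width.
    return [[row[i] if i < len(row) else "" for i in keep] for row in rows]
```

For example, `strip_empty_columns("A,B,C\n1,,x\n2, ,y\n")` keeps only columns `A` and `C`, because column `B` contains nothing but blanks.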

PDFs go through Docling (IBM open-source) in 10-20 page chunks. Docling's memory accumulates per document (~100-600MB) with no way to reclaim it, so the pipeline monitors RSS (resident set size) via a health endpoint and proactively restarts Docling when it exceeds 3GB. The check runs before every chunk and every page render.
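The chunk-and-check loop could be sketched as follows. This is an illustration under assumptions, not the pipeline's code: `convert_chunk`, `read_rss_bytes`, and `restart_docling` are hypothetical callables standing in for the Docling call, the health endpoint, and the restart mechanism, and 3GB is taken here as 3 × 1024³ bytes.

```python
from typing import Callable, Iterator

RSS_LIMIT_BYTES = 3 * 1024**3  # 3GB threshold from the text (binary GB assumed)


def page_chunks(total_pages: int, chunk_size: int = 10) -> Iterator[tuple[int, int]]:
    """Yield inclusive, 1-based (first_page, last_page) ranges."""
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def extract_pdf(
    total_pages: int,
    convert_chunk: Callable[[int, int], list],  # hypothetical: runs Docling on a page range
    read_rss_bytes: Callable[[], int],          # hypothetical: queries the health endpoint
    restart_docling: Callable[[], None],        # hypothetical: restarts the Docling process
    chunk_size: int = 10,
) -> list:
    """Convert a PDF chunk by chunk, restarting Docling when RSS exceeds the limit."""
    tables: list = []
    for first, last in page_chunks(total_pages, chunk_size):
        # Check memory before every chunk, as the text describes.
        if read_rss_bytes() > RSS_LIMIT_BYTES:
            restart_docling()
        tables.extend(convert_chunk(first, last))
    return tables
```

Passing the three operations in as callables keeps the loop testable without a running Docling instance.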

Both produce identical output: a list of tables with column headers and rows.
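One possible shape for that shared output, shown only as a sketch. The class name `ExtractedTable` and the `records` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractedTable:
    """A single table: column headers plus rows of cell strings."""
    headers: list[str]
    rows: list[list[str]] = field(default_factory=list)

    def records(self) -> list[dict[str, str]]:
        """Pair each row's cells with the headers (extra cells are ignored by zip)."""
        return [dict(zip(self.headers, row)) for row in self.rows]
```

Both the spreadsheet parser and the PDF path would then return a `list[ExtractedTable]`, so later tiers never need to know which extractor ran.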

Tier 1: Text Mapping

The LLM receives column headers, sample rows (8 for large tables, all rows for small tables ≤20 rows), and a column activity summary. It returns:

CSVs and PDFs get different prompts. Each courier has optional hints:

| Courier | Hint |
|---|---|
| UPS | Skip rollup rows (empty Charge Description). Use "Net Amount" only. |
| DPD | CSV: expand surcharge columns. PDF: base charges only in "Amount". |
| DHL | Use "Total amount (incl. VAT)"; don't map component columns. |
| Evri | No hint needed (clean format). |
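The inputs to the mapping prompt could be assembled roughly like this. Several details are assumptions, since the text does not specify them: the sample is taken from the first 8 rows, the "column activity summary" is modelled as a count of non-blank cells per column, header names are assumed unique, and all function names are illustrative.

```python
from typing import Optional

LARGE_TABLE_SAMPLE = 8     # sample size for large tables (from the text)
SMALL_TABLE_MAX_ROWS = 20  # tables at or below this size are sent whole

# Hints from the table above; Evri has none.
COURIER_HINTS: dict[str, str] = {
    "UPS": 'Skip rollup rows (empty Charge Description). Use "Net Amount" only.',
    "DPD": 'CSV: expand surcharge columns. PDF: base charges only in "Amount".',
    "DHL": 'Use "Total amount (incl. VAT)"; do not map component columns.',
}


def sample_rows(rows: list[list[str]]) -> list[list[str]]:
    """All rows for small tables, otherwise the first 8 (which 8 is an assumption)."""
    if len(rows) <= SMALL_TABLE_MAX_ROWS:
        return list(rows)
    return list(rows[:LARGE_TABLE_SAMPLE])


def column_activity(headers: list[str], rows: list[list[str]]) -> dict[str, int]:
    """Count non-blank cells per column (an assumed definition of 'activity')."""
    return {
        name: sum(1 for row in rows if i < len(row) and row[i].strip())
        for i, name in enumerate(headers)
    }


def build_mapping_input(courier: str, headers: list[str], rows: list[list[str]]) -> dict:
    """Bundle what the LLM receives: headers, sample rows, activity, optional hint."""
    hint: Optional[str] = COURIER_HINTS.get(courier)
    return {
        "headers": headers,
        "sample_rows": sample_rows(rows),
        "column_activity": column_activity(headers, rows),
        "hint": hint,
    }
```

Because hints are looked up with `.get`, a courier with no entry simply produces `hint=None`, which matches the idea that hints are optional.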

Self-Healing (85-115% threshold)

After applying rules, the pipeline checks: does the extracted total match the LLM-detected invoice total? If accuracy (the extracted total as a percentage of the invoice total) is outside 85-115%, it retries with feedback: which tables overcounted, which column it chose, and the per-table breakdown. It makes up to 3 attempts, each with more sample rows.

The 85-115% band is tight enough to catch real errors (the LLM mapping a wrong column at 77%) while loose enough to accept minor gaps from surcharges on summary pages (~1-5%).
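The retry loop can be sketched as below. This is an illustration rather than the pipeline's implementation: `map_and_extract` is a hypothetical callable that runs one mapping attempt and returns the extracted total, the sample size growing as a multiple of 8 is an assumed schedule, and the feedback is reduced to a single string where the real feedback carries per-table detail. The invoice total is assumed to be non-zero.

```python
from typing import Callable, Optional

LOWER, UPPER = 0.85, 1.15  # acceptance band from the text
MAX_ATTEMPTS = 3


def within_band(ratio: float) -> bool:
    """True when extracted / invoice total falls inside the 85-115% band."""
    return LOWER <= ratio <= UPPER


def self_heal(
    invoice_total: float,
    map_and_extract: Callable[[int, Optional[str]], float],  # hypothetical mapping call
    base_samples: int = 8,
) -> tuple[float, int]:
    """Retry the mapping up to 3 times; return (final ratio, attempts used)."""
    feedback: Optional[str] = None
    ratio = 0.0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        n_samples = base_samples * attempt  # more sample rows each attempt (assumed schedule)
        extracted_total = map_and_extract(n_samples, feedback)
        ratio = extracted_total / invoice_total
        if within_band(ratio):
            return ratio, attempt
        # Simplified feedback; the text describes a richer per-table breakdown.
        feedback = f"Previous mapping extracted {ratio:.0%} of the invoice total."
    return ratio, MAX_ATTEMPTS
```

With an invoice total of 100 and attempts that extract 77 and then 100, the loop rejects the first mapping (77% is outside the band) and accepts the second.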

Tier 2: Guided Text Extraction

Only triggers when two conditions are met:

  1. Text mapping accuracy is below 95%
  2. The extraction ratio is below 15% — fewer than 15% of input rows became line items

The extraction ratio gate prevents this tier from triggering on DPD invoices at 94% accuracy with thousands of items (format is understood, gap is from surcharges). It only activates for genuinely complex formats like DHL's multi-row shipment records (22 items from 600 rows ≈ 3.7%).
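The two-condition gate reduces to a small predicate. The sketch below uses the thresholds from the text; the function names are illustrative, and treating a table with zero input rows as a ratio of 0.0 is an assumption.

```python
ACCURACY_GATE = 0.95  # Tier 2 is considered only below 95% accuracy
RATIO_GATE = 0.15     # ...and only when under 15% of input rows became line items


def extraction_ratio(line_items: int, input_rows: int) -> float:
    """Share of input rows that became line items (0.0 if there were no rows)."""
    return line_items / input_rows if input_rows else 0.0


def should_use_guided_text(accuracy: float, line_items: int, input_rows: int) -> bool:
    """Both conditions must hold before escalating to guided text extraction."""
    return accuracy < ACCURACY_GATE and extraction_ratio(line_items, input_rows) < RATIO_GATE
```

With hypothetical figures, a flat-table invoice at 94% accuracy where 3,000 of 4,000 rows became line items does not escalate, while the DHL case of 22 items from 600 rows does once its accuracy is below 95%.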

Tier 3: Vision Fallback

Pages are rendered on demand as high-resolution PNG images (2 at a time, discarded after extraction) and read visually by AI models. This bypasses text extraction entirely. It is the last resort when the Docling text output is too damaged or the format too complex for even guided text extraction.
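Rendering two pages at a time and discarding them after extraction might look like the sketch below. `render_png` and `read_image` are hypothetical callables standing in for the page renderer and the vision model call.

```python
from typing import Callable, Iterator


def page_batches(pages: list[int], batch_size: int = 2) -> Iterator[list[int]]:
    """Split page numbers into consecutive batches of at most batch_size."""
    for start in range(0, len(pages), batch_size):
        yield pages[start:start + batch_size]


def vision_extract(
    pages: list[int],
    render_png: Callable[[int], bytes],       # hypothetical: renders one page to PNG bytes
    read_image: Callable[[bytes], list],      # hypothetical: vision model returns line items
) -> list:
    """Render pages two at a time, extract line items, then drop the images."""
    items: list = []
    for batch in page_batches(pages):
        images = [render_png(page) for page in batch]
        for image in images:
            items.extend(read_image(image))
        del images  # release the rendered PNGs before the next batch
    return items
```

Holding at most one batch of images in memory keeps the footprint roughly constant regardless of how many pages the invoice has.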

Tested Results

12 invoices from 4 couriers. UPS CSV was previously non-deterministic at 77%. This was fixed by tightening the self-healing threshold from 50-150% to 85-115%, which lets the pipeline catch and correct the wrong column mapping.

| Complexity | Accuracy | Cost | Speed | Method |
|---|---|---|---|---|
| CSVs (UPS, DPD) | 100% | ~$0.03 | 11-16s | Text mapping + courier hints + self-healing |
| Small PDFs (Evri, DHL 2pg) | 100% | ~$0.03-0.06 | 8-48s | Text mapping or vision |
| Medium PDFs (DPD 9-49pg) | 96-100% | ~$0.03 | 36-187s | Text mapping |
| Large PDFs (DPD 120pg) | 95.4% | ~$0.09 | 384s | Text mapping + chunked extraction |
| Complex PDFs (DHL 32pg) | 94-105% | ~$0.08-0.21 | 347-1278s | Guided text or vision |

Average accuracy: 98.3%. Average cost: ~$0.05/invoice.

Summary

| What matters | How it's achieved |
|---|---|
| Accuracy | Three self-correcting tiers. Text mapping self-heals at 85-115%. Guided text self-heals via Sonnet feedback loop. Vision reads images directly. |
| Cost | $0.03 for most invoices. Guided text ~$0.08 only for complex formats. Vision ~$0.18 as last resort. |
| Speed | CSVs: 11-16s. Small PDFs: under a minute. Large PDFs: 2-7 min. Complex formats: 6-21 min. |
| Smart escalation | Extraction ratio gate prevents expensive guided text tier from triggering on flat-table invoices where the gap is from surcharges, not format complexity. |
| Courier agnostic | Tested on DPD, DHL, Evri, UPS. Courier hints are optional: the pipeline works without them, just better with them. |