Invoice Processing Pipeline — How We Get to 100% Accuracy
Courier invoices arrive in many formats — from clean spreadsheets to dense, multi-page PDFs with merged cells and multi-row records. The pipeline tries the cheapest approach first and only escalates when accuracy is measurably insufficient.
End-to-End Flow
Extract
Spreadsheets (CSV, Excel) are parsed natively. Wide CSVs (250+ columns, as in UPS exports) have their empty columns stripped automatically before being sent to the LLM.
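The column stripping step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name `strip_empty_columns` and the sample data are hypothetical, and "empty" is assumed to mean blank or whitespace-only in every data row.

```python
import csv
import io

def strip_empty_columns(headers, rows):
    """Drop every column whose cells are blank in all data rows."""
    keep = [
        i for i in range(len(headers))
        if any(i < len(row) and row[i].strip() for row in rows)
    ]
    slim_headers = [headers[i] for i in keep]
    # Short rows are padded with "" so every output row has the same width.
    slim_rows = [[row[i] if i < len(row) else "" for i in keep] for row in rows]
    return slim_headers, slim_rows

# Hypothetical four-column export in which two columns are never populated.
raw = "Tracking,Unused1,Net Amount,Unused2\n1Z001,,4.50,\n1Z002,,7.25,\n"
headers, *rows = list(csv.reader(io.StringIO(raw)))
slim_headers, slim_rows = strip_empty_columns(headers, rows)
print(slim_headers, slim_rows)
```

Dropping columns before prompting keeps the header list short, which matters most for exports with hundreds of mostly unused columns.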
PDFs go through Docling (IBM open-source) in chunks of 10-20 pages. Docling's memory grows with each document (~100-600MB) and cannot be reclaimed without a restart, so the pipeline monitors RSS via a health endpoint and proactively restarts Docling when it exceeds 3GB. The check runs before every chunk and every page render.
Both paths produce the same output structure: a list of tables with column headers and rows.
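A rough sketch of the chunk loop with the memory guard is shown below. The callables `get_rss`, `restart`, and `convert_chunk` are hypothetical stand-ins for the health endpoint, the Docling restart, and the per-chunk conversion; none of them are real Docling APIs, and the chunk size of 20 is simply the upper end of the stated range.

```python
from typing import Callable, Iterator, List, Tuple

RSS_LIMIT_BYTES = 3 * 1024**3  # 3GB restart threshold from the text

def page_chunks(total_pages: int, chunk_size: int = 20) -> Iterator[Tuple[int, int]]:
    """Yield inclusive, 1-based (first_page, last_page) ranges."""
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)

def extract_pdf(
    total_pages: int,
    get_rss: Callable[[], int],                      # hypothetical: reads RSS from the health endpoint
    restart: Callable[[], None],                     # hypothetical: restarts the Docling process
    convert_chunk: Callable[[int, int], List[dict]], # hypothetical: converts one page range to tables
    chunk_size: int = 20,
) -> List[dict]:
    tables: List[dict] = []
    for first, last in page_chunks(total_pages, chunk_size):
        # Check memory before every chunk and restart when over the limit.
        if get_rss() > RSS_LIMIT_BYTES:
            restart()
        tables.extend(convert_chunk(first, last))
    return tables
```

Passing the three operations in as callables keeps the loop testable without a running Docling instance.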
Tier 1: Text Mapping
The LLM receives column headers, sample rows (8 for large tables, all rows for small tables ≤20 rows), and a column activity summary. It returns:
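The prompt inputs described above could be assembled roughly like this. Two assumptions are made loudly here: the 8 sample rows are taken from the top of the table (the text does not say how they are chosen), and "column activity" is interpreted as the share of non-empty cells per column. Both function names are hypothetical.

```python
def select_sample_rows(rows, small_table_max=20, large_sample=8):
    """Send all rows for small tables, otherwise a fixed-size sample."""
    if len(rows) <= small_table_max:
        return list(rows)
    # Assumption: the sample is the first N rows.
    return list(rows[:large_sample])

def column_activity(headers, rows):
    """Assumed meaning of 'column activity': fraction of non-empty cells per column."""
    total = len(rows) or 1  # avoid dividing by zero on an empty table
    return {
        header: sum(1 for row in rows if i < len(row) and str(row[i]).strip()) / total
        for i, header in enumerate(headers)
    }
```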
- Column mappings — which column maps to which schema field
- Classification rules — how to categorise services from text patterns
- Skip rules — which rows to ignore (totals only — never empty cells, which would break multi-row formats)
- Summary items — charges extracted directly from small summary tables
- Invoice total — the pre-VAT total found in summary tables (or from repeated-value columns in CSVs)
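To make the shape of that response concrete, here is a hedged sketch of how the five outputs might be represented and applied to a table. The `MappingResult` class, its field names, and the substring-based matching are all illustrative assumptions; the source does not specify the real schema or rule syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class MappingResult:
    column_mappings: Dict[str, str]        # source header -> schema field
    classification_rules: Dict[str, str]   # lowercase substring -> category
    skip_patterns: List[str]               # substrings that mark total rows
    summary_items: List[dict] = field(default_factory=list)
    invoice_total: Optional[float] = None  # pre-VAT total, if found

def apply_mapping(headers, rows, result):
    """Turn raw table rows into line items using the LLM-provided rules."""
    items = []
    for row in rows:
        text = " ".join(row).lower()
        # Skip rules target total rows only; rows with empty cells are kept.
        if any(pattern.lower() in text for pattern in result.skip_patterns):
            continue
        record = dict(zip(headers, row))
        item = {
            schema_field: record[source]
            for source, schema_field in result.column_mappings.items()
            if source in record
        }
        item["category"] = next(
            (cat for pat, cat in result.classification_rules.items() if pat in text),
            "unclassified",
        )
        items.append(item)
    # Charges read directly from small summary tables are appended as-is.
    return items + result.summary_items
```

The invoice total is carried alongside the rules so that the later accuracy check has a reference value to compare against.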
CSVs and PDFs get different prompts. Each courier has optional hints:
| Courier | Hint |
|---|---|
| UPS | Skip rollup rows (empty Charge Description). Use "Net Amount" only. |
| DPD | CSV: expand surcharge columns. PDF: base charges only in "Amount". |
| DHL | Use "Total amount (incl. VAT)" — don't map component columns. |
| Evri | No hint needed — clean format. |
Self-Healing (85-115% threshold)
After applying rules, the pipeline checks whether the extracted total matches the LLM-detected invoice total. Accuracy here is the extracted total expressed as a percentage of the invoice total. If it falls outside 85-115%, the pipeline retries with feedback: which tables overcounted, which column was chosen, and the per-table breakdown. Up to 3 attempts are made, each with more sample rows.
The 85-115% band is tight enough to catch real errors (the LLM mapping a wrong column at 77%) while loose enough to accept minor gaps from surcharges on summary pages (~1-5%).
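The retry loop can be sketched as below. Several details are assumptions rather than facts from the pipeline: `map_and_extract` is a hypothetical callable standing in for one LLM mapping pass plus rule application, the 3 attempts are assumed to include the first pass, and the sample size is assumed to grow as 8, 16, 24.

```python
from typing import Callable, Dict, Optional, Tuple

LOW, HIGH = 0.85, 1.15  # acceptance band from the text

def self_heal(
    map_and_extract: Callable[[int, Optional[dict]], Tuple[float, Dict[str, float]]],
    invoice_total: float,
    max_attempts: int = 3,
    base_samples: int = 8,
) -> dict:
    """Retry the mapping with feedback until the total lands inside the band."""
    feedback: Optional[dict] = None
    result: dict = {}
    for attempt in range(1, max_attempts + 1):
        sample_rows = base_samples * attempt  # assumed schedule: 8, 16, 24
        extracted_total, per_table = map_and_extract(sample_rows, feedback)
        accuracy = extracted_total / invoice_total
        result = {
            "accuracy": accuracy,
            "attempts": attempt,
            "accepted": LOW <= accuracy <= HIGH,
        }
        if result["accepted"]:
            break
        # Feed the miss back so the next pass can pick a different column.
        feedback = {"accuracy": accuracy, "per_table_totals": per_table}
    return result
```

With this band, a mapping that yields 77% of the invoice total is rejected and retried, whereas a 50-150% band would have accepted it.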
Tier 2: Guided Text Extraction
Triggers only when both conditions are met:
- Text mapping accuracy is below 95%
- The extraction ratio is below 15% — fewer than 15% of input rows became line items
The extraction ratio gate prevents this tier from triggering on DPD invoices at 94% accuracy with thousands of items, where the format is understood and the gap comes from surcharges. It activates only for genuinely complex formats like DHL's multi-row shipment records (22 items from 600 rows ≈ 3.7%).
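The two-condition gate reduces to a short predicate. The function name and the handling of an empty input are assumptions, and the accuracy and row counts in the test cases built on it would be illustrative rather than measured, apart from the 22-of-600 ratio quoted above.

```python
def should_run_guided_text(
    accuracy: float,
    line_items: int,
    input_rows: int,
    accuracy_floor: float = 0.95,
    ratio_ceiling: float = 0.15,
) -> bool:
    """Escalate to Tier 2 only when accuracy AND extraction ratio are both low."""
    if input_rows == 0:
        # Assumption: with no input rows there is nothing for guided text to work on.
        return False
    extraction_ratio = line_items / input_rows
    return accuracy < accuracy_floor and extraction_ratio < ratio_ceiling
```

For a flat table where most rows become line items, the ratio is far above 15%, so the tier is skipped even at 94% accuracy. For 22 items from 600 rows the ratio is about 3.7%, so the tier runs whenever accuracy is also below 95%.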
Tier 3: Vision Fallback
Pages are rendered as high-resolution PNG images on demand (2 at a time, discarded after extraction) and read visually by AI models. This bypasses text extraction entirely and is the last resort when the Docling text output is too damaged or the format too complex for even guided text extraction.
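The render-two-then-discard pattern might look like the sketch below. `render_page` and `read_images` are hypothetical callables for the PNG renderer and the vision model call; the point is only that no more than one batch of images is held in memory at a time.

```python
from typing import Callable, Iterator, List, Sequence

def batches(pages: Sequence[int], size: int = 2) -> Iterator[List[int]]:
    """Split page numbers into consecutive groups of at most `size`."""
    for i in range(0, len(pages), size):
        yield list(pages[i:i + size])

def vision_extract(
    pages: Sequence[int],
    render_page: Callable[[int], object],            # hypothetical: page number -> PNG image
    read_images: Callable[[List[object]], List[dict]],  # hypothetical: vision model call
    batch_size: int = 2,
) -> List[dict]:
    items: List[dict] = []
    for batch in batches(pages, batch_size):
        images = [render_page(page) for page in batch]  # rendered on demand
        items.extend(read_images(images))
        images.clear()  # drop the rendered images before the next batch
    return items
```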
Tested Results
12 invoices from 4 couriers were tested. UPS CSV extraction was previously non-deterministic and could land at 77%. This was fixed by tightening the self-healing threshold from 50-150% to 85-115%, which lets the pipeline catch and correct a wrong column mapping.
| Complexity | Accuracy | Cost | Speed | Method |
|---|---|---|---|---|
| CSVs (UPS, DPD) | 100% | ~$0.03 | 11-16s | Text mapping + courier hints + self-healing |
| Small PDFs (Evri, DHL 2pg) | 100% | ~$0.03-0.06 | 8-48s | Text mapping or vision |
| Medium PDFs (DPD 9-49pg) | 96-100% | ~$0.03 | 36-187s | Text mapping |
| Large PDFs (DPD 120pg) | 95.4% | ~$0.09 | 384s | Text mapping + chunked extraction |
| Complex PDFs (DHL 32pg) | 94-105% | ~$0.08-0.21 | 347-1278s | Guided text or vision |
Average accuracy: 98.3%. Average cost: ~$0.05/invoice.
Summary
| What matters | How it's achieved |
|---|---|
| Accuracy | Three self-correcting tiers. Text mapping self-heals at 85-115%. Guided text self-heals via a Sonnet feedback loop. Vision reads images directly. |
| Cost | $0.03 for most invoices. Guided text ~$0.08 only for complex formats. Vision ~$0.18 as last resort. |
| Speed | CSVs: 11-16s. Small PDFs: under a minute. Large PDFs: 2-7 min. Complex formats: 6-21 min. |
| Smart escalation | The extraction ratio gate prevents the expensive guided text tier from triggering on flat-table invoices where the gap is from surcharges, not format complexity. |
| Courier agnostic | Tested on DPD, DHL, Evri, UPS. Courier hints are optional — the pipeline works without them, just better with them. |