Invoice Pipeline — Accuracy Snapshot
12/12 passing. 99.1% average accuracy. Zero errors. Tested across 4 couriers (DPD, DHL, Evri, UPS) on CSVs and on PDFs of 1 to 120 pages. 7 of 12 invoices at 100%.
Results
| Invoice | Courier | Accuracy | Items | Extracted | Journey | Time | LLM Cost | Cost/Item | File |
|---|---|---|---|---|---|---|---|---|---|
| UPS CSV | UPS | 100% | 349 | £15,283 | Text mapping (1st attempt) | 14s | ~$0.03 | $0.00009 | UPS/Invoice_66321728...csv |
| DPD CSV | DPD | 100% | 7,956 | £18,221 | Text mapping (1st attempt) | 16s | ~$0.03 | $0.000004 | DPD/451806.16785367.csv |
| Evri 1pg (Oct) | Evri | 100% | 1 | £247 | Text mapping (1st attempt) | 10s | ~$0.03 | $0.03 | Evri/H1416_...BAINV00291481...pdf |
| Evri 1pg (Feb) | Evri | 100% | 1 | £149 | Text mapping (1st attempt) | 8s | ~$0.03 | $0.03 | Evri/H1416_...BAINV00302995...pdf |
| DHL 2pg | DHL | 100% | 6 | £92 | Text mapping → self-healed (213% → 100%) | 27s | ~$0.06 | $0.01 | DHL/dhl-invoice-1.pdf |
| Evri 3pg | Evri | 100% | 28 | £12,741 | Text mapping (1st attempt) | 15s | ~$0.03 | $0.001 | Evri/H1416_...BAINV00302644...pdf |
| DHL 32pg | DHL | 100% | 356 | £8,393 | Text mapping (297%) → guided text (200%) → vision (100%) | 1,119s | ~$0.21 | $0.0006 | DHL/GLAIR04128331.cleaned.pdf |
| DPD 36pg | DPD | 99.3% | 1,437 | £13,552 | Text mapping (1st attempt) | 139s | ~$0.03 | $0.00002 | DPD/3006995.I61645007.pdf |
| DPD 11pg | DPD | 99.2% | 315 | £1,617 | Text mapping (1st attempt) | 43s | ~$0.03 | $0.0001 | DPD/3029461.I61853081.pdf |
| DPD 9pg | DPD | 98.9% | 206 | £1,774 | Text mapping (1st attempt) | 34s | ~$0.03 | $0.0001 | DPD/3029460.I61853080.pdf |
| DPD 49pg | DPD | 96.0% | 2,066 | £17,087 | Text mapping (1st attempt, summary items rejected) | 194s | ~$0.03 | $0.00001 | DPD/451806.I16806375.pdf |
| DPD 120pg | DPD | 95.4% | 5,452 | £35,950 | Text mapping (1st attempt, summary items rejected) | 399s | ~$0.09 | $0.00002 | DPD/116154.I16806945.pdf |
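The headline figures can be re-derived from the table above. A short Python check, with the rows transcribed by hand and the approximate ("~") LLM costs taken at face value. It assumes the headline 99.1% is an unweighted mean over invoices, which is what the arithmetic reproduces; the item-weighted mean is computed alongside for comparison.

```python
# (invoice, accuracy %, line items, approx. LLM cost in USD), transcribed from the Results table
RESULTS = [
    ("UPS CSV", 100.0, 349, 0.03),
    ("DPD CSV", 100.0, 7956, 0.03),
    ("Evri 1pg (Oct)", 100.0, 1, 0.03),
    ("Evri 1pg (Feb)", 100.0, 1, 0.03),
    ("DHL 2pg", 100.0, 6, 0.06),
    ("Evri 3pg", 100.0, 28, 0.03),
    ("DHL 32pg", 100.0, 356, 0.21),
    ("DPD 36pg", 99.3, 1437, 0.03),
    ("DPD 11pg", 99.2, 315, 0.03),
    ("DPD 9pg", 98.9, 206, 0.03),
    ("DPD 49pg", 96.0, 2066, 0.03),
    ("DPD 120pg", 95.4, 5452, 0.09),
]


def summarise(rows):
    """Unweighted mean accuracy, item-weighted mean accuracy, count at 100%,
    and LLM cost per line item for each invoice."""
    mean = sum(acc for _, acc, _, _ in rows) / len(rows)
    total_items = sum(items for _, _, items, _ in rows)
    weighted = sum(acc * items for _, acc, items, _ in rows) / total_items
    perfect = sum(1 for _, acc, _, _ in rows if acc == 100.0)
    per_item = {name: cost / items for name, _, items, cost in rows}
    return mean, weighted, perfect, per_item


mean, weighted, perfect, per_item = summarise(RESULTS)
print(f"unweighted {mean:.1f}%, item-weighted {weighted:.1f}%, {perfect}/{len(RESULTS)} at 100%")
# unweighted 99.1%, item-weighted 98.1%, 7/12 at 100%
```

Weighting by line items pulls the figure down to about 98.1%, because the two largest DPD PDFs (49 and 120 pages) hold most of the items that were not extracted at 100%.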
Summary
Accuracy by Complexity
| Complexity | Invoices | Accuracy | Method |
|---|---|---|---|
| CSVs | UPS, DPD | 100% | Text mapping with courier hints |
| Small PDFs (1-3 pages) | Evri ×3, DHL 2pg | 100% | Text mapping + self-healing |
| Medium PDFs (9-36 pages) | DPD ×3 | 98.9-99.3% | Text mapping |
| Large PDFs (49-120 pages) | DPD ×2 | 95.4-96% | Text mapping + chunked extraction |
| Complex PDFs (multi-row) | DHL 32pg | 100% | Vision fallback with render retry |
LLM Pricing — OpenRouter (Verified)
Prices confirmed from the OpenRouter /api/v1/models endpoint on 23 Mar 2026.
All prices per 1M tokens via OpenRouter. The pipeline uses a tiered model strategy: expensive models for mapping decisions, cheaper models for bulk extraction.
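OpenRouter's model listing quotes prices per token, while this document works in USD per 1M tokens, so a small conversion helper is handy when re-checking the figures. A minimal sketch, assuming each model entry in the /api/v1/models response carries a `pricing` object whose `prompt` and `completion` fields are per-token USD prices encoded as decimal strings (verify against the live response before relying on it). `Price`, `price_from_openrouter` and `call_cost` are hypothetical helpers, not part of the pipeline.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Price:
    """Model price in USD per 1M tokens, the unit used in this document."""
    prompt_per_m: float
    completion_per_m: float


def price_from_openrouter(model_entry: dict) -> Price:
    """Convert one model entry from OpenRouter's /api/v1/models response.

    Assumption: entry["pricing"]["prompt"] and entry["pricing"]["completion"]
    are per-token USD prices encoded as decimal strings.
    """
    pricing = model_entry["pricing"]
    return Price(
        prompt_per_m=float(pricing["prompt"]) * TOKENS_PER_MILLION,
        completion_per_m=float(pricing["completion"]) * TOKENS_PER_MILLION,
    )


def call_cost(price: Price, prompt_tokens: int, completion_tokens: int) -> float:
    """USD cost of one LLM call at the given per-1M-token prices."""
    return (prompt_tokens * price.prompt_per_m
            + completion_tokens * price.completion_per_m) / TOKENS_PER_MILLION
```

Under the tiered strategy, `call_cost` would be evaluated at the more expensive model's price for the few mapping calls and at the cheaper model's price for the many bulk extraction calls.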
Estimated Cost per Invoice
Calibrated against actual benchmark data: a 36-page DPD invoice run through full vision fallback (GPT-5.4-mini) cost $1.18, confirming our per-batch token estimates.
Happy path = text mapping works first time (GPT-5.4 only). Vision = worst case: every page is rendered as a PNG and sent to GPT-5.4-mini. The 36-page row is highlighted because it is calibrated against the actual benchmark ($1.10 estimated vs $1.18 measured).
Why happy path cost barely scales with pages
Text mapping sends table samples to GPT-5.4, not the full document. A 5-page invoice and a 250-page invoice send roughly the same prompt size (~8-12K tokens) because the LLM only sees representative rows. Cost scales with vision fallback because every 2-page batch gets rendered and processed individually.
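The document does not say how the "representative rows" are chosen. As an illustration only, one simple strategy is to keep the first rows, an evenly spaced slice of the body, and the last rows, which caps the sample size regardless of table length. The head/middle/tail sizes, the ~40 rows per page and the 4-characters-per-token heuristic below are all assumptions, and the real 8-12K-token prompt also includes instructions and courier hints that this sketch leaves out.

```python
def representative_rows(rows, head=20, middle=20, tail=10):
    """Return a bounded sample of table rows: the first `head` rows, `middle`
    evenly spaced rows from the body, and the last `tail` rows.
    Tables with head + middle + tail rows or fewer are returned whole."""
    n = len(rows)
    if n <= head + middle + tail:
        return list(rows)
    body = n - head - tail          # rows between the head and tail slices
    step = body / middle            # > 1 here, so the picked indices never repeat
    idx = list(range(head))
    idx += [head + int(i * step) for i in range(middle)]
    idx += list(range(n - tail, n))
    return [rows[i] for i in idx]


def approx_tokens(text):
    # Crude ~4 characters/token heuristic, not a real tokenizer.
    return len(text) // 4


# A 5-page and a 250-page invoice, simulated as ~40 rows per page (assumption).
short_invoice = [f"{i:06d},parcel,1,GBP 3.20" for i in range(5 * 40)]
long_invoice = [f"{i:06d},parcel,1,GBP 3.20" for i in range(250 * 40)]

for rows in (short_invoice, long_invoice):
    sample = representative_rows(rows)
    print(len(rows), "rows ->", len(sample), "sampled,",
          approx_tokens("\n".join(sample)), "approx tokens")
```

Both invoices yield a 50-row sample of the same size, which is why the text-mapping cost stays flat while vision cost grows with page count.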
Vision Fallback Risk — Per Courier
Expected Cost per Courier
Based on observed tier usage, typical invoice sizes, and the pricing model above. "Likely" assumes the most common path from test data. "Worst case" assumes vision fallback triggers.
| Courier | Typical Size | Likely Tier | Likely Cost | Likely Cost/Item | Worst Case | Worst Cost/Item |
|---|---|---|---|---|---|---|
| UPS | CSV, 300-600 rows | Text mapping | $0.03 | $0.0001 | $0.06 (retry) | $0.0002 |
| DPD (CSV) | CSV, 2000-8000 rows | Text mapping | $0.03 | $0.000004 | $0.06 (retry) | $0.000008 |
| DPD (PDF) | 10-50 pages | Text mapping | $0.13 | $0.0001 | $1.56 (50pg vision) | $0.001 |
| DPD (PDF large) | 100-250 pages | Text mapping | $0.15 | $0.00003 | $7.30 (250pg vision) | $0.001 |
| Evri | 1-3 pages | Text mapping | $0.03 | $0.03 | $0.31 (3pg vision) | $0.01 |
| DHL | 10-50 pages | Vision likely | $1.56 | $0.005 | $2.99 (100pg vision) | $0.01 |
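The worst-case column grows with page count because vision processes every 2-page batch separately. As an illustration only, a straight line (fixed overhead plus a cost per 2-page batch) can be fitted through two rows of the table: 50 pages at $1.56 and 250 pages at $7.30. The fit and its names are assumptions for checking the numbers, not the pricing model that produced the table.

```python
import math

# Two worst-case rows from the per-courier table (pages, USD), used only to fit
# an illustrative straight line; NOT the pipeline's actual pricing model.
P1, C1 = 50, 1.56    # DPD (PDF), 50pg vision
P2, C2 = 250, 7.30   # DPD (PDF large), 250pg vision

BATCH_PAGES = 2  # vision renders and processes pages in 2-page batches


def batches(pages: int) -> int:
    """Number of 2-page vision batches needed for a document."""
    return math.ceil(pages / BATCH_PAGES)


# Slope = USD per batch; intercept = fixed overhead per invoice.
PER_BATCH = (C2 - C1) / (batches(P2) - batches(P1))
FIXED = C1 - PER_BATCH * batches(P1)


def vision_worst_case(pages: int) -> float:
    """Estimated USD cost if every page goes through vision fallback."""
    return FIXED + PER_BATCH * batches(pages)


for pages in (3, 36, 100):
    print(pages, "pages:", round(vision_worst_case(pages), 2))
```

The line lands within a cent of the $2.99 DHL 100-page worst case and gives about $1.16 for 36 pages, between the $1.10 estimate and the $1.18 measurement. For 3 pages it gives only about $0.24 against the table's $0.31, so it should not be relied on for very small invoices.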
Cost per Item Analysis
Evri's high per-item cost is misleading: its invoices have very few line items (1-28), so the fixed LLM cost ($0.03) is divided across few items. In absolute terms Evri is tied for the cheapest courier at $0.03 per invoice, alongside UPS and DPD CSV. DHL is the real cost driver: complex multi-row layouts force vision fallback, and its likely per-item cost ($0.005) is 50× that of DPD PDF ($0.0001).
Pipeline Tier Frequency (from 12 test invoices)
- 10/12 (83%) resolved via text mapping on the first attempt (cost: $0.03-0.09)
- 1/12 (8%) self-healed via retry (DHL 2-page, 213% → 100%) — cost: $0.06
- 1/12 (8%) needed full vision fallback (DHL 32-page multi-row) — cost: $0.21
- 0/12 resolved via guided text extraction (attempted once, on the DHL 32-page invoice, where it reached 200% before vision took over)
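The tier counts come from an escalation order: text mapping first, then a guided text pass, then vision. A simplified sketch of that control flow, assuming "accuracy" means the extracted total as a percentage of the invoice total (consistent with the 213% and 297% over-extraction figures, but not stated explicitly) and a hypothetical ±5% acceptance band, which every accepted result in the Results table, down to 95.4%, falls inside. `run_tiers`, `replay` and the tier names are illustrative, not the pipeline's code, and the self-heal retry is omitted.

```python
from typing import Callable, List, Optional, Tuple

# A tier pairs a name with a function that takes the parsed document and
# returns the extracted line-item amounts.
Tier = Tuple[str, Callable[[str], List[float]]]


def accuracy(amounts: List[float], invoice_total: float) -> float:
    """Extracted total as a percentage of the invoice total (assumed metric)."""
    return 100.0 * sum(amounts) / invoice_total


def run_tiers(doc: str, invoice_total: float, tiers: List[Tier],
              tolerance_pct: float = 5.0) -> Tuple[Optional[str], List[Tuple[str, float]]]:
    """Try each tier in order and stop at the first whose accuracy lies within
    tolerance_pct of 100%. Returns (winning tier name or None, journey)."""
    journey = []
    for name, extract in tiers:
        acc = accuracy(extract(doc), invoice_total)
        journey.append((name, acc))
        if abs(acc - 100.0) <= tolerance_pct:
            return name, journey
    return None, journey


def replay(pct: float) -> Callable[[str], List[float]]:
    """Stub extractor that returns a fixed percentage of the DHL 32pg total."""
    return lambda doc: [8393.0 * pct / 100.0]


# Replays the DHL 32pg journey from the Results table: 297% -> 200% -> 100%.
tiers = [("text mapping", replay(297)), ("guided text", replay(200)), ("vision", replay(100))]
winner, journey = run_tiers("<parsed DHL 32pg>", 8393.0, tiers)
print(winner, [(name, round(acc)) for name, acc in journey])
```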
What drives vision fallback?
The only invoice that triggered vision was the 32-page DHL invoice. DHL uses a complex multi-row table layout in which Docling's OCR merges columns, making text mapping unreliable. This points to a structural characteristic of DHL's invoice format rather than page count alone: DPD PDFs of up to 120 pages never needed vision. Within DHL, the 2-page invoice self-healed, while the 32-page one needed vision.
Key insight: Vision fallback correlates with courier format complexity, not invoice size. A 120-page DPD PDF costs $0.09 (text mapping), while a 32-page DHL PDF costs $0.21 (vision). DHL will almost always be the most expensive courier per invoice.
Monthly Cost Projection
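Assuming a client receives the invoice counts below each month, priced at the "Likely Cost" per invoice from the table above (DPD as PDF at $0.13, DHL $1.56, Evri $0.03, UPS $0.03):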
| Scenario | DPD | DHL | Evri | UPS | Monthly Total |
|---|---|---|---|---|---|
| Small client (1 invoice each) | $0.13 | $1.56 | $0.03 | $0.03 | $1.75 |
| Medium client (4 DPD, 2 DHL, 2 Evri, 1 UPS) | $0.52 | $3.12 | $0.06 | $0.03 | $3.73 |
| Large client (10 DPD, 4 DHL, 4 Evri, 2 UPS) | $1.30 | $6.24 | $0.12 | $0.06 | $7.72 |
| Enterprise (20 DPD, 8 DHL, 8 Evri, 4 UPS) | $2.60 | $12.48 | $0.24 | $0.12 | $15.44 |
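DHL dominates cost at every scale because of vision fallback. If DHL vision can be eliminated (e.g. by improving Docling's multi-row handling or using a DHL-specific text extractor), enterprise costs drop from $15.44 to roughly $3-4/month: $2.96 for the DPD, Evri and UPS invoices, plus $0.24-1.04 for the 8 DHL invoices at text-mapping rates ($0.03-0.13 each).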
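The projection table is a straight multiplication of invoice counts by per-invoice cost. A minimal reproduction using the "Likely Cost" column, plus the enterprise counterfactual with DHL priced at text-mapping rates. The $0.03 and $0.13 DHL rates in the counterfactual are assumptions borrowed from the other couriers, not measured DHL costs.

```python
# USD per invoice, from the "Likely Cost" column (DPD priced as a PDF).
LIKELY_COST = {"DPD": 0.13, "DHL": 1.56, "Evri": 0.03, "UPS": 0.03}

# Invoices per month for each scenario in the projection table.
SCENARIOS = {
    "Small": {"DPD": 1, "DHL": 1, "Evri": 1, "UPS": 1},
    "Medium": {"DPD": 4, "DHL": 2, "Evri": 2, "UPS": 1},
    "Large": {"DPD": 10, "DHL": 4, "Evri": 4, "UPS": 2},
    "Enterprise": {"DPD": 20, "DHL": 8, "Evri": 8, "UPS": 4},
}


def monthly_total(counts, unit_cost):
    """Sum of invoice count x per-invoice cost, rounded to cents."""
    return round(sum(n * unit_cost[courier] for courier, n in counts.items()), 2)


for name, counts in SCENARIOS.items():
    print(f"{name}: ${monthly_total(counts, LIKELY_COST):.2f}")

# Counterfactual (assumption): DHL no longer needs vision and costs the same as
# text mapping for another courier: $0.03 (CSV/Evri rate) or $0.13 (DPD PDF rate).
for dhl_rate in (0.03, 0.13):
    alt = {**LIKELY_COST, "DHL": dhl_rate}
    print(f"Enterprise, DHL at ${dhl_rate:.2f}: ${monthly_total(SCENARIOS['Enterprise'], alt):.2f}")
```

The two counterfactuals give $3.20 and $4.00; removing DHL entirely leaves $2.96, so a figure of about $3.00 corresponds to the low end of that range.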