
Invoice Pipeline — Accuracy Snapshot

By Patrick McCurley · Created Mar 20, 2026 · Updated 13 days ago

12/12 passing. 99.1% average accuracy. Zero errors. Tested across 4 couriers (DPD, DHL, Evri, UPS), on CSVs and on PDFs from 1 to 120 pages. 7 of the 12 invoices scored 100%.

Results

| Invoice | Courier | Accuracy | Items | Extracted | Journey | Time | LLM Cost | Cost/Item | File |
|---|---|---|---|---|---|---|---|---|---|
| UPS CSV | UPS | 100% | 349 | £15,283 | Text mapping (1st attempt) | 14s | ~$0.03 | $0.00009 | UPS/Invoice_66321728...csv |
| DPD CSV | DPD | 100% | 7,956 | £18,221 | Text mapping (1st attempt) | 16s | ~$0.03 | $0.000004 | DPD/451806.16785367.csv |
| Evri 1pg (Oct) | Evri | 100% | 1 | £247 | Text mapping (1st attempt) | 10s | ~$0.03 | $0.03 | Evri/H1416_...BAINV00291481...pdf |
| Evri 1pg (Feb) | Evri | 100% | 1 | £149 | Text mapping (1st attempt) | 8s | ~$0.03 | $0.03 | Evri/H1416_...BAINV00302995...pdf |
| DHL 2pg | DHL | 100% | 6 | £92 | Text mapping → self-healed (213% → 100%) | 27s | ~$0.06 | $0.01 | DHL/dhl-invoice-1.pdf |
| Evri 3pg | Evri | 100% | 28 | £12,741 | Text mapping (1st attempt) | 15s | ~$0.03 | $0.001 | Evri/H1416_...BAINV00302644...pdf |
| DHL 32pg | DHL | 100% | 356 | £8,393 | Text mapping (297%) → guided text (200%) → vision (100%) | 1,119s | ~$0.21 | $0.0006 | DHL/GLAIR04128331.cleaned.pdf |
| DPD 36pg | DPD | 99.3% | 1,437 | £13,552 | Text mapping (1st attempt) | 139s | ~$0.03 | $0.00002 | DPD/3006995.I61645007.pdf |
| DPD 11pg | DPD | 99.2% | 315 | £1,617 | Text mapping (1st attempt) | 43s | ~$0.03 | $0.0001 | DPD/3029461.I61853081.pdf |
| DPD 9pg | DPD | 98.9% | 206 | £1,774 | Text mapping (1st attempt) | 34s | ~$0.03 | $0.0001 | DPD/3029460.I61853080.pdf |
| DPD 49pg | DPD | 96.0% | 2,066 | £17,087 | Text mapping (1st attempt, summary items rejected) | 194s | ~$0.03 | $0.00002 | DPD/451806.I16806375.pdf |
| DPD 120pg | DPD | 95.4% | 5,452 | £35,950 | Text mapping (1st attempt, summary items rejected) | 399s | ~$0.09 | $0.00002 | DPD/116154.I16806945.pdf |
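
The headline figures can be re-derived from the Accuracy column. A minimal sketch, assuming the headline average is a simple unweighted mean over the 12 invoices (the report does not say how it was computed):

```python
from statistics import mean

# Accuracy (%) per invoice, copied from the Accuracy column of the results table.
accuracies = [100, 100, 100, 100, 100, 100, 100, 99.3, 99.2, 98.9, 96.0, 95.4]

# Unweighted mean, rounded to one decimal place like the headline figure.
average = round(mean(accuracies), 1)

# Number of invoices that scored exactly 100%.
perfect = sum(1 for a in accuracies if a == 100)

print(f"{len(accuracies)} invoices, {average}% average, {perfect} at 100%")
# prints: 12 invoices, 99.1% average, 7 at 100%
```

Weighting by item count or invoice value would give a different figure, since the two lowest-scoring invoices are also among the largest.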

Summary

Accuracy by Complexity

| Complexity | Invoices | Accuracy | Method |
|---|---|---|---|
| CSVs | UPS, DPD | 100% | Text mapping with courier hints |
| Small PDFs (1-3 pages) | Evri ×3, DHL 2pg | 100% | Text mapping + self-healing |
| Medium PDFs (9-36 pages) | DPD ×3 | 98.9-99.3% | Text mapping |
| Large PDFs (49-120 pages) | DPD ×2 | 95.4-96% | Text mapping + chunked extraction |
| Complex PDFs (multi-row) | DHL 32pg | 100% | Vision fallback with render retry |

LLM Pricing — OpenRouter (Verified)

Prices confirmed from the OpenRouter /api/v1/models endpoint on 23 Mar 2026.

All prices per 1M tokens via OpenRouter. The pipeline uses a tiered model strategy: expensive models for mapping decisions, cheaper models for bulk extraction.

Estimated Cost per Invoice

Calibrated against actual benchmark data: a 36-page DPD invoice cost $1.18 via full vision fallback (gpt-5.4-mini), confirming our per-batch token estimates.

Happy path: text mapping works first time (GPT-5.4 only). Vision: the worst case, where every page is rendered as a PNG and sent to GPT-5.4-mini. The 36-page estimate is the calibration point ($1.10 estimated vs $1.18 measured in the actual benchmark).

Why happy path cost barely scales with pages

Text mapping sends table samples to GPT-5.4, not the full document. A 5-page invoice and a 250-page invoice send roughly the same prompt size (~8-12K tokens) because the LLM only sees representative rows. Cost scales with page count only under vision fallback, because every 2-page batch is rendered and processed individually.
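
The shape of that cost curve can be sketched in a few lines. This is an illustrative model only: the function names are hypothetical, `mapping_cost` uses the ~$0.03 seen in the results, and `cost_per_batch` is a placeholder rather than a figure from this report. It also assumes vision cost is added on top of the initial mapping attempt.

```python
import math


def vision_batches(pages: int, batch_size: int = 2) -> int:
    """Number of rendered batches when pages are sent batch_size at a time."""
    return math.ceil(pages / batch_size)


def estimated_cost(pages: int, vision: bool,
                   mapping_cost: float = 0.03,
                   cost_per_batch: float = 0.06) -> float:
    """Flat mapping cost, plus a per-batch charge only if vision triggers.

    cost_per_batch is a placeholder value, not a measured price.
    """
    cost = mapping_cost
    if vision:
        cost += vision_batches(pages, 2) * cost_per_batch
    return cost


# Happy path: page count does not enter the calculation at all.
print(estimated_cost(5, vision=False), estimated_cost(250, vision=False))

# Vision path: cost grows with the number of 2-page batches.
print(vision_batches(36), vision_batches(250))
```

Under this model the happy-path cost is identical for 5 and 250 pages, while the vision path grows linearly with the batch count (18 batches for 36 pages, 125 for 250).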

Vision Fallback Risk — Per Courier

Expected Cost per Courier

Based on observed tier usage, typical invoice sizes, and the pricing model above. "Likely" assumes the most common path from test data. "Worst case" assumes a retry for CSVs and vision fallback for PDFs.

| Courier | Typical Size | Likely Tier | Likely Cost | Likely Cost/Item | Worst Case | Worst Cost/Item |
|---|---|---|---|---|---|---|
| UPS | CSV, 300-600 rows | Text mapping | $0.03 | $0.0001 | $0.06 (retry) | $0.0002 |
| DPD (CSV) | CSV, 2000-8000 rows | Text mapping | $0.03 | $0.000004 | $0.06 (retry) | $0.000008 |
| DPD (PDF) | 10-50 pages | Text mapping | $0.13 | $0.0001 | $1.56 (50pg vision) | $0.001 |
| DPD (PDF large) | 100-250 pages | Text mapping | $0.15 | $0.00003 | $7.30 (250pg vision) | $0.001 |
| Evri | 1-3 pages | Text mapping | $0.03 | $0.03 | $0.31 (3pg vision) | $0.01 |
| DHL | 10-50 pages | Vision likely | $1.56 | $0.005 | $2.99 (100pg vision) | $0.01 |

Cost per Item Analysis

Evri's high per-item cost is misleading: its invoices carry very few line items (1-28), so the fixed LLM cost ($0.03) is divided across few items. In absolute terms Evri is tied with the CSV invoices (UPS, DPD) for the lowest cost per invoice. DHL is the real cost driver: complex multi-row layouts force vision fallback, and its per-item cost ($0.005) is 50× higher than DPD PDF ($0.0001).
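
The arithmetic behind these per-item figures is a single division. A short sketch using numbers from the results and expected-cost tables (the helper name is illustrative):

```python
def cost_per_item(llm_cost: float, items: int) -> float:
    """Spread one invoice's LLM cost evenly across its line items."""
    return llm_cost / items


# Same ~$0.03 invoice cost, very different item counts (from the results table).
evri_1pg = cost_per_item(0.03, 1)      # single-item Evri invoice
dpd_csv = cost_per_item(0.03, 7956)    # DPD CSV with 7,956 items

# Ratio of the likely per-item costs quoted for DHL and DPD (PDF).
dhl_vs_dpd_pdf = round(0.005 / 0.0001)

print(f"Evri 1pg: ${evri_1pg:.2f}/item")
print(f"DPD CSV:  ${dpd_csv:.6f}/item")
print(f"DHL is {dhl_vs_dpd_pdf}x DPD PDF per item")
```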

Pipeline Tier Frequency (from 12 test invoices)
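
The tier counts can be tallied directly from the Journey column of the results table. A minimal sketch, with each journey reduced to the tier that produced the accepted result; the three short labels are shorthand chosen for this example, not names used by the pipeline:

```python
from collections import Counter

# Final tier per test invoice, read off the Journey column of the results table.
final_tier = {
    "UPS CSV": "text mapping",
    "DPD CSV": "text mapping",
    "Evri 1pg (Oct)": "text mapping",
    "Evri 1pg (Feb)": "text mapping",
    "DHL 2pg": "self-healed",
    "Evri 3pg": "text mapping",
    "DHL 32pg": "vision",
    "DPD 36pg": "text mapping",
    "DPD 11pg": "text mapping",
    "DPD 9pg": "text mapping",
    "DPD 49pg": "text mapping",
    "DPD 120pg": "text mapping",
}

tier_counts = Counter(final_tier.values())

for tier, count in tier_counts.most_common():
    print(f"{tier}: {count}/{len(final_tier)}")
```

On this sample, 10 of 12 invoices resolved on the first text-mapping attempt, one self-healed, and one needed vision.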

What drives vision fallback?

The only invoice that triggered vision was the 32-page DHL invoice. DHL uses a complex multi-row table layout where Docling's OCR merges columns, making text mapping unreliable. This is a structural characteristic of DHL's invoice format rather than a page-count issue as such: in the test set, the 2-page DHL invoice self-healed, while the 32-page one needed vision.

Key insight: Vision fallback correlates with courier format complexity, not invoice size. A 120-page DPD PDF costs $0.09 (text mapping), while a 32-page DHL PDF costs $0.21 (vision). DHL will almost always be the most expensive courier per invoice.
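
The escalation seen in the Journey column (text mapping, then guided text, then vision) can be sketched as a loop that tries each tier in order and stops at the first acceptable result. This is a hedged illustration, not the pipeline's actual code: `run_tiers`, the stub extractors, and the 95-100% acceptance band are all assumptions made for this example.

```python
from typing import Callable, Optional

# A tier is a name plus a function returning the accuracy (%) it achieved.
Tier = tuple[str, Callable[[], float]]


def run_tiers(tiers: list[Tier],
              min_accuracy: float = 95.0,
              max_accuracy: float = 100.0
              ) -> tuple[Optional[str], list[tuple[str, float]]]:
    """Try tiers in order; accept the first result inside the accuracy band.

    The band is a hypothetical acceptance rule: results above 100% (such as
    the 297% seen for DHL) are treated as failures and trigger the next tier.
    """
    journey: list[tuple[str, float]] = []
    for name, extract in tiers:
        accuracy = extract()
        journey.append((name, accuracy))
        if min_accuracy <= accuracy <= max_accuracy:
            return name, journey
    return None, journey  # every tier failed


# Stub extractors replaying the accuracies reported for the 32-page DHL invoice.
dhl_32pg: list[Tier] = [
    ("text mapping", lambda: 297.0),
    ("guided text", lambda: 200.0),
    ("vision", lambda: 100.0),
]

accepted, journey = run_tiers(dhl_32pg)
print(accepted, journey)
```

Ordering the tiers from cheapest to most expensive means the costly vision path only runs after the text-based tiers have been tried and rejected.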

Monthly Cost Projection

Assuming a client processes invoices monthly:

| Scenario | DPD | DHL | Evri | UPS | Monthly Total |
|---|---|---|---|---|---|
| Small client (1 invoice each) | $0.13 | $1.56 | $0.03 | $0.03 | $1.75 |
| Medium client (4 DPD, 2 DHL, 2 Evri, 1 UPS) | $0.52 | $3.12 | $0.06 | $0.03 | $3.73 |
| Large client (10 DPD, 4 DHL, 4 Evri, 2 UPS) | $1.30 | $6.24 | $0.12 | $0.06 | $7.72 |
| Enterprise (20 DPD, 8 DHL, 8 Evri, 4 UPS) | $2.60 | $12.48 | $0.24 | $0.12 | $15.44 |

DHL dominates cost at every scale due to vision fallback. If DHL vision can be eliminated (e.g. by improving Docling's multi-row handling or using a DHL-specific text extractor), enterprise costs drop from $15.44 to roughly $3.00/month: the $12.48 DHL share disappears, leaving $2.96 plus whatever DHL's text-mapping path would cost.
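
The projection table can be reproduced from the likely per-invoice costs and the monthly volumes in each scenario. A minimal sketch, working in whole cents to avoid floating-point drift; the names are illustrative, and the "without DHL" figure simply drops DHL's share rather than modelling a cheaper DHL path:

```python
# Likely cost per invoice in US cents, from the expected-cost table
# (DPD uses the PDF figure; DHL assumes vision fallback).
COST_CENTS = {"DPD": 13, "DHL": 156, "Evri": 3, "UPS": 3}

# Invoices per month for each client scenario.
SCENARIOS = {
    "Small": {"DPD": 1, "DHL": 1, "Evri": 1, "UPS": 1},
    "Medium": {"DPD": 4, "DHL": 2, "Evri": 2, "UPS": 1},
    "Large": {"DPD": 10, "DHL": 4, "Evri": 4, "UPS": 2},
    "Enterprise": {"DPD": 20, "DHL": 8, "Evri": 8, "UPS": 4},
}


def monthly_total_cents(volumes: dict[str, int], skip: tuple[str, ...] = ()) -> int:
    """Sum per-courier cost times volume, optionally excluding some couriers."""
    return sum(COST_CENTS[c] * n for c, n in volumes.items() if c not in skip)


for name, volumes in SCENARIOS.items():
    print(f"{name}: ${monthly_total_cents(volumes) / 100:.2f}")

# Enterprise total with the DHL share removed entirely.
enterprise_without_dhl = monthly_total_cents(SCENARIOS["Enterprise"], skip=("DHL",))
print(f"Enterprise without DHL: ${enterprise_without_dhl / 100:.2f}")
```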