ETL Pipeline — Full Invoice Test Run
Date: 2026-03-17 | Invoices: 12 | Couriers: DHL, DPD, Evri, UPS | Feature: Text Fallback Tier (TECH-569)
This report covers a full test run of every invoice in the POC dataset through the ETL pipeline, including the new PDF text fallback path, which bypasses Docling's table detection when the table path's accuracy against the invoice total is insufficient.
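Here is a minimal sketch of the fallback trigger, assuming accuracy means the extracted sum divided by the detected invoice total and that the fallback fires when that ratio falls outside a fixed tolerance. The function names and the ±1% tolerance are hypothetical, since the report does not state the pipeline's actual threshold. Note that when no total is detected, the check cannot fire at all.

```python
from typing import Optional

# Hypothetical tolerance: the report does not state the pipeline's real threshold.
ACCURACY_TOLERANCE = 0.01  # accept results within ±1% of the invoice total


def accuracy(extracted: float, invoice_total: Optional[float]) -> Optional[float]:
    """Extracted sum as a fraction of the invoice total, or None if no total was detected."""
    if not invoice_total:
        return None
    return extracted / invoice_total


def should_use_text_fallback(extracted: float, invoice_total: Optional[float]) -> bool:
    """Trigger the text fallback only when a total exists and the table path misses it."""
    acc = accuracy(extracted, invoice_total)
    if acc is None:
        # No detected total means nothing to compare against, so no trigger.
        return False
    return abs(acc - 1.0) > ACCURACY_TOLERANCE


print(should_use_text_fallback(35_183.44, 29_579.00))  # True (about 118.9%)
print(should_use_text_fallback(184.76, None))          # False (no total detected)
```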
Pipeline Architecture
Results — All 12 Invoices
| # | Courier | File | Pages | Tables | Items | Extracted (GBP) | Invoice Total (GBP) | Accuracy | Path | Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | DHL | GLAIR04128331.cleaned.pdf | 32 | 36 | 363 | 8,392.51 | 8,392.51 | 100.0% | FALLBACK | 340.7 |
| 2 | DHL | dhl-invoice-1.pdf | 2 | 4 | 10 | 184.76 | — | — | TABLE | 21.5 |
| 3 | DPD | 116154.I16806945.pdf | 120 | 125 | 5,235 | 35,183.44 | 29,579.00 | 118.9% | TABLE | 903.5 |
| 4 | DPD | 3006995.I61645007.pdf | 36 | 42 | 1,384 | 11,505.27 | — | — | CACHED | 103.0 |
| 5 | DPD | 3029460.I61853080.pdf | 9 | 11 | 199 | 1,557.94 | — | — | CACHED | 16.1 |
| 6 | DPD | 3029461.I61853081.pdf | 11 | 13 | 308 | 1,395.18 | — | — | CACHED | 23.5 |
| 7 | DPD | 451806.16785367.csv | — | 1 | 7,956 | 18,220.76 | — | — | TABLE | 20.6 |
| 8 | DPD | 451806.I16806375.pdf | 49 | 55 | 1,995 | 16,821,638.45 | — | — | CACHED | 137.6 |
| 9 | Evri | BAINV00302644_14_02_2026p.pdf | 3 | 5 | 28 | 12,741.23 | — | — | TABLE | 13.4 |
| 10 | Evri | BAINV00302995_18_02_2026p.pdf | 1 | 2 | 1 | 148.59 | — | — | CACHED | 0.5 |
| 11 | Evri | BAINV00291481_16_10_2025p.pdf | 1 | 2 | 1 | 247.48 | — | — | CACHED | 0.5 |
| 12 | UPS | Invoice_66321728_021326.csv | — | 1 | 696 | 30,566.86 | — | — | TABLE | 19.1 |
Aggregate Summary
| Metric | Value |
|---|---|
| Total invoices processed | 12 |
| Total line items extracted | 18,176 |
| Total processing time | 1,600s (26.7 min) |
| Cache hits | 6 / 12 (50%) |
| New LLM calls (table path) | 5 |
| Text fallback triggers | 1 |
| Avg time — cached | 46.9s |
| Avg time — new LLM | 195.6s |
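The aggregate figures can be re-derived from the per-invoice rows. This sketch reproduces them. It also shows that the 195.6s "new LLM" average covers only the five TABLE-path runs. The single FALLBACK run (340.7s) is included in neither average.

```python
from statistics import mean

# (path, items, time_s) for each invoice, copied from the results table.
RUNS = [
    ("FALLBACK", 363, 340.7),
    ("TABLE", 10, 21.5),
    ("TABLE", 5_235, 903.5),
    ("CACHED", 1_384, 103.0),
    ("CACHED", 199, 16.1),
    ("CACHED", 308, 23.5),
    ("TABLE", 7_956, 20.6),
    ("CACHED", 1_995, 137.6),
    ("TABLE", 28, 13.4),
    ("CACHED", 1, 0.5),
    ("CACHED", 1, 0.5),
    ("TABLE", 696, 19.1),
]

total_items = sum(items for _, items, _ in RUNS)
total_time = sum(t for _, _, t in RUNS)
cached_times = [t for path, _, t in RUNS if path == "CACHED"]
table_times = [t for path, _, t in RUNS if path == "TABLE"]

print(total_items)                   # 18176
print(round(total_time, 1))          # 1600.0
print(len(cached_times))             # 6
print(round(mean(cached_times), 1))  # 46.9
print(round(mean(table_times), 1))   # 195.6
```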
Fallback: DHL Duty/Tax Invoice
This is the headline result. GLAIR04128331.cleaned.pdf is a 32-page DHL duty/tax invoice on which Docling's TableFormer model collapses multiple charge rows into single cells (e.g. "47.01 26.52 4.00"). The self-healing retry loop detects the problem but cannot fix it, because the extracted table data is already mangled. The text fallback reached 100.0% accuracy (GBP 8,392.51 extracted against a GBP 8,392.51 invoice total).
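One way to detect the collapsed-cell failure mode is to count money-like tokens per cell. A cell holding "47.01 26.52 4.00" contains three amounts where one is expected. This is a hypothetical heuristic, not the retry loop's actual check. It also illustrates why detection does not imply repair: the amounts can be pulled out, but nothing in the cell records which charge row each one belonged to.

```python
import re

# Hypothetical pattern for a money-like token such as "47.01" or "1,234.56".
AMOUNT = re.compile(r"-?\d+(?:,\d{3})*\.\d{2}\b")


def is_collapsed_cell(cell: str) -> bool:
    """True if a single table cell holds more than one monetary amount."""
    return len(AMOUNT.findall(cell)) > 1


def amounts_in_cell(cell: str) -> list[float]:
    """List the amounts in a cell; row attribution is lost once cells collapse."""
    return [float(tok.replace(",", "")) for tok in AMOUNT.findall(cell)]


print(is_collapsed_cell("47.01 26.52 4.00"))  # True
print(amounts_in_cell("47.01 26.52 4.00"))    # [47.01, 26.52, 4.0]
```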
Every item has a tracking number (363/363). The fallback also correctly identified the invoice date (2026-03-16), which the table path missed entirely.
Issues Found
CRITICAL: False Cache Hit on DPD 451806.I16806375
The DPD mapper created from invoice 116154 was applied to 451806.I16806375 and produced a total of GBP 16.8 million, which is obviously wrong. Mapper validation accepts any result that "produces items" but performs no sanity check on the resulting total.
The cached mapper's fingerprint matched on header structure, but the invoices have fundamentally different data layouts. The validation gate needs tightening.
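Here is a sketch of a tightened validation gate. It assumes the cached mapper's output exposes an item count and an extracted total. `MapperResult`, `accept_cached_result` and the GBP 50,000 DPD ceiling are illustrative placeholders, and real ceilings would need to come from historical invoices. The 100x factor follows the next-steps list. On rejection, the pipeline would fall through to a fresh LLM mapping.

```python
from dataclasses import dataclass

# Hypothetical per-courier ceiling for a single invoice total (GBP).
# Illustrative only: real values would come from historical invoices.
EXPECTED_MAX_GBP = {"DPD": 50_000.0}
IMPLAUSIBILITY_FACTOR = 100  # reject totals more than 100x the expected ceiling


@dataclass
class MapperResult:
    courier: str
    item_count: int
    extracted_total: float


def accept_cached_result(result: MapperResult) -> bool:
    """Accept a cache hit only if it produces items AND a plausible total."""
    if result.item_count == 0:
        return False  # mirrors the existing check: the mapper must produce items
    ceiling = EXPECTED_MAX_GBP.get(result.courier)
    if ceiling is None:
        return True  # no baseline for this courier, so plausibility cannot be judged
    return abs(result.extracted_total) <= ceiling * IMPLAUSIBILITY_FACTOR


# DPD 451806.I16806375: 1,995 items but a GBP 16.8M total, so the hit is rejected.
print(accept_cached_result(MapperResult("DPD", 1_995, 16_821_638.45)))  # False
```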
HIGH: DPD 116154 at 119% Accuracy
The largest invoice (120 pages, 125 tables) was overcounted by ~19% (GBP 35,183.44 extracted against a GBP 29,579.00 invoice total). The text fallback was triggered but produced a worse result (GBP 1,376), because the text layout does not preserve table alignment well on DPD's dense format. The table path therefore won by being "less wrong". Which tables cause the overcount still needs investigating.
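The "less wrong" choice can be expressed as picking the candidate whose extracted total deviates least from the invoice total. The relative-deviation metric is an assumption, since the report does not say how the comparison is scored. With the DPD 116154 figures, the table path deviates by about 19% and the fallback by about 95%.

```python
def pick_less_wrong(candidates: dict[str, float], invoice_total: float) -> str:
    """Return the path whose extracted total has the smallest relative
    deviation from the invoice total."""
    return min(candidates, key=lambda path: abs(candidates[path] / invoice_total - 1.0))


# DPD 116154 in this run: table path vs. text fallback against a GBP 29,579.00 total.
winner = pick_less_wrong({"TABLE": 35_183.44, "FALLBACK": 1_376.0}, 29_579.00)
print(winner)  # TABLE
```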
MEDIUM: No Invoice Total Detection for Most Invoices
Only 2 of 12 invoices had an invoice total detectable from summary tables. Without a total, there is no accuracy check, no retry feedback, and no fallback trigger. The detectInvoiceTotal function only scans small tables (≤10 rows), so it may be missing totals embedded in larger tables or presented in other formats.
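Here is a sketch of a broader total detector that scans every row of every table, not only tables of 10 rows or fewer. `detect_invoice_total` is a hypothetical Python stand-in and not the existing `detectInvoiceTotal`. The label patterns are assumptions that would need tuning against real DHL, DPD, Evri and UPS layouts.

```python
import re
from typing import Optional

# Hypothetical label patterns; real invoices may word their totals differently.
TOTAL_LABEL = re.compile(r"\b(?:invoice total|total due|amount due|grand total)\b", re.IGNORECASE)
AMOUNT = re.compile(r"-?\d+(?:,\d{3})*\.\d{2}\b")


def detect_invoice_total(tables: list[list[list[str]]]) -> Optional[float]:
    """Scan every row of every table, regardless of table size, for a labelled total.

    Returns the amount on the last matching row, assuming the grand total tends to
    follow any per-page or per-section totals.
    """
    found: Optional[float] = None
    for table in tables:
        for row in table:
            text = " ".join(row)
            if TOTAL_LABEL.search(text):
                amounts = AMOUNT.findall(text)
                if amounts:
                    found = float(amounts[-1].replace(",", ""))
    return found


# A 12-row table would be skipped by a <=10-row filter; here the total is still found.
rows = [["Parcel", "4.50"]] * 11 + [["Invoice Total", "29,579.00"]]
print(detect_invoice_total([rows]))  # 29579.0
```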
Processing Time Breakdown
Extraction dominates processing time for large PDFs: DPD 116154 spent 364s of its 903.5s in Docling alone. Cached transforms are instant (<30ms). When triggered, the text fallback adds ~80s of LLM time.
Cache Effectiveness
The DPD mapper created from 116154 was reused successfully by three other DPD invoices (3006995, 3029460, 3029461), which share its table structure, so their transforms were instant. It failed badly, however, on 451806.I16806375, which has a different layout despite coming from the same courier.
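The following example illustrates how a fingerprint can collide. A fingerprint built only from normalised header names cannot distinguish two tables that share headers but carry different kinds of data in each column. This sketch adds a per-column signature (amount, text or empty) taken from sample rows. The hashing scheme, the sampling idea and the example column names are all hypothetical. The report does not describe how the existing fingerprint is computed beyond "header structure".

```python
import hashlib
import re

AMOUNT = re.compile(r"^-?\d+(?:,\d{3})*\.\d{2}$")


def header_fingerprint(header: list[str]) -> str:
    """Header-only fingerprint: normalised column names, hashed."""
    normalised = "|".join(h.strip().lower() for h in header)
    return hashlib.sha256(normalised.encode()).hexdigest()[:16]


def cell_kind(cell: str) -> str:
    """Classify a cell as 'amount', 'text' or 'empty'."""
    cell = cell.strip()
    if not cell:
        return "empty"
    return "amount" if AMOUNT.match(cell) else "text"


def layout_fingerprint(header: list[str], sample_rows: list[list[str]]) -> str:
    """Header fingerprint plus the set of cell kinds seen in each column of the sample rows."""
    column_kinds = "|".join(
        ",".join(sorted({cell_kind(row[i]) for row in sample_rows if i < len(row)}))
        for i in range(len(header))
    )
    combined = header_fingerprint(header) + "#" + column_kinds
    return hashlib.sha256(combined.encode()).hexdigest()[:16]


header = ["Consignment", "Service", "Charge"]  # hypothetical column names
layout_a = [["123", "Next Day", "4.50"]]
layout_b = [["123", "4.50", "Next Day"]]
print(layout_fingerprint(header, layout_a) == layout_fingerprint(header, layout_b))  # False
```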
Next Steps
- Fix false cache hits: add a total-amount sanity check on cached mapper results. If the result is wildly implausible (e.g. >100x the expected range for the courier), reject the cache hit and fall through to the LLM
- Investigate the DPD 119% overcount: identify which of the 125 tables are contributing duplicate or summary rows
- Improve invoice total detection: scan more table formats, and look for totals in larger tables and page headers
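The overcount investigation could start from a triage sketch like this one. For each table, it sums the amounts on rows that look double-counted. A row counts as double-counted if its amount equals the running total of the rows before it in the same table (a likely subtotal), or if an identical (reference, amount) row already appeared in an earlier table (a likely repeated page). `suspicious_rows` and both rules are hypothetical heuristics. Two consecutive equal charges, for instance, would be misflagged as a subtotal, so the output is a shortlist of tables to inspect, not a correction.

```python
def suspicious_rows(tables: list[list[tuple[str, float]]]) -> dict[int, float]:
    """Per table index, the total GBP amount on rows that look double-counted.

    A row is flagged if its amount equals the running sum of the unflagged rows
    before it in the same table (likely a subtotal), or if the identical
    (reference, amount) pair appeared in an earlier table (likely a repeated page).
    """
    seen_in_earlier_tables: set[tuple[str, float]] = set()
    report: dict[int, float] = {}
    for idx, rows in enumerate(tables):
        running = 0.0
        flagged = 0.0
        for ref, amount in rows:
            is_subtotal = running > 0 and abs(amount - running) < 0.005
            is_duplicate = (ref, amount) in seen_in_earlier_tables
            if is_subtotal or is_duplicate:
                flagged += amount
            else:
                running += amount
        seen_in_earlier_tables.update(rows)
        if flagged:
            report[idx] = round(flagged, 2)
    return report


# Hypothetical rows: table 0 ends in a subtotal, table 1 repeats a row from table 0.
tables = [
    [("A1", 10.0), ("A2", 5.0), ("Subtotal", 15.0)],
    [("B1", 7.0), ("A1", 10.0)],
    [("C1", 3.0)],
]
print(suspicious_rows(tables))  # {0: 15.0, 1: 10.0}
```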