
ETL Pipeline — Full Invoice Test Run

By Patrick McCurley · Created Mar 17, 2026

Date: 2026-03-17 | Invoices: 12 | Couriers: DHL, DPD, Evri, UPS | Feature: Text Fallback Tier (TECH-569)

This report covers a full test run of every invoice in the POC dataset through the ETL pipeline. It includes the new PDF text fallback path, which bypasses Docling's table detection when the table path's accuracy against the invoice total is insufficient.

Pipeline Architecture
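The results table labels each invoice with one of three paths: CACHED (a previously generated mapper was reused), TABLE (a new mapper was generated by an LLM call over Docling's tables) and FALLBACK (the PDF text fallback added in TECH-569). Taken together with the issues described later, the routing between them can be sketched roughly as follows. This is a minimal sketch, not the pipeline's code. Every type and function name is hypothetical, the self-healing retry loop is left out, and the 1% tolerance is an assumed value.

```typescript
// All names below are hypothetical; the pipeline's real interfaces are not shown in this report.
type Path = "CACHED" | "TABLE" | "FALLBACK";

interface Extraction {
  items: number;          // line items produced
  extractedTotal: number; // sum of extracted amounts, GBP
}

interface Stages {
  tryCachedMapper(): Extraction | null; // null when no fingerprint matches
  runTablePath(): Extraction;           // new LLM-generated mapper over Docling's tables
  runTextFallback(): Extraction;        // LLM over the PDF text layer
  invoiceTotal: number | null;          // null when no invoice total was detected
}

// Assumed tolerance around 100% accuracy; the real threshold is not stated.
const TOLERANCE_PCT = 1;

// Distance from a perfect match, in percentage points.
function errorPct(e: Extraction, total: number): number {
  return Math.abs((e.extractedTotal / total) * 100 - 100);
}

function route(s: Stages): { path: Path; result: Extraction } {
  // Tier 1: reuse a cached mapper. Any result that produces items is accepted.
  const cached = s.tryCachedMapper();
  if (cached !== null && cached.items > 0) return { path: "CACHED", result: cached };

  // Tier 2: table path. Without a total there is no accuracy check and no fallback.
  const table = s.runTablePath();
  const total = s.invoiceTotal;
  if (total === null) return { path: "TABLE", result: table };
  const tableErr = errorPct(table, total);
  if (tableErr <= TOLERANCE_PCT) return { path: "TABLE", result: table };

  // Tier 3: text fallback. Keep whichever result is closer to the invoice total.
  const fallback = s.runTextFallback();
  return errorPct(fallback, total) < tableErr
    ? { path: "FALLBACK", result: fallback }
    : { path: "TABLE", result: table };
}
```

With the DPD 116154 figures (table path 118.9%, fallback about 4.7%), the final comparison keeps the table result, which matches what this run observed.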

Results — All 12 Invoices

# | Courier | File | Pages | Tables | Items | Extracted (GBP) | Invoice total (GBP) | Accuracy | Path | Time (s)
1 | DHL | GLAIR04128331.cleaned.pdf | 32 | 36 | 363 | 8,392.51 | 8,392.51 | 100.0% | FALLBACK | 340.7
2 | DHL | dhl-invoice-1.pdf | 2 | 4 | 10 | 184.76 | - | - | TABLE | 21.5
3 | DPD | 116154.I16806945.pdf | 120 | 125 | 5,235 | 35,183.44 | 29,579.00 | 118.9% | TABLE | 903.5
4 | DPD | 3006995.I61645007.pdf | 36 | 42 | 1,384 | 11,505.27 | - | - | CACHED | 103.0
5 | DPD | 3029460.I61853080.pdf | 9 | 11 | 199 | 1,557.94 | - | - | CACHED | 16.1
6 | DPD | 3029461.I61853081.pdf | 11 | 13 | 308 | 1,395.18 | - | - | CACHED | 23.5
7 | DPD | 451806.16785367.csv | - | 1 | 7,956 | 18,220.76 | - | - | TABLE | 20.6
8 | DPD | 451806.I16806375.pdf | 49 | 55 | 1,995 | 16,821,638.45 | - | - | CACHED | 137.6
9 | Evri | BAINV00302644_14_02_2026p.pdf | 3 | 5 | 28 | 12,741.23 | - | - | TABLE | 13.4
10 | Evri | BAINV00302995_18_02_2026p.pdf | 1 | 2 | 1 | 148.59 | - | - | CACHED | 0.5
11 | Evri | BAINV00291481_16_10_2025p.pdf | 1 | 2 | 1 | 247.48 | - | - | CACHED | 0.5
12 | UPS | Invoice_66321728_021326.csv | - | 1 | 696 | 30,566.86 | - | - | TABLE | 19.1

A dash means no value. CSV files have no pages, and accuracy is computed only when an invoice total was detected.

Aggregate Summary

Metric | Value
Total invoices processed | 12
Total line items extracted | 18,176
Total processing time | 1,600s (26.7 min)
Cache hits | 6 / 12 (50%)
New LLM calls (table path) | 5
Text fallback used | 1 (also triggered on DPD 116154, where the table result was kept)
Avg time, cached | 46.9s
Avg time, new LLM (table path) | 195.6s
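The aggregate figures can be recomputed from the per-invoice results. A short sketch using the row values from the results table follows. It makes one detail explicit: each average covers only its own path (six CACHED rows, five TABLE rows). The FALLBACK row (340.7s) counts toward the overall total but toward neither average. Accuracy is taken as extracted total divided by invoice total. The `Row` shape is an assumption made for illustration.

```typescript
type Path = "CACHED" | "TABLE" | "FALLBACK";

interface Row {
  n: number;             // row number in the results table
  items: number;         // line items extracted
  extracted: number;     // extracted total, GBP
  invoiceTotal?: number; // detected invoice total, GBP (absent when none was found)
  path: Path;
  seconds: number;       // processing time
}

// Values copied from the results table.
const rows: Row[] = [
  { n: 1, items: 363, extracted: 8392.51, invoiceTotal: 8392.51, path: "FALLBACK", seconds: 340.7 },
  { n: 2, items: 10, extracted: 184.76, path: "TABLE", seconds: 21.5 },
  { n: 3, items: 5235, extracted: 35183.44, invoiceTotal: 29579.0, path: "TABLE", seconds: 903.5 },
  { n: 4, items: 1384, extracted: 11505.27, path: "CACHED", seconds: 103.0 },
  { n: 5, items: 199, extracted: 1557.94, path: "CACHED", seconds: 16.1 },
  { n: 6, items: 308, extracted: 1395.18, path: "CACHED", seconds: 23.5 },
  { n: 7, items: 7956, extracted: 18220.76, path: "TABLE", seconds: 20.6 },
  { n: 8, items: 1995, extracted: 16821638.45, path: "CACHED", seconds: 137.6 },
  { n: 9, items: 28, extracted: 12741.23, path: "TABLE", seconds: 13.4 },
  { n: 10, items: 1, extracted: 148.59, path: "CACHED", seconds: 0.5 },
  { n: 11, items: 1, extracted: 247.48, path: "CACHED", seconds: 0.5 },
  { n: 12, items: 696, extracted: 30566.86, path: "TABLE", seconds: 19.1 },
];

const sum = (xs: number[]) => xs.reduce((a, b) => a + b, 0);
const avg = (xs: number[]) => sum(xs) / xs.length;
const secondsFor = (p: Path) => rows.filter(r => r.path === p).map(r => r.seconds);

// Accuracy is defined only where an invoice total was detected.
const accuracyPct = rows
  .filter(r => r.invoiceTotal !== undefined)
  .map(r => ({ n: r.n, pct: (r.extracted / r.invoiceTotal!) * 100 }));

const summary = {
  totalItems: sum(rows.map(r => r.items)),
  totalSeconds: sum(rows.map(r => r.seconds)),
  cacheHits: secondsFor("CACHED").length,
  avgCachedSeconds: avg(secondsFor("CACHED")),
  avgTableSeconds: avg(secondsFor("TABLE")),
};

console.log(summary, accuracyPct.map(a => `#${a.n}: ${a.pct.toFixed(1)}%`));
```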

Fallback: DHL Duty/Tax Invoice

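This is the headline result. GLAIR04128331.cleaned.pdf is a 32-page DHL duty/tax invoice on which Docling's TableFormer model collapses multiple charge rows into single cells (e.g. "47.01 26.52 4.00"). The self-healing retry loop detects the problem but cannot fix it, because the table data it receives from Docling is already mangled. The text fallback bypasses table detection and reconciled exactly: GBP 8,392.51 extracted against an invoice total of GBP 8,392.51 (100.0%).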
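Collapsed cells of the kind described for the DHL invoice (several amounts such as "47.01 26.52 4.00" in one cell) can be spotted mechanically. The following is a hedged sketch of such a check. It assumes cells arrive as strings and that amounts carry two decimal places. The regex, the function names and the 5% threshold are all illustrative assumptions, not the pipeline's actual detector.

```typescript
// Matches an amount such as 47.01, 4.00 or 1,234.50 (assumed two-decimal format).
const AMOUNT = /\d[\d,]*\.\d{2}/g;

// True when a single cell holds more than one amount, e.g. "47.01 26.52 4.00".
function isCollapsedCell(cell: string): boolean {
  return (cell.match(AMOUNT) ?? []).length > 1;
}

// Fraction of all cells in a table that look collapsed.
function collapsedRatio(rows: string[][]): number {
  const cells = rows.flat();
  if (cells.length === 0) return 0;
  return cells.filter(isCollapsedCell).length / cells.length;
}

// Hypothetical threshold above which table-path retries are unlikely to help.
const COLLAPSE_THRESHOLD = 0.05;

function tableLooksMangled(rows: string[][]): boolean {
  return collapsedRatio(rows) > COLLAPSE_THRESHOLD;
}
```

A table-level signal like this could justify going straight to the text fallback instead of spending retries. Retries cannot recover values that were merged before the mapper ever saw them.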

Every extracted item has a tracking number (363/363). The fallback also correctly identified the invoice date (2026-03-16), which the table path missed entirely.

Issues Found

CRITICAL: False Cache Hit on DPD 451806.I16806375

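The DPD mapper created from invoice 116154 was applied to 451806.I16806375 and produced GBP 16,821,638.45. That is obviously wrong: it works out at about GBP 8,432 per line item across 1,995 items, against roughly GBP 4.53 to 8.31 per item on the other cached DPD invoices. Mapper validation accepts any result that produces items, and it has no sanity check on the resulting total.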

The cached mapper's fingerprint matched on header structure, but the invoices have fundamentally different data layouts. The validation gate needs tightening.
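A header-based fingerprint can match two different layouts because it never looks at the data rows. A minimal sketch follows, assuming the cache key is built from the normalised header cells alone. The report says only that the fingerprint "matched on header structure", so the key scheme, the names and the sample tables here are hypothetical.

```typescript
// Hypothetical cache key: the normalised header cells joined into one string.
// Only the header row contributes; the data rows are never inspected.
function headerFingerprint(header: string[]): string {
  return header.map(h => h.trim().toLowerCase()).join("|");
}

interface Table {
  header: string[];
  rows: string[][];
}

// Illustrative tables with identical headers but different data layouts:
// in tableB the third column carries a reference number, not a charge.
const tableA: Table = {
  header: ["Consignment", "Service", "Amount"],
  rows: [["1001", "Next Day", "6.50"]],
};
const tableB: Table = {
  header: ["Consignment", "Service", "Amount"],
  rows: [["1002", "Next Day", "15512345"]],
};

const sameKey = headerFingerprint(tableA.header) === headerFingerprint(tableB.header);
console.log(sameKey); // true: tableB would be served the mapper built for tableA
```

Because nothing below the header feeds the key, a check on the mapped output, such as the total sanity check listed under Next Steps, has to catch this case.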

HIGH: DPD 116154 at 119% Accuracy

The largest invoice (120 pages, 125 tables) was overcounted by about 19%: GBP 35,183.44 was extracted against an invoice total of GBP 29,579.00. The text fallback was triggered but did worse. It extracted only GBP 1,376 (about 4.7% of the total), because the text layout does not preserve table alignment well in DPD's dense format. The table path won by being "less wrong". We still need to identify which tables are causing the overcount.
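One starting point for finding the tables behind the overcount is to tag each table's rows as either duplicates (already seen in an earlier table) or summary rows (labelled as totals). Each table can then be ranked by how much suspect value it contributes. The sketch below assumes a simplified row shape. The duplicate key (tracking number plus amount), the total-label regex and all names are hypothetical.

```typescript
interface ItemRow {
  tracking: string; // tracking / consignment number
  label: string;    // description cell
  amount: number;   // GBP
}

interface TableReport {
  index: number;           // position of the table in the document
  total: number;           // sum of all rows in this table
  duplicateAmount: number; // sum of rows already seen in an earlier table
  summaryAmount: number;   // sum of rows whose label looks like a total
}

// Assumed pattern for summary rows such as "Total", "Subtotal", "Page total".
const SUMMARY_LABEL = /\b(sub-?\s?total|total)\b/i;

function inspectTables(tables: ItemRow[][]): TableReport[] {
  const seen = new Set<string>();
  return tables.map((rows, index) => {
    let total = 0, duplicateAmount = 0, summaryAmount = 0;
    for (const row of rows) {
      total += row.amount;
      const key = `${row.tracking}|${row.amount.toFixed(2)}`;
      if (SUMMARY_LABEL.test(row.label)) summaryAmount += row.amount;
      else if (seen.has(key)) duplicateAmount += row.amount;
      seen.add(key);
    }
    return { index, total, duplicateAmount, summaryAmount };
  });
}

// Tables with the most suspect value are the first ones to inspect by hand.
function rankSuspects(reports: TableReport[]): TableReport[] {
  const suspect = (r: TableReport) => r.duplicateAmount + r.summaryAmount;
  return [...reports].sort((a, b) => suspect(b) - suspect(a));
}
```

Flagged rows are candidates only. Two genuine parcels could share a tracking number and amount, so the ranking points at where to look rather than what to delete.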

MEDIUM: No Invoice Total Detection for Most Invoices

Only 2 of 12 invoices (DHL GLAIR04128331 and DPD 116154) had a detectable invoice total from summary tables. Without a total, there is no accuracy check, no retry feedback and no fallback trigger. The detectInvoiceTotal function only scans small tables (10 rows or fewer), so it may miss totals embedded in larger tables or presented in other formats.
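A broader scan could search every table, whatever its size, for a row whose label reads like an invoice-level total and take the amount on that row. The sketch below is not the existing detectInvoiceTotal. The label patterns, the two-decimal amount format and the "last match wins" rule are all assumptions.

```typescript
// Labels that plausibly mark an invoice-level total (assumed; extend per courier).
const TOTAL_LABEL = /\b(invoice total|total due|amount due|grand total|total amount)\b/i;
// Amounts such as 29,579.00 (assumed two-decimal format; any currency symbol is ignored).
const AMOUNT = /\d[\d,]*\.\d{2}/g;

function parseAmount(s: string): number {
  return Number(s.replace(/,/g, ""));
}

// Scans all tables regardless of row count and returns the last amount found on a
// total-labelled row, assuming the grand total appears after any subtotals.
function findInvoiceTotal(tables: string[][][]): number | null {
  let found: number | null = null;
  for (const table of tables) {
    for (const row of table) {
      const text = row.join(" ");
      if (!TOTAL_LABEL.test(text)) continue;
      const amounts = (text.match(AMOUNT) ?? []).map(parseAmount);
      if (amounts.length > 0) found = amounts[amounts.length - 1];
    }
  }
  return found;
}
```

Taking the last matching row is a deliberately simple rule. A courier whose layout repeats a "Total amount" line per page would need a stricter pattern.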

Processing Time Breakdown

Extraction dominates for large PDFs (DPD 116154 spent 364s in Docling alone). Cached transforms are effectively instant (under 30ms), so the times reported for cached invoices are mostly extraction. The text fallback adds about 80s of LLM time when triggered.

Cache Effectiveness

The DPD mapper created from 116154 was reused successfully by three other DPD invoices (3006995, 3029460, 3029461). These share its table structure, so their transforms were instant. However, it failed badly on 451806.I16806375, which has a different layout despite coming from the same courier.

Next Steps

  1. Fix false cache hits: add a total-amount sanity check on cached mapper results. If the result is wildly implausible (e.g. more than 100x the expected range for the courier), reject the cache hit and fall through to the LLM.
  2. Investigate the DPD 119% overcount: identify which of the 125 tables contribute duplicate or summary rows.
  3. Improve invoice total detection: scan more table formats, and look for totals in larger tables and in page headers.
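Step 1 could take the form of a per-item plausibility gate. The gate compares a cached result's average value per line item with a baseline for the courier and rejects the cache hit when the two differ by more than 100x in either direction; the lower bound is an added assumption. The sketch assumes such a baseline exists. For illustration only, the DPD baseline is taken from 3006995 in the results table (GBP 11,505.27 over 1,384 items, about GBP 8.31 per item). All names are hypothetical.

```typescript
interface MapperResult {
  items: number;          // line items produced by the cached mapper
  extractedTotal: number; // GBP
}

// Illustrative baselines: GBP per line item from a known-good invoice per courier.
const BASELINE_GBP_PER_ITEM: { [courier: string]: number | undefined } = {
  DPD: 11505.27 / 1384, // DPD 3006995
};

// The "more than 100x the expected range" rule from step 1.
const MAX_FACTOR = 100;

// Returns false when the cache hit should be rejected and the LLM path used instead.
function acceptCachedResult(courier: string, r: MapperResult): boolean {
  if (r.items === 0) return false; // existing gate: the mapper must produce items
  const baseline = BASELINE_GBP_PER_ITEM[courier];
  if (baseline === undefined) return true; // no baseline yet: keep current behaviour
  const ratio = r.extractedTotal / r.items / baseline;
  return ratio <= MAX_FACTOR && ratio >= 1 / MAX_FACTOR;
}
```

On this run's numbers, 451806.I16806375 averages about GBP 8,432 per item, roughly 1,000x the baseline, so it would be rejected. 3029460 and 3029461 (about GBP 7.83 and GBP 4.53 per item) would pass.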