# Vision Pipeline Findings — Root Cause & Fix
## The Root Cause
The 300-460% overcount was NOT model hallucination. It was summary pages being extracted as line items.
### What Was Happening
The 36-page DPD invoice has three types of pages:
| Pages | Type | Content | Count |
|---|---|---|---|
| 2-32 | Line items | Individual shipment charges (£4-25 each) | ~1,400 rows |
| 1 | Summary | Invoice overview: "Consignments £29,579" | Totals only |
| 33-35 | Summary | Surcharge breakdown, VAT analysis, payment summary | Aggregate totals |
| 36 | Empty | Blank page | Nothing |
When all pages were sent to the vision model, pages 1 and 33-35 produced items like:

- "Fuel and Energy Charge" → totalAmount: £1,385.32 (surcharge TOTAL, not a line item)
- "Consignments" → totalAmount: £29,579.00 (invoice TOTAL)
- "VAT" → totalAmount: £7,540.06 (VAT TOTAL)

These aggregate amounts were summed alongside the real per-shipment charges (£4.42, £5.20, etc.), inflating the total by 3-5×.
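The arithmetic of the inflation can be sketched in a few lines. The figures below are illustrative, chosen to match the ~£11.8K actual total and the 3-5× blow-up; a flat £8.38 average stands in for the real per-shipment charges:

```python
# Illustrative only: the three summary rows are the aggregate values quoted
# above; the line items are a flat stand-in for ~1,407 real shipment charges.
line_items = [8.38] * 1407                     # real per-shipment charges
summary_rows = [29579.00, 1385.32, 7540.06]    # invoice total, fuel total, VAT total

correct = sum(line_items)                      # what the pipeline should report
inflated = correct + sum(summary_rows)         # what it reported before the fix
print(f"correct ~ £{correct:,.0f}, inflated ~ £{inflated:,.0f} "
      f"({inflated / correct:.1f}x)")
```

Just three aggregate rows are enough to turn an ~£11.8K total into ~£50K.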
## The Fix: LLM Page Classification
One cheap LLM call (~$0.001) classifies each distinct table layout as "line_items" or "summary":
### What the LLM receives
```
I have a courier invoice PDF with 36 pages. Here are the distinct table layouts:

Layout 1 (pages 2-32, 1420 rows):
  Headers: Collection Date | Consignment Number | Reference | ...
Layout 2 (page 1, 11 rows):
  Headers: Current Charges | Invoice Number 61645007
Layout 3 (page 33, 4 rows):
  Headers: Code | Description | Surcharge Rate Code | ...
Layout 4 (page 34, 2 rows):
  Headers: Number | Carriage | Miscellaneous Charges | ...
Layout 5 (page 35, 2 rows):
  Headers: Payment Reference | Document Type | ...

Classify each as "line_items" or "summary".
```

### What it returns
```json
[
  {"layout": 1, "type": "line_items"},
  {"layout": 2, "type": "summary"},
  {"layout": 3, "type": "summary"},
  {"layout": 4, "type": "summary"},
  {"layout": 5, "type": "summary"}
]
```

Pages 1, 33, 34, and 35 are excluded; the 32 remaining pages are re-batched into 16 batches of 2.
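A minimal sketch of this classification-and-rebatch step, assuming hypothetical helpers: `call_llm(prompt) -> str` for the cheap LLM call, and a `layouts` list carrying each layout's id, headers, and page numbers. None of these names come from the actual pipeline:

```python
import json

def classify_layouts(layouts, call_llm):
    """Build the classification prompt, parse the JSON reply, and return
    the set of page numbers that belong to line-item layouts."""
    prompt = "I have a courier invoice PDF. Here are the distinct table layouts:\n"
    for lay in layouts:
        prompt += f'Layout {lay["id"]} (pages {lay["pages"]}, {lay["rows"]} rows):\n'
        prompt += f'  Headers: {lay["headers"]}\n'
    prompt += 'Classify each as "line_items" or "summary". Reply with JSON only.'

    labels = {c["layout"]: c["type"] for c in json.loads(call_llm(prompt))}
    keep = set()
    for lay in layouts:
        if labels[lay["id"]] == "line_items":
            keep.update(lay["page_list"])
    return keep

def rebatch(pages, size=2):
    """Group the surviving data pages into fixed-size extraction batches."""
    pages = sorted(pages)
    return [pages[i:i + size] for i in range(0, len(pages), size)]
```

The key design point is that classification runs once per distinct *layout*, not per page, so a 36-page invoice costs one tiny text-only call rather than 36 vision calls.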
## Updated Vision Pipeline Architecture
## Results After Fix
| Metric | Before (all pages) | After (data pages only) |
|---|---|---|
| Batches processed | 18 | 16 |
| Pages excluded | 0 | 4 (pages 1, 33, 34, 35) |
| Total items | 1,432 | 1,407 |
| Total amount | £76,335 | £11,796 |
| Extracted total vs. actual | 467% | 86.4% |
| Per-batch amounts | ✗ 3 batches wildly wrong | ✓ All batches correct |
### Per-Batch Breakdown (after fix)
All batches now show reasonable per-item averages:
| Pages | Items | Total | Avg/item |
|---|---|---|---|
| 2-3 | 86 | £1,414 | £16.44 (By 10:30 service) |
| 4-5 | 92 | £908 | £9.87 |
| 6-7 | 92 | £613 | £6.66 |
| 8-9 | 92 | £588 | £6.39 |
| 10-29 | 920 | ~£6,500 | ~£7 |
| 30-31 | 86 | £793 | £9.22 (offshore mix) |
| 32+36 | 39 | £811 | £20.79 (offshore/tail) |
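The per-item averages above also double as a cheap guardrail: a batch whose average item amount falls outside the plausible per-shipment range is almost certainly pulling in aggregate rows. A sketch, where the £4-25 bounds come from this invoice's line-item pages and the function and field names are illustrative, not part of the pipeline:

```python
# Hypothetical post-extraction sanity check: flag batches whose per-item
# average falls outside the plausible per-shipment range (£4-25 here).
def suspicious_batches(batches, lo=4.0, hi=25.0):
    flagged = []
    for b in batches:
        avg = b["total"] / b["items"] if b["items"] else 0.0
        if not lo <= avg <= hi:
            flagged.append(b["pages"])
    return flagged
```

Run against the table above, a healthy batch like pages 2-3 (£1,414 / 86 items ≈ £16.44) passes, while a pre-fix summary batch (£29,579 across 11 rows) is flagged immediately.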
## Remaining Gap: 86% → 100%

The 14% undercount is NOT from wrong amounts — it comes from the model extracting fewer items than exist:

- Early stopping — some batches return 50 items when 92 are visible. The model decides it's "done" mid-page, finishing with `finish_reason: stop` at only 6,606 tokens.
- Missing offshore/ROI items — pages 31-32 have different service types (offshore surcharges) that the model sometimes skips.
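One mitigation worth sketching for the early-stopping case: compare the extracted item count against the rows known to exist on the batch's pages (the layout scan already counts them) and re-run undersized batches. `extract_batch` and `expected_rows` are hypothetical stand-ins for the real pipeline calls:

```python
# Hypothetical retry guard: `extract_batch(batch)` runs the vision extraction,
# `expected_rows(batch)` returns the row count the layout scan found for the
# batch's pages. Neither is a real pipeline function.
def extract_with_retry(batch, extract_batch, expected_rows,
                       max_retries=2, min_ratio=0.9):
    items = extract_batch(batch)
    for _ in range(max_retries):
        if len(items) >= min_ratio * expected_rows(batch):
            break  # count looks complete; accept this extraction
        items = extract_batch(batch)  # undersized: model stopped early, re-run
    return items
```

This trades a small number of duplicate calls for completeness; it helps with early stopping but not with systematically skipped item types, which remain a model-capability limit.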
These are model-capability issues, not architectural problems. Sonnet 4.6 doesn't exhibit them (it has already demonstrated 100% accuracy on this invoice). The architecture is now sound — the question is whether to accept 86% from GPT-5.4-mini or spend $3.45 on Sonnet.
## Summary
The "model hallucination" was actually a page classification bug. The model extracted amounts correctly — but summary/total pages containing aggregate values (£29K, £48K) were processed alongside individual shipment rows (£4-25), inflating the total by 4-5×.
The fix was one cheap LLM call (~$0.001) to classify which page layouts are summaries vs line items, then excluding summary pages from extraction batches. Total added cost: negligible.