Invoice Processing — Temporal Architecture (Increment 1)

By Patrick McCurley · Created Mar 23, 2026

This document describes the production architecture for Claimit's invoice processing pipeline, migrated from a monolithic Express route handler to Temporal workflow orchestration. Increment 1 covers infrastructure setup and the extraction phase.

System Overview

The invoice pipeline extracts structured line items from courier invoices (PDF/CSV) using a multi-tier accuracy strategy. Temporal orchestrates the pipeline, replacing hand-rolled retry loops, health checks, and SSE streaming with durable workflow execution.

What Changed: POC → Temporal

The POC was a single 3,600-line Express route handler (parse.routes.ts) that orchestrated everything inline. The migration breaks it into discrete, independently retryable activities coordinated by a durable workflow.

Project Structure

Three npm workspaces. The new worker/ workspace contains the Temporal worker, workflow, activities, and services.

Docker Compose Topology

Five services across two networks. The worker bridges both — reaching Docling on invoice-poc and the Temporal server on temporal-network.

Extraction Phase — Sequence

The extract activity downloads the invoice file, calls Docling in 10-page chunks with heartbeats between each, and returns structured tables. This is the only activity implemented in Increment 1.
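The chunk-and-heartbeat loop described above can be sketched roughly as follows. This is an illustration under assumptions, not the contents of worker/src/activities/extract.ts: `pageChunks`, `extractInChunks`, `ChunkExtractor`, and the `Table` shape are hypothetical names, and the heartbeat is injected as a plain callback so the sketch runs without the Temporal SDK. In the actual activity that callback would presumably be `heartbeat` from `@temporalio/activity`.

```typescript
// Hypothetical shapes; the real types live in worker/src/types/index.ts.
interface Table {
  page: number;
  rows: string[][];
}

interface PageRange {
  start: number; // 1-based, inclusive
  end: number; // 1-based, inclusive
}

type ChunkExtractor = (range: PageRange) => Promise<Table[]>;
type Heartbeat = (details: { chunksDone: number; chunksTotal: number }) => void;

// Split a document into consecutive ranges of at most `chunkSize` pages.
function pageChunks(totalPages: number, chunkSize: number = 10): PageRange[] {
  const ranges: PageRange[] = [];
  for (let start = 1; start <= totalPages; start += chunkSize) {
    ranges.push({ start, end: Math.min(start + chunkSize - 1, totalPages) });
  }
  return ranges;
}

// Extract one chunk at a time and report liveness after each chunk, so a
// stalled Docling call surfaces as a missed heartbeat rather than a hang.
async function extractInChunks(
  totalPages: number,
  extractChunk: ChunkExtractor,
  heartbeat: Heartbeat,
  chunkSize: number = 10,
): Promise<Table[]> {
  const ranges = pageChunks(totalPages, chunkSize);
  const tables: Table[] = [];
  let chunksDone = 0;
  for (const range of ranges) {
    const chunkTables = await extractChunk(range);
    tables.push(...chunkTables);
    chunksDone += 1;
    heartbeat({ chunksDone, chunksTotal: ranges.length });
  }
  return tables;
}
```

With this split, a 25-page invoice yields three Docling calls (pages 1-10, 11-20, 21-25) and three heartbeats. Because a heartbeat is only sent after a chunk completes, each chunk has to finish within the heartbeat timeout.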

Activity Configuration

| Setting | Value | Rationale |
| --- | --- | --- |
| startToCloseTimeout | 30 minutes | Large PDFs (100+ pages) can take 20+ minutes with chunking |
| heartbeatTimeout | 60 seconds | Detects a stuck Docling call between chunks |
| retry.maximumAttempts | 3 | Docling may be restarting (Docker OOM); retry after backoff |
| retry.initialInterval | 10 seconds | Gives Docker time to restart the Docling container |
| retry.backoffCoefficient | 2 | Exponential: 10s, then 20s (three attempts allow only two retries; 40s would apply before a fourth attempt) |
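To make the retry arithmetic concrete, the sketch below mirrors the settings as a plain object and computes the waits between attempts. It assumes Temporal's convention that `maximumAttempts` counts the initial attempt. The names `extractActivityOptions`, `RetrySettings`, and `retryDelaysMs` are hypothetical, and durations are plain milliseconds rather than SDK duration values so the block runs on its own. It also ignores any maximum-interval cap.

```typescript
// Plain-object mirror of the activity settings (hypothetical names, milliseconds).
interface RetrySettings {
  maximumAttempts: number;
  initialIntervalMs: number;
  backoffCoefficient: number;
}

const extractActivityOptions = {
  startToCloseTimeoutMs: 30 * 60 * 1000, // 30 minutes
  heartbeatTimeoutMs: 60 * 1000, // 60 seconds
  retry: {
    maximumAttempts: 3,
    initialIntervalMs: 10000, // 10 seconds
    backoffCoefficient: 2,
  },
};

// Waits between consecutive attempts. The first attempt counts toward
// maximumAttempts, so there are maximumAttempts - 1 waits in total.
function retryDelaysMs(retry: RetrySettings): number[] {
  const delays: number[] = [];
  for (let retryIndex = 0; retryIndex < retry.maximumAttempts - 1; retryIndex++) {
    delays.push(retry.initialIntervalMs * Math.pow(retry.backoffCoefficient, retryIndex));
  }
  return delays;
}

const delays = retryDelaysMs(extractActivityOptions.retry);
console.log(delays.map((ms) => `${ms / 1000}s`).join(", "));
```

Under these assumptions, three attempts give two waits (10s and 20s), so a Docling container that needs longer than about 30 seconds of cumulative backoff to come back would exhaust the policy. A 40s wait only appears if `maximumAttempts` is raised to 4.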

Frontend Progress Polling

The workflow exposes a getProgress query. The frontend polls the Express BFF every 2 to 3 seconds, and the BFF proxies each request to Temporal as a query. This replaces SSE streaming: there are no long-lived connections, the BFF stays stateless, and polling keeps working across container restarts.
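A minimal sketch of the frontend side of this polling loop is shown below. The `Progress` shape, the phase names, and the helpers `isTerminal`, `percentDone`, and `pollProgress` are assumptions for illustration; the real progress type lives in worker/src/types/index.ts. The fetch function is injected, so the sketch does not assume any particular BFF route. On the workflow side, the query would typically be declared with `defineQuery` and registered with `setHandler` from `@temporalio/workflow`.

```typescript
// Hypothetical progress shape returned by the getProgress query.
type Phase = "queued" | "extracting" | "completed" | "failed";

interface Progress {
  phase: Phase;
  chunksDone: number;
  chunksTotal: number;
}

// Polling stops once the workflow has reached a final phase.
function isTerminal(progress: Progress): boolean {
  return progress.phase === "completed" || progress.phase === "failed";
}

// Whole-number percentage for a progress bar; 0 until the chunk count is known.
function percentDone(progress: Progress): number {
  if (progress.chunksTotal <= 0) return 0;
  return Math.round((100 * progress.chunksDone) / progress.chunksTotal);
}

// Poll until a terminal phase is seen. Each request is independent, so a
// restarted BFF container only costs a failed poll, not a lost stream.
async function pollProgress(
  fetchProgress: () => Promise<Progress>,
  onUpdate: (progress: Progress) => void,
  intervalMs: number = 2500,
): Promise<Progress> {
  for (;;) {
    const progress = await fetchProgress();
    onUpdate(progress);
    if (isTerminal(progress)) return progress;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A caller would pass a function that requests the BFF's progress endpoint and parses the JSON body. Error handling for a failed poll (for example, retrying on the next tick) is left out of the sketch.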

Full Pipeline Phases (Planned)

Increment 1 implements extraction. The remaining phases will be added in subsequent increments — each is a separate Temporal activity with its own retry policy and timeout.

Key Design Decisions

Docling as HTTP Sidecar (not Activity)

Docling loads ML models on startup (~30s). Making it an activity would mean either loading models per execution (30s overhead per PDF) or keeping a persistent process — which is exactly what the HTTP sidecar already is. The sidecar pattern gives us: persistent warm models, independent memory lifecycle (Docker restarts on OOM), and clean language boundary (Python ML / TypeScript orchestration).

Node.js Worker (not C# port)

The pipeline logic is 3,600 lines of TypeScript. Porting to C# would be a massive effort with high regression risk. By using the Node.js Temporal SDK, we keep the existing TypeScript code, extract it into activities, and get Temporal's orchestration benefits without rewriting business logic.

Separate Worker Workspace

The worker runs in its own npm workspace (worker/) rather than sharing backend/. This gives clean separation: the Express BFF only needs @temporalio/client (lightweight), while the worker needs @temporalio/worker + @temporalio/workflow + @temporalio/activity (heavier, includes native binaries). Different deployment lifecycle, different scaling profile.

Files Created in Increment 1

| File | Lines | Purpose |
| --- | --- | --- |
| worker/package.json | 28 | Temporal SDK dependencies |
| worker/tsconfig.json | 16 | TypeScript config for Node ESM |
| worker/src/types/index.ts | 118 | All shared types: DoclingTable, LineItem, workflow I/O, progress |
| worker/src/services/docling-client.ts | 131 | Typed HTTP client for all Docling endpoints + chunked extraction |
| worker/src/activities/extract.ts | 75 | Extract activity: download → chunk → heartbeat → return tables |
| worker/src/workflows/invoice-processing.ts | 97 | Skeleton workflow: extract phase + getProgress query |
| worker/src/worker.ts | 33 | Worker entrypoint: connect, register, run |
| docker/worker-entrypoint.sh | 5 | Container entrypoint |