Invoice Processing — Temporal Architecture (Increment 1)
This document describes the production architecture for Claimit's invoice processing pipeline, migrated from a monolithic Express route handler to Temporal workflow orchestration. Increment 1 covers infrastructure setup and the extraction phase.
System Overview
The invoice pipeline extracts structured line items from courier invoices (PDF/CSV) using a multi-tier accuracy strategy. Temporal orchestrates the pipeline, replacing hand-rolled retry loops, health checks, and SSE streaming with durable workflow execution.
What Changed: POC → Temporal
The POC was a single 3,600-line Express route handler (parse.routes.ts) that orchestrated everything inline. The migration breaks it into discrete, independently retryable activities coordinated by a durable workflow.
Project Structure
The repository is organized as three npm workspaces. The new worker/ workspace contains the Temporal worker, workflow, activities, and services.
Docker Compose Topology
The Compose stack runs five services across two networks. The worker joins both: it reaches Docling on invoice-poc and the Temporal server on temporal-network.
Extraction Phase — Sequence
The extract activity downloads the invoice file, sends it to Docling in 10-page chunks, records a heartbeat after each chunk, and returns the structured tables. This is the only activity implemented in Increment 1.
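The chunk-and-heartbeat loop described above might look roughly like the following TypeScript sketch. It is not the contents of worker/src/activities/extract.ts: `PageRange`, `pageChunks`, `ExtractDeps`, and `extractAllTables` are hypothetical names, and both the Docling call and the heartbeat are injected as functions so the example runs without a Temporal worker. In a real activity the `heartbeat` dependency would typically be the `heartbeat` function exported by `@temporalio/activity`.

```typescript
// Inclusive, 1-based page range handled by one Docling call.
interface PageRange {
  start: number;
  end: number;
}

// Split a document into consecutive ranges of at most `chunkSize` pages.
function pageChunks(totalPages: number, chunkSize = 10): PageRange[] {
  const ranges: PageRange[] = [];
  for (let start = 1; start <= totalPages; start += chunkSize) {
    ranges.push({ start, end: Math.min(start + chunkSize - 1, totalPages) });
  }
  return ranges;
}

// Hypothetical dependencies, injected so the loop can be tested in isolation.
interface ExtractDeps<T> {
  // One HTTP call to Docling for a single page range (assumed shape).
  extractChunk: (range: PageRange) => Promise<T[]>;
  // Progress signal; in a Temporal activity this would wrap heartbeat().
  heartbeat: (details: { done: number; total: number }) => void;
}

// Process chunks sequentially and collect every table that comes back.
async function extractAllTables<T>(
  totalPages: number,
  deps: ExtractDeps<T>,
): Promise<T[]> {
  const chunks = pageChunks(totalPages);
  const tables: T[] = [];
  for (let i = 0; i < chunks.length; i++) {
    tables.push(...(await deps.extractChunk(chunks[i])));
    // Heartbeat after every chunk, so a chunk that hangs longer than
    // heartbeatTimeout causes the activity attempt to time out.
    deps.heartbeat({ done: i + 1, total: chunks.length });
  }
  return tables;
}
```

For a 25-page PDF, `pageChunks(25)` yields the ranges 1-10, 11-20, and 21-25, so this sketch would make three Docling calls and send three heartbeats.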
Activity Configuration
| Setting | Value | Rationale |
|---|---|---|
| startToCloseTimeout | 30 minutes | Large PDFs (100+ pages) can take 20+ minutes with chunking |
| heartbeatTimeout | 60 seconds | Detects stuck Docling between chunks |
| retry.maximumAttempts | 3 | Docling may be restarting (Docker OOM); retry after backoff |
| retry.initialInterval | 10 seconds | Give Docker time to restart the Docling container |
| retry.backoffCoefficient | 2 | Exponential: 10s → 20s between the three attempts (a fourth attempt would wait 40s) |
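To make the backoff arithmetic concrete, here is a small sketch that derives the wait before each retry from the settings above. `RetrySettings`, `extractRetry`, and `retryWaits` are illustrative names, not Temporal APIs, and the sketch ignores Temporal's `maximumInterval` cap, which is not reached at these values. In Temporal, `maximumAttempts` counts the initial attempt, so three attempts produce two waits; the 40s step would apply only if a fourth attempt were allowed.

```typescript
// Local mirror of the retry settings in the table (durations in seconds).
interface RetrySettings {
  maximumAttempts: number;
  initialIntervalSec: number;
  backoffCoefficient: number;
}

const extractRetry: RetrySettings = {
  maximumAttempts: 3,
  initialIntervalSec: 10,
  backoffCoefficient: 2,
};

// Waits before each retry. maximumAttempts includes the first attempt,
// so N attempts produce N - 1 waits.
function retryWaits(policy: RetrySettings): number[] {
  const waits: number[] = [];
  for (let retry = 0; retry < policy.maximumAttempts - 1; retry++) {
    waits.push(
      policy.initialIntervalSec * Math.pow(policy.backoffCoefficient, retry),
    );
  }
  return waits;
}

console.log(retryWaits(extractRetry)); // two waits: 10 and 20 seconds
```

In the workflow, these values would normally be supplied as activity options, for example through `proxyActivities` from `@temporalio/workflow`, which accepts duration strings such as `'30 minutes'`.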
Frontend Progress Polling
The workflow exposes a getProgress query. The frontend polls the Express BFF every 2-3 seconds, and the BFF proxies each request to Temporal as a query. This replaces SSE streaming: there are no long-lived connections, the BFF stays stateless, and polling keeps working across container restarts.
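On the frontend, the polling loop could be as small as the sketch below. The `Progress` shape, `pollProgress`, the `maxPolls` limit, and the 2.5-second default are assumptions for illustration; the actual progress type is defined in the worker's shared types. The fetch function is injected so the loop does not depend on a particular BFF route.

```typescript
// Hypothetical progress shape returned by the BFF for the getProgress query.
interface Progress {
  phase: string;
  percent: number;
  done: boolean;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the workflow reports completion, or give up after maxPolls.
async function pollProgress(
  fetchProgress: () => Promise<Progress>,
  onUpdate: (progress: Progress) => void,
  intervalMs = 2500,
  maxPolls = 1000,
): Promise<Progress> {
  for (let i = 0; i < maxPolls; i++) {
    // Each poll is an independent request, so no state is held between
    // calls and a restarted BFF container is simply hit on the next poll.
    const progress = await fetchProgress();
    onUpdate(progress);
    if (progress.done) return progress;
    await sleep(intervalMs);
  }
  throw new Error("no completion after " + maxPolls + " polls");
}
```

A caller would pass a function that requests the BFF's progress route (whatever path the BFF actually exposes) and a callback that updates the UI. The loop ends as soon as a response reports `done`.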
Full Pipeline Phases (Planned)
Increment 1 implements extraction. The remaining phases will be added in subsequent increments — each is a separate Temporal activity with its own retry policy and timeout.
Key Design Decisions
Docling as HTTP Sidecar (not Activity)
Docling loads ML models on startup (~30s). Making it an activity would mean either loading the models on every execution (30s of overhead per PDF) or keeping a persistent process, which is exactly what the HTTP sidecar already is. The sidecar pattern gives us persistent warm models, an independent memory lifecycle (Docker restarts the container on OOM), and a clean language boundary (Python ML / TypeScript orchestration).
Node.js Worker (not C# port)
The pipeline logic is 3,600 lines of TypeScript. Porting it to C# would mean rewriting all of it, with a high risk of regressions. With the Node.js Temporal SDK, we keep the existing TypeScript code, extract it into activities, and gain Temporal's orchestration benefits without rewriting business logic.
Separate Worker Workspace
The worker runs in its own npm workspace (worker/) rather than sharing backend/. This gives clean separation: the Express BFF only needs @temporalio/client (lightweight), while the worker needs @temporalio/worker, @temporalio/workflow, and @temporalio/activity (heavier, including native binaries). The two also have different deployment lifecycles and scaling profiles.
Files Created in Increment 1
| File | Lines | Purpose |
|---|---|---|
| worker/package.json | 28 | Temporal SDK dependencies |
| worker/tsconfig.json | 16 | TypeScript config for Node ESM |
| worker/src/types/index.ts | 118 | All shared types: DoclingTable, LineItem, workflow I/O, progress |
| worker/src/services/docling-client.ts | 131 | Typed HTTP client for all Docling endpoints + chunked extraction |
| worker/src/activities/extract.ts | 75 | Extract activity: download → chunk → heartbeat → return tables |
| worker/src/workflows/invoice-processing.ts | 97 | Skeleton workflow: extract phase + getProgress query |
| worker/src/worker.ts | 33 | Worker entrypoint: connect, register, run |
| docker/worker-entrypoint.sh | 5 | Container entrypoint |