Bug Investigation: Intermittent 502s on /api/checkout
Severity: P1 — ~8% of checkout requests returned 502 errors over an 18-hour window. Root cause: PostgreSQL connection pool leak in the inventory reservation path. Estimated revenue impact: ~$47,000.
Incident Timeline
Observable Symptoms
All four symptoms were correlated — the connection pool leak was the single upstream cause, with Redis timeouts being a downstream effect of request queue backpressure.
Root Cause Chain
The Buggy Code
The reserveInventory() function acquires a connection from the pool but fails to release it when an error occurs in the catch block:
```javascript
async function reserveInventory(orderId, items) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    for (const item of items) {
      const res = await client.query(
        'UPDATE inventory SET reserved = reserved + $1 WHERE sku = $2 AND available >= $1 RETURNING *',
        [item.quantity, item.sku]
      );
      if (res.rowCount === 0) {
        throw new Error(`Insufficient inventory for SKU ${item.sku}`);
      }
    }
    await client.query('COMMIT');
    client.release(); // ✓ released on success
  } catch (err) {
    await client.query('ROLLBACK');
    // ✗ BUG: client.release() is never called here
    // → the connection stays checked out and is never returned to the pool
    throw err;
  }
}
```

Every time an inventory reservation fails (out of stock, constraint violation, or any other error), a pool connection is permanently leaked. Once all 20 connections are consumed, the entire checkout path goes down.
The Fix
The fix moves client.release() into a finally block, guaranteeing the connection is returned regardless of outcome:
```javascript
async function reserveInventory(orderId, items) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    for (const item of items) {
      const res = await client.query(
        'UPDATE inventory SET reserved = reserved + $1 WHERE sku = $2 AND available >= $1 RETURNING *',
        [item.quantity, item.sku]
      );
      if (res.rowCount === 0) {
        throw new Error(`Insufficient inventory for SKU ${item.sku}`);
      }
    }
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release(); // ✓ always released
  }
}
```

This is a one-line fix, but it eliminates the entire failure cascade.
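A follow-up worth considering: the acquire/begin/commit/rollback/release pattern can be centralized so no individual call site can forget the `finally`. The helper below is a sketch, not code from this incident; `pool` is assumed to expose the node-postgres `Pool` API (`connect()` returning a client with `query()` and `release()`):

```javascript
// Hypothetical helper: wraps a callback in a transaction and guarantees
// the client is returned to the pool on every path.
async function withTransaction(pool, fn) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    const result = await fn(client); // caller's queries run here
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err; // propagate after cleanup
  } finally {
    client.release(); // always returned to the pool
  }
}
```

With this in place, `reserveInventory` shrinks to `withTransaction(pool, async (client) => { /* UPDATE loop */ })`, and the release invariant lives in exactly one function.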
Impact Summary
Takeaways
- Always use `finally` for resource cleanup: `try/catch` alone is not sufficient when acquiring pooled resources like database connections
- Set pool timeouts: `pg` supports `idleTimeoutMillis` and `connectionTimeoutMillis`. Note that `idleTimeoutMillis` only reclaims idle clients sitting in the pool, not leaked (checked-out) clients; a connection-acquisition timeout would at least have turned indefinite hangs into fast failures, limiting blast radius
- Add connection pool metrics: pool utilization should be a first-class dashboard metric with alerts at 80% capacity
- Reproduce in staging: the leak only manifests under inventory-failure conditions, which weren't covered by the existing load test suite
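The timeout and metrics takeaways can be combined into a small configuration sketch. This assumes node-postgres, whose `Pool` exposes `totalCount`, `idleCount`, and `waitingCount` directly; the specific values and logging hook are illustrative, not taken from this incident:

```javascript
// Configuration sketch — values are illustrative.
const { Pool } = require('pg');

const pool = new Pool({
  max: 20,                        // matches the incident's pool size
  idleTimeoutMillis: 30_000,      // close idle pooled clients after 30s
  connectionTimeoutMillis: 5_000, // fail fast instead of queueing forever
});

// Minimal pool-utilization metric using pg's built-in counters.
setInterval(() => {
  const used = pool.totalCount - pool.idleCount;
  console.log(`pool: used=${used}/${pool.options.max} waiting=${pool.waitingCount}`);
  if (used / pool.options.max >= 0.8) {
    // hook your alerting system here (80% threshold from the takeaway above)
    console.warn('pool utilization above 80%');
  }
}, 10_000);
```

Exporting these three counters to a dashboard would have made this leak visible as a steadily climbing `used` count long before the pool was exhausted.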