Closed-Loop AI Development: Self-Verifying Agents with VictoriaLogs, Docker & Playwright
Modern AI coding agents don't just write code — they can verify their own work. By combining centralized logging, browser automation, and isolated local infrastructure, we've built a development environment where AI agents operate in a closed loop: write code, run it, observe the results, and fix issues — all without touching production.
This guide walks through how Claimit achieves this using VictoriaLogs, Docker Compose, Playwright, and a CLI that safely imports production data into a local sandbox.
The Closed-Loop Concept
Traditional AI coding assistance is open-loop: the agent writes code and hopes it works. A closed-loop system gives the agent feedback channels to observe the consequences of its changes in real time.
The key insight: every feedback channel the agent needs — logs, test results, workflow state, database queries — is available locally through Docker containers and CLI tools. The agent never needs to touch production to verify its work.
VictoriaLogs: The AI Feedback Channel
VictoriaLogs is a high-performance log storage engine that serves as the central nervous system for local development observability. Every application running locally ships its logs to VictoriaLogs, and AI agents can query those logs in real time through a dedicated MCP (Model Context Protocol) server.
How Logs Flow
The Serilog Pipeline
Applications use Serilog with two custom formatters to ship logs to VictoriaLogs:
VictoriaLogsJsonFormatter converts each Serilog event into a JSON object with VictoriaLogs-specific fields:
```json
{
  "_time": "2026-03-17T14:30:00.000Z",
  "_msg": "Claim DI058251104 transitioned to ChasingCourier",
  "level": "Information",
  "application": "Claimit.Portal",
  "ClaimId": 12345,
  "TrackingNumber": "DI058251104"
}
```

VictoriaLogsBatchFormatter wraps multiple events into newline-delimited JSON (JSON Lines) — the format VictoriaLogs expects at its /insert/jsonline endpoint.
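The batching step is simple enough to sketch in a few lines. This is a minimal Python illustration of producing a JSON Lines payload (not the actual C# formatter; the function name is ours):

```python
import json

def to_jsonline_batch(events: list[dict]) -> str:
    """Serialize log events as newline-delimited JSON (JSON Lines):
    one compact JSON object per line, the shape VictoriaLogs ingests
    at /insert/jsonline."""
    return "\n".join(json.dumps(e, separators=(",", ":")) for e in events)

batch = to_jsonline_batch([
    {"_time": "2026-03-17T14:30:00.000Z", "_msg": "first event", "level": "Information"},
    {"_time": "2026-03-17T14:30:01.000Z", "_msg": "second event", "level": "Warning"},
])
```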
The configuration is straightforward — when running locally (not in Azure), the HTTP sink is enabled:

```csharp
// Only active in local development — Azure uses OpenTelemetry instead
loggerConfig.WriteTo.Async(a => a.Http(
    requestUri: "http://localhost:9428/insert/jsonline?_stream_fields=application,level",
    queueLimitBytes: 10_000_000,
    textFormatter: new VictoriaLogsJsonFormatter(),
    batchFormatter: new VictoriaLogsBatchFormatter()
));
```

Querying from an AI Agent
The MCP server at port 8081 exposes VictoriaLogs queries via the SSE protocol. An AI agent connected to this MCP server can run queries like:
- "Show me all errors in the last 10 minutes"
- "Find log entries for tracking number DI058251104"
- "What exceptions occurred during the last Playwright test run?"
These queries are the critical "eyes" that make closed-loop development possible — the agent doesn't just write code and hope; it watches what happens.
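Under the hood, questions like these become LogsQL queries. As a sketch, the equivalent direct HTTP call against VictoriaLogs' query endpoint can be built like this (endpoint and filter syntax per VictoriaLogs' LogsQL API; the helper name is ours):

```python
import urllib.parse

VICTORIALOGS = "http://localhost:9428"  # local instance from docker-compose

def logsql_url(query: str) -> str:
    """Build a URL for VictoriaLogs' /select/logsql/query endpoint."""
    return f"{VICTORIALOGS}/select/logsql/query?" + urllib.parse.urlencode({"query": query})

# "Show me all errors in the last 10 minutes"
url = logsql_url("level:Error _time:10m")
# fetch with urllib.request.urlopen(url) once the container is up
```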
Local Docker Infrastructure
A single docker-compose.infrastructure.yml file brings up everything needed for local development. No manual setup, no cloud dependencies.
Starting Infrastructure
```shell
# Start everything (recommended)
./scripts/launch/infrastructure-only/start.sh
# Check what's running
./scripts/launch/infrastructure-only/start.sh --status
# Start without Temporal (lighter weight)
./scripts/launch/infrastructure-only/start.sh --no-temporal
# Tear down
./scripts/launch/infrastructure-only/start.sh --stop
```

Temporal Search Attributes
The infrastructure auto-registers custom search attributes on first boot via a one-shot init container. These enable powerful workflow queries:
```
TrackingNumber, CourierName, ClientName, ClaimStatusId,
ClaimStatusName, WorkflowStatus, RemindersSent,
HasPendingIntervention, InterventionType, InterventionPriority,
InterventionCreatedAt, NextReminderDue
```

AI agents can use these to query workflow state — for example, finding all stuck claims for a specific courier.
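A visibility list filter over these attributes can be composed and handed to the Temporal CLI or SDK. A minimal sketch (attribute values below are illustrative, not real data):

```python
def list_filter(**attrs: str) -> str:
    """Compose a Temporal visibility list filter from custom search attributes."""
    return " AND ".join(f"{name} = '{value}'" for name, value in attrs.items())

# All workflows for one courier that report a stuck status (values illustrative)
query = list_filter(CourierName="DHL", WorkflowStatus="Stuck")
# e.g. pass to: temporal workflow list --query "$query"
```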
Safe Production Data Import
The local database starts empty, but real development work requires realistic data. The Claimit CLI bridges this gap by importing production data into the local Docker database — safely, with explicit permission gates.
CLI Commands
```shell
cd backend/Temporal/Claimit.Cli
# Step 1: Create/update the database schema
dotnet run -- localdb seed
# Step 2: Import recent production data
dotnet run -- sync all --days 14
# Import with force (overwrite conflicts)
dotnet run -- sync all --days 14 --force
```

What Gets Synced
The sync all command imports a time-bounded slice of production:
- Claims from the specified time period, with all their timeline entries
- Log entries for those claims (activity history)
- Email credentials (connection info, not passwords)
- Evidence files (DOR documents, invoices, courier responses)
- Reference data (couriers, claim statuses, configuration)
The Safety Model
The CLI has two separate environment switches — this is intentional:
| Flag | What it connects to | Risk level |
|---|---|---|
| `--temporal-prod` | Temporal Cloud (read-only queries) | Safe — read only |
| `--db-prod` / `--prod` | Production SQL database | Requires explicit approval |
The launch scripts prompt for sync automatically on first startup, so developers don't need to remember the CLI commands. The sync is also idempotent — running it again just updates what's changed.
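The idempotency can be pictured as an upsert keyed on the production row id. A simplified sketch (the real CLI writes to SQL; the field names here are illustrative):

```python
def upsert(local: dict, incoming: list, force: bool = False) -> dict:
    """Re-runnable import: new rows are inserted, known rows are refreshed.
    With force=True, local copies are overwritten outright (like --force)."""
    for row in incoming:
        key = row["id"]
        if force or key not in local:
            local[key] = dict(row)
        else:
            local[key].update(row)  # just update what's changed
    return local

db = {}
rows = [{"id": 1, "status": "ChasingCourier"}]
upsert(db, rows)
upsert(db, rows)  # second run is a no-op: same end state
```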
Playwright E2E Testing
Playwright provides the browser-level verification layer. AI agents can trigger end-to-end tests that exercise the full stack — from clicking buttons in the Portal to verifying data in the database.
Architecture
Running Tests
```shell
cd frontend/test/Claimit.Portal.Playwright
# Run all tests
dotnet test
# Run specific test class
dotnet test --filter "FullyQualifiedName~LoginPageTests"
# Headed mode (see the browser)
dotnet test --settings playwright.runsettings
# With custom configuration
PORTAL_BASE_URL="http://localhost:5001" dotnet test
```

The Page Object Pattern
Tests use page objects with data-automation-id attributes for reliable element selection. This means tests don't break when CSS classes or the DOM structure change:
```csharp
public class DashboardPage
{
    private readonly IPage _page;

    // Stable selectors — immune to visual redesigns
    private ILocator SavedAmount => _page.Locator("[data-automation-id='atmid-dashboard-lifetimesavings-tile']");
    private ILocator ClaimsSubmitted => _page.Locator("[data-automation-id='atmid-dashboard-claims-submitted-card']");
    private ILocator AddClaimButton => _page.Locator("[data-automation-id='atmid-dashboard-addclaim-btn']");
}
```

Available Dashboard Automation IDs
| Automation ID | Element |
|---|---|
| `atmid-dashboard-page` | Main dashboard container |
| `atmid-dashboard-lifetimesavings-tile` | Lifetime savings display |
| `atmid-dashboard-claims-submitted-card` | Submitted claims metric |
| `atmid-dashboard-claims-value-card` | Claims value metric |
| `atmid-dashboard-claims-won-card` | Won claims metric |
| `atmid-dashboard-credit-received-card` | Credit received metric |
| `atmid-dashboard-daterange-btn` | Date range picker |
| `atmid-dashboard-currency-btn` | Currency selector |
| `atmid-dashboard-addclaim-btn` | New Claim button |
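The stable-selector convention is easy to wrap in a tiny helper. A Python sketch (the repo's tests are C#, and the helper name is ours):

```python
def by_automation_id(automation_id: str) -> str:
    """CSS attribute selector for a data-automation-id value,
    unaffected by class-name or DOM-structure redesigns."""
    return f"[data-automation-id='{automation_id}']"

selector = by_automation_id("atmid-dashboard-addclaim-btn")
```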
The AI Agent Workflow Loop
This is where everything comes together. Here's how an AI agent uses the full infrastructure to verify its own work:
Concrete Example
Imagine an AI agent is asked to add a new column to the dashboard showing "Average Days to Resolution":
- Edit: Agent adds a new card component with `data-automation-id="atmid-dashboard-avg-resolution-card"`, writes the backend query, and updates the DTO
- Build: `dotnet build Claimit.sln` — catches any type errors or missing references
- Test: Agent writes a Playwright test asserting the new card appears with a numeric value, runs `dotnet test`
- Logs: Agent queries VictoriaLogs via MCP — "any errors from Claimit.Portal in the last 2 minutes?" — confirms no exceptions
- Verify: Temporal workflows unaffected (no workflow changes in this task)
- Result: All green — agent commits the change
The entire loop runs locally. Production is never touched.
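The steps above amount to a fail-fast pipeline of checks. A sketch of that shape (step names are hypothetical; each lambda stands in for a real command, noted in comments):

```python
def closed_loop(steps):
    """Run verification steps in order, stopping at the first failure.
    Returns (all_green, failed_step_name)."""
    for name, check in steps:
        if not check():
            return False, name
    return True, None

ok, failed = closed_loop([
    ("build", lambda: True),  # dotnet build Claimit.sln
    ("e2e",   lambda: True),  # dotnet test (Playwright)
    ("logs",  lambda: True),  # MCP query: no new errors in VictoriaLogs
])
```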
SafeMode for AI Tools
When developing with the AI Comms Tool (the email simulator), the launch script enforces SafeMode — a set of guardrails that prevent accidental side effects.
This means an AI agent can freely experiment with email processing logic — classifying emails, generating responses, updating claim statuses — without any risk of:
- Sending real emails to couriers
- Modifying production mailboxes
- Requiring authentication credentials
The launch script handles all of this automatically:
```shell
./scripts/launch/ai-comms-tool/ai-comms-tool-local.sh
```

It starts infrastructure, seeds the database, syncs recent production data, enforces SafeMode, and opens the browser — one command.
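A guardrail of this shape can be sketched as an environment-gated send. The variable name SAFE_MODE and the exception type are our illustration, not the launch script's actual mechanism:

```python
import os

class SafeModeViolation(RuntimeError):
    """Raised when a side-effecting action is attempted under SafeMode."""

def send_courier_email(to: str, body: str) -> str:
    """Outbound email send, refused whenever SafeMode is on (the default)."""
    if os.environ.get("SAFE_MODE", "1") == "1":
        raise SafeModeViolation(f"SafeMode on: refusing to email {to}")
    return f"sent to {to}"  # real SMTP call would go here
```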
Dev Tools & Launch Scripts
Three launch scripts cover the main development scenarios.
Daily Development Workflow
```shell
# Morning: start infrastructure (prompts for sync if stale)
./scripts/launch/infrastructure-only/start.sh
# Open your IDE, debug Portal at https://localhost:7001
# Run Playwright tests after changes
cd frontend/test/Claimit.Portal.Playwright && dotnet test
# Check logs in browser
# http://localhost:9428/select/vmui
# Look up a specific claim
cd backend/Temporal/Claimit.Cli
dotnet run -- lookup DI058251104
# Simulate a Temporal workflow interactively
dotnet run -- temporal simulate 6A04083571549
# End of day: stop everything
./scripts/launch/infrastructure-only/start.sh --stop
```

Why This Matters
The closed-loop approach changes what AI agents can reliably do:
| Open-Loop (traditional) | Closed-Loop (this approach) |
|---|---|
| Agent writes code, human verifies | Agent writes code and verifies |
| Bugs found in PR review | Bugs found before commit |
| "It compiles" = done | Compiles + tests pass + no log errors = done |
| Agent guesses if changes work | Agent observes if changes work |
| Requires production access to debug | Everything runs locally |
The infrastructure cost is a `docker compose up` command and about 4 GB of RAM. The benefit is an AI agent that can work autonomously through multi-step tasks with confidence that its changes actually work — all inside a safe, disposable sandbox that mirrors production.