Closed-Loop AI Development: Self-Verifying Agents with VictoriaLogs, Docker & Playwright
Modern AI coding agents don't just write code — they can verify their own work. By combining centralized logging, browser automation, and isolated local infrastructure, we've built a development environment where AI agents operate in a closed loop: write code, run it, observe the results, and fix issues — all without touching production.
This guide walks through how Claimit achieves this using VictoriaLogs, Docker Compose, Playwright, and a CLI that safely imports production data into a local sandbox.
The Closed-Loop Concept
Traditional AI coding assistance is open-loop: the agent writes code and hopes it works. A closed-loop system gives the agent feedback channels to observe the consequences of its changes in real time.
The key insight: every feedback channel the agent needs — logs, test results, workflow state, database queries — is available locally through Docker containers and CLI tools. The agent never needs to touch production to verify its work.
VictoriaLogs: The AI Feedback Channel
VictoriaLogs is a high-performance log storage engine that serves as the central nervous system for local development observability. Every application running locally ships its logs to VictoriaLogs, and AI agents can query those logs in real time through a dedicated MCP (Model Context Protocol) server.
How Logs Flow
The Serilog Pipeline
Applications use Serilog with two custom formatters to ship logs to VictoriaLogs:
VictoriaLogsJsonFormatter converts each Serilog event into a JSON object with VictoriaLogs-specific fields:
```json
{
  "_time": "2026-03-17T14:30:00.000Z",
  "_msg": "Claim DI058251104 transitioned to ChasingCourier",
  "level": "Information",
  "application": "Claimit.Portal",
  "ClaimId": 12345,
  "TrackingNumber": "DI058251104"
}
```

VictoriaLogsBatchFormatter wraps multiple events into newline-delimited JSON (JSON Lines) — the format VictoriaLogs expects at its /insert/jsonline endpoint.
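The batching step is simple enough to sketch in a few lines. This is a minimal Python illustration of producing a JSON Lines payload (not the actual C# formatter; the function name is ours):

```python
import json

def to_jsonline_batch(events: list[dict]) -> str:
    """Serialize log events as newline-delimited JSON (JSON Lines):
    one compact JSON object per line, the shape VictoriaLogs ingests
    at /insert/jsonline."""
    return "\n".join(json.dumps(e, separators=(",", ":")) for e in events)

batch = to_jsonline_batch([
    {"_time": "2026-03-17T14:30:00.000Z", "_msg": "first event", "level": "Information"},
    {"_time": "2026-03-17T14:30:01.000Z", "_msg": "second event", "level": "Warning"},
])
```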
The configuration is straightforward — when running locally (not in Azure), the HTTP sink is enabled:

```csharp
// Only active in local development — Azure uses OpenTelemetry instead
loggerConfig.WriteTo.Async(a => a.Http(
    requestUri: "http://localhost:9428/insert/jsonline?_stream_fields=application,level",
    queueLimitBytes: 10_000_000,
    textFormatter: new VictoriaLogsJsonFormatter(),
    batchFormatter: new VictoriaLogsBatchFormatter()
));
```

Querying from an AI Agent
The MCP server at port 8081 exposes VictoriaLogs queries via the SSE protocol. An AI agent connected to this MCP server can run queries like:
- "Show me all errors in the last 10 minutes"
- "Find log entries for tracking number DI058251104"
- "What exceptions occurred during the last Playwright test run?"
These queries are the critical "eyes" that make closed-loop development possible — the agent doesn't just write code and hope; it watches what happens.
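Under the hood, questions like these become LogsQL queries. As a sketch, the equivalent direct HTTP call against VictoriaLogs' query endpoint can be built like this (endpoint and filter syntax per VictoriaLogs' LogsQL API; the helper name is ours):

```python
import urllib.parse

VICTORIALOGS = "http://localhost:9428"  # local instance from docker-compose

def logsql_url(query: str) -> str:
    """Build a URL for VictoriaLogs' /select/logsql/query endpoint."""
    return f"{VICTORIALOGS}/select/logsql/query?" + urllib.parse.urlencode({"query": query})

# "Show me all errors in the last 10 minutes"
url = logsql_url("level:Error _time:10m")
# fetch with urllib.request.urlopen(url) once the container is up
```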
Local Docker Infrastructure
A single docker-compose.infrastructure.yml file brings up everything needed for local development. No manual setup, no cloud dependencies.
Starting Infrastructure
```shell
# Start everything (recommended)
./scripts/launch/infrastructure-only/start.sh
# Check what's running
./scripts/launch/infrastructure-only/start.sh --status
# Start without Temporal (lighter weight)
./scripts/launch/infrastructure-only/start.sh --no-temporal
# Tear down
./scripts/launch/infrastructure-only/start.sh --stop
```

Temporal Search Attributes
The infrastructure auto-registers custom search attributes on first boot via a one-shot init container. These enable powerful workflow queries:
```
TrackingNumber, CourierName, ClientName, ClaimStatusId,
ClaimStatusName, WorkflowStatus, RemindersSent,
HasPendingIntervention, InterventionType, InterventionPriority,
InterventionCreatedAt, NextReminderDue
```

AI agents can use these to query workflow state — for example, finding all stuck claims for a specific courier.
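A visibility list filter over these attributes can be composed and handed to the Temporal CLI or SDK. A minimal sketch (attribute values below are illustrative, not real data):

```python
def list_filter(**attrs: str) -> str:
    """Compose a Temporal visibility list filter from custom search attributes."""
    return " AND ".join(f"{name} = '{value}'" for name, value in attrs.items())

# All workflows for one courier that report a stuck status (values illustrative)
query = list_filter(CourierName="DHL", WorkflowStatus="Stuck")
# e.g. pass to: temporal workflow list --query "$query"
```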
Safe Production Data Import
The local database starts empty, but real development work requires realistic data. The Claimit CLI bridges this gap by importing production data into the local Docker database — safely, with explicit permission gates.
CLI Commands
```shell
cd backend/Temporal/Claimit.Cli
# Step 1: Create/update the database schema
dotnet run -- localdb seed
# Step 2: Import recent production data
dotnet run -- sync all --days 14
# Import with force (overwrite conflicts)
dotnet run -- sync all --days 14 --force
```

What Gets Synced
The sync all command imports a time-bounded slice of production:
- Claims from the specified time period, with all their timeline entries
- Log entries for those claims (activity history)
- Email credentials (connection info, not passwords)
- Evidence files (DOR documents, invoices, courier responses)
- Reference data (couriers, claim statuses, configuration)
The Safety Model
The CLI has two separate environment switches — this is intentional:
| Flag | What it connects to | Risk level |
|---|---|---|
| `--temporal-prod` | Temporal Cloud (read-only queries) | Safe — read only |
| `--db-prod` / `--prod` | Production SQL database | Requires explicit approval |
The launch scripts prompt for sync automatically on first startup, so developers don't need to remember the CLI commands. The sync is also idempotent — running it again just updates what's changed.
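The idempotency can be pictured as an upsert keyed on the production row id. A simplified sketch (the real CLI writes to SQL; the field names here are illustrative):

```python
def upsert(local: dict, incoming: list, force: bool = False) -> dict:
    """Re-runnable import: new rows are inserted, known rows are refreshed.
    With force=True, local copies are overwritten outright (like --force)."""
    for row in incoming:
        key = row["id"]
        if force or key not in local:
            local[key] = dict(row)
        else:
            local[key].update(row)  # just update what's changed
    return local

db = {}
rows = [{"id": 1, "status": "ChasingCourier"}]
upsert(db, rows)
upsert(db, rows)  # second run is a no-op: same end state
```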
Playwright E2E Testing
Playwright provides the browser-level verification layer. AI agents can trigger end-to-end tests that exercise the full stack — from clicking buttons in the Portal to verifying data in the database.
Architecture
Running Tests
```shell
cd frontend/test/Claimit.Portal.Playwright
# Run all tests
dotnet test
# Run specific test class
dotnet test --filter "FullyQualifiedName~LoginPageTests"
# Headed mode (see the browser)
dotnet test --settings playwright.runsettings
# With custom configuration
PORTAL_BASE_URL="http://localhost:5001" dotnet test
```

The Page Object Pattern
Tests use page objects with data-automation-id attributes for reliable element selection. This means tests don't break when CSS classes or the DOM structure change:
```csharp
public class DashboardPage
{
    private readonly IPage _page;

    // Stable selectors — immune to visual redesigns
    private ILocator SavedAmount => _page.Locator("[data-automation-id='atmid-dashboard-lifetimesavings-tile']");
    private ILocator ClaimsSubmitted => _page.Locator("[data-automation-id='atmid-dashboard-claims-submitted-card']");
    private ILocator AddClaimButton => _page.Locator("[data-automation-id='atmid-dashboard-addclaim-btn']");
}
```

Available Dashboard Automation IDs
| Automation ID | Element |
|---|---|
| `atmid-dashboard-page` | Main dashboard container |
| `atmid-dashboard-lifetimesavings-tile` | Lifetime savings display |
| `atmid-dashboard-claims-submitted-card` | Submitted claims metric |
| `atmid-dashboard-claims-value-card` | Claims value metric |
| `atmid-dashboard-claims-won-card` | Won claims metric |
| `atmid-dashboard-credit-received-card` | Credit received metric |
| `atmid-dashboard-daterange-btn` | Date range picker |
| `atmid-dashboard-currency-btn` | Currency selector |
| `atmid-dashboard-addclaim-btn` | New Claim button |
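The stable-selector convention is easy to wrap in a tiny helper. A Python sketch (the repo's tests are C#, and the helper name is ours):

```python
def by_automation_id(automation_id: str) -> str:
    """CSS attribute selector for a data-automation-id value,
    unaffected by class-name or DOM-structure redesigns."""
    return f"[data-automation-id='{automation_id}']"

selector = by_automation_id("atmid-dashboard-addclaim-btn")
```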
The AI Agent Workflow Loop
This is where everything comes together. Here's how an AI agent uses the full infrastructure to verify its own work:
Concrete Example
Imagine an AI agent is asked to add a new column to the dashboard showing "Average Days to Resolution":
- Edit: Agent adds a new card component with `data-automation-id="atmid-dashboard-avg-resolution-card"`, writes the backend query, and updates the DTO
- Build: `dotnet build Claimit.sln` — catches any type errors or missing references
- Test: Agent writes a Playwright test asserting the new card appears with a numeric value, runs `dotnet test`
- Logs: Agent queries VictoriaLogs via MCP — "any errors from Claimit.Portal in the last 2 minutes?" — confirms no exceptions
- Verify: Temporal workflows unaffected (no workflow changes in this task)
- Result: All green — agent commits the change
The entire loop runs locally. Production is never touched.
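The steps above amount to a fail-fast pipeline of checks. A sketch of that shape (step names are hypothetical; each lambda stands in for a real command, noted in comments):

```python
def closed_loop(steps):
    """Run verification steps in order, stopping at the first failure.
    Returns (all_green, failed_step_name)."""
    for name, check in steps:
        if not check():
            return False, name
    return True, None

ok, failed = closed_loop([
    ("build", lambda: True),  # dotnet build Claimit.sln
    ("e2e",   lambda: True),  # dotnet test (Playwright)
    ("logs",  lambda: True),  # MCP query: no new errors in VictoriaLogs
])
```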
SafeMode for AI Tools
When developing with the AI Comms Tool (the email simulator), the launch script enforces SafeMode — a set of guardrails that prevent accidental side effects.
This means an AI agent can freely experiment with email processing logic — classifying emails, generating responses, updating claim statuses — without any risk of:
- Sending real emails to couriers
- Modifying production mailboxes
- Requiring authentication credentials
The launch script handles all of this automatically:
```shell
./scripts/launch/ai-comms-tool/ai-comms-tool-local.sh
```

It starts infrastructure, seeds the database, syncs recent production data, enforces SafeMode, and opens the browser — one command.
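A guardrail of this shape can be sketched as an environment-gated send. The variable name SAFE_MODE and the exception type are our illustration, not the launch script's actual mechanism:

```python
import os

class SafeModeViolation(RuntimeError):
    """Raised when a side-effecting action is attempted under SafeMode."""

def send_courier_email(to: str, body: str) -> str:
    """Outbound email send, refused whenever SafeMode is on (the default)."""
    if os.environ.get("SAFE_MODE", "1") == "1":
        raise SafeModeViolation(f"SafeMode on: refusing to email {to}")
    return f"sent to {to}"  # real SMTP call would go here
```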
Dev Tools & Launch Scripts
Three launch scripts cover the main development scenarios.
Daily Development Workflow
```shell
# Morning: start infrastructure (prompts for sync if stale)
./scripts/launch/infrastructure-only/start.sh
# Open your IDE, debug Portal at https://localhost:7001
# Run Playwright tests after changes
cd frontend/test/Claimit.Portal.Playwright && dotnet test
# Check logs in browser
# http://localhost:9428/select/vmui
# Look up a specific claim
cd backend/Temporal/Claimit.Cli
dotnet run -- lookup DI058251104
# Simulate a Temporal workflow interactively
dotnet run -- temporal simulate 6A04083571549
# End of day: stop everything
./scripts/launch/infrastructure-only/start.sh --stop
```

Why This Matters
The closed-loop approach changes what AI agents can reliably do:
| Open-Loop (traditional) | Closed-Loop (this approach) |
|---|---|
| Agent writes code, human verifies | Agent writes code and verifies |
| Bugs found in PR review | Bugs found before commit |
| "It compiles" = done | Compiles + tests pass + no log errors = done |
| Agent guesses if changes work | Agent observes if changes work |
| Requires production access to debug | Everything runs locally |
The infrastructure cost is a `docker compose up` command and about 4 GB of RAM. The benefit is an AI agent that can work autonomously through multi-step tasks with confidence that its changes actually work — all inside a safe, disposable sandbox that mirrors production.