Migration Plan: Monolith → Microservices
This plan decomposes the Node.js monolith into five bounded services using the Strangler Fig pattern. There is no big-bang cutover, and each phase is independently reversible.
Current Architecture
Today everything runs in a single Express.js process. Auth, catalog, orders, payments, and notifications share one PostgreSQL database and deploy as a unit.
The monolith is a single point of failure. A memory leak in the notification handler took down checkout last month. Deploys require the full test suite (47 minutes) even for a copy change.
Target Architecture
The target is five independently deployable services behind an API gateway. Each service owns its database, and event-driven flows communicate asynchronously via RabbitMQ.
Migration Phases
We follow the Strangler Fig pattern — a routing layer wraps the monolith, and we migrate one domain at a time. Each phase includes a rollback window.
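A minimal sketch of the routing decision described above, assuming requests are routed by URL path prefix and each domain carries a per-route migration flag. The `RouteRule` shape, `resolveBackend`, `rollback`, and the path prefixes are hypothetical names for illustration, not taken from the codebase.

```typescript
// Hypothetical names throughout: Backend, RouteRule, resolveBackend, rollback,
// and the path prefixes are illustrative, not taken from the actual codebase.
type Backend = "monolith" | "auth" | "catalog" | "orders" | "payments" | "notifications";

interface RouteRule {
  prefix: string;     // URL path prefix owned by a domain
  service: Backend;   // extracted service that should serve it once migrated
  migrated: boolean;  // false = traffic still goes to the monolith
}

// Decide which backend serves a request path. The longest matching prefix wins;
// anything unmatched or not yet migrated falls through to the monolith.
function resolveBackend(path: string, rules: RouteRule[]): Backend {
  let best: RouteRule | undefined = undefined;
  for (const rule of rules) {
    const matches = path === rule.prefix || path.startsWith(rule.prefix + "/");
    if (matches && (best === undefined || rule.prefix.length > best.prefix.length)) {
      best = rule;
    }
  }
  return best !== undefined && best.migrated ? best.service : "monolith";
}

// Rolling a phase back is a data change, not a deploy: flip the flag off.
function rollback(rules: RouteRule[], service: Backend): RouteRule[] {
  return rules.map((r) => (r.service === service ? { ...r, migrated: false } : r));
}

const rules: RouteRule[] = [
  { prefix: "/auth", service: "auth", migrated: true },
  { prefix: "/catalog", service: "catalog", migrated: false },
];

console.log(resolveBackend("/auth/login", rules));                   // auth
console.log(resolveBackend("/catalog/42", rules));                   // monolith
console.log(resolveBackend("/auth/login", rollback(rules, "auth"))); // monolith
```

In practice this decision would more likely live in the API gateway or reverse proxy configuration than in application code. The sketch only illustrates why rollback can be a flag flip rather than a redeploy.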
Checkout Flow (Critical Path)
Checkout is the highest-risk migration target because it touches auth, catalog, orders, and payments in a single user journey.
[Sequence diagram: post-migration checkout flow]
Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Data inconsistency during dual-write | High | Critical | Change-data-capture (Debezium) with idempotent consumers. Hourly reconciliation job. |
| Increased latency from network hops | Medium | High | Service mesh with connection pooling. Budget: 50 ms per hop, 200 ms total (at most four sequential hops at the full per-hop budget). |
| Payment service PCI compliance gap | Low | Critical | Dedicated VPC for payment service. QSA auditor engaged at Phase 3 kickoff. |
| Message queue backpressure | Medium | Medium | Dead-letter queue with alerting. Auto-scaling consumers on queue depth. |
| Team skill gap on distributed systems | High | Medium | Pair programming rotation. Each phase includes a 1-week spike before implementation. |
| Cascading failure across services | Medium | Critical | Circuit breakers on all inter-service calls. Bulkhead isolation per service. |
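
To make the circuit-breaker mitigation in the last row concrete, here is a minimal sketch of the state machine. It assumes a consecutive-failure threshold and a fixed cool-down. `CircuitBreaker`, `failureThreshold`, and `resetAfterMs` are hypothetical names, and a real deployment would probably rely on a library or the service mesh instead of hand-rolled code.

```typescript
// Minimal circuit-breaker sketch. All names here are hypothetical.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";

  constructor(
    private readonly failureThreshold: number,       // consecutive failures before opening
    private readonly resetAfterMs: number,           // time to stay open before a trial call
    private readonly now: () => number = Date.now,   // injectable clock, useful in tests
  ) {}

  currentState(): BreakerState {
    return this.state;
  }

  // Synchronous for brevity; real inter-service calls would be asynchronous,
  // but the state transitions are the same.
  call<T>(fn: () => T, fallback: () => T): T {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetAfterMs) {
        return fallback();          // fail fast without touching the downstream service
      }
      this.state = "half-open";     // cool-down elapsed: allow one trial call
    }
    try {
      const result = fn();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch {
      this.failures += 1;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      return fallback();
    }
  }
}

let clock = 0;
const breaker = new CircuitBreaker(2, 1000, () => clock);
const failing = (): string => { throw new Error("payments unavailable"); };

breaker.call(failing, () => "fallback");                 // failure 1, still closed
breaker.call(failing, () => "fallback");                 // failure 2, breaker opens
console.log(breaker.currentState());                     // open
clock = 1500;                                            // cool-down has elapsed
console.log(breaker.call(() => "ok", () => "fallback")); // ok
console.log(breaker.currentState());                     // closed
```

While the breaker is open, callers get the fallback immediately, so a slow or failing service does not tie up the callers that depend on it.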
Key Decisions
Why Strangler Fig over Big Bang? The monolith serves 12,000 requests per minute, so a flag-day cutover risks extended downtime. Strangler Fig lets us migrate one domain at a time with instant rollback.
Why RabbitMQ over Kafka? Event volume is about 500 events per second, well within what RabbitMQ can handle. Kafka adds partition complexity we don't need yet. We can migrate later if volume grows tenfold.
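RabbitMQ redelivers messages that were not acknowledged, so a consumer can receive the same event more than once. That is why the risk table asks for idempotent consumers. A minimal sketch, assuming every event carries a producer-assigned unique `eventId`; the `OrderEvent` shape and `IdempotentConsumer` class are hypothetical.

```typescript
// Hypothetical event shape and names; the real schema is not specified in this plan.
interface OrderEvent {
  eventId: string;      // unique per event, assigned by the producer (assumption)
  orderId: string;
  amountCents: number;
}

class IdempotentConsumer {
  // In production the processed IDs would be stored in the consumer's own
  // database, ideally in the same transaction as the side effect. An in-memory
  // set is used here only to keep the sketch self-contained.
  private readonly seen = new Set<string>();

  constructor(private readonly apply: (event: OrderEvent) => void) {}

  // Returns true if the event was applied, false if it was a duplicate.
  handle(event: OrderEvent): boolean {
    if (this.seen.has(event.eventId)) {
      return false;               // redelivery: safe to acknowledge and skip
    }
    this.apply(event);            // if this throws, the ID is not recorded, so a retry still applies it
    this.seen.add(event.eventId);
    return true;
  }
}

const totals = new Map<string, number>();
const consumer = new IdempotentConsumer((e) => {
  totals.set(e.orderId, (totals.get(e.orderId) ?? 0) + e.amountCents);
});

const event: OrderEvent = { eventId: "evt-1", orderId: "ord-9", amountCents: 2500 };
console.log(consumer.handle(event));  // true
console.log(consumer.handle(event));  // false (duplicate delivery)
console.log(totals.get("ord-9"));     // 2500
```

The same deduplication idea would apply to events produced by change-data-capture, since those can also be delivered more than once.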
Why separate databases? Shared databases create hidden coupling. Short-term migration pain is worth long-term independence — each service owns its schema.
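Splitting the databases means the monolith's tables and each service's tables coexist during the dual-write window. The hourly reconciliation job listed in the risk table could be sketched roughly as follows, assuming each side can export a per-row checksum. The `RowDigest` shape and the `reconcile` function are hypothetical; the plan does not say how rows are compared.

```typescript
// Hypothetical row shape: the plan does not specify how rows are compared.
interface RowDigest {
  id: string;
  checksum: string;   // e.g. a hash over the migrated columns (assumption)
}

interface ReconciliationReport {
  missingInService: string[];     // present in the monolith, absent in the new service
  unexpectedInService: string[];  // present in the new service only
  mismatched: string[];           // present in both, but contents differ
}

function reconcile(monolith: RowDigest[], service: RowDigest[]): ReconciliationReport {
  const serviceById = new Map<string, string>();
  for (const row of service) {
    serviceById.set(row.id, row.checksum);
  }

  const report: ReconciliationReport = {
    missingInService: [],
    unexpectedInService: [],
    mismatched: [],
  };

  for (const row of monolith) {
    const other = serviceById.get(row.id);
    if (other === undefined) {
      report.missingInService.push(row.id);
    } else if (other !== row.checksum) {
      report.mismatched.push(row.id);
    }
    serviceById.delete(row.id);   // whatever remains afterwards exists only in the service
  }

  report.unexpectedInService = Array.from(serviceById.keys());
  return report;
}

const report = reconcile(
  [{ id: "o1", checksum: "a" }, { id: "o2", checksum: "b" }, { id: "o3", checksum: "c" }],
  [{ id: "o1", checksum: "a" }, { id: "o2", checksum: "x" }, { id: "o4", checksum: "d" }],
);
console.log(report.missingInService);     // [ 'o3' ]
console.log(report.mismatched);           // [ 'o2' ]
console.log(report.unexpectedInService);  // [ 'o4' ]
```

A non-empty report would be the signal to alert and repair before the monolith's copy of the data is retired.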