Designing event-driven systems that age well

Jan 12, 2025 · 8 min read

Event-driven architectures are often sold as a scalability silver bullet. In practice, they are long-lived systems that accumulate complexity quickly if not designed with time, change, and failure in mind.

The most important property of an event-driven system is not throughput or decoupling, but its ability to evolve safely while being operated by humans.

Events are contracts, not messages

Treat every event as a public API. Once emitted, it will be consumed in ways you do not control. This means schemas must be explicit, versioned, and backward compatible by default.

If you cannot change an event without coordinating every consumer, the system will calcify. Favor additive changes, such as new optional fields, and avoid semantic overloading, where one field carries different meanings depending on context.
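
As a rough illustration of what an additive change can look like on the consumer side, here is a minimal sketch of a tolerant reader in Python. The `OrderPlaced` event, its fields, and the `schema_version` key are all hypothetical names chosen for this example, not part of any particular library or broker.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrderPlaced:
    # Hypothetical event. These two fields exist since schema version 1.
    order_id: str
    amount_cents: int
    # Added in schema version 2. Optional, so version 1 payloads still parse.
    currency: Optional[str] = None


def parse_order_placed(raw: str) -> OrderPlaced:
    """Tolerant reader: require the original fields, default the newer
    ones, and ignore any fields this consumer does not know about."""
    payload = json.loads(raw)
    version = payload.get("schema_version", 1)
    if version < 1:
        raise ValueError(f"unsupported schema_version: {version}")
    return OrderPlaced(
        order_id=payload["order_id"],
        amount_cents=payload["amount_cents"],
        currency=payload.get("currency"),
    )


old = parse_order_placed('{"schema_version": 1, "order_id": "o-1", "amount_cents": 500}')
new = parse_order_placed(
    '{"schema_version": 3, "order_id": "o-2", "amount_cents": 700,'
    ' "currency": "EUR", "coupon": "WELCOME"}'
)
```

Both payloads parse with the same code: the older one simply has no currency, and the newer one carries a field this consumer has never heard of. That is the property that lets producers ship a new version without a coordinated release.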

Design for idempotency early

Duplicate delivery is not a bug in distributed systems. Under at-least-once delivery, it is part of the guarantee. Every consumer must be able to process the same event multiple times without producing additional side effects.

Idempotency is easiest to design at the beginning and extremely painful to retrofit later.
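
One common way to get there is to deduplicate on a stable event identifier. The sketch below assumes every event carries a unique `event_id` (a hypothetical field name) and keeps the set of processed IDs in memory. A real consumer would need a durable store, ideally updated in the same transaction as the side effect itself.

```python
from typing import Callable, Dict, Set


class IdempotentConsumer:
    """Wraps a handler so that redelivered events are skipped."""

    def __init__(self, handler: Callable[[Dict], None]) -> None:
        self._handler = handler
        # In-memory for illustration only; this is lost on restart.
        self._seen: Set[str] = set()

    def handle(self, event: Dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        event_id = event["event_id"]
        if event_id in self._seen:
            return False
        self._handler(event)
        # Record the ID only after the handler succeeds, so that a failed
        # attempt can still be retried.
        self._seen.add(event_id)
        return True


balances: Dict[str, int] = {}


def credit(event: Dict) -> None:
    account = event["account"]
    balances[account] = balances.get(account, 0) + event["amount"]


consumer = IdempotentConsumer(credit)
event = {"event_id": "evt-1", "account": "acct-1", "amount": 100}
consumer.handle(event)
consumer.handle(event)  # redelivery: skipped
print(balances)  # {'acct-1': 100}
```

Without the wrapper, the redelivery would credit the account twice. The handler itself stays simple, and the deduplication policy lives in one place.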

Operational simplicity beats theoretical purity

The systems that survive are the ones operators understand. Favor fewer topics, clearer naming, and boring retry strategies over clever abstractions that only make sense on whiteboards.
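
For a sense of what a boring retry strategy might look like, here is a minimal sketch: a fixed number of attempts, exponential backoff between them, and a dead-letter list for anything that still fails. The function name, parameters, and defaults are illustrative assumptions, and the in-memory list stands in for whatever dead-letter topic or queue the broker provides.

```python
import time
from typing import Callable, Dict, List


def process_with_retry(
    event: Dict,
    handler: Callable[[Dict], None],
    dead_letters: List[Dict],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try the handler up to max_attempts times, then dead-letter the event.

    Returns True if the handler succeeded, False if the event was parked.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            handler(event)
            return True
        except Exception:
            if attempt == max_attempts:
                break
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    # Out of attempts: park the event where an operator can inspect it.
    dead_letters.append(event)
    return False
```

The whole policy fits on one screen, which is the point. Someone reading it at 3 a.m. can tell how many times an event was tried, how long the consumer waited, and where the event went afterwards.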

A system that can be debugged at 3 a.m. by someone new to the team is more valuable than one that is perfectly decoupled.