The Fundamentals

Event sourcing is most often used with systems built around time-ordered or linear processes. A shop ordering app might move orders between “pending”, “approved” and “shipped” states. Each transition between states is a distinct “event” in the lifecycle of that order.

This system could be modelled using a traditional relational database, with a single orders table whose status column holds each order’s current state.

In most systems, knowing the current status isn’t enough, though. You’ll also want to know when the order was approved. We could record this by adding an extra timestamp field for each state.

Now it’s clear the order was created on April 30th and approved on May 1st; it hasn’t been shipped yet. For many applications, this structure works well. It can become restrictive as more states are added, though, because each new state needs another column and a schema change.
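As a rough illustration of the traditional model, here’s a sketch using Python’s built-in sqlite3 module. The orders table, its column names and the year in the dates are all hypothetical; the point is that each lifecycle state gets its own column, so supporting a new state means altering the table.

```python
import sqlite3

# In-memory database for illustration; the schema and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL,
        created_at TEXT NOT NULL,
        approved_at TEXT,
        shipped_at TEXT
    )
    """
)

# The example order: created April 30th, approved May 1st, not yet shipped.
conn.execute(
    "INSERT INTO orders (id, status, created_at, approved_at, shipped_at) "
    "VALUES (?, ?, ?, ?, ?)",
    (1, "approved", "2024-04-30", "2024-05-01", None),
)

# Supporting a new "cancelled" state requires changing the table itself.
conn.execute("ALTER TABLE orders ADD COLUMN cancelled_at TEXT")

print(conn.execute("SELECT status, approved_at, shipped_at FROM orders").fetchone())
# ('approved', '2024-05-01', None)
```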

Let’s now look at the same example restructured to use event sourcing.

The order’s state transitions have been separated into a dedicated event log. With this model, adding a new event type in the future doesn’t require changing the schema. If a customer needs to cancel an order, we can write an order.cancelled event into the log.
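The event-sourced version might look something like the following sketch, again using sqlite3 with a hypothetical order_events table. The order.created event type and the record_event helper are illustrative names, not part of any library. Cancellation is just another inserted row; the table definition never changes.

```python
import sqlite3

# Hypothetical append-only event log: one row per state transition.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE order_events (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        order_id INTEGER NOT NULL,
        event_type TEXT NOT NULL,
        occurred_at TEXT NOT NULL
    )
    """
)


def record_event(order_id: int, event_type: str, occurred_at: str) -> None:
    """Append a new event; existing rows are never updated."""
    conn.execute(
        "INSERT INTO order_events (order_id, event_type, occurred_at) VALUES (?, ?, ?)",
        (order_id, event_type, occurred_at),
    )


record_event(1, "order.created", "2024-04-30")
record_event(1, "order.approved", "2024-05-01")
# A new event type needs no schema migration; it's just another row.
record_event(1, "order.cancelled", "2024-05-02")

for event_type, occurred_at in conn.execute(
    "SELECT event_type, occurred_at FROM order_events WHERE order_id = ? ORDER BY id",
    (1,),
):
    print(occurred_at, event_type)
```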

Event sourcing could also be used to track data about the order itself. You might add event types for order.apply_discount or order.process_refund. You can record an event at any time, creating a new state for the order while keeping its previous states accessible.
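Events can also carry data. In this sketch, the Event class, its field names and the use of integer cents are assumptions. Discounts and refunds are recorded as payload-carrying events, and replaying only a prefix of the log recovers an earlier state.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any


@dataclass(frozen=True)
class Event:
    """A hypothetical event shape: what happened, when, plus a data payload."""
    order_id: int
    type: str
    occurred_on: date
    data: dict[str, Any] = field(default_factory=dict)


def payment_summary(events: list[Event]) -> dict[str, int]:
    """Replay pricing events to derive the order's total and refunded amounts."""
    total_cents = 0
    refunded_cents = 0
    for event in events:
        if event.type == "order.created":
            total_cents = event.data["total_cents"]
        elif event.type == "order.apply_discount":
            total_cents -= event.data["amount_cents"]
        elif event.type == "order.process_refund":
            refunded_cents += event.data["amount_cents"]
    return {"total_cents": total_cents, "refunded_cents": refunded_cents}


log = [
    Event(1, "order.created", date(2024, 4, 30), {"total_cents": 5000}),
    Event(1, "order.apply_discount", date(2024, 4, 30), {"amount_cents": 500}),
    Event(1, "order.process_refund", date(2024, 5, 3), {"amount_cents": 1000}),
]

print(payment_summary(log))      # {'total_cents': 4500, 'refunded_cents': 1000}
print(payment_summary(log[:2]))  # {'total_cents': 4500, 'refunded_cents': 0}
```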

Reconstructing An Object’s State

You can determine an object’s current state by fetching all the events that relate to it. If you want to know whether an order’s been approved, check whether its event collection contains an order.approved event.
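A minimal sketch of rebuilding current state by replaying events, assuming an in-memory log kept in the order events occurred. The STATUS_AFTER mapping and the order.created and order.shipped event types are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Event:
    order_id: int
    type: str
    occurred_on: date


# Hypothetical mapping from each event type to the status it leaves an order in.
STATUS_AFTER = {
    "order.created": "pending",
    "order.approved": "approved",
    "order.shipped": "shipped",
    "order.cancelled": "cancelled",
}


def events_for(log: list[Event], order_id: int) -> list[Event]:
    """Fetch every event relating to one order, preserving log order."""
    return [e for e in log if e.order_id == order_id]


def current_status(events: list[Event]) -> Optional[str]:
    """Replay events in order; the last status-changing event wins."""
    status = None
    for event in events:
        status = STATUS_AFTER.get(event.type, status)
    return status


def is_approved(events: list[Event]) -> bool:
    """An order has been approved if its events include order.approved."""
    return any(e.type == "order.approved" for e in events)


log = [
    Event(1, "order.created", date(2024, 4, 30)),
    Event(1, "order.approved", date(2024, 5, 1)),
    Event(2, "order.created", date(2024, 5, 1)),
]

order_1 = events_for(log, 1)
print(current_status(order_1), is_approved(order_1))  # approved True
```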

You can reconstruct historical states in exactly the same way. If you need to know the order’s state on a particular date, fetch all the events logged on or before that day and replay them.
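Historical reconstruction only adds a date filter to the replay. This sketch assumes the events are already in chronological order and, for brevity, uses the most recent event type as the order’s status; the event names other than order.approved, and the dates, are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Event:
    type: str
    occurred_on: date


def status_on(events: list[Event], day: date) -> Optional[str]:
    """Replay only the events logged on or before `day`.

    Assumes `events` is in chronological order; the most recent event type
    stands in for the order's status.
    """
    status = None
    for event in events:
        if event.occurred_on > day:
            break  # every remaining event happened after `day`
        status = event.type
    return status


events = [
    Event("order.created", date(2024, 4, 30)),
    Event("order.approved", date(2024, 5, 1)),
    Event("order.shipped", date(2024, 5, 4)),
]

print(status_on(events, date(2024, 5, 2)))  # order.approved
```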

Event sourcing is an incremental model that records an object’s chronology as a matter of course: every change to the object’s state is captured as a timestamped event. As a result, you get built-in versioning and a complete audit log of state transitions.

If you ever need to roll an object back to an earlier state, you can discard any events created after a given date. Imagine that a user’s account was compromised by a malicious actor who took various actions on their behalf. In a fully event-sourced system, you could discard any events linked to that user in the questionable timeframe, then rebuild the account from the events that remain to restore it to a good state.
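Here’s one way the account-recovery scenario could look, sketched under the assumption that “discarding” means replaying a filtered copy of the log rather than mutating stored events. The account.email_changed event type, the helper names and the data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass(frozen=True)
class Event:
    user_id: int
    type: str
    occurred_at: datetime
    data: dict[str, Any]


def without_window(
    events: list[Event], user_id: int, start: datetime, end: datetime
) -> list[Event]:
    """Return a copy of the log minus this user's events in [start, end).

    The stored log itself is left untouched.
    """
    return [
        e
        for e in events
        if not (e.user_id == user_id and start <= e.occurred_at < end)
    ]


def account_email(events: list[Event], user_id: int) -> Optional[str]:
    """Replay email changes to find the account's current address."""
    email = None
    for e in events:
        if e.user_id == user_id and e.type == "account.email_changed":
            email = e.data["email"]
    return email


log = [
    Event(7, "account.email_changed", datetime(2024, 6, 1, 9, 0), {"email": "alice@example.com"}),
    Event(7, "account.email_changed", datetime(2024, 6, 3, 2, 15), {"email": "attacker@example.net"}),
    Event(8, "account.email_changed", datetime(2024, 6, 3, 3, 0), {"email": "bob@example.com"}),
]

# Suppose user 7's account was compromised at some point on June 3rd.
recovered = without_window(log, 7, datetime(2024, 6, 3), datetime(2024, 6, 4))
print(account_email(log, 7), "->", account_email(recovered, 7))
# attacker@example.net -> alice@example.com
```

Other users’ events in the same window (user 8 here) are kept, because the filter matches on both user and time.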

To preserve this ability, you must ensure events are immutable. Once an event has been logged, you should never alter its properties. If an event needs adjustment, you’d log another event to effect the change.
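In Python, a frozen dataclass is one lightweight way to stop code from editing an event after it’s been created, though it only blocks attribute reassignment; in a real system the guarantee would typically come from the storage layer. The order.discount_corrected event type is a hypothetical example of a compensating event.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    order_id: int
    type: str
    occurred_on: date
    amount_cents: int


log = [Event(1, "order.apply_discount", date(2024, 5, 1), 500)]

# Editing a logged event in place is rejected.
try:
    log[0].amount_cents = 50
except FrozenInstanceError as exc:
    print("Rejected:", exc)

# Instead, log a (hypothetical) compensating event: the discount should have
# been 50 cents, so this event reduces the recorded discount by 450.
log.append(Event(1, "order.discount_corrected", date(2024, 5, 2), -450))

net_discount = sum(e.amount_cents for e in log)
print(net_discount)  # 50
```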

The system’s immutability means you can safely use it to travel back in time and rebuild historical states. If you wanted to reproduce your database as it was two years ago, you could discard all the events logged in the intervening time.

Another benefit of event sourcing is increased visibility into the state of the system. If a user reports a bug, you can clone the database, discard new events and repeat their steps from a known good point. Analysing logged events can help you pinpoint bugs in your codebase.

Event Sourcing in The Real World

Event sourcing has a reputation for being complex and unwieldy. Historically, it has been tied to applications with stringent audit requirements and a proven need to replay events.

This doesn’t have to be the case though. If you’re an experienced developer, you’ve probably implemented something close to event sourcing in the past. Any system which keeps a record of “events” (such as user login attempts, website page hits, or order processing stages) naturally gravitates towards an event-sourced approach, even if you’re not implementing one intentionally.

A deliberate implementation of event sourcing in code can be cumbersome. Developers usually assume that data fetched from databases is an accurate representation of an object’s current state. With event sourcing, the data you fetch carries little value on its own. You need to “replay” events in your codebase to create a representation of the state.

Event sourcing can introduce performance overheads. In the example above, building a complete representation of the order now means fetching and replaying every event associated with it. In the traditional model, a single row lookup provides all the data associated with the order.

There are other downsides too. If something does go wrong, event sourcing can be harder to fix. You can’t write a quick patch or manually hotfix the database. You’ll need to gracefully transition your event schemas while ensuring the historical chronology remains intact.
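One technique for evolving event schemas, often called upcasting, leaves stored events untouched and converts older versions into the current shape as they’re read. This sketch assumes a hypothetical change where version 1 discount events stored a dollar amount as a float and version 2 stores integer cents; the dict-based event format is also an assumption.

```python
from typing import Any

Event = dict[str, Any]


def upcast(event: Event) -> Event:
    """Convert older event versions to the current shape when they're read.

    Hypothetical change: v1 order.apply_discount events stored a float
    "discount" in dollars; v2 stores integer "amount_cents".
    """
    if event["type"] == "order.apply_discount" and event.get("version", 1) == 1:
        return {
            "type": event["type"],
            "version": 2,
            "occurred_on": event["occurred_on"],  # original timestamp is preserved
            "data": {"amount_cents": round(event["data"]["discount"] * 100)},
        }
    return event  # already in the current shape


stored: list[Event] = [
    {"type": "order.apply_discount", "occurred_on": "2023-01-10", "data": {"discount": 5.5}},
    {"type": "order.apply_discount", "version": 2, "occurred_on": "2024-05-01", "data": {"amount_cents": 250}},
]

current = [upcast(e) for e in stored]
print([e["data"] for e in current])  # [{'amount_cents': 550}, {'amount_cents': 250}]
```

Because the conversion happens at read time, the stored history keeps its original chronology and content.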

Combining Event Sourcing and CQRS

Event sourcing is commonly combined with CQRS (Command Query Responsibility Segregation). This pattern advocates the separation of commands (which write data) from queries (which read data).

The use of event sourcing necessitates a degree of CQRS. Data is written into your database via events. The write model is therefore exceedingly simple: it’s a one-way, append-only log of the events that occur. Events are a manifestation of the “commands” described by CQRS.
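To make the append-only write model concrete, here’s a sketch of a command handler whose only side effect is appending an event. EventStore and approve_order are hypothetical names; a production store would persist events rather than keep them in a list, and a real handler would usually validate the command first.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    order_id: int
    type: str
    occurred_on: date


class EventStore:
    """Hypothetical in-memory write model: events are appended, never changed."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def events_for(self, order_id: int) -> tuple[Event, ...]:
        # A tuple gives callers a read-only snapshot of the matching events.
        return tuple(e for e in self._events if e.order_id == order_id)


def approve_order(store: EventStore, order_id: int, today: date) -> None:
    """Command handler: the 'approve' command becomes an order.approved event.

    Validation (e.g. checking the order exists) is omitted for brevity.
    """
    store.append(Event(order_id, "order.approved", today))


store = EventStore()
approve_order(store, 1, date(2024, 5, 1))
print(store.events_for(1))
```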

The data retrieval model is completely independent. You can use the most appropriate query system to fetch events and layer them into the application’s current state. The commands (events) transition the system into a new state; queries expose whichever aspects of that state your application requests.

Put a different way, neither the read model nor the write model knows how the other works. The entities in your codebase (such as an Order) are stored as a sequence of chronological events. The stored events can’t be rehydrated into the entity they pertain to without knowledge of how that entity’s state is derived. That knowledge is implemented within the read (query) model, which brings data back into your application.

This characteristic allows the persistence layer to be simplified to a single record insertion for each operation. The stored data doesn’t need to precisely represent any particular entity as the query model will manipulate it later. This contrasts with a “traditional” relational database where table fields often closely align with the properties of objects in your codebase.

The output of the query model is known as a projection. A projection presents data from a different perspective to the one used by its storage system. When using event sourcing, data is stored as a stream of events but projected into a representation of the application’s current state. Building that representation is a side-effect-free, idempotent operation: it doesn’t modify the underlying event data in any way, so projecting the same events again yields the same result.

A single projection might need to interact with several different event types. Projections look at the data in aggregate, in a way that makes sense to the application’s functions. In our example from earlier, an Order projection could output an object containing CreatedDate, ApprovedDate and ShippedDate properties by examining the events associated with that order.
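A sketch of that Order projection, with snake_case fields standing in for CreatedDate, ApprovedDate and ShippedDate. The Event shape, the FIELD_FOR_EVENT mapping and the event type names other than order.approved are assumptions. Calling project_order twice over the same events returns equal results and leaves the events untouched.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Event:
    order_id: int
    type: str
    occurred_on: date


@dataclass(frozen=True)
class OrderProjection:
    """Read model mirroring the CreatedDate, ApprovedDate and ShippedDate properties."""
    order_id: int
    created_date: Optional[date] = None
    approved_date: Optional[date] = None
    shipped_date: Optional[date] = None


# Which projection field each (hypothetically named) event type populates.
FIELD_FOR_EVENT = {
    "order.created": "created_date",
    "order.approved": "approved_date",
    "order.shipped": "shipped_date",
}


def project_order(events: tuple[Event, ...], order_id: int) -> OrderProjection:
    """Fold one order's events into a projection without modifying them."""
    dates: dict[str, date] = {}
    for event in events:
        field_name = FIELD_FOR_EVENT.get(event.type)
        if event.order_id == order_id and field_name:
            dates[field_name] = event.occurred_on
    return OrderProjection(order_id=order_id, **dates)


events = (
    Event(1, "order.created", date(2024, 4, 30)),
    Event(1, "order.approved", date(2024, 5, 1)),
    Event(2, "order.created", date(2024, 5, 2)),
)

print(project_order(events, 1))
```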

Conclusion

Event sourcing is a specialised software architecture with innate support for accountability, chronology and state recreation. The pattern can significantly simplify the implementation of applications where these qualities are desirable.

Event sourcing can also be useful in other scenarios, although the returns may diminish. Adopting event sourcing requires developers to implement code that determines the current state, adding overheads that don’t arise in systems which store only the current state in their database.

Event sourcing is best combined with other software architecture techniques such as CQRS, the observer pattern and eventual consistency. You don’t need to use event sourcing across your entire app – often, individual submodules will benefit from event sourcing while the bulk of your project sticks with a traditional relational database.