Book review: Patterns of Enterprise Application Architecture

After starting a new job and diving into the largest codebase I’ve ever worked on, I soon realized that I was missing something in my understanding of software architecture. I didn’t quite know what it was, but I knew that old adages like “use fat models” and “keep your controllers thin” were not enough to organize the code. I encountered models loaded with data munging and unrelated business logic, and controllers that had become a dumping ground for the rest of the application’s logic. In these situations, it’s tempting to curse our tools, stomp our feet, and search the web for the next shiny thing that promises to fix our problems. But I knew that another tool or MVC maxim wasn’t going to help; I needed a deeper understanding of software architecture.

So I asked my manager how to improve in this area, and he recommended Patterns of Enterprise Application Architecture (EAA) by Martin Fowler. It’s basically a textbook, but I wanted a comprehensive resource, something beyond the light articles I was used to reading about MVC. And to my surprise, I found the book very accessible, which I credit to the separation of its two parts:

  • Part 1: A short (~100 pages) linear narrative that walks through common problems in application development. Each problem also includes a brief mention of the solution (the pattern).

  • Part 2: A dictionary of the patterns referenced in Part 1. Although this section is much denser, its comprehensive descriptions are offset by the bird’s-eye narratives in Part 1.

Naturally, I started with Part 1, which is loaded with references to the patterns in Part 2. It felt foreign at first, as I was constantly looking up the patterns in Part 2 to keep up with the narrative. But after getting the hang of the basics, like layering and the essence of object-relational mapping, it wasn’t long before I was recognizing the patterns (or lack thereof) in my own work.

To be clear, many of the patterns are still over my head, but at least Fowler’s explanations don’t make me feel like a dunce. Eventually my understanding of our codebase morphed from a tangle of MVC dependencies into a clearer distinction of view, controller, service, domain, and data layers (even if the seams between those layers are still somewhat tangled). Furthermore, the underlying libraries I’m working with no longer seem so opaque, as I now understand the core problem that libraries like ActiveRecord solve (and where they fall short).

To share these learnings, I picked a couple of examples, which I’ll describe in detail.

Example #1: Embedded Value pattern

I was initially concerned that EAA was going to be heavy with ivory tower theory, but I found much of the advice pragmatic. While reading the narratives, one of the patterns that I was able to immediately pick up was the “Embedded Value” pattern:

In some cases referential integrity can make updates more complex… Small Value Objects, such as date ranges and money objects clearly shouldn’t be represented as their own table in the database. Instead, take all the fields of the Value Object and embed them into the linked object as an Embedded Value. (Mapping to Relational Databases, pg.44)

This led me to look up the “Value Object” definition:

VALUE OBJECT

With object systems of various kinds, I’ve found it useful to distinguish between reference objects and Value Objects. Of the two, a Value Object is usually the smaller; it’s similar to the primitive types present in many languages that aren’t purely object-oriented…

A key difference between reference and value objects is how they deal with equality. A reference object uses identity as the basis for equality… A Value Object bases its notion on equality on field values within the class (eg two date objects are the same if their month/day/year values are the same).

Value Objects shouldn’t be persisted as complete records. Instead use Embedded Value or Serialized LOB. (Value Object, pg. 486)

and the “Embedded Value” pattern:

EMBEDDED VALUE

Maps an object into several fields of another object’s table.

The simplest cases for Embedded Value are the clear, simple Value Objects like money and date range. (Embedded Value, pg. 268)

Although it wasn’t my first time using this pattern, it was my first time consciously applying it and putting a name to it. Grokking the precise meaning behind this pattern has helped me structure better solutions as well.

This pattern also exemplifies a persistent theme throughout the book: implementing the wrong pattern, or even the right pattern at the wrong time, can do more harm than good.
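To make the two definitions concrete, here is a minimal sketch in Python using only the standard library. The `DateRange` and `Employment` classes, the `employments` table, and its column names are all hypothetical, chosen for illustration rather than taken from the book or from any real codebase. The sketch assumes dates are stored as ISO strings in SQLite.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    """Value Object: equality is based on field values, not identity."""
    start: date
    end: date


@dataclass
class Employment:
    """Reference object that owns a DateRange (hypothetical example)."""
    id: int
    person: str
    period: DateRange


def create_schema(conn: sqlite3.Connection) -> None:
    # Embedded Value: DateRange gets no table of its own. Its fields
    # become columns (period_start, period_end) on the owner's table.
    conn.execute(
        "CREATE TABLE employments ("
        " id INTEGER PRIMARY KEY,"
        " person TEXT NOT NULL,"
        " period_start TEXT NOT NULL,"
        " period_end TEXT NOT NULL)"
    )


def save(conn: sqlite3.Connection, employment: Employment) -> None:
    # Flatten the Value Object into the owner's row.
    conn.execute(
        "INSERT INTO employments (id, person, period_start, period_end)"
        " VALUES (?, ?, ?, ?)",
        (
            employment.id,
            employment.person,
            employment.period.start.isoformat(),
            employment.period.end.isoformat(),
        ),
    )


def load(conn: sqlite3.Connection, employment_id: int) -> Employment:
    # Rebuild the Value Object from the embedded columns.
    row = conn.execute(
        "SELECT id, person, period_start, period_end"
        " FROM employments WHERE id = ?",
        (employment_id,),
    ).fetchone()
    period = DateRange(date.fromisoformat(row[2]), date.fromisoformat(row[3]))
    return Employment(row[0], row[1], period)
```

A `DateRange` loaded from the database is a different object in memory than the one that was saved, yet the two compare as equal because their fields match. That is the equality rule the Value Object definition describes.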

Patterns often include great detail about their mechanics, but lack clarity about when they should be applied (the notorious “given a hammer, everything needs pounding” symptom comes to mind). In contrast, each pattern in EAA has a “When to Use It” section that addresses this issue quite well. Under “Embedded Value”, this section says:

When to Use It

This is one of those patterns where the doing of it is very straightforward, but knowing when to use it a little more complicated…

The grey area is in whether it’s worth storing reference objects, such as an order and a shipping object, using Embedded Value. The principal question here is whether the shipping data has any relevance outside the context of the order. One issue is the loading and saving. If you only load the shipping data into memory when you load the order, that’s an argument for saving both in the same table… (Embedded Value, pg. 269)

The section continues by elaborating on other contexts where Embedded Value should or shouldn’t be applied. I’ve found these caveats invaluable once I’m actually in the weeds of applying a pattern.

Example #2: Offline Concurrency

EAA is often lauded for its section on offline concurrency, especially for how it breaks a complex topic down into practical lessons. The narrative is emphatic that you should exhaust simpler options before taking on offline concurrency, which echoes Fowler’s guideline on distributed systems as well (“Rule #1: avoid doing it!”):

As much as possible, you should let your transaction system deal with concurrency problems. Handling concurrency control that spans system transactions plonks you firmly in the murky waters of dealing with concurrency yourself. This water is full of virtual sharks, jellyfish, piranhas, and other, less friendly creatures. Unfortunately, the mismatch between business and system transactions means you sometimes just have to wade in…

If you can make all your business transactions fit into a system transaction by ensuring that they fit within a single request, then do that. If you can get away with long transactions by forsaking scalability, then do that. By leaving concurrency control in the hands of your transaction software, you’ll avoid a great deal of trouble. These techniques are what you have to use when you can’t do that. Because of the tricky nature on concurrency, we have to stress again that the patterns are a starting point, not a destination. We’ve found them useful, but we don’t claim to have found a cure for all concurrency ills. (Concurrency: Patterns for Offline Concurrency Control, pg. 76)

Hopefully that warning is enough to get our hearts racing! My overall takeaway is that concurrency-related bugs are tough to handle as they may not be reproducible in your test environment, but will pop up in production, so put in some thoughtful design and get it right the first time.

Luckily, EAA provides a framework to reason about offline concurrency, and our first choice when handling these problems is committing to one of two patterns:

  1. Optimistic Offline Lock
  2. Pessimistic Offline Lock

And either pattern can be (and usually is) supplemented with the following strategies:

  1. Coarse grained lock
  2. Implicit lock

Note: Getting a solid grasp on these definitions and patterns may feel like a chore, so I created some spaced-repetition flashcards in Anki to help retain and clarify this knowledge (linked below!)
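As an illustration of the first option, here is a minimal sketch of an Optimistic Offline Lock in Python with the standard library’s sqlite3 module. It assumes the common version-column approach: each row carries a version number, and an update only succeeds if the version is unchanged since the row was read. The `customers` table, the function names, and `ConcurrencyError` are hypothetical.

```python
import sqlite3


class ConcurrencyError(Exception):
    """Raised when another session changed the row since we read it."""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE customers ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " version INTEGER NOT NULL)"
    )


def read_customer(conn: sqlite3.Connection, customer_id: int) -> dict:
    # Return the data together with the version it was read at.
    row = conn.execute(
        "SELECT name, version FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return {"id": customer_id, "name": row[0], "version": row[1]}


def update_name(conn: sqlite3.Connection, customer: dict, new_name: str) -> None:
    # The WHERE clause only matches if nobody bumped the version meanwhile.
    cursor = conn.execute(
        "UPDATE customers SET name = ?, version = version + 1"
        " WHERE id = ? AND version = ?",
        (new_name, customer["id"], customer["version"]),
    )
    if cursor.rowcount != 1:
        conn.rollback()
        raise ConcurrencyError(
            f"customer {customer['id']} was modified by another session"
        )
    conn.commit()
```

If two sessions read the same row at version 0, the first update succeeds and bumps the version to 1. The second update matches no rows and raises `ConcurrencyError`, so the conflict is detected at commit time rather than prevented up front.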

It wasn’t long after reading this section that I was faced with an offline concurrency bug, but my newfound understanding helped me to grasp the issue. It involved the use of Postgres’s pg_advisory_xact_lock, as described in Postgres’s docs (emphasis added):

locks an application-defined resource, which can be identified either by a single 64-bit key value or two 32-bit key values (note that these two key spaces do not overlap). If another session already holds a lock on the same resource identifier, this function will wait until the resource becomes available. The lock is exclusive. Multiple lock requests stack, so that if the same resource is locked three times it must then be unlocked three times to be released for other sessions’ use.

The key word here is “exclusive”, which means that it’s a pessimistic lock! Although I wasn’t familiar with advisory locks, I was able to recognize the pessimistic locking pattern:

Pessimistic Offline Lock prevents conflicts by avoiding them altogether. It forces a business transaction to acquire a lock on a piece of data before it start to use it, so that, most of the time, once you begin a business transaction you can be pretty sure you’ll complete it without being bounced by concurrency control. (Pessimistic Offline Lock, pg. 427)
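For comparison, here is a minimal sketch of a Pessimistic Offline Lock built on a lock table, written in Python with sqlite3. It is not how Postgres advisory locks work internally. In particular, this sketch fails fast when a lock is taken, whereas `pg_advisory_xact_lock` waits. The `locks` table, the `ExclusiveLockManager` class, and the owner strings are hypothetical.

```python
import sqlite3


class LockUnavailable(Exception):
    """Raised when another owner already holds the exclusive lock."""


class ExclusiveLockManager:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # The primary key guarantees at most one owner per lockable item.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS locks ("
            " lockable_id INTEGER PRIMARY KEY,"
            " owner TEXT NOT NULL)"
        )

    def owner_of(self, lockable_id: int):
        row = self.conn.execute(
            "SELECT owner FROM locks WHERE lockable_id = ?", (lockable_id,)
        ).fetchone()
        return row[0] if row else None

    def acquire(self, lockable_id: int, owner: str) -> None:
        if self.owner_of(lockable_id) == owner:
            return  # already held by this business transaction
        try:
            self.conn.execute(
                "INSERT INTO locks (lockable_id, owner) VALUES (?, ?)",
                (lockable_id, owner),
            )
            self.conn.commit()
        except sqlite3.IntegrityError:
            # Someone else inserted the row first: the item is locked.
            self.conn.rollback()
            raise LockUnavailable(f"item {lockable_id} is locked") from None

    def release_all(self, owner: str) -> None:
        # Called when the business transaction completes.
        self.conn.execute("DELETE FROM locks WHERE owner = ?", (owner,))
        self.conn.commit()
```

A business transaction calls `acquire` before it touches the data and `release_all` when it finishes. Any other owner that tries to acquire the same item in between is turned away before it can do conflicting work.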

Additionally, the Postgres docs go on to describe pg_advisory_xact_lock:

The lock, if acquired, is automatically released at the end of the current transaction and cannot be released explicitly.

Ah yes! The key words here are “cannot be released explicitly”, which reminds me of an Implicit Lock:

The key to any locking scheme is that there are no gaps in its use. Forgetting to write a single line of code that acquires a lock can render an entire offline locking scheme useless… Not releasing locks won’t corrupt your record data, but it will eventually bring productivity to a halt. Because offline concurrency management is difficult to test, such errors might go undetected by all of your test suites. One solution is to not allow developers to make such a mistake… (Implicit Lock, pg. 449)

The simplest rule for releasing locks is to do it when the business transaction completes. Releasing a lock prior to completion might be allowable, depending on your lock type and your intention to use that object again within the transaction. Still, unless you have a very specific reason to release early, such as a particularly nasty system liveliness issue, stick to doing it upon completion of the business transaction. (Pessimistic Offline Lock, pg. 429)

Even though pessimistic offline locking is not something I’ve ever explicitly built into a database application, I felt well-equipped and eager to map this Postgres feature to my understanding of these patterns.

General takeaways and conclusion

Having read the narratives and most of the patterns, it’s abundantly clear that there’s lots I don’t know about OOP and application architectures. But I came away learning what I needed to know for now, along with a newfound awareness about my blind spots.

A lot of time is also dedicated to describing complex patterns that you’ll probably never have to implement yourself, but which you’ll encounter in popular open source libraries. Those sections mostly flew over my head, but at least I’ll know where to look when I need to learn more. For example, the Active Record library takes its name from the Active Record pattern, and SQLAlchemy is often chosen for its Data Mapper implementation. When choosing a library, this knowledge helps us see through the hype by recognizing the pattern and knowing when to use it.
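The difference between those two patterns can be sketched in a few lines of Python. This is a toy illustration, not how the Active Record library or SQLAlchemy is implemented. The `people` table and the `PersonRecord`, `Person`, and `PersonMapper` classes are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


class PersonRecord:
    """Active Record: the domain object wraps a row and saves itself."""

    def __init__(self, conn: sqlite3.Connection, name: str) -> None:
        self.conn = conn
        self.id = None
        self.name = name

    def save(self) -> None:
        cursor = self.conn.execute(
            "INSERT INTO people (name) VALUES (?)", (self.name,)
        )
        self.id = cursor.lastrowid


@dataclass
class Person:
    """Data Mapper: the domain object knows nothing about the database."""
    name: str
    id: Optional[int] = None


class PersonMapper:
    """Moves data between Person objects and the people table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def insert(self, person: Person) -> None:
        cursor = self.conn.execute(
            "INSERT INTO people (name) VALUES (?)", (person.name,)
        )
        person.id = cursor.lastrowid

    def find(self, person_id: int) -> Optional[Person]:
        row = self.conn.execute(
            "SELECT name FROM people WHERE id = ?", (person_id,)
        ).fetchone()
        return Person(name=row[0], id=person_id) if row else None
```

With Active Record, persistence logic lives on the object itself, which is convenient when the object closely mirrors a table. With Data Mapper, the extra layer keeps the domain object free of database concerns, at the cost of more code.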

Although this book was published in 2002, the patterns continue to be referenced in modern software. But my main criticism of EAA is that an update would be helpful, as there are some anachronisms that might puzzle a developer reading this today. For example, XML and SOAP are mentioned as common data formats, yet there is no mention of JSON. Even better, updating some of the examples with excerpts from popular open source libraries would really help the reader appreciate them.

I really appreciate how Fowler makes it easy to reference these patterns by providing an online version of the dictionary. As a fan of spaced-repetition flashcards, I also created a deck with Anki to help retain this knowledge (I do this with most of the technical books I read, because forgetting what we learn is a terrible thing).

I value these definitions because an important part of patterns is building a common vocabulary, so we can say “this class is a Remote Facade”, and other designers will know what we mean. But one caveat is not to overload ourselves with unnecessary jargon. After all, the least experienced person should be able to understand things equally clearly. So the takeaway here isn’t to use these terms as much as possible, but to use them where it makes sense, while also having enough familiarity to explain them simply.

Finally, my biggest takeaway was realizing that Fowler doesn’t actually present us with anything new in EAA. In fact, it’s the opposite - this is a book of (for our industry) old ideas. This book is for anyone who values communicating these ideas with others, and it’s one I’ll continue revisiting for years to come.