When business rules become technical problems

21 August 2020

Sometimes business rules are easy to implement. Let's say a PM comes to me and asks "make sure the price of a product never goes below zero". Piece of cake:

public void save(Product product) {
  if (product.price < 0) {
    throw new IllegalArgumentException(String.format(
      "tried to save product %s with negative price %d",
      product.id,
      product.price
    ));
  }
  myOrm.save(product);
}

You disagree on the exception being used as control flow? Replace it with your favorite way of dealing with conditions. Can one save a negative price product with this code? Is this code correct?

It is not as obvious as it seems. In a fully concurrent application, the reference to product could be shared by multiple threads. Another thread could change the price of the product after the check and before it is saved by the current thread leading to wrong behavior.

We do not consider this kind of scenario in DDD approaches or blog posts. To answer to an HTTP requests, threads do not share references to domain objects. They would rather share references to the database pool. Each request fetches the domain object from the database, applies changes and saves the result. Since each request creates a different object for the same Entity (uniquely identified domain concept), the code above solves the business need to refuse negative prices.

Here's an example of an execution of two requests at the same time:

sequence diagram of multiple user changing the price at the same time — In purple the product at its current price, in green the product with price 14.

In any order of execution, user B will never be able to save the product at a negative price. As long as the database acts like a regular register (see my notes on Concurrency for a definition), the business rule is correctly solved.

For this kind of business needs, the architecture of systems makes our code safe. This technical detail provides a safe environment for our code. This environment sparks conversations around the best way to express our business rules with code. These conversations lead to DDD practices (it's a bit far fetched, bear with me for a moment).

Is domain purity possible?

When all business rules are expressed in a single place in the code, some call that domain purity. In this article, the author insists on the distinction between completeness and purity. This is not what I'll discuss here. I want to focus on the example used in the article.

I have a problem with this example. I tweeted about it:

If you consider isolation, all implementations fail to check uniqueness in a concurrent environment, even with the in-memory database... This is a very good example of technical constraints influencing domain code. Could you find a correct way to check uniqueness? @sadraskol - August 10, 2020

I think this tweet is too provocative to explain the kind of problem it points out. Since tweets are too short for an explanation, let's dig into it here.

Database isolation

Let's consider Postgres. By default, when running a transaction, it's under Read Committed isolation. At this level of isolation "a SELECT query sees a snapshot of the database as of the instant the query begins to run". Note that the snapshot is before the query, not before the transaction.

It means that two concurrent transactions can run without hindering each other. Let's run the example in a Read Committed transaction in the worst case:

Database has no user
User A opens a transaction
User A checks for email "a@a.com"
User B opens a transaction
User B checks for email "a@a.com"
Database saves user A with email "a@a.com"
Database saves user B with email "a@a.com"

This behavior can be checked using a formal specification. In TLA+, this would look like that:

\* We consider 3 accounts that can choose from 3 distinct emails
Users == { "tom", "nas", "hyp" }
Emails == { "a", "b", "c" }
Nil == "Nil"

VARIABLE possibleEmails, chosenEmails, userRegistered, checkedEmails
vars == << possibleEmails, chosenEmails, userRegistered, checkedEmails >>

\* At first no user registered
Init == /\ chosenEmails = << >>
        /\ userRegistered = << >>
        /\ possibleEmails = Emails
        /\ checkedEmails = [ u \in Users |-> Nil ]

UserRegisteredYet(user) == \A u \in DOMAIN userRegistered : userRegistered[u] /= user

CheckEmailUniqueness(user, email) == /\ email \in possibleEmails
                                     /\ checkedEmails[user] = Nil
                                     /\ checkedEmails' = [ checkedEmails EXCEPT ![user] = email ]
                                     /\ UNCHANGED << possibleEmails, chosenEmails, userRegistered >>

EmailChecked(user, email) == checkedEmails[user] = email

ChooseEmail(user, email) == /\ UserRegisteredYet(user)
                            /\ EmailChecked(user, email)
                            /\ chosenEmails' = Append(chosenEmails, email)
                            /\ userRegistered' = Append(userRegistered, user)
                            /\ possibleEmails' = possibleEmails \ { email }
                            /\ UNCHANGED << checkedEmails >>

NoEmailToChoose == possibleEmails = {} /\ UNCHANGED vars

EmailChecking == \E u \in Users: /\ \E e \in possibleEmails: CheckEmailUniqueness(u, e)

Range(f) == { f[x]: x \in DOMAIN f }

UserRegistering == \E u \in Users: /\ \E e \in Range(checkedEmails): ChooseEmail(u, e)

Next == \/ EmailChecking \/ UserRegistering \/ NoEmailToChoose

Spec == /\ Init
        /\ [][Next]_vars
        /\ WF_vars(UserRegistering)
        /\ WF_vars(EmailChecking)

\* Invariants
EveryChosenEmailMustBeUnique ==
        \/ Len(userRegistered) = 0
        \/ \A email \in Emails:
            Len(SelectSeq(chosenEmails, LAMBDA x: x = email)) <= 1

EveryRegisteredEmailHasARegisteredAccount == Len(userRegistered) = Len(chosenEmails)

\* Property: here we check for liveness of our implementation,
\* that is that at least one email is chosen by account
EventuallyAtLeastAnEmailIsChoosen == <>(Len(chosenEmails) >= 1)

The model checker of TLA+, TLC, will find the same example as previously described:

TLC finds a counter example for our specification

If you think I wrote this article only to practice my TLA+ skills... You're right! But let's not stop there.

Finding an elegant solution

There are multiple solutions to that problem.

The first one would be to set the transaction level of your database to Serializable, the highest level of isolation. It guarantees that every transaction is serializable: you can find a linear execution of all transactions. It simulates transactions being run sequentially in a single thread. This has a lot of performance drawbacks, so my team was shy to try it. If you use this isolation in production, I'd be curious of your experience with it!

Another solution would be to set up a lock on the table for unicity. You can use SELECT ... FOR UPDATE to lock the rows of the table. It protects against concurrent conflict updates in the same row. Since it's a one liner, it's easy to implement in your current code.

Serializable isolation and locks are good solutions but they can be quite expensive. If a transaction takes a long time, it slows down every request waiting for the lock. If misused, you can find yourself in deadlocks or having poor performances. There is a simpler solution: unique constraints. Constraints are checked within the database. Unique constraints are tailored for unicity check.

Using the database constraints to check unicity means moving some of the business needs outside your application code. This is not coherent to an approach that centralized all rules in one place. There's an easy mitigation to this problem. Since the repository interface is defined by the domain, we can add this case to the domain:

interface UserRepository {
  void save(User user) throws EmailAlreadyTakenException
}

Every caller will have to define a behavior in case the exception is thrown. Once again exception is not the best control flow ¯\_(ツ)_/¯ Use the control flow that fits your code style.

End notes

It's always better to express code in pure domain code: checking rules in pure memory is much simpler than mixing Database calls with logic. But sometimes pure memory code will not cut it. Concurrent problems, unicity across systems, partial failures, time related tasks are difficult to check without battle tested technical solutions. I would recommend retrofitting technical constraints in the domain code. It's safer and would lead to less painful bugs rather than depending on an unadapted architecture.

ps: race condition, race condition is the missing word of this post. It took me a week to figure it out...