Sadraskol

Reading notes: Concurrency, the works of Leslie Lamport

Concurrency, the works of Leslie Lamport edited by Dahlia Malkhi, is published by ACM and freely available on their site. The collection is targetted to large audience to better know the work of ACM Turing award recipients. As the title implies, this book explains the work of Leslie Lamport on concurrency, and tries to convince you this award was deserved.

The book dives quickly into the first major contribution of Leslie Lamport: the mutual exclusion solution of the Bakery algorithm. It mimics the queue of clients waiting at a bakery. Each process has a ticket and only the lowest ticket is given access to the critical part of the code. Lamport proved that the algorithm works for atomic registers, but also for regular registers (more on that later). This algorithm is already a challenge for the mind to understand how difficult it is to find and prove correct concurrent algorithms.

Lamport also provided a definition of what safe, regular and atomic registers meant. When you deal with memory, two operations can be performed: read and write. As explained in this stackoverflow answer, reads and writes can happen at the same time.

Process1 -----wwwww--------------wwwwww------- (w1, w2)
Process2 -------rrrrr----------------rrrr----- (r21, r22)
Process3 -------------rrrrrr---rrrr----------- (r31, r32)

Let's say that the register is reading 0 before any writes, w2 writes 10 and w3 writes 100. The range of this virtual register is 0-255. The possible values returned would be:

ReadAtomicRegularSafe
r210,100,100-255
r31101010
r3210,10010,1000-255
r22if r32 = 100, 100 else 10,10010,1000-255

Note that every none overlapping read returns the latest write, the behavior changes for overlapping reads. For safe registers, overlapping reads can return any value in the range of the register. Regular register will only allow current value or previous value to be returned, without constraints on the order of read. Successive overlapping reads can return incoherent values between one another but not random values. Atomic read are the strongest registers, concurrent reads must respect sequential consistency.

Causal time

Arguably the biggest contribution to computer science by Leslie Lamport is the notion of causal time. At a time when it was not proved that any algorithm could be performed in distributed systems, and the reliance on clocks to synchronize algorithms, the paper "Time, Clocks, and the Ordering of Events in a Distributed System" marked a turning point.

The idea is the following: multiple processes send messages to other processes and perform actions. Each process has his own causal clock, a number increasing for each operation. When sending a message to other processes, the process also send his clock. The receipient will then choose: either his clock is greater, and it keeps on going, or it chooses the clock of the sender and increases it. If every step of an algorithm is performed between messages, we are able to trace which step have followed other steps.

Process1 0--1---------4------------5-------
                     /              \
Process2 0------2---3----------------6-----
               /                      \
Process3 0----1------------2-----------7---

This little graph represent the clock of each process. They all start at 0. When a step of the algorithm is performed, the local clock increases by 1. Every line in between processes represents a message being sent to another process. In this case the local clock is also re-evaluated. If every clock increase is performed by a message, we can trace the origin of every steps. We are not interested in steps between message sending. The main focus is not the real time, but the logic time that links every steps of the algorithm.

This appearing simple concept allows to prove that any algorithm, every state machine can be distributed (given no faults in network/process) and opens the door for future work on distributed state machines.

For instance, you can perform a snapshot of the distributed system, that doesn't really snapshot the state of every processes at a given time ("real" time), but a causal view of the system. Although not precise, this kind of snapshot allow to verify properties of the running system: if the snapshot is in a deadlock, the system is in a deadlock, etc.

Leslie Lamport also defined the Byzantine failure, but I didn't really got this part so i'll skip to the next.

Replication problem

Leslie Lamport provided with one of the most influencial algorithm: the Paxos algorithm. It runs Apache Cassandra, Google Spanner and Bigtable, it influenced a whole family of consensus algorithm, Raft being the latest newborn.

But first let's do some history:

His paper was groundbraking since if no failure happened, strong consistency and replication could be proved possible!

After some definition of Agreement, Completion, Asynchrony, Partial Synchrony, Synchrony, Crash Failure, Fail stop, Total Ordering Protocols and Consensus Protocols, the book dives into some properties of the systems (I'll let you read the book to understand all those terms). One thing to note is that for f failures, consensus can be found for 2f + 1 participants.

The first early consensus protocol was published in 1983 by Micheal Ben-Or. It is described as such in the book:

  1. For every N processes p set an estimate e(p) to Red or Blue randomly, they also set their round r(p) = 0

Phase 1:

  • every process broadcasts a message of type (Phase1, r(p), e(p))
  • wait to receive N/2 phase 1 messages of round r(p)
  • if every message has the same estimate e, choose v(p) = e
  • else v(p) = _|_ (bottom or null for instance)

Phase 2:

  • every process broadcasts a message of type (Phase2, r(p), v(p))
  • wait to receive N/2 phase 2 messages of round r(p)
  • if every message has the same value v (!= _|_), decide e(p) = v
  • else if a message has a non bottom value v, e(p) = v
  • else choose a random Red or Blue estimate e(p) = Red | Blue

The two phase algorithm allows to reduce the number of processes, as the single phase variant protocol would require 3f + 1 processes to guarantee the same properties. This algorithm is C-valid, C-agreement and completes with a probability 1 (like the probability of getting head when tossing a coin for an infinite amount of times).

Although this is a theoretical only algorithm, it can be extended to other consensus algorithms.

In 1984, the Dwork, Lynch & Stockmeyer (DLS) introduces the notion of leader to simplify the complexity of communication. The alternative presented in the book is called the Franken algorithm. It only works for Partial Synchrony, while Ben-Or worked for Asynchrony systems.

DLS is still a theoretical algorithm, but it is an improvement over Ben-Or since any value can be used as estimates.

Finally in 1989, Lamport published the first paper on the first Paxos consensus algorithm. Instead of using O(N²) messages by rounds, it would only use O(N) messages, making it the first possibly interesting algorithm for the industry. It uses a leader and is also aimed at Partial Synchrony systems.

  1. set e(p) = _|_, r(p) = 1 and T(p) > 0

Phase 1a.

  • if p is leader of r(p) broadcast (Phase1a, r(p))
  • wait for message of type Phase1a, with r >= r(p), then set r(p) = r

Phase 1b.

  • send (Phase1b, r(p), e(p)) to leader of r(p)
  • if p leader of r(p):
  • wait for N/2 messages of Phase1b with E = set of estimates received,
  • if there is an non bottom estimate in E, v(p) = max(E) else use the initial estimate of p

Phase 2a.

  • if p is leader of r(p) broadcast (Phase2a, r(p), e = (v(p), r(p)))
  • wait for message of type Phase2a, and accept e(p) = e

Phase 2b.

  • send (Phase2b, r(p)) to leader of r(p)
  • if p is leader of r(p):
  • wait for N/2 messages from Phase2b, then decide the proposal e(p) and broadcast to all replicas

Optional Phase 3

  • if a p is not leader of r(p) then upon timeout T(p) of not receiving from the leader:
  • increase r(p) for which p is leader
  • double T(p) and go back to phase 1a.

Leaders of rounds are choosen by a simple mecanism like p is leader of rounds r mod N = p. Although I had a pretty clear picture in my head of previous algorithms I am not sure I understand how Paxos could work and be complete. The tricky part to me is understanding when the leader does not respond, how the new leader is choosen.

Paxos has opened a whole family of consensus algorithms that were successfully implemented in distributed systems thoughout the industry, but also inspired more recent Raft.

The weakness of Paxos algorithm is dynamic configuration. Changing the processes configuration have tricky corner-cases and need special care.

Formal verification

Leslie Lamport has a mathematician background which influences his work in the sense that he does not focus on language specific syntax in elegant algorithms. He tries to study systems as a whole and properties that are common to every technology. This makes his work applicable to memory, multiprocesses or distributed systems alike.

He coined the terms "Safety" (nothing bad ever happen in a system) and "Liveness" (Something good eventually happens). These were later defined.

In the 70s there was work on temporal logic, in the 80s Lamport worked on temporal logic to try coming up with a formal method specific for distributed problems and finally in the early 90s he put the foundation of TLA (Temporal Logic Action). This is the subject of my next book so I didn't get into this part of the book.

The rest of the book consists of a collection of groundbraking papers proposed by Leslie Lamport. I didn't read that either.