Concurrency review : umbrant

Concurrency review

July 8, 2011

I assume that everyone has already read Andrew Birrell's seminal paper on "Programming with Threads" or at least has a basic conception of parallel programming. This is going to deal with locking and concurrency at a higher level. At-bat today are five selected papers on concurrency:

"Granularity of Locks and Degrees of Consistency in a Shared Data Base", by Gray et al., 1975
"Experience with Processes and Monitors in Mesa", Lampson and Redell, 1980
"On Optimistic Methods for Concurrency Control", Kung and Robinson, 1981
"Threads and Input/Output in the Synthesis Kernel", Massalin and Pa, 1989
"Concurrency Control Performance Modeling: Alternatives and Implications", Agrawal, Carey and Livny, 1987

Background

I know I said I expected Birrell's paper as base knowledge, but here's a TLDR that might let you skip it.

The need for locking is derived from the concurrent reader/writer problem. It's safe for multiple threads to be reading the same data at the same time, but it's not safe to read or write while someone else is writing since you can get corrupted results. This requires the idea of the reader-writer lock, which allows any number of concurrent readers, but will make sure that any writer gets exclusive access (i.e. no other readers or writers are accessing the protected data). This is also called shared/exclusive locking, and is an especially common construct in parallel programming.

Granularity of Locks

The important takeaway from this Jim Gray paper is the idea of hierarchal locking, where locking a database table also locks all the rows and row fields in that table. This hierarchal structure allows locking at an almost arbitrary granularity depending on the needs of the executing query, which ameliorates the issues that can happen with too-fine-grained locking (excessive lock overhead from doing lots of acquisitions and releases) or too-coarse-grained locking (poor concurrency from unnecessary lock contention).

This scheme applies to exclusive (X) locks used for writes as well as share (S) locks used for reads, but also requires the introduction of a third lock type: intention locks. Intention locks are used to indicate in an ancestor node that one of its children has been locked, preventing another query from falsely locking the ancestor, and thus, the child as well. This is refined to having both a intention share lock (IS) and an intention exclusive lock (IX) to allow concurrent reads, since intention share locks are compatible. Exclusive intention locks are also compatible, since they still have to ultimately exclusively lock the child they want to modify. Queries are required to leave a breadcrumb trail of correct intention locks behind as they traverse toward what they ultimately want to access. Locks also must be released in leaf-to-root order, so the locking hierarchy remains consistent.

One more intention lock type is introduced for yet better concurrency: share and intention exclusive locks (SIX). This is interpreted as "read-only access to a subtree, exclusively-locking some to be written". This is necessary because normally you can't have concurrent read/writes (cannot just first acquire the share lock and then an intention exclusive lock since they're incompatible), but since these rights are being granted to the same query, it can be depended upon not to read a row that it's currently writing. This read-modify-write behavior for a subtree is super common in databases, which is why SIX locks are important.

Table 1 on page 5 of the paper is a concise rundown of what locks are compatible with each other. It might be a nice exercise to work through (the lock types being null, IS, IX, S, SIX, X).

The rest of the paper seems less relevant. Gray et al. explain how this can be applied to a more dynamic graph based on locking ranges of mutable indexes, with the same strategy. I don't think this works for multiple indexes, since then the hierarchy DAG is no longer a DAG. They also cover the idea of "degrees of consistency" with tradeoffs between performance (concurrency) and recovery (consistency). I don't think real-world databases use anything except the highest degree of consistency, since the idea of unrecoverable, non-serializable transactions isn't pleasant. Anything with eventual consistency (looking at you, NoSQL) has made this tradeoff.

Experience with Processes and Monitors in Mesa

This paper specifies a new parallel programming language, Mesa, used to implement a new operating system, Pilot. Introducing a new language and OS at the same time is pretty common, and is how we arrived at C and Unix. This is where the idea of "Mesa semantics" for monitors came from (compared to Hoare semantics). To put the work in proper context, apparently in 1979 one had to justify why a preemptive scheduler is required over a non-preemptive design even assuming a uniprocessor (the obvious reason being interrupt handling).

Mesa is kind of a neat language, in that any procedure can be easily forked off as a new process, and processes are first class values in the language and can be treated like any other value. This isn't to say that everything is expected to be able to run concurrently, just that the FORK language construct is easy to apply. The core organizational construct in Mesa is the "module" or the "monitor module". This is basically a way of logically organizing procedures, and specifying which of them need to acquire the monitor lock as part of execution.

This is also where "Mesa semantics" come in. Instead of immediately switching to a waiting process on a signal, the signaller continues running. This seems like a great win, since although it means slightly different semantics to the program, it also means fewer context switches.

The paper goes on to describe more about monitors and the implementation.

On Optimistic Methods for Concurrency Control

When I read this paper for 262A, it was a big eye-opener. I felt that the idea wouldn't hold up in real usage (and I think that this is true, except in the specific situations noted), but it was a refreshing approach to handling concurrency I had never thought of.

The idea behind "optimistic concurrency" is doing away with locking, and instead doing checking at the end before commit to see if there are any conflicts from concurrent queries, and aborting if so. In this way, even if incorrect query results are generated along the way, they are not externalized. This is speculative and will result in lots of aborted transactions (and thus wasted work) under write-heavy workloads, but as the paper says, this works wonderfully for read-heavy workloads where it's unlikely to have a conflict.

The motivation here is that locking often imposes unnecessary overhead, and can complicate things. In a locking scheme, even read-only queries need to lock rows even though they aren't modifying the data, just to indicate that the reads is happening. All this checking and verifying adds up, increasing complexity of the system, and leading to potential deadlocks which have to be resolved through a deadlock-free scheme, or deadlock detection and abort. Locking also can operate at too coarse a granularity; imagine the root node in a hierarchical locking scheme as described by Gray et al., it's basically constantly under lock contention, often times unnecessarily since queries are not necessarily operating on the same subtrees.

This is implemented by having two-phase transactions, which goes read phase, validate, then write phase. In the read phase, the transaction gathers up the names of all the objects it needs to read, defining a read set. It then validates whether the transaction T_j is serializable, checking to make sure that for all prior transactions T_i one of the following is true:

T_i completes its write phase before T_j starts its read phase. T_i comes entirely before T_j, so it's fine.
The write set of T_i does not intersect with the read set of T_j, and the T_i finishes writing before T_j starts writing. As long as there's no intersection, T_j's reads are safe, and as long as T_i finishes writing before T_j starts, T_j will not be overwritten by T_i.
The write set of T_i does not intersect the read or write set of T_j, and T_i finishes reading before T_j starts writing. Similar to the previous, no intersection with the read or write set makes T_j very safe, and T_i needs to finish reading before T_j starts writing to protect T_i's read set.

This means that validation needs to check the write sets of all transactions that had finished reading but not finished writing (conditions 2 and 3). This poses an issue for long-running transactions, since the validator might be expected to keep around write-sets almost indefinitely. The proposed answer is to abort and restart the transaction, which leads to the question of how to deal with transactions that repeatedly fail validation. This is answered by write-locking the entire database and letting the "starving" transaction run to completion in isolation. This isn't great at all, but the assumption is that both long-running transactions and repeat-failures are rare.

The evaluation in this paper is kind of spotty. It's purely theoretical, and they chose to do their analysis on a B-tree, which is one of the better (though also common) situations because of the high fanout and low depth leading to lock contention on root nodes. They also assume a uniformly random distribution for accesses which is probably untrue (accesses are normally temporally correlated, which is why LRU caching works).

All-in-all, this is the system design maxim of "optimize for the common case" taken to the extreme. The common case of no-conflict transactions will be faster with optimistic concurrency control, but it'll collapse under load a lot worse. Like the authors say, it's only for situations where transaction conflict is rare.

Threads and Input/Output in the Synthesis Kernel

This seems to be a more meta paper, where optimistic concurrency and lock-avoiding techniques were applied to an OS to improve performance. It's also chock-full of system-specific jargon, which I will kindly avoid introducing. Honestly, most of what's laid out in the paper feels like a bunch of small optimizations for a specialized kernel that add up to something that performs demonstrable better than SunOS. Some of these techniques might be translatable back to Unix-y implementations, some of it is unique to the system (runtime optimization of syscalls?), and some of it is because it's a special-purpose kernel. I like how the references are to the SunOS source code, GEB (yes, the Hofstadter book), 3 of the author's own papers, and then two external. Certainly a different time.

The takeaways here are unclear. The conclusion of "avoid synchronization when possible" seems hardly novel, and it feels too much like they implemented to optimize their microbenchmarks (no real apps were written).

Concurrency Control Performance Modeling

This paper does a deep comparison between three concurrency control algorithms: blocking locks, immediate restart on lock contention, and optimistic concurrency. I really love papers like this one, since they take a bunch of different algorithms that all tested well under different model assumptions, carefully dismantle said assumptions, and reveal real truths with their own meticulous performance model. It really demonstrates the authors' complete understanding of the problem at hand.

There are a number of model parameters that are crucial to performance here. The database system model specifies the physical hardware (CPUs and disks), associated schedulers, characteristics of the DB (size and granularity), load control mechanisms, and the concurrency control algorithm itself. The user model specifies the arrival process for users, and the type of transactions (batch or interactive). The transaction model specifies the storage access pattern of a transaction, as well as its computational requirements (expressed in terms of probabilities).

I consider this to be about as complete as possible. They ignore I/O patterns and cache behavior, but those are just damn hard to model. Using a Poisson distribution for transaction inter-arrival rates is canonical without evidence to disprove it ( see "On the Self-similar Nature of Ethernet Traffic" by Leland et al. for a situation where Poisson does not hold so true). They also do not take into account processing time spent on the concurrency control algo itself, which feels like a slight copout since I think this means they ignore lock overhead and use a completely granular locking system (not hierarchical locking), which disfavors optimistic concurrency. This is implementation specific and a lot of additional work to add to the model, and considering there's some prior work showing that the costs are roughly comparable and negligible compared to access costs, I'm willing to let it go.

The interesting part comes when they manipulate all the model parameters, and explain how different papers arrived at their different performance results. Basically, under the assumption of infinite resources, optimistic concurrency does splendidly as the level of multiprogramming increases, since locking strategies run into contention and transactions get blocked up. Optimistic transactions still face more conflicts and have to be restarted, but since there the pipeline is always full of running transactions (none are just blocked and using up a queue slot while doing no work), overall throughput continues to increase. Immediate-restart reaches an interesting performance plateau, due to its scheme of trying to match the rate of transaction completion with the rate of re-introducing restarted transactions. This was the model used in a number of prior papers.

Introducing a very resource limited situation turns things sharply in favor of blocking algos. Blocking performs much better until very high levels of multiprogramming, immediate-restart hits the same plateau for the same reason, and optimistic concurrency performs linearly worse beyond a very small multiprogramming level. Basically, every optimistic conflict detected at the end of a transaction just wasted all of the resources used; immediate restart does better since it will restart if it detects a conflict midway, and also delays restarts to match the completion rate.

Increasing the number of resources begins to favor optimistic concurrency again, but the price/performance isn't there since doubling the # of resources does not lead to a doubling in performance. They do a few more different situations, examining different workloads and model assumptions, which you can read yourself if you want to know more.

Basically, it's hard to make general statements about performance; things are dependent on your model. It seems that for most real-world use cases though (limited resources, high utilization), blocking is the concurrency control method of choice. It's also important to carefully control the level of multiprogramming for optimal throughput, since performance tends to peak and then decline as things thrash.

I also just find it really cool that they explained the performance results of a lot of previous papers within the scope of their own model, basically saying that no one was wrong, just incomplete in their analysis.