Inspiration

Somewhere something incredible is waiting to be found.

C++ 11 Atomics and Concurrency Memory Model – Part 1

What is the problem.

Let us assume we have this source code that we want to compile.

// Initially x = y = z = 0
x = 3;
y = 4;
z = x + y;
print z;

What if the compiler during the compilation reorders the instructions into this:

// Listing 2
// Initially x = y = z = 0
y = 4;
x = 3;
z = x + y;
print z;

Does it change the behavior of the program? As you can see both the codes above will result in z = 7. There is no change in the behavior of the program. In fact, it is perfectly valid for the compiler to transform the above code into this:

// Listing 3
z = 7;
print z;

when both x and y are not used anywhere else.

As long as the observable behavior of the program in the single-threaded environment remains unchanged, your compiler is free to reorder or even transform your code for optimization purposes. Not only your compiler think that most of the time your code is slow and stupid, but modern CPU is free to execute the compiled code out of order as well. This behavior remains unnoticed when we write a single-threaded program but becomes problematic in a multi-threaded scenario.

// Listing 4
// Initially payload = flag = 0

payload = 1; // Operation A
flag = 1; // Operation B

while (flag!=1); // Operation C
r1 = payload; // Operation D

In the above code, we want to send a payload from thread 1 to thread 2. First, we set the payload and then we set the flag in thread one. Thread 2 will wait for the flag to become 1 before consuming the payload. The correct final result for r1 should be 1.

Let us assume that all the operations are atomic and all the store is immediately visible to all the CPU (no store buffer and local caches). Is there any problem with the code above?

As I mention before, the compiler and CPU can reorder the instructions. It is perfectly valid to reorder Operation A and Operation B.

// Listing 5
// Initially payload = flag = 0;

flag = 1; // Operation B
payload = 1 // Operation A

while (flag!=1); // Operation C
r1 = payload; // Operation D

Now we can get r1 equal zero when the execution follow this order: B-C-D-A.

Before C++ 11, there is no standard way to solve the problem above. We have to manually insert compiler barriers and CPU specific fences between the instructions in our program. Every CPU have different ordering constraints, for example in x86 CPU cannot reorder a read operation with another read operation so that we can prevent read-read reordering with only compiler barrier. The same thing cannot be said for another CPU, for example, POWER, and ARMv7 may reorder read-read operation. C++11 memory model helps us to write and reason cross-platform lock-free multi-threaded program.

Sequential Consistency

Before we define sequential consistency let us separate between program and execution. Program is the source code that we write. When we say program-order it means the order of the instruction in the source code. If operation A appears before B in program order we call operation A sequence-before operation B. Execution is the actual series of instruction that is executed by the CPU. A program can have multiple possible executions, especially in a multi-threaded environment.

A Sequential consistent execution is an execution that:

1. Executes one instruction at a time. This guarantee is nice to have as we can reason our program as if it runs on single processor.
2. Consistent with the program order of all threads. Consistent with program order means that if operation A appears before operation B in program order then operation A must appears before B in the sequential consistent execution as well.

For example in the source code below:

// Listing 6
// Initially x = y = 0;

x = 1; // A
r1 = y; // B

y = 1; // C
r2 = x; // D

The possible sequential consistent executions are A-B-C-D, A-C-B-D, A-C-D-B, C-A-B-D, C-A-D-B, and C-D-A-B. In the sequential consistent executions, r1 and r2 cannot be both zero. Either one of it must be one. Executions like B-D-A-C is not sequential consistent execution because B appears before A, which is not consistent with the program orders(A-B and C-D). Having our program only execute the sequential consistent execution is easy to reason about, but unfortunately having full sequential consistency memory model is expensive to implement. Instead, C++ offers a slightly weaker memory model called Sequential-Consistency-Data-Race-Free or SCDRF or data-race-free-0 model.

SCDRF

In SCDRF model, we have to categorize each operation either as data operation or synchronize operation.

A sequential consistent execution contains a data-race if it has all these conditions:

1. There are two consecutive operations that access the same memory location from different threads.
2. At least one of them is a write operation.
3. At least one of them is a data operation.
// Listing 7
// Initially x = y = flag = 0
x = 3; // A
r2 = y; // B
flag = 1 // C

if (flag != 1) // D
{
r1 = x; // E
y = 3; // F
}

In the above example, if we categorize every operation as data operation, then A-B-C-D-E-F is one example of an execution that contains data race because C and D come from different thread and C is read operation and both operations are data operation. If we mark operation C and D as synchronize operation now the previous execution that I mention no longer contains any data race. In fact, there is no possible sequential-consistent execution in the program that contains data race anymore.

Note that D-A-E-B-C-F is not a possible execution because D will return false, so E and F will not be executed, the correct execution should be D-A-B-C. It is important to notice that the initial value of each variable is important. If the initial value of flag is one then it is not enough to mark C and D as synchronize operation to remove data races. Now D-A-E-B-C-F becomes possible and it contains data-races, because of A-E.

In SCDRF model:

• If a program has a sequential-consistent execution that contains data race, then the behavior of the program is undefined.
• If a program has no possible sequential-consistent execution that contains data-race, then the behavior of the program is as if it is executed in one of its sequential-consistent executions.

As long as your program is data race free, you can have the illusion of sequential consistency. Note the term illusion, because in SCDRF model compiler and CPU can still reorder things that the actual execution is non-sequential-consistent execution. For example in the example above(Listing 7), the compiler can reorder operation A and B so that the execution becomes B-A-C-D-E-F, and we wouldn’t notice it. We can still have the illusion of sequential consistency in our program.

Any compiler transformation that makes a data-race-free program into a data-race-program is prohibited. For example, the code in thread two in listing 7 can be transformed into this:

// Listing 8
r3 = y;
y = 3;
if (flag == 1) {
y = r3;
}

As you can see, this doesn’t change the behavior of a single-threaded program. But this kind of speculative store in C++11 is illegal because now the initial data-race-free program becomes a data-race program.

Atomics in C++

In C++ every every operation on the atomic takes a memory order argument, the possible memory order is

typedef enum memory_order {
memory_order_relaxed,
memory_order_consume,
memory_order_acquire,
memory_order_release,
memory_order_acq_rel,
memory_order_seq_cst
} memory_order;

The default memory order is memory_order_seq_cst.

atomic<int> x;
x = 0;
// is equivalent to
x.store(0);
// is equivalent to
x.store(0, memory_order_seq_cst);

In this post, we will limit ourselves to memory_order_seq_cst, the default and the strongest memory_order in C++11. I will discuss the weaker memory orders in my next post. If we limit ourself to memory_order_seq_cst then we will have the SCDRF memory model that we discuss before. Every operation on the atomic are considered as synchronize operations and everything else is data operation. So to fix the code in listing 7 in C++11, we can do this:

// Declaration
...
atomic<int> flag = 0;
...
// Initially x = y = flag = 0

x = 3; // A
r2 = y; // B
flag = 1 // C or flag.store(1) or flag.store(1, memory_order_seq_cst)

{
r1 = x; // E
y = 3; // F
}

Conclusion

In this post, we have learned what is considered as high level atomic or the default atomic behavior in C++. In the next post, we will see what is weaker memory models and what is the interaction with this memory model. As soon as we use the weaker memory model, the sequential consistency guarantee is not there anymore. But using a weaker memory model is useful for performance reason.

That’s it for today. Thanks for reading!