Atomic locking in Intel® 64 and IA-32 architectures

The Intel® 64 and IA-32 architectures guarantee that a memory access is atomic if it is one of the following:

  • Reading or writing a byte
  • Reading or writing a word aligned on a 16-bit boundary
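  • Reading or writing a double word aligned on a 32-bit boundary

So what does this mean? It means that an aligned load or store of one of these sizes is never seen half-done by another processor. If we are clever about it, we can build a simple lock around a single such variable, as long as we also have a way to read the old value and write the new one in one indivisible step.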
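As a small illustration of the alignment conditions in the list above, the following sketch checks whether an address sits on the boundary required for a given access size. The helper name is_naturally_aligned is made up for this example and is not part of any library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if p is a multiple of size bytes.
   Use size 1 for a byte, 2 for a word, 4 for a double word. */
static bool is_naturally_aligned(const void *p, size_t size)
{
    return ((uintptr_t)p % size) == 0;
}
```

A plain int declared as a variable or struct member is already placed on a 4-byte boundary by the usual x86 ABIs, so the check mostly matters for pointers computed by hand, for example into a packed buffer.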

The test_and_set function below uses the XCHG instruction to atomically swap the contents of a register with a memory location, so all we have to do is give it the address of our lock variable and check the return value. If the function returns LOCKED, the lock was already held and the calling thread was unable to acquire it. The beauty of this instruction is that whenever one operand is a general-purpose register and the other is a memory address, the processor's locking protocol is invoked automatically, even without a LOCK prefix, which makes the exchange atomic.

The inline assembly statement might look daunting at first, but it is not really that complicated.

In general, an extended inline assembly statement has this structure:

       asm ( assembler template 
           : output operands
           : input operands
           : list of clobbered registers
           );

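All three operand lists are optional. We do not list any clobbered registers in this statement, because every register it touches is already named as an operand. A lock does, however, write to memory behind the compiler's back, so a "memory" clobber in that last slot is a sensible addition.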
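To make the four parts of the statement concrete, here is a minimal sketch that is unrelated to locking: it adds two numbers with a single instruction. The function name add_u32 is made up for this example, the __asm__ spelling is used only because it also compiles under strict -std= modes, and the plain C fallback is an assumption added so the sketch still builds on non-x86 targets.

```c
#include <stdint.h>

/* Returns a + b. On x86 the addition is done by one extended asm statement. */
static uint32_t add_u32(uint32_t a, uint32_t b)
{
    uint32_t sum;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl %2, %0"        /* assembler template: %0 = %0 + %2     */
             : "=r" (sum)         /* output: any register, write-only     */
             : "0" (a), "r" (b)   /* inputs: a starts out in operand 0's
                                     register, b goes in any register     */
             : "cc");             /* clobbers: the add changes the flags  */
#else
    sum = a + b;                  /* fallback for other architectures     */
#endif
    return sum;
}
```

The test_and_set statement discussed in the rest of this section has the same shape, only with fixed registers instead of compiler-chosen ones.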

In the test_and_set function below, we provide one output operand and two input operands. The output operand is the variable that receives the previous value of the lock, which is also what the caller checks to see whether the lock was successfully acquired.

In input and output operands we can specify various constraints:

  • In the "0"(value) operand, "0" is a matching constraint: it tells the compiler that this input must live in the same place as output operand 0, which is the output variable value. This is how the initial LOCKED value ends up in the register that gets swapped.
  • The "a" constraint on address specifies that the address should be placed in the EAX register.
  • The "=b" constraint on the output operand specifies that the result is taken from EBX; the "=" marks the operand as write-only.
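
#define LOCKED   1
#define UNLOCKED 0

/* Atomically store LOCKED in *address and return the previous value.
   Targets 32-bit (IA-32) builds: the address is passed in EAX. */
static int test_and_set(int *address)
{
	int value = LOCKED;
	asm volatile ("xchg (%%eax), %%ebx"
		      : "=b" (value)
		      : "0" (value), "a" (address)
		      : "memory");	/* *address is modified */
	return value;
}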

So there we have it. Essentially, this says: "Take the memory address given in the address variable and store the value of the value variable there, while putting the previous content of that memory address into the value variable."

When the caller checks the returned value, LOCKED means that the lock was already taken by someone else, and UNLOCKED means that the lock was free before the swap, so the caller now holds it.
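As a sketch of how a caller might use this, the block below wraps test_and_set in a tiny spin lock. The names spin_lock, spin_trylock and spin_unlock are made up for this example. The function is restated so the block compiles on its own, and it differs from the version above in two ways that are my assumptions rather than part of the original: the "+r" and "+m" constraints let the compiler choose the register and addressing mode, so it also builds in 64-bit mode, and GCC/Clang __atomic builtins are used for the release store and as a stand-in on non-x86 targets.

```c
#include <assert.h>

#define LOCKED   1
#define UNLOCKED 0

/* Atomically store LOCKED in *address and return the previous value. */
static int test_and_set(int *address)
{
    int value = LOCKED;
#if defined(__i386__) || defined(__x86_64__)
    /* XCHG with a memory operand is atomic without a LOCK prefix. */
    __asm__ __volatile__ ("xchgl %0, %1"
                          : "+r" (value), "+m" (*address)
                          :
                          : "memory");
#else
    /* Assumption: compiler builtin as a stand-in on other architectures. */
    value = __atomic_exchange_n(address, LOCKED, __ATOMIC_SEQ_CST);
#endif
    return value;
}

/* Single attempt: returns 1 if we took the lock, 0 if it was already held. */
static int spin_trylock(int *lock)
{
    return test_and_set(lock) == UNLOCKED;
}

/* Keep swapping LOCKED in until the previous value was UNLOCKED. */
static void spin_lock(int *lock)
{
    while (test_and_set(lock) == LOCKED) {
        /* busy-wait: another thread holds the lock */
    }
}

/* Release: an aligned 32-bit store is atomic, so no XCHG is needed here. */
static void spin_unlock(int *lock)
{
    assert(*lock == LOCKED);   /* releasing a free lock is a caller bug */
    __atomic_store_n(lock, UNLOCKED, __ATOMIC_RELEASE);
}
```

A caller would declare int lock = UNLOCKED; once, then bracket each critical section with spin_lock(&lock) and spin_unlock(&lock). Busy-waiting like this is only reasonable for very short critical sections.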

IBM Active Protection System


Playing around on my ThinkPad, I discovered that HDAPS can be used to do more than just protect your hard drive in case of a fall.

The following application displays a 3D model of a cube representing the monitored laptop. A server running on that laptop sends real-time updates using information it gets from the IBM Hard Drive Active Protection System (HDAPS), which allows a remote client to monitor the movement of the laptop. The application itself is not very impressive, but I included a screenshot anyway.

Implementing statistical sampling into the Atlas TDAQ Network

This was my bachelor project, carried out while working in the ATLAS TDAQ Networking Group, which is part of the ATLAS experiment at the Large Hadron Collider at CERN.

The ATLAS data acquisition system consists of four different networks interconnecting up to 2000 processors using up to 200 edge switches and five multi-blade chassis devices. For performance monitoring and troubleshooting purposes there was an imperative need to identify and quantify single traffic flows. sFlow is an industry standard based on statistical sampling which attempts to provide a solution to this.

Due to the size of the ATLAS network, the collection and analysis of the sFlow data from all devices generates a data handling problem of its own.

This report describes how this problem is addressed by developing a system that can collect and store data either centrally or in a distributed fashion, according to need. It discusses the methods used to present the results in a form relevant to system analysts, and it explores the possibilities and limitations of this diagnostic tool, giving some examples of its use in solving system problems that arise during ATLAS data taking.