Analysis of the principles of atomic operations

Concept

An atomic operation is one that cannot be interrupted; in other words, it is the smallest unit of execution. The simplest atomic operations are single assembly instructions (excluding some pseudo-instructions, which the assembler expands into several real instructions). In Linux, the data structure used for atomic operations is atomic_t, defined as follows:

typedef struct {int counter;} atomic_t;

It is essentially an integer variable. There are two reasons for defining a dedicated type. First, the atomic operators accept only operands of type atomic_t, so passing data of any other type fails at compile time. Second, it helps ensure that the compiler does not optimize away accesses to the value, so that every access is a real memory access rather than a register access.

Assignment operation

ARM processors have an instruction (STR) that writes a value directly to a memory address, so setting an atomic variable is a plain assignment:

#define atomic_set(v,i) (((v)->counter) = (i))

Read operations

volatile is used to stop the compiler from optimizing accesses to the variable, so that every read is a memory access rather than a register access.

#define atomic_read(v) (*(volatile int *)&(v)->counter)
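A user-space mirror of the type and the two macros can be compiled and tried directly. This is a sketch with hypothetical demo_ names. Note that volatile only stops the compiler from caching the value in a register; it does not make read-modify-write sequences atomic and does not order memory accesses. Passing a plain int * to these macros fails to compile because int has no counter member, which is the type-safety point made above.

```c
/* Hypothetical user-space mirror of the kernel's atomic_t (illustration only). */
typedef struct { int counter; } demo_atomic_t;

/* Plain store, as in atomic_set: a single aligned word write. */
#define demo_atomic_set(v, i) (((v)->counter) = (i))

/* The volatile cast forces a fresh load from memory on every use. */
#define demo_atomic_read(v) (*(volatile int *)&(v)->counter)

/* Spin until some other context makes the counter non-zero. Because
 * demo_atomic_read() goes through a volatile lvalue, the load is repeated
 * on every iteration instead of being hoisted out of the loop. */
static inline int demo_wait_nonzero(demo_atomic_t *v)
{
    int val;

    while ((val = demo_atomic_read(v)) == 0)
        ;
    return val;
}
```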

Add operation

The addition is carried out with the exclusive-access instructions (LDREX and STREX).

static inline void atomic_add(int i, atomic_t *v)
{
    unsigned long tmp;
    int result;

    // Load with the exclusive instruction, add, and retry if the exclusive store fails
    __asm__ __volatile__("@ atomic_add\n"
    "1: ldrex %0, [%3]\n"
    " add %0, %0, %4\n"
    " strex %1, %0, [%3]\n"
    " teq %1, #0\n"
    " bne 1b"
    : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
    : "r" (&v->counter), "Ir" (i)
    : "cc");
}
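For readers without an ARM toolchain, the same retry loop can be sketched in portable C11. This is an analogue under stated assumptions, not the kernel's code: a compare-and-swap (CAS) loop replaces the LDREX/STREX pair, and demo_atomic_add is a hypothetical name. On ARMv7, GCC and Clang typically compile the weak compare-exchange into an LDREX/STREX loop themselves.

```c
#include <stdatomic.h>

/* Portable sketch of the ldrex/add/strex/teq/bne loop in atomic_add.
 * Assumption: C11 compare-and-swap stands in for LDREX/STREX. */
static inline void demo_atomic_add(int i, atomic_int *v)
{
    /* Like ldrex: read the current value. */
    int old = atomic_load_explicit(v, memory_order_relaxed);

    /* Like strex + teq + bne: try to publish old + i. If another writer got
     * in first, the CAS fails, reloads the current value into old, and we
     * retry. The weak form may also fail spuriously, just as STREX may. */
    while (!atomic_compare_exchange_weak_explicit(v, &old, old + i,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;
}
```

atomic_fetch_add_explicit(v, i, memory_order_relaxed) would do the same in one call; the loop is written out only to mirror the assembly. Relaxed ordering matches atomic_add, which implies no memory barrier.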

Subtract operation

Comparing the add and subtract code shows that they are nearly identical; the only difference is the arithmetic instruction (add versus sub). This is why recent kernel sources generate this code with the macro ATOMIC_OP(op, c_op, asm_op).

static inline void atomic_sub(int i, atomic_t *v)
{
    unsigned long tmp;
    int result;

    // Load with the exclusive instruction, subtract, and retry if the exclusive store fails
    __asm__ __volatile__("@ atomic_sub\n"
    "1: ldrex %0, [%3]\n"
    " sub %0, %0, %4\n"
    " strex %1, %0, [%3]\n"
    " teq %1, #0\n"
    " bne 1b"
    : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
    : "r" (&v->counter), "Ir" (i)
    : "cc");
}
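The idea behind ATOMIC_OP can be sketched in portable C: one macro body, instantiated once per operation, emits both functions. This is a simplified assumption, not the kernel macro: it takes only an operation name and a C operator, uses a C11 compare-and-swap loop instead of inline assembly, and the DEMO_/demo_ names are hypothetical.

```c
#include <stdatomic.h>

/* One template, instantiated per operation (hypothetical simplification of
 * the ATOMIC_OP idea). c_op is a binary C operator such as + or -. */
#define DEMO_ATOMIC_OP(op, c_op)                                         \
static inline void demo_atomic_##op(int i, atomic_int *v)                \
{                                                                        \
    int old = atomic_load_explicit(v, memory_order_relaxed);             \
    /* Retry until no other writer intervened between load and store. */ \
    while (!atomic_compare_exchange_weak_explicit(v, &old, old c_op i,   \
                                                  memory_order_relaxed,  \
                                                  memory_order_relaxed)) \
        ;                                                                \
}

DEMO_ATOMIC_OP(add, +)   /* defines demo_atomic_add */
DEMO_ATOMIC_OP(sub, -)   /* defines demo_atomic_sub */

#undef DEMO_ATOMIC_OP
```

Adding a new operation then costs one line, and a fix to the retry logic applies to every generated function at once.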

Other operations

There are other similar atomic functions, such as atomic_XXX_return, atomic_cmpxchg and atomic_clear_mask, and functions built on top of them, such as atomic_inc, atomic_dec and atomic_XXX_and_test. The code above is the implementation for SMP processors. On non-SMP processors there is no other core competing for the data, so it is enough to keep interrupt handlers and other processes from interrupting the operation, which is done by disabling local interrupts. For example, the subtract-and-return operation:

static inline int atomic_sub_return(int i, atomic_t *v)
{
    unsigned long flags;
    int val;

    // Disable local interrupts so that nothing can interrupt this sequence
    raw_local_irq_save(flags);
    val = v->counter;
    v->counter = val -= i;
    // Restore the previous interrupt state
    raw_local_irq_restore(flags);

    return val;
}

LDREX and STREX instructions

Multicore support starts with Armv6, which guarantees the atomicity of data operations through the LDREX and STREX instructions; they are used, for example, in spinlock acquisition and in atomic variable operations. Before Armv6, processors were single-core, and atomicity was ensured by disabling interrupts. On a multicore platform, disabling interrupts only affects the local core, so atomic operations on shared data must use LDREX and STREX.

Load and store register exclusive.

Syntax

LDREX{cond} Rt, [Rn {, #offset}]
STREX{cond} Rd, Rt, [Rn {, #offset}]
LDREXB{cond} Rt, [Rn]             Byte load
STREXB{cond} Rd, Rt, [Rn]         Byte store
LDREXH{cond} Rt, [Rn]             Halfword load
STREXH{cond} Rd, Rt, [Rn]         Halfword store
LDREXD{cond} Rt, Rt2, [Rn]        Doubleword load
STREXD{cond} Rd, Rt, Rt2, [Rn]    Doubleword store

where:

cond
is an optional condition code (see conditional execution).
Rd
is the destination register for the returned status.
Rt
is the register to load or store.
Rt2
is the second register for doubleword loads or stores.
Rn
is the register holding the base memory address.
offset
is an optional offset applied to the value in Rn. offset is allowed only in Thumb-2 instructions. If offset is omitted, an offset of 0 is assumed.

LDREX

LDREX loads data from memory.

If the physical address has the Shared TLB attribute, LDREX tags the physical address as exclusive access for the current processor, and clears any exclusive access tag this processor holds for any other physical address.

Otherwise, it records that the executing processor has an outstanding tagged physical address.

STREX

STREX performs a conditional store to memory. The conditions are as follows:

If the physical address does not have the Shared TLB attribute, and the executing processor has an outstanding tagged physical address, the store takes place, the tag is cleared, and the value 0 is returned in Rd.

If the physical address does not have the Shared TLB attribute, and the executing processor has no outstanding tagged physical address, the store does not take place, and the value 1 is returned in Rd.

If the physical address has the Shared TLB attribute, and it is tagged as exclusive access for the executing processor, the store takes place, the tag is cleared, and the value 0 is returned in Rd.

If the physical address has the Shared TLB attribute, but it is not tagged as exclusive access for the executing processor, the store does not take place, and the value 1 is returned in Rd.
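The LDREX tagging rule and the STREX success/failure cases can be illustrated with a small software model of the exclusive monitors. This is a simplified sketch under stated assumptions: every address is treated as Shareable, there are two hypothetical CPUs, and a successful store is assumed to clear other CPUs' tags on the same address. The demo_ names are hypothetical and no real core is modelled exactly.

```c
#include <stdbool.h>
#include <stdint.h>

#define NCPU 2                    /* hypothetical number of CPUs */

static uint32_t memory[16];       /* word-addressed toy memory */
static bool tagged[NCPU];         /* does this CPU hold an exclusive tag? */
static unsigned tag_addr[NCPU];   /* ...and for which address */

/* LDREX: load, tag addr for this CPU, dropping any tag on another address. */
static uint32_t demo_ldrex(int cpu, unsigned addr)
{
    tagged[cpu] = true;
    tag_addr[cpu] = addr;
    return memory[addr];
}

/* STREX: if tagged for addr, store, clear the tag and return 0;
 * otherwise do not store and return 1. */
static int demo_strex(int cpu, uint32_t value, unsigned addr)
{
    if (!tagged[cpu] || tag_addr[cpu] != addr)
        return 1;

    memory[addr] = value;
    tagged[cpu] = false;
    /* Modelling assumption: the successful store clears other CPUs' tags
     * on this address, so their pending STREX will fail. */
    for (int other = 0; other < NCPU; other++)
        if (other != cpu && tagged[other] && tag_addr[other] == addr)
            tagged[other] = false;
    return 0;
}
```

A typical interleaving: CPU 0 and CPU 1 both call demo_ldrex on the same address, CPU 1's demo_strex succeeds and returns 0, and CPU 0's demo_strex then returns 1, so CPU 0 must reload and retry. That is exactly the job of the `bne 1b` branch in atomic_add.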

Restrictions

r15 must not be used for any of Rd, Rt, Rt2 or Rn.
For STREX, Rd must not be the same register as Rt, Rt2 or Rn.
For ARM instructions:

For LDREXD and STREXD, Rt must be an even-numbered register and must not be r14
For LDREXD and STREXD, Rt2 must be R(t+1)
offset is not allowed.

For Thumb instructions:

r13 must not be used for any of Rd, Rt or Rt2
For LDREXD, Rt and Rt2 must not be the same register
offset can be any multiple of 4 in the range 0-1020.
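The ARM-state operand rules for the doubleword store can be expressed as a small predicate. This is a hypothetical reading aid (demo_strexd_arm_operands_ok is not part of any toolchain), registers are given as numbers 0-15, and the assembler remains the authority on what it accepts.

```c
#include <stdbool.h>

/* Checks the ARM-state restrictions for STREXD Rd, Rt, Rt2, [Rn]. */
static bool demo_strexd_arm_operands_ok(int rd, int rt, int rt2, int rn)
{
    /* r15 may not be used for any operand. */
    if (rd == 15 || rt == 15 || rt2 == 15 || rn == 15)
        return false;
    /* For STREX forms, Rd must differ from Rt, Rt2 and Rn. */
    if (rd == rt || rd == rt2 || rd == rn)
        return false;
    /* Doubleword pair: Rt even and not r14, Rt2 must be R(t+1). */
    if (rt % 2 != 0 || rt == 14 || rt2 != rt + 1)
        return false;
    return true;
}
```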

Usage

LDREX and STREX can be used to implement interprocess communication in multiprocessor and shared-memory systems.

For performance reasons, keep the number of instructions between a corresponding LDREX and STREX pair to a minimum.

Note

The address used in a STREX instruction must be the same as the address used in the most recently executed LDREX instruction. If a different address is used, the result of the STREX instruction is unpredictable.

Example

    MOV     r1, #0x1                ; load the 'lock taken' value
try
    LDREX   r0, [LockAddr]          ; load the lock value
    CMP     r0, #0                  ; is the lock free?
    STREXEQ r0, r1, [LockAddr]      ; try and claim the lock
    CMPEQ   r0, #0                  ; did this succeed?
    BNE     try                     ; no - try again
                                    ; yes - we have the lock
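The same lock can be written in C11. This is a sketch under assumptions: a compare-and-swap replaces the explicit LDREX/CMP/STREXEQ sequence, the demo_ names are hypothetical, and acquire/release ordering is added, which the assembly example omits (on ARMv7 a real lock also needs a memory barrier such as DMB).

```c
#include <stdatomic.h>

/* 0 = free, 1 = taken, as in the assembly example. */
static void demo_spin_lock(atomic_int *lock)
{
    int expected = 0;
    /* Try to change 0 -> 1. On failure (lock held, or a spurious failure
     * of the weak form), reset expected and try again, like "BNE try". */
    while (!atomic_compare_exchange_weak_explicit(lock, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;
}

static void demo_spin_unlock(atomic_int *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}

/* One attempt only: returns 1 if the lock was taken, 0 if it was busy. */
static int demo_spin_trylock(atomic_int *lock)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(lock, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}
```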

Using ldrex

The combination of LDREX and STREX is very powerful. Some older multicore processors (such as x86) have to lock the bus to guarantee atomic operations on data, which makes access less efficient. LDREX and STREX do not lock the bus, and operations more complex than incrementing or decrementing by 1 can be performed on the variable between the two instructions. However, LDREX and STREX only work correctly if a certain precondition is met. I ran into this problem while developing the BSP for the TI 66AK2H (4-core Cortex-A15) platform. Here is what the ARM manual says about the LDREX instruction:
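As an example of an operation other than adding or subtracting 1, an atomic maximum follows the same load, compute, conditional-store pattern. This is a sketch in which a C11 compare-and-swap stands in for LDREX/STREX, and demo_atomic_max is a hypothetical name. One difference worth knowing: a CAS succeeds whenever the value matches, so it cannot notice that the value changed and changed back (the ABA case), whereas STREX fails whenever the exclusive tag has been cleared; for a maximum this makes no difference.

```c
#include <stdatomic.h>

/* Atomically sets *v to max(*v, candidate) and returns the value observed
 * before the update. */
static int demo_atomic_max(atomic_int *v, int candidate)
{
    int old = atomic_load_explicit(v, memory_order_relaxed);

    /* If old is already >= candidate there is nothing to store. Otherwise
     * try to store candidate; a failed CAS reloads old, and the comparison
     * is redone against the fresh value. */
    while (old < candidate &&
           !atomic_compare_exchange_weak_explicit(v, &old, candidate,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;
    return old;
}
```

The more work sits between the load and the conditional store, the more likely another writer intervenes and forces a retry, which is why the computation should be kept short.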

[Figure: excerpt from the ARM Cortex-A15 Technical Reference Manual on LDREX/STREX and the global monitor]

The content above comes from the official ARM document DDI0438I_cortex_a15_r4p0_trm.pdf. Its main point is that LDREX and STREX require a global monitor, which can be implemented in two ways. One is an internal implementation, which requires the cache to be enabled. The other is an external implementation provided by the chip (this is chip-specific, and some chips do not implement it), which works by monitoring the bus. When both are implemented, the internal global monitor takes priority. To be safe, avoid using LDREX and STREX before the cache is enabled.