A Look at Go Locks

Let's take a look at how GO implements locks in the actual runtime. After Go's 6g compiler generates x86 code (in file.6), it calls into Go's runtime, which is linked in once a 6l to finally generate a binary executable. For example, any Go routine finally calls the scheduler, which locks the scheduler object, and executes the Go routine. How are basic locks implemented in Go? Let's look at the lock() routine:

   1:  runtime::lock(Lock *l)
   2:  {
   3:      if(m->locks < 0)
   4:          runtime::throw("lock count");
   5:      m->locks++;
   6:   
   7:      if(runtime::xadd(&l->key, 1) > 1) {    // someone else has it; wait
   8:          // Allocate semaphore if needed.
   9:          if(l->sema == 0)
  10:              initsema(&l->sema);
  11:          runtime::mach_semacquire(l->sema);
  12:      }
  13:  }

First off, Go uses semaphores as the global locking scheme. The number of semaphores depends on the number of CPUs on the host machine. Let's take a look at the Lock data structure:

   1:  struct    Lock
   2:  {
   3:      uint32    key;
   4:      uint32    sema;    
   5:  };

The Lock data structure just keeps count of the number of available resources. Next Lock() attempts to grab the semaphore via a call to xadd(). Let's see what xadd does:

   1:  asm.s
   2:  // uint32 xadd(uint32 volatile *val, int32 delta)
   3:  // Atomically:
   4:  //    *val += delta;
   5:  //    return *val;
   6:  TEXT runtime::xadd(SB), 7, $0
   7:      MOVQ    8(SP), BX
   8:      MOVL    16(SP), AX
   9:      MOVL    AX, CX
  10:      LOCK
  11:      XADDL    AX, 0(BX)
  12:      ADDL    CX, AX
  13:      RET

Great, now we can finally trace the call stack to see what's actually happening down at the machine level. Amazing that they hand wrote the assembly. There are two interesting instructions here: LOCK and XADDL. How do these things work? From the Intel assembly manual:

LOCK - Causes the processor’s LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multi-processor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted.

So the LOCK instruction will atomically lock the next instruction, which in this case is the XADDL. Semaphores have counters based on the number of available resources. LOCK will make the XADDL atomic. Conceptually, we are locking the semaphore counter, then atomically adds a number to it.

XADDL - Exchanges the first operand (destination operand) with the second operand (source operand), then loads the sum of the two values into the destination operand. The destination operand can be a register or a memory location; the source operand is a register.

XADDL increments the semaphore. Notice that the hand written assembly code actually ensures that the add occurs on the AX register, which is where the return value must be. The only question is why are we adding a number instead of subtracting? If nobody has the lock, the value in l->key should be 0. Adding 1 to it means we can grab the key (1 > 1 test fails). If the value is greater than 1, it means multiple processes are waiting for the resource. It doesn't seem to be asking the question, how many resources are left? Instead it's asking how many processes are waiting on me?

   1:  thread.c
   2:  void
   3:  runtime::mach_semacquire(uint32 sem)
   4:  {
   5:      int32 r;
   6:   
   7:      while((r = runtime::mach_semaphore_wait(sem)) != 0) {
   8:          if(r == KERN_ABORTED)    // interrupted
   9:              continue;
  10:          macherror(r, "semaphore_wait");
  11:      }
  12:  }

Go finally calls the OS X semaphore wait() routine to grab the lock on the semaphore. So it looks like we first grab a Lock on the lock object, which finally locks the semaphore from the OS by calling semaphore_wait_trap(). This makes sense because Go locks go routines, which are mapped onto multiple OS threads. What I think is happening is that the Go runtime locks an application object first (e.g. the scheduler). Then, it tries to grab the semaphore for the currently running thread (I think), at which point the code requesting the lock is free to execute.

   1:  sys.s
   2:  // uint32 mach_semaphore_wait(uint32)
   3:  TEXT runtime::mach_semaphore_wait(SB),7,$0
   4:      MOVL    8(SP), DI
   5:      MOVL    $(0x1000000+36), AX    // semaphore_wait_trap
   6:      SYSCALL
   7:      RET

Unlocking

Now let's see how GO unlocks. It's pretty much the same thing in reverse. The unlock procedure first tries to grab the Lock object and subtract one from the lock count.

   1:  void
   2:  runtime::unlock(Lock *l)
   3:  {
   4:      m->locks--;
   5:      if(m->locks < 0)
   6:          runtime::throw("lock count");
   7:   
   8:      if(runtime::xadd(&l->key, -1) > 0) {    // someone else is waiting
   9:          // Allocate semaphore if needed.
  10:          if(l->sema == 0)
  11:              initsema(&l->sema);
  12:          runtime::mach_semrelease(l->sema);
  13:      }
  14:  }

Now let's checkout mach_semrelease() , which performs a syscall to semaphore_signal()

   1:  thread.c
   2:  void
   3:  runtime::mach_semrelease(uint32 sem)
   4:  {
   5:      int32 r;
   6:   
   7:      while((r = runtime::mach_semaphore_signal(sem)) != 0) {
   8:          if(r == KERN_ABORTED)    // interrupted
   9:              continue;
  10:          macherror(r, "semaphore_signal");
  11:      }
  12:  }

And the final source for mac_semaphore_signal():

   1:  sys.s
   2:  // uint32 mach_semaphore_signal(uint32)
   3:  TEXT runtime::mach_semaphore_signal(SB),7,$0
   4:      MOVL    8(SP), DI
   5:      MOVL    $(0x1000000+33), AX    // semaphore_signal_trap
   6:      SYSCALL
   7:      RET

Now we can lock and unlock all we want.