Using random number generators to solve problems

I recently came across the following blog post detailing a very nice Ruby solution to the following problem:

  • you are constantly creating robots, each of which must be given a unique name
  • this name must follow the pattern letter letter number number number
  • names need to be assigned at random, e.g. you can’t just call your first robot AA000, your second robot AA001, …
  • there needs to be a way to reset this name generator

The author came up with a very beautiful Ruby solution that makes clever use of ranges. In essence, it goes a little something like this:

class Generator
  def initialize
    init_rng
  end

  def next
    @names.next
  end

  def reset
    init_rng
  end

  private

  def init_rng
    @names = ('AA000'..'ZZ999').to_a.shuffle.each
  end
end

gen = Generator.new
gen.next # => SX696
gen.next # => VW329

There’s some really good stuff going on here. I had no idea ranges in Ruby were smart enough to deal with ('AA000'..'ZZ999'). The idea of storing @names as an enumerator is very nice as well. The author should feel proud for coming up with it.

Can we make it scale?

Reading through the code, it struck me just how diabolical this problem actually was. Because of the requirement for names to be assigned at random, we need to generate all elements in the range ('AA000'..'ZZ999') during initialization in order for us to be able to shuffle them into a random order. Unfortunately, this doesn’t really scale well for larger ranges. Some computation times for different ranges are shown below.

'AA000'..'ZZ999'     - 0.3 sec
'AAA000'..'ZZZ999'   - 9.8 sec
'AAAA000'..'ZZZZ999' - 133 sec
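
These timings presumably reflect the cost of materializing and shuffling the full range. A minimal sketch to reproduce something similar (using Ruby’s built-in Benchmark module; exact numbers will obviously vary by machine) would be:

require 'benchmark'

# time how long it takes to materialize and shuffle each range
puts Benchmark.realtime { ('AA000'..'ZZ999').to_a.shuffle }   # => roughly 0.3
puts Benchmark.realtime { ('AAA000'..'ZZZ999').to_a.shuffle } # => roughly 10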

I started wondering just how hard it would be to solve this problem for arbitrarily large ranges. As it turns out, it’s totally possible, but we need to rephrase our robot naming problem into two completely different subproblems:

  1. Can we come up with a method that takes a number as input and transforms it into the relevant element of the range 'AA000'..'ZZ999' as output? e.g. can we transform the number 0 into the string AA000, the number 1 into the string AA001, … ?

  2. Can we come up with a random number generator that’ll allow us to specify a range and that will then return all elements of this range exactly once in a random order?

If we can solve both of these subproblems, then the original problem can be solved by writing such a random number generator and transforming each random number returned by it into the desired range. The nice thing about this approach is that generating a new name will always take the same amount of time regardless of the size of the range as we now no longer have to compute all elements in the range during initialization.

Designing a transformation function

The transformation function is by far the easier subproblem of the two. There isn’t really much to say about constructing such a function. We’ll start by writing a function that can transform an input into a 5 digit binary number. Once this is done, we’ll just modify this function to transform the input into the range 'AA000'..'ZZ999' instead.

Writing a function that can create 5 digit binary numbers is pretty straightforward. It goes a little something like this:

def transform(number)
  binary = []

  binary << number % 2
  number = number / 2

  binary << number % 2
  number = number / 2

  binary << number % 2
  number = number / 2

  binary << number % 2
  number = number / 2

  binary << number % 2
  number = number / 2

  binary.reverse.join
end

transform(17) # => 10001
transform(28) # => 11100

The names we want to generate need to be in the range 'AA000'..'ZZ999'. That is to say, the last three characters need to be 0-9 (base 10), while the first two characters need to be A-Z (base 26). Modifying our function to accommodate this is straightforward enough:

def transform(number)
  name = []

  name << number % 10
  number = number / 10

  name << number % 10
  number = number / 10

  name << number % 10
  number = number / 10

  name << char(number % 26)
  number = number / 26

  name << char(number % 26)
  number = number / 26

  name.reverse.join
end

def char(number)
  (65 + number).chr
end

transform(0) # => AA000
transform(1) # => AA001

If we clean this up a bit, then we end up with the following code:

def generate_name(number)
  name = []

  3.times do
    name << number % 10
    number = number / 10
  end

  2.times do
    name << char(number % 26)
    number = number / 26
  end

  name.reverse.join
end

def char(number)
  (65 + number).chr
end
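
The cleaned-up version behaves just like before; for instance, the smallest and largest valid inputs map to the first and last names in the range:

generate_name(0)      # => AA000
generate_name(1)      # => AA001
generate_name(675999) # => ZZ999 (675999 is 10 * 10 * 10 * 26 * 26 - 1)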

There we go, that was pretty easy. The first of our subproblems has been solved and we’re now halfway to solving our robot naming problem. Let’s go ahead and take a look at the second subproblem: creating a custom random number generator.

Creating a custom random number generator

Creating a random number generator that returns all numbers in a given range exactly once in a random order is a bit of an unusual programming problem. Thinking back to my CS classes, I can vaguely remember a lecture about generating pseudo-random numbers with a linear congruential generator. So let’s start by having a look at its Wikipedia page.

Linear congruential generators use the following formula to generate random numbers:

next_random_number = (a * current_random_number + c) % m

A generator will have different properties depending on the choice of the a, c, and m parameters. Luckily for us, there is a small section on its wiki page that describes how these parameters should be chosen in order for a generator to return all numbers of a given range exactly once in a random order.

If our range is a power of 2, these parameters need to fulfill the following requirements:

  • m will be our range, and therefore needs to be a power of 2
  • a needs to be smaller than m and a - 1 needs to be divisible by 4
  • c can be any odd number smaller than m

Writing a generator that meets these requirements is a pretty straightforward undertaking:

class Generator
  def initialize(range:)
    @range = range
    init_rng
  end

  def reset
    init_rng
  end

  def next
    @r = ((@a * @r) + @c) % @m
  end

  private

  def init_rng
    random = Random.new

    # m needs to be a power of two for now (we don't support arbitrary ranges yet)
    # a needs to be smaller than m and (a - 1) needs to be divisible by 4
    # c can be any odd number smaller than m
    @m = @range
    @a = random.rand(@m / 4).to_i * 4 + 1
    @c = random.rand(@m / 2).to_i * 2 + 1

    # a random seed to get our generator started. The value of this seed needs to be
    # smaller than the value of m as well.
    @r = random.rand(@m).to_i
  end
end

# remember that our range needs to be a power of 2 for now
gen = Generator.new(range: 8)
16.times.map { gen.next }
# => [6, 1, 4, 7, 2, 5, 0, 3, 6, 1, 4, 7, 2, 5, 0, 3]

gen.reset
16.times.map { gen.next }
# => [2, 3, 0, 1, 6, 7, 4, 5, 2, 3, 0, 1, 6, 7, 4, 5]

Note that our generator will start repeating itself after having iterated across all elements in its range. We can fix this by having our next method raise an exception when its random numbers start repeating. The code for this can be seen below.

class GeneratorExhausted < StandardError; end

class Generator
  ...

  def next
    @r = ((@a * @r) + @c) % @m
    raise GeneratorExhausted if @r == @first_value
    @first_value ||= @r
    @r
  end

  ...
end
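
Assuming the rest of the class stays exactly as it was in the previous section, the new behavior looks roughly like this (the actual ordering will differ from run to run):

gen = Generator.new(range: 8)
8.times.map { gen.next } # => [6, 1, 4, 7, 2, 5, 0, 3] - some permutation of 0..7
gen.next                 # => raises GeneratorExhausted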

Note that up until now our range has always needed to be a power of two. Luckily we can easily accommodate ranges of arbitrary length. All we need to do is:

  • calculate the first power of two larger than our arbitrary range
  • use this power of two for our random number calculations
  • if our random number generator comes up with a number outside the specified range, then just throw it away and keep generating numbers until we find one that’s inside our range. On average, the very next randomly generated number will be inside the range, so the overhead for this is small.

The code for doing so is shown below.

class GeneratorExhausted < StandardError; end

class Generator
  def initialize(range:)
    @range = range
    init_rng
  end

  def reset
    init_rng
  end

  def next
    loop do
      @r = ((@a * @r) + @c) % @m
      raise GeneratorExhausted if @r == @first_value

      if @r < @range
        @first_value ||= @r
        return @r
      end
    end
  end

  private

  def init_rng
    random = Random.new

    # m is the first power of two larger than our range
    # a needs to be smaller than m and (a - 1) needs to be divisible by 4
    # c can be any odd number smaller than m
    @m = 2 ** Math.log(@range + 1, 2).ceil
    @a = random.rand(@m / 4).to_i * 4 + 1
    @c = random.rand(@m / 2).to_i * 2 + 1

    # a random seed to get our generator started. The value of this seed needs to be
    # smaller than the value of m as well.
    @r = random.rand(@m).to_i
  end
end

gen = Generator.new(range: 10)
10.times.map { gen.next }
# => [6, 1, 8, 5, 9, 0, 3, 2, 4, 7]

The above code solves our second subproblem. We have successfully created a random number generator that’ll return all elements in a specified range exactly once in a random order. With both subproblems solved, we can now go ahead and write a solution to the original problem of generating random names for our robots.

Putting it all together

Our final NameGenerator class will end up looking like the code shown below. Note that I did not bother to copy the code for the Generator class from the previous section.

class NameGenerator
  RANGE = 10 * 10 * 10 * 26 * 26

  def initialize
    @generator = Generator.new(range: RANGE)
  end

  def reset
    @generator.reset
  end

  def next
    number = @generator.next
    generate_name(number)
  end

  private

  def generate_name(number)
    name = []

    3.times do
      name << number % 10
      number = number / 10
    end

    2.times do
      name << char(number % 26)
      number = number / 26
    end

    name.reverse.join
  end

  def char(number)
    (65 + number).chr
  end
end

name_generator = NameGenerator.new
name_generator.next # => MJ650
name_generator.next # => HK923

The really nice thing about this approach is that by writing our own random number generator we have essentially created a lazily evaluated pseudo-random shuffle. It’s the kind of thing you would never want to use in a library that requires proper random numbers (e.g. encryption), but works great for problems like this.

Because of the lazily evaluated nature of our solution, we can now generate names across arbitrarily large ranges. Remember how it used to take 133 seconds to generate the first random name in the range 'AAAA000'..'ZZZZ999'? We can now do this in just 0.1 seconds! That’s a pretty good speed improvement for generating robot names!
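
To give an idea of what this looks like in practice, here is a rough sketch of my own (not from the original post) of a four-letter variant that reuses the Generator class from the previous section. The only changes compared to NameGenerator are a larger RANGE and one extra letter position:

# hypothetical four-letter variant, shown for illustration only
class BigNameGenerator
  RANGE = 10 * 10 * 10 * 26 * 26 * 26 * 26 # covers 'AAAA000'..'ZZZZ999'

  def initialize
    @generator = Generator.new(range: RANGE)
  end

  def reset
    @generator.reset
  end

  def next
    number = @generator.next
    name = []

    3.times do
      name << number % 10
      number = number / 10
    end

    4.times do
      name << (65 + number % 26).chr
      number = number / 26
    end

    name.reverse.join
  end
end

name_generator = BigNameGenerator.new
name_generator.next # => e.g. QXTR731, returned almost instantly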

Finishing up, I would just like to mention that this was a bit of a weird article for me to write. I’m not really at home with creating random number generators, so there is a real chance that some bugs might have crept in here or there. As always, if you think I got anything wrong, please feel free to get in touch and let me know.


Ruby concurrency: building a timeout queue

Ruby’s built-in Queue class is an incredibly handy data structure for all kinds of problems. Its versatility can be attributed to its pop method, which can be used in both a blocking and a non-blocking manner. However, when using a blocking pop, you may not always want to run the risk of this pop blocking indefinitely. Instead, you might wish for your pop to block only until a given timeout interval has expired, after which it’ll raise an exception. We’ll refer to a queue that supports such pop behavior as a TimeoutQueue. This post will show you how to build such a queue.

Setting the stage

Going forward, I’ll assume you’re comfortable with all the concepts introduced in my earlier post on condition variables. Our starting point will be the SimpleQueue implementation that we wrote as part of that post. The code for this is shown below. If some of its concepts are unfamiliar to you, then I’d highly recommend reading my earlier post before continuing on.

class SimpleQueue
  def initialize
    @elems = []
    @mutex = Mutex.new
    @cond_var = ConditionVariable.new
  end

  def <<(elem)
    @mutex.synchronize do
      @elems << elem
      @cond_var.signal
    end
  end

  def pop(blocking = true)
    @mutex.synchronize do
      if blocking
        while @elems.empty?
          @cond_var.wait(@mutex)
        end
      end
      raise ThreadError, 'queue empty' if @elems.empty?
      @elems.shift
    end
  end
end

This SimpleQueue class behaves very similarly to Ruby’s built-in Queue class:

  • a non-blocking pop called on a non-empty queue will return immediately
  • a non-blocking pop called on an empty queue will raise a ThreadError
  • a blocking pop called on a non-empty queue will return immediately
  • a blocking pop called on an empty queue will wait for data to arrive before returning

It is the scenario described by the last of these four bullet points that we want to improve upon. In its current implementation, a blocking pop called on an empty queue will block for as long as the queue remains empty. As there is no guarantee that the queue will ever receive data, this could be a very long time indeed. In order to prevent such indefinite waits from occurring, we are going to add timeout functionality to this blocking pop.

Before moving on to the next section, let’s write a bit of code to prove that the four bullet points above do, in fact, hold true for our SimpleQueue implementation.

simple_queue = SimpleQueue.new

# non-blocking pop, non-empty queue
simple_queue << 'foobar'
simple_queue.pop(false) # => immediately returns 'foobar'

# non-blocking pop, empty queue
simple_queue.pop(false) # => immediately raises a ThreadError exception

# blocking pop, non-empty queue
simple_queue << 'foobar'
simple_queue.pop # => immediately returns 'foobar'

# blocking pop, empty queue - new data is added after 5s
Thread.new { sleep 5; simple_queue << 'foobar' }
simple_queue.pop # => waits for 5s for data to arrive, then returns 'foobar'

At this point, we can rest assured that our SimpleQueue class does indeed behave like Ruby’s built-in Queue class. Now let’s continue on to the next section where we’ll start adding timeout functionality to the blocking pop method.

A first attempt at adding timeout functionality

The next few sections will focus exclusively on improving the pop method code shown below. We’ll see how we can add timeout functionality to this method with just a few well-chosen lines of code.

def pop(blocking = true)
  @mutex.synchronize do
    if blocking
      while @elems.empty?
        @cond_var.wait(@mutex)
      end
    end
    raise ThreadError, 'queue empty' if @elems.empty?
    @elems.shift
  end
end

A first attempt at introducing timeout functionality could see us start by scrolling through Ruby’s condition variable documentation. Doing so, we might spot that the wait method can take an optional timeout parameter that will cause this method to return if it is still waiting after timeout seconds have passed.

wait(mutex, timeout = nil)

Releases the lock held in mutex and waits; reacquires the lock on wakeup. If timeout is given, this method returns after timeout seconds have passed, even if no other thread signals.

This could cause us to decide that we’d just need to modify our pop method to make use of this timeout functionality. Our code would then probably end up looking something like the snippet shown below.

def pop(blocking = true, timeout = nil)
  @mutex.synchronize do
    if blocking
      while @elems.empty?
        @cond_var.wait(@mutex, timeout)
      end
    end
    raise ThreadError, 'queue empty' if @elems.empty?
    @elems.shift
  end
end

This code allows the user to pass a timeout parameter to the pop method. This value will then get used by our condition variable’s wait method. Seems pretty easy, right? Let’s write some Ruby code to test whether the above code will allow a blocking pop to time out when called on an empty queue.

timeout_queue = TimeoutQueue.new

# blocking pop, empty queue - new data is added after 5s
Thread.new { sleep 5; timeout_queue << 'foobar' }
timeout_queue.pop # => waits for 5s for data to arrive, then returns 'foobar'

# blocking pop, empty queue - no new data is added
timeout_queue.pop(true, 10) # => blocks indefinitely - something is wrong!

Our timeout parameter seems to have had no effect whatsoever. Instead of timing out, our blocking pop will still wait indefinitely for new data to arrive. What’s happening here is that @cond_var.wait(@mutex, timeout) will indeed return after its timeout interval has passed. However, keep in mind that this statement lives inside the while @elems.empty? loop.

What’ll happen is that @cond_var.wait(@mutex, timeout) will return after ten seconds. At this point, the loop’s predicate will be re-evaluated. If the queue has remained empty, then the loop will repeat for another iteration from which it’ll return after another ten seconds. As you can see, we’re stuck in an infinite loop of 10-second intervals that we’ll only be able to break out of once some data gets added to the queue. It looks like we’ll have to come up with a better approach if we want our blocking pop to timeout correctly.

How not to improve upon our first attempt

At first glance, it might seem reasonable to infer that the above approach’s poor behavior is solely due to the presence of the while @elems.empty? loop. As such, one might conclude that the best way to improve this behavior would involve replacing our while loop with an if statement. The resulting code would then look something like this.

def pop(blocking = true, timeout = nil)
  @mutex.synchronize do
    if blocking
      if @elems.empty?
        @cond_var.wait(@mutex, timeout)
      end
    end
    raise ThreadError, 'queue empty' if @elems.empty?
    @elems.shift
  end
end

Let’s see what happens when we use this implementation and run it against the Ruby code snippet that we produced earlier.

timeout_queue = TimeoutQueue.new

# blocking pop, empty queue - new data is added after 5s
Thread.new { sleep 5; timeout_queue << 'foobar' }
timeout_queue.pop # => waits for 5s for data to arrive, then returns 'foobar'

# blocking pop, empty queue - no new data is added
timeout_queue.pop(true, 10) # => raises ThreadError after 10s

Our new approach seems to work pretty well. However, replacing our while loop with an if statement is just about the worst thing we could do. As I’ve mentioned in great detail in my earlier post, using an if statement not only introduces a subtle race condition, but it also leaves our code vulnerable to spurious wakeups.

The race condition story is a bit too long to repeat here. However, I’ll briefly readdress the topic of spurious wakeups. A spurious wakeup occurs when a condition variable is woken up out of its wait state for no apparent reason. This behavior is a consequence of how POSIX threading libraries implement condition variables, and it is not something we can easily control. This is why we need to ensure that any code we write behaves correctly in the face of such wakeups.

Knowing what we know now, let’s look back at the code we wrote earlier. Imagine that a spurious wakeup were to occur immediately after our condition variable entered its wait state. If our code were to be using an if statement, then such a wakeup could instantly cause a ThreadError to be raised. This is not what we want! We only want a ThreadError to be raised after the timeout interval has expired!

Adding timeout functionality the right way

In this section, we’ll see how to correctly implement timeout functionality. Up until now, we’ve been trying to have a single pop method handle both possible scenarios: the user specifying a timeout interval and the user not specifying one. While it’s possible to continue down this path, the resulting logic would be needlessly complex. For simplicity’s sake, we’ll create a new pop_with_timeout method that handles the scenario where a timeout interval is specified.

Our goal will be to improve on the code shown below. Notice that this code is pretty much identical to the code we covered in the previous section. The only real difference is that the timeout parameter no longer defaults to nil. Instead, we now expect the user to always provide a positive value for this parameter.

def pop_with_timeout(blocking = true, timeout)
  @mutex.synchronize do
    if blocking
      while @elems.empty?
        @cond_var.wait(@mutex, timeout)
      end
    end
    raise ThreadError, 'queue empty' if @elems.empty?
    @elems.shift
  end
end

Thinking back to the previous section, one of the problems with our approach is the value of timeout in @cond_var.wait(@mutex, timeout). If we could get this value to decrease as time passes, then our code would be resistant to spurious wakeups. For example, if our condition variable were to be spuriously awoken just 4 seconds into a 10-second timeout interval, then the next iteration of the while loop should cause our condition variable to re-enter its wait state for the remaining 6 seconds.

First of all, if we want the value passed to @cond_var.wait to decrease as time goes by, then we should start by storing when @cond_var.wait gets called. So let’s introduce a start_time variable that does this.

def pop_with_timeout(blocking = true, timeout)
  @mutex.synchronize do
    if blocking
      start_time = Time.now.to_f

      while @elems.empty?
        @cond_var.wait(@mutex, timeout)
      end
    end
    raise ThreadError, 'queue empty' if @elems.empty?
    @elems.shift
  end
end

Next up, we want to start decreasing the timeout value passed to @cond_var.wait. Keep in mind that Ruby will raise an exception if this value were to be negative. So we should also add some logic that prevents this from happening.

def pop_with_timeout(blocking = true, timeout)
  @mutex.synchronize do
    if blocking
      start_time = Time.now.to_f

      while @elems.empty?
        remaining_time = (start_time + timeout) - Time.now.to_f

        if remaining_time > 0
          @cond_var.wait(@mutex, remaining_time)
        end
      end
    end
    raise ThreadError, 'queue empty' if @elems.empty?
    @elems.shift
  end
end

Our code is starting to look pretty good. We’ve introduced a new remaining_time variable to help us keep track of the time remaining in our timeout interval. By recalculating this variable with every iteration of our while loop, we ensure that its value decreases as time passes. So if a spurious wakeup were to prematurely awaken our condition variable, then the next iteration of our while loop would put it to sleep again for an appropriate amount of time.

Unfortunately, this code has an annoying shortcoming in that it doesn’t break free from the while @elems.empty? loop once the timeout interval has expired. Right now, this loop will keep repeating itself until the queue is no longer empty. What we really want is for us to escape from this loop once our queue is no longer empty OR our remaining_time value becomes zero or less. Luckily, this is quite straightforward to solve.

def pop_with_timeout(blocking = true, timeout)
  @mutex.synchronize do
    if blocking
      start_time = Time.now.to_f

      while @elems.empty? && (remaining_time = (start_time + timeout) - Time.now.to_f) > 0
        @cond_var.wait(@mutex, remaining_time)
      end
    end
    raise ThreadError, 'queue empty' if @elems.empty?
    @elems.shift
  end
end

We can write the above approach slightly more succinctly by replacing start_time with timeout_time. This gives us our final pop_with_timeout code shown here.

def pop_with_timeout(blocking = true, timeout)
  @mutex.synchronize do
    if blocking
      timeout_time = Time.now.to_f + timeout

      while @elems.empty? && (remaining_time = timeout_time - Time.now.to_f) > 0
        @cond_var.wait(@mutex, remaining_time)
      end
    end
    raise ThreadError, 'queue empty' if @elems.empty?
    @elems.shift
  end
end

The final TimeoutQueue class is shown below. The pop_with_timeout logic has been put inside the regular pop method in order to minimize code duplication. It should be pretty easy to recognize the bits and pieces that we’ve covered as part of this article.

class TimeoutQueue
  def initialize
    @elems = []
    @mutex = Mutex.new
    @cond_var = ConditionVariable.new
  end

  def <<(elem)
    @mutex.synchronize do
      @elems << elem
      @cond_var.signal
    end
  end

  def pop(blocking = true, timeout = nil)
    @mutex.synchronize do
      if blocking
        if timeout.nil?
          while @elems.empty?
            @cond_var.wait(@mutex)
          end
        else
          timeout_time = Time.now.to_f + timeout
          while @elems.empty? && (remaining_time = timeout_time - Time.now.to_f) > 0
            @cond_var.wait(@mutex, remaining_time)
          end
        end
      end
      raise ThreadError, 'queue empty' if @elems.empty?
      @elems.shift
    end
  end
end

Let’s see how our TimeoutQueue fares when we plug it into our trusty Ruby snippet.

timeout_queue = TimeoutQueue.new

# blocking pop, empty queue - new data is added after 5s
Thread.new { sleep 5; timeout_queue << 'foobar' }
timeout_queue.pop # => waits for 5s for data to arrive, then returns 'foobar'

# blocking pop, empty queue - no new data is added
timeout_queue.pop(true, 10) # => raises ThreadError after 10s

This is exactly the behavior we want. It matches the behavior we saw back when we replaced our while loop with an if statement, except that this time around our queue will behave correctly in the face of spurious wakeups. That is to say, if our condition variable were to be woken up early, then our while loop would ensure it is put back to sleep again. This while loop also protects us from the subtle race condition that an if statement would have introduced. More information about this race condition can be found in one of my earlier posts. We have now successfully created a fully functioning TimeoutQueue class!
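
For completeness, a couple of extra sanity checks (not part of the original snippet) confirm that the other pop variants are unaffected by our changes:

# non-blocking pop, non-empty queue
timeout_queue << 'foobar'
timeout_queue.pop(false) # => immediately returns 'foobar'

# blocking pop with a timeout, non-empty queue
timeout_queue << 'foobar'
timeout_queue.pop(true, 10) # => immediately returns 'foobar'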

Conclusion

I hope this article has been helpful for those wanting to add timeout functionality to their Ruby queues. Condition variables are a notoriously poorly documented subject in Ruby, and hopefully, this post goes some way towards alleviating this. I also want to give a shout-out to Job Vranish whose two posts on condition variables got me to start learning more about them well over a year ago.

Let me just put up my usual disclaimer stating that while everything mentioned above is correct to the best of my knowledge, I’m unable to guarantee that absolutely no mistakes snuck in while writing this. As always, please feel free to contact me with any corrections.


Ruby concurrency: in praise of condition variables

In a previous post, we talked about the benefits conferred by Ruby mutexes. While a programmer’s familiarity with mutexes is likely to depend on what kind of programs she usually writes, most developers tend to be at least somewhat familiar with these particular synchronization primitives. This article, however, is going to focus on a much lesser-known synchronization construct: the condition variable.

Condition variables are used for putting threads to sleep and waking them back up once a certain condition is met. Don’t worry if this sounds a bit vague; we’ll go into a lot more detail later. As condition variables always need to be used in conjunction with mutexes, we’ll lead with a quick mutex recap. Next, we’ll introduce consumer-producer problems and how to elegantly solve them with the aid of condition variables. Then, we’ll have a look at how to use these synchronization primitives for implementing blocking method calls. Finishing up, we’ll describe some curious condition variable behavior and how to safeguard against it.

A mutex recap

A mutex is a data structure for protecting shared state between multiple threads. When a piece of code is wrapped inside a mutex, the mutex guarantees that only one thread at a time can execute this code. If another thread wants to start executing this code, it’ll have to wait until our first thread is done with it. I realize this may all sound a bit abstract, so now is probably a good time to bring in some example code.

Writing to shared state

In this first example, we’ll have a look at what happens when two threads try to modify the same shared variable. The snippet below shows two methods: counters_with_mutex and counters_without_mutex. Both methods start by creating a zero-initialized counters array before spawning 5 threads. Each thread will perform 100,000 loops, with every iteration incrementing all elements of the counters array by one. Both methods are the same in every way except for one thing: only one of them uses a mutex.

def counters_with_mutex
  mutex = Mutex.new
  counters = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

  5.times.map do
    Thread.new do
      100000.times do
        mutex.synchronize do
          counters.map! { |counter| counter + 1 }
        end
      end
    end
  end.each(&:join)

  counters.inspect
end

def counters_without_mutex
  counters = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

  5.times.map do
    Thread.new do
      100000.times do
        counters.map! { |counter| counter + 1 }
      end
    end
  end.each(&:join)

  counters.inspect
end

puts counters_with_mutex
# => [500000, 500000, 500000, 500000, 500000, 500000, 500000, 500000, 500000, 500000]

puts counters_without_mutex
# => [500000, 447205, 500000, 500000, 500000, 500000, 203656, 500000, 500000, 500000]
# note that we seem to have lost some increments here due to not using a mutex

As you can see, only the method that uses a mutex ends up producing the correct result. The method without a mutex seems to have lost some increments. This is because the lack of a mutex makes it possible for our second thread to interrupt our first thread at any point during its execution. This can lead to some serious problems.

For example, imagine that our first thread has just read the first entry of the counters array, incremented it by one, and is now getting ready to write this incremented value back to our array. However, before our first thread can write this incremented value, it gets interrupted by the second thread. This second thread then goes on to read the current value of the first entry, increments it by one, and succeeds in writing the result back to our counters array. Now we have a problem!

We have a problem because the first thread got interrupted before it had a chance to write its incremented value to the array. When the first thread resumes, it will end up overwriting the value that the second thread just placed in the array. This will cause us to essentially lose an increment operation, which explains why our program output has entries in it that are less than 500,000.

All these problems can be avoided by using a mutex. Remember that a thread executing code wrapped by a mutex cannot be interleaved with another thread wanting to execute this same code. Therefore, our second thread would never have gotten interleaved with the first thread, thereby avoiding the possibility of results getting overwritten.

Reading from shared state

There’s a common misconception that a mutex is only required when writing to a shared variable, and not when reading from it. The snippet below shows 50 threads flipping the boolean values in the flags array over and over again. Many developers think this snippet is without error as the code responsible for changing these values was wrapped inside a mutex. If that were true, then every line of the output of puts flags.to_s should consist of 10 repetitions of either true or false. As we can see below, this is not the case.

mutex = Mutex.new
flags = [false, false, false, false, false, false, false, false, false, false]

threads = 50.times.map do
  Thread.new do
    100000.times do
      # don't do this! Reading from shared state requires a mutex!
      puts flags.to_s

      mutex.synchronize do
        flags.map! { |f| !f }
      end
    end
  end
end
threads.each(&:join)

$ ruby flags.rb > output.log
$ grep 'true, false' output.log | wc -l
    30

What’s happening here is that our mutex only guarantees that no two threads can modify the flags array at the same time. However, it is perfectly possible for one thread to start reading from this array while another thread is busy modifying it, thereby causing the first thread to read an array that contains both true and false entries. Luckily, all of this can be easily avoided by wrapping puts flags.to_s inside our mutex. This will guarantee that only one thread at a time can read from or write to the flags array.
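
To make the fix explicit, here is what the thread body looks like with the read moved inside the synchronized block (a minimal tweak of the snippet above):

threads = 50.times.map do
  Thread.new do
    100000.times do
      mutex.synchronize do
        # both reading from and writing to the shared flags array
        # now happen while holding the mutex
        puts flags.to_s
        flags.map! { |f| !f }
      end
    end
  end
end
threads.each(&:join)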

Before moving on, I would just like to mention that even very experienced people have gotten tripped up by not using a mutex when accessing shared state. In fact, at one point there even was a Java design pattern that assumed it was safe to not always use a mutex to do so. Needless to say, this pattern has since been amended.

Consumer-producer problems

With that mutex refresher out of the way, we can now start looking at condition variables. Condition variables are best explained by trying to come up with a practical solution to the consumer-producer problem. In fact, consumer-producer problems are so common that Ruby already has a data structure aimed at solving these: the Queue class. This class uses a condition variable to implement the blocking variant of its shift() method. In this article, we made a conscious decision not to use the Queue class. Instead, we’re going to write everything from scratch with the help of condition variables.

Let’s have a look at the problem that we’re going to solve. Imagine that we have a website where users can generate tasks of varying complexity, e.g. a service that allows users to convert uploaded jpg images to pdf. We can think of these users as producers of a steady stream of tasks of random complexity. These tasks will get stored on a backend server that has several worker processes running on it. Each worker process will grab a task, process it, and then grab the next one. These workers are our task consumers.

With what we know about mutexes, it shouldn’t be too hard to write a piece of code that mimics the above scenario. It’ll probably end up looking something like this.

tasks = []
mutex = Mutex.new
threads = []

class Task
  def initialize
    @duration = rand()
  end

  def execute
    sleep @duration
  end
end

# producer threads
threads += 2.times.map do
  Thread.new do
    while true
      mutex.synchronize do
        tasks << Task.new
        puts "Added task: #{tasks.last.inspect}"
      end
      # limit task production speed
      sleep 0.5
    end
  end
end

# consumer threads
threads += 5.times.map do
  Thread.new do
    while true
      task = nil
      mutex.synchronize do
        if tasks.count > 0
          task = tasks.shift
          puts "Removed task: #{task.inspect}"
        end
      end
      # execute task outside of mutex so we don't unnecessarily
      # block other consumer threads
      task.execute unless task.nil?
    end
  end
end

threads.each(&:join)

The above code should be fairly straightforward. There is a Task class for creating tasks that take between 0 and 1 seconds to run. We have 2 producer threads, each running an endless while loop that safely appends a new task to the tasks array every 0.5 seconds with the help of a mutex. Our 5 consumer threads are also running an endless while loop, each iteration grabbing the mutex so as to safely check the tasks array for available tasks. If a consumer thread finds an available task, it removes the task from the array and starts processing it. Once the task has been processed, the thread moves on to its next iteration, thereby repeating the cycle anew.

While the above implementation seems to work just fine, it is not optimal as it requires all consumer threads to constantly poll the tasks array for available work. This polling does not come for free. The Ruby interpreter has to constantly schedule the consumer threads to run, thereby preempting threads that may have actual important work to do. To give an example, the above code will interleave consumer threads that are executing a task with consumer threads that just want to check for newly available tasks. This can become a real problem when there is a large number of consumer threads and only a few tasks.

If you want to see for yourself just how inefficient this approach is, you only need to modify the original code for consumer threads with the code shown below. This modified program prints well over a thousand lines of This thread has nothing to do for every single line of Removed task. Hopefully, this gives you an indication of the general wastefulness of having consumer threads constantly poll the tasks array.

# modified consumer threads code
threads += 5.times.map do
  Thread.new do
    while true
      task = nil
      mutex.synchronize do
        if tasks.count > 0
          task = tasks.shift
          puts "Removed task: #{task.inspect}"
        else
          puts 'This thread has nothing to do'
        end
      end
      # execute task outside of mutex so we don't unnecessarily
      # block other consumer threads
      task.execute unless task.nil?
    end
  end
end

Condition variables to the rescue

So how can we create a more efficient solution to the consumer-producer problem? That is where condition variables come into play. Condition variables are used for putting threads to sleep and waking them only once a certain condition is met. Remember that our current solution to the producer-consumer problem is far from ideal because consumer threads need to constantly poll for new tasks to arrive. Things would be much more efficient if our consumer threads could go to sleep and be woken up only when a new task has arrived.

Shown below is a solution to the consumer-producer problem that makes use of condition variables. We’ll talk about how this works in a second. For now though, just have a look at the code and perhaps have a go at running it. If you were to run it, you would probably see that This thread has nothing to do does not show up anymore. Our new approach has completely gotten rid of consumer threads busy polling the tasks array.

The use of a condition variable will now cause our consumer threads to wait for a task to be available in the tasks array before proceeding. As a result of this, we can now remove some of the checks we had to have in place in our original consumer code. I’ve added some comments to the code below to help highlight these removals.

tasks = []
mutex = Mutex.new
cond_var = ConditionVariable.new
threads = []

class Task
  def initialize
    @duration = rand()
  end

  def execute
    sleep @duration
  end
end

# producer threads
threads += 2.times.map do
  Thread.new do
    while true
      mutex.synchronize do
        tasks << Task.new
        cond_var.signal
        puts "Added task: #{tasks.last.inspect}"
      end
      # limit task production speed
      sleep 0.5
    end
  end
end

# consumer threads
threads += 5.times.map do
  Thread.new do
    while true
      task = nil
      mutex.synchronize do
        while tasks.empty?
          cond_var.wait(mutex)
        end

        # the `if tasks.count == 0` statement will never be true as the thread
        # will now only reach this line if the tasks array is not empty
        puts 'This thread has nothing to do' if tasks.count == 0

        # similarly, we can now remove the `if tasks.count > 0` check that
        # used to surround this code. We no longer need it as this code will
        # now only get executed if the tasks array is not empty.
        task = tasks.shift
        puts "Removed task: #{task.inspect}"
      end
      # Note that we have now removed `unless task.nil?` from this line as
      # our thread can only arrive here if there is indeed a task available.
      task.execute
    end
  end
end

threads.each(&:join)

Aside from the removal of some if statements, our new code is essentially identical to our previous solution. The only exceptions are the five new lines shown below. Don’t worry if some of the accompanying comments don’t quite make sense yet. Now is also a good time to point out that the new code for both the producer and consumer threads was added inside the existing mutex synchronization blocks. Condition variables are not thread-safe and therefore always need to be used in conjunction with a mutex!

# declaring the condition variable
cond_var = ConditionVariable.new
# a producer thread now signals the condition variable
# after adding a new task to the tasks array
cond_var.signal
# a consumer thread now goes to sleep when it sees that
# the tasks array is empty. It can get woken up again
# when a producer thread signals the condition variable.
while tasks.empty?
  cond_var.wait(mutex)
end

Let’s talk about the new code now. We’ll start with the consumer threads snippet. There’s actually so much going on in these three lines that we’ll limit ourselves to covering what cond_var.wait(mutex) does for now. We’ll explain the need for the while tasks.empty? loop later. The first thing to notice about the wait method is the parameter that’s being passed to it. Remember how a condition variable is not thread-safe and therefore should only have its methods called inside a mutex synchronization block? It is that mutex that needs to be passed as a parameter to the wait method.

Calling wait on a condition variable causes two things to happen. First of all, it causes the thread that calls wait to go to sleep. That is to say, the thread will tell the interpreter that it no longer wants to be scheduled. However, this thread still has ownership of the mutex as it’s going to sleep. We need to ensure that the thread relinquishes this mutex because otherwise all other threads waiting for this mutex will be blocked. By passing this mutex to the wait method, the wait method internals will ensure that the mutex gets released as the thread goes to sleep.

Let’s move on to the producer threads. These threads are now calling cond_var.signal. The signal method is pretty straightforward in that it wakes up exactly one of the threads that were put to sleep by the wait method. This newly awoken thread will indicate to the interpreter that it is ready to start getting scheduled again and then wait for its turn.

So what code does our newly awoken thread start executing once it gets scheduled again? It starts executing from where it left off. Essentially, a newly awoken thread will return from its call to cond_var.wait(mutex) and resume from there. Personally, I like to think of calling wait as creating a save point inside a thread from which work can resume once the thread gets woken up and rescheduled again. Please note that since the thread wants to resume from where it originally left off, it’ll need to reacquire the mutex in order to get scheduled. This mutex reacquisition is very important, so be sure to remember it.
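
As a small standalone illustration of these mechanics (my own sketch, not from the original post), the waiting thread below releases the mutex while it sleeps and is holding it again by the time wait returns:

mutex = Mutex.new
cond_var = ConditionVariable.new
ready = false

waiter = Thread.new do
  mutex.synchronize do
    until ready              # also guards against spurious wakeups
      cond_var.wait(mutex)   # releases the mutex, sleeps, reacquires it on wakeup
    end
    puts 'woken up while holding the mutex'
  end
end

sleep 1
mutex.synchronize do
  ready = true
  cond_var.signal            # wakes the sleeping waiter thread
end

waiter.join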

This segues nicely into why we need to use while tasks.empty? when calling wait in a consumer thread. When our newly awoken thread resumes execution by returning from cond_var.wait, the first thing it’ll do is complete its previously interrupted iteration through the while loop, thereby evaluating while tasks.empty? again. This actually causes us to neatly avoid a possible race condition.

Let’s say we don’t use a while loop and use an if statement instead. The resulting code would then look like shown below. Unfortunately, there is a very hard to find problem with this code. Note how we now need to re-add the previously removed if tasks.count > 0 and unless task.nil? statements to our code below in order to ensure its safe execution.

# consumer threads
threads += 5.times.map do
  Thread.new do
    while true
      task = nil
      mutex.synchronize do
        cond_var.wait(mutex) if tasks.empty?

        # using `if tasks.empty?` forces us to once again add this
        # `if tasks.count > 0` check. We need this check to protect
        # ourselves against a nasty race condition.
        if tasks.count > 0
          task = tasks.shift
          puts "Removed task: #{task.inspect}"
        else
          puts 'This thread has nothing to do'
        end
      end
      # using `if tasks.empty?` forces us to re-add `unless task.nil?`
      # in order to safeguard ourselves against a now newly introduced
      # race condition
      task.execute unless task.nil?
    end
  end
end

Imagine a scenario where we have:

  • two producer threads
  • one consumer thread that’s awake
  • four consumer threads that are asleep

A consumer thread that’s awake will go back to sleep only when there are no more tasks in the tasks array. That is to say, a single consumer thread will keep processing tasks until no more tasks are available. Now, let’s say one of our producer threads adds a new task to the currently empty tasks array before calling cond_var.signal at roughly the same time as our active consumer thread is finishing its current task. This signal call will awaken one of our sleeping consumer threads, which will then try to get itself scheduled. This is where a race condition is likely to happen!

We’re now in a position where two consumer threads are competing for ownership of the mutex in order to get scheduled. Let’s say our first consumer thread wins this competition. This thread will now go and grab the task from the tasks array before relinquishing the mutex. Our second consumer thread then grabs the mutex and gets to run. However, as the tasks array is empty now, there is nothing for this second consumer thread to work on. So this second consumer thread now has to do an entire iteration of its while true loop for no real purpose at all.

We now find ourselves in a situation where a complete iteration of the while true loop can occur even when the tasks array is empty. This is not unlike the position we were in when our program was just busy polling the tasks array. Sure, our current program will be more efficient than busy polling, but we still need to safeguard our code against the possibility of an iteration occurring when there is no task available. This is why we needed to re-add the if tasks.count > 0 and unless task.nil? statements. The latter of these two is especially important, as without it our program might crash with a NoMethodError from calling execute on nil.

Luckily, we can safely get rid of these easily overlooked safeguards by forcing each newly awakened consumer thread to check for available tasks and having it put itself to sleep again if no tasks are available. This behavior can be accomplished by replacing the if tasks.empty? statement with a while tasks.empty? loop. If tasks are available, a newly awoken thread will exit the loop and execute the rest of its code. However, if no tasks are found, then the loop is repeated, thereby causing the thread to put itself to sleep again by executing cond_var.wait. We’ll see in a later section that there is yet another benefit to using this while loop.

Building our own Queue class

At the beginning of a previous section, we touched on how condition variables are used by the Queue class to implement blocking behavior. The previous section taught us enough about condition variables for us to go and implement a basic Queue class ourselves. We’re going to create a thread-safe SimpleQueue class that is capable of:

  • having data appended to it with the << operator
  • having data retrieved from it with a non-blocking shift method
  • having data retrieved from it with a blocking shift method

It’s easy enough to write code that meets these first two criteria. It will probably end up looking something like the code shown below. Note that our SimpleQueue class is using a mutex as we want this class to be thread-safe, just like the original Queue class.

class SimpleQueue
  def initialize
    @elems = []
    @mutex = Mutex.new
  end

  def <<(elem)
    @mutex.synchronize do
      @elems << elem
    end
  end

  def shift(blocking = true)
    @mutex.synchronize do
      if blocking
        raise 'yet to be implemented'
      end
      @elems.shift
    end
  end
end

simple_queue = SimpleQueue.new
simple_queue << 'foo'

simple_queue.shift(false)
# => "foo"

simple_queue.shift(false)
# => nil

Now let’s have a look at what’s needed to implement the blocking shift behavior. As it turns out, this is actually very easy. We only want the thread to block if the shift method is called when the @elems array is empty. This is all the information we need to determine where we need to place our condition variable’s call to wait. Similarly, we want the thread to stop blocking once the << operator appends a new element, thereby causing @elems to no longer be empty. This tells us exactly where we need to place our call to signal.

In the end, we just need to create a condition variable that makes the thread go to sleep when a blocking shift is called on an empty SimpleQueue. Likewise, the << operator just needs to signal the condition variable when a new element is added, thereby causing the sleeping thread to be woken up. The takeaway from this is that blocking methods work by causing their calling thread to fall asleep. Also, please note that the call to @cond_var.wait takes place inside a while @elems.empty? loop. Always use a while loop when calling wait on a condition variable! Never use an if statement!

class SimpleQueue
  def initialize
    @elems = []
    @mutex = Mutex.new
    @cond_var = ConditionVariable.new
  end

  def <<(elem)
    @mutex.synchronize do
      @elems << elem
      @cond_var.signal
    end
  end

  def shift(blocking = true)
    @mutex.synchronize do
      if blocking
        while @elems.empty?
          @cond_var.wait(@mutex)
        end
      end
      @elems.shift
    end
  end
end

simple_queue = SimpleQueue.new

# this will print "blocking shift returned with: foo" after 5 seconds
# that is to say, the first thread will go to sleep until the second
# thread adds an element to the queue, thereby causing the first thread
# to be woken up again
threads = []
threads << Thread.new { puts "blocking shift returned with: #{simple_queue.shift}" }
threads << Thread.new { sleep 5; simple_queue << 'foo' }
threads.each(&:join)

One thing to point out in the above code is that @cond_var.signal can get called even when there are no sleeping threads around. This is a perfectly okay thing to do. In these types of scenarios calling @cond_var.signal will just do nothing.

Spurious wakeups

A “spurious wakeup” refers to a sleeping thread getting woken up without any signal call having been made. This is an impossible-to-avoid edge case of condition variables. It’s important to point out that this is not caused by a bug in the Ruby interpreter or anything like that. Instead, the designers of the threading libraries used by your OS found that allowing for the occasional spurious wakeup greatly improves the speed of condition variable operations. As such, any code that uses condition variables needs to take spurious wakeups into account.

So does this mean that we need to rewrite all the code that we’ve written in this article in an attempt to make it resistant to possible bugs introduced by spurious wakeups? You’ll be glad to know that this isn’t the case as all code snippets in this article have always wrapped the cond_var.wait statement inside a while loop!

We covered earlier how using a while loop makes our code more efficient when dealing with certain race conditions as it causes a newly awakened thread to check whether there is actually anything to do for it, and if not, the thread goes back to sleep. This same while loop helps us deal with spurious wakeups as well.

When a thread gets woken up by a spurious wakeup and there is nothing for it to do, our usage of a while loop will cause the thread to detect this and go back to sleep. From the thread’s point of view, being awakened by a spurious wakeup isn’t any different than being woken up with no available tasks to do. So the same mechanism that helps us deal with race conditions solves our spurious wakeup problem as well. It should be obvious by now that while loops play a very important role when working with condition variables.

Conclusion

Ruby’s condition variables are somewhat notorious for their poor documentation. That’s a shame, because they are wonderful data structures for efficiently solving a very specific set of problems. Although, as we’ve seen, using them isn’t without pitfalls. I hope that this post will go some way towards making them (and their pitfalls) a bit better understood in the wider Ruby community.

I also feel like I should point out that while everything mentioned above is correct to the best of my knowledge, I’m unable to guarantee that absolutely no mistakes snuck in while writing this. As always, please feel free to contact me if you think I got anything wrong, or even if you just want to say hello.


Using DTrace to measure mutex contention in Ruby

I recently found myself working on Ruby code containing a sizable number of threads and mutexes. It wasn’t long before I started wishing for a tool that could tell me which particular mutexes were the most heavily contended. After all, this type of information can be worth its weight in gold when trying to diagnose why your threaded program is running slower than expected.

This is where DTrace enters the picture. DTrace is a tracing framework that enables you to instrument application and system internals, thereby allowing you to measure and collect previously inaccessible metrics. Personally, I think of DTrace as black magic that allows me to gather a ridiculous amount of information about what is happening on my computer. We will see later just how fine-grained this information can get.

DTrace is available on Solaris, OS X, and FreeBSD. There is some Linux support as well, but you might be better off using one of the Linux specific alternatives instead. Info about these can be found here and here. At the time of writing this, it was also recently announced that DTrace is making its way to Linux. Please note that all work for this particular post was done on a MacBook running OS X El Capitan (version 10.11).

Enabling DTrace for Ruby on OS X

El Capitan comes with a new security feature called System Integrity Protection that helps prevent malicious software from tampering with your system. Regrettably, it also prevents DTrace from working. We will need to disable SIP by following these instructions. Note that it is possible to only partially disable SIP, although doing so will still leave DTrace unable to attach to restricted processes. Personally, I’ve completely disabled SIP on my machine.

Next, we want to get DTrace working with Ruby. Although Ruby has DTrace support, there is a very good chance that your currently installed Ruby binary does not have this support enabled. This is especially likely to be true if you compiled your Ruby binary locally on an OS X system, as OS X does not allow the Ruby compilation process to access the system DTrace binary, thereby causing the resulting Ruby binary to lack DTrace functionality. More information about this can be found here.

I’ve found the easiest way to get a DTrace compatible Ruby on my system was to just go and download a precompiled Ruby binary. Naturally, we want to be a bit careful about this, as downloading random binaries from the internet is not the safest thing. Luckily, the good people over at rvm.io host DTrace compatible Ruby binaries that we can safely download.

$ rvm mount -r https://rvm.io/binaries/osx/10.10/x86_64/ruby-2.2.3.tar.bz2
$ rvm use ruby-2.2.3

If you do not wish to install rvm, and cannot find a DTrace compatible Ruby through your favorite version manager, then you could do something similar to the snippet shown below.

$ wget https://rvm.io/binaries/osx/10.11/x86_64/ruby-2.2.3.tar.bz2
$ bunzip2 ruby-2.2.3.tar.bz2
$ tar xf ruby-2.2.3.tar
$ mv ruby-2.2.3/bin/ruby /usr/local/bin/druby
$ rm -r ruby-2.2.3 ruby-2.2.3.tar

Running the above snippet will install a DTrace compatible Ruby on our system. Note that we rename the binary to druby so as to prevent conflicts with existing Ruby installations. The above approach should really be treated as a last resort. I strongly urge you to make the effort to find a DTrace compatible Ruby binary through your current version manager.

Now that we’ve ensured we’ll be able to use DTrace with our installed Ruby, let’s move on and start learning some DTrace basics.

DTrace basics

DTrace is all about probes. A probe is a piece of code that fires when a specific condition is met. For example, we could ask DTrace to instrument a process with a probe that activates when the process returns from a particular system call. Such a probe would also be able to inspect the value returned by the system call made by this process.

You interact with DTrace by writing scripts in the D scripting language (not related to the D programming language). This language is a mix of C and awk, and has a very low learning curve. Below you can see an example of such a script written in D. This particular script will list all system calls being initiated on my machine along with the name of the process that initiated them. We will save this file as syscall_entry.d.

/* syscall_entry.d */
syscall:*:*:entry
{
  printf("\tprocess_name: %s", execname);
}

The first line of our script tells DTrace which probes we want to use. In this particular case, we are using syscall:*:*:entry to match every single probe associated with initiating a system call. DTrace has individual probes for every possible system call, so if DTrace were to have no built-in functionality for matching multiple probes, I would have been forced to manually specify every single system call probe myself, and our script would have been a whole lot longer.

I want to briefly cover some DTrace terminology before continuing on. Every DTrace probe adheres to the <provider>:<module>:<function>:<name> description format. In the script above we asked DTrace to match all probes of the syscall provider that have entry as their name. In this particular example, we explicitly used the * character to show that we want to match multiple probes. However, keep in mind that the use of the * character is optional. Most DTrace documentation would opt for syscall:::entry instead.

The rest of the script is rather straightforward. We are basically just telling DTrace to print the execname every time a probe fires. The execname is a built-in DTrace variable that holds the name of the process that caused the probe to be fired. Let us now go ahead and run our simple DTrace script.

$ sudo dtrace -s syscall_entry.d

dtrace: script 'syscall_entry.d' matched 500 probes
CPU     ID                    FUNCTION:NAME
  0    249                      ioctl:entry    process_name: dtrace
  0    373               gettimeofday:entry    process_name: java
  0    249                      ioctl:entry    process_name: dtrace
  0    751              psynch_cvwait:entry    process_name: java
  0    545                     sysctl:entry    process_name: dtrace
  0    545                     sysctl:entry    process_name: dtrace
  0    233                  sigaction:entry    process_name: dtrace
  0    233                  sigaction:entry    process_name: dtrace
  0    751              psynch_cvwait:entry    process_name: dtrace
  0    889                 kevent_qos:entry    process_name: Google Chrome Helper
  0    889                 kevent_qos:entry    process_name: Google Chrome Helper
  0    877           workq_kernreturn:entry    process_name: notifyd
  ...

The first thing to notice is that syscall:*:*:entry matched 500 different probes. At first glance this might seem like a lot, but on my machine there are well over 330,000 probes available. You can list all DTrace probes on your machine by running sudo dtrace -l.

The second thing to notice is the insane amount of data returned by DTrace. The snippet above really doesn’t do the many hundreds of lines of output justice, but going forward we’ll see how we can get DTrace to output just those things we are interested in.

Before moving on to the next section, I just want to note that the D scripting language is not Turing complete. It lacks such features as conditional branching and loops. DTrace is built around the ideas of minimal overhead and absolute safety. Giving people the ability to use DTrace to introduce arbitrary overhead on top of system calls does not fit with these ideas.

Ruby and DTrace

DTrace probes have been supported by Ruby since Ruby 2.0 came out. A list of supported Ruby probes can be found here. Now is a good time to mention that DTrace probes come in two flavors: dynamic probes and static probes. Dynamic probes only appear in the pid and fbt probe providers. This means that the vast majority of available probes (including Ruby probes) are static.

So how exactly do dynamic and static probes differ? In order to explain this, we first need to take a closer look at just how DTrace works. When you invoke DTrace on a process you are effectively giving DTrace permission to patch additional DTrace instrumentation instructions into the process’s address space. Remember how we had to disable the System Integrity Protection check in order to get DTrace to work on El Capitan? This is why.

In the case of dynamic probes, DTrace instrumentation instructions only get patched into a process when DTrace is invoked on this process. In other words, dynamic probes add zero overhead when not enabled. Static probes on the other hand have to be compiled into the binary that wants to make use of them. This is done through a probes.d file.

However, even when probes have been compiled into the binary, this does not necessarily mean that they are getting triggered. When a process with static probes in its binary does not have DTrace invoked on it, any probe instructions get converted into NOP operations. This usually introduces a negligible, but nevertheless non-zero, performance impact. More information about DTrace overhead can be found here, here, and here.
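
To give a rough idea of what such a file looks like, here is a small hypothetical probes.d for an imaginary application. This is not Ruby’s actual probes.d file; the provider and probe names are made up purely for illustration.

/* probes.d - a made-up example of a static probe definition */
provider myapp {
  probe request__start(char *path);
  probe request__done(char *path, int status);
};

The double underscores in the probe names get turned into dashes when the probes are used, which is why Ruby’s probes show up with names like method-entry and cmethod-return.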

Now that we’ve immersed ourselves in all things probe-related, let’s go ahead and actually list which DTrace probes are available for a Ruby process. We saw earlier that Ruby comes with static probes compiled into the Ruby binary. We can ask DTrace to list these probes for us with the following command.

$ sudo dtrace -l -m ruby -c 'ruby -v'

ID         PROVIDER       MODULE              FUNCTION      NAME
114188    ruby86029         ruby       empty_ary_alloc      array-create
114189    ruby86029         ruby               ary_new      array-create
114190    ruby86029         ruby         vm_call_cfunc      cmethod-entry
114191    ruby86029         ruby         vm_call0_body      cmethod-entry
114192    ruby86029         ruby          vm_exec_core      cmethod-entry
114193    ruby86029         ruby         vm_call_cfunc      cmethod-return
114194    ruby86029         ruby         vm_call0_body      cmethod-return
114195    ruby86029         ruby            rb_iterate      cmethod-return
114196    ruby86029         ruby          vm_exec_core      cmethod-return
114197    ruby86029         ruby       rb_require_safe      find-require-entry
114198    ruby86029         ruby       rb_require_safe      find-require-return
114199    ruby86029         ruby              gc_marks      gc-mark-begin
...

Let’s take a moment to look at the command we entered here. We saw earlier that we can use -l to have DTrace list its probes. Now we also use -m ruby to get DTrace to limit its listing to probes from the ruby module. However, DTrace will only list its Ruby probes if you specifically tell it you are interested in invoking DTrace on a Ruby process. This is what we use -c 'ruby -v' for. The -c parameter allows us to specify a command that creates a process we want to run DTrace against. Here we are using ruby -v to spawn a small Ruby process in order to get DTrace to list its Ruby probes.

The above snippet doesn’t actually list all Ruby probes, as the sudo dtrace -l command will omit any probes from the pid provider. This is because the pid provider actually defines a class of providers, each of which gets its own set of probes depending on which process you are tracing. Each pid probe corresponds to an internal C function that can be called by that particular process. Below we show how to list the Ruby specific probes of this provider.

$ sudo dtrace -l -n 'pid$target:::entry' -c 'ruby -v' | grep 'ruby'

ID         PROVIDER       MODULE                     FUNCTION      NAME
1395302    pid86272         ruby                   rb_ary_eql      entry
1395303    pid86272         ruby                  rb_ary_hash      entry
1395304    pid86272         ruby                  rb_ary_aset      entry
1395305    pid86272         ruby                    rb_ary_at      entry
1395306    pid86272         ruby                 rb_ary_fetch      entry
1395307    pid86272         ruby                 rb_ary_first      entry
1395308    pid86272         ruby                rb_ary_push_m      entry
1395309    pid86272         ruby                 rb_ary_pop_m      entry
1395310    pid86272         ruby               rb_ary_shift_m      entry
1395311    pid86272         ruby                rb_ary_insert      entry
1395312    pid86272         ruby            rb_ary_each_index      entry
1395313    pid86272         ruby          rb_ary_reverse_each      entry
...

Here we are only listing the pid entry probes, but keep in mind that every entry probe has a corresponding pid return probe. These probes are great as they provide us with insight into which internal functions are getting called, the arguments passed to these, as well as their return values, and even the offset in the function of the return instruction (useful for when a function has multiple return instructions). Additional information about the pid provider can be found here.
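
As a small illustration of how you might use these, the sketch below (my own example, not part of anything we’ll build later) traces one of the rb_ary_* functions from the listing above.

/* ary_push.d - a minimal sketch using the pid provider */
pid$target:ruby:rb_ary_push_m:entry
{
  printf("rb_ary_push_m was called\n");
}

pid$target:ruby:rb_ary_push_m:return
{
  /* for pid return probes, arg0 holds the offset of the return
     instruction and arg1 holds the return value */
  printf("rb_ary_push_m returned from offset %d with value %d\n", arg0, arg1);
}

You could run this with something along the lines of sudo dtrace -q -s ary_push.d -c 'ruby -e "[].push(1)"' to see the probes fire.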

A first DTrace script for Ruby

Let us now have a look at a first DTrace script for Ruby that will tell us when a Ruby method starts and stops executing, along with the method’s execution time. We will be running our DTrace script against the simple Ruby program shown below.

# sleepy.rb
def even(rnd)
  sleep(rnd)
end

def odd(rnd)
  sleep(rnd)
end

loop do
  rnd = rand(4)
  (rnd % 2 == 0) ? even(rnd) : odd(rnd)
end

Our simple Ruby program is clearly not going to win any awards. It is just one endless loop, each iteration of which calls a method depending on whether a random number was even or odd. While this is obviously a very contrived example, we can nevertheless make great use of it to illustrate the power of DTrace.

/* sleepy.d */
ruby$target:::method-entry
{
  self->start = timestamp;
  printf("Entering Method: class: %s, method: %s, file: %s, line: %d\n", copyinstr(arg0), copyinstr(arg1), copyinstr(arg2), arg3);
}

ruby$target:::method-return
{
  printf("Returning After: %d nanoseconds\n", (timestamp - self->start));
}

The above DTrace script has us using two Ruby specific DTrace probes. The method-entry probe fires whenever a Ruby method is entered; the method-return probe fires whenever a Ruby method returns. Each probe can take multiple arguments. A probe’s arguments are available in the DTrace script through the arg0, arg1, arg2 and arg3 variables.

If we want to know what data is contained by a probe’s arguments, all we have to do is look at its documentation. In this particular case, we can see that the method-entry probe gets called by the Ruby process with exactly four arguments.

ruby:::method-entry(classname, methodname, filename, lineno);

  • classname: name of the class (a string)
  • methodname: name of the method about to be executed (a string)
  • filename: the file name where the method is being called (a string)
  • lineno: the line number where the method is being called (an int)

The documentation tells us that arg0 holds the class name, arg1 holds the method name, and so on. Equally important is that it tells us the first three arguments are strings, while the fourth one is an integer. We’ll need this information for when we want to print any of these arguments with printf.

You probably noticed that we are wrapping string variables inside the copyinstr method. The reason for this is a bit complex. When a string gets passed as an argument to a DTrace probe, we don’t actually pass the entire string. Instead, we only pass the memory address where the string begins. This memory address will be specific to the address space of the Ruby process. However, DTrace probes are executed in the kernel and thus make use of a different address space than our Ruby process. In order for a probe to read a string residing in user process data, it first needs to copy this string into the kernel’s address space. The copyinstr method is a built-in DTrace function that takes care of this copying for us.

The self->start notation is interesting as well. DTrace variables starting with self-> are thread-local variables. Thread-local variables are useful when you want to tag every thread that fired a probe with some data. In our case we are using self->start = timestamp; to tag every thread that triggers the method-entry probe with a thread-local start variable that contains the time in nanoseconds returned by the built-in timestamp method.

While it is impossible for one thread to access the thread-local variables of another thread, it is perfectly possible for a given probe to access the thread-local variables that were set on the current thread by a different probe. Looking at our DTrace script, you can see that the thread-local self->start variable is being shared between both the method-entry and method-return probes.

Let’s go ahead and run the above DTrace script on our Ruby program.

$ sudo dtrace -q -s sleepy.d -c 'ruby sleepy.rb'

Entering Method: class: RbConfig, method: expand, file: /Users/vaneyckt/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/x86_64-darwin14/rbconfig.rb, line: 241
Returning After: 39393 nanoseconds
Entering Method: class: RbConfig, method: expand, file: /Users/vaneyckt/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/x86_64-darwin14/rbconfig.rb, line: 241
Returning After: 12647 nanoseconds
Entering Method: class: RbConfig, method: expand, file: /Users/vaneyckt/.rvm/rubies/ruby-2.2.3/lib/ruby/2.2.0/x86_64-darwin14/rbconfig.rb, line: 241
Returning After: 11584 nanoseconds
...
Entering Method: class: Object, method: odd, file: sleepy.rb, line: 5
Returning After: 1003988894 nanoseconds
Entering Method: class: Object, method: odd, file: sleepy.rb, line: 5
Returning After: 1003887374 nanoseconds
Entering Method: class: Object, method: even, file: sleepy.rb, line: 1
Returning After: 15839 nanoseconds

It’s a bit hard to convey in the snippet above, but our DTrace script is generating well over a thousand lines of output. These lines can be divided into two sections: a first section listing all the Ruby methods being called as part of the program getting ready to run, and a much smaller second section listing whether our program is calling the even or odd functions, along with the time spent in each of these function calls.

While the above output gives us a great amount of detail about what our Ruby program is doing, we really only want to gather information about the even and odd methods being called. DTrace uses predicates to make just this type of filtering possible. Predicates are conditions wrapped between forward slashes (/ ... /) that determine whether a particular probe clause gets executed when its probe fires. The code below uses predicates to ensure the method-entry and method-return probes only get triggered by calls to the even and odd methods.

/* predicates_sleepy.d */
ruby$target:::method-entry
/copyinstr(arg1) == "even" || copyinstr(arg1) == "odd"/
{
  self->start = timestamp;
  printf("Entering Method: class: %s, method: %s, file: %s, line: %d\n", copyinstr(arg0), copyinstr(arg1), copyinstr(arg2), arg3);
}

ruby$target:::method-return
/copyinstr(arg1) == "even" || copyinstr(arg1) == "odd"/
{
  printf("Returning After: %d nanoseconds\n", (timestamp - self->start));
}
$ sudo dtrace -q -s predicates_sleepy.d -c 'ruby sleepy.rb'

Entering Method: class: Object, method: odd, file: sleepy.rb, line: 5
Returning After: 3005086754 nanoseconds
Entering Method: class: Object, method: even, file: sleepy.rb, line: 1
Returning After: 2004313007 nanoseconds
Entering Method: class: Object, method: even, file: sleepy.rb, line: 1
Returning After: 2005076442 nanoseconds
Entering Method: class: Object, method: even, file: sleepy.rb, line: 1
Returning After: 21304 nanoseconds
...

Running our modified DTrace script, we see that this time around we are only triggering our probes when entering into and returning from the even and odd methods. Now that we have learned a fair few DTrace basics, we can move on to the more advanced topic of writing a DTrace script that will allow us to measure mutex contention in Ruby programs.

Monitoring mutex contention with DTrace

The goal of this section is to come up with a DTrace script that measures mutex contention in a multi-threaded Ruby program. This is far from a trivial undertaking and will require us to go and investigate the source code of the Ruby language itself. However, before we get to that, let’s first take a look at the Ruby program that we will analyze with the DTrace script that we are going to write in this section.

# mutex.rb
mutex = Mutex.new
threads = []

threads << Thread.new do
  loop do
    mutex.synchronize do
      sleep 2
    end
  end
end

threads << Thread.new do
  loop do
    mutex.synchronize do
      sleep 4
    end
  end
end

threads.each(&:join)

The above Ruby code starts by creating a mutex object, after which it kicks off two threads. Each thread runs an infinite loop that causes the thread to grab the mutex for a short while before releasing it again. Since the second thread is holding onto the mutex for longer than the first thread, it is intuitively obvious that the first thread will spend a fair amount of time waiting for the second thread to release the mutex.

Our goal is to write a DTrace script that tracks when a given thread has to wait for a mutex to become available, as well as which particular thread is holding the mutex at that point in time. To the best of my knowledge, it is impossible to obtain this contention information by monkey patching the Mutex object, which makes this a great showcase for DTrace. Please get in touch if you think I am wrong on this.

In order for us to write a DTrace script that does the above, we first need to figure out what happens when a thread calls synchronize on a Mutex object. However, mutexes and their methods are implemented as part of the Ruby language itself. This means we are going to have to go and take a look at the Ruby MRI source code, which is written in C. Do not worry if you’ve never used C. We’ll focus on only those parts relevant to our use case.

Let’s start at the beginning and look closely at what happens when you call synchronize on a Mutex object. We’ll take this step by step:

  1. synchronize (source) calls rb_mutex_synchronize_m
  2. rb_mutex_synchronize_m (source) checks if synchronize was called with a block and then goes on to call rb_mutex_synchronize
  3. rb_mutex_synchronize (source) calls rb_mutex_lock
  4. rb_mutex_lock (source) is where the currently active Ruby thread that executed the mutex.synchronize code will try to grab the mutex

There’s a lot going on in rb_mutex_lock. The one thing that we are especially interested in is the call to rb_mutex_trylock (source) on line 252. This method immediately returns true or false depending on whether the Ruby thread managed to grab the mutex. By following the code from line 252 onwards, we can see that rb_mutex_trylock returning true causes rb_mutex_lock to immediately return. On the other hand, rb_mutex_trylock returning false causes rb_mutex_lock to keep executing (and occasionally blocking) until the Ruby thread has managed to get a hold of the mutex.

This is actually all we needed to know in order to be able to go and write our DTrace script. Our investigation showed that when a thread starts executing rb_mutex_lock, this means it wants to acquire a mutex. And when a thread returns from rb_mutex_lock, we know that it managed to successfully obtain this lock. In a previous section, we saw how DTrace allows us to set probes that fire upon entering into or returning from a particular method. We will now use this to write our DTrace script.

Let’s go over what exactly our DTrace script should do:

  1. when our Ruby program calls mutex.synchronize, we want to make a note of which particular file and line these instructions appear on. We will see later how this allows us to pinpoint problematic code.
  2. when rb_mutex_lock starts executing, we want to write down the current timestamp, as this is when the thread starts trying to acquire the mutex
  3. when rb_mutex_lock returns, we want to compare the current timestamp with the one we wrote down earlier, as this tells us how long the thread had to wait trying to acquire the mutex. We then want to print this duration, along with some information about the location of this particular mutex.synchronize call, to the terminal.

Putting it all together, we end up with a DTrace script like shown below.

/* mutex.d */
ruby$target:::cmethod-entry
/copyinstr(arg0) == "Mutex" && copyinstr(arg1) == "synchronize"/
{
  self->file = copyinstr(arg2);
  self->line = arg3;
}

pid$target:ruby:rb_mutex_lock:entry
/self->file != NULL && self->line != NULL/
{
  self->mutex_wait_start = timestamp;
}

pid$target:ruby:rb_mutex_lock:return
/self->file != NULL && self->line != NULL/
{
  mutex_wait_us = (timestamp - self->mutex_wait_start) / 1000;
  printf("Thread %d acquires mutex %d after %d us - %s:%d\n", tid, arg1, mutex_wait_us, self->file, self->line);
  self->file = NULL;
  self->line = NULL;
}

The snippet above contains three different probes, the first of which is a Ruby probe that fires whenever a C method is entered. Since the Mutex class and its methods have been implemented in C as part of the Ruby MRI, it makes sense for us to use a cmethod-entry probe. Note how we use a predicate to ensure this probe only gets triggered when its first two arguments are “Mutex” and “synchronize”. We covered earlier how these arguments correspond to the class and method name of the Ruby code that triggered the probe. So this predicate guarantees that this particular probe will only fire when our Ruby code calls the synchronize method on a Mutex object.

The rest of this probe is rather straightforward. The only thing we are doing is storing the file and line number of the Ruby code that triggered the probe into thread-local variables. We are using thread-local variables for two reasons. Firstly, thread-local variables make it trivial to share data with other probes. Secondly, Ruby programs that make use of mutexes will generally be running multiple threads. Using thread-local variables ensures that each Ruby thread will get its own set of probe-specific variables.

Our second probe comes from the pid provider. This provider supplies us with probes for every internal method of a process. In this case we want to use it to get notified whenever rb_mutex_lock starts executing. We saw earlier that a thread will invoke this method when starting to acquire a mutex. The probe itself is pretty simple in that it just stores the current time in a thread-local variable, so as to keep track of when a thread started trying to obtain a mutex. We also use a simple predicate that ensures this probe can only be triggered after the previous probe has fired.

The final probe fires whenever rb_mutex_lock finishes executing. It has a similar predicate as the second probe so as to ensure it can only be triggered after the first probe has fired. We saw earlier how rb_mutex_lock returns whenever a thread has successfully obtained a lock. We can easily calculate the time spent waiting on this lock by comparing the current time with the previously stored self->mutex_wait_start variable. We then print the time spent waiting, along with the IDs of the current thread and mutex, as well as the location of where the call to mutex.synchronize took place. We finish this probe by assigning NULL to the self->file and self->line variables, so as to ensure that the second and third probe can only be triggered after the first one has fired again.

In case you are wondering about how exactly the thread and mutex IDs are obtained, tid is a built-in DTrace variable that identifies the current thread. A pid:::return probe stores the return value of the method that triggered it inside its arg1 variable. The rb_mutex_lock method just happens to return an identifier for the mutex that was passed to it, so the arg1 variable of this probe does in fact contain the mutex ID.

The final result looks a lot like this.

$ sudo dtrace -q -s mutex.d -c 'ruby mutex.rb'

Thread 286592 acquires mutex 4313316240 after 2 us - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4004183 us - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2004170 us - mutex.rb:14
Thread 286592 acquires mutex 4313316240 after 6 us - mutex.rb:14
Thread 286592 acquires mutex 4313316240 after 4 us - mutex.rb:14
Thread 286592 acquires mutex 4313316240 after 4 us - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 16012158 us - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2002593 us - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4001983 us - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2004418 us - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4000407 us - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2004163 us - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4003191 us - mutex.rb:6
Thread 286591 acquires mutex 4313316240 after 2 us - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 4005587 us - mutex.rb:14
...

We can get some interesting info about our program just by looking at the above output:

  1. there are two threads: 286591 and 286592
  2. both threads try to acquire mutex 4313316240
  3. the mutex acquisition code of the first thread lives at line 6 of the mutex.rb file
  4. the acquisition code of the second thread is located at line 14 of the same file
  5. there is a lot of mutex contention, with threads having to wait several seconds for the mutex to become available

Of course we already knew all of the above was going to happen, as we were familiar with the source code of our Ruby program. The real power of DTrace lies in how we can now go and run our mutex.d script against any Ruby program, no matter how complex, and obtain this level of information without having to read any source code at all. We can even go one step further and run our mutex contention script against an already running Ruby process with sudo dtrace -q -s mutex.d -p <pid>. This can even be run against active production code with minimal overhead.

Before moving on to the next section, I’d just like to point out that the above DTrace output actually tells us some cool stuff about how the Ruby MRI schedules threads. If you look at lines 3-6 of the output, you’ll notice that the second thread gets scheduled four times in a row. This tells us that when multiple threads are competing for a mutex, the Ruby MRI does not care if a particular thread recently held the mutex.

Advanced mutex contention monitoring

We can take the above DTrace script one step further by adding an additional probe that triggers whenever a thread releases a mutex. We will also be slightly altering the output of our script so as to print timestamps rather than durations. While this will make the script’s output less suitable for direct human consumption, this timestamp information will make it easier to construct a chronological sequence of the goings-on of our mutexes.

Note that the above doesn’t mean that we no longer care about producing output suitable for humans. We’ll see how we can easily write a Ruby script to aggregate this new output into something a bit more comprehensive. As an aside, DTrace actually has built-in logic for aggregating data, but I personally prefer to focus my DTrace usage on obtaining data that would otherwise be hard to get, while having my aggregation logic live somewhere else.
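
For the curious, the sketch below shows roughly what that built-in aggregation support looks like. It is my own variation on the earlier mutex.d script rather than part of the tooling we are building in this section: instead of printing a line per event, it sums the wait times (in nanoseconds) per mutex.synchronize call site in an aggregation, which DTrace prints automatically when the script exits.

/* mutex_agg.d - a sketch using a DTrace aggregation */
ruby$target:::cmethod-entry
/copyinstr(arg0) == "Mutex" && copyinstr(arg1) == "synchronize"/
{
  self->file = copyinstr(arg2);
  self->line = arg3;
}

pid$target:ruby:rb_mutex_lock:entry
/self->file != NULL && self->line != NULL/
{
  self->mutex_wait_start = timestamp;
}

pid$target:ruby:rb_mutex_lock:return
/self->file != NULL && self->line != NULL/
{
  /* sum the total wait time per call site */
  @wait_times[self->file, self->line] = sum(timestamp - self->mutex_wait_start);
  self->file = NULL;
  self->line = NULL;
}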

Let’s start by having a look at how to add a probe that can detect a mutex being released. Luckily, this turns out to be relatively straightforward. There is a C method called rb_mutex_unlock (source) that releases mutexes. Similarly to rb_mutex_lock, this method returns an identifier for the mutex it acted on. So all we need to do is add a probe that fires whenever rb_mutex_unlock returns. Our final script looks like this.

/* mutex.d */
ruby$target:::cmethod-entry
/copyinstr(arg0) == "Mutex" && copyinstr(arg1) == "synchronize"/
{
  self->file = copyinstr(arg2);
  self->line = arg3;
}

pid$target:ruby:rb_mutex_lock:entry
/self->file != NULL && self->line != NULL/
{
  printf("Thread %d wants to acquire mutex %d at %d - %s:%d\n", tid, arg1, timestamp, self->file, self->line);
}

pid$target:ruby:rb_mutex_lock:return
/self->file != NULL && self->line != NULL/
{
  printf("Thread %d has acquired mutex %d at %d - %s:%d\n", tid, arg1, timestamp, self->file, self->line);
  self->file = NULL;
  self->line = NULL;
}

pid$target:ruby:rb_mutex_unlock:return
{
  printf("Thread %d has released mutex %d at %d\n", tid, arg1, timestamp);
}
$ sudo dtrace -q -s mutex.d -c 'ruby mutex.rb'

Thread 500152 wants to acquire mutex 4330240800 at 53341356615492 - mutex.rb:6
Thread 500152 has acquired mutex 4330240800 at 53341356625449 - mutex.rb:6
Thread 500153 wants to acquire mutex 4330240800 at 53341356937292 - mutex.rb:14
Thread 500152 has released mutex 4330240800 at 53343360214311
Thread 500152 wants to acquire mutex 4330240800 at 53343360266121 - mutex.rb:6
Thread 500153 has acquired mutex 4330240800 at 53343360301928 - mutex.rb:14
Thread 500153 has released mutex 4330240800 at 53347365475537
Thread 500153 wants to acquire mutex 4330240800 at 53347365545277 - mutex.rb:14
Thread 500152 has acquired mutex 4330240800 at 53347365661847 - mutex.rb:6
Thread 500152 has released mutex 4330240800 at 53349370397555
Thread 500152 wants to acquire mutex 4330240800 at 53349370426972 - mutex.rb:6
Thread 500153 has acquired mutex 4330240800 at 53349370453489 - mutex.rb:14
Thread 500153 has released mutex 4330240800 at 53353374785751
Thread 500153 wants to acquire mutex 4330240800 at 53353374834184 - mutex.rb:14
Thread 500152 has acquired mutex 4330240800 at 53353374868435 - mutex.rb:6
...

The above output is pretty hard to parse for a human reader. The snippet below shows a Ruby program that aggregates this data into a more readable format. I’m not going to go into the details of this Ruby program as it is essentially just a fair bit of string filtering with some bookkeeping to help keep track of how each thread interacts with the mutexes and the contention this causes.

# aggregate.rb
mutex_owners     = Hash.new
mutex_queuers    = Hash.new { |h,k| h[k] = Array.new }
mutex_contention = Hash.new { |h,k| h[k] = Hash.new(0) }

time_of_last_update = Time.now
update_interval_sec = 1

ARGF.each do |line|
  # when a thread wants to acquire a mutex
  if matches = line.match(/^Thread (\d+) wants to acquire mutex (\d+) at (\d+) - (.+)$/)
    captures  = matches.captures
    thread_id = captures[0]
    mutex_id  = captures[1]
    timestamp = captures[2].to_i
    location  = captures[3]

    mutex_queuers[mutex_id] << {
      thread_id: thread_id,
      location:  location,
      timestamp: timestamp
    }
  end

  # when a thread has acquired a mutex
  if matches = line.match(/^Thread (\d+) has acquired mutex (\d+) at (\d+) - (.+)$/)
    captures  = matches.captures
    thread_id = captures[0]
    mutex_id  = captures[1]
    timestamp = captures[2].to_i
    location  = captures[3]

    # set new owner
    mutex_owners[mutex_id] = {
      thread_id: thread_id,
      location: location
    }

    # remove new owner from list of queuers
    mutex_queuers[mutex_id].delete_if do |queuer|
      queuer[:thread_id] == thread_id &&
      queuer[:location] == location
    end
  end

  # when a thread has released a mutex
  if matches = line.match(/^Thread (\d+) has released mutex (\d+) at (\d+)$/)
    captures  = matches.captures
    thread_id = captures[0]
    mutex_id  = captures[1]
    timestamp = captures[2].to_i

    owner_location = mutex_owners[mutex_id][:location]

    # calculate how long the owner caused each queuer to wait
    # and change queuer timestamp to the current timestamp in preparation
    # for the next round of queueing
    mutex_queuers[mutex_id].each do |queuer|
      mutex_contention[owner_location][queuer[:location]] += (timestamp - queuer[:timestamp])
      queuer[:timestamp] = timestamp
    end
  end

  # print mutex contention information
  if Time.now - time_of_last_update > update_interval_sec
    system('clear')
    time_of_last_update = Time.now

    puts 'Mutex Contention'
    puts "================\n\n"

    mutex_contention.each do |owner_location, contention|
      puts owner_location
      owner_location.length.times { print '-' }
      puts "\n"

      total_duration_sec = 0.0
      contention.sort.each do |queuer_location, queueing_duration|
        duration_sec = queueing_duration / 1000000000.0
        total_duration_sec += duration_sec
        puts "#{queuer_location}\t#{duration_sec}s"
      end
      puts "total\t\t#{total_duration_sec}s\n\n"
    end
  end
end
$ sudo dtrace -q -s mutex.d -c 'ruby mutex.rb' | ruby aggregate.rb


Mutex Contention
================

mutex.rb:6
----------
mutex.rb:14	  10.016301065s
total		  10.016301065s

mutex.rb:14
-----------
mutex.rb:6	  16.019252339s
total		  16.019252339s

The final result is shown above. Note that our program will clear the terminal every second before printing summarized contention information. Here we see that after running the program for a bit, mutex.rb:6 caused mutex.rb:14 to spend about 10 seconds waiting for the mutex to become available. The total field indicates the total amount of waiting across all other threads caused by mutex.rb:6. This number becomes more useful when there are more than two threads competing for a single mutex.

I want to stress that while the example shown here was kept simple on purpose, our code is in fact more than capable of handling more complex scenarios. For example, let’s have a look at some Ruby code that uses multiple mutexes, some of which are nested.

# mutex.rb
mutexes = [Mutex.new, Mutex.new]
threads = []

threads << Thread.new do
  loop do
    mutexes[0].synchronize do
      sleep 2
    end
  end
end

threads << Thread.new do
  loop do
    mutexes[1].synchronize do
      sleep 2
    end
  end
end

threads << Thread.new do
  loop do
    mutexes[1].synchronize do
      sleep 1
    end
  end
end

threads << Thread.new do
  loop do
    mutexes[0].synchronize do
      sleep 1
      mutexes[1].synchronize do
        sleep 1
      end
    end
  end
end

threads.each(&:join)
$ sudo dtrace -q -s mutex.d -c 'ruby mutex.rb' | ruby aggregate.rb


Mutex Contention
================

mutex.rb:6
----------
mutex.rb:30	  36.0513079s
total		  36.0513079s

mutex.rb:14
-----------
mutex.rb:22	  78.123187353s
mutex.rb:32	  36.062005125s
total		  114.185192478s

mutex.rb:22
-----------
mutex.rb:14	  38.127435904s
mutex.rb:32	  19.060814411s
total		  57.188250315000005s

mutex.rb:32
-----------
mutex.rb:14	  24.073966949s
mutex.rb:22	  24.073383955s
total		  48.147350904s

mutex.rb:30
-----------
mutex.rb:6	  103.274153073s
total		  103.274153073s

The above output tells us very clearly that we should concentrate our efforts on lines 14 and 30 when we want to try to make our program faster. The really cool thing about all this is that this approach will work regardless of the complexity of your program and requires absolutely no familiarity with the source code at all. You can literally run this against code you’ve never seen and walk away with a decent idea of where the mutex bottlenecks are located. And on top of that, since we are using DTrace, we don’t even have to add any instrumentation code to the program we want to investigate. Instead, we can just run this against an already active process without even having to interrupt it.

Conclusion

I hope to have convinced you that DTrace is a pretty amazing tool that can open up whole new ways of approaching a problem. There is so much I haven’t even touched on yet. The topic is just too big to cover in a single post. If you’re interested in learning DTrace, here are some resources I can recommend:

Just one more thing before I finish this. If you’re on OS X and encounter DTrace complaining about not being able to control executables signed with restricted entitlements, be aware that you can easily work around this by using the -p parameter to directly specify the pid of the process you want DTrace to run against. Please contact me if you manage to find the proper fix for this.


Ruby concurrency: in praise of the mutex

When reading about Ruby you will inevitably be introduced to the Global Interpreter Lock. This mechanism tends to come up in explanations of why Ruby threads run concurrently on a single core, rather than being scheduled across multiple cores in true parallel fashion. This single core scheduling approach also explains why adding threads to a Ruby program does not necessarily result in faster execution times.

This post will start by explaining some of the details behind the GIL. Next up, we’ll take a look at the three crucial concepts of concurrency: atomicity, visibility, and ordering. While most developers are familiar with atomicity, the concept of visibility is often not very well understood. We will be going over these concepts in quite some detail and will illustrate how to address their needs through correct usage of the mutex data structure.

Parallelism and the GIL

Ruby’s Global Interpreter Lock is a global lock around the execution of Ruby code. Before a Ruby thread can execute any code, it first needs to acquire this lock. A thread holding the GIL will be forced to release it after a certain amount of time, at which point another Ruby thread gets a chance to acquire it. As the GIL can only be held by one thread at a time, it effectively prevents two Ruby threads from executing at the same time.

Luckily Ruby comes with an optimization that forces threads to let go of the GIL when they find themselves waiting on blocking IO to complete. Such threads will use the ppoll system call to be notified when their blocking IO has finished. Only then will they attempt to reacquire the GIL. This behavior holds true for all blocking IO calls, as well as backtick and system calls. So even with the Global Interpreter Lock, Ruby is still able to have moments of true parallelism.
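
You can get a feel for this behavior with a quick experiment like the sketch below. This is my own illustration rather than a rigorous benchmark; it uses sleep as a stand-in for a blocking call, since MRI releases the GIL while a thread is sleeping just as it does while a thread waits on blocking IO.

# gil_example.rb - a rough sketch, not a rigorous benchmark
require 'benchmark'

# two blocking calls executed one after the other: roughly 2 seconds
sequential = Benchmark.realtime { 2.times { sleep 1 } }

# the same two calls in separate threads: roughly 1 second, as each
# thread releases the GIL while it is waiting
threaded = Benchmark.realtime do
  2.times.map { Thread.new { sleep 1 } }.each(&:join)
end

puts "sequential: #{sequential.round(2)}s"
puts "threaded:   #{threaded.round(2)}s"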

Note that the GIL is specific to the default Ruby interpreter (MRI) which relies on a global lock to protect its internals from race conditions. The GIL also makes it possible to safely interface the MRI interpreter with C libraries that may not be thread-safe themselves. Other interpreters have taken different approaches to the concept of a global lock; Rubinius opts for a collection of fine-grained locks instead of a single global one, whereas JRuby does not use global locking at all.

Concurrency and the Mutex

There are three crucial concepts to concurrency: atomicity, visibility, and ordering. We’ll be taking a look at how Ruby’s mutex data structure addresses these. It is worth pointing out that different languages tackle these concepts in different ways. As such, the mutex-centric approach described here is only guaranteed to work in Ruby.

Atomicity

Atomicity is probably the best-known concurrency concept. A section of code is said to atomically modify the state of an object if all other threads are unable to see any of the intermediate states of the object being modified. These other threads either see the state as it was before the operation, or they see the state as it is after the operation.

In the example below we have created a counters array that holds ten entries, each of which is set to zero. This array represents an object that we want to modify, and its entries represent its internal state. Let’s say we have five threads, each of which executes a loop for 100,000 iterations that increments every entry by one. Intuitively we’d expect the output of this to be an array with each entry set to 500,000. However, as we can see below, this is not the case.

# atomicity.rb
counters = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

threads = 5.times.map do
  Thread.new do
    100000.times do
      counters.map! { |counter| counter + 1 }
    end
  end
end
threads.each(&:join)

puts counters.to_s
# => [500000, 447205, 500000, 500000, 500000, 500000, 203656, 500000, 500000, 500000]

The reason for this unexpected output is that counters.map! { |counter| counter + 1 } is not atomic. For example, imagine that our first thread has just read the value of the first entry, incremented it by one, and is now getting ready to write this incremented value to the first entry of our array. However, before our thread can write this incremented value, it gets interrupted by the second thread. This second thread then goes on to read the current value of the first entry, increments it by one, and succeeds in writing the result back to the first entry of our array. Now we have a problem!

We have a problem because the first thread got interrupted before it had a chance to write its incremented value to the array. When the first thread resumes, it will end up overwriting the value that the second thread just placed in the array. This will cause us to essentially lose an increment operation, which explains why our program output has entries in it that are less than 500,000.

It should hopefully be clear that none of this would have happened if we had made sure that counters.map! { |counter| counter + 1 } was atomic. This would have made it impossible for the second thread to just come in and modify the intermediate state of the counters array.

# atomicity.rb
mutex = Mutex.new
counters = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

threads = 5.times.map do
  Thread.new do
    100000.times do
      mutex.synchronize do
        counters.map! { |counter| counter + 1 }
      end
    end
  end
end
threads.each(&:join)

puts counters.to_s
# => [500000, 500000, 500000, 500000, 500000, 500000, 500000, 500000, 500000, 500000]

Atomicity can be accomplished by using a mutex as a locking mechanism that ensures no two threads can simultaneously execute the same section of code. The code above shows how we can prevent a thread executing counters.map! { |counter| counter + 1 } from being interrupted by other threads wanting to execute the same code. Also, be sure to note that mutex.synchronize only prevents a thread from being interrupted by others wanting to execute code wrapped inside the same mutex variable!
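
To make that last point a bit more concrete, the sketch below (my own example, in the spirit of the code above) wraps the same map! call in two different mutexes. Since the two threads never synchronize on the same mutex, they can still interrupt one another, and the entries may once again fall short of the expected total.

# two_mutexes.rb - a sketch of the pitfall described above
mutex_a = Mutex.new
mutex_b = Mutex.new
counters = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

threads = [mutex_a, mutex_b].map do |mutex|
  Thread.new do
    100000.times do
      mutex.synchronize do
        counters.map! { |counter| counter + 1 }
      end
    end
  end
end
threads.each(&:join)

puts counters.to_s
# => entries may be less than 200000, as the two mutexes do not exclude each other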

Visibility

Visibility determines when the results of the actions performed by a thread become visible to other threads. For example, when a thread wants to write an updated value to memory, that updated value may end up sitting in a CPU cache for a while before it gets flushed to main memory. Other threads reading from that memory location in the meantime will therefore end up with a stale value!

The code below shows an example of the visibility problem. Here we have several threads flipping the boolean values in the flags array over and over again. The code responsible for changing these values is wrapped inside a mutex, so we know the intermediate states of the flags array won’t be visible to other threads. We would thus expect the output of this program to contain the same boolean value for every entry of this array. However, we shall soon see that this does not always hold true.

# visibility.rb
mutex = Mutex.new
flags = [false, false, false, false, false, false, false, false, false, false]

threads = 50.times.map do
  Thread.new do
    100000.times do
      puts flags.to_s
      mutex.synchronize do
        flags.map! { |f| !f }
      end
    end
  end
end
threads.each(&:join)
$ ruby visibility.rb > visibility.log
$ grep -Hnri 'true, false' visibility.log | wc -l
    30

This code will produce five million lines of output. We’ll use the > operator to write all these lines to a file. Having done this, we can then grep for inconsistencies in the output. We would expect every line of the output to contain an array with all its entries set to the same boolean value. However, it turns out that this only holds true for 99.9994% of all lines. Sometimes the flipped boolean values don’t get written to memory fast enough, causing other threads to read stale data. This is a great illustration of the visibility problem.

Luckily we can solve this problem by using a memory barrier. A memory barrier enforces an ordering constraint on memory operations thereby preventing the possibility of reading stale data. In Ruby, a mutex not only acts as an atomic lock, but also functions as a memory barrier. When wanting to read the value of a variable being modified by multiple threads, a memory barrier will effectively tell your program to wait until all in-flight memory writes are complete. In practice this means that if we use a mutex when writing to a variable, we need to use this same mutex when reading from that variable as well.

# visibility.rb
mutex = Mutex.new
flags = [false, false, false, false, false, false, false, false, false, false]

threads = 50.times.map do
  Thread.new do
    100000.times do
      mutex.synchronize do
        puts flags.to_s
      end
      mutex.synchronize do
        flags.map! { |f| !f }
      end
    end
  end
end
threads.each(&:join)
$ ruby visibility.rb > visibility.log
$ grep -Hnri 'true, false' visibility.log | wc -l
    0

As expected, this time we found zero inconsistencies in the output data due to us using the same mutex for both reading and writing the boolean values of the flags array. Do keep in mind that not all languages allow for using a mutex as a memory barrier, so be sure to check the specifics of your favorite language before going off to write concurrent code.

Ordering

As if dealing with visibility isn’t hard enough, the Ruby interpreter is also allowed to change the order of the instructions in your code in an attempt at optimization. Before I continue I should point out that there is no official specification for the Ruby language. This can make it hard to find information about topics such as this. So I’m just going to describe how I think instruction reordering currently works in Ruby.

Your Ruby code gets compiled to bytecode by the Ruby interpreter. The interpreter is free to reorder your code in an attempt to optimize it. This bytecode will then generate a set of CPU instructions, which the CPU is free to reorder as well. I wasn’t able to come up with example code that actually showcases this reordering behavior, so this next bit is going to be somewhat hand-wavy. Let’s say we were given the code shown below (original source).

# ordering.rb
a = false
b = false
threads = []

thr1 = Thread.new do
  a = true
  b = true
end

thr2 = Thread.new do
  r1 = b # could see true
  r2 = a # could see false
  r3 = a # could see true
  puts (r1 && !r2) && r3 # could print true
end

thr1.join
thr2.join

Since there are a lot of ways for instruction reordering to take place, it is not impossible for b = true to be executed before a = true. In theory, this could therefore allow for thr2 to end up outputting true. This is rather counterintuitive, as this would only be possible if the variable b had changed value before the variable a.

Luckily there is no need to worry too much about this. When looking at the code above, it should be obvious that code reordering is going to be the least of its problems. The lack of any kind of synchronization to help deal with atomicity and visibility issues in this threaded program is going to cause way bigger headaches than code reordering ever could.

Those synchronization issues can be fixed by using a mutex. By introducing a mutex we are explicitly telling the interpreter and CPU how our code should behave, thus preventing any problematic code reordering from occurring. Dealing with atomicity and visibility issues will therefore implicitly prevent any dangerous code reordering.
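
To sketch what that could look like for the ordering example, the code below (my own variation) wraps both the writes and the reads inside the same mutex. This way thr2 either sees both assignments or neither of them, and the counterintuitive true output is no longer possible.

# ordering.rb - the same example, now guarded by a mutex
mutex = Mutex.new
a = false
b = false

thr1 = Thread.new do
  mutex.synchronize do
    a = true
    b = true
  end
end

thr2 = Thread.new do
  r1, r2, r3 = mutex.synchronize { [b, a, a] }
  puts((r1 && !r2) && r3) # can no longer print true
end

thr1.join
thr2.join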

Conclusion

I hope this post has helped show just how easy it can be to introduce bugs in concurrent code. In my experience, the concept of memory barriers is often poorly understood, which can result in introducing some incredibly hard to find bugs. Luckily, as we saw in this post, the mutex data structure can be a veritable panacea for addressing these issues in Ruby.

Please feel free to contact me if you think I got anything wrong. While all of the above is correct to the best of my knowledge, the lack of an official Ruby specification can make it hard to locate information that is definitively without error.


How to write your own rspec retry mechanism

Imagine you have an rspec test suite filled with system tests. Each system test simulates how a real user would interact with your app by opening a browser session through which it fills out text fields, clicks on buttons, and sends data to public endpoints. Unfortunately, browser drivers are not without bugs and sometimes your tests will fail because of these. Wouldn’t it be nice if we could automatically retry these failed tests?

This article starts by investigating how rspec formatters can be used to help us keep track of failed tests. Next, we’ll use this information to take a first stab at creating a rake task that can automatically retry failed tests. Lastly, we’ll explore how to further improve our simple rake task so as to make it ready for use in production.

Note that any code shown in this post is only guaranteed to work with rspec 3.3. In the past I’ve written similar code for other rspec versions as well though. So don’t worry, it shouldn’t be too hard to get all of this to work on whatever rspec version you find yourself using.

Rspec formatters

Rspec generates its command line output by relying on formatters that receive messages on events like example_passed and example_failed. We can use these hooks to help us keep track of failed tests by having them write the descriptions of failed tests to a text file named tests_failed. Our FailureFormatter class does just that.

# failure_formatter.rb
require 'rspec/core/formatters/progress_formatter'

class FailureFormatter < RSpec::Core::Formatters::ProgressFormatter
  RSpec::Core::Formatters.register self, :example_failed

  def example_failed(notification)
    super
    File.open('tests_failed', 'a') do |file|
      file.puts(notification.example.full_description)
    end
  end
end

We’ll soon have a look at how tests behave when we try to run them with the formatter shown above. But first, let’s prepare some example tests. We’ll create two tests. One of which will always pass, and another one which will always fail.

# my_fake_tests_spec.rb
describe 'my fake tests', :type => :feature do

  it 'this scenario should pass' do
    expect(true).to eq true
  end

  it 'this scenario should fail' do
    expect(false).to eq true
  end
end

Having done that, we can now run our tests with the FailureFormatter we wrote earlier. As you can see below, we’ll have to pass both --require and --format params in order to get our formatter to work. I’m also using the --no-fail-fast flag so as to prevent our test suite from exiting upon encountering its first failure.

$ bundle exec rspec --require ./spec/formatters/failure_formatter.rb --format FailureFormatter --no-fail-fast
.F

Failures:

  1) my fake tests this scenario should fail
     Failure/Error: expect(false).to eq true

       expected: true
            got: false

       (compared using ==)
     # ./spec/my_fake_tests_spec.rb:8:in `block (2 levels) in <top (required)>'

Finished in 0.02272 seconds (files took 0.0965 seconds to load)
2 examples, 1 failure

Failed examples:

rspec ./spec/my_fake_tests_spec.rb:7 # my fake tests this scenario should fail

After running this, we should now have a tests_failed file that contains a single line describing our failed test. As we can see in the snippet below, this is indeed the case.

$ cat tests_failed

my fake tests this scenario should fail

Take a moment to reflect on what we have just done. By writing just a few lines of code we have effectively created a logging mechanism that will help us keep track of failed tests. In the next section we will look at how we can make use of this mechanism to automatically rerun failed tests.

First pass at creating the retry task

In this section we will create a rake task that runs our rspec test suite and automatically retries any failed tests. The finished rake task is shown below. For now, have a look at this code and then we’ll go over its details in the next few paragraphs.

require 'fileutils'

task :rspec_with_retries, [:max_tries] do |_, args|
  max_tries = args[:max_tries].to_i

  # construct initial rspec command
  command = 'bundle exec rspec --require ./spec/formatters/failure_formatter.rb --format FailureFormatter --no-fail-fast'

  max_tries.times do |t|
    puts "\n"
    puts '##########'
    puts "### STARTING TEST RUN #{t + 1} OUT OF A MAXIMUM OF #{max_tries}"
    puts "### executing command: #{command}"
    puts '##########'

    # delete tests_failed file left over by previous run
    FileUtils.rm('tests_failed', :force => true)

    # run tests
    puts `#{command}`

    # early out
    exit 0 if $?.exitstatus.zero?
    exit 1 if (t == max_tries - 1)

    # determine which tests need to be run again
    failed_tests = []
    File.open('tests_failed', 'r') do |file|
      failed_tests = file.readlines.map { |line| "\"#{line.strip}\"" }
    end

    # construct command to rerun just the failed tests
    command  = ['bundle exec rspec']
    command += Array.new(failed_tests.length, '-e').zip(failed_tests).flatten
    command += ['--require ./spec/formatters/failure_formatter.rb --format FailureFormatter --no-fail-fast']
    command = command.join(' ')
  end
end

The task executes the bundle exec rspec command a max_tries number of times. The first iteration runs the full rspec test suite with the FailureFormatter class and writes the descriptions of failed tests to a tests_failed file. Subsequent iterations read from this file and use the -e option to rerun the tests listed there.

Note that these subsequent iterations use the FailureFormatter as well. This means that any tests that failed during the second iteration will get written to the tests_failed file to be retried by the third iteration. This continues until we reach the max number of iterations or until one of our iterations has all its tests pass.

Every iteration deletes the tests_failed file from the previous iteration. For this we use the FileUtils.rm method with the :force flag set to true. This flag ensures that the program doesn’t crash in case the tests_failed file doesn’t exist. The above code relies on backticks to execute the bundle exec rspec subprocess. Because of this we need to use the global variable $? to access the exit status of this subprocess.
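
As a quick refresher on how backticks and $? interact, the standalone snippet below (unrelated to the rake task itself) captures the output of a small subprocess and then inspects its exit status.

# backticks_example.rb
output = `ruby -e "puts 'hello'; exit 2"`

puts output              # => hello
puts $?.exitstatus       # => 2
puts $?.exitstatus.zero? # => false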

Below you can see the output of a run of our rake task. Notice how the first iteration runs both of our tests, whereas the second and third iterations rerun just the failed test. This shows our retry mechanism is indeed working as expected.

$ rake rspec_with_retries[3]

##########
### STARTING TEST RUN 1 OUT OF A MAXIMUM OF 3
### executing command: bundle exec rspec --require ./spec/formatters/failure_formatter.rb --format FailureFormatter --no-fail-fast
##########
.F

Failures:

  1) my fake tests this scenario should fail
     Failure/Error: expect(false).to eq true

       expected: true
            got: false

       (compared using ==)
     # ./spec/my_fake_tests_spec.rb:8:in `block (2 levels) in <top (required)>'

Finished in 0.02272 seconds (files took 0.0965 seconds to load)
2 examples, 1 failure

Failed examples:

rspec ./spec/my_fake_tests_spec.rb:7 # my fake tests this scenario should fail


##########
### STARTING TEST RUN 2 OUT OF A MAXIMUM OF 3
### executing command: bundle exec rspec -e "my fake tests this scenario should fail" --require ./spec/formatters/failure_formatter.rb --format FailureFormatter --no-fail-fast
##########
Run options: include {:full_description=>/my\ fake\ tests\ this\ scenario\ should\ fail/}
F

Failures:

  1) my fake tests this scenario should fail
     Failure/Error: expect(false).to eq true

       expected: true
            got: false

       (compared using ==)
     # ./spec/my_fake_tests_spec.rb:8:in `block (2 levels) in <top (required)>'

Finished in 0.02286 seconds (files took 0.09094 seconds to load)
1 example, 1 failure

Failed examples:

rspec ./spec/my_fake_tests_spec.rb:7 # my fake tests this scenario should fail


##########
### STARTING TEST RUN 3 OUT OF A MAXIMUM OF 3
### executing command: bundle exec rspec -e "my fake tests this scenario should fail" --require ./spec/formatters/failure_formatter.rb --format FailureFormatter --no-fail-fast
##########
Run options: include {:full_description=>/my\ fake\ tests\ this\ scenario\ should\ fail/}
F

Failures:

  1) my fake tests this scenario should fail
     Failure/Error: expect(false).to eq true

       expected: true
            got: false

       (compared using ==)
     # ./spec/my_fake_tests_spec.rb:8:in `block (2 levels) in <top (required)>'

Finished in 0.02378 seconds (files took 0.09512 seconds to load)
1 example, 1 failure

Failed examples:

rspec ./spec/my_fake_tests_spec.rb:7 # my fake tests this scenario should fail

The goal of this section was to introduce the general idea behind our retry mechanism. There are however several shortcomings in the code that we’ve shown here. The next section will focus on identifying and fixing these.

Perfecting the retry task

The code in the previous section isn’t all that bad, but there are a few things related to the bundle exec rspec subprocess that we can improve upon. In particular, using backticks to initiate subprocesses has several downsides:

  • the standard output stream of the subprocess gets written into a buffer which we cannot print until the subprocess finishes
  • the standard error stream does not even get written to this buffer
  • the backticks approach does not return the id of the subprocess to us

This last downside is especially bad as not having the subprocess id makes it hard for us to cancel the subprocess in case the rake task gets terminated. This is why I prefer to use the childprocess gem for handling subprocesses instead.

require 'fileutils'
require 'childprocess'

task :rspec_with_retries, [:max_tries] do |_, args|
  max_tries = args[:max_tries].to_i

  # exit hook to ensure the rspec process gets stopped when CTRL+C is pressed (SIGINT)
  # needs to be set outside the times loop as otherwise each iteration would add its
  # own at_exit hook
  process = nil
  at_exit do
    process.stop unless process.nil?
  end

  # construct initial rspec command
  command = ['bundle', 'exec', 'rspec', '--require', './spec/formatters/failure_formatter.rb', '--format', 'FailureFormatter', '--no-fail-fast']

  max_tries.times do |t|
    puts "\n"
    puts '##########'
    puts "### STARTING TEST RUN #{t + 1} OUT OF A MAXIMUM OF #{max_tries}"
    puts "### executing command: #{command}"
    puts '##########'

    # delete tests_failed file left over by previous run
    FileUtils.rm('tests_failed', :force => true)

    # run tests in separate process
    process = ChildProcess.build(*command)
    process.io.inherit!
    process.start
    process.wait

    # early out
    exit 0 if process.exit_code.zero?
    exit 1 if (t == max_tries - 1)

    # determine which tests need to be run again
    failed_tests = []
    File.open('tests_failed', 'r') do |file|
      failed_tests = file.readlines.map { |line| line.strip }
    end

    # construct command to rerun just the failed tests
    command  = ['bundle', 'exec', 'rspec']
    command += Array.new(failed_tests.length, '-e').zip(failed_tests).flatten
    command += ['--require', './spec/formatters/failure_formatter.rb', '--format', 'FailureFormatter', '--no-fail-fast']
  end
end

As we can see from the line process = ChildProcess.build(*command), this gem makes it trivial to obtain a handle on the subprocess. This in turn allows us to write an at_exit hook that shuts the subprocess down upon termination of our rake task. For example, using ctrl+c to terminate the rake task will now cause the rspec subprocess to stop as well.

This gem also makes it super easy to inherit the stdout and stderr streams from the parent process (our rake task). This means that anything that gets written to the stdout and stderr streams of the subprocess will now be written directly to the stdout and stderr streams of our rake task. Or in other words, our rspec subprocess is now able to output directly to the rake task’s terminal session. Having made these improvements, our rspec_with_retries task is now ready for use in production.

Conclusion

I hope this post helped some people out there who find themselves struggling to deal with flaky tests. Please note that a retry mechanism such as this is really only possible because of rspec’s powerful formatters. Get in touch if you have any examples of other cool things built on top of this somewhat underappreciated feature!


The disaster that is Ruby's timeout method

On paper, Ruby’s timeout method looks like an incredibly useful piece of code. Ever had a network request occasionally slow down your entire program because it just wouldn’t finish? That’s where timeout comes in. It provides a hard guarantee that a block of code will be finished within a specified amount of time.

require 'timeout'

timeout(5) do
  # block of code that should be interrupted if it takes more than 5 seconds
end

There’s one thing the documentation doesn’t tell you though. If any line in that block of code introduces side effects that rely on later lines of code to leave things in a stable state, then using the timeout method is a great way to introduce instability into your program. That describes pretty much any program that isn’t completely stateless. Let’s have a closer look at this method and try to figure out what exactly is going on here.

Exceptions absolutely anywhere

The problem with timeout is that it relies upon Ruby’s questionable ability to have one thread raise an exception absolutely anywhere in an entirely different thread. The idea is that when you place code inside a timeout block, this code gets wrapped inside a new thread that executes in the background while the main thread goes to sleep for 5 seconds. Upon waking, the main thread grabs the background thread and forcefully stops it by raising a Timeout::Error exception on it (actual implementation).
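
To make this mechanism a bit more concrete, below is a heavily simplified sketch of the approach described above. This is not the actual standard library code; naive_timeout and its details are just an illustration of the run-the-block-in-a-background-thread-and-raise-into-it idea.

# naive_timeout.rb
# heavily simplified sketch of the timeout mechanism; NOT the real implementation
require 'timeout'

def naive_timeout(seconds)
  worker = Thread.new { yield }   # run the block in a background thread
  sleep(seconds)                  # main thread sleeps for the timeout duration
                                  # (a real implementation would wake early if the block finishes first)
  if worker.alive?
    worker.raise(Timeout::Error, 'execution expired')  # force-stop the background thread
  end
  worker.join                     # re-raises the Timeout::Error in the main thread if one was raised
end

begin
  naive_timeout(1) do
    sleep 10
    puts 'this line never gets printed'
  end
rescue Timeout::Error => e
  puts "caught: #{e.message}"
end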

# raising_exceptions.rb
# threads can raise exceptions in other threads
thr = Thread.new do
  puts '...initializing resource'
  sleep 1

  puts '...using resource'
  sleep 1

  puts '...cleaning resource'
  sleep 1
end

sleep 1.5
thr.raise('raising an exception in the thread')
$ ruby raising_exceptions.rb

...initializing resource
...using resource

The problem with this approach is that the main thread does not care what code the background thread is executing when it raises the exception. This means that the engineer responsible for the code that gets executed by the background thread needs to assume an exception can get thrown from absolutely anywhere within her code. This is madness! No one can be expected to place exception catchers around every single block of code!

The following code further illustrates the problem of being able to raise an exception absolutely anywhere. Turns out that absolutely anywhere includes locations like the inside of ensure blocks. These locations are generally not designed for handling any exceptions at all. I hope you weren’t using an ensure block to terminate your database connection!

# ensure_block.rb
# raising exceptions inside an ensure block of another thread
# note how we never finish cleaning the resource here
thr = Thread.new do
  begin
    puts '...initializing resource'
    sleep 1

    raise 'something went wrong'

    puts '...using resource'
    sleep 1
  ensure
    puts '...started cleaning resource'
    sleep 1
    puts '...finished cleaning resource'
  end
end

sleep 1.5
thr.raise('raising an exception in the thread')

# prevent program from immediately terminating after raising exception
sleep 5
$ ruby ensure_block.rb

...initializing resource
...started cleaning resource

Real world example

Recently, I spent a lot of time working with the curb http client. I ended up wrapping quite a few of my curb calls within timeout blocks because of tight time constraints. However, this caused great instability within the system I was working on. Sometimes a call would work, whereas other times that very same call would throw an exception about an invalid handle. It was this that caused me to start investigating the timeout method.

After having a bit of a think, I came up with a proof of concept that showed beyond a doubt that the timeout method was introducing instability in the very internals of my http client. The finished proof of concept code can look a bit complex, so rather than showing it straightaway, I’ll walk you through my thought process instead.

Let’s start with the basics and write some code that uses the http client to fetch a random google page. A randomized parameter is added to the google url in order to circumvent any client-side caching. The page fetch itself is wrapped inside a timeout block as we are interested in testing whether the timeout method is corrupting the http client.

# basics.rb
# timeout doesn't get triggered
require 'curb'
require 'timeout'

timeout(1) do
  Curl.get("http://www.google.com?foo=#{rand}")
end

This code will rarely time out, as a page fetch generally takes way less than one second to complete. This is why we’re going to wrap our page fetch inside an infinite while loop.

# infinite_loop.rb
# timeout gets triggered and Timeout::Error exception gets thrown
require 'curb'
require 'timeout'

timeout(1) do
  while true
    Curl.get("http://www.google.com?foo=#{rand}")
  end
end
$ ruby infinite_loop.rb

/Users/vaneyckt/.rvm/gems/ruby-2.0.0-p594/gems/curb-0.8.8/lib/curl/easy.rb:68:
  in 'perform': execution expired (Timeout::Error)

The above code is now timing out and throwing a Timeout::Error exception. Next we want to determine whether the timing out of a page fetch could corrupt the internal state of the http client, thereby causing problems for a subsequent page fetch. We’ll need to make lots of page fetches to test this, so we’re going to wrap all of our current code inside another infinite while loop. Furthermore, we don’t want any Timeout::Error exceptions to break us out of this while loop, so we’re going to catch and ignore these exceptions inside the while loop we just created. This gives us our finished proof of concept code.

# proof_of_concept.rb
# timeout corrupts the very internals of the curb http client
require 'curb'
require 'timeout'

while true
  begin
    timeout(1) do
      while true
        Curl.get("http://www.google.com?foo=#{rand}")
      end
    end
  rescue Timeout::Error => e
  end
end
$ ruby proof_of_concept.rb

/Users/vaneyckt/.rvm/gems/ruby-2.0.0-p594/gems/curb-0.8.8/lib/curl/easy.rb:67:
  in 'add': CURLError: The easy handle is already added to a multi handle
  (Curl::Err::MultiAddedAlready)

Running the above program will result in an exception being thrown after a few seconds. At some point, the timeout method is causing a Timeout::Error exception to be raised inside a critical code path of the http client. This badly timed Timeout::Error exception leaves the client in an invalid state, which in turn causes the next page fetch to fail with the exception shown above. Hopefully this illustrates why you should avoid creating programs that can have Timeout::Error exceptions pop up absolutely anywhere.

Conclusion

I hope this has convinced you there is nothing you can do to prevent timeout from doing whatever it wants to your program’s internal state. There is just no way a program can deal with Timeout::Error exceptions being able to potentially pop up absolutely anywhere. The only time you can really get away with using timeouts is when writing functional code that does not rely on any state. In all other cases, it is best to just avoid timeouts entirely.


A javascript closures recap

Javascript closures have always been one of those things that I used to navigate by intuition. Recently however, upon stumbling across some code that I did not quite grok, it became clear I should try and obtain a more formal understanding. This post is mainly intended as a quick recap for my future self. It won’t go into all the details about closures; instead it will focus on the bits that I found most helpful.

There seem to be very few step-by-step overviews of javascript closures. As a matter of fact, I only found two. Luckily they are both absolute gems. You can find them here and here. I heartily recommend both these articles to anyone wanting to gain a more complete understanding of closures.

Closure basics

I’m going to shamelessly borrow a few lines from the first of the two articles linked above to illustrate the basic concept of a closure.

function doSome() {
  var x = 10;

  function f(y) {
    return x + y;
  }
  return f;
}

var foo = doSome();
foo(20); // returns 30
foo(30); // returns 40

In the above example, the function f creates a closure. If you just look at f, it seems the variable x is not defined. In reality, x is captured from the enclosing function. A closure is a function that closes over (and thereby keeps alive) variables of the enclosing function. In the above example, the function f creates a closure because it closes over the variable x. As long as the closure object, a Function instance, is still alive, the closed-over variable x is kept alive too. It is as though the scope of the variable x has been extended.

This is really all you need to know about closures: they refer to variables declared outside the scope of the function and by doing so keep these variables alive. Closure behavior can be entirely explained just by keeping these two things in mind.

Closures and primitive data types

The rest of this post will go over some code examples to illustrate the behavior of closures for both primitive and object params. In this section, we’ll have a look at the behavior of a closure with a primitive data type param.

Example 1

The code below will be our starting point for studying closures. Be sure to take a good look at it, as all our examples will be a variation of this code. Throughout this post, we are going to try and understand closures by examining the values returned by the foo() function.

var prim = 1;

var foo = function(p) {
  var f = function() {
    return p;
  }
  return f;
}(prim);

foo();    // returns 1
prim = 3;
foo();    // returns 1

When the javascript runtime wants to resolve the value returned by return p;, it finds that this p variable is the same as the p variable from var foo = function(p) {. In other words, there is no direct link between the p from return p; and the variable prim from var prim = 1;. We see this is true because assigning a new value to prim does not cause the value returned by foo() to change.

Example 2

Now let’s have a look at what happens when we make a small change to the previous code sample by adding the line p = 2; to it.

var prim = 1;

var foo = function(p) {
  var f = function() {
    return p;
  }
  p = 2;
  return f;
}(prim);

foo();    // returns 2
prim = 3;
foo();    // returns 2

The code above is interesting in that it shows that the p variable from return p; is indeed the same as the p variable from var foo = function(p) {. Even though the variable f gets created at a time when p is set to 1, the act of setting p to 2 does indeed cause the value returned by foo() to change. This is a great example of a closure keeping a closed variable alive.

Example 3

This sample shows code similar to the first, but this time we made the closure close over the prim variable.

var prim = 1;

var foo = function() {
  return prim;
}

foo();    // returns 1
prim = 3;
foo();    // returns 3

Here too we can make a similar deduction as we did for the previous samples. When the javascript runtime wants to resolve the value returned by return prim;, it finds that this prim variable is the same as the prim variable from var prim = 1;. This explains why setting prim to 3 causes the value returned by foo() to change.

Closures and objects

In this section we’ll see what happens when we take our code samples and change the param from a primitive data type to an object.

Example 1.a

The code below is interesting because in the previous section we saw that a similar example using a primitive param had both calls to foo() return the same value. So what’s different here? Let’s inspect how the runtime resolves the variables involved.

var obj = ["a"];

var foo = function(o) {
  var f = function() {
    return o.length;
  }
  return f;
}(obj);

foo();        // returns 1
obj[1] = "b"; // modifies the object pointed to by the obj var
obj[2] = "c"; // modifies the object pointed to by the obj var
foo();        // returns 3

When the runtime tries to resolve the variable o from return o.length;, it finds that this variable o is the same as the variable o from var foo = function(o) {. We saw this exact same thing in the previous section. Unlike in the previous section, however, the variable o now contains a reference to an array object. This causes our closure to have a direct link to this array object, and thus any changes to it will get reflected in the output of foo(). This explains why the second call to foo() gives a different output than the first.

A good rule of thumb goes like this:

  • if a closed variable contains a value, then the closure links to that variable
  • if a closed variable contains a reference to an object, then the closure links to that object, and will pick up on any changes made to it

Example 1.b

Note that the closure will only pick up on changes made to the particular object that was present when the closure was created. Assigning a new object to the obj variable after the closure was created will have no effect. The code below illustrates this.

var obj = ["a"];

var foo = function(o) {
  var f = function() {
    return o.length;
  }
  return f;
}(obj);

foo();                 // returns 1
obj = ["a", "b", "c"]; // assign a new array object to the obj variable
foo();                 // returns 1

In fact, this code is practically identical to the code from Example 1 of the previous section.

Example 2

We’ll now modify the previous code sample a bit. This time we’ll take a look at what happens when we add the line o[1] = "b";.

var obj = ["a"];

var foo = function(o) {
  var f = function() {
    return o.length;
  }
  o[1] = "b";
  return f;
}(obj);

foo();        // returns 2
obj[1] = "b";
obj[2] = "c";
foo();        // returns 3

Once again, we can start by reasoning about how the runtime resolves the variable o from return o.length;. As you probably know by now, this variable o is the same as the variable o from var foo = function(o) {. And since it contains a reference to an object, any changes to this object will get reflected in the output of foo(). This explains why the first call to foo() now returns 2, whereas previously it was returning 1.

Example 3

If you managed to make it this far, this last bit of code should hold no surprises for you.

var obj = ["a"];

var foo = function() {
  return obj.length;
}

foo();        // returns 1
obj[1] = "b";
obj[2] = "c";
foo();        // returns 3

The runtime will resolve the variable obj from return obj.length; to be the same as the variable obj from var obj = ["a"];. As a result, any changes to the obj variable will have an effect on the output of foo().

Conclusion

Hopefully this post has demystified closures a bit. Time and time again, we’ve shown how following a few simple steps will lead you to understand their behavior. Just keep in mind these rules of thumb and you should be good to go:

  • if a closed variable contains a value, then the closure links to that variable
  • if a closed variable contains a reference to an object, then the closure links to that object, and will pick up on any changes made to it

Ideally, this is going to become my go-to post for providing an introduction to closures. So please let me know any suggestions you might have to improve this post.


Understanding iostat

I’ve been spending a lot of time lately looking at I/O performance and reading up about the iostat command. While this command provides a wealth of I/O performance data, the sheer amount of it all can make it hard to see the forest for the trees. In this post, we’ll talk about interpreting this data. Before we continue, I would first like to thank the authors of the blog posts mentioned below, as each of these has helped me understand iostat and its many complexities just a little bit better.

The iostat command can display both basic and extended metrics. We’ll take a look at the basic metrics first before moving on to extended metrics in the remainder of this post. Note that this post will not go into detail about every last metric. Instead, I have decided to focus on just those metrics that I found to be especially useful, as well as those that seem to be often misunderstood.

Basic iostat metrics

The iostat command lists basic metrics by default. The -m parameter causes metrics to be displayed in megabytes per second instead of blocks or kilobytes per second. Using the 5 parameter causes iostat to recalculate metrics every 5 seconds, thereby making the numbers an average over this interval.

$ iostat -m 5

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.84    0.16    3.91    7.73    0.04   79.33

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvdap1           46.34         0.33         1.03    2697023    8471177
xvdb              0.39         0.00         0.01       9496      71349
xvdg             65.98         1.34         0.97   11088426    8010609
xvdf            205.17         1.62         2.68   13341297   22076001
xvdh             51.16         0.64         1.43    5301463   11806257

The tps number here is the number of I/O Operations Per Second (IOPS). Wikipedia has a nice list of average IOPS for different storage devices. This should give you a pretty good idea of the I/O load on your machine.

Some people put a lot of faith in the %iowait metric as an indicator of I/O performance. However, %iowait is first and foremost a CPU metric that measures the percentage of time the CPU is idle while waiting for an I/O operation to complete. This metric is heavily influenced by both your CPU speed and CPU load and is therefore easily misinterpreted.

For example, consider a system with just two processes: the first one heavily I/O intensive, the second one heavily CPU intensive. As the second process will prevent the CPU from going idle, the %iowait metric will stay low despite the first process’s high I/O utilization. Other examples illustrating the deceptive nature of %iowait can be found here (mirror). The only thing %iowait really tells us is that the CPU occasionally idles while there is an outstanding I/O request, and could thus be made to handle more computational work.
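
You can see this deceptiveness for yourself with an experiment along the following lines. The exact commands are just an illustration: they assume a Linux machine with GNU coreutils, and that /tmp lives on the device you are watching.

# terminal 1: generate heavy I/O by writing a couple of GB directly to disk
$ dd if=/dev/zero of=/tmp/iowait_test bs=1M count=2048 oflag=direct

# terminal 2: keep the CPU busy (run one of these per core on a multi-core machine)
$ yes > /dev/null

# terminal 3: observe how %iowait stays low despite the ongoing I/O
$ iostat -m 5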

Extended iostat metrics

Let’s now take a look at the extended metrics by calling the iostat -x command.

$ iostat -mx 5

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.84    0.16    3.91    7.73    0.04   79.33

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.57     6.38   20.85   25.49     0.33     1.03    59.86     0.27   17.06   13.15   20.25   1.15   5.33
xvdb              0.00     1.93    0.10    0.29     0.00     0.01    51.06     0.00    7.17    0.33    9.66   0.09   0.00
xvdg              0.55     4.69   42.04   23.94     1.34     0.97    71.89     0.44    6.63    6.82    6.28   1.16   7.67
xvdf              7.33    41.35  132.66   72.52     1.62     2.68    42.87     0.49    2.37    2.79    1.59   0.36   7.42
xvdh              0.00     4.54   15.54   35.63     0.64     1.43    83.04     0.00   10.22    8.39   11.02   1.30   6.68

The r/s and w/s numbers show the amount of read and write requests issued to the I/O device per second. These numbers provide a more detailed breakdown of the tps metric we saw earlier, as tps = r/s + w/s.

The avgqu-sz metric is an important value. Its name is rather poorly chosen as it doesn’t actually show the number of operations that are queued but not yet serviced. Instead, it shows the number of operations that were either queued, or being serviced. Ideally, you’d want to have an idea of this value during normal operations for use as a baseline number for when trouble occurs. Single digit numbers with the occasional double digit spike are safe(ish) values. Triple digit numbers generally are not.

The await metric is the average time from when a request was put in the queue to when the request was completed. This is the sum of the time a request was waiting in the queue and the time our storage device was working on servicing the request. This metric is highly dependent on the number of items in the queue. Much like avgqu-sz, you’ll want to have an idea of the value of this metric during normal operations for use as a baseline.

Our next metric is svctm. You’ll find a lot of older blog posts that go into quite some detail about this one. However, man iostat makes it quite clear that this metric has since been deprecated and should no longer be trusted.

Our last metric is %util. Just like svctm, this metric has been touched by the progress of technology as well. The man iostat pages contain the information shown below.

%util

Percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100% for devices serving requests serially. But for devices serving requests in parallel, such as RAID arrays and modern SSDs, this number does not reflect their performance limits.

It’s common to assume that the closer a device gets to 100% utilization, the more saturated it becomes. This is true when the storage device corresponds to a single magnetic disk as such a device can only serve one request at a time. However, a single SSD or a RAID array consisting of multiple disks can serve multiple requests simultaneously. For such devices, %util essentially indicates the percentage of time that the device was busy serving one or more requests. Unfortunately, this value tells us absolutely nothing about the maximum number of simultaneous requests such a device can handle. This metric should therefore not be treated as a saturation indicator for either SSDs or RAID arrays.

Conclusion

By now it should be clear that iostat is an incredibly powerful tool, the metrics of which can take some experience to interpret correctly. In a perfect world your machines should regularly be writing these metrics to a monitoring service, so you’ll always have access to good baseline numbers. In an imperfect world, just knowing your baseline IOPS values will already go a long way when trying to diagnose whether a slowdown is I/O related.


Safer bash scripts with 'set -euxo pipefail'

Oftentimes developers go about writing bash scripts the same way they write code in a higher-level language. This is a big mistake, as higher-level languages offer safeguards that are not present in bash scripts by default. For example, a Ruby script will throw an error when trying to read from an uninitialized variable, whereas a bash script won’t. In this article, we’ll look at how we can improve on this.

The bash shell comes with several builtin commands for modifying the behavior of the shell itself. We are particularly interested in the set builtin, as this command has several options that will help us write safer scripts. I hope to convince you that it’s a really good idea to add set -euxo pipefail to the beginning of all your future bash scripts.

set -e

The -e option will cause a bash script to exit immediately when a command fails. This is generally a vast improvement upon the default behavior, where the script just ignores the failing command and continues with the next line. This option is also smart enough to not react to failing commands that are part of conditional statements. Moreover, you can append || true to a command for those rare cases where you don’t want a failing command to trigger an immediate exit.

Before

#!/bin/bash

# 'foo' is a non-existing command
foo
echo "bar"

# output
# ------
# line 4: foo: command not found
# bar
#
# Note how the script didn't exit when the foo command could not be found.
# Instead it continued on and echoed 'bar'.

After

#!/bin/bash
set -e

# 'foo' is a non-existing command
foo
echo "bar"

# output
# ------
# line 5: foo: command not found
#
# This time around the script exited immediately when the foo command wasn't found.
# Such behavior is much more in line with that of higher-level languages.

Any command returning a non-zero exit code will cause an immediate exit

#!/bin/bash
set -e

# 'ls' is an existing command, but giving it a nonsensical param will cause
# it to exit with exit code 1
$(ls foobar)
echo "bar"

# output
# ------
# ls: foobar: No such file or directory
#
# I'm putting this in here to illustrate that it's not just non-existing commands
# that will cause an immediate exit.

Preventing an immediate exit

#!/bin/bash
set -e

foo || true
$(ls foobar) || true
echo "bar"

# output
# ------
# line 4: foo: command not found
# ls: foobar: No such file or directory
# bar
#
# Sometimes we want to ensure that, even when 'set -e' is used, the failure of
# a particular command does not cause an immediate exit. We can use '|| true' for this.

Failing commands in a conditional statement will not cause an immediate exit

#!/bin/bash
set -e

# we make 'ls' exit with exit code 1 by giving it a nonsensical param
if ls foobar; then
  echo "foo"
else
  echo "bar"
fi

# output
# ------
# ls: foobar: No such file or directory
# bar
#
# Note that 'ls foobar' did not cause an immediate exit despite exiting with
# exit code 1. This is because the command was evaluated as part of a
# conditional statement.

That’s all for set -e. However, set -e by itself is far from enough. We can further improve upon the behavior created by set -e by combining it with set -o pipefail. Let’s have a look at that next.

set -o pipefail

The bash shell normally only looks at the exit code of the last command of a pipeline. This behavior is not ideal as it causes the -e option to only be able to act on the exit code of a pipeline’s last command. This is where -o pipefail comes in. This particular option sets the exit code of a pipeline to that of the rightmost command to exit with a non-zero status, or to zero if all commands of the pipeline exit successfully.

Before

#!/bin/bash
set -e

# 'foo' is a non-existing command
foo | echo "a"
echo "bar"

# output
# ------
# a
# line 5: foo: command not found
# bar
#
# Note how the non-existing foo command does not cause an immediate exit, as
# its non-zero exit code is ignored by piping it with '| echo "a"'.

After

#!/bin/bash
set -eo pipefail

# 'foo' is a non-existing command
foo | echo "a"
echo "bar"

# output
# ------
# a
# line 5: foo: command not found
#
# This time around the non-existing foo command causes an immediate exit, as
# '-o pipefail' will prevent piping from causing non-zero exit codes to be ignored.

This section hopefully made it clear that -o pipefail provides an important improvement upon just using -e by itself. However, as we shall see in the next section, we can still do more to make our scripts behave like higher-level languages.

set -u

This option causes the bash shell to treat unset variables as an error and exit immediately. Unset variables are a common cause of bugs in shell scripts, so having unset variables cause an immediate exit is often highly desirable behavior.

Before

#!/bin/bash
set -eo pipefail

echo $a
echo "bar"

# output
# ------
#
# bar
#
# The default behavior will not cause unset variables to trigger an immediate exit.
# In this particular example, echoing the non-existing $a variable will just cause
# an empty line to be printed.

After

#!/bin/bash
set -euo pipefail

echo "$a"
echo "bar"

# output
# ------
# line 5: a: unbound variable
#
# Notice how 'bar' no longer gets printed. We can clearly see that '-u' did indeed
# cause an immediate exit upon encountering an unset variable.

Dealing with ${a:-b} variable assignments

Sometimes you’ll want to use a ${a:-b} variable assignment to ensure a variable is assigned a default value of b when a is either empty or undefined. The -u option is smart enough to not cause an immediate exit in such a scenario.

#!/bin/bash
set -euo pipefail

DEFAULT=5
RESULT=${VAR:-$DEFAULT}
echo "$RESULT"

# output
# ------
# 5
#
# Even though VAR was not defined, the '-u' option realizes there's no need to cause
# an immediate exit in this scenario as a default value has been provided.

Using conditional statements that check if variables are set

Sometimes you want your script to not immediately exit when an unset variable is encountered. A common example is checking for a variable’s existence inside an if statement.

#!/bin/bash
set -euo pipefail

if [ -z "${MY_VAR:-}" ]; then
  echo "MY_VAR was not set"
fi

# output
# ------
# MY_VAR was not set
#
# In this scenario we don't want our program to exit when the unset MY_VAR variable
# is evaluated. We can prevent such an exit by using the same syntax as we did in the
# previous example, but this time around we specify no default value.

This section has brought us a lot closer to making our bash shell behave like higher-level languages. While -euo pipefail is great for the early detection of all kinds of problems, sometimes it won’t be enough. This is why in the next section we’ll look at an option that will help us figure out those really tricky bugs that you encounter every once in a while.

set -x

The -x option causes bash to print each command before executing it. This can be a great help when trying to debug a bash script failure. Note that arguments get expanded before a command gets printed, which will cause our logs to contain the actual argument values that were present at the time of execution!

#!/bin/bash
set -euxo pipefail

a=5
echo $a
echo "bar"

# output
# ------
# + a=5
# + echo 5
# 5
# + echo bar
# bar

That’s it for the -x option. It’s pretty straightforward, but can be a great help for debugging. Next up, we’ll look at an option I had never heard of before that was suggested by a reader of this blog.

Reader suggestion: set -E

Traps are pieces of code that fire when a bash script catches certain signals. Aside from the usual signals (e.g. SIGINT, SIGTERM, …), traps can also be used to catch special bash signals like EXIT, DEBUG, RETURN, and ERR. However, reader Kevin Gibbs pointed out that using -e without -E will cause an ERR trap to not fire in certain scenarios.

Before

#!/bin/bash
set -euo pipefail

trap "echo ERR trap fired!" ERR

myfunc()
{
  # 'foo' is a non-existing command
  foo
}

myfunc
echo "bar"

# output
# ------
# line 9: foo: command not found
#
# Notice that while '-e' did indeed cause an immediate exit upon trying to execute
# the non-existing foo command, it did not cause the ERR trap to be fired.

After

#!/bin/bash
set -Eeuo pipefail

trap "echo ERR trap fired!" ERR

myfunc()
{
  # 'foo' is a non-existing command
  foo
}

myfunc
echo "bar"

# output
# ------
# line 9: foo: command not found
# ERR trap fired!
#
# Not only do we still have an immediate exit, we can also clearly see that the
# ERR trap was actually fired now.

The documentation states that -E needs to be set if we want the ERR trap to be inherited by shell functions, command substitutions, and commands that are executed in a subshell environment. The ERR trap is normally not inherited in such cases.

Conclusion

I hope this post showed you why using set -euxo pipefail (or set -Eeuxo pipefail) is such a good idea. If you have any other options you want to suggest, then please let me know and I’ll be happy to add them to this list.


An introduction to javascript promises

I recently had to write some javascript code that required the sequential execution of half a dozen asynchronous requests. I figured this was the perfect time to learn a bit more about javascript promises. This post is a recap of what I read in these three amazing write-ups.

What are promises?

A Promise object represents a value that may not be available yet, but will be resolved at some point in the future. This abstraction allows you to write asynchronous code in a more synchronous fashion. For example, you can use a Promise object to represent data that will eventually be returned by a call to a remote web service. The then and catch methods can be used to attach callbacks that will be triggered once the data arrives. We’ll take a closer look at these two methods in the next sections. For now, let’s write a simple AJAX request example that prints a random joke.

var promise = new Promise(function(resolve, reject) {
  $.ajax({
    url: "http://api.icndb.com/jokes/random",
    success: function(result) {
      resolve(result["value"]["joke"]);
    }
  });
});

promise.then(function(result) {
  console.log(result);
});

Note how the Promise object is just a wrapper around the AJAX request and how we’ve instructed the success callback to trigger the resolve method. We’ve also attached a callback to our Promise object with the then method. This callback gets triggered when the resolve method gets called. The result variable of this callback will contain the data that was passed to the resolve method.

Before we take a closer look at the resolve method, let’s first investigate the Promise object a bit more. A Promise object can have one of three states:

  • fulfilled - the action relating to the Promise succeeded
  • rejected - the action relating to the Promise failed
  • pending - the Promise hasn’t been fulfilled or rejected yet

A pending Promise object can be fulfilled or rejected by calling resolve or reject on it. Once a Promise is fulfilled or rejected, this state gets permanently associated with it. The state of a fulfilled Promise also includes the data that was passed to resolve, just as the state of a rejected Promise also includes the data that was passed to reject. In summary, we can say that a Promise executes only once and stores the result of its execution.

var promise = new Promise(function(resolve, reject) {
  $.ajax({
    url: "http://api.icndb.com/jokes/random",
    success: function(result) {
      resolve(result["value"]["joke"]);
    }
  });
});

promise.then(function(result) {
  console.log(result);
});

promise.then(function(result) {
  console.log(result);
});

We can test whether a Promise only ever executes once by adding a second callback to the previous example. In this case, we see that only one AJAX request gets made and that the same joke gets printed to the console twice. This clearly shows that our Promise was only executed once.

The then method and chaining

The then method takes two arguments: a mandatory success callback and an optional failure callback. These callbacks are called when the Promise is settled (i.e. either fulfilled or rejected). If the Promise was fulfilled, the success callback will be fired with the data you passed to resolve. If the Promise was rejected, the failure callback will be called with the data you passed to reject. We’ve already covered most of this in the previous section.
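
Written out in isolation, this two-callback form of the then method looks something like the snippet below (reusing the promise variable from our earlier examples).

promise.then(function(result) {
  console.log("fulfilled with: " + result);
}, function(error) {
  console.log("rejected with: " + error);
});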

The real magic with the then method happens when you start chaining several of them together. This chaining allows you to express your logic in separate stages, each of which can be made responsible for transforming data passed on by the previous stage or for running additional asynchronous requests. The code below shows how data returned by the success callback of the first then method becomes available to the success callback of the second then method.

var promise = new Promise(function(resolve, reject) {
  $.ajax({
    url: "http://api.icndb.com/jokes/random",
    success: function(result) {
      resolve(result["value"]["joke"]);
    }
  });
});

promise.then(function(result) {
  return result;
}).then(function(result) {
  console.log(result);
});

This chaining is possible because the then method returns a new Promise object that will resolve to the return value of its callback. In other words, by calling return result; we cause the creation of an anonymous Promise object that looks something like the one shown below. Notice that this particular anonymous Promise object resolves immediately, as it does not make any asynchronous requests.

new Promise(function(resolve, reject) {
  resolve(result);
});

Now that we understand that the then method always returns a Promise object, let’s take a look at what happens when we tell the callback of a then method to explicitly return a Promise object.

function getJokePromise() {
  return new Promise(function(resolve, reject) {
    $.ajax({
      url: "http://api.icndb.com/jokes/random",
      success: function(result) {
        resolve(result["value"]["joke"]);
      }
    });
  });
}

getJokePromise().then(function(result) {
  console.log(result);
  return getJokePromise();
}).then(function(result) {
  console.log(result);
});

In this case, we end up sequentially executing two asynchronous requests. When the first Promise resolves, the first joke is printed and our callback returns a new Promise object. The second then in the chain waits for this new Promise to be fulfilled; once it is, its success callback is triggered and the second joke is printed.

The takeaway from all this is that calling return in a then callback will always result in returning a Promise object. It is this that allows for then chaining!

Error handling

We mentioned in the previous section how the then method can take an optional failure callback that gets triggered when reject is called. It is customary to reject with an Error object as they capture a stack trace, thereby facilitating debugging.

var promise = new Promise(function(resolve, reject) {
  $.ajax({
    url: "http://random.url.com",
    success: function(result) {
      resolve(result["value"]["joke"]);
    },
    error: function(jqxhr, textStatus) {
      reject(Error("The AJAX request failed."));
    }
  });
});

promise.then(function(result) {
  console.log(result);
}, function(error) {
  console.log(error);
  console.log(error.stack);
});

Personally, I find this a bit hard to read. Luckily we can use the catch method to make this look a bit nicer. There’s nothing special about the catch method. In fact, it’s just sugar for then(undefined, func), but it definitely makes code easier to read.
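
In other words, the two snippets below behave identically; handleError is just a hypothetical callback used for illustration.

promise.catch(handleError);

// behaves exactly like

promise.then(undefined, handleError);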

var promise = new Promise(function(resolve, reject) {
  $.ajax({
    url: "http://random.url.com",
    success: function(result) {
      resolve(result["value"]["joke"]);
    },
    error: function(jqxhr, textStatus) {
      reject(Error("The AJAX request failed."));
    }
  });
});

promise.then(function(result) {
  console.log(result);
}).then(function(result) {
  console.log("foo"); // gets skipped
}).then(function(result) {
  console.log("bar"); // gets skipped
}).catch(function(error) {
  console.log(error);
  console.log(error.stack);
});

Aside from illustrating improved readability, the above code showcases another aspect of the reject method in that Promise rejections will cause your code to skip forward to the next then method that has a rejection callback (or the next catch method, since this is equivalent). It is this fallthrough behavior that causes this code to not print “foo” or “bar”!

As a final point, it is useful to know that a Promise is implicitly rejected if an error is thrown in its constructor callback. This means it’s useful to do all your Promise related work inside the Promise constructor callback, so errors automatically become rejections.

var promise = new Promise(function(resolve, reject) {
  // JSON.parse throws an error if you feed it some
  // invalid JSON, so this implicitly rejects
  JSON.parse("This ain't JSON");
});

promise.then(function(result) {
  console.log(result);
}).catch(function(error) {
  console.log(error);
});

The above code will cause the Promise to be rejected and an error to be printed because it will fail to parse the invalid JSON string.


Unwanted spot instance termination in multi-AZ ASG

An auto scaling group is an AWS abstraction that facilitates increasing or decreasing the number of EC2 instances within your application’s architecture. Spot instances are unused AWS servers that are auctioned off for little money. The combination of these two allows for large auto scaling groups at low costs. However, you can lose your spot instances at a moment’s notice as soon as someone out there wants to pay more than you do.

Knowing all this, I recently found myself looking into why AWS was terminating several of our spot instances every day. We were bidding 20% over the average price, so it seemed unlikely that this was being caused by a monetary issue. Nevertheless, we kept noticing multiple spot instances disappearing on a daily basis.

It took a while to get to the bottom of things, but it turned out that this particular problem was being caused by an unfortunate combination of:

  • our auto scaling group spanning multiple availability zones
  • our scaling code making calls to TerminateInstanceInAutoScalingGroup

The step-by-step explanation of this issue was as follows:

  • our scaling code was asking AWS to put 10 instances in our auto scaling group
  • AWS obliged and put 5 instances in availability zone A and another 5 in zone B
  • some time later our scaling code would decide that 2 specific instances were no longer needed. A call would be made to TerminateInstanceInAutoScalingGroup to have just these 2 specific instances terminated.
  • if these 2 instances happened to be in the same availability zone, then one zone would now have 3 instances, while the other one would now have 5
  • AWS would detect that both zones were no longer balanced and would initiate a rebalancing action. This rebalancing action would terminate one of the instances in the zone with 5 instances, and spin up another instance in the zone with 3 instances.

So while this action did indeed end up rebalancing the instances across the different availability zones, it also inadvertently ended up terminating a running instance.

The relevant entry from the AWS Auto Scaling docs is shown below.

Instance Distribution and Balance across Multiple Zones

Auto Scaling attempts to distribute instances evenly between the Availability Zones that are enabled for your Auto Scaling group. Auto Scaling attempts to launch new instances in the Availability Zone with the fewest instances. If the attempt fails, however, Auto Scaling will attempt to launch in other zones until it succeeds.

Certain operations and conditions can cause your Auto Scaling group to become unbalanced. Auto Scaling compensates by creating a rebalancing activity under any of the following conditions:

  • You issue a request to change the Availability Zones for your group.
  • You call TerminateInstanceInAutoScalingGroup, which causes the group to become unbalanced.
  • An Availability Zone that previously had insufficient capacity recovers and has additional capacity available.

Auto Scaling always launches new instances before attempting to terminate old ones, so a rebalancing activity will not compromise the performance or availability of your application.

Multi-Zone Instance Counts when Approaching Capacity

Because Auto Scaling always attempts to launch new instances before terminating old ones, being at or near the specified maximum capacity could impede or completely halt rebalancing activities. To avoid this problem, the system can temporarily exceed the specified maximum capacity of a group by a 10 percent margin during a rebalancing activity (or by a 1-instance margin, whichever is greater). The margin is extended only if the group is at or near maximum capacity and needs rebalancing, either as a result of user-requested rezoning or to compensate for zone availability issues. The extension lasts only as long as needed to rebalance the group—typically a few minutes.

I’m not sure about the best way to deal with this behavior. In our case, we just restricted our auto scaling group to one availability zone. This was good enough for us as none of the work done by our spot instances is critical. Going through the docs, it seems one approach might be to disable the AZRebalance process. However, I have not had the chance to try this, so I cannot guarantee a lack of unexpected side effects.
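
For reference, suspending (and later resuming) the AZRebalance process for an auto scaling group named my-asg would look something like the commands below. The group name is made up, and as mentioned above I have not tried this approach myself, so treat it as a starting point rather than a recommendation.

$ aws autoscaling suspend-processes --auto-scaling-group-name my-asg --scaling-processes AZRebalance
$ aws autoscaling resume-processes --auto-scaling-group-name my-asg --scaling-processes AZRebalance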


Creating an EC2 Instance in a VPC with the AWS CLI

Setting up an EC2 instance on AWS used to be as straightforward as provisioning a machine and SSHing into it. However, this process has become a bit more complicated now that Amazon VPC has become the standard for managing machines in the cloud.

So what exactly is a Virtual Private Cloud? Amazon defines a VPC as ‘a logically isolated section of the AWS Cloud’. Instances inside a VPC can by default only communicate with other instances in the same VPC and are therefore invisible to the rest of the internet. This means they will not accept SSH connections coming from your computer, nor will they respond to any http requests. In this article we’ll look into changing these default settings into something more befitting a general purpose server.

Setting up your VPC

Start by installing the AWS Command Line Interface on your machine if you haven’t done so already. With this done, we can now create our VPC.

$ vpcId=`aws ec2 create-vpc --cidr-block 10.0.0.0/28 --query 'Vpc.VpcId' --output text`

There are several interesting things here:

  • the --cidr-block parameter specifies a /28 netmask that allows for 16 IP addresses. This is the smallest supported netmask.
  • the create-vpc command returns a JSON string. We can filter out specific fields from this string by using the --query and --output parameters.

The next step is to overwrite the default VPC DNS settings. As mentioned earlier, instances launched inside a VPC are invisible to the rest of the internet by default. AWS therefore does not bother assigning them a public DNS name. Luckily this can be changed easily.

$ aws ec2 modify-vpc-attribute --vpc-id $vpcId --enable-dns-support "{\"Value\":true}"
$ aws ec2 modify-vpc-attribute --vpc-id $vpcId --enable-dns-hostnames "{\"Value\":true}"

Adding an Internet Gateway

Next we need to connect our VPC to the rest of the internet by attaching an internet gateway. Our VPC would be isolated from the internet without this.

$ internetGatewayId=`aws ec2 create-internet-gateway --query 'InternetGateway.InternetGatewayId' --output text`
$ aws ec2 attach-internet-gateway --internet-gateway-id $internetGatewayId --vpc-id $vpcId

Creating a Subnet

A VPC can have multiple subnets. Since our use case only requires one, we can reuse the cidr-block specified during VPC creation so as to get a single subnet that spans the entire VPC address space.

$ subnetId=`aws ec2 create-subnet --vpc-id $vpcId --cidr-block 10.0.0.0/28 --query 'Subnet.SubnetId' --output text`

While this --cidr-block parameter specifies a subnet that can contain 16 IP addresses (10.0.0.0 - 10.0.0.15), AWS will reserve 5 of those for its own use. While this doesn’t really have an impact on our use case, it is still good to be aware of such things.

Configuring the Route Table

Each subnet needs to have a route table associated with it to specify the routing of its outbound traffic. By default every subnet inherits the default VPC route table which allows for intra-VPC communication only.

Here we add a route table to our subnet so as to allow traffic not meant for an instance inside the VPC to be routed to the internet through the internet gateway we created earlier.

$ routeTableId=`aws ec2 create-route-table --vpc-id $vpcId --query 'RouteTable.RouteTableId' --output text`
$ aws ec2 associate-route-table --route-table-id $routeTableId --subnet-id $subnetId
$ aws ec2 create-route --route-table-id $routeTableId --destination-cidr-block 0.0.0.0/0 --gateway-id $internetGatewayId

Adding a Security Group

Before we can launch an instance, we first need to create a security group that specifies which ports should allow traffic. For now we’ll just allow anyone to try and make an SSH connection by opening port 22 to any IP address.

$ securityGroupId=`aws ec2 create-security-group --group-name my-security-group --description "my-security-group" --vpc-id $vpcId --query 'GroupId' --output text`
$ aws ec2 authorize-security-group-ingress --group-id $securityGroupId --protocol tcp --port 22 --cidr 0.0.0.0/0

Launching your Instance

All that’s left to do is to create an SSH key pair and then launch an instance secured by this. Let’s generate this key pair and store it locally with the correct permissions.

$ aws ec2 create-key-pair --key-name my-key --query 'KeyMaterial' --output text > ~/.ssh/my-key.pem
$ chmod 400 ~/.ssh/my-key.pem

We can now launch a single t2.micro instance based on the public AWS Ubuntu image.

$ instanceId=`aws ec2 run-instances --image-id ami-9eaa1cf6 --count 1 --instance-type t2.micro --key-name my-key --security-group-ids $securityGroupId --subnet-id $subnetId --associate-public-ip-address --query 'Instances[0].InstanceId' --output text`

After a few minutes your instance should be up and running. You should now be able to obtain its public DNS name and SSH into it.

$ instanceUrl=`aws ec2 describe-instances --instance-ids $instanceId --query 'Reservations[0].Instances[0].PublicDnsName' --output text`
$ ssh -i ~/.ssh/my-key.pem ubuntu@$instanceUrl

And that’s it. It’s really not all that hard. There’s just an awful lot of concepts that you need to get your head around which can make it a bit daunting at first. Be sure to check out the free Amazon Virtual Private Cloud User Guide if you want to learn more about VPCs.


Finding and deleting old tags in a Github repository

It’s very easy for a Github repository to accumulate lots of tags over time. This onslaught of tags tends to be tolerated until it starts impacting git performance. It is at this point, when you have well in excess of tens of thousands of tags, that a call to action tends to be made. In this article, we’ll look at two approaches to rid yourself of these old tags.

The cut-off tag approach

This approach has us specify a cut-off tag. All tags that can trace their ancestry back to this cut-off tag will be allowed to remain. All others will get deleted. This is especially useful for when you have just merged a new feature, and now you want to delete all tags that were created before this merge. In this scenario, all you have to do is tag the merge commit and then use this as the cut-off tag.

The sequence of commands below deletes all tags that do not have the release-5 tag as an ancestor. Most of these commands are pretty self-explanatory, except for the one in the middle. The remainder of this section will focus on explaining this command.

# fetch all tags from the remote
git fetch

# delete all tags on the remote that do not have the release-5 tag as an ancestor
comm -23 <(git tag | sort) <(git tag --contains release-5 | sort) | xargs git push --delete origin

# delete all local tags that are no longer present on the remote
git fetch --prune origin +refs/tags/*:refs/tags/*

The comm command is used to compare two sorted files line by line. Luckily, we can avoid having to create any actual files by relying on process substitution instead.

comm -23 <(command to act as file 1) <(command to act as file 2) | xargs git push --delete origin

The -23 flag tells comm to suppress any lines that are unique to file 2, as well as any lines that appear in both files. In other words, it causes comm to return just those lines that only appear in file 1. Looking back at our sequence of commands above, it should be clear that this will cause us to obtain all tags that do not have the release-5 tag as an ancestor. Piping this output to xargs git push --delete origin will then remove these tags from Github.
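
A tiny standalone example makes the effect of the -23 flag easy to see. Both inputs need to be sorted, just like our git tag output above.

$ comm -23 <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\n')
a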

The cut-off date approach

While the cut-off tag approach works great in a lot of scenarios, sometimes you just want to delete all tags that were created before a given cut-off date instead. Unfortunately, git doesn’t have any built-in functionality for accomplishing this. This is why we are going to make use of a Ruby script here.

# CUT_OFF_DATE needs to be of YYYY-MM-DD format
CUT_OFF_DATE = "2015-05-10"

def get_old_tags(cut_off_date)  
  `git log --tags --simplify-by-decoration --pretty="format:%ai %d"`
  .split("\n")
  .each_with_object([]) do |line, old_tags|
    if line.include?("tag: ")
      date = line[0..9]
      tags = line[28..-2].gsub(",", "").concat(" ").scan(/tag: (.*?) /).flatten
      old_tags.concat(tags) if date < cut_off_date
    end
  end
end

# fetch all tags from the remote
`git fetch`

# delete all tags on the remote that were created before the CUT_OFF_DATE
get_old_tags(CUT_OFF_DATE).each_slice(100) do |batch|
  system("git", "push", "--delete", "origin", *batch)
end

# delete all local tags that are no longer present on the remote
`git fetch --prune origin +refs/tags/*:refs/tags/*`

This Ruby script should be pretty straightforward. The get_old_tags method might stand out a bit here. It can look rather complex, but most of it is just string manipulation to extract the date and tags from each line output by the git log command, storing any tags older than the cut-off date in the old_tags array. Note how we invoke the system method with an array of arguments for those calls that require input. This protects us against possible shell injection.
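
As a quick illustration of why the argument array form matters, consider a tag with a maliciously crafted name (the tag name below is made up):

tag = 'v1.0;echo pwned'  # hypothetical malicious tag name

# a single interpolated string is handed to a shell, so the injected echo command runs
system("git push --delete origin #{tag}")

# an argument array bypasses the shell; the tag is passed as one literal argument
system('git', 'push', '--delete', 'origin', tag)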

Be careful, as running this exact script inside your repository will delete all tags created before 2015-05-10. Also, be sure to specify your cut-off date in YYYY-MM-DD format!


Adding a post-execution hook to the db:migrate task

A few days ago we discovered that our MySQL database’s default character set and collation had been changed to the wrong values. Worse yet, it looked like this change had happened many months ago; something which we had been completely unaware of until now! In order to make sure this didn’t happen again, we looked into adding a post-execution hook to the rails db:migrate task.

Our first attempt is shown below. Here, we append a post-execution hook to the existing db:migrate task by creating a new db:migrate task. In rake, when a task is defined twice, the behavior of the new task gets appended to the behavior of the old task. So even though the code below may give the impression of overwriting the rails db:migrate task, we are actually just appending a call to the post_execution_hook method to it.

namespace :db do
  def post_execution_hook
    puts 'This code gets run after the rails db:migrate task.'
    puts 'However, it only runs if the db:migrate task does not throw an exception.'
  end

  task :migrate do
    post_execution_hook
  end
end

However, the above example only runs the appended code if the original db:migrate task does not throw any exceptions. Luckily we can do better than that by taking a slightly different approach. Rather than appending code, we are going to have a go at prepending it instead.

namespace :db do
  def post_execution_hook
    puts 'This code gets run after the rails db:migrate task.'
    puts 'It will ALWAYS run.'
  end

  task :attach_hook do
    at_exit { post_execution_hook }
  end
end

Rake::Task['db:migrate'].enhance(['db:attach_hook'])

Here we make use of the enhance method to add db:attach_hook as a prerequisite task to db:migrate. This means that calling db:migrate will now cause the db:attach_hook task to get executed before db:migrate gets run. The db:attach_hook task creates an at_exit hook that will trigger our post-execution code upon exit of the db:migrate task. Hence, our post-execution hook will now get called even when db:migrate raises an exception!
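
To see why this works, here is a minimal standalone sketch of the at_exit behaviour we are relying on (this is not part of the rake task itself):

at_exit { puts 'post-execution hook' }

puts 'running migrations'
raise 'db:migrate blew up'

# 'post-execution hook' still gets printed, even though the script
# terminates with an unhandled RuntimeError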


Installing chromedriver

Some time ago I needed to install chromedriver on an Ubuntu machine. While this wasn’t too hard, I was nevertheless surprised by the number of open StackOverflow questions on the topic. So I decided to leave some notes here for my future self.

First of all, let’s install chromedriver.

$ LATEST_RELEASE=$(curl http://chromedriver.storage.googleapis.com/LATEST_RELEASE)
$ wget http://chromedriver.storage.googleapis.com/$LATEST_RELEASE/chromedriver_linux64.zip
$ unzip chromedriver_linux64.zip
$ rm chromedriver_linux64.zip
$ sudo mv chromedriver /usr/local/bin

Let’s see what happens when we try and run it.

$ chromedriver

chromedriver: error while loading shared libraries: libgconf-2.so.4:
cannot open shared object file: No such file or directory

That’s a bit unexpected. Luckily we can easily fix this.

$ sudo apt-get install libgconf-2-4

Now that we have a functioning chromedriver, the only thing left to do is to install Chrome. After all, chromedriver can’t function without Chrome.

$ wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
$ sudo sh -c 'echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list'
$ sudo apt-get update
$ sudo apt-get install google-chrome-stable

And that’s it. You should be good to go now.


Programmatically rotating the Android screen

A lot of digital ink has been spilled on this subject, so I figured it might be worth briefly talking about it here. You can change the orientation either through ADB or through an app. While the ADB approach is the easiest, it might not work on all devices or on all Android versions. For example, the dumpsys output of a Kindle Fire is different from that of a Samsung Galaxy S4, so you might need to tweak the grepping of the output.

# get current orientation
adb shell dumpsys input | grep SurfaceOrientation | awk '{print $2}'

# change orientation to portrait
adb shell content insert --uri content://settings/system --bind name:s:accelerometer_rotation --bind value:i:0
adb shell content insert --uri content://settings/system --bind name:s:user_rotation --bind value:i:0

# change orientation to landscape
adb shell content insert --uri content://settings/system --bind name:s:accelerometer_rotation --bind value:i:0
adb shell content insert --uri content://settings/system --bind name:s:user_rotation --bind value:i:1

If you don’t want to use ADB and prefer to change the orientation through an Android app instead, then you can just use these commands.

// get current orientation
final int orientation = myActivity.getResources().getConfiguration().orientation;

// change orientation to portrait
myActivity.setRequestedOrientation(ActivityInfo.SCREEN_ORIENTATION_PORTRAIT);

// change orientation to landscape
myActivity.setRequestedOrientation(ActivityInfo.SCREEN_ORIENTATION_LANDSCAPE);

Programmatically creating Android touch events

Recent versions of Android have the adb shell input touch functionality to simulate touch events on an Android device or simulator. However, older Android versions (like 2.3) do not support this command. Luckily it is possible to recreate this functionality by running adb shell getevent to capture events as they are being generated. These events can then later be replayed using the adb shell sendevent command.

Running adb shell getevent while touching the screen might get you output like that shown below. Notice how everything is in hexadecimal.

/dev/input/event7: 0001 014a 00000001
/dev/input/event7: 0003 003a 00000001
/dev/input/event7: 0003 0035 000001ce
/dev/input/event7: 0003 0036 00000382
/dev/input/event7: 0000 0002 00000000
/dev/input/event7: 0000 0000 00000000
/dev/input/event7: 0001 014a 00000000
/dev/input/event7: 0003 003a 00000000
/dev/input/event7: 0003 0035 000001ce
/dev/input/event7: 0003 0036 00000382
/dev/input/event7: 0000 0002 00000000
/dev/input/event7: 0000 0000 00000000

However, the adb shell sendevent command expects all of its input to be in decimal. So if we wanted to replay the above events, we’d need to run the commands shown below. Note that 462 and 898 are the x and y coordinates of this particular touch event.

adb shell sendevent /dev/input/event7: 1 330 1
adb shell sendevent /dev/input/event7: 3 58 1
adb shell sendevent /dev/input/event7: 3 53 462
adb shell sendevent /dev/input/event7: 3 54 898
adb shell sendevent /dev/input/event7: 0 2 0
adb shell sendevent /dev/input/event7: 0 0 0
adb shell sendevent /dev/input/event7: 1 330 0
adb shell sendevent /dev/input/event7: 3 58 0
adb shell sendevent /dev/input/event7: 3 53 462
adb shell sendevent /dev/input/event7: 3 54 898
adb shell sendevent /dev/input/event7: 0 2 0
adb shell sendevent /dev/input/event7: 0 0 0
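
If you have captured a lot of events, the hex-to-decimal conversion is easy enough to script. Here is a rough Ruby sketch, using the device path and a few of the events from the output above:

events = [
  '/dev/input/event7: 0001 014a 00000001',
  '/dev/input/event7: 0003 0035 000001ce',
  '/dev/input/event7: 0003 0036 00000382'
]

# each line consists of a device path followed by three hexadecimal fields
events.each do |line|
  device, type, code, value = line.split
  puts "adb shell sendevent #{device} #{type.hex} #{code.hex} #{value.hex}"
end

# => adb shell sendevent /dev/input/event7: 1 330 1
# => adb shell sendevent /dev/input/event7: 3 53 462
# => adb shell sendevent /dev/input/event7: 3 54 898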

Some lesser known Github API functionality

One of our automation tools occasionally needs to interact with our Github repositories. Unfortunately, the current implementation of this tool leaves something to be desired as it requires cloning these repositories to local disk. Changes against these local repositories are then made on local branches, after which these branches get pushed to Github.

However, in order to save on disk space this tool will only ever create a single local copy of each repository. This makes it unsafe to run multiple instances of the tool, as multiple instances simultaneously executing sequences of git commands against the same local repositories could cause those commands to become interleaved, leaving the local repositories in an undefined state.

The solution to this complexity was to completely remove the need for local repositories and instead aim to have everything done through the wonderful Github API. This article is a reminder to myself about some API functionality that I found while looking into this.

Checking if a branch contains a commit

While the Github API does not have an explicit call to check whether a given commit is included in a branch, we can nevertheless use the compare call for just this purpose. This call takes two commits as input and returns a large JSON response of comparison data. We can use the status field of the response to ascertain if a given commit is behind or identical to the HEAD commit of a branch. If so, then the branch contains that commit.

We can use the Ruby octokit gem to implement this as follows.

require 'octokit'

class GithubClient < Octokit::Client
  def branch_contains_sha?(repo, branch, sha)
    ['behind', 'identical'].include?(compare(repo, branch, sha).status)
  end
end
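
Hypothetical usage, where the token, repository and sha are placeholders:

client = GithubClient.new(access_token: ENV['GITHUB_TOKEN'])
client.branch_contains_sha?('my-org/my-repo', 'master', '7a3c2b1')
# => true if the commit is part of the branch, false otherwise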

Creating a remote branch from a remote commit

Sometimes you’ll want to create a remote branch by branching from a remote commit. We can use the create_reference call to accomplish this. Note that the ref parameter of this call needs to be set to refs/heads/#{branch} when creating a remote branch.

require 'octokit'

class GithubClient < Octokit::Client
  def create_branch_from_sha(repo, branch, sha)
    # create_ref internally transforms "heads/#{branch}" into "refs/heads/#{branch}"
    # as mentioned above, this is required by the Github API
    create_ref(repo, "heads/#{branch}", sha)
  end
end

Setting the HEAD of a remote branch to a specific remote commit

You can even forcefully set the HEAD of a remote branch to a specific remote commit by using the update_reference call. As mentioned earlier, the ref parameter needs to be set to refs/heads/#{branch}. Be careful when using this functionality though as it essentially allows you to overwrite the history of a remote branch!

require 'octokit'

class GithubClient < Octokit::Client
  def update_branch_to_sha(repo, branch, sha, force = true)
    # update_ref internally transforms "heads/#{branch}" into "refs/heads/#{branch}"
    # as mentioned earlier, this is required by the Github API
    update_ref(repo, "heads/#{branch}", sha, force)
  end
end

The amazing bitwise XOR operator

One of my colleagues recently mentioned this interview question to me.

Imagine there is an array containing 2n+1 elements: n values each appear exactly twice, while one value appears only once. Can you find that one unique element?

This seemed simple enough and I quickly came up with the Ruby solution below.

> array = [3, 5, 4, 5, 3]
# => [3, 5, 4, 5, 3]
> count = array.each_with_object(Hash.new(0)) { |number, hash| hash[number] += 1 }
# => {3=>2, 5=>2, 4=>1}
> count.key(1)
# => 4

I thought that would be the end of it, but instead I was asked if I could see a significantly more performant way of solving the problem by using the XOR operator.

XOR characteristics

In order to solve this problem with the XOR operator, we first need to understand some of its characteristics. This operator obeys the following rules:

  • commutativity: A^B=B^A
  • associativity: (A^B)^C=A^(B^C)
  • the identity element is 0: A^0=A
  • each element is its own inverse: A^A=0

Now imagine an array with the elements [3, 5, 4, 5, 3]. Using the above rules, we can show that XORing all these elements will leave us with the array’s unique element.

accum = 3 ^ 5 ^ 4 ^ 5 ^ 3
accum = 0 ^ 3 ^ 5 ^ 4 ^ 5 ^ 3    # 0 is the identity element
accum = 0 ^ 3 ^ 3 ^ 4 ^ 5 ^ 5    # commutativity and associativity rules
accum = 0 ^ 0 ^ 4 ^ 0            # A^A = 0
accum = 4                        # 0 is the identity element

Putting this approach in code would give us something like this.

> array = [3, 5, 4, 5, 3]
# => [3, 5, 4, 5, 3]
> accum = 0
# => 0
> array.each { |number| accum = accum ^ number }
# => [3, 5, 4, 5, 3]
> accum
# => 4
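
As an aside, the same accumulation can be written more compactly with inject; it behaves identically to the loop above.

> array.inject(:^)
# => 4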

Benchmarks

Let’s use Ruby’s Benchmark module to do a comparison of both approaches.

require 'benchmark'

array = [-1]
1000000.times do |t|
  array << t
  array << t
end

Benchmark.measure do
  count = array.each_with_object(Hash.new(0)) { |number, hash| hash[number] += 1 }
  count.key(1)
end
# => #<Benchmark::Tms:0x007f83fa0279e0 @label="", @real=0.83534, @cstime=0.0, @cutime=0.0, @stime=0.010000000000000009, @utime=0.8300000000000005, @total=0.8400000000000005>

Benchmark.measure do
  accum = 0
  array.each { |number| accum = accum ^ number }
  accum
end
# => #<Benchmark::Tms:0x007f83fa240ba0 @label="", @real=0.136726, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.13999999999999968, @total=0.13999999999999968>

So there you have it. Given an array that contains two million elements, the XOR operator approach turns out to be more than 6 times faster than utilizing a hashmap. That’s quite a nice performance improvement!


A visual explanation of SQL joins

I admit that I find myself going to this article every time I need to write some joins. Hopefully putting it here will save me from always having to google it.


Check the order of your rescue_from handlers!

Our rescue_from handlers used to be defined like shown below. This might look okay to you. At first glance everything looks fine, right?

class WidgetsController < ActionController::Base
  rescue_from ActionController::RoutingError, :with => :render_404
  rescue_from Exception,                      :with => :render_500
end

Turns out it’s not okay at all. Handlers are searched from bottom to top. This means that they should always be defined in order of most generic to most specific. Or in other words, the above code is exactly the wrong thing to do. Instead, we need to write our handlers like shown here.

class WidgetsController < ActionController::Base
  rescue_from Exception,                      :with => :render_500
  rescue_from ActionController::RoutingError, :with => :render_404
end

The javascript event loop

Sometimes you come across an article that is so well written you can’t do anything but link to it. So if you’ve ever wondered why the javascript runtime is so good at asynchronous operations, then you should definitely give this article a read.

Some snippets:

JavaScript runtimes contain a message queue which stores a list of messages to be processed and their associated callback functions. These messages are queued in response to external events (such as a mouse being clicked or receiving the response to an HTTP request) given a callback function has been provided. If, for example a user were to click a button and no callback function was provided – no message would have been enqueued.

In a loop, the queue is polled for the next message (each poll referred to as a “tick”) and when a message is encountered, the callback for that message is executed.

The calling of this callback function serves as the initial frame in the call stack, and due to JavaScript being single-threaded, further message polling and processing is halted pending the return of all calls on the stack.

As well as:

Using Web Workers enables you to offload an expensive operation to a separate thread of execution, freeing up the main thread to do other things. The worker includes a separate message queue, event loop, and memory space independent from the original thread that instantiated it. Communication between the worker and the main thread is done via message passing, which looks very much like the traditional, evented code-examples we’ve already seen.


Bug hunting with git bisect

Today I was looking into what I thought was going to be a simple bug. The problem seemed straightforward enough, so I did a quick grep of the codebase, found three pieces of code that looked like likely culprits, made some modifications, triggered the bug, and found that absolutely nothing had changed. Half an hour and a lot of additional digging later I was stumped. I had no idea what was going on.

It was at this point that I remembered git bisect. This git command asks you to specify two commits: one where things are working, and another one where things are broken. It then does a binary search across the range of commits in between these two. Each search step asks you whether the current commit contains broken code or not, after which it automatically selects the next commit for you. There’s a great tutorial over here.

$ git bisect start
$ git bisect good rj6y4j3
$ git bisect bad 2q7f529

It took me all of five minutes to discover the source of the bug this way. I can safely say that it would have taken me ages to track down this particular bit of offending code as it was located in a custom bug fix for a popular third party library (I’m looking at you Sentry).


Why is MySQL converting my NULLs to blanks?

A while ago I ran into an issue where some records were showing a blank value in a given column. This was a bit weird, as a blank value had never been written to that column. After a bit of searching we found a bug that had inadvertently been writing the occasional NULL value to that particular column. So how did those NULLs get turned into blanks?

It turns out that MySQL can operate in different server modes. You can check your server mode by running one of the two commands below. Note that your server mode will be blank by default.

SHOW GLOBAL VARIABLES where Variable_name = 'sql_mode';
SHOW SESSION VARIABLES where Variable_name = 'sql_mode';

Now that we know about server modes we can talk about data type defaults. Basically, each MySQL column has an implicit default value assigned to it. Under certain circumstances this default value might be used instead of the value you were expecting.

As of MySQL 5.0.2, if a column definition includes no explicit DEFAULT value, MySQL determines the default value as follows:

If the column can take NULL as a value, the column is defined with an explicit DEFAULT NULL clause. This is the same as before 5.0.2.

If the column cannot take NULL as the value, MySQL defines the column with no explicit DEFAULT clause. Exception: If the column is defined as part of a PRIMARY KEY but not explicitly as NOT NULL, MySQL creates it as a NOT NULL column (because PRIMARY KEY columns must be NOT NULL), but also assigns it a DEFAULT clause using the implicit default value. To prevent this, include an explicit NOT NULL in the definition of any PRIMARY KEY column.

For data entry into a NOT NULL column that has no explicit DEFAULT clause, if an INSERT or REPLACE statement includes no value for the column, or an UPDATE statement sets the column to NULL, MySQL handles the column according to the SQL mode in effect at the time:

  • If strict SQL mode is enabled, an error occurs for transactional tables and the statement is rolled back. For nontransactional tables, an error occurs, but if this happens for the second or subsequent row of a multiple-row statement, the preceding rows will have been inserted.
  • If strict mode is not enabled, MySQL sets the column to the implicit default value for the column data type.

We found that our code was sometimes writing NULLs to a NOT NULL column on a server that was not running in strict mode. This in turn caused our NULLs to silently get changed to blanks as this was the column default value. Mystery solved.


Using environment variables in migrations

Recently we had to run a migration that was so slow we couldn’t afford the downtime it would cause. In order to get around this, it was decided to put two code paths in the migration: one that was slow and thorough, and one that was quick but didn’t perform any safety checks.

The first path would be run on a recent database dump, whereas the latter would be executed directly on the live database once the first had finished without error. This was a lot less crazy than it might sound as the particular table under modification had very infrequent changes.

It was decided to use environment variables to allow for easy switching between code paths. This is what the code ended up looking like.

class MyDangerousMigration < ActiveRecord::Migration
  def change
    if ENV['skip_checks'] == 'true'
      # code without safety checks
    else
      # code with safety checks
    end
  end
end

This could then be run like so.

skip_checks=true bundle exec rake db:migrate

Getting connection information with lsof

The lsof command is one of those super useful commands for figuring out what connections are taking place on your machine. While the lsof command technically just lists open files, just about everything in linux (even sockets) is a file!

Some useful commands:

List all network connections

$ lsof -i

COMMAND     PID     USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
Spotify   36908 vaneyckt   53u  IPv4 0x2097c8deb175c0dd      0t0  TCP localhost:4381 (LISTEN)
Spotify   36908 vaneyckt   54u  IPv4 0x2097c8deab18027d      0t0  TCP localhost:4371 (LISTEN)
Spotify   36908 vaneyckt   71u  IPv4 0x2097c8deba747c1d      0t0  UDP *:57621
Spotify   36908 vaneyckt   72u  IPv4 0x2097c8deb18ef4cf      0t0  TCP *:57621 (LISTEN)
Spotify   36908 vaneyckt   77u  IPv4 0x2097c8deb993b255      0t0  UDP ip-192-168-0-101.ec2.internal:61009
Spotify   36908 vaneyckt   90u  IPv4 0x2097c8dea8c4a66d      0t0  TCP ip-192-168-0-101.ec2.internal:62432->lon3-accesspoint-a57.lon3.spotify.com:https (ESTABLISHED)
Spotify   36908 vaneyckt   91u  IPv4 0x2097c8de8d029f2d      0t0  UDP ip-192-168-0-101.ec2.internal:52706

List all network connections on port 4381

$ lsof -i :4381

COMMAND   PID     USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
Spotify 36908 vaneyckt   53u  IPv4 0x2097c8deb175c0dd      0t0  TCP localhost:4381 (LISTEN)

Find ports listening for connections

$ lsof -i | grep -i LISTEN

Spotify   36908 vaneyckt   53u  IPv4 0x2097c8deb175c0dd      0t0  TCP localhost:4381 (LISTEN)
Spotify   36908 vaneyckt   54u  IPv4 0x2097c8deab18027d      0t0  TCP localhost:4371 (LISTEN)
Spotify   36908 vaneyckt   72u  IPv4 0x2097c8deb18ef4cf      0t0  TCP *:57621 (LISTEN)

Find established connections

$ lsof -i | grep -i ESTABLISHED

Spotify   36908 vaneyckt   90u  IPv4 0x2097c8dea8c4a66d      0t0  TCP ip-192-168-0-101.ec2.internal:62432->lon3-accesspoint-a57.lon3.spotify.com:https (ESTABLISHED)

Show all files opened by a given process

$ lsof -p 36908

COMMAND   PID     USER   FD     TYPE             DEVICE  SIZE/OFF     NODE NAME
Spotify 36908 vaneyckt   90u    IPv4 0x2097c8dea8c4a66d       0t0      TCP ip-192-168-0-101.ec2.internal:62432->lon3-accesspoint-a57.lon3.spotify.com:https (ESTABLISHED)
Spotify 36908 vaneyckt   91u    IPv4 0x2097c8de8d029f2d       0t0      UDP ip-192-168-0-101.ec2.internal:52706
Spotify 36908 vaneyckt   92u     REG                1,4   9389456 59387889 /Users/vaneyckt/Library/Caches/com.spotify.client/Data/4a/4a5a23cf1e9dc4210b3c801d57a899098dc12418.file
Spotify 36908 vaneyckt   93u     REG                1,4   8658944 58471210 /private/var/folders/xv/fjmwzr9x5mq_s7dchjq87hjm0000gn/T/.org.chromium.Chromium.6b0Vzp
Spotify 36908 vaneyckt   94u     REG                1,4    524656 54784499 /Users/vaneyckt/Library/Caches/com.spotify.client/Browser/index
Spotify 36908 vaneyckt   95u     REG                1,4     81920 54784500 /Users/vaneyckt/Library/Caches/com.spotify.client/Browser/data_0
Spotify 36908 vaneyckt   96u     REG                1,4    532480 54784501 /Users/vaneyckt/Library/Caches/com.spotify.client/Browser/data_1
Spotify 36908 vaneyckt   97u     REG                1,4   2105344 54784502 /Users/vaneyckt/Library/Caches/com.spotify.client/Browser/data_2
Spotify 36908 vaneyckt   98u     REG                1,4  12591104 54784503 /Users/vaneyckt/Library/Caches/com.spotify.client/Browser/data_3
Spotify 36908 vaneyckt   99r     REG                1,4    144580    28952 /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/Resources/HIToolbox.rsrc

Carefully converting your MySQL database to utf8

Converting all the data in your database can be a nail-biting experience. As you can see from the code below we are doing our best to be super careful. We convert each table separately and before each conversion we store the table’s column types and an MD5 hash of every row in the table (we were lucky enough to not have enormous tables). After converting the table we check that no column types or rows were changed. It goes without saying that we do a trial run on a database dump first.

require 'set'
require 'digest/md5'

CHARACTER_SET = 'utf8'
COLLATION = 'utf8_unicode_ci'

class ConvertAllTablesToUtf8 < ActiveRecord::Migration
  def up
    ActiveRecord::Base.connection.tables.each do |table|
      ActiveRecord::Base.transaction do
        ActiveRecord::Base.connection.execute("LOCK TABLES #{table} WRITE")
          say "starting work on table: #{table}"

          model = table.classify.constantize
          say "associated model: #{model}"

          say 'storing column types information before converting table to unicode'
          column_types_before = model.columns_hash.each_with_object({}) do |(column_name, column_info), column_types_before|
            column_types_before[column_name] = [column_info.sql_type, column_info.type]
          end

          say 'storing set of table data hashes before converting table to unicode'
          table_data_before = Set.new
          model.find_each do |datum|
            table_data_before << Digest::MD5.hexdigest(datum.inspect)
          end

          say 'converting table to unicode'
          execute("ALTER TABLE #{table} CONVERT TO CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION}")
          execute("ALTER TABLE #{table} DEFAULT CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION}")

          say 'getting column types information after conversion to unicode'
          column_types_after = model.columns_hash.each_with_object({}) do |(column_name, column_info), column_types_after|
            column_types_after[column_name] = [column_info.sql_type, column_info.type]
          end

          say 'getting set of table data hashes after conversion to unicode'
          table_data_after = Set.new
          model.find_each do |datum|
            table_data_after << Digest::MD5.hexdigest(datum.inspect)
          end

          say "checking that column types haven't changed"
          if column_types_before != column_types_after
            raise "Column types of the #{table} table have changed"
          end

          say "checking that data hasn't changed"
          if table_data_before != table_data_after
            raise "Data in the #{table} table has changed"
          end
        ActiveRecord::Base.connection.execute('UNLOCK TABLES')
      end
    end

    execute("ALTER DATABASE #{ActiveRecord::Base.connection.current_database} DEFAULT CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION}")
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end

Note how we lock each table before converting it. If we didn’t lock it then new data could be written to the table while we are busy storing MD5 hashes of the rows in preparation for the actual conversion. This, in turn, would cause our migration to complain that new data was present after the conversion had taken place.

We also wrap each table conversion inside a transaction. I’ve talked before about how converting a table will cause an implicit commit, meaning that a rollback won’t undo any of the changes made by the conversion. So why have a transaction here then? Imagine that an exception were to be raised during our migration. In that case we want to ensure our table lock gets dropped as soon as possible. The transaction guarantees this behavior.

Also, if we weren’t so paranoid about checking the before and after data as part of our migration, we could simplify this code quite a bit.

CHARACTER_SET = 'utf8'
COLLATION = 'utf8_unicode_ci'

class ConvertAllTablesToUtf8 < ActiveRecord::Migration
  def up
    ActiveRecord::Base.connection.tables.each do |table|
      say 'converting table to unicode'
      execute("ALTER TABLE #{table} CONVERT TO CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION}")
      execute("ALTER TABLE #{table} DEFAULT CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION}")
    end

    execute("ALTER DATABASE #{ActiveRecord::Base.connection.current_database} DEFAULT CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION}")
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end

Notice how we can drop the lock as the ALTER TABLE command will prevent all writes to the table while simultaneously allowing all reads.

In most cases, ALTER TABLE makes a temporary copy of the original table. MySQL waits for other operations that are modifying the table, then proceeds. It incorporates the alteration into the copy, deletes the original table, and renames the new one. While ALTER TABLE is executing, the original table is readable by other sessions. Updates and writes to the table that begin after the ALTER TABLE operation begins are stalled until the new table is ready, then are automatically redirected to the new table without any failed updates. The temporary copy of the original table is created in the database directory of the new table. This can differ from the database directory of the original table for ALTER TABLE operations that rename the table to a different database.

Furthermore, since we now no longer have a lock on our table we can also drop the transaction. This gives us the much-simplified code shown above.


Character set vs collation

There’s a surprising amount of confusion about the difference between these two terms. The best explanation I’ve found is here.

A character set is a subset of all written glyphs. A character encoding specifies how those characters are mapped to numeric values. Some character encodings, like UTF-8 and UTF-16, can encode any character in the Universal Character Set. Others, like US-ASCII or ISO-8859-1 can only encode a small subset, since they use 7 and 8 bits per character, respectively. Because many standards specify both a character set and a character encoding, the term “character set” is often substituted freely for “character encoding”.

A collation comprises rules that specify how characters can be compared for sorting. Collations rules can be locale-specific: the proper order of two characters varies from language to language.

Choosing a character set and collation comes down to whether your application is internationalized or not. If not, what locale are you targeting?

In order to choose what character set you want to support, you have to consider your application. If you are storing user-supplied input, it might be hard to foresee all the locales in which your software will eventually be used. To support them all, it might be best to support the UCS (Unicode) from the start. However, there is a cost to this; many western European characters will now require two bytes of storage per character instead of one.

Choosing the right collation can help performance if your database uses the collation to create an index, and later uses that index to provide sorted results. However, since collation rules are often locale-specific, that index will be worthless if you need to sort results according to the rules of another locale.

The only thing I’d like to add is that some collations are more cpu intensive than others. For example, utf8_general_ci treats À, Á, and Å as being equal to A when doing comparisons. This is in contrast to utf8_unicode_ci which uses about 10% more cpu, but differentiates between these characters.


MySQL write locks also prevent reads

Locking a table with

table_name = 'widgets'
ActiveRecord::Base.connection.execute("LOCK TABLES #{table_name} WRITE")

ensures that only the current connection can access that table. Other connections cannot even read from this table while it is locked!


Rails migrations and the dangers of implicit commits

I recently came across the migration below. At first sight it looks like everything is okay, but there is actually a very dangerous assumption being made here.

# migration to convert table to utf8
class ConvertWidgetsTableToUtf8Unicode < ActiveRecord::Migration
  def up
    ActiveRecord::Base.transaction do
      table_name = 'widgets'
      say "converting #{table_name} table to utf8_unicode_ci"

      execute("ALTER TABLE #{table_name} CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci")
      execute("ALTER TABLE #{table_name} DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci")
    end
  end
end

Notice how the utf8 conversion code is wrapped inside a transaction. The assumption here is that if something goes wrong the transaction will trigger a rollback. However, an ALTER TABLE command in MySQL causes an implicit commit. This means that the rollback will not undo any changes introduced by the ALTER TABLE command!


Iterating over a hash containing arrays

Last week I was implementing some auditing functionality in a rails app. At some point I was writing a page that would display how the attributes of a given ActiveRecord object had been changed. One of my colleagues spotted this and pointed out the following neat bit of syntactic sugar in Ruby.

changes = {:attribute_a => [1, 2], :attribute_b => [3, 4]}

changes.each do |attribute, (before, after)|
  puts "#{attribute}: #{before} - #{after}"
end
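# this prints:
#   attribute_a: 1 - 2
#   attribute_b: 3 - 4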

I later learned you can even do things like this.

data = {:foo => [[1, 2], 3]}

data.each do |key, ((a, b), c)|
  puts "#{key}: #{a} - #{b} - #{c}"
end
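# this prints:
#   foo: 1 - 2 - 3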

URI.js and URL manipulation in rails

Manipulating urls in javascript often ends up being an exercise in string interpolation. This rarely produces good-looking code. Recently we’ve started enforcing the use of the URI.js library to combat this.

Our new approach has us embed any necessary urls in hidden input fields on the web page in question. Rather than hardcoding these urls, we use the named route functionality offered by rails as this provides more flexibility. When the page gets rendered, these named routes are converted to actual urls through ERB templating. The embedded urls can then be fetched by javascript code and manipulated with URI.js.

It’s no silver bullet, but the resulting code is a lot more readable.


The css !important keyword

Today I learned about the css !important keyword. I was trying to change the way code snippets (gists) were being displayed on a site, but found my css rules being ignored.

As it turned out, the javascript snippets used for embedding gists were adding an additional css stylesheet to the page. Since this stylesheet was getting added after my own stylesheet, its rules had priority over my own. The solution was to add !important to my own rules.

.gist-data {
  border-bottom: 1px !important;
}

Finding models from strings with rails

Imagine you have a Widget model that stores data in a table ‘widgets’. At some point in your rails app you find yourself being given a string ‘Widget’ and are asked to find the Widget model. This can be done like shown here.

str = 'Widget'
model = str.constantize

However, things get a bit harder when you have multiple Widget model subclasses (Widget::A, Widget::B), all of which are stored in the widgets table. This time around you’re given the string ‘Widget::A’ and are asked to get the Widget model.

In order to solve this we’ll need to ask the Widget::A model to give us its table name. If you’re following rails conventions you can then in turn use the table name to get the model you need.

str = 'Widget::A'
model = str.constantize.table_name.classify.constantize

Note that the above will only work if you’ve followed rails naming conventions though :).
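
To make the chain a bit less magical, here is each step broken out (assuming Widget::A subclasses Widget and rails conventions are being followed):

'Widget::A'.constantize                                  # => Widget::A
'Widget::A'.constantize.table_name                       # => "widgets"
'Widget::A'.constantize.table_name.classify              # => "Widget"
'Widget::A'.constantize.table_name.classify.constantize  # => Widget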


Retrieving data in a time range with rails

I’m writing this mostly as a reminder to myself, since I keep forgetting this :)

Instead of:

widgets = Widget.where("? <= created_at AND created_at <= ?", time_from, time_to)

do this:

widgets = Widget.where(:created_at => time_from .. time_to)

GET vs POST

Today I was looking into why a particular GET request was failing on IE. As it turned out this was due to IE not appreciating long query strings. While going through our nginx logs, we also found nginx had a default query string limit that was being hit sporadically by some other customers as well. The solution in both cases was to move the affected calls from GET to POST.

The above problem prompted me to take a closer look at the differences between GET and POST requests. You probably use these all the time, but do you know how each of them functions?

GET requests

  • can be bookmarked
  • can be cached for faster response time on subsequent request
  • request is stored in browser history
  • uses query strings to send data. There is a limit to the allowable length of a query string.
  • have their url and query strings stored in plaintext in server logs. This is why you should never send passwords over GET requests!
  • use these for actions that retrieve data. For example, you don’t want to use GET requests for posting comments on your blog. Otherwise an attacker could copy a url that posts a specific comment and put it on twitter. Every time someone clicked that link, a new comment would get posted on your blog.

POST requests

  • cannot be bookmarked
  • cannot be cached
  • request will not be stored in browser history
  • uses the request body to send data. There is no inherent limit to the amount of data that can be sent, although servers may impose their own limits.
  • have their url stored in plaintext in server logs. The data itself will not be logged though.
  • use these for actions that alter data
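
To make the difference concrete, here is roughly what both request types look like when made from Ruby’s standard library (the url and parameters are placeholders):

require 'net/http'
require 'uri'

# GET: the data travels in the query string
uri = URI('http://example.com/widgets?name=sprocket')
Net::HTTP.get_response(uri)

# POST: the data travels in the request body
uri = URI('http://example.com/widgets')
Net::HTTP.post_form(uri, 'name' => 'sprocket')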

The dig command

Today I learned of the existence of the dig command. A very useful little tool for DNS lookups. Here’s an example of it in action.

$ dig www.google.com

; <<>> DiG 9.8.3-P1 <<>> www.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4868
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.com.			IN	A

;; ANSWER SECTION:
www.google.com.		72	IN	A	74.125.24.105
www.google.com.		72	IN	A	74.125.24.103
www.google.com.		72	IN	A	74.125.24.104
www.google.com.		72	IN	A	74.125.24.99
www.google.com.		72	IN	A	74.125.24.147
www.google.com.		72	IN	A	74.125.24.106

;; Query time: 11 msec
;; SERVER: 192.168.0.1#53(192.168.0.1)
;; WHEN: Sat Aug 29 13:38:48 2015
;; MSG SIZE  rcvd: 128

Profiling rails assets precompilation

Assets precompilation on rails can take a fair bit of time. This is especially annoying in scenarios where you want to deploy your app multiple times a day. Let’s see if we can come up with a way to actually figure out where all this time is being spent. Also, while I will be focusing on rails 3.2 in this post, the general principle should be easy enough to apply to other rails versions.

Our first order of business is finding the asset precompilation logic. A bit of digging will turn up the assets.rake file for rails 3.2. The relevant code starts on lines 59-67 and from there on out invokes methods throughout the entire file.

# lines 59-67 of assets.rake
task :all do
  Rake::Task["assets:precompile:primary"].invoke
  # We need to reinvoke in order to run the secondary digestless
  # asset compilation run - a fresh Sprockets environment is
  # required in order to compile digestless assets as the
  # environment has already cached the assets on the primary
  # run.
  if Rails.application.config.assets.digest
    ruby_rake_task("assets:precompile:nondigest", false)
  end
end

When we follow the calls made by the code above we can see that the actual compilation takes place on lines 50-56 of assets.rake and is done by the compile method of the Sprockets::StaticCompiler class.

# compile method of Sprockets::StaticCompiler class
def compile
  manifest = {}
  env.each_logical_path(paths) do |logical_path|
    if asset = env.find_asset(logical_path)
      digest_path = write_asset(asset)
      manifest[asset.logical_path] = digest_path
      manifest[aliased_path_for(asset.logical_path)] = digest_path
    end
  end
  write_manifest(manifest) if @manifest
end

Now that we know which code does the compiling, we can think of two ways to add some profiling to it. We could check out the rails repo from Github, modify it locally, and point our Gemfile to our modified local version of rails. Or we could create a new rake task and monkey patch the compile method of the Sprockets::StaticCompiler class. We’ll go with the second option here as it is the more straightforward of the two to implement.

We’ll create a new rake file in the /lib/tasks folder of our rails app and name it profile_assets_precompilation.rake. We then copy the contents of assets.rake into it, and wrap this code inside a new ‘profile’ namespace so as to avoid conflicts. At the top of this file we’ll also add our monkey patched compile method so as to make it output profiling info. The resulting file should look like shown below.

namespace :profile do
  # monkey patch the compile method to output compilation times
  module Sprockets
    class StaticCompiler
      def compile
        manifest = {}
        env.each_logical_path(paths) do |logical_path|
          start_time = Time.now.to_f

          if asset = env.find_asset(logical_path)
            digest_path = write_asset(asset)
            manifest[asset.logical_path] = digest_path
            manifest[aliased_path_for(asset.logical_path)] = digest_path
          end

          # our profiling code
          duration = Time.now.to_f - start_time
          puts "#{logical_path} - #{duration.round(3)} seconds"
        end
        write_manifest(manifest) if @manifest
      end
    end
  end

  # contents of assets.rake
  namespace :assets do
    def ruby_rake_task(task, fork = true)
      env    = ENV['RAILS_ENV'] || 'production'
      groups = ENV['RAILS_GROUPS'] || 'assets'
      args   = [$0, task,"RAILS_ENV=#{env}","RAILS_GROUPS=#{groups}"]
      args << "--trace" if Rake.application.options.trace
      if $0 =~ /rake\.bat\Z/i
        Kernel.exec $0, *args
      else
        fork ? ruby(*args) : Kernel.exec(FileUtils::RUBY, *args)
      end
    end

    # We are currently running with no explicit bundler group
    # and/or no explicit environment - we have to reinvoke rake to
    # execute this task.
    def invoke_or_reboot_rake_task(task)
      if ENV['RAILS_GROUPS'].to_s.empty? || ENV['RAILS_ENV'].to_s.empty?
        ruby_rake_task task
      else
        Rake::Task[task].invoke
      end
    end

    desc "Compile all the assets named in config.assets.precompile"
    task :precompile do
      invoke_or_reboot_rake_task "assets:precompile:all"
    end

    namespace :precompile do
      def internal_precompile(digest=nil)
        unless Rails.application.config.assets.enabled
          warn "Cannot precompile assets if sprockets is disabled. Please set config.assets.enabled to true"
          exit
        end

        # Ensure that action view is loaded and the appropriate
        # sprockets hooks get executed
        _ = ActionView::Base

        config = Rails.application.config
        config.assets.compile = true
        config.assets.digest  = digest unless digest.nil?
        config.assets.digests = {}

        env      = Rails.application.assets
        target   = File.join(Rails.public_path, config.assets.prefix)
        compiler = Sprockets::StaticCompiler.new(env,
                                                 target,
                                                 config.assets.precompile,
                                                 :manifest_path => config.assets.manifest,
                                                 :digest => config.assets.digest,
                                                 :manifest => digest.nil?)
        compiler.compile
      end

      task :all do
        Rake::Task["assets:precompile:primary"].invoke
        # We need to reinvoke in order to run the secondary digestless
        # asset compilation run - a fresh Sprockets environment is
        # required in order to compile digestless assets as the
        # environment has already cached the assets on the primary
        # run.
        ruby_rake_task("assets:precompile:nondigest", false) if Rails.application.config.assets.digest
      end

      task :primary => ["assets:environment", "tmp:cache:clear"] do
        internal_precompile
      end

      task :nondigest => ["assets:environment", "tmp:cache:clear"] do
        internal_precompile(false)
      end
    end

    desc "Remove compiled assets"
    task :clean do
      invoke_or_reboot_rake_task "assets:clean:all"
    end

    namespace :clean do
      task :all => ["assets:environment", "tmp:cache:clear"] do
        config = Rails.application.config
        public_asset_path = File.join(Rails.public_path, config.assets.prefix)
        rm_rf public_asset_path, :secure => true
      end
    end

    task :environment do
      if Rails.application.config.assets.initialize_on_precompile
        Rake::Task["environment"].invoke
      else
        Rails.application.initialize!(:assets)
        Sprockets::Bootstrap.new(Rails.application).run
      end
    end
  end
end

Now we can run bundle exec rake profile:assets:precompile to precompile our assets while outputting profiling info. Hopefully we can now finally figure out why this is always taking so long :).


Regarding if statement scope in Ruby

I recently learned that if statements in Ruby do not introduce a new scope. This means that you can write code like that shown below and it’ll work just fine.

# perfectly valid Ruby code
if true
  foo = 5
end

puts foo
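
Contrast this with blocks, which do introduce a new scope:

[1, 2, 3].each do |i|
  bar = i
end

puts bar # raises NameError: undefined local variable or method `bar'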

At first this lack of scoping seemed a bit weird to me. It wasn’t until I read this that I realized Ruby was even more versatile than I had first thought. As it turns out, it is this somewhat unconventional scoping rule that allows us to conditionally replace methods.

if foo == 5
  def some_method
    # do something
  end
else
  def some_method
    # do something else
  end
end

As well as conditionally modify implementations.

if foo == 5
  class SomeClass
    # ...
  end
else
  module SomeModule
    # ...
  end
end

And that’s amazing!


EC2 instance cost comparison

Amazon’s pricing scheme for its ec2 instances never struck me as particularly transparent. Until recently some of my DevOps colleagues even estimated cost by cross-referencing instance details with pricing information. While this approach gives reasonable results for finding the cost of a given instance type, it doesn’t lend itself very well to comparing prices across a range of different types.

When I talked to an ex-colleague of mine about the hardships encountered for such a common task, he pointed me to this absolutely brilliant page. It’s so unbelievably simple and well thought-out that I can’t help getting ever so slightly annoyed with whoever is in charge of communicating Amazon’s pricing structure and the subpar job they are doing.