Ruby concurrency: in praise of the mutex

When reading about Ruby you will inevitably be introduced to the Global Interpreter Lock. This mechanism tends to come up in explanations of why Ruby threads run concurrently on a single core, rather than being scheduled across multiple cores in true parallel fashion. This single core scheduling approach also explains why adding threads to a Ruby program does not necessarily result in faster execution times.

This post will start by explaining some of the details behind the GIL. Next up, we’ll take a look at the three crucial concepts of concurrency: atomicity, visibility, and ordering. While most developers are familiar with atomicity, the concept of visibility is often not very well understood. We will be going over these concepts in quite some detail and will illustrate how to address their needs through correct usage of the mutex data structure.

Parallelism and the GIL

Ruby’s Global Interpreter Lock is a global lock around the execution of Ruby code. Before a Ruby thread can execute any code, it first needs to acquire this lock. A thread holding the GIL will be forced to release it after a certain amount of time, at which point another Ruby thread gets a chance to acquire it. As the GIL can only be held by one thread at a time, it effectively prevents two Ruby threads from executing at the same time.

Luckily Ruby comes with an optimization that forces threads to let go of the GIL when they find themselves waiting on blocking IO to complete. Such threads will use the ppoll system call to be notified when their blocking IO has finished. Only then will they attempt to reacquire the GIL. This type of behavior holds true for all blocking IO calls, as well as backtick and system calls. So even with the Global Interpreter Lock, Ruby is still able to have moments of true parallelism.
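A rough way to observe this is sketched below. It assumes that sleep is an acceptable stand-in for a blocking IO call (like blocking IO, it releases the GIL while waiting). The thread count and durations are arbitrary choices for illustration.

```ruby
# gil_release.rb
# Four threads each wait 0.2 seconds. If waiting threads held on to the GIL,
# the waits would run back to back and take roughly 0.8 seconds in total.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

threads = 4.times.map do
  Thread.new { sleep 0.2 } # stand-in for a blocking IO call
end
threads.each(&:join)

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts format('elapsed: %.2f seconds', elapsed)
```

On MRI the elapsed time should land close to 0.2 seconds, as all four waits overlap.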

Note that the GIL is specific to the default Ruby interpreter (MRI) which relies on a global lock to protect its internals from race conditions. The GIL also makes it possible to safely interface the MRI interpreter with C libraries that may not be thread-safe themselves. Other interpreters have taken different approaches to the concept of a global lock; Rubinius opts for a collection of fine-grained locks instead of a single global one, whereas JRuby does not use global locking at all.

Concurrency and the Mutex

There are three crucial concepts to concurrency: atomicity, visibility, and ordering. We’ll be taking a look at how Ruby’s mutex data structure addresses these. It is worth pointing out that different languages tackle these concepts in different ways. As such, the mutex-centric approach described here is only guaranteed to work in Ruby.

Atomicity

Atomicity is probably the best-known concurrency concept. A section of code is said to atomically modify the state of an object if all other threads are unable to see any of the intermediate states of the object being modified. These other threads either see the state as it was before the operation, or they see the state as it is after the operation.

In the example below we have created a counters array that holds ten entries, each of which is set to zero. This array represents an object that we want to modify, and its entries represent its internal state. Let’s say we have five threads, each of which executes a loop for 100,000 iterations that increments every entry by one. Intuitively we’d expect the output of this to be an array with each entry set to 500,000. However, as we can see below, this is not the case.

# atomicity.rb
counters = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

threads = 5.times.map do
  Thread.new do
    100000.times do
      counters.map! { |counter| counter + 1 }
    end
  end
end
threads.each(&:join)

puts counters.to_s
# => [500000, 447205, 500000, 500000, 500000, 500000, 203656, 500000, 500000, 500000]

The reason for this unexpected output is that counters.map! { |counter| counter + 1 } is not atomic. For example, imagine that our first thread has just read the value of the first entry, incremented it by one, and is now getting ready to write this incremented value to the first entry of our array. However, before our thread can write this incremented value, it gets interrupted by the second thread. This second thread then goes on to read the current value of the first entry, increments it by one, and succeeds in writing the result back to the first entry of our array. Now we have a problem!

We have a problem because the first thread got interrupted before it had a chance to write its incremented value to the array. When the first thread resumes, it will end up overwriting the value that the second thread just placed in the array. This essentially causes us to lose an increment operation, which explains why our program output has entries in it that are less than 500,000.
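The lost increment can be reproduced deterministically by interleaving the steps of the two threads by hand. This is only a single-threaded illustration of the scenario described above; the variable names are made up for the occasion.

```ruby
# lost_update.rb
# Hand-interleaved version of the race: both "threads" read the same value
# before either of them writes its result back.
counter = 0

read_by_first  = counter            # first thread reads 0
read_by_second = counter            # second thread reads 0 as well

counter = read_by_second + 1        # second thread writes 1
counter = read_by_first + 1         # first thread overwrites it with 1

puts counter
```

Two increments were performed, yet the counter only went up by one.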

It should hopefully be clear that none of this would have happened if we had made sure that counters.map! { |counter| counter + 1 } was atomic. This would have made it impossible for the second thread to just come in and modify the intermediate state of the counters array.

# atomicity.rb
mutex = Mutex.new
counters = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

threads = 5.times.map do
  Thread.new do
    100000.times do
      mutex.synchronize do
        counters.map! { |counter| counter + 1 }
      end
    end
  end
end
threads.each(&:join)

puts counters.to_s
# => [500000, 500000, 500000, 500000, 500000, 500000, 500000, 500000, 500000, 500000]

Atomicity can be accomplished by using a mutex as a locking mechanism that ensures no two threads can simultaneously execute the same section of code. The code above shows how we can prevent a thread executing counters.map! { |counter| counter + 1 } from being interrupted by other threads wanting to execute the same code. Also, be sure to note that mutex.synchronize only prevents a thread from being interrupted by others wanting to execute code wrapped inside the same mutex variable!
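One way to make sure every access goes through the same mutex is to keep the mutex and the data it protects together inside a single object. The SafeCounters class below is a hypothetical sketch of that idea and not something that ships with Ruby.

```ruby
# safe_counters.rb
# Hypothetical wrapper: the mutex lives next to the array it protects,
# so callers cannot accidentally lock a different mutex.
class SafeCounters
  def initialize(size)
    @mutex = Mutex.new
    @counters = Array.new(size, 0)
  end

  # increments every entry as one atomic step
  def increment_all
    @mutex.synchronize { @counters.map! { |counter| counter + 1 } }
  end

  # returns a copy taken while holding the same mutex
  def to_a
    @mutex.synchronize { @counters.dup }
  end
end

counters = SafeCounters.new(10)
threads = 5.times.map do
  Thread.new { 10_000.times { counters.increment_all } }
end
threads.each(&:join)

result = counters.to_a
puts result.to_s
```

The iteration count is smaller than in the example above purely to keep the run short.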

Visibility

Visibility determines when the results of the actions performed by a thread become visible to other threads. For example, when a thread wants to write an updated value to memory, that updated value may sit in a CPU cache for a while before it gets flushed to main memory. Other threads that read from that memory in the meantime will therefore end up with a stale value!

The code below shows an example of the visibility problem. Here we have several threads flipping the boolean values in the flags array over and over again. The code responsible for changing these values is wrapped inside a mutex, so we know the intermediate states of the flags array won’t be visible to other threads. We would thus expect the output of this program to contain the same boolean value for every entry of this array. However, we shall soon see that this does not always hold true.

# visibility.rb
mutex = Mutex.new
flags = [false, false, false, false, false, false, false, false, false, false]

threads = 50.times.map do
  Thread.new do
    100000.times do
      puts flags.to_s
      mutex.synchronize do
        flags.map! { |f| !f }
      end
    end
  end
end
threads.each(&:join)

$ ruby visibility.rb > visibility.log
$ grep -Hnri 'true, false' visibility.log | wc -l
    30

This code will produce five million lines of output. We’ll use the > operator to write all these lines to a file. Having done this, we can then grep for inconsistencies in the output. We would expect every line of the output to contain an array with all its entries set to the same boolean value. However, it turns out that this only holds true for 99.9994% of all lines. Sometimes the flipped boolean values don’t get written to memory fast enough, causing other threads to read stale data. This is a great illustration of the visibility problem.

Luckily we can solve this problem by using a memory barrier. A memory barrier enforces an ordering constraint on memory operations thereby preventing the possibility of reading stale data. In Ruby, a mutex not only acts as an atomic lock, but also functions as a memory barrier. When wanting to read the value of a variable being modified by multiple threads, a memory barrier will effectively tell your program to wait until all in-flight memory writes are complete. In practice this means that if we use a mutex when writing to a variable, we need to use this same mutex when reading from that variable as well.

# visibility.rb
mutex = Mutex.new
flags = [false, false, false, false, false, false, false, false, false, false]

threads = 50.times.map do
  Thread.new do
    100000.times do
      mutex.synchronize do
        puts flags.to_s
      end
      mutex.synchronize do
        flags.map! { |f| !f }
      end
    end
  end
end
threads.each(&:join)

$ ruby visibility.rb > visibility.log
$ grep -Hnri 'true, false' visibility.log | wc -l
    0

As expected, this time we found zero inconsistencies in the output data due to us using the same mutex for both reading and writing the boolean values of the flags array. Do keep in mind that not all languages allow for using a mutex as a memory barrier, so be sure to check the specifics of your favorite language before going off to write concurrent code.
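The same check can also be done in-process rather than by grepping a log file. The sketch below is a scaled-down variant of the program above: snapshots are pushed onto a Queue instead of being printed, and the thread and iteration counts are reduced to keep the run short. These changes are my own and only serve the illustration.

```ruby
# visibility_check.rb
# Reads and writes both go through the same mutex; every snapshot taken
# under the mutex should therefore hold a single boolean value.
mutex = Mutex.new
flags = Array.new(10, false)
snapshots = Queue.new # thread-safe, so pushing needs no extra locking

threads = 10.times.map do
  Thread.new do
    1_000.times do
      snapshots << mutex.synchronize { flags.dup }
      mutex.synchronize { flags.map! { |f| !f } }
    end
  end
end
threads.each(&:join)

seen = []
seen << snapshots.pop until snapshots.empty?

# a snapshot is inconsistent if it mixes true and false
inconsistent = seen.count { |snapshot| snapshot.uniq.size > 1 }
puts "#{inconsistent} inconsistent snapshots out of #{seen.size}"
```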

Ordering

As if dealing with visibility isn’t hard enough, the Ruby interpreter is also allowed to change the order of the instructions in your code in an attempt at optimization. Before I continue I should point out that there is no official specification for the Ruby language. This can make it hard to find information about topics such as this. So I’m just going to describe how I think instruction reordering currently works in Ruby.

Your Ruby code gets compiled to bytecode by the Ruby interpreter. The interpreter is free to reorder your code in an attempt to optimize it. Executing this bytecode then results in a stream of CPU instructions, which the CPU is free to reorder as well. I wasn’t able to come up with example code that actually showcases this reordering behavior, so this next bit is going to be somewhat hand-wavy. Let’s say we were given the code shown below (original source).

# ordering.rb
a = false
b = false

thr1 = Thread.new do
  a = true
  b = true
end

thr2 = Thread.new do
  r1 = b # could see true
  r2 = a # could see false
  r3 = a # could see true
  puts((r1 && !r2) && r3) # could print true
end

thr1.join
thr2.join

Since there are a lot of ways for instruction reordering to take place, it is not impossible for b = true to be executed before a = true. In theory, this could therefore allow for thr2 to end up outputting true. This is rather counterintuitive, as this would only be possible if the variable b had changed value before the variable a.

Luckily there is no need to worry too much about this. When looking at the code above, it should be obvious that code reordering is going to be the least of its problems. The lack of any kind of synchronization to help deal with atomicity and visibility issues in this threaded program is going to cause way bigger headaches than code reordering ever could.

Those synchronization issues can be fixed by using a mutex. By introducing a mutex we are explicitly telling the interpreter and CPU how our code should behave, thus preventing any problematic code reordering from occurring. Dealing with atomicity and visibility issues will therefore implicitly prevent any dangerous code reordering.
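A sketch of what this could look like for the example above: both the writes and the reads go through one mutex, so the reader can only ever observe the state from before or after both assignments. The violations counter and the loop count are additions of mine for the sake of the illustration.

```ruby
# ordering_with_mutex.rb
mutex = Mutex.new
a = false
b = false
violations = 0

writer = Thread.new do
  # both assignments happen as one unit from the reader's point of view
  mutex.synchronize do
    a = true
    b = true
  end
end

reader = Thread.new do
  1_000.times do
    r1, r2 = mutex.synchronize { [b, a] }
    # seeing b == true while a == false would mean the writes were
    # observed out of order
    violations += 1 if r1 && !r2
  end
end

[writer, reader].each(&:join)
puts "violations: #{violations}"
```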

Conclusion

I hope this post has helped show just how easy it can be to introduce bugs in concurrent code. In my experience, the concept of memory barriers is often poorly understood, which can result in introducing some incredibly hard to find bugs. Luckily, as we saw in this post, the mutex data structure can be a veritable panacea for addressing these issues in Ruby.

Please feel free to contact me if you think I got anything wrong. While all of the above is correct to the best of my knowledge, the lack of an official Ruby specification can make it hard to locate information that is definitively without error.


How to write your own rspec retry mechanism

Imagine you have an rspec test suite filled with system tests. Each system test simulates how a real user would interact with your app by opening a browser session through which it fills out text fields, clicks on buttons, and sends data to public endpoints. Unfortunately, browser drivers are not without bugs and sometimes your tests will fail because of these. Wouldn’t it be nice if we could automatically retry these failed tests?

This article starts by investigating how rspec formatters can be used to help us keep track of failed tests. Next, we’ll use this information to take a first stab at creating a rake task that can automatically retry failed tests. Lastly, we’ll explore how to further improve our simple rake task so as to make it ready for use in production.

Note that any code shown in this post is only guaranteed to work with rspec 3.3. In the past I’ve written similar code for other rspec versions as well though. So don’t worry, it shouldn’t be too hard to get all of this to work on whatever rspec version you find yourself using.

Rspec formatters

Rspec generates its command line output by relying on formatters that receive messages on events like example_passed and example_failed. We can use these hooks to help us keep track of failed tests by having them write the descriptions of failed tests to a text file named tests_failed. Our FailureFormatter class does just that.

# failure_formatter.rb
require 'rspec/core/formatters/progress_formatter'

class FailureFormatter < RSpec::Core::Formatters::ProgressFormatter
  RSpec::Core::Formatters.register self, :example_failed

  def example_failed(notification)
    super
    File.open('tests_failed', 'a') do |file|
      file.puts(notification.example.full_description)
    end
  end
end

We’ll soon have a look at how tests behave when we try to run them with the formatter shown above. But first, let’s prepare some example tests. We’ll create two tests. One of which will always pass, and another one which will always fail.

# my_fake_tests_spec.rb
describe 'my fake tests', :type => :feature do

  it 'this scenario should pass' do
    expect(true).to eq true
  end

  it 'this scenario should fail' do
    expect(false).to eq true
  end
end

Having done that, we can now run our tests with the FailureFormatter we wrote earlier. As you can see below, we’ll have to pass both --require and --format params in order to get our formatter to work. I’m also using the --no-fail-fast flag so as to prevent our test suite from exiting upon encountering its first failure.

$ bundle exec rspec --require ./spec/formatters/failure_formatter.rb --format FailureFormatter --no-fail-fast
.F

Failures:

  1) my fake tests this scenario should fail
     Failure/Error: expect(false).to eq true

       expected: true
            got: false

       (compared using ==)
     # ./spec/my_fake_tests_spec.rb:8:in `block (2 levels) in <top (required)>'

Finished in 0.02272 seconds (files took 0.0965 seconds to load)
2 examples, 1 failure

Failed examples:

rspec ./spec/my_fake_tests_spec.rb:7 # my fake tests this scenario should fail

After running this, we should now have a tests_failed file that contains a single line describing our failed test. As we can see in the snippet below, this is indeed the case.

$ cat tests_failed

my fake tests this scenario should fail

Take a moment to reflect on what we have just done. By writing just a few lines of code we have effectively created a logging mechanism that will help us keep track of failed tests. In the next section we will look at how we can make use of this mechanism to automatically rerun failed tests.

First pass at creating the retry task

In this section we will create a rake task that runs our rspec test suite and automatically retries any failed tests. The finished rake task is shown below. For now, have a look at this code and then we’ll go over its details in the next few paragraphs.

require 'fileutils'

task :rspec_with_retries, [:max_tries] do |_, args|
  max_tries = args[:max_tries].to_i

  # construct initial rspec command
  command = 'bundle exec rspec --require ./spec/formatters/failure_formatter.rb --format FailureFormatter --no-fail-fast'

  max_tries.times do |t|
    puts "\n"
    puts '##########'
    puts "### STARTING TEST RUN #{t + 1} OUT OF A MAXIMUM OF #{max_tries}"
    puts "### executing command: #{command}"
    puts '##########'

    # delete tests_failed file left over by previous run
    FileUtils.rm('tests_failed', :force => true)

    # run tests
    puts `#{command}`

    # early out
    exit 0 if $?.exitstatus.zero?
    exit 1 if (t == max_tries - 1)

    # determine which tests need to be run again
    failed_tests = []
    File.open('tests_failed', 'r') do |file|
      failed_tests = file.readlines.map { |line| "\"#{line.strip}\"" }
    end

    # construct command to rerun just the failed tests
    command  = ['bundle exec rspec']
    command += Array.new(failed_tests.length, '-e').zip(failed_tests).flatten
    command += ['--require ./spec/formatters/failure_formatter.rb --format FailureFormatter --no-fail-fast']
    command = command.join(' ')
  end
end

The task executes the bundle exec rspec command a max_tries number of times. The first iteration runs the full rspec test suite with the FailureFormatter class and writes the descriptions of failed tests to a tests_failed file. Subsequent iterations read from this file and use the -e option to rerun the tests listed there.

Note that these subsequent iterations use the FailureFormatter as well. This means that any tests that failed during the second iteration will get written to the tests_failed file to be retried by the third iteration. This continues until we reach the max number of iterations or until one of our iterations has all its tests pass.
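Stripped of rspec and subprocess details, the control flow of the task boils down to the loop sketched here. Note that run_with_retries is a hypothetical helper written for this illustration, and the block stands in for an actual rspec run by returning the names of the tests that failed.

```ruby
# retry_loop.rb
# The block receives the tests to run plus the attempt index and must
# return the subset of tests that failed.
def run_with_retries(tests, max_tries)
  pending = tests
  max_tries.times do |attempt|
    failed = yield(pending, attempt)
    return { passed: true, attempts: attempt + 1 } if failed.empty?
    pending = failed # only the failures get retried
  end
  { passed: false, attempts: max_tries, failed: pending }
end

# fake runner: 'flaky' fails on the first attempt only
runs = []
outcome = run_with_retries(%w[stable flaky], 3) do |tests, attempt|
  runs << tests
  tests.select { |test| test == 'flaky' && attempt.zero? }
end

puts outcome
puts runs.inspect
```

The first run covers both tests, while the second run only covers the one that failed.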

Every iteration deletes the tests_failed file from the previous iteration. For this we use the FileUtils.rm method with the :force flag set to true. This flag ensures that the program doesn’t crash in case the tests_failed file doesn’t exist. The above code relies on backticks to execute the bundle exec rspec subprocess. Because of this we need to use the global variable $? to access the exit status of this subprocess.
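One caveat with building a shell command from a string: wrapping each description in double quotes breaks as soon as a description contains a double quote or another character that is special to the shell. A possible way around this, sketched here with the standard library's Shellwords module, is to assemble the command as an array and let Shellwords.join handle the escaping. The rerun_command helper is a hypothetical name.

```ruby
# rerun_command.rb
require 'shellwords'

# Builds the shell command that reruns only the given test descriptions.
def rerun_command(failed_descriptions)
  argv = ['bundle', 'exec', 'rspec']
  failed_descriptions.each { |description| argv.push('-e', description) }
  argv.push('--require', './spec/formatters/failure_formatter.rb',
            '--format', 'FailureFormatter', '--no-fail-fast')
  Shellwords.join(argv) # escapes every argument for use in a shell
end

tricky = 'my fake tests say "hi" & exit'
command = rerun_command([tricky])
puts command
```

Splitting the resulting string with Shellwords.split gives back the original arguments, tricky description included.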

Below you can see the output of a run of our rake task. Notice how the first iteration runs both of our tests, whereas the second and third iterations rerun just the failed test. This shows our retry mechanism is indeed working as expected.

$ rake rspec_with_retries[3]

##########
### STARTING TEST RUN 1 OUT OF A MAXIMUM OF 3
### executing command: bundle exec rspec --require ./spec/formatters/failure_formatter.rb --format FailureFormatter --no-fail-fast
##########
.F

Failures:

  1) my fake tests this scenario should fail
     Failure/Error: expect(false).to eq true

       expected: true
            got: false

       (compared using ==)
     # ./spec/my_fake_tests_spec.rb:8:in `block (2 levels) in <top (required)>'

Finished in 0.02272 seconds (files took 0.0965 seconds to load)
2 examples, 1 failure

Failed examples:

rspec ./spec/my_fake_tests_spec.rb:7 # my fake tests this scenario should fail


##########
### STARTING TEST RUN 2 OUT OF A MAXIMUM OF 3
### executing command: bundle exec rspec -e "my fake tests this scenario should fail" --require ./spec/formatters/failure_formatter.rb --format FailureFormatter --no-fail-fast
##########
Run options: include {:full_description=>/my\ fake\ tests\ this\ scenario\ should\ fail/}
F

Failures:

  1) my fake tests this scenario should fail
     Failure/Error: expect(false).to eq true

       expected: true
            got: false

       (compared using ==)
     # ./spec/my_fake_tests_spec.rb:8:in `block (2 levels) in <top (required)>'

Finished in 0.02286 seconds (files took 0.09094 seconds to load)
1 example, 1 failure

Failed examples:

rspec ./spec/my_fake_tests_spec.rb:7 # my fake tests this scenario should fail


##########
### STARTING TEST RUN 3 OUT OF A MAXIMUM OF 3
### executing command: bundle exec rspec -e "my fake tests this scenario should fail" --require ./spec/formatters/failure_formatter.rb --format FailureFormatter --no-fail-fast
##########
Run options: include {:full_description=>/my\ fake\ tests\ this\ scenario\ should\ fail/}
F

Failures:

  1) my fake tests this scenario should fail
     Failure/Error: expect(false).to eq true

       expected: true
            got: false

       (compared using ==)
     # ./spec/my_fake_tests_spec.rb:8:in `block (2 levels) in <top (required)>'

Finished in 0.02378 seconds (files took 0.09512 seconds to load)
1 example, 1 failure

Failed examples:

rspec ./spec/my_fake_tests_spec.rb:7 # my fake tests this scenario should fail

The goal of this section was to introduce the general idea behind our retry mechanism. There are however several shortcomings in the code that we’ve shown here. The next section will focus on identifying and fixing these.

Perfecting the retry task

The code in the previous section isn’t all that bad, but there are a few things related to the bundle exec rspec subprocess that we can improve upon. In particular, using backticks to initiate subprocesses has several downsides:

  • the standard output stream of the subprocess gets written into a buffer which we cannot print until the subprocess finishes
  • the standard error stream does not even get written to this buffer
  • the backticks approach does not return the id of the subprocess to us

This last downside is especially bad as not having the subprocess id makes it hard for us to cancel the subprocess in case the rake task gets terminated. This is why I prefer to use the childprocess gem for handling subprocesses instead.

require 'fileutils'
require 'childprocess'

task :rspec_with_retries, [:max_tries] do |_, args|
  max_tries = args[:max_tries].to_i

  # exit hook to ensure rspec process gets stopped when CTRL+C (SIGINT) is pressed
  # needs to be set outside the times loop as otherwise each iteration would add its
  # own at_exit hook
  process = nil
  at_exit do
    process.stop unless process.nil?
  end

  # construct initial rspec command
  command = ['bundle', 'exec', 'rspec', '--require', './spec/formatters/failure_formatter.rb', '--format', 'FailureFormatter', '--no-fail-fast']

  max_tries.times do |t|
    puts "\n"
    puts '##########'
    puts "### STARTING TEST RUN #{t + 1} OUT OF A MAXIMUM OF #{max_tries}"
    puts "### executing command: #{command.join(' ')}"
    puts '##########'

    # delete tests_failed file left over by previous run
    FileUtils.rm('tests_failed', :force => true)

    # run tests in separate process
    process = ChildProcess.build(*command)
    process.io.inherit!
    process.start
    process.wait

    # early out
    exit 0 if process.exit_code.zero?
    exit 1 if (t == max_tries - 1)

    # determine which tests need to be run again
    failed_tests = []
    File.open('tests_failed', 'r') do |file|
      failed_tests = file.readlines.map { |line| line.strip }
    end

    # construct command to rerun just the failed tests
    command  = ['bundle', 'exec', 'rspec']
    command += Array.new(failed_tests.length, '-e').zip(failed_tests).flatten
    command += ['--require', './spec/formatters/failure_formatter.rb', '--format', 'FailureFormatter', '--no-fail-fast']
  end
end

As we can see from the line process = ChildProcess.build(*command), this gem makes it trivial to get hold of a handle on the subprocess, including its id. This then allows us to write an at_exit hook that shuts this subprocess down upon termination of our rake task. For example, using ctrl+c to stop the rake task will now cause the rspec subprocess to stop as well.

This gem also makes it super easy to inherit the stdout and stderr streams from the parent process (our rake task). This means that anything that gets written to the stdout and stderr streams of the subprocess will now be written directly to the stdout and stderr streams of our rake task. Or in other words, our rspec subprocess is now able to output directly to the rake task’s terminal session. Having made these improvements, our rspec_with_retries task is now ready for use in production.
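For comparison, the three points that backticks fall short on (a subprocess id, inherited output streams, and an exit code) can also be sketched with nothing but the standard library. This is not what the task above uses; run_child is a hypothetical helper, and the child here is just a tiny Ruby one-liner.

```ruby
# spawn_sketch.rb
require 'rbconfig'

# Starts the command as a child process and waits for it to finish.
# The child inherits stdout and stderr from us by default, so its
# output shows up in our terminal as it is produced.
def run_child(*command)
  pid = Process.spawn(*command)    # returns the subprocess id right away
  _, status = Process.wait2(pid)   # blocks until the child exits
  [pid, status.exitstatus]
end

pid, exit_code = run_child(RbConfig.ruby, '-e', 'exit 3')
puts "child #{pid} exited with code #{exit_code}"
```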

Conclusion

I hope this post helped some people out there who find themselves struggling to deal with flaky tests. Please note that a retry mechanism such as this is really only possible because of rspec’s powerful formatters. Get in touch if you have any examples of other cool things built on top of this somewhat underappreciated feature!


The disaster that is Ruby's timeout method

On paper, Ruby’s timeout method looks like an incredibly useful piece of code. Ever had a network request occasionally slow down your entire program because it just wouldn’t finish? That’s where timeout comes in. It provides a hard guarantee that a block of code will be finished within a specified amount of time.

require 'timeout'

Timeout.timeout(5) do
  # block of code that should be interrupted if it takes more than 5 seconds
end

There’s one thing the documentation doesn’t tell you though. If any of the lines in that block of code introduces side effects that rely on the execution of later lines of code to leave things in a stable state, then using the timeout method is a great way to introduce instability in your program. Examples of this include pretty much any program that is not entirely without stateful information. Let’s have a closer look at this method to try and figure out what’s going on here exactly.
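Here is a small, contrived sketch of such a side effect. The balances hash and the amounts are made up, and sleep stands in for a slow operation. The transfer is interrupted after the money has left the first account but before it arrives in the second.

```ruby
# half_finished.rb
require 'timeout'

balances = { a: 100, b: 0 }

begin
  Timeout.timeout(0.1) do
    balances[:a] -= 50
    sleep 1             # stand-in for slow work; the timeout fires here
    balances[:b] += 50  # never reached
  end
rescue Timeout::Error
  # the caller only learns that time ran out, not how far the block got
end

puts balances
```

After the timeout the two balances no longer add up to 100.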

Exceptions absolutely anywhere

The problem with timeout is that it relies upon Ruby’s questionable ability to have one thread raise an exception absolutely anywhere in an entirely different thread. The idea is that when you place code inside a timeout block, a helper thread gets started in the background that goes to sleep for 5 seconds while your code executes. If your code is still running when the helper thread wakes up, the helper forcefully stops it by raising a Timeout::Error exception in the thread executing your block (actual implementation).

# raising_exceptions.rb
# threads can raise exceptions in other threads
thr = Thread.new do
  puts '...initializing resource'
  sleep 1

  puts '...using resource'
  sleep 1

  puts '...cleaning resource'
  sleep 1
end

sleep 1.5
thr.raise('raising an exception in the thread')

$ ruby raising_exceptions.rb

...initializing resource
...using resource

The problem with this approach is that the thread raising the exception does not care what code the other thread is executing at that moment. This means that the engineer responsible for the code that gets interrupted needs to assume an exception can get thrown from absolutely anywhere within her code. This is madness! No one can be expected to place exception catchers around every single block of code!
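To tie this back to timeout, here is a stripped-down sketch of a timeout built on Thread#raise. It is not the standard library's implementation; naive_timeout and NaiveTimeoutError are hypothetical names, and the sketch deliberately ignores the edge cases the real code has to deal with.

```ruby
# naive_timeout.rb
class NaiveTimeoutError < StandardError; end

def naive_timeout(seconds)
  caller_thread = Thread.current
  # helper thread: sleeps, then raises in whichever line the caller is at
  watcher = Thread.new do
    sleep seconds
    caller_thread.raise(NaiveTimeoutError, 'execution expired')
  end
  yield
ensure
  watcher.kill if watcher # block finished (or failed): cancel the helper
end

result = naive_timeout(1) { :finished }

timed_out = begin
  naive_timeout(0.1) { sleep 1 }
  false
rescue NaiveTimeoutError
  true
end

puts result
puts timed_out
```

Nothing in naive_timeout knows or checks what the block is doing when the exception lands.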

The following code further illustrates the problem of being able to raise an exception absolutely anywhere. Turns out that absolutely anywhere includes locations like the inside of ensure blocks. These locations are generally not designed for handling any exceptions at all. I hope you weren’t using an ensure block to terminate your database connection!

# ensure_block.rb
# raising exceptions inside an ensure block of another thread
# note how we never finish cleaning the resource here
thr = Thread.new do
  begin
    puts '...initializing resource'
    sleep 1

    raise 'something went wrong'

    puts '...using resource'
    sleep 1
  ensure
    puts '...started cleaning resource'
    sleep 1
    puts '...finished cleaning resource'
  end
end

sleep 1.5
thr.raise('raising an exception in the thread')

# prevent program from immediately terminating after raising exception
sleep 5

$ ruby ensure_block.rb

...initializing resource
...started cleaning resource

Real world example

Recently, I spent a lot of time working with the curb http client. I ended up wrapping quite a few of my curb calls within timeout blocks because of tight time constraints. However, this caused great instability within the system I was working on. Sometimes a call would work, whereas other times that very same call would throw an exception about an invalid handle. It was this that caused me to start investigating the timeout method.

After having a bit of a think, I came up with a proof of concept that showed beyond a doubt that the timeout method was introducing instability in the very internals of my http client. The finished proof of concept code can look a bit complex, so rather than showing the final code straightaway, I’ll run you through my thought process instead.

Let’s start with the basics and write some code that uses the http client to fetch a random google page. A randomized parameter is added to the google url in order to circumvent any client-side caching. The page fetch itself is wrapped inside a timeout block as we are interested in testing whether the timeout method is corrupting the http client.

# basics.rb
# timeout doesn't get triggered
require 'curb'
require 'timeout'

Timeout.timeout(1) do
  Curl.get("http://www.google.com?foo=#{rand}")
end

This code will rarely time out as a page fetch generally takes way less than one second to complete. This is why we’re going to wrap our page fetch inside an infinite while loop.

# infinite_loop.rb
# timeout gets triggered and Timeout::Error exception gets thrown
require 'curb'
require 'timeout'

Timeout.timeout(1) do
  while true
    Curl.get("http://www.google.com?foo=#{rand}")
  end
end

$ ruby infinite_loop.rb

/Users/vaneyckt/.rvm/gems/ruby-2.0.0-p594/gems/curb-0.8.8/lib/curl/easy.rb:68:
  in 'perform': execution expired (Timeout::Error)

The above code is now timing out and throwing a Timeout::Error exception. Next we want to determine whether the timing out of a page fetch could corrupt the internal state of the http client, thereby causing problems for a subsequent page fetch. We’ll need to make lots of page fetches to test this, so we’re going to wrap all of our current code inside another infinite while loop. Furthermore, we don’t want any Timeout::Error exceptions to break us out of this while loop, so we’re going to catch and ignore these exceptions inside the while loop we just created. This gives us our finished proof of concept code.

# proof_of_concept.rb
# timeout corrupts the very internals of the curb http client
require 'curb'
require 'timeout'

while true
  begin
    Timeout.timeout(1) do
      while true
        Curl.get("http://www.google.com?foo=#{rand}")
      end
    end
  rescue Timeout::Error => e
  end
end

$ ruby proof_of_concept.rb

/Users/vaneyckt/.rvm/gems/ruby-2.0.0-p594/gems/curb-0.8.8/lib/curl/easy.rb:67:
  in 'add': CURLError: The easy handle is already added to a multi handle
  (Curl::Err::MultiAddedAlready)

Running the above program will result in an exception being thrown after a few seconds. At some point, the timeout method is causing a Timeout::Error exception to be raised inside a critical code path of the http client. This badly timed Timeout::Error exception leaves the client in an invalid state, which in turn causes the next page fetch to fail with the exception shown above. Hopefully this illustrates why you should avoid creating programs that can have Timeout::Error exceptions pop up absolutely anywhere.

Conclusion

I hope this has convinced you there is nothing you can do to prevent timeout from doing whatever it wants to your program’s internal state. There is just no way a program can deal with Timeout::Error exceptions being able to potentially pop up absolutely anywhere. The only time you can really get away with using timeouts is when writing functional code that does not rely on any state. In all other cases, it is best to just avoid timeouts entirely.


A javascript closures recap

Javascript closures have always been one of those things that I used to navigate by intuition. Recently however, upon stumbling across some code that I did not quite grok, it became clear I should try and obtain a more formal understanding. This post is mainly intended as a quick recap for my future self. It won’t go into all the details about closures; instead it will focus on the bits that I found most helpful.

There seem to be very few step-by-step overviews of javascript closures. As a matter of fact, I only found two. Luckily they are both absolute gems. You can find them here and here. I heartily recommend both these articles to anyone wanting to gain a more complete understanding of closures.

Closure basics

I’m going to shamelessly borrow a few lines from the first of the two articles linked above to illustrate the basic concept of a closure.

function doSome() {
  var x = 10;

  function f(y) {
    return x + y;
  }
  return f;
}

var foo = doSome();
foo(20); // returns 30
foo(30); // returns 40

In the above example, the function f creates a closure. If you just look at f, it seems that the variable x is not defined. Actually, x is caught from the enclosing function. A closure is a function which closes (or survives) variables of the enclosing function. In the above example, the function f creates a closure because it closes the variable x into the scope of itself. If the closure object, a Function instance, is still alive, the closed variable x keeps alive. It’s like that the scope of the variable x is extended.

This is really all you need to know about closures: they refer to variables declared outside the scope of the function and by doing so keep these variables alive. Closure behavior can be entirely explained just by keeping these two things in mind.
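
A minimal sketch of these two points, using a hypothetical makeCounter function that is not part of the borrowed example: the inner function refers to count, which is declared outside of it, and that reference keeps count alive after makeCounter has returned.

```javascript
// makeCounter and count are hypothetical names used for illustration only.
function makeCounter() {
  var count = 0; // declared outside the scope of the inner function

  return function() {
    count = count + 1; // the closure refers to the enclosing variable
    return count;
  };
}

var counter = makeCounter();
counter(); // returns 1
counter(); // returns 2, count survived the return of makeCounter

var other = makeCounter();
other();   // returns 1, every call to makeCounter creates a fresh count
```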

Closures and primitive data types

The rest of this post will go over some code examples to illustrate the behavior of closures for both primitive and object params. In this section, we’ll have a look at the behavior of a closure with a primitive data type param.

Example 1

The code below will be our starting point for studying closures. Be sure to take a good look at it, as all our examples will be a variation of this code. Throughout this post, we are going to try and understand closures by examining the values returned by the foo() function.

var prim = 1;

var foo = function(p) {
  var f = function() {
    return p;
  }
  return f;
}(prim);

foo();    // returns 1
prim = 3;
foo();    // returns 1

When the javascript runtime wants to resolve the value returned by return p;, it finds that this p variable is the same as the p variable from var foo = function(p) {. In other words, there is no direct link between the p from return p; and the variable prim from var prim = 1;. We see this is true because assigning a new value to prim does not cause the value returned by foo() to change.

Example 2

Now let’s have a look at what happens when we make a small change to the previous code sample by adding the line p = 2; to it.

var prim = 1;

var foo = function(p) {
  var f = function() {
    return p;
  }
  p = 2;
  return f;
}(prim);

foo();    // returns 2
prim = 3;
foo();    // returns 2

The code above is interesting in that it shows that the p variable from return p; is indeed the same as the p variable from var foo = function(p) {. Even though the variable f gets created at a time when p is set to 1, the act of setting p to 2 does indeed cause the value returned by foo() to change. This is a great example of a closure keeping a closed variable alive.

Example 3

This sample shows code similar to the first, but this time we made the closure close over the prim variable.

var prim = 1;

var foo = function() {
  return prim;
}

foo();    // returns 1
prim = 3;
foo();    // returns 3

Here too we can make a similar deduction as we did for the previous samples. When the javascript runtime wants to resolve the value returned by return prim;, it finds that this prim variable is the same as the prim variable from var prim = 1;. This explains why setting prim to 3 causes the value returned by foo() to change.

Closures and objects

In this section we’ll see what happens when we take our code samples and change the param from a primitive data type to an object.

Example 1.a

The code below is interesting because in the previous section we saw that a similar example using a primitive param had both calls to foo() return the same value. So what’s different here? Let’s inspect how the runtime resolves the variables involved.

var obj = ["a"];

var foo = function(o) {
  var f = function() {
    return o.length;
  }
  return f;
}(obj);

foo();        // returns 1
obj[1] = "b"; // modifies the object pointed to by the obj var
obj[2] = "c"; // modifies the object pointed to by the obj var
foo();        // returns 3

When the runtime tries to resolve the variable o from return o.length;, it finds that this variable o is the same as the variable o from var foo = function(o) {. We saw this exact same thing in the previous section. Unlike the previous section, the variable o now contains a reference to an array object. This causes our closure to have a direct link to this array object, and thus any changes to it will get reflected in the output of foo(). This explains why the second call to foo() gives a different output than the first.

A good rule of thumb goes like this:

  • if a closed variable contains a value, then the closure links to that variable
  • if a closed variable contains a reference to an object, then the closure links to that object, and will pick up on any changes made to it

Example 1.b

Note that the closure will only pick up on changes made to the particular object that was present when the closure was created. Assigning a new object to the obj variable after the closure was created will have no effect. The code below illustrates this.

var obj = ["a"];

var foo = function(o) {
  var f = function() {
    return o.length;
  }
  return f;
}(obj);

foo();                 // returns 1
obj = ["a", "b", "c"]; // assign a new array object to the obj variable
foo();                 // returns 1

In fact, this code is practically identical to the code from Example 1 of the previous section.

Example 2

We’ll now modify the previous code sample a bit. This time we’ll take a look at what happens when we add the line o[1] = "b";.

var obj = ["a"];

var foo = function(o) {
  var f = function() {
    return o.length;
  }
  o[1] = "b";
  return f;
}(obj);

foo();        // returns 2
obj[1] = "b";
obj[2] = "c";
foo();        // returns 3

Once again, we can start by reasoning about how the runtime resolves the variable o from return o.length;. As you probably know by now, this variable o is the same as the variable o from var foo = function(o) {. And since it contains a reference to an object, any changes to this object will get reflected in the output of foo(). This explains why the first call to foo() now returns 2, whereas previously it was returning 1.

Example 3

If you managed to make it this far, this last bit of code should hold no surprises for you.

var obj = ["a"];

var foo = function() {
  return obj.length;
}

foo();        // returns 1
obj[1] = "b";
obj[2] = "c";
foo();        // returns 3

The runtime will resolve the variable obj from return obj.length; to be the same as the variable obj from var obj = ["a"];. As a result, any changes to the obj variable will have an effect on the output of foo().
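
Because this closure links to the obj variable itself, it should also pick up on a new object being assigned to obj. The sketch below is a variation on the code above rather than one of the original examples. It contrasts with Example 1.b, where the closure was linked to the o param instead.

```javascript
var obj = ["a"];

var foo = function() {
  return obj.length;
};

foo();                 // returns 1
obj = ["a", "b", "c"]; // assign a new array object to the obj variable
foo();                 // returns 3
```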

Conclusion

Hopefully this post has demystified closures a bit. Time and time again, we’ve shown how following a few simple steps will lead you to understand their behavior. Just keep in mind these rules of thumb and you should be good to go:

  • if a closed variable contains a value, then the closure links to that variable
  • if a closed variable contains a reference to an object, then the closure links to that object, and will pick up on any changes made to it

Ideally, this is going to become my go-to post for providing an introduction to closures. So please let me know any suggestions you might have to improve this post.


Understanding iostat

I’ve been spending a lot of time lately looking at I/O performance and reading up about the iostat command. While this command provides a wealth of I/O performance data, the sheer amount of it all can make it hard to see the forest for the trees. In this post, we’ll talk about interpreting this data. Before we continue, I would first like to thank the authors of the blog posts mentioned below, as each of these has helped me understand iostat and its many complexities just a little bit better.

The iostat command can display both basic and extended metrics. We’ll take a look at the basic metrics first before moving on to extended metrics in the remainder of this post. Note that this post will not go into detail about every last metric. Instead, I have decided to focus on just those metrics that I found to be especially useful, as well as those that seem to be often misunderstood.

Basic iostat metrics

The iostat command lists basic metrics by default. The -m parameter causes metrics to be displayed in megabytes per second instead of blocks or kilobytes per second. Using the 5 parameter causes iostat to recalculate metrics every 5 seconds, thereby making the numbers an average over this interval. The first report is the exception, as it shows averages since the system was booted.

$ iostat -m 5

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.84    0.16    3.91    7.73    0.04   79.33

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
xvdap1           46.34         0.33         1.03    2697023    8471177
xvdb              0.39         0.00         0.01       9496      71349
xvdg             65.98         1.34         0.97   11088426    8010609
xvdf            205.17         1.62         2.68   13341297   22076001
xvdh             51.16         0.64         1.43    5301463   11806257

The tps number here is the number of I/O Operations Per Second (IOPS). Wikipedia has a nice list of average IOPS for different storage devices. This should give you a pretty good idea of the I/O load on your machine.

Some people put a lot of faith in the %iowait metric as an indicator of I/O performance. However, %iowait is first and foremost a CPU metric that measures the percentage of time the CPU is idle while waiting for an I/O operation to complete. This metric is heavily influenced by both your CPU speed and CPU load and is therefore easily misinterpreted.

For example, consider a system with just two processes: the first one heavily I/O intensive, the second one heavily CPU intensive. As the second process will prevent the CPU from going idle, the %iowait metric will stay low despite the first process’s high I/O utilization. Other examples illustrating the deceptive nature of %iowait can be found here (mirror). The only thing %iowait really tells us is that the CPU occasionally idles while there is an outstanding I/O request, and could thus be made to handle more computational work.

Extended iostat metrics

Let’s now take a look at the extended metrics by calling the iostat -x command.

$ iostat -mx 5

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.84    0.16    3.91    7.73    0.04   79.33

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.57     6.38   20.85   25.49     0.33     1.03    59.86     0.27   17.06   13.15   20.25   1.15   5.33
xvdb              0.00     1.93    0.10    0.29     0.00     0.01    51.06     0.00    7.17    0.33    9.66   0.09   0.00
xvdg              0.55     4.69   42.04   23.94     1.34     0.97    71.89     0.44    6.63    6.82    6.28   1.16   7.67
xvdf              7.33    41.35  132.66   72.52     1.62     2.68    42.87     0.49    2.37    2.79    1.59   0.36   7.42
xvdh              0.00     4.54   15.54   35.63     0.64     1.43    83.04     0.00   10.22    8.39   11.02   1.30   6.68

The r/s and w/s numbers show the number of read and write requests issued to the I/O device per second. These numbers provide a more detailed breakdown of the tps metric we saw earlier, as tps = r/s + w/s.
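
The relation between these columns can be checked against the two outputs shown in this post. The snippet below is only a sanity check on those sample numbers; the devices array and the tolerance value are assumptions made for this illustration.

```javascript
// r/s and w/s values copied from the extended iostat output above,
// tps values copied from the basic iostat output.
var devices = [
  { name: "xvdap1", tps: 46.34,  r: 20.85,  w: 25.49 },
  { name: "xvdb",   tps: 0.39,   r: 0.10,   w: 0.29 },
  { name: "xvdg",   tps: 65.98,  r: 42.04,  w: 23.94 },
  { name: "xvdf",   tps: 205.17, r: 132.66, w: 72.52 },
  { name: "xvdh",   tps: 51.16,  r: 15.54,  w: 35.63 }
];

// iostat rounds every column to two decimals, so allow for a small difference
var tolerance = 0.02;

var mismatches = devices.filter(function(d) {
  return Math.abs(d.tps - (d.r + d.w)) > tolerance;
});

console.log(mismatches.length); // prints 0
```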

The avgqu-sz metric is an important value. Its name is rather poorly chosen as it doesn’t actually show the number of operations that are queued but not yet serviced. Instead, it shows the number of operations that were either queued, or being serviced. Ideally, you’d want to have an idea of this value during normal operations for use as a baseline number for when trouble occurs. Single digit numbers with the occasional double digit spike are safe(ish) values. Triple digit numbers generally are not.

The await metric is the average time, in milliseconds, from when a request was put in the queue to when the request was completed. This is the sum of the time a request was waiting in the queue and the time our storage device was working on servicing the request. This metric is highly dependent on the number of items in the queue. Much like avgqu-sz, you’ll want to have an idea of the value of this metric during normal operations for use as a baseline.
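
A rough sketch, assuming that r_await and w_await are the same measurement restricted to reads and writes respectively: await then works out to their average, weighted by r/s and w/s. The weightedAwait helper below is hypothetical and only reuses the xvdap1 numbers from the output above.

```javascript
// Values for xvdap1 copied from the extended iostat output above.
var xvdap1 = { r: 20.85, w: 25.49, rAwait: 13.15, wAwait: 20.25, await: 17.06 };

// weightedAwait is a hypothetical helper, not something iostat provides
function weightedAwait(d) {
  return (d.r * d.rAwait + d.w * d.wAwait) / (d.r + d.w);
}

weightedAwait(xvdap1).toFixed(2); // returns "17.06", matching the await column
```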

Our next metric is svctm. You’ll find a lot of older blog posts that go into quite some detail about this one. However, man iostat makes it quite clear that this metric has since been deprecated and should no longer be trusted.

Our last metric is %util. Just like svctm, this metric has been touched by the progress of technology as well. The man iostat pages contain the information shown below.

%util

Percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100% for devices serving requests serially. But for devices serving requests in parallel, such as RAID arrays and modern SSDs, this number does not reflect their performance limits.

It’s common to assume that the closer a device gets to 100% utilization, the more saturated it becomes. This is true when the storage device corresponds to a single magnetic disk as such a device can only serve one request at a time. However, a single SSD or a RAID array consisting of multiple disks can serve multiple requests simultaneously. For such devices, %util essentially indicates the percentage of time that the device was busy serving one or more requests. Unfortunately, this value tells us absolutely nothing about the maximum number of simultaneous requests such a device can handle. This metric should therefore not be treated as a saturation indicator for either SSDs or RAID arrays.

Conclusion

By now it should be clear that iostat is an incredibly powerful tool, the metrics of which can take some experience to interpret correctly. In a perfect world your machines should regularly be writing these metrics to a monitoring service, so you’ll always have access to good baseline numbers. In an imperfect world, just knowing your baseline IOPS values will already go a long way when trying to diagnose whether a slowdown is I/O related.


Safer bash scripts with 'set -euxo pipefail'

Developers often go about writing bash scripts the same way they would write code in a higher-level language. This is a big mistake as higher-level languages offer safeguards that are not present in bash scripts by default. For example, a Ruby script will throw an error when trying to read from an uninitialized variable, whereas a bash script won’t. In this article, we’ll look at how we can improve on this.

The bash shell comes with several builtin commands for modifying the behavior of the shell itself. We are particularly interested in the set builtin, as this command has several options that will help us write safer scripts. I hope to convince you that it’s a really good idea to add set -euxo pipefail to the beginning of all your future bash scripts.

set -e

The -e option will cause a bash script to exit immediately when a command fails. This is generally a vast improvement upon the default behavior where the script just ignores the failing command and continues with the next line. This option is also smart enough to not react to failing commands that are part of conditional statements. Moreover, you can append || true to a command for those rare cases where you don’t want a failing command to trigger an immediate exit.

Before

#!/bin/bash

# 'foo' is a non-existing command
foo
echo "bar"

# output
# ------
# line 4: foo: command not found
# bar

After

#!/bin/bash
set -e

# 'foo' is a non-existing command
foo
echo "bar"

# output
# ------
# line 5: foo: command not found

Prevent immediate exit

#!/bin/bash
set -e

# 'foo' is a non-existing command
foo || true
echo "bar"

# output
# ------
# line 5: foo: command not found
# bar

set -o pipefail

The bash shell normally only looks at the exit code of the last command of a pipeline. This behavior is not ideal as it causes the -e option to only be able to act on the exit code of a pipeline’s last command. This is where -o pipefail comes in. This particular option sets the exit code of a pipeline to that of the rightmost command to exit with a non-zero status, or zero if all commands of the pipeline exit successfully.

Before

#!/bin/bash
set -e

# 'foo' is a non-existing command
foo | echo "a"
echo "bar"

# output
# ------
# a
# line 5: foo: command not found
# bar

After

#!/bin/bash
set -eo pipefail

# 'foo' is a non-existing command
foo | echo "a"
echo "bar"

# output
# ------
# a
# line 5: foo: command not found

set -u

This option causes the bash shell to treat unset variables as an error and exit immediately. This brings us much closer to the behavior of higher-level languages.

Before

#!/bin/bash
set -eo pipefail

echo $a
echo "bar"

# output
# ------
#
# bar

After

#!/bin/bash
set -euo pipefail

echo $a
echo "bar"

# output
# ------
# line 4: a: unbound variable

set -x

The -x option causes bash to print each command before executing it. This can be of great help when you have to try and debug a bash script failure through its logs. Note that arguments get expanded before a command gets printed. This causes our logs to display the actual argument values at the time of execution!

#!/bin/bash
set -euxo pipefail

a=5
echo $a
echo "bar"

# output
# ------
# + a=5
# + echo 5
# 5
# + echo bar
# bar

And that’s it. I hope this post showed you why using set -euxo pipefail is such a good idea. If you have any other options you want to suggest, then please let me know and I’ll be happy to add them to this list.


An introduction to javascript promises

I recently had to write some javascript code that required the sequential execution of half a dozen asynchronous requests. I figured this was the perfect time to learn a bit more about javascript promises. This post is a recap of what I read in these three amazing write-ups.

What are promises?

A Promise object represents a value that may not be available yet, but will be resolved at some point in the future. This abstraction allows you to write asynchronous code in a more synchronous fashion. For example, you can use a Promise object to represent data that will eventually be returned by a call to a remote web service. The then and catch methods can be used to attach callbacks that will be triggered once the data arrives. We’ll take a closer look at these two methods in the next sections. For now, let’s write a simple AJAX request example that prints a random joke.

var promise = new Promise(function(resolve, reject) {
  $.ajax({
    url: "http://api.icndb.com/jokes/random",
    success: function(result) {
      resolve(result["value"]["joke"]);
    }
  });
});

promise.then(function(result) {
  console.log(result);
});

Note how the Promise object is just a wrapper around the AJAX request and how we’ve instructed the success callback to trigger the resolve method. We’ve also attached a callback to our Promise object with the then method. This callback gets triggered when the resolve method gets called. The result variable of this callback will contain the data that was passed to the resolve method.

Before we take a closer look at the resolve method, let’s first investigate the Promise object a bit more. A Promise object can have one of three states:

  • fulfilled - the action relating to the Promise succeeded
  • rejected - the action relating to the Promise failed
  • pending - the Promise hasn’t been fulfilled or rejected yet

A pending Promise object can be fulfilled or rejected by calling the resolve or reject function that was passed to its constructor callback. Once a Promise is fulfilled or rejected, this state gets permanently associated with it. The state of a fulfilled Promise also includes the data that was passed to resolve, just as the state of a rejected Promise also includes the data that was passed to reject. In summary, we can say that a Promise executes only once and stores the result of its execution.

var promise = new Promise(function(resolve, reject) {
  $.ajax({
    url: "http://api.icndb.com/jokes/random",
    success: function(result) {
      resolve(result["value"]["joke"]);
    }
  });
});

promise.then(function(result) {
  console.log(result);
});

promise.then(function(result) {
  console.log(result);
});

We can test whether a Promise only ever executes once by adding a second callback to the previous example. In this case, we see that only one AJAX request gets made and that the same joke gets printed to the console twice. This clearly shows that our Promise was only executed once.
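
The permanence of a settled state can also be sketched without making any requests. The hypothetical Promise below tries to settle three times, and only the first attempt should have any effect.

```javascript
// A hypothetical Promise that tries to settle three times.
var settled = new Promise(function(resolve, reject) {
  resolve("first");
  resolve("second");      // ignored, the Promise is already fulfilled
  reject(Error("third")); // ignored as well
});

settled.then(function(result) {
  console.log(result); // prints "first"
});
```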

The then method and chaining

The then method takes two arguments: a success callback and a failure callback, both of which are optional. These callbacks are called when the Promise is settled (i.e. either fulfilled or rejected). If the Promise was fulfilled, the success callback will be fired with the data you passed to resolve. If the Promise was rejected, the failure callback will be called with the data you passed to reject. We’ve already covered most of this in the previous section.

The real magic with the then method happens when you start chaining several of them together. This chaining allows you to express your logic in separate stages, each of which can be made responsible for transforming data passed on by the previous stage or for running additional asynchronous requests. The code below shows how data returned by the success callback of the first then method becomes available to the success callback of the second then method.

var promise = new Promise(function(resolve, reject) {
  $.ajax({
    url: "http://api.icndb.com/jokes/random",
    success: function(result) {
      resolve(result["value"]["joke"]);
    }
  });
});

promise.then(function(result) {
  return result;
}).then(function(result) {
  console.log(result);
});

This chaining is possible because the then method returns a new Promise object that will resolve to the return value of the callback. Or in other words, by calling return result; we cause the creation of an anonymous Promise object that looks something like the one shown below. Notice that this particular anonymous Promise object will resolve immediately, as it does not make any asynchronous requests.

new Promise(function(resolve, reject) {
  resolve(result);
});

Now that we understand that the then method always returns a Promise object, let’s take a look at what happens when we tell the callback of a then method to explicitly return a Promise object.

function getJokePromise() {
  return new Promise(function(resolve, reject) {
    $.ajax({
      url: "http://api.icndb.com/jokes/random",
      success: function(result) {
        resolve(result["value"]["joke"]);
      }
    });
  });
}

getJokePromise().then(function(result) {
  console.log(result);
  return getJokePromise();
}).then(function(result) {
  console.log(result);
});

In this case, we end up sequentially executing two asynchronous requests. When the first Promise is fulfilled, the first joke is printed and the callback returns a new Promise object. The Promise returned by the first then method now follows this new Promise, so the callback of the second then method is only triggered once the second request succeeds. At that point the second joke is printed.

The takeaway from all this is that the then method always returns a Promise object, no matter what its callback returns. It is this that allows for then chaining!
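
A small sketch that makes this visible without any AJAX requests. It relies on Promise.resolve, which creates a Promise that is already fulfilled with the given value; the variable names first and second are made up for this illustration.

```javascript
// Promise.resolve creates a Promise that is already fulfilled with the given value.
var first = Promise.resolve(1);

// the callback returns a plain number, yet then hands back a Promise
var second = first.then(function(result) {
  return result + 1;
});

console.log(second instanceof Promise); // prints true
console.log(second === first);          // prints false, then created a new Promise

second.then(function(result) {
  console.log(result); // prints 2
});
```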

Error handling

We mentioned in the previous section how the then method can take an optional failure callback that gets triggered when reject is called. It is customary to reject with an Error object as they capture a stack trace, thereby facilitating debugging.

var promise = new Promise(function(resolve, reject) {
  $.ajax({
    url: "http://random.url.com",
    success: function(result) {
      resolve(result["value"]["joke"]);
    },
    error: function(jqxhr, textStatus) {
      reject(Error("The AJAX request failed."));
    }
  });
});

promise.then(function(result) {
  console.log(result);
}, function(error) {
  console.log(error);
  console.log(error.stack);
});

Personally, I find this a bit hard to read. Luckily we can use the catch method to make this look a bit nicer. There’s nothing special about the catch method. In fact, it’s just sugar for then(undefined, func), but it definitely makes code easier to read.

var promise = new Promise(function(resolve, reject) {
  $.ajax({
    url: "http://random.url.com",
    success: function(result) {
      resolve(result["value"]["joke"]);
    },
    error: function(jqxhr, textStatus) {
      reject(Error("The AJAX request failed."));
    }
  });
});

promise.then(function(result) {
  console.log(result);
}).then(function(result) {
  console.log("foo"); // gets skipped
}).then(function(result) {
  console.log("bar"); // gets skipped
}).catch(function(error) {
  console.log(error);
  console.log(error.stack);
});

Aside from illustrating improved readability, the above code showcases another aspect of the reject method in that Promise rejections will cause your code to skip forward to the next then method that has a rejection callback (or the next catch method, since this is equivalent). It is this fallthrough behavior that causes this code to not print “foo” or “bar”!
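
The same fallthrough can be sketched without jQuery by starting from a Promise that is already rejected. Promise.reject is a standard method; the steps array is a hypothetical log added here to record which callbacks actually ran.

```javascript
var steps = []; // hypothetical log of which callbacks ran

var done = Promise.reject(Error("The request failed."))
  .then(function(result) {
    steps.push("foo"); // gets skipped
  })
  .then(function(result) {
    steps.push("bar"); // gets skipped
  })
  .catch(function(error) {
    steps.push("caught: " + error.message);
  })
  .then(function() {
    // steps now holds a single entry: "caught: The request failed."
    console.log(steps);
  });
```

The final then callback does run. This is because catch itself returns a Promise, which is fulfilled once the error has been handled.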

As a final point, it is useful to know that a Promise is implicitly rejected if an error is thrown in its constructor callback. This means it’s useful to do all your Promise related work inside the Promise constructor callback, so errors automatically become rejections.

var promise = new Promise(function(resolve, reject) {
  // JSON.parse throws an error if you feed it some
  // invalid JSON, so this implicitly rejects
  JSON.parse("This ain't JSON");
});

promise.then(function(result) {
  console.log(result);
}).catch(function(error) {
  console.log(error);
});

The above code will cause the Promise to be rejected and an error to be printed because it will fail to parse the invalid JSON string.


Unwanted spot instance termination in multi-AZ ASG

An auto scaling group is an AWS abstraction that facilitates increasing or decreasing the number of EC2 instances within your application’s architecture. Spot instances are unused AWS servers that are auctioned off for little money. The combination of these two allows for large auto scaling groups at low costs. However, you can lose your spot instances at a moment’s notice as soon as someone out there wants to pay more than you do.

Knowing all this, I recently found myself looking into why AWS was terminating several of our spot instances every day. We were bidding 20% over the average price, so it seemed unlikely that this was being caused by a monetary issue. Nevertheless, we kept noticing multiple spot instances disappearing on a daily basis.

It took a while to get to the bottom of things, but it turned out that this particular problem was being caused by an unfortunate combination of:

  • our auto scaling group spanning multiple availability zones
  • our scaling code making calls to TerminateInstanceInAutoScalingGroup

The step-by-step explanation of this issue was as follows:

  • our scaling code was asking AWS to put 10 instances in our auto scaling group
  • AWS obliged and put 5 instances in availability zone A and another 5 in zone B
  • some time later our scaling code would decide that 2 specific instances were no longer needed. A call would be made to TerminateInstanceInAutoScalingGroup to have just these 2 specific instances terminated.
  • if these 2 instances happened to be in the same availability zone, then one zone would now have 3 instances, while the other one would now have 5
  • AWS would detect that both zones were no longer balanced and would initiate a rebalancing action. This rebalancing action would terminate one of the instances in the zone with 5 instances, and spin up another instance in the zone with 3 instances.

So while this action did indeed end up rebalancing the instances across the different availability zones, it also inadvertently ended up terminating a running instance.
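
The steps above can be sketched as a small simulation. This is a deliberately simplified, hypothetical model that only tracks instance counts per zone; it is not how AWS implements rebalancing, and the terminate and rebalance functions are made up for this illustration.

```javascript
// A simplified, hypothetical model of the steps above. It only tracks instance
// counts per zone and is not how AWS implements rebalancing.
function terminate(zones, zone, count) {
  zones[zone] -= count;
}

function rebalance(zones) {
  var names = Object.keys(zones);
  var terminated = 0;

  while (true) {
    // find the zones with the fewest and the most instances
    names.sort(function(a, b) { return zones[a] - zones[b]; });
    var smallest = names[0];
    var largest = names[names.length - 1];

    if (zones[largest] - zones[smallest] <= 1) {
      return terminated;
    }

    zones[smallest] += 1; // launch a new instance first
    zones[largest] -= 1;  // then terminate a running one
    terminated += 1;
  }
}

var zones = { A: 5, B: 5 };
terminate(zones, "A", 2);          // zones is now { A: 3, B: 5 }
var terminated = rebalance(zones); // zones is now { A: 4, B: 4 }
console.log(terminated);           // prints 1
```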

The relevant entry from the AWS Auto Scaling docs is shown below.

Instance Distribution and Balance across Multiple Zones

Auto Scaling attempts to distribute instances evenly between the Availability Zones that are enabled for your Auto Scaling group. Auto Scaling attempts to launch new instances in the Availability Zone with the fewest instances. If the attempt fails, however, Auto Scaling will attempt to launch in other zones until it succeeds.

Certain operations and conditions can cause your Auto Scaling group to become unbalanced. Auto Scaling compensates by creating a rebalancing activity under any of the following conditions:

  • You issue a request to change the Availability Zones for your group.
  • You call TerminateInstanceInAutoScalingGroup, which causes the group to become unbalanced.
  • An Availability Zone that previously had insufficient capacity recovers and has additional capacity available.

Auto Scaling always launches new instances before attempting to terminate old ones, so a rebalancing activity will not compromise the performance or availability of your application.

Multi-Zone Instance Counts when Approaching Capacity

Because Auto Scaling always attempts to launch new instances before terminating old ones, being at or near the specified maximum capacity could impede or completely halt rebalancing activities. To avoid this problem, the system can temporarily exceed the specified maximum capacity of a group by a 10 percent margin during a rebalancing activity (or by a 1-instance margin, whichever is greater). The margin is extended only if the group is at or near maximum capacity and needs rebalancing, either as a result of user-requested rezoning or to compensate for zone availability issues. The extension lasts only as long as needed to rebalance the group—typically a few minutes.

I’m not sure about the best way to deal with this behavior. In our case, we just restricted our auto scaling group to one availability zone. This was good enough for us as none of the work done by our spot instances is critical. Going through the docs, it seems one approach might be to disable the AZRebalance process. However, I have not had the chance to try this, so I cannot guarantee a lack of unexpected side effects.


Creating an EC2 Instance in a VPC with the AWS CLI

Setting up an EC2 instance on AWS used to be as straightforward as provisioning a machine and SSHing into it. However, this process has become a bit more complicated now that Amazon VPC has become the standard for managing machines in the cloud.

So what exactly is a Virtual Private Cloud? Amazon defines a VPC as ‘a logically isolated section of the AWS Cloud’. Instances inside a VPC can by default only communicate with other instances in the same VPC and are therefore invisible to the rest of the internet. This means they will not accept SSH connections coming from your computer, nor will they respond to any http requests. In this article we’ll look into changing these default settings into something more befitting a general purpose server.

Setting up your VPC

Start by installing the AWS Command Line Interface on your machine if you haven’t done so already. With this done, we can now create our VPC.

$ vpcId=`aws ec2 create-vpc --cidr-block 10.0.0.0/28 --query 'Vpc.VpcId' --output text`

There are several interesting things here:

  • the --cidr-block parameter specifies a /28 netmask that allows for 16 IP addresses. This is the smallest supported netmask.
  • the create-vpc command returns a JSON string. We can filter out specific fields from this string by using the --query and --output parameters.
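To make the effect of --query a bit more concrete, here is a rough Ruby sketch of what selecting Vpc.VpcId amounts to. The JSON below is a heavily abbreviated, made-up response (the vpc-0example id is hypothetical), so treat it as an illustration rather than the exact output of the CLI.

```ruby
require 'json'

# Hypothetical, abbreviated stand-in for the JSON returned by create-vpc.
response = '{"Vpc": {"VpcId": "vpc-0example", "CidrBlock": "10.0.0.0/28", "State": "pending"}}'

# --query 'Vpc.VpcId' walks into the Vpc object and picks out its VpcId field.
vpc_id = JSON.parse(response).dig('Vpc', 'VpcId')

# --output text then prints the bare value instead of a quoted JSON string.
puts vpc_id
```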

The next step is to overwrite the default VPC DNS settings. As mentioned earlier, instances launched inside a VPC are invisible to the rest of the internet by default. AWS therefore does not bother assigning them a public DNS name. Luckily this can be changed easily.

$ aws ec2 modify-vpc-attribute --vpc-id $vpcId --enable-dns-support "{\"Value\":true}"
$ aws ec2 modify-vpc-attribute --vpc-id $vpcId --enable-dns-hostnames "{\"Value\":true}"

Adding an Internet Gateway

Next we need to connect our VPC to the rest of the internet by attaching an internet gateway. Our VPC would be isolated from the internet without this.

$ internetGatewayId=`aws ec2 create-internet-gateway --query 'InternetGateway.InternetGatewayId' --output text`
$ aws ec2 attach-internet-gateway --internet-gateway-id $internetGatewayId --vpc-id $vpcId

Creating a Subnet

A VPC can have multiple subnets. Since our use case only requires one, we can reuse the cidr-block specified during VPC creation so as to get a single subnet that spans the entire VPC address space.

$ subnetId=`aws ec2 create-subnet --vpc-id $vpcId --cidr-block 10.0.0.0/28 --query 'Subnet.SubnetId' --output text`

While this --cidr-block parameter specifies a subnet that can contain 16 IP addresses (10.0.0.0 - 10.0.0.15), AWS reserves 5 of those for its own use: the first four addresses and the last one. While this doesn’t really have an impact on our use case, it is still good to be aware of such things.
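The address arithmetic can be checked with Ruby’s ipaddr library. This is only a sketch: the reserved list encodes the assumption that AWS keeps the first four addresses and the last address of every subnet for itself.

```ruby
require 'ipaddr'

# The same CIDR block we passed to create-subnet.
subnet = IPAddr.new('10.0.0.0/28')

# Every address covered by the /28 block.
addresses = subnet.to_range.map(&:to_s)

# Assumption: AWS reserves the first four addresses and the last one.
reserved = addresses.first(4) + [addresses.last]
usable = addresses - reserved

puts "block:    #{addresses.first} - #{addresses.last} (#{addresses.size} addresses)"
puts "reserved: #{reserved.join(', ')}"
puts "usable:   #{usable.size} addresses"
```

So a /28 subnet leaves room for 11 instances, which is plenty for a single server.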

Configuring the Route Table

Each subnet needs to have a route table associated with it to specify the routing of its outbound traffic. By default every subnet inherits the default VPC route table which allows for intra-VPC communication only.

Here we add a route table to our subnet so as to allow traffic not meant for an instance inside the VPC to be routed to the internet through the internet gateway we created earlier.

$ routeTableId=`aws ec2 create-route-table --vpc-id $vpcId --query 'RouteTable.RouteTableId' --output text`
$ aws ec2 associate-route-table --route-table-id $routeTableId --subnet-id $subnetId
$ aws ec2 create-route --route-table-id $routeTableId --destination-cidr-block 0.0.0.0/0 --gateway-id $internetGatewayId

Adding a Security Group

Before we can launch an instance, we first need to create a security group that specifies which ports should allow traffic. For now we’ll just allow anyone to try and make an SSH connection by opening port 22 to any IP address.

$ securityGroupId=`aws ec2 create-security-group --group-name my-security-group --description "my-security-group" --vpc-id $vpcId --query 'GroupId' --output text`
$ aws ec2 authorize-security-group-ingress --group-id $securityGroupId --protocol tcp --port 22 --cidr 0.0.0.0/0

Launching your Instance

All that’s left to do is to create an SSH key pair and then launch an instance secured by this. Let’s generate this key pair and store it locally with the correct permissions.

$ aws ec2 create-key-pair --key-name my-key --query 'KeyMaterial' --output text > ~/.ssh/my-key.pem
$ chmod 400 ~/.ssh/my-key.pem

We can now launch a single t2.micro instance based on the public AWS Ubuntu image.

$ instanceId=`aws ec2 run-instances --image-id ami-9eaa1cf6 --count 1 --instance-type t2.micro --key-name my-key --security-group-ids $securityGroupId --subnet-id $subnetId --associate-public-ip-address --query 'Instances[0].InstanceId' --output text`

After a few minutes your instance should be up and running. You should now be able to obtain the public DNS name of your active instance and SSH into it.

$ instanceUrl=`aws ec2 describe-instances --instance-ids $instanceId --query 'Reservations[0].Instances[0].PublicDnsName' --output text`
$ ssh -i ~/.ssh/my-key.pem ubuntu@$instanceUrl

And that’s it. It’s really not all that hard. There are just an awful lot of concepts that you need to get your head around, which can make it a bit daunting at first. Be sure to check out the free Amazon Virtual Private Cloud User Guide if you want to learn more about VPCs.


Finding and deleting old tags in a Github repository

It’s very easy for a Github repository to accumulate lots of tags over time. This onslaught of tags tends to be tolerated until it starts impacting git performance. It is at this point, when you have well in excess of tens of thousands of tags, that a call to action tends to be made. In this article, we’ll look at two approaches to rid yourself of these old tags.

The cut-off tag approach

This approach has us specify a cut-off tag. All tags that can trace their ancestry back to this cut-off tag will be allowed to remain. All others will get deleted. This is especially useful for when you have just merged a new feature, and now you want to delete all tags that were created before this merge. In this scenario, all you have to do is tag the merge commit and then use this as the cut-off tag.

The sequence of commands below deletes all tags that do not have the release-5 tag as an ancestor. Most of these commands are pretty self-explanatory, except for the one in the middle. The remainder of this section will focus on explaining this command.

# fetch all tags from the remote
git fetch

# delete all tags on the remote that do not have the release-5 tag as an ancestor
comm -23 <(git tag | sort) <(git tag --contains release-5 | sort) | xargs git push --delete origin

# delete all local tags that are no longer present on the remote
git fetch --prune origin +refs/tags/*:refs/tags/*

The comm command is used to compare two sorted files line by line. Luckily, we can avoid having to create any actual files by relying on process substitution instead.

comm -23 <(command to act as file 1) <(command to act as file 2) | xargs git push --delete origin

The -23 flag tells comm to suppress any lines that are unique to file 2, as well as any lines that appear in both files. In other words, it causes comm to return just those lines that only appear in file 1. Looking back at our sequence of commands above, it should be clear that this will cause us to obtain all tags that do not have the release-5 tag as an ancestor. Piping this output to xargs git push --delete origin will then remove these tags from Github.
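If the comm flags feel a bit cryptic, it may help to think of comm -23 as a plain set difference. Here is the same idea in Ruby, using made-up tag names purely for illustration.

```ruby
# Hypothetical output of `git tag | sort` (file 1).
all_tags = %w[release-3 release-4 release-5 release-6].sort

# Hypothetical output of `git tag --contains release-5 | sort` (file 2).
tags_containing_cutoff = %w[release-5 release-6].sort

# comm -23 keeps only the lines that are unique to file 1.
tags_to_delete = all_tags - tags_containing_cutoff

puts tags_to_delete
```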

The cut-off date approach

While the cut-off tag approach works great in a lot of scenarios, sometimes you just want to delete all tags that were created before a given cut-off date instead. Unfortunately, git doesn’t have any built-in functionality for accomplishing this. This is why we are going to make use of a Ruby script here.

# CUT_OFF_DATE needs to be of YYYY-MM-DD format
CUT_OFF_DATE = "2015-05-10"

def get_old_tags(cut_off_date)
  `git log --tags --simplify-by-decoration --pretty="format:%ai %d"`
  .split("\n")
  .each_with_object([]) do |line, old_tags|
    if line.include?("tag: ")
      date = line[0..9]
      tags = line[28..-2].gsub(",", "").concat(" ").scan(/tag: (.*?) /).flatten
      old_tags.concat(tags) if date < cut_off_date
    end
  end
end

# fetch all tags from the remote
`git fetch`

# delete all tags on the remote that were created before the CUT_OFF_DATE
get_old_tags(CUT_OFF_DATE).each_slice(100) do |batch|
  system("git", "push", "--delete", "origin", *batch)
end

# delete all local tags that are no longer present on the remote
`git fetch --prune origin +refs/tags/*:refs/tags/*`

This Ruby script should be pretty straightforward. The get_old_tags method might stand out a bit here. It can look pretty complex, but most of it is just string manipulation to extract the date and tags from each line output by the git log command, and to store the old tags in the old_tags array. Note how we invoke the system method with separate arguments rather than a single command string for those calls that require input. This bypasses the shell and protects us against possible shell injection.

Be careful, as running this exact script inside your repository will delete all tags created before 2015-05-10. Also, be sure to specify your cut-off date in YYYY-MM-DD format!
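To make the string manipulation inside get_old_tags a bit more tangible, here it is applied to a single line. The line itself is a made-up example of what the git log format used above could produce, so the date and tag names are purely hypothetical.

```ruby
# Hypothetical line, built the way --pretty="format:%ai %d" would build it:
# the %ai date, a space, and then the %d decoration (which starts with " (").
line = ['2015-04-01 10:00:00 +0200', ' (tag: v1.0, tag: v1.1)'].join(' ')

# The first ten characters hold the YYYY-MM-DD date.
date = line[0..9]

# Drop the date prefix and the closing parenthesis, then pull out every tag name.
tags = line[28..-2].gsub(',', '').concat(' ').scan(/tag: (.*?) /).flatten

puts date
puts tags.inspect
puts date < '2015-05-10'
```

Comparing the dates as plain strings works because the YYYY-MM-DD format sorts lexicographically in the same order as it does chronologically.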


Adding a post-execution hook to the db:migrate task

A few days ago we discovered that our MySQL database’s default character set and collation had been changed to the wrong values. Worse yet, it looked like this change had happened many months ago; something which we had been completely unaware of until now! In order to make sure this didn’t happen again, we looked into adding a post-execution hook to the rails db:migrate task.

Our first attempt is shown below. Here, we append a post-execution hook to the existing db:migrate task by creating a new db:migrate task. In rake, when a task is defined twice, the behavior of the new task gets appended to the behavior of the old task. So even though the code below may give the impression of overwriting the rails db:migrate task, we are actually just appending a call to the post_execution_hook method to it.

namespace :db do
  def post_execution_hook
    puts 'This code gets run after the rails db:migrate task.'
    puts 'However, it only runs if the db:migrate task does not throw an exception.'
  end

  task :migrate do
    post_execution_hook
  end
end

However, the above example only runs the appended code if the original db:migrate task does not throw any exceptions. Luckily we can do better than that by taking a slightly different approach. Rather than appending code, we are going to have a go at prepending it instead.

namespace :db do
  def post_execution_hook
    puts 'This code gets run after the rails db:migrate task.'
    puts 'It will ALWAYS run.'
  end

  task :attach_hook do
    at_exit { post_execution_hook }
  end
end

Rake::Task['db:migrate'].enhance(['db:attach_hook'])

Here we make use of the enhance method to add db:attach_hook as a prerequisite task to db:migrate. This means that calling db:migrate will now cause the db:attach_hook task to get executed before db:migrate gets run. The db:attach_hook task registers an at_exit hook that will trigger our post-execution code when the rake process exits. Hence, our post-execution hook will now get called even when db:migrate raises an exception!
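Both rake behaviors discussed in this post can be tried out without a Rails app. The sketch below assumes the rake gem is available and uses made-up task names (demo_migrate and demo_attach_hook) so as not to touch any real tasks.

```ruby
require 'rake'

order = []

# Defining the same task twice appends the second block to the first.
Rake::Task.define_task('demo_migrate') { order << :original_action }
Rake::Task.define_task('demo_migrate') { order << :appended_action }

# A prerequisite added through enhance runs before any of the task's own actions.
Rake::Task.define_task('demo_attach_hook') { order << :attach_hook }
Rake::Task['demo_migrate'].enhance(['demo_attach_hook'])

Rake::Task['demo_migrate'].invoke
puts order.inspect
```

The prerequisite shows up first in the recorded order, followed by the original action and then the appended one.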


Installing chromedriver

Some time ago I needed to install chromedriver on an Ubuntu machine. While this wasn’t too hard, I was nevertheless surprised by the number of open StackOverflow questions on this topic. So I decided to leave some notes for my future self.

First of all, let’s install chromedriver.

$ LATEST_RELEASE=$(curl http://chromedriver.storage.googleapis.com/LATEST_RELEASE)
$ wget http://chromedriver.storage.googleapis.com/$LATEST_RELEASE/chromedriver_linux64.zip
$ unzip chromedriver_linux64.zip
$ rm chromedriver_linux64.zip
$ sudo mv chromedriver /usr/local/bin

Let’s see what happens when we try and run it.

$ chromedriver

chromedriver: error while loading shared libraries: libgconf-2.so.4:
cannot open shared object file: No such file or directory

That’s a bit unexpected. Luckily we can easily fix this.

$ sudo apt-get install libgconf-2-4

Now that we have a functioning chromedriver, the only thing left to do is to install Chrome. After all, chromedriver can’t function without Chrome.

$ wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
$ sudo sh -c 'echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list'
$ sudo apt-get update
$ sudo apt-get install google-chrome-stable

And that’s it. You should be good to go now.


Programmatically rotating the Android screen

A lot of digital ink has been spilled on this subject, so I figured it might be worth briefly talking about it. You can either change the orientation through ADB or through an app. While the ADB approach is the easiest, it might not work on all devices or on all Android versions. For example, the dumpsys output of a Kindle Fire is different than that of a Samsung Galaxy S4, so you might need to tweak the grepping of the output.

# get current orientation
adb shell dumpsys input | grep SurfaceOrientation | awk '{print $2}'

# change orientation to portrait
adb shell content insert --uri content://settings/system --bind name:s:accelerometer_rotation --bind value:i:0
adb shell content insert --uri content://settings/system --bind name:s:user_rotation --bind value:i:0

# change orientation to landscape
adb shell content insert --uri content://settings/system --bind name:s:accelerometer_rotation --bind value:i:0
adb shell content insert --uri content://settings/system --bind name:s:user_rotation --bind value:i:1

If you don’t want to use ADB and prefer to change the orientation through an Android app instead, then you can just use these commands.

// get current orientation
final int orientation = myActivity.getResources().getConfiguration().orientation;

// change orientation to portrait
myActivity.setRequestedOrientation(ActivityInfo.SCREEN_ORIENTATION_PORTRAIT);

// change orientation to landscape
myActivity.setRequestedOrientation(ActivityInfo.SCREEN_ORIENTATION_LANDSCAPE);

Programmatically creating Android touch events

Recent versions of Android have the adb shell input tap functionality to simulate touch events on an Android device or emulator. However, older Android versions (like 2.3) do not support this command. Luckily it is possible to recreate this functionality by running adb shell getevent to capture events as they are being generated. These events can then later be replayed using the adb shell sendevent command.

Running adb shell getevent when touching the screen might get you something like shown below. Notice how the output is in hexadecimal.

/dev/input/event7: 0001 014a 00000001
/dev/input/event7: 0003 003a 00000001
/dev/input/event7: 0003 0035 000001ce
/dev/input/event7: 0003 0036 00000382
/dev/input/event7: 0000 0002 00000000
/dev/input/event7: 0000 0000 00000000
/dev/input/event7: 0001 014a 00000000
/dev/input/event7: 0003 003a 00000000
/dev/input/event7: 0003 0035 000001ce
/dev/input/event7: 0003 0036 00000382
/dev/input/event7: 0000 0002 00000000
/dev/input/event7: 0000 0000 00000000

However, the adb shell sendevent command expects all of its input to be in decimal. So if we wanted to replay the above events, we’d need to do something like what is shown below. Note that 462 and 898 are the x and y coordinates of this particular touch event.

adb shell sendevent /dev/input/event7 1 330 1
adb shell sendevent /dev/input/event7 3 58 1
adb shell sendevent /dev/input/event7 3 53 462
adb shell sendevent /dev/input/event7 3 54 898
adb shell sendevent /dev/input/event7 0 2 0
adb shell sendevent /dev/input/event7 0 0 0
adb shell sendevent /dev/input/event7 1 330 0
adb shell sendevent /dev/input/event7 3 58 0
adb shell sendevent /dev/input/event7 3 53 462
adb shell sendevent /dev/input/event7 3 54 898
adb shell sendevent /dev/input/event7 0 2 0
adb shell sendevent /dev/input/event7 0 0 0
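Converting these values by hand gets tedious quickly, so a small Ruby helper along the following lines might come in handy. The to_sendevent name is made up for this example, and the sketch assumes that every getevent line has the device, type, code and value layout shown above. The device path is passed without the trailing colon that getevent prints.

```ruby
# Convert one line of `adb shell getevent` output into a sendevent command.
# Assumes the "<device>: <type> <code> <value>" layout shown above.
def to_sendevent(getevent_line)
  device, type, code, value = getevent_line.split
  device = device.chomp(':') # getevent prints a colon after the device path
  "adb shell sendevent #{device} #{type.hex} #{code.hex} #{value.hex}"
end

captured = [
  '/dev/input/event7: 0001 014a 00000001',
  '/dev/input/event7: 0003 0035 000001ce',
  '/dev/input/event7: 0003 0036 00000382'
]

commands = captured.map { |line| to_sendevent(line) }
puts commands
```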

Some lesser known Github API functionality

One of our automation tools occasionally needs to interact with our Github repositories. Unfortunately, the current implementation of this tool leaves something to be desired as it requires cloning these repositories to local disk. Changes against these local repositories are then made on local branches, after which these branches get pushed to Github.

However, in order to save on disk space this tool will only ever create a single local copy of each repository. This makes it unsafe to run multiple instances of this tool as multiple instances simultaneously executing sequences of git commands against the same local repositories might lead to these commands inadvertently getting interleaved, thereby leaving the local repositories in an undefined state.

The solution to this complexity was to completely remove the need for local repositories and instead aim to have everything done through the wonderful Github API. This article is a reminder to myself about some API functionality that I found while looking into this.

Checking if a branch contains a commit

While the Github API does not have an explicit call to check whether a given commit is included in a branch, we can nevertheless use the compare call for just this purpose. This call takes two commits as input and returns a large JSON response of comparison data. We can use the status field of the response to ascertain if a given commit is behind or identical to the HEAD commit of a branch. If so, then the branch contains that commit.

We can use the Ruby octokit gem to implement this as follows.

require 'octokit'

class GithubClient < Octokit::Client
  def branch_contains_sha?(repo, branch, sha)
    ['behind', 'identical'].include?(compare(repo, branch, sha).status)
  end
end

Creating a remote branch from a remote commit

Sometimes you’ll want to create a remote branch by branching from a remote commit. We can use the create_reference call to accomplish this. Note that the ref parameter of this call needs to be set to refs/heads/#{branch} when creating a remote branch.

require 'octokit'

class GithubClient < Octokit::Client
  def create_branch_from_sha(repo, branch, sha)
    # create_ref internally transforms "heads/#{branch}" into "refs/heads/#{branch}"
    # as mentioned above, this is required by the Github API
    create_ref(repo, "heads/#{branch}", sha)
  end
end

Setting the HEAD of a remote branch to a specific remote commit

You can even forcefully set the HEAD of a remote branch to a specific remote commit by using the update_reference call. As mentioned earlier, the ref parameter needs to be set to refs/heads/#{branch}. Be careful when using this functionality though as it essentially allows you to overwrite the history of a remote branch!

require 'octokit'

class GithubClient < Octokit::Client
  def update_branch_to_sha(repo, branch, sha, force = true)
    # update_ref internally transforms "heads/#{branch}" into "refs/heads/#{branch}"
    # as mentioned earlier, this is required by the Github API
    update_ref(repo, "heads/#{branch}", sha, force)
  end
end

The amazing bitwise XOR operator

One of my colleagues recently mentioned this interview question to me.

Imagine there is an array which contains 2n+1 elements, n of which have exactly one duplicate. Can you find the one unique element in this array?

This seemed simple enough and I quickly came up with the Ruby solution below.

> array = [3, 5, 4, 5, 3]
# => [3, 5, 4, 5, 3]
> count = array.each_with_object(Hash.new(0)) { |number, hash| hash[number] += 1 }
# => {3=>2, 5=>2, 4=>1}
> count.key(1)
# => 4

I thought that would be the end of it, but instead I was asked if I could see a way to solve the problem in a significantly more performant way using the XOR operator.

XOR characteristics

In order to solve this problem with the XOR operator, we first need to understand some of its characteristics. This operator obeys the following rules:

  • commutativity: A^B=B^A
  • associativity: (A^B)^C=A^(B^C)
  • the identity element is 0: A^0=A
  • each element is its own inverse: A^A=0

Now imagine an array with the elements [3, 5, 4, 5, 3]. Using the above rules, we can show that XORing all these elements will leave us with the array’s unique element.

accum = 3 ^ 5 ^ 4 ^ 5 ^ 3
accum = 0 ^ 3 ^ 5 ^ 4 ^ 5 ^ 3    # 0 is the identity element
accum = 0 ^ 3 ^ 3 ^ 4 ^ 5 ^ 5    # commutativity and associativity rules
accum = 0 ^ 0 ^ 4 ^ 0            # A^A = 0
accum = 4                        # 0 is the identity element

Putting this approach in code would give us something like this.

> array = [3, 5, 4, 5, 3]
# => [3, 5, 4, 5, 3]
> accum = 0
# => 0
> array.each { |number| accum = accum ^ number }
# => [3, 5, 4, 5, 3]
> accum
# => 4
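As a side note, the same accumulation can be written more compactly with reduce. This is just an alternative spelling of the loop above, not a different algorithm.

```ruby
array = [3, 5, 4, 5, 3]

# Start from the identity element 0 and XOR every element into the accumulator.
unique = array.reduce(0, :^)

puts unique
```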

Benchmarks

Let’s use Ruby’s Benchmark module to do a comparison of both approaches.

require 'benchmark'

array = [-1]
1000000.times do |t|
  array << t
  array << t
end

Benchmark.measure do
  count = array.each_with_object(Hash.new(0)) { |number, hash| hash[number] += 1 }
  count.key(1)
end
# => #<Benchmark::Tms:0x007f83fa0279e0 @label="", @real=0.83534, @cstime=0.0, @cutime=0.0, @stime=0.010000000000000009, @utime=0.8300000000000005, @total=0.8400000000000005>

Benchmark.measure do
  accum = 0
  array.each { |number| accum = accum ^ number }
  accum
end
# => #<Benchmark::Tms:0x007f83fa240ba0 @label="", @real=0.136726, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.13999999999999968, @total=0.13999999999999968>

So there you have it. Given an array that contains two million elements, the XOR operator approach turns out to be more than 6 times faster than utilizing a hashmap. That’s quite a nice performance improvement!


A visual explanation of SQL joins

I admit that I find myself going to this article every time I need to write some joins. Hopefully putting it here will save me from always having to google it.


Check the order of your rescue_from handlers!

Our rescue_from handlers used to be defined as shown below. At first glance everything looks fine, right?

class WidgetsController < ActionController::Base
  rescue_from ActionController::RoutingError, :with => :render_404
  rescue_from Exception,                      :with => :render_500
end

Turns out it’s not okay at all. Handlers are searched from bottom to top, and the first one that matches wins. This means that they should always be defined in order of most generic to most specific. Or in other words, the above code is exactly the wrong thing to do. Instead, we need to write our handlers as shown here.

class WidgetsController < ActionController::Base
  rescue_from Exception,                      :with => :render_500
  rescue_from ActionController::RoutingError, :with => :render_404
end
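The bottom-to-top lookup is easy to model in plain Ruby. Keep in mind that this is a deliberately simplified sketch and not the actual Rails implementation; the find_handler method and the RoutingError class are made up for the occasion.

```ruby
# Stand-in for ActionController::RoutingError so the sketch runs without Rails.
class RoutingError < StandardError; end

# Walk the handlers from last defined to first defined and return the first match.
def find_handler(handlers, exception)
  _, handler = handlers.reverse_each.find { |klass, _| exception.is_a?(klass) }
  handler
end

# Most specific handler first: the generic Exception handler sits at the bottom.
wrong_order = [[RoutingError, :render_404], [Exception, :render_500]]

# Most generic handler first: the specific handler sits at the bottom.
right_order = [[Exception, :render_500], [RoutingError, :render_404]]

puts find_handler(wrong_order, RoutingError.new)
puts find_handler(right_order, RoutingError.new)
```

With the wrong order, the generic Exception handler gets found first and swallows the routing error. With the right order, the routing error ends up at render_404 as intended.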

The JavaScript event loop

Sometimes you come across an article that is so well written you can’t do anything but link to it. So if you’ve ever wondered why the JavaScript runtime is so good at asynchronous operations, then you should definitely give this article a read.

Some snippets:

JavaScript runtimes contain a message queue which stores a list of messages to be processed and their associated callback functions. These messages are queued in response to external events (such as a mouse being clicked or receiving the response to an HTTP request) given a callback function has been provided. If, for example a user were to click a button and no callback function was provided – no message would have been enqueued.

In a loop, the queue is polled for the next message (each poll referred to as a “tick”) and when a message is encountered, the callback for that message is executed.

The calling of this callback function serves as the initial frame in the call stack, and due to JavaScript being single-threaded, further message polling and processing is halted pending the return of all calls on the stack.

As well as:

Using Web Workers enables you to offload an expensive operation to a separate thread of execution, freeing up the main thread to do other things. The worker includes a separate message queue, event loop, and memory space independent from the original thread that instantiated it. Communication between the worker and the main thread is done via message passing, which looks very much like the traditional, evented code-examples we’ve already seen.


Bug hunting with git bisect

Today I was looking into what I thought was going to be a simple bug. The problem seemed straightforward enough, so I did a quick grep of the codebase, found three pieces of code that looked like likely culprits, made some modifications, triggered the bug, and found that absolutely nothing had changed. Half an hour and a lot of additional digging later I was stumped. I had no idea what was going on.

It was at this point that I remembered git bisect. This git command asks you to specify two commits: one where things are working, and another one where things are broken. It then does a binary search across the range of commits in between these two. Each search step asks you whether the current commit contains broken code or not, after which it automatically selects the next commit for you. There’s a great tutorial over here.

$ git bisect start
$ git bisect good rj6y4j3
$ git bisect bad 2q7f529
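The binary search behind this can be sketched in a few lines of Ruby. This is only a toy model of the idea and not how git implements it: the first_bad_commit method and the commit names are made up, and the sketch assumes that the first commit in the list is good and the last one is bad.

```ruby
# Returns the first broken commit, assuming commits.first is good, commits.last
# is bad, and every commit after the first bad one is bad as well.
# The block plays the role of you answering "good" or "bad" at each step.
def first_bad_commit(commits)
  good = 0                # index of a commit known to be good
  bad = commits.size - 1  # index of a commit known to be bad

  while bad - good > 1
    middle = (good + bad) / 2
    if yield(commits[middle])
      bad = middle
    else
      good = middle
    end
  end

  commits[bad]
end

commits = (1..10).map { |i| "c#{i}" }
culprit = first_bad_commit(commits) { |commit| commit[1..-1].to_i >= 7 }
puts culprit
```

Every answer halves the range of candidate commits, which is why even a long history only takes a handful of steps to search.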

It took me all of five minutes to discover the source of the bug this way. I can safely say that it would have taken me ages to track down this particular bit of offending code as it was located in a custom bug fix for a popular third party library (I’m looking at you Sentry).


Why is MySQL converting my NULLs to blanks?

A while ago I ran into an issue where some records were showing a blank value in a given column. This was a bit weird as a blank value had never been written to that column. After a bit of searching we found that we had a bug that had inadvertently been writing the occasional NULL value to that particular column though. So how did those NULLs get turned into blanks?

It turns out that MySQL can operate in different server modes. You can check your server mode by running one of the two commands below. Note that your server mode will be blank by default.

SHOW GLOBAL VARIABLES WHERE Variable_name = 'sql_mode';
SHOW SESSION VARIABLES WHERE Variable_name = 'sql_mode';

Now that we know about server modes we can talk about data type defaults. Basically, each MySQL column has an implicit default value assigned to it. Under certain circumstances this default value might be used instead of the value you were expecting.

As of MySQL 5.0.2, if a column definition includes no explicit DEFAULT value, MySQL determines the default value as follows:

If the column can take NULL as a value, the column is defined with an explicit DEFAULT NULL clause. This is the same as before 5.0.2.

If the column cannot take NULL as the value, MySQL defines the column with no explicit DEFAULT clause. Exception: If the column is defined as part of a PRIMARY KEY but not explicitly as NOT NULL, MySQL creates it as a NOT NULL column (because PRIMARY KEY columns must be NOT NULL), but also assigns it a DEFAULT clause using the implicit default value. To prevent this, include an explicit NOT NULL in the definition of any PRIMARY KEY column.

For data entry into a NOT NULL column that has no explicit DEFAULT clause, if an INSERT or REPLACE statement includes no value for the column, or an UPDATE statement sets the column to NULL, MySQL handles the column according to the SQL mode in effect at the time:

  • If strict SQL mode is enabled, an error occurs for transactional tables and the statement is rolled back. For nontransactional tables, an error occurs, but if this happens for the second or subsequent row of a multiple-row statement, the preceding rows will have been inserted.
  • If strict mode is not enabled, MySQL sets the column to the implicit default value for the column data type.

We found that our code was sometimes writing NULLs to a NOT NULL column on a server that was not running in strict mode. This in turn caused our NULLs to silently get changed to blanks as this was the column default value. Mystery solved.


Using environment variables in migrations

Recently we had to run a migration that was so slow we couldn’t afford the downtime it would cause. In order to get around this, it was decided to put two code paths in the migration: one that was slow and thorough, and one that was quick but didn’t perform any safety checks.

The first path would be run on a recent database dump, whereas the latter would be executed directly on the live database once the first had finished without error. This was a lot less crazy than it might sound as the particular table under modification had very infrequent changes.

It was decided to use environment variables to allow for easy switching between code paths. This is what the code ended up looking like.

class MyDangerousMigration < ActiveRecord::Migration
  def change
    if ENV['skip_checks'] == 'true'
      # code without safety checks
    else
      # code with safety checks
    end
  end
end

This could then be run like so.

skip_checks=true bundle exec rake db:migrate

Getting connection information with lsof

The lsof command is one of those super useful commands for figuring out what connections are taking place on your machine. While the lsof command technically just lists open files, just about everything in Linux (even sockets) is a file!

Some useful commands:

List all network connections

$ lsof -i

COMMAND     PID     USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
Spotify   36908 vaneyckt   53u  IPv4 0x2097c8deb175c0dd      0t0  TCP localhost:4381 (LISTEN)
Spotify   36908 vaneyckt   54u  IPv4 0x2097c8deab18027d      0t0  TCP localhost:4371 (LISTEN)
Spotify   36908 vaneyckt   71u  IPv4 0x2097c8deba747c1d      0t0  UDP *:57621
Spotify   36908 vaneyckt   72u  IPv4 0x2097c8deb18ef4cf      0t0  TCP *:57621 (LISTEN)
Spotify   36908 vaneyckt   77u  IPv4 0x2097c8deb993b255      0t0  UDP ip-192-168-0-101.ec2.internal:61009
Spotify   36908 vaneyckt   90u  IPv4 0x2097c8dea8c4a66d      0t0  TCP ip-192-168-0-101.ec2.internal:62432->lon3-accesspoint-a57.lon3.spotify.com:https (ESTABLISHED)
Spotify   36908 vaneyckt   91u  IPv4 0x2097c8de8d029f2d      0t0  UDP ip-192-168-0-101.ec2.internal:52706

List all network connections on port 4381

$ lsof -i :4381

COMMAND   PID     USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
Spotify 36908 vaneyckt   53u  IPv4 0x2097c8deb175c0dd      0t0  TCP localhost:4381 (LISTEN)

Find ports listening for connections

$ lsof -i | grep -i LISTEN

Spotify   36908 vaneyckt   53u  IPv4 0x2097c8deb175c0dd      0t0  TCP localhost:4381 (LISTEN)
Spotify   36908 vaneyckt   54u  IPv4 0x2097c8deab18027d      0t0  TCP localhost:4371 (LISTEN)
Spotify   36908 vaneyckt   72u  IPv4 0x2097c8deb18ef4cf      0t0  TCP *:57621 (LISTEN)

Find established connections

$ lsof -i | grep -i ESTABLISHED

Spotify   36908 vaneyckt   90u  IPv4 0x2097c8dea8c4a66d      0t0  TCP ip-192-168-0-101.ec2.internal:62432->lon3-accesspoint-a57.lon3.spotify.com:https (ESTABLISHED)

Show all files opened by a given process

$ lsof -p 36908

COMMAND   PID     USER   FD     TYPE             DEVICE  SIZE/OFF     NODE NAME
Spotify 36908 vaneyckt   90u    IPv4 0x2097c8dea8c4a66d       0t0      TCP ip-192-168-0-101.ec2.internal:62432->lon3-accesspoint-a57.lon3.spotify.com:https (ESTABLISHED)
Spotify 36908 vaneyckt   91u    IPv4 0x2097c8de8d029f2d       0t0      UDP ip-192-168-0-101.ec2.internal:52706
Spotify 36908 vaneyckt   92u     REG                1,4   9389456 59387889 /Users/vaneyckt/Library/Caches/com.spotify.client/Data/4a/4a5a23cf1e9dc4210b3c801d57a899098dc12418.file
Spotify 36908 vaneyckt   93u     REG                1,4   8658944 58471210 /private/var/folders/xv/fjmwzr9x5mq_s7dchjq87hjm0000gn/T/.org.chromium.Chromium.6b0Vzp
Spotify 36908 vaneyckt   94u     REG                1,4    524656 54784499 /Users/vaneyckt/Library/Caches/com.spotify.client/Browser/index
Spotify 36908 vaneyckt   95u     REG                1,4     81920 54784500 /Users/vaneyckt/Library/Caches/com.spotify.client/Browser/data_0
Spotify 36908 vaneyckt   96u     REG                1,4    532480 54784501 /Users/vaneyckt/Library/Caches/com.spotify.client/Browser/data_1
Spotify 36908 vaneyckt   97u     REG                1,4   2105344 54784502 /Users/vaneyckt/Library/Caches/com.spotify.client/Browser/data_2
Spotify 36908 vaneyckt   98u     REG                1,4  12591104 54784503 /Users/vaneyckt/Library/Caches/com.spotify.client/Browser/data_3
Spotify 36908 vaneyckt   99r     REG                1,4    144580    28952 /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/Resources/HIToolbox.rsrc

Carefully converting your MySQL database to utf8

Converting all the data in your database can be a nail-biting experience. As you can see from the code below we are doing our best to be super careful. We convert each table separately and before each conversion we store the table’s column types and an MD5 hash of every row in the table (we were lucky enough to not have enormous tables). After converting the table we check that no column types or rows were changed. It goes without saying that we do a trial run on a database dump first.

require 'set'
require 'digest/md5'

CHARACTER_SET = 'utf8'
COLLATION = 'utf8_unicode_ci'

class ConvertAllTablesToUtf8 < ActiveRecord::Migration
  def up
    ActiveRecord::Base.connection.tables.each do |table|
      ActiveRecord::Base.transaction do
        ActiveRecord::Base.connection.execute("LOCK TABLES #{table} WRITE")
          say "starting work on table: #{table}"

          model = table.classify.constantize
          say "associated model: #{model}"

          say 'storing column types information before converting table to unicode'
          column_types_before = model.columns_hash.each_with_object({}) do |(column_name, column_info), column_types_before|
            column_types_before[column_name] = [column_info.sql_type, column_info.type]
          end

          say 'storing set of table data hashes before converting table to unicode'
          table_data_before = Set.new
          model.find_each do |datum|
            table_data_before << Digest::MD5.hexdigest(datum.inspect)
          end

          say 'converting table to unicode'
          execute("ALTER TABLE #{table} CONVERT TO CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION}")
          execute("ALTER TABLE #{table} DEFAULT CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION}")

          say 'getting column types information after conversion to unicode'
          column_types_after = model.columns_hash.each_with_object({}) do |(column_name, column_info), column_types_after|
            column_types_after[column_name] = [column_info.sql_type, column_info.type]
          end

          say 'getting set of table data hashes after conversion to unicode'
          table_data_after = Set.new
          model.find_each do |datum|
            table_data_after << Digest::MD5.hexdigest(datum.inspect)
          end

          say "checking that column types haven't changed"
          if column_types_before != column_types_after
            raise "Column types of the #{table} table have changed"
          end

          say "checking that data hasn't changed"
          if table_data_before != table_data_after
            raise "Data in the #{table} table has changed"
          end
        ActiveRecord::Base.connection.execute('UNLOCK TABLES')
      end
    end

    execute("ALTER DATABASE #{ActiveRecord::Base.connection.current_database} DEFAULT CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION}")
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end

Note how we lock each table before converting it. If we didn’t lock it then new data could be written to the table while we are busy storing MD5 hashes of the rows in preparation for the actual conversion. This, in turn, would cause our migration to complain that new data was present after the conversion had taken place.
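
To make that failure mode concrete, here is a minimal sketch of the before/after check in plain Ruby. The rows are hypothetical hashes standing in for ActiveRecord objects, and the fingerprints helper is a name made up for this illustration; the migration above hashes datum.inspect in the same way.

```ruby
require 'set'
require 'digest/md5'

# Build a set of MD5 fingerprints, one per row. Rows are plain hashes here,
# standing in for the ActiveRecord objects used in the migration.
def fingerprints(rows)
  rows.each_with_object(Set.new) do |row, set|
    set << Digest::MD5.hexdigest(row.inspect)
  end
end

rows = [{ id: 1, name: 'foo' }, { id: 2, name: 'bar' }]
before = fingerprints(rows)

# Simulate another connection inserting a row while the table is unlocked.
rows_after_insert = rows + [{ id: 3, name: 'baz' }]
after = fingerprints(rows_after_insert)

puts before == fingerprints(rows) # untouched data yields an identical set
puts before == after              # the concurrent insert makes the sets differ
```

Comparing sets rather than arrays means row order does not matter, only whether the same rows are present.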

We also wrap each table conversion inside a transaction. I’ve talked before about how converting a table will cause an implicit commit, meaning that a rollback won’t undo any of the changes made by the conversion. So why have a transaction here then? Imagine that an exception were to be raised during our migration. In that case we want to ensure our table lock gets dropped as soon as possible. The transaction guarantees this behavior.

Also, if we weren’t so paranoid about checking the before and after data as part of our migration, we could simplify this code quite a bit.

CHARACTER_SET = 'utf8'
COLLATION = 'utf8_unicode_ci'

class ConvertAllTablesToUtf8 < ActiveRecord::Migration
  def up
    ActiveRecord::Base.connection.tables.each do |table|
      say 'converting table to unicode'
      execute("ALTER TABLE #{table} CONVERT TO CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION}")
      execute("ALTER TABLE #{table} DEFAULT CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION}")
    end

    execute("ALTER DATABASE #{ActiveRecord::Base.connection.current_database} DEFAULT CHARACTER SET #{CHARACTER_SET} COLLATE #{COLLATION}")
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end

Notice how we can drop the lock as the ALTER TABLE command will prevent all writes to the table while simultaneously allowing all reads.

In most cases, ALTER TABLE makes a temporary copy of the original table. MySQL waits for other operations that are modifying the table, then proceeds. It incorporates the alteration into the copy, deletes the original table, and renames the new one. While ALTER TABLE is executing, the original table is readable by other sessions. Updates and writes to the table that begin after the ALTER TABLE operation begins are stalled until the new table is ready, then are automatically redirected to the new table without any failed updates. The temporary copy of the original table is created in the database directory of the new table. This can differ from the database directory of the original table for ALTER TABLE operations that rename the table to a different database.

Furthermore, since we now no longer have a lock on our table we can also drop the transaction. This gives us the much-simplified code shown above.


Character set vs collation

There’s a surprising amount of confusion about the difference between these two terms. The best explanation I’ve found is here.

A character set is a subset of all written glyphs. A character encoding specifies how those characters are mapped to numeric values. Some character encodings, like UTF-8 and UTF-16, can encode any character in the Universal Character Set. Others, like US-ASCII or ISO-8859-1 can only encode a small subset, since they use 7 and 8 bits per character, respectively. Because many standards specify both a character set and a character encoding, the term “character set” is often substituted freely for “character encoding”.

A collation comprises rules that specify how characters can be compared for sorting. Collation rules can be locale-specific: the proper order of two characters varies from language to language.

Choosing a character set and collation comes down to whether your application is internationalized or not. If not, what locale are you targeting?

In order to choose what character set you want to support, you have to consider your application. If you are storing user-supplied input, it might be hard to foresee all the locales in which your software will eventually be used. To support them all, it might be best to support the UCS (Unicode) from the start. However, there is a cost to this; many western European characters will now require two bytes of storage per character instead of one.

Choosing the right collation can help performance if your database uses the collation to create an index, and later uses that index to provide sorted results. However, since collation rules are often locale-specific, that index will be worthless if you need to sort results according to the rules of another locale.

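The only thing I’d like to add is that some collations are more CPU intensive than others. For example, utf8_general_ci compares strings one character at a time and has no support for expansions, so it treats ß as equal to s. This is in contrast to utf8_unicode_ci, which uses about 10% more CPU but follows the Unicode sorting rules more closely and treats ß as equal to ss. Both collations ignore accents, so both treat À, Á, and Å as being equal to A when doing comparisons.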
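
The storage cost mentioned in the quote above is easy to see from Ruby itself. The snippet below is only a small illustration: bytes_in is a helper name made up for this example, and é is simply a convenient western European character.

```ruby
# Return how many bytes a string occupies once converted to the given encoding.
def bytes_in(text, encoding)
  text.encode(encoding).bytesize
end

e_acute = "\u00E9" # the character é

puts "UTF-8:      #{bytes_in(e_acute, 'UTF-8')} byte(s)"
puts "ISO-8859-1: #{bytes_in(e_acute, 'ISO-8859-1')} byte(s)"
puts "plain 'a':  #{bytes_in('a', 'UTF-8')} byte(s) in UTF-8"
```

Plain ASCII characters still take a single byte in UTF-8, so the extra cost only applies to characters outside that range.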


MySQL write locks also prevent reads

Locking a table with

table_name = 'widgets'
ActiveRecord::Base.connection.execute("LOCK TABLES #{table_name} WRITE")

ensures that only the current connection can access that table. Other connections cannot even read from this table while it is locked!


Rails migrations and the dangers of implicit commits

I recently came across the migration below. At first sight it looks like everything is okay, but there is actually a very dangerous assumption being made here.

# migration to convert table to utf8
class ConvertWidgetsTableToUtf8Unicode < ActiveRecord::Migration
  def up
    ActiveRecord::Base.transaction do
      table_name = 'widgets'
      say "converting #{table_name} table to utf8_unicode_ci"

      execute("ALTER TABLE #{table_name} CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci")
      execute("ALTER TABLE #{table_name} DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci")
    end
  end
end

Notice how the utf8 conversion code is wrapped inside a transaction. The assumption here is that if something goes wrong the transaction will trigger a rollback. However, an ALTER TABLE command in MySQL causes an implicit commit. This means that the rollback will not undo any changes introduced by the ALTER TABLE command!


Iterating over a hash containing arrays

Last week I was implementing some auditing functionality in a rails app. At some point I was writing a page that would display how the attributes of a given ActiveRecord object had been changed. One of my colleagues spotted this and pointed out the following neat bit of syntactic sugar in Ruby.

changes = {:attribute_a => [1, 2], :attribute_b => [3, 4]}

changes.each do |attribute, (before, after)|
  puts "#{attribute}: #{before} - #{after}"
end

I later learned you can even do things like this.

data = {:foo => [[1, 2], 3]}

data.each do |key, ((a, b), c)|
  puts "#{key}: #{a} - #{b} - #{c}"
end

URI.js and URL manipulation in rails

Manipulating urls in javascript often ends up being an exercise in string interpolation. This rarely produces good looking code. Recently we’ve started enforcing the use of the URI.js library to combat this.

Our new approach has us embed any necessary urls in hidden input fields on the web page in question. Rather than hardcoding these urls, we use the named route functionality offered by rails as this provides more flexibility. When the page gets rendered, these named routes are converted to actual urls through ERB templating. The embedded urls can then be fetched by javascript code and manipulated with URI.js.

It’s no silver bullet, but the resulting code is a lot more readable.


The css !important keyword

Today I learned about the css !important keyword. I was trying to change the way code snippets (gists) were being displayed on a site, but found my css rules being ignored.

As it turned out, the javascript snippets used for embedding gists were adding an additional css stylesheet to the page. Since this stylesheet was getting added after my own stylesheet, its rules had priority over my own. The solution was to add !important to my own rules.

.gist-data {
  border-bottom: 1px !important;
}

Finding models from strings with rails

Imagine you have a Widget model that stores data in a table ‘widgets’. At some point in your rails app you find yourself being given a string ‘Widget’ and are asked to find the Widget model. This can be done as shown here.

str = 'Widget'
model = str.constantize

However, things get a bit harder when you have multiple Widget model subclasses (Widget::A, Widget::B), all of which are stored in the widgets table. This time around you’re given the string ‘Widget::A’ and are asked to get the Widget model.

In order to solve this we’ll need to ask the Widget::A model to give us its table name. If you’re following rails conventions you can then in turn use the table name to get the model you need.

str = 'Widget::A'
model = str.constantize.table_name.classify.constantize

Note that the above will only work if you’ve followed rails naming conventions though :).


Retrieving data in a time range with rails

I’m writing this mostly as a reminder to myself, since I keep forgetting this :)

Instead of:

widgets = Widget.where("? <= created_at AND created_at <= ?", time_from, time_to)

do this:

widgets = Widget.where(:created_at => time_from .. time_to)
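
One detail worth remembering is that the two-dot range includes both endpoints, which matches the two <= comparisons in the raw SQL version. The sketch below shows the difference between the two-dot and three-dot forms using plain Ruby ranges; the dates and the includes_end? helper are made up for this example, and no database is involved.

```ruby
# Report whether a range of times includes its own end point.
def includes_end?(range)
  range.cover?(range.end)
end

time_from = Time.utc(2015, 8, 1)
time_to   = Time.utc(2015, 9, 1)

inclusive = time_from..time_to   # two dots: end point is part of the range
exclusive = time_from...time_to  # three dots: end point is left out

puts includes_end?(inclusive)
puts includes_end?(exclusive)
```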

GET vs POST

Today I was looking into why a particular GET request was failing on IE. As it turned out this was due to IE not appreciating long query strings. While going through our nginx logs, we also found nginx had a default query string limit that was being hit sporadically by some other customers as well. The solution in both cases was to move the affected calls from GET to POST.

The above problem prompted me to take a closer look at the differences between GET and POST requests. You probably use these all the time, but do you know how each of them functions?

GET requests

  • can be bookmarked
  • can be cached for faster response time on subsequent requests
  • request is stored in browser history
  • uses query strings to send data. There is a limit to the allowable length of a query string.
  • have their url and query strings stored in plaintext in server logs. This is why you should never send passwords over GET requests!
  • use these for actions that retrieve data. For example, you don’t want to use GET requests for posting comments on your blog. Otherwise an attacker could copy a url that posts a specific comment and put it on twitter. Every time someone were to click this link, a comment would now be posted on your blog.

POST requests

  • cannot be bookmarked
  • cannot be cached
  • request will not be stored in browser history
  • uses POST body to send data. HTTP itself does not limit the size of a request body, although servers commonly enforce a configurable maximum.
  • have their url stored in plaintext in server logs. The data itself will not be logged though.
  • use these for actions that alter data
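
To see where the data ends up in each case, here is a small sketch using Ruby's standard net/http library. It only builds the request objects and never sends them. The example.com url, the parameters, and the build_get and build_post helper names are all made up for this illustration.

```ruby
require 'uri'
require 'net/http'

# Build a GET request: the parameters become part of the url's query string.
def build_get(url, params)
  uri = URI(url)
  uri.query = URI.encode_www_form(params)
  Net::HTTP::Get.new(uri)
end

# Build a POST request: the url stays clean and the parameters go in the body.
def build_post(url, params)
  request = Net::HTTP::Post.new(URI(url))
  request.set_form_data(params)
  request
end

params = { 'author' => 'alice', 'comment' => 'hello' }

get_request  = build_get('http://example.com/comments', params)
post_request = build_post('http://example.com/comments', params)

puts "GET path:  #{get_request.path}"   # what a server would typically log
puts "POST path: #{post_request.path}"
puts "POST body: #{post_request.body}"
```

The GET path carries the data with it wherever the url goes (logs, bookmarks, history), whereas the POST path does not.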

The dig command

Today I learned of the existence of the dig command. A very useful little tool for DNS lookups. Here’s an example of it in action.

$ dig www.google.com

; <<>> DiG 9.8.3-P1 <<>> www.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4868
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.com.			IN	A

;; ANSWER SECTION:
www.google.com.		72	IN	A	74.125.24.105
www.google.com.		72	IN	A	74.125.24.103
www.google.com.		72	IN	A	74.125.24.104
www.google.com.		72	IN	A	74.125.24.99
www.google.com.		72	IN	A	74.125.24.147
www.google.com.		72	IN	A	74.125.24.106

;; Query time: 11 msec
;; SERVER: 192.168.0.1#53(192.168.0.1)
;; WHEN: Sat Aug 29 13:38:48 2015
;; MSG SIZE  rcvd: 128

Profiling rails assets precompilation

Assets precompilation on rails can take a fair bit of time. This is especially annoying in scenarios where you want to deploy your app multiple times a day. Let’s see if we can come up with a way to actually figure out where all this time is being spent. Also, while I will be focusing on rails 3.2 in this post, the general principle should be easy enough to apply to other rails versions.

Our first course of action is finding the assets precompilation logic. A bit of digging will turn up the assets.rake file for rails 3.2. The relevant code starts on lines 59-67 and from there on out invokes methods throughout the entire file.

# lines 59-67 of assets.rake
task :all do
  Rake::Task["assets:precompile:primary"].invoke
  # We need to reinvoke in order to run the secondary digestless
  # asset compilation run - a fresh Sprockets environment is
  # required in order to compile digestless assets as the
  # environment has already cached the assets on the primary
  # run.
  if Rails.application.config.assets.digest
    ruby_rake_task("assets:precompile:nondigest", false)
  end
end

When we follow the calls made by the code above we can see that the actual compilation takes place on lines 50-56 of assets.rake and is done by the compile method of the Sprockets::StaticCompiler class.

# compile method of Sprockets::StaticCompiler class
def compile
  manifest = {}
  env.each_logical_path(paths) do |logical_path|
    if asset = env.find_asset(logical_path)
      digest_path = write_asset(asset)
      manifest[asset.logical_path] = digest_path
      manifest[aliased_path_for(asset.logical_path)] = digest_path
    end
  end
  write_manifest(manifest) if @manifest
end

Now that we know which code does the compiling, we can think of two ways to add some profiling to this. We could check out the rails repo from GitHub, modify it locally and point our Gemfile to our modified local version of rails. Or, we could create a new rake task and monkey patch the compile method of the Sprockets::StaticCompiler class. We’ll go with the second option here as it is the more straightforward to implement.

We’ll create a new rake file in the /lib/tasks folder of our rails app and name it profile_assets_precompilation.rake. We then copy the contents of assets.rake into it, and wrap this code inside a new ‘profile’ namespace so as to avoid conflicts. At the top of this file we’ll also add our monkey patched compile method so as to make it output profiling info. The resulting file should look as shown below.

namespace :profile do
  # monkey patch the compile method to output compilation times
  module Sprockets
    class StaticCompiler
      def compile
        manifest = {}
        env.each_logical_path(paths) do |logical_path|
          start_time = Time.now.to_f

          if asset = env.find_asset(logical_path)
            digest_path = write_asset(asset)
            manifest[asset.logical_path] = digest_path
            manifest[aliased_path_for(asset.logical_path)] = digest_path
          end

          # our profiling code
          duration = Time.now.to_f - start_time
          puts "#{logical_path} - #{duration.round(3)} seconds"
        end
        write_manifest(manifest) if @manifest
      end
    end
  end

  # contents of assets.rake
  namespace :assets do
    def ruby_rake_task(task, fork = true)
      env    = ENV['RAILS_ENV'] || 'production'
      groups = ENV['RAILS_GROUPS'] || 'assets'
      args   = [$0, task,"RAILS_ENV=#{env}","RAILS_GROUPS=#{groups}"]
      args << "--trace" if Rake.application.options.trace
      if $0 =~ /rake\.bat\Z/i
        Kernel.exec $0, *args
      else
        fork ? ruby(*args) : Kernel.exec(FileUtils::RUBY, *args)
      end
    end

    # We are currently running with no explicit bundler group
    # and/or no explicit environment - we have to reinvoke rake to
    # execute this task.
    def invoke_or_reboot_rake_task(task)
      if ENV['RAILS_GROUPS'].to_s.empty? || ENV['RAILS_ENV'].to_s.empty?
        ruby_rake_task task
      else
        Rake::Task[task].invoke
      end
    end

    desc "Compile all the assets named in config.assets.precompile"
    task :precompile do
      invoke_or_reboot_rake_task "assets:precompile:all"
    end

    namespace :precompile do
      def internal_precompile(digest=nil)
        unless Rails.application.config.assets.enabled
          warn "Cannot precompile assets if sprockets is disabled. Please set config.assets.enabled to true"
          exit
        end

        # Ensure that action view is loaded and the appropriate
        # sprockets hooks get executed
        _ = ActionView::Base

        config = Rails.application.config
        config.assets.compile = true
        config.assets.digest  = digest unless digest.nil?
        config.assets.digests = {}

        env      = Rails.application.assets
        target   = File.join(Rails.public_path, config.assets.prefix)
        compiler = Sprockets::StaticCompiler.new(env,
                                                 target,
                                                 config.assets.precompile,
                                                 :manifest_path => config.assets.manifest,
                                                 :digest => config.assets.digest,
                                                 :manifest => digest.nil?)
        compiler.compile
      end

      task :all do
        Rake::Task["assets:precompile:primary"].invoke
        # We need to reinvoke in order to run the secondary digestless
        # asset compilation run - a fresh Sprockets environment is
        # required in order to compile digestless assets as the
        # environment has already cached the assets on the primary
        # run.
        ruby_rake_task("assets:precompile:nondigest", false) if Rails.application.config.assets.digest
      end

      task :primary => ["assets:environment", "tmp:cache:clear"] do
        internal_precompile
      end

      task :nondigest => ["assets:environment", "tmp:cache:clear"] do
        internal_precompile(false)
      end
    end

    desc "Remove compiled assets"
    task :clean do
      invoke_or_reboot_rake_task "assets:clean:all"
    end

    namespace :clean do
      task :all => ["assets:environment", "tmp:cache:clear"] do
        config = Rails.application.config
        public_asset_path = File.join(Rails.public_path, config.assets.prefix)
        rm_rf public_asset_path, :secure => true
      end
    end

    task :environment do
      if Rails.application.config.assets.initialize_on_precompile
        Rake::Task["environment"].invoke
      else
        Rails.application.initialize!(:assets)
        Sprockets::Bootstrap.new(Rails.application).run
      end
    end
  end
end

Now we can run bundle exec rake profile:assets:precompile to precompile our assets while outputting profiling info. Hopefully we can now finally figure out why this is always taking so long :).
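
As an aside, the Time.now.to_f bookkeeping used in the monkey patch can be pulled out into a small helper. The sketch below is one possible way to do that, not something taken from rails or Sprockets. The timed name is made up, and it assumes Ruby 2.1 or later, since it uses the monotonic clock, which is not affected by system clock adjustments.

```ruby
# Run the given block and return [result, elapsed_seconds].
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# Stand-in for an expensive step such as compiling a single asset.
sum, duration = timed { (1..100_000).reduce(:+) }
puts "summing - #{duration.round(3)} seconds (result: #{sum})"
```

Inside the patched compile method, the body of the each_logical_path block could be wrapped in such a helper in the same way.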


Regarding if statement scope in Ruby

I recently learned that if statements in Ruby do not introduce scope. This means that you can write code like the snippet shown below and it’ll work fine.

# perfectly valid Ruby code
if true
  foo = 5
end

puts foo

At first this seemed a bit weird to me. It wasn’t until I read this that I realized Ruby was even more versatile than I had first thought. As it turns out, it is this somewhat unconventional scoping rule that allows us to conditionally replace methods.

if foo == 5
  def some_method
    # do something
  end
else
  def some_method
    # do something else
  end
end

As well as conditionally modify implementations.

if foo == 5
  # class and module names must be constants, so they start with a capital
  class SomeClass
    # ...
  end
else
  module SomeModule
    # ...
  end
end

And that’s amazing!
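
To round this off, here is a small sketch contrasting if statements with blocks, which do introduce a new scope. The scope_demo method and its variable names are made up for this illustration. It also shows a related quirk: a variable assigned inside an if branch that never runs still exists afterwards, with the value nil.

```ruby
def scope_demo
  if false
    skipped = 1 # never executed, yet the parser has already seen the variable
  end

  [1].each do
    inside_block = 2 # block-local: not visible once the block ends
  end

  # defined? reports what an identifier refers to without raising an error.
  [defined?(skipped), skipped, defined?(inside_block)]
end

p scope_demo
```

The first two values show that skipped is a known local variable holding nil, while the last shows that inside_block is unknown outside its block.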


EC2 instance cost comparison

Amazon’s pricing scheme for its ec2 instances never struck me as particularly transparent. Until recently some of my DevOps colleagues even estimated cost by cross-referencing instance details with pricing information. While this approach gives reasonable results for finding the cost of a given instance type, it doesn’t lend itself very well to comparing prices across a range of different types.

When talking to an ex-colleague of mine about the hardships encountered for such a common task, he pointed me to this absolutely brilliant page. It’s so unbelievably simple and well thought-out that I can’t help getting ever so slightly annoyed with whoever is in charge of communicating Amazon’s pricing structure and the subpar job they are doing.