zero-downtime deployments with unicorn and supervisord

In this post I’ll discuss zero-downtime deployments using unicorn and supervisord. There’s a lot more to zero-downtime deployments than just keeping your website available. Listen to Ruby Rogues Ep. 71 or search Google for a broader discussion of the problems involved.

When running a web application in production, you should strive for 100% availability. Downtime is normally perceived as an error in your application, and rightfully so. If you deploy often, your users might stop using your app because of the 502 errors they encounter.

Since I use supervisord in my production setup, the most widely used unicorn setup for zero-downtime deployments does not work out of the box. Supervisord requires the unicorn process not to daemonize. Also, sending SIGUSR2 to unicorn starts a new master, and with the usual configuration the old master then exits. Since supervisord watches the old master, it will consider the application exited, even though it’s still running under a new process id. Supervisord will then try to restart the application and fail, because the listening socket is already in use by the new unicorn master.

Luckily, there’s a utility called unicornherder. Unicornherder does not daemonize itself; instead it keeps an eye on the unicorn pid file to check whether unicorn is still alive. Signals sent to unicornherder are forwarded to the unicorn master. If unicorn quits, unicornherder quits too.
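To make the "herding" idea concrete, here is a minimal Ruby sketch of the same pattern. It is not unicornherder’s implementation (unicornherder is a Python tool), and it skips starting unicorn. The helper names read_pid, alive? and herd are hypothetical. It assumes unicorn’s usual USR2 behaviour of renaming the running master’s pid file to *.oldbin, so at least one of the two files names a live master during a restart.

```ruby
# Hypothetical sketch of a "herder": NOT unicornherder's actual code.

# Read the master PID from a pid file; nil if the file is missing or empty.
def read_pid(pid_path)
  pid = File.read(pid_path).strip.to_i
  pid.positive? ? pid : nil
rescue Errno::ENOENT
  nil
end

# Signal 0 only performs error checking: it tells us whether the
# process exists without delivering anything to it.
def alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Forward the given signals to whichever master the pid file names, and
# keep running for as long as a master (current or .oldbin) is alive.
def herd(pid_path, signals: %w[HUP USR2 QUIT TERM INT], poll: 1.0)
  signals.each do |sig|
    Signal.trap(sig) do
      pid = read_pid(pid_path)
      Process.kill(sig, pid) if pid
    end
  end

  loop do
    # During USR2 the old master's pid file is renamed to *.oldbin
    # before the new master writes its own, so check both.
    pid = read_pid(pid_path) || read_pid("#{pid_path}.oldbin")
    break unless pid && alive?(pid)
    sleep poll
  end
end
```

Because the loop follows the pid file rather than a fixed PID, the herder keeps running when the master is replaced, which is exactly what supervisord needs to see.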

So, in order to use SIGUSR2 and preload_app for zero-downtime deployments we need to install unicornherder.


# assuming you are running Ubuntu:
$ sudo apt-get install python-dev
$ sudo pip install unicornherder
$ which unicornherder # => /usr/local/bin/unicornherder

Unicornherder itself does not require a configuration file. All required arguments are passed on the command line.

Next we need to configure supervisord:

Supervisord

Supervisord watches unicornherder, and unicornherder starts unicorn as a daemon. So all we need to do is to properly start unicornherder and make sure it keeps running.

Here’s a sample supervisord configuration file I generated using foreman export:


[program:myapp-unicornherder-1]
command=/home/webapp/.rvm/bin/app_bundle exec unicornherder -u unicorn -p tmp/pids/unicorn.pid -- -c config/unicorn.rb
autostart=true
autorestart=true
stopsignal=QUIT
stdout_logfile=/home/webapp/shared/log/unicornherder-1.log
stderr_logfile=/home/webapp/shared/log/unicornherder-1.error.log
user=webapp
directory=/home/webapp/current
environment=RAILS_ENV="production",APP_PATH="/home/webapp/current",SHARED_PATH="/home/webapp/shared",TEMP_PATH="/home/webapp/shared/tmp",PORT="8619"

[group:myapp]
programs=myapp-unicornherder-1

The details:

unicornherder is passed the path to the unicorn pidfile using the -p flag; everything after -- is passed on to unicorn.
supervisord will send the QUIT signal to unicornherder when we stop the application, which unicorn treats as a graceful shutdown.
unicorn is executed in an RVM-managed environment, and I’m using an RVM wrapper to load the correct Ruby version and gemset.
basic unicorn configuration settings (paths and port) are exported as environment variables.
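Because the unicorn configuration reads APP_PATH, SHARED_PATH, TEMP_PATH and PORT from the environment supervisord sets up, a missing variable tends to surface as a confusing NoMethodError on nil. A small guard like the following sketch could fail fast instead. The constant and method names are hypothetical, not part of unicorn or supervisord.

```ruby
# Hypothetical guard for the top of config/unicorn.rb.
# Variables the unicorn config below reads from the environment.
REQUIRED_UNICORN_ENV = %w[APP_PATH SHARED_PATH TEMP_PATH PORT].freeze

# Returns the names of required variables that are unset or blank.
def missing_unicorn_env(env = ENV)
  REQUIRED_UNICORN_ENV.select { |key| env[key].to_s.strip.empty? }
end

# Raises with a readable message if anything is missing.
def check_unicorn_env!(env = ENV)
  missing = missing_unicorn_env(env)
  return if missing.empty?

  raise ArgumentError, "missing environment variables: #{missing.join(', ')}"
end
```

Calling check_unicorn_env! as the first line of config/unicorn.rb would make a misconfigured supervisord program fail with a clear message in unicornherder’s error log.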

Unicorn

The unicorn configuration follows:


worker_processes ((ENV['RAILS_ENV'] == 'development') ? 2 : 8)

working_directory ENV["APP_PATH"]

listen ENV["PORT"].to_i, :tcp_nopush => true

timeout 30

pid (ENV["TEMP_PATH"] + "/pids/unicorn.pid")

stderr_path ENV["SHARED_PATH"] + "/log/unicorn.stderr.log"
stdout_path ENV["SHARED_PATH"] + "/log/unicorn.stdout.log"

preload_app true

before_fork do |server, worker|
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
  end

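  # during a USR2 upgrade the old master's pid file is renamed to *.oldbin;
  # skip this when we *are* the old master (e.g. it forks a replacement worker)
  old_pid = ENV["TEMP_PATH"] + '/pids/unicorn.pid.oldbin'
  if File.exist?(old_pid) && server.pid != old_pid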
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # someone else did our job for us
    end
  end
end

after_fork do |server, worker|
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
  end
end

The important points here are that the master closes its connections to external resources such as the database, since it has no use for them (each worker reconnects in after_fork), and that the new master kills the old master as soon as preloading is done and it starts forking workers.
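The old-master handling in before_fork can be pulled out into a plain method, which makes it easier to test outside unicorn. This is a sketch under the same assumptions as the config above (the *.oldbin naming used by unicorn during USR2). The method name quit_old_master is hypothetical. It adds one guard the config does not have: an empty or garbled .oldbin file would make to_i return 0, and Process.kill with PID 0 signals the whole process group.

```ruby
# Hypothetical helper mirroring the before_fork logic above.
# pid_path:         the pid file configured for unicorn
# current_pid_path: the running master's pid file (server.pid in unicorn)
# Returns true if a signal was delivered to the old master.
def quit_old_master(pid_path, current_pid_path, signal: "QUIT")
  old_pid_path = "#{pid_path}.oldbin"
  # We are the old master ourselves: never signal our own process.
  return false if old_pid_path == current_pid_path
  return false unless File.exist?(old_pid_path)

  old_pid = File.read(old_pid_path).to_i
  # PID 0 would signal the whole process group, so refuse it.
  return false unless old_pid.positive?

  Process.kill(signal, old_pid)
  true
rescue Errno::ENOENT, Errno::ESRCH
  # The file vanished or the old master is already gone.
  false
end
```

Inside before_fork this would be called as quit_old_master(ENV["TEMP_PATH"] + "/pids/unicorn.pid", server.pid).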

Mina

If we deploy using Mina, we can use the following configuration to perform a zero-downtime deploy:


desc "Deploys the current version to the server."
task :deploy => :environment do
  deploy do
    # omitted

    to :launch do
      queue %[kill -s USR2 $(sudo supervisorctl status | grep unicornherder | cut -d' ' -f7 | cut -d',' -f1)]
    end
  end
end
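The cut -d' ' -f7 in the launch step depends on how many spaces supervisorctl status uses to pad the name column, so it can break when program names change length. If you would rather do the lookup in Ruby (for example in a small maintenance script run on the server), splitting on runs of whitespace avoids that. This sketch assumes the usual supervisor 3.x status line format shown in the comment; supervisor_pid is a hypothetical name.

```ruby
# Hypothetical helper: find the PID of a running supervisord program in
# `supervisorctl status` output, independent of column padding.
# Assumed line format (supervisor 3.x):
#   myapp:myapp-unicornherder-1   RUNNING   pid 1234, uptime 0:01:02
def supervisor_pid(status_output, program)
  status_output.each_line do |line|
    name, state, rest = line.strip.split(/\s+/, 3)
    # Same matching as `grep unicornherder`, but only for running programs.
    next unless name && name.include?(program) && state == "RUNNING"

    match = rest.to_s.match(/\bpid (\d+),/)
    return match[1].to_i if match
  end
  nil
end
```

On the server, supervisor_pid(`sudo supervisorctl status`, "unicornherder") would return the herder’s PID, or nil if it is not running.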

Starting and stopping unicorn is handled by supervisord:


desc "stop the application"
task :down do
  queue "sudo supervisorctl stop myapp:*"
end

desc "start the application"
task :up do
  queue "sudo supervisorctl start myapp:*"
end

Verify we got a zero-downtime deployment

Now it’s time to verify our setup is actually working.

Running ab -c 2 -n 100 http://www.example.com/ while restarting our application should not result in ANY dropped connections. Note that this largely depends on how long your application needs to start up. We could further amplify the effect by adding artificial calls to sleep in our application.rb.
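If ApacheBench isn’t at hand, the same check can be approximated in Ruby: fire a fixed number of requests from a few threads while you deploy, and count failures. This is a rough sketch, not a benchmark tool. The method name hammer is hypothetical, and "success" is assumed to mean an HTTP status below 400 with no connection error. The request can be swapped out via a block, which also keeps it testable.

```ruby
require "net/http"
require "uri"

# Hypothetical stand-in for `ab -c 2 -n 100`: issue `total` requests from
# `concurrency` threads and count :ok / :failed results.
def hammer(url, total: 100, concurrency: 2, &request)
  uri = URI(url)
  # Default request: success means a status below 400 and no exception
  # (e.g. no refused connection or 502 while unicorn restarts).
  request ||= proc do
    begin
      Net::HTTP.get_response(uri).code.to_i < 400
    rescue StandardError
      false
    end
  end

  jobs = Queue.new
  total.times { |i| jobs << i }
  results = Queue.new

  workers = Array.new(concurrency) do
    Thread.new do
      loop do
        begin
          jobs.pop(true) # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break
        end
        results << (request.call ? :ok : :failed)
      end
    end
  end
  workers.each(&:join)

  counts = Hash.new(0)
  counts[results.pop] += 1 until results.empty?
  counts
end
```

Running p hammer("http://www.example.com/") in one terminal while running the Mina deploy in another should report zero :failed requests if the setup works.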

Anyway, here it goes:

With restarts

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking example.com (be patient).....done


Server Software:        nginx/1.2.4
Server Hostname:        example.com
Server Port:            80

Document Path:          /
Document Length:        22527 bytes

Concurrency Level:      2
Time taken for tests:   10.947 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      2319600 bytes
HTML transferred:       2252700 bytes
Requests per second:    9.13 [#/sec] (mean)
Time per request:       218.949 [ms] (mean)
Time per request:       109.475 [ms] (mean, across all concurrent requests)
Transfer rate:          206.92 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       55   58   1.8     57      69
Processing:   137  160  27.4    145     263
Waiting:       69   82  21.0     74     148
Total:        193  218  27.4    204     320

Percentage of the requests served within a certain time (ms)
  50%    204
  66%    215
  75%    242
  80%    249
  90%    265
  95%    271
  98%    274
  99%    320
 100%    320 (longest request)

Without restarts

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking example.com (be patient).....done


Server Software:        nginx/1.2.4
Server Hostname:        example.com
Server Port:            80

Document Path:          /
Document Length:        22527 bytes

Concurrency Level:      2
Time taken for tests:   10.584 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      2319600 bytes
HTML transferred:       2252700 bytes
Requests per second:    9.45 [#/sec] (mean)
Time per request:       211.686 [ms] (mean)
Time per request:       105.843 [ms] (mean, across all concurrent requests)
Transfer rate:          214.02 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       55   58   1.5     58      65
Processing:   137  153  18.3    145     207
Waiting:       68   76   5.8     75     102
Total:        195  211  18.4    202     265

Percentage of the requests served within a certain time (ms)
  50%    202
  66%    204
  75%    215
  80%    219
  90%    248
  95%    251
  98%    252
  99%    265
 100%    265 (longest request)

No failed requests. It works! And the response times with multiple restarts are only slightly worse. Great!
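To put a number on "only slightly worse", here is the relative change between the two runs. The figures are copied from the ApacheBench output above.

```ruby
# Figures copied from the two ApacheBench runs above.
with_restarts    = { mean_ms: 218.949, rps: 9.13, max_ms: 320 }
without_restarts = { mean_ms: 211.686, rps: 9.45, max_ms: 265 }

# Percentage change from `from` to `to`, rounded to one decimal.
def pct_change(from, to)
  ((to - from) / from.to_f * 100).round(1)
end

mean_slowdown = pct_change(without_restarts[:mean_ms], with_restarts[:mean_ms])
rps_change    = pct_change(without_restarts[:rps], with_restarts[:rps])
max_slowdown  = pct_change(without_restarts[:max_ms], with_restarts[:max_ms])

puts "mean time per request: +#{mean_slowdown}%" # +3.4%
puts "requests per second:   #{rps_change}%"     # -3.4%
puts "longest request:       +#{max_slowdown}%"  # +20.8%
```

The mean cost of restarting is a few percent; the slowest request takes about a fifth longer, which fits requests briefly waiting on workers that are still booting.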

I hope this blog post helped clarify how to use unicorn and supervisord together for zero-downtime deployments, so that your app server keeps serving requests while you deploy.

Wrapping up:

unicorn requires unicornherder for zero-downtime deployments if you are using supervisord
unicorn spawns a second master when sent SIGUSR2, which means that during restarts you’ll briefly be running twice as many workers as you specified
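The memory you need to reserve for a restart follows from that. A quick sketch of the arithmetic: with worker_processes 8 as in the production config above, 18 processes briefly coexist. The per-process RSS figure is an assumption you would measure on your own server. Because preload_app lets workers share memory copy-on-write, multiplying RSS gives an upper bound rather than an exact figure.

```ruby
# Both masters and both sets of workers are alive until the old master
# has handled QUIT and its workers have finished their requests.
def processes_during_restart(worker_processes)
  2 * (worker_processes + 1) # two masters, each with its own workers
end

# Rough upper bound on peak memory; rss_mb_per_process is an assumed,
# measured-by-you value. Copy-on-write sharing is ignored.
def peak_memory_mb(worker_processes, rss_mb_per_process)
  processes_during_restart(worker_processes) * rss_mb_per_process
end

puts processes_during_restart(8) # 18
```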

That’s it! Happy hacking!

November 28, 2012