February 6, 2013

phased-restarts using puma

In my original post about zero-downtime deployments I wanted to use puma. At the time of writing, puma did not support zero-downtime restarts: while the connection was kept alive, all workers were killed at once, so no requests could be served until the new workers had fully started up.

This changed as of puma v2.0.0.b6. Now you can send SIGUSR1 to the puma master process and puma will phase out old workers while starting new workers one at a time.

Note that this process takes longer than unicorn’s SIGUSR2 + preload_app restarts, because unicorn spawns all of your new workers at the same time. With n workers, puma needs roughly n times your app’s launch time to complete a phased restart.
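To get a feel for that cost, here is a back-of-the-envelope sketch in Ruby. The helper name phased_restart_seconds and the numbers are illustrative assumptions, not measurements, and the estimate ignores the time an old worker needs to finish its in-flight requests.

```ruby
# Rough estimate of how long a phased restart takes when workers are
# replaced strictly one after another.
#   workers:      number of puma workers (the value passed to --workers)
#   boot_seconds: assumed time a single worker needs to load the app
def phased_restart_seconds(workers, boot_seconds)
  workers * boot_seconds
end

# Hypothetical app that needs 12 seconds to boot, running 3 workers:
puts phased_restart_seconds(3, 12) # prints 36
```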

During the relaunch you’ll end up with workers running old and new code at the same time. Just make sure you don’t break your old workers by running incompatible migrations immediately :)

If you are using supervisord, foreman and mina, here’s a short description of how I got it working:

Setup

First, you’ll need to be running puma in clustered mode. In this example I’ll spawn one master process and three worker processes:

# Procfile
app: puma -p 8619 --workers 3

The good thing about puma is that we do not need a wrapper like unicornherder to handle changes in PID since the master always stays around.

Running foreman export supervisord will leave us with something like this:

# /etc/supervisor/conf.d/app.conf
[program:app-1]
command=bundle exec puma -p 8619 --workers 3 --dir /home/app/current
autostart=true
autorestart=true
stdout_logfile=/home/app/shared/log/website-1.log
stderr_logfile=/home/app/shared/log/website-1.error.log
user=app
directory=/home/app/current
environment=RAILS_ENV="production"

[group:app]
programs=app-1

As you can see I just generated a supervisord configuration which directly starts puma in clustered mode.

Note: it’s important that you add the --dir /path/to/current option, since puma won’t pick up changes to your code base otherwise.

Deployment with mina

Assuming this is not our very first deployment, we need to restart the workers using mina’s to :launch directive. To issue a phased restart we need to do the following:

  1. find puma’s master pid by:
     - listing all ruby processes via ps -C ruby -F
     - grepping for /puma (only the process spawned by supervisor will contain this string)
     - using awk to get the process id via awk {'print $2'}

  2. send SIGUSR1 to the puma master process to initiate a rolling restart
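To make the filtering in step 1 concrete, here is a small Ruby sketch of what the grep and awk stages do. The function name puma_master_pids and the sample ps output (user, paths, PIDs) are made up for illustration. The sketch assumes the PID is the second whitespace-separated column, as it is in ps -F output.

```ruby
# Extracts PIDs from `ps -C ruby -F` style output, keeping only lines
# whose command contains "/puma" (the grep step) and returning the
# second column of each match (the awk step).
def puma_master_pids(ps_output)
  ps_output.each_line
           .select { |line| line.include?("/puma") }
           .map { |line| Integer(line.split[1]) }
end

# Made-up sample output; every value here is a placeholder.
SAMPLE_PS = <<~PS
  UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
  app      38012     1  0 52011 40212   0 10:01 ?        00:00:01 ruby /path/to/gems/bin/puma -p 8619 --workers 3
  app      39001     1  0 31234 20544   1 10:02 ?        00:00:04 ruby /home/app/current/script/other_job.rb
PS

p puma_master_pids(SAMPLE_PS) # prints [38012]
```

If the list ever contains more than one PID, the grep pattern is matching too much and should be tightened before anything is signalled.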

Mina’s deploy task looks like this:

task :deploy => :environment do
  deploy do
    invoke :'git:clone'
    invoke :'deploy:link_shared_paths'
    invoke :'bundle:install'
    invoke :'rails:assets_precompile'

    to :launch do
      # look up the puma master pid and send it SIGUSR1 to start a phased restart
      queue %[kill -s SIGUSR1 $(ps -C ruby -F | grep '/puma' | awk {'print $2'})]
    end
  end
end
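If grepping ps output feels brittle, one alternative is to let puma write its master PID to a file and signal that PID directly. The following Ruby sketch assumes puma was started with its --pidfile option. The function name and the path in the comment are hypothetical, and this is not what the deploy task above does.

```ruby
# Reads a PID from the given pidfile and sends SIGUSR1 to that process,
# which asks a puma master in clustered mode to start a phased restart.
# Returns the PID that was signalled.
def phased_restart_from_pidfile(pidfile_path)
  pid = Integer(File.read(pidfile_path).strip)
  Process.kill("USR1", pid)
  pid
end

# Example call (hypothetical path, adjust to wherever --pidfile points):
#   phased_restart_from_pidfile("/home/app/shared/pids/puma.pid")
```

Inside a mina to :launch block the same idea would be queued as a shell command that reads the pidfile instead of piping ps through grep and awk.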

That’s it.

Let’s try out our setup using ab:

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking blog.nicolai86.eu (be patient).....done


Server Software:        nginx/1.2.4
Server Hostname:        blog.nicolai86.eu
Server Port:            80

Document Path:          /
Document Length:        10817 bytes

Concurrency Level:      6
Time taken for tests:   3.430 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      1163452 bytes
HTML transferred:       1096866 bytes
Requests per second:    29.15 [#/sec] (mean)
Time per request:       205.801 [ms] (mean)
Time per request:       34.300 [ms] (mean, across all concurrent requests)
Transfer rate:          331.25 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       56   60   3.1     58      73
Processing:   130  142   9.5    140     173
Waiting:       70   81   9.0     78     111
Total:        189  202  10.4    200     237

Percentage of the requests served within a certain time (ms)
  50%    200
  66%    203
  75%    206
  80%    209
  90%    218
  95%    225
  98%    230
  99%    237
 100%    237 (longest request)

As you can see, it works: ab reports zero failed requests. Also note the log output when sending SIGUSR1 to puma:

started with pid 38012
[38012] Puma 2.0.0.b6 starting in cluster mode...
[38012] * Process workers: 3
[38012] * Min threads: 0, max threads: 16
[38012] * Environment: development
[38012] * Listening on tcp://0.0.0.0:8619
[38012] Use Ctrl-C to stop
[38012] - Worker 38016 booted, phase: 0
[38012] - Worker 38015 booted, phase: 0
[38012] - Worker 38017 booted, phase: 0
[38012] - Starting phased worker restart, phase: 1
[38012] - Stopping 38015 for phased upgrade...
[38012] - Worker 38119 booted, phase: 1
[38012] - Stopping 38016 for phased upgrade...
[38012] - Worker 38132 booted, phase: 1
[38012] - Stopping 38017 for phased upgrade...
[38012] - Worker 38146 booted, phase: 1

If you see similar output when sending SIGUSR1, your phased restarts using puma are working as expected!
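If you would rather check this from a script than by eye, the log format above is easy to parse. Here is a minimal Ruby sketch. The function name workers_booted_in_phase is my own, and it assumes the "Worker <pid> booted, phase: <n>" wording shown in the log above, which may differ in other puma versions.

```ruby
# Returns the PIDs of all workers that reported booting in the given phase.
def workers_booted_in_phase(log, phase)
  log.scan(/Worker (\d+) booted, phase: (\d+)/)
     .select { |_pid, booted_phase| booted_phase.to_i == phase }
     .map { |pid, _booted_phase| pid.to_i }
end

# Excerpt of the log output shown above.
SAMPLE_LOG = <<~LOG
  [38012] - Worker 38016 booted, phase: 0
  [38012] - Worker 38015 booted, phase: 0
  [38012] - Worker 38017 booted, phase: 0
  [38012] - Starting phased worker restart, phase: 1
  [38012] - Stopping 38015 for phased upgrade...
  [38012] - Worker 38119 booted, phase: 1
  [38012] - Stopping 38016 for phased upgrade...
  [38012] - Worker 38132 booted, phase: 1
  [38012] - Stopping 38017 for phased upgrade...
  [38012] - Worker 38146 booted, phase: 1
LOG

p workers_booted_in_phase(SAMPLE_LOG, 1) # prints [38119, 38132, 38146]
```

With three configured workers, three PIDs in phase 1 means every old worker has been replaced.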

Wrapping up:

  • puma supports phased restarts in clustered mode since v2.0.0.b6
  • using phased restarts we can achieve zero downtime deployments with puma
  • phased restarts work by replacing workers one by one, which takes some time to complete if you are running many workers, but also consumes less memory than having twice the number of workers running
  • if a worker fails to start up, the puma master tries to restart it. Only if the new worker starts up successfully will puma replace the old worker.

That’s it! Happy hacking!

© Raphael Randschau 2010 - 2022