Deploying R Apps with Capistrano


Here at Braintree, we're heavy users of Capistrano, a handy Ruby framework for automating deployment tasks across multiple machines. On the data team, we also lean pretty heavily on R for automated reporting, graphics, and statistical analysis. Recently, we were deploying some code written in Ruby, Python, and R and had to hack together a custom solution for the R piece. We thought we'd share it in case it's useful to anyone else.

For Ruby, the support is awesome: simply adding require "bundler/capistrano" (a recipe that ships with Bundler) to the top of the deployment file deploy.rb causes Capistrano to look for a Gemfile in your project directory. Capistrano then installs the proper gems in a shared location so that bundle exec just works. For Python, there's the similarly nifty capistrano-virtualenv gem, which creates a virtualenv and installs into it the Python packages listed in a requirements.txt file in your project directory. There was nothing like this for R packages, though. What we wanted was the ability to list R package dependencies in a text file in our project directory and then have Capistrano install all of those packages in a shared location where our app could use them.

Capistrano Tasks

To do this, we defined a new Capistrano task in the file config/deploy/support.rb. Remember to load 'config/deploy/support' in the main Capfile. Here's the code:

We called this task from the main deploy.rb using the Capistrano hook after("deploy:update_code"), so that it runs right after the code is checked out from git:
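A minimal sketch of that wiring, assuming Capistrano 2 (whose deploy recipe defines deploy:update_code and whose DSL provides the after callback). The task name deploy:install_r_packages is hypothetical.

```ruby
# Capfile: pull in the file that defines the custom task.
load "config/deploy/support"

# config/deploy.rb: run the (hypothetically named) task once the new
# release has been checked out from git.
after "deploy:update_code", "deploy:install_r_packages"
```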

First, the task fills an array with the R packages we want to install by reading them from the Rpackages.txt file, the analogue of the Gemfile or requirements.txt. Next, we construct a string that holds the R command we'll execute to install the packages. For each package, we first run R's require function to check whether it was already installed by a previous deployment. If it wasn't, we call R's install.packages function, telling it to install the package into a directory called R in Capistrano's shared area and to use the package mirror at cran.cnr.berkeley.edu (I've still got a soft spot for the Bay Area). Once the full string is constructed, we call R on the command line and pass it this string to execute. Voilà! A happy R deployment.
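The steps above can be sketched in plain Ruby. This is a minimal sketch under several assumptions, not the original task: the helper names read_r_packages, r_install_expression and r_install_command are hypothetical, the file format (one package per line, with # for comment lines) is assumed, it calls Rscript -e rather than R, and it puts the shared directory first in .libPaths() so that require() also finds packages installed there by earlier deployments.

```ruby
require "shellwords"

# Hypothetical helper names; a sketch of the approach, not the original task.
# The http:// scheme on the post's mirror host is an assumption.
DEFAULT_CRAN_MIRROR = "http://cran.cnr.berkeley.edu"

# Read Rpackages.txt: one package name per line; blank lines and
# "#" comment lines are skipped (the comment syntax is an assumption).
def read_r_packages(path)
  File.readlines(path)
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?("#") }
end

# Build an R expression that puts the shared library first on the search
# path, then installs each package there unless require() can already load it.
def r_install_expression(packages, lib_dir, repo = DEFAULT_CRAN_MIRROR)
  steps = packages.map do |pkg|
    "if (!require(\"#{pkg}\", character.only = TRUE, quietly = TRUE)) " \
      "install.packages(\"#{pkg}\", lib = \"#{lib_dir}\", repos = \"#{repo}\")"
  end
  ([".libPaths(c(\"#{lib_dir}\", .libPaths()))"] + steps).join("; ")
end

# Shell command for the remote host: make sure the library directory exists
# (.libPaths() ignores directories that don't exist), then run the expression.
def r_install_command(packages, lib_dir, repo = DEFAULT_CRAN_MIRROR)
  "mkdir -p #{Shellwords.escape(lib_dir)} && " \
    "Rscript -e #{Shellwords.escape(r_install_expression(packages, lib_dir, repo))}"
end

# Inside a Capistrano 2 task, this might be used as:
#   run r_install_command(read_r_packages("Rpackages.txt"), "#{shared_path}/R")
```

Keeping the string construction in plain Ruby functions means it can be checked locally without a server; the Capistrano task then only has to hand the result to run.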

Anyone else using Capistrano with R? We'd love to hear from you.

***
Michelangelo D'Agostino was previously a Data Science Lead at Braintree.
