Here at Braintree, we're heavy users of Capistrano, a handy Ruby framework for automating deployment tasks across multiple machines. On the data team, we also lean pretty heavily on R for automated reporting, graphics, and statistical analysis. Recently, we were deploying some code written in Ruby, Python, and R and had to hack together a custom solution for the R piece. We thought we'd share it in case it's useful to anyone else.
For Ruby, Capistrano has awesome built-in support: simply adding require "bundler/capistrano" to the top of the deployment file deploy.rb causes Capistrano to look for a Gemfile in your project directory. Capistrano then installs the proper gems in a shared location such that using bundle exec just works. For Python, there's the similarly nifty capistrano-virtualenv gem, which installs Python packages listed in a requirements.txt file in your project directory and then handles the installation of a virtualenv. However, there was nothing like this for R packages. What we wanted was the ability to specify R package dependencies in a text file in our project directory and then have Capistrano install all of those packages in a shared location such that our app could use them.
Capistrano Tasks
To do this, we defined a new Capistrano task in the file config/deploy/support.rb. Remember to load 'config/deploy/support' in the main Capfile. Here's the code:
We called this task from the main deploy.rb after the code was checked out from git, using the Capistrano hook after("deploy:update_code"):
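As a rough illustration, a hook of this kind might look like the following in Capistrano 2 syntax. The task name deploy:install_r_packages is hypothetical, since the post does not say what the task was called.

```ruby
# deploy.rb (sketch): run the R package task once the code has been updated.
# "deploy:install_r_packages" is a hypothetical task name, not one from the post.
after "deploy:update_code", "deploy:install_r_packages"
```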
First, the task fills an array with the R packages we want to install by reading them from the Rpackages.txt file, the analogue of the Gemfile or requirements.txt. Next, we construct a string that holds the R command we'll execute to install the packages. Essentially, we first run R's require function to tell us if the package has already been installed in a previous deployment. If it hasn't been, we call R's install.packages function, specifying that the package should be installed in a directory called R in Capistrano's shared area, using the package mirror located at cran.cnr.berkeley.edu (I've still got a soft spot for the Bay Area). Once the full string is constructed, we call R on the command line and pass it this string to execute. Voila! A happy R deployment.
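The steps above can be sketched in plain Ruby roughly as follows. This is an illustrative sketch under stated assumptions, not the task itself: the helper names, the http:// scheme on the mirror, the use of Rscript -e to run the string, and the treatment of lines starting with # as comments are all assumptions made for the example.

```ruby
require "shellwords"

# Assumed mirror URL; the post names only the host, so the scheme is a guess.
CRAN_MIRROR = "http://cran.cnr.berkeley.edu"

# R snippet for one package: try to load it from the shared library
# directory, and install it there only if loading fails.
R_TEMPLATE = 'if (!require("%<pkg>s", lib.loc="%<lib>s")) ' \
             'install.packages("%<pkg>s", lib="%<lib>s", repos="%<repos>s")'

# Turn the contents of Rpackages.txt into an array of package names.
# Blank lines and (as an assumption) lines starting with "#" are skipped.
def parse_r_packages(text)
  text.lines.map(&:strip).reject { |line| line.empty? || line.start_with?("#") }
end

# Build a single R expression covering every package.
def r_install_expression(packages, lib_dir, mirror = CRAN_MIRROR)
  packages.map { |pkg| format(R_TEMPLATE, pkg: pkg, lib: lib_dir, repos: mirror) }
          .join("; ")
end

# Wrap the R expression in a shell command, escaping it for the shell.
def r_install_command(packages, lib_dir, mirror = CRAN_MIRROR)
  "Rscript -e #{Shellwords.escape(r_install_expression(packages, lib_dir, mirror))}"
end

# Example: the path below is a placeholder for Capistrano's shared area.
packages = parse_r_packages("ggplot2\n\nplyr\n")
puts r_install_command(packages, "/path/to/shared/R")
```

Inside a Capistrano 2 task, the resulting string would typically be handed to run so that it executes on the remote servers. The sketch assumes the shared R directory already exists, because install.packages will not create it.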
Anyone else using Capistrano with R? We'd love to hear from you.