Capistrano Basics

This tutorial will walk you through the basics of setting up and using Capistrano. It will not introduce you to the deployment system that is bundled with Capistrano, but will instead focus on the more general areas of executing Capistrano and writing your own recipes. It will be primarily of interest to those wanting to use Capistrano in non-deployment domains, and to those who just wish to become more familiar with Capistrano itself.

Installation

I strongly recommend that you use the excellent RubyGems package management system to install Capistrano. If you have RubyGems installed, you can simply type:

gem install capistrano

Assumptions

Capistrano makes a few assumptions about your servers. In order to use Capistrano, you will need to comply with these assumptions:

Capistrano also makes a few assumptions about your own familiarity with computers:

Capfiles

Capistrano reads its instructions from a capfile. (For those of you familiar with the “make” or “rake” utilities, the concept is the same as a “makefile” or “rakefile”.) If you create a file called “capfile” (or “Capfile”, if you prefer), Capistrano will read that file and process the instructions in it.

The Capfile is where you will tell Capistrano about the servers you want to connect to and the tasks you want to perform on those servers. It is essentially just a Ruby script, but augmented with a large set of “helper” syntax, to make it easy to define server roles and tasks. (Using the lingo of those in the know, the Capfile is written using a custom DSL on top of Ruby.)

You can use any editor you want to write your Capfiles; they are just simple text files. I recommend something designed for programmers, like vim, emacs, TextMate, Eclipse, and so forth. Choose whatever you’re comfortable with, but make sure whatever you choose can save files as plain text, and will not automatically append an extension like “.txt” to the filename. The Capfile should be called “capfile” or “Capfile”, without any extension.

A simple example

So, enough chit-chat. Let’s look at a very simple capfile, to see what it’s like:

task :search_libs, :hosts => "www.capify.org" do
  run "ls -x1 /usr/lib | grep -i xml"
end

This defines a single task, called “search_libs”, and says that it should be executed only on the “www.capify.org” host. When executed, it will display all files and subdirectories in /usr/lib that include the text “xml” in their name. By default, “run” will display all output to the console.

Assuming your capfile is in the current directory, you would execute that task like this (from the command-line):

cap search_libs

You can define as many tasks as you want:

task :search_libs, :hosts => "www.capify.org" do
  run "ls -x1 /usr/lib | grep -i xml"
end

task :count_libs, :hosts => "www.capify.org" do
  run "ls -x1 /usr/lib | wc -l"
end

Here we’ve added a second task, “count_libs”, which will display the number of entries in /usr/lib. We could add more, but even with just two tasks, having to specify the host over and over is getting a little unwieldy. Here is where “roles” come into play:

role :libs, "www.capify.org"

task :search_libs do
  run "ls -x1 /usr/lib | grep -i xml"
end

task :count_libs do
  run "ls -x1 /usr/lib | wc -l"
end

We’ve created one new role, called “libs”, and associated “www.capify.org” with that role. By default, a task will be executed on all servers in all roles, so we were able to drop the :hosts declaration from the tasks. Much simpler!

(Note: Capistrano’s login defaults to whatever user you are currently logged into your local machine as. If you need to log in as a different user, you can encode that username in the server definition, for example “joe@www.capify.org”. Alternatively, you can set the :user variable to the username you want to use. We’ll get to variables in a minute.)

Gateway servers

In the “real world”, we have to worry about bad guys trying to sneak into our servers, so many server clusters are hidden behind NATs and firewalls, to prevent direct access. Instead, you have to log into some “gateway” server, and then log into the servers you want, from there.

Capistrano supports this scenario by allowing you to define a gateway server. All subsequent connections will be tunneled through the gateway (using SSH forwarded ports). To tell Capistrano about your gateway server, you simply do:

set :gateway, "www.capify.org"
role :libs, "crimson.capify.org"

Here, the assumption is that crimson.capify.org is behind a NAT, and cannot be directly accessed. By setting the :gateway value to “www.capify.org”, we tell Capistrano that in order to access any of the other servers, it must first establish a connection to “www.capify.org”, and tunnel subsequent connections through that.

Multiple servers

So things are going great on your one server. But load climbs, and you decide you need to add another one. You’d still like to be able to query both servers from a single task, though…

Easy enough:

role :libs, "crimson.capify.org", "magenta.capify.org"

Now, when you execute “cap search_libs” or “cap count_libs”, the command will be executed in parallel on both servers, with the output aggregated into a single stream and displayed on your console.

Multiple roles

Down the road, let’s say you add a file server to the mix (to host all those pirated mp3’s you’re serving, you naughty person, you). Because it isn’t really running anything, you don’t care so much about the libraries installed there, so you don’t want the search_libs or count_libs tasks to run there. At the same time, you have a task that shows you the free disk space that you only want run on the file server. What to do?

role :libs, "crimson.capify.org", "magenta.capify.org"
role :files, "fuchsia.capify.org"

task :search_libs, :roles => :libs do
  run "ls -x1 /usr/lib | grep -i xml"
end

task :count_libs, :roles => :libs do
  run "ls -x1 /usr/lib | wc -l"
end

task :show_free_space, :roles => :files do
  run "df -h /"
end

That was easy enough. We just added another role (“files”), and then added :roles constraints to each task, specifying which role each is associated with. When we run “search_libs”, it will only be run against the servers defined in the “libs” role. Similarly, “show_free_space” will only be run against the servers defined in the “files” role.

(Note: you can specify that a task should run on servers in multiple roles by passing an array of role names to the :roles option.)
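
If it helps to see the targeting rules in one place, here is a minimal plain-Ruby sketch of them. It is not Capistrano’s own code, and it runs without Capistrano installed. The names ROLES and servers_for are made up for this illustration; the precedence of :hosts over :roles is an assumption of the sketch.

```ruby
# Plain-Ruby sketch of the targeting rules; not Capistrano's own code.
# ROLES and servers_for are hypothetical names used for illustration.
ROLES = {
  :libs  => ["crimson.capify.org", "magenta.capify.org"],
  :files => ["fuchsia.capify.org"]
}

# Returns the hosts a task would run on, given its options hash.
def servers_for(options = {})
  # Assumption: an explicit :hosts option names the servers directly.
  return Array(options[:hosts]) if options[:hosts]

  # With no :roles option, every server in every role is used.
  wanted = options[:roles] ? Array(options[:roles]) : ROLES.keys
  wanted.flat_map { |role| ROLES.fetch(role) }.uniq
end

p servers_for(:roles => :libs)            # only the "libs" servers
p servers_for(:roles => [:libs, :files])  # an array of roles
p servers_for                             # no constraint: all servers
```

Array() is what lets a single role name and an array of role names go through the same code path.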

Documenting tasks

We’re starting to get quite a collection of tasks here! It’d be nice to see, at a glance, what tasks are available to us. Let’s try “cap -T”.

$ cap -T
cap invoke               # Invoke a single command on the remote servers.
cap shell                # Begin an interactive Capistrano session.

Some tasks were not listed, either because they have no description,
or because they are only used internally by other tasks. To see all
tasks, type `cap -Tv'.

Extended help may be available for these tasks.
Type `cap -e taskname' to view it.
$

Hmmm. It shows two tasks (“invoke” and “shell”, we’ll talk more about those later), but it says nothing about the three tasks we’ve defined so far. Wait, what’s that about “some tasks were not listed…”? Let’s try the recommended ‘-Tv’:

$ cap -Tv
cap count_libs      #
cap invoke          # Invoke a single command on the remote servers.
cap search_libs     #
cap shell           # Begin an interactive Capistrano session.
cap show_free_space #

Extended help may be available for these tasks.
Type `cap -e taskname' to view it.
$

Ah-ha! There they are. But it would be nice to include them in the default listing. To do that, we simply need to give them each a description.

desc "Search /usr/lib for files named xml."
task :search_libs, :roles => :libs do
  run "ls -x1 /usr/lib | grep -i xml"
end

desc "Show the number of entries in /usr/lib."
task :count_libs, :roles => :libs do
  run "ls -x1 /usr/lib | wc -l"
end

desc "Show the amount of free disk space."
task :show_free_space, :roles => :files do
  run "df -h /"
end

Now, when we run “cap -T”, we see this:

$ cap -T
cap count_libs      # Show the number of entries in /usr/lib.
cap invoke          # Invoke a single command on the remote servers.
cap search_libs     # Search /usr/lib for files named xml.
cap shell           # Begin an interactive Capistrano session.
cap show_free_space # Show the amount of free disk space.

Extended help may be available for these tasks.
Type `cap -e taskname' to view it.
$

Very nice! Note that the description shown is all text up to the first “.” character (or up to the first 30 characters, whichever comes first). You can write as much as you want, though, even using multiple lines and indented sections for sample usage. To display the full description, simply use the “-e” switch (“explain”) and pass the name of the task you want to describe. Voila! Instant online documentation of your tasks.

cap invoke

We got a little peek at the two standard Capistrano tasks just then: “invoke” and “shell”. Those tasks are always available, so let’s take a quick look at them.

The first one we’ll investigate is “invoke”. It lets you execute a single command on your remote servers, without having to write a task for it. That makes it great for quick one-off commands, or for integration with other scripts.

Simply specify the command you want to invoke via the COMMAND variable:


cap invoke COMMAND="df -h"

That would execute “df -h” on all defined servers, in all roles. You can restrict the command to a certain set of roles by specifying the role names as a comma-delimited list, using the ROLES variable:


cap invoke COMMAND="df -h" ROLES=libs

You can even execute commands on servers that have not been previously defined, by using the HOSTS variable:


cap invoke COMMAND="df -h" HOSTS=mauve.capify.org
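
Since “invoke” is handy for integration with other scripts, here is a rough sketch of how a Ruby script might assemble such a command line. The helper name cap_invoke_args is hypothetical, and the sketch assumes HOSTS accepts the same comma-delimited form as ROLES. It only builds and prints the arguments; the line that would actually run cap is left commented out.

```ruby
require "shellwords"

# Builds the argument list for "cap invoke" using the COMMAND, ROLES and
# HOSTS variables described above. cap_invoke_args is a hypothetical name.
def cap_invoke_args(command, roles: [], hosts: [])
  args = ["cap", "invoke", "COMMAND=#{command}"]
  args << "ROLES=#{roles.join(',')}" unless roles.empty?
  # Assumption: HOSTS takes a comma-delimited list, like ROLES.
  args << "HOSTS=#{hosts.join(',')}" unless hosts.empty?
  args
end

args = cap_invoke_args("df -h", :roles => [:libs, :files])
puts Shellwords.join(args)   # shell-escaped form, handy for logging
# system(*args)              # uncomment to actually run cap
```

Passing the arguments to system as a list avoids a trip through the shell, so a command containing spaces or quotes arrives at cap unchanged.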

You can type “cap -e invoke” at any time to read the full documentation for that task. It’s pretty useful!

cap shell

As nifty as “invoke” is, it has at least one significant drawback: every time you invoke a command, it has to reestablish connections to the servers. This isn’t a big deal at all if you’re only executing a single command, but if you have two or three that you want to run, it can quickly get annoying, having to wait for the connections to be made.

Enter “cap shell”. This gives you an interactive prompt from which you can enter ad hoc commands and even execute tasks, and all connections that are established during the shell session are cached and reused. That means that if you execute a command and it has to connect to three different servers, the next time you execute a command that needs to connect to those same servers, the connections are reused. Really slick! It’s a really handy tool for all kinds of system administration tasks.

Just as with the “invoke” task, the “shell” task lets you scope commands and tasks by host or role. You can type “help” from within the shell at any time, to get more information.

Namespaces

Alright, back to our regularly scheduled tutorial… Oh, yes. We had just talked about how many tasks we were accumulating. Documenting them was a good first step, but as you get more and more tasks, you begin to wish for a way to group your tasks by functionality.

Enter namespaces. A namespace allows you to group a set of tasks (or even other namespaces) and give them a name. For instance:

namespace :libs do
  task :search, :roles => :libs do
    run "ls -x1 /usr/lib | grep -i xml"
  end

  task :count, :roles => :libs do
    run "ls -x1 /usr/lib | wc -l"
  end
end

namespace :disk do
  task :free, :roles => :files do
    run "df -h /"
  end
end

There. We defined two namespaces, “libs” and “disk”. If we wanted to search the /usr/lib directory now, we’d do “cap libs:search”. Note that syntax: the namespace separator is a colon when used to identify a task.

Using namespaces also lets you distribute your tasks as a library, without worrying about name collisions. Just package your tasks up in a namespace and ship them off to your users.

Now, what if we wanted to have a namespace named “libs”, and a task named “libs”? Well, you can’t actually do that, but Capistrano gives you a way to make it look like you’ve done that. If you have a task in a namespace, and the task is named “default”, you can invoke it simply by giving the name of the namespace:

namespace :libs do
  task :default, :roles => :libs do
    run "ls -x1 /usr/lib | grep -i xml"
  end

  task :count, :roles => :libs do
    run "ls -x1 /usr/lib | wc -l"
  end
end

So, using the above, if you wanted to search the /usr/lib directory, you’d just do “cap libs”. If you wanted to count the number of entries, you’d do “cap libs:count”.
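
The lookup rule can be pictured with a few lines of plain Ruby. This is only a sketch of the idea, not how Capistrano is written; TASKS and resolve_task are names invented for the example.

```ruby
# Sketch of resolving a command-line name to a task; not Capistrano's code.
# TASKS and resolve_task are hypothetical names.
TASKS = {
  "libs:default" => "search /usr/lib",
  "libs:count"   => "count entries in /usr/lib",
  "disk:free"    => "show free disk space"
}

def resolve_task(name)
  # An exact match wins; otherwise treat the name as a namespace and
  # look for its "default" task.
  return name if TASKS.key?(name)
  fallback = "#{name}:default"
  TASKS.key?(fallback) ? fallback : nil
end

p resolve_task("libs")        # => "libs:default"
p resolve_task("libs:count")  # => "libs:count"
p resolve_task("disk")        # => nil
```

The “disk” namespace has no “default” task, so asking for plain “disk” finds nothing.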

Variables

Now, let’s say you wanted to pass your new library of tasks to another person, because you found them so useful. However, that other person might want to search for a different string than “xml” in their lib directory. This is where the concept of Capistrano’s variables comes into play.

Instead of hard-coding “xml” into your task, you would just have it look at a variable:

task :search, :roles => :libs do
  run "ls -x1 /usr/lib | grep -i #{term}"
end

People who use your task would then be able to set the :term variable to whatever search term they wanted:


set :term, "xml"

However, if you want to be really friendly, you can actually make the “term” variable prompt for its initial value, and cache whatever the user enters:

set(:term) do
  print "Gimme a search term: "
  STDOUT.flush
  STDIN.gets.chomp
end

Capistrano actually makes this case simpler, by providing a helper for prompting for input:

set(:term) do
  Capistrano::CLI.ui.ask "Gimme a search term: "
end

Nothing else has to change! The first time any task asks for the value of the :term variable, the associated block gets executed, and its return value gets cached. Subsequent requests for the :term variable will just return the cached value. (In the lingo, these variables are “evaluated lazily”.)
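
If lazy evaluation is new to you, this small plain-Ruby sketch shows the mechanism. It is an illustration under my own assumptions, not Capistrano’s implementation; the Variables class and its methods are hypothetical.

```ruby
# Sketch of lazily evaluated, cached variables; not Capistrano's code.
class Variables
  def initialize
    @values = {}
  end

  # Stores either a plain value or a block to be run on first access.
  def set(name, value = nil, &block)
    @values[name] = block || value
  end

  def fetch(name)
    value = @values.fetch(name)
    # Run the block once and replace it with its result.
    value = @values[name] = value.call if value.respond_to?(:call)
    value
  end
end

calls = 0
vars = Variables.new
vars.set(:term) { calls += 1; "xml" }

puts vars.fetch(:term)   # the block runs here
puts vars.fetch(:term)   # cached value; the block is not run again
puts calls               # => 1
```

Defining the variable costs nothing; the block only runs if some task actually asks for the value. That is what makes a prompting block practical.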

The astute reader will notice that the gateway definition is just setting another variable. Other variables that Capistrano provides out of the box are :default_environment, for specifying environment variables that should be set for every command, :ssh_options, for setting things like agent forwarding and specifying an alternative port number, and a :logger variable, which holds a reference to the logger instance being used by Capistrano. You can define as many other variables as you like.

You can also set variables from the command-line, using the “-s” and “-S” switches. The “-s” (lower-case) switch will set the variable after all recipe files have been loaded, which makes it great for overriding variables that have been set by the recipe files themselves.


cap -s term=cap libs:search

On the other hand, sometimes you want to set a variable before any recipe files have been loaded. This is useful if the recipes themselves depend on a variable being set while they are loading (for instance, if they are doing conditional creation of tasks and so forth). To set a variable before any recipe files are loaded, use the -S switch. Note that doing this may result in your value being overwritten by a value set in a recipe file:


cap -S stage=production libs:search
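
The difference between the two switches comes down to ordering, which this plain-Ruby sketch tries to make concrete. It models the variables as simple hashes and is not Capistrano’s code; effective_variables is a hypothetical helper.

```ruby
# Sketch of the order in which variables get set; not Capistrano's code.
# effective_variables is a hypothetical helper.
def effective_variables(pre_vars, recipe_vars, post_vars)
  vars = {}
  vars.merge!(pre_vars)     # -S: set before any recipe is loaded
  vars.merge!(recipe_vars)  # "set" calls made while the recipes load
  vars.merge!(post_vars)    # -s: set after all recipes are loaded
  vars
end

recipe = { :term => "xml", :stage => "staging" }

# -S stage=production: the recipe's own :stage overwrites it.
p effective_variables({ :stage => "production" }, recipe, {})

# -s term=cap: the command-line value overrides the recipe's :term.
p effective_variables({}, recipe, { :term => "cap" })
```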

Lastly, you can set environment variables from the command-line, too:


cap COMMAND="df -h" invoke

That will set an environment variable named COMMAND, which you could access in your tasks using Ruby’s ENV hash of environment variables.
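
As a minimal sketch of reading such values in Ruby, the helper below pulls COMMAND and ROLES out of the environment. The name settings_from_env and the “df -h” fallback are assumptions made for this example.

```ruby
# Reads settings from the environment, with fallbacks when they are unset.
# settings_from_env and the "df -h" fallback are hypothetical.
def settings_from_env(env = ENV)
  {
    :command => env.fetch("COMMAND", "df -h"),
    # ROLES is a comma-delimited list; turn it into an array of symbols.
    :roles   => env.fetch("ROLES", "").split(",").map { |r| r.strip.to_sym }
  }
end

settings = settings_from_env
puts "command: #{settings[:command]}"
puts "roles:   #{settings[:roles].inspect}"
```

Using fetch with a fallback keeps the code from tripping over nil when a variable was not given on the command line.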

Transactions

One of the original use cases for Capistrano was for deploying web applications. (This is still by far its most popular use case.) In order to make deploying these applications reliable, Capistrano needed to ensure that if something went wrong during the deployment, changes made to that point on the other servers could be rolled back, leaving each server in its original state.

If you ever need similar functionality in your own recipes, you can introduce a transaction:

task :deploy do
  transaction do
    update_code
    symlink
  end
end

task :update_code do
  on_rollback { run "rm -rf #{release_path}" }
  source.checkout(release_path)
end

task :symlink do
  on_rollback { run "rm #{current_path}; ln -s #{previous_release} #{current_path}" }
  run "rm #{current_path}; ln -s #{release_path} #{current_path}"
end

The first task, “deploy”, wraps a transaction around its invocations of “update_code” and “symlink”. If an error happens within that transaction, the “on_rollback” handlers that have been declared are all executed, in reverse order.

This does mean that transactions aren’t magical. They don’t really automatically track and revert your changes. You need to do that yourself, and register on_rollback handlers appropriately, that take the necessary steps to undo the changes that the task has made. Still, even as lo-fi as Capistrano transactions are, they can be quite powerful when used properly.
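
To show how little magic is involved, here is a stripped-down sketch of the same idea in plain Ruby. It is not Capistrano’s implementation; MiniTransaction is a hypothetical class, and the “steps” are just strings appended to a log.

```ruby
# Sketch of a transaction with rollback handlers; not Capistrano's code.
class MiniTransaction
  def initialize
    @handlers = []
  end

  # Registers a block that undoes the step about to be performed.
  def on_rollback(&block)
    @handlers << block
  end

  # Runs the block; on error, runs the handlers newest-first and re-raises.
  def run
    yield self
  rescue StandardError
    @handlers.reverse_each(&:call)
    raise
  end
end

log = []
begin
  MiniTransaction.new.run do |tx|
    tx.on_rollback { log << "undo update_code" }
    log << "update_code"
    tx.on_rollback { log << "undo symlink" }
    log << "symlink"
    raise "symlink failed"
  end
rescue RuntimeError => e
  log << "aborted: #{e.message}"
end

puts log
```

The log ends up with both steps, then “undo symlink” before “undo update_code”, and finally the abort message. Nothing is undone that you did not write a handler for.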

Loading files

If you’ve got a particularly complex setup, you might want to break your capfile into different files, to better encapsulate and isolate the functionality. Then, you would just use the “load” method to load the files into your capfile at runtime:

load "libs"
load "files"

By default, “load” will only look for files relative to the current directory. (Actually, it also looks relative to its own internal load path, but that’s less useful for custom files.) If you ever want to add a directory to Capistrano’s load path, just append to the load_paths collection:

load_paths << "config/stages"
load "production"

You can also specify files to load via the command-line, by specifying the “-f” switch:

cap -f libs libs:search

You can specify “-f” as many times as you need, to load as many files as you want. However, you should know that if you ever use the “-f” flag even once, Capistrano will not automatically load your Capfile. If you want it to load your Capfile in addition to the alternative files, you can either specify “-f Capfile” (to have it explicitly load the file), or just use “-F”, which means “load the default capfile”.
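
The rule for which files get loaded is easy to misremember, so here it is as a tiny plain-Ruby sketch. This is not Capistrano’s code; recipe_files is a hypothetical helper, and the position of the Capfile relative to the “-f” files is an assumption.

```ruby
# Sketch of which recipe files end up loaded; not Capistrano's code.
# recipe_files is a hypothetical helper.
def recipe_files(f_files, load_default: false)
  # With no -f at all, the Capfile is loaded automatically.
  return ["Capfile"] if f_files.empty?

  # -F asks for the default capfile in addition to the -f files.
  # Assumption: the Capfile is placed after them in the load order.
  load_default ? f_files + ["Capfile"] : f_files
end

p recipe_files([])                              # cap libs:search
p recipe_files(["libs"])                        # cap -f libs libs:search
p recipe_files(["libs"], :load_default => true) # cap -F -f libs libs:search
```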

That’s it!

That’s the basic functionality of Capistrano. Capistrano itself has lots of other features that have not been covered (such as the helper methods you can use in your tasks, and so forth), but those will be addressed in another tutorial.