Causes Tech: Make your deploys blazing fast by making them git-aware

Posted Aug 23, 2011 by

Adam is a software engineer here at Causes and has the best personal bio on our team page.

Fast deploys are important for keeping up overall momentum on a programming team. If an entire deploy takes 40 minutes, friction is high and deploys happen less frequently — meaning that “finished” code sits around longer waiting to go into production. The holy grail of code deployment is continuous deployment, where every change is automatically tested and (if tests show no problems) pushed into production. To reach continuous deployment, you need fast tests and fast deploys. Since we already have fast tests, we decided to take some time to speed up our deployment process.

Our existing deployment process is one that I’ve seen with several engineering teams:
1. Create a tarball of the code revision you want to deploy
2. Transfer (scp/rsync) that tarball to a deployment server
3. Transfer the tarball from the deployment server to each appserver
4. Extract the tarball to a directory containing deployments
5. Switch the `current` symlink in deployments/ to point to the new directory
6. Perform a rolling restart of the appservers

There are a few bottlenecks in this process. First, you need a fast connection from your development machine to the deployment server (usually the former is at home or the office and the latter is in a colo) because you’re sending an entire copy of your codebase over the wire. If your codebase is a few megabytes, this isn’t a big deal, but as you approach several hundred megabytes, it becomes a major pain point. And why do you need to send everything, anyway? The diff you’re deploying is probably only a few kilobytes different from what’s currently in production, and git knows how to apply diffs pretty well. A better method:

1. Create a clone of your repo on each appserver (this only needs to happen once)
2. Tell each appserver to fetch and `reset --hard` to the specified revision (be careful with submodules)
3. Use rsync to make a copy of the repo to the deployments/ directory, omitting the .git directory
4. Switch the `current` symlink in deployments/ to point to the new directory
5. Perform a rolling restart of the appservers
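
The steps above can be sketched roughly as follows for a single appserver. This is an illustrative sketch, not the Causes reference implementation: the function names, the `current.tmp` temporary link, and the choice to name each release directory after its revision are all assumptions, and `git -C` needs a reasonably recent git (1.8.5 or later).

```python
import os
import subprocess
from pathlib import Path


def build_deploy_commands(repo_dir, release_dir, revision):
    """Return the git and rsync commands for one appserver, in order.

    All paths and names here are hypothetical; adapt them to your layout.
    """
    source = repo_dir.rstrip("/") + "/"
    target = release_dir.rstrip("/") + "/"
    return [
        # Fetch only new objects; the clone already exists on the appserver.
        ["git", "-C", repo_dir, "fetch", "origin"],
        # Move the working tree to exactly the requested revision.
        ["git", "-C", repo_dir, "reset", "--hard", revision],
        # Bring submodules in line with the revision just checked out.
        ["git", "-C", repo_dir, "submodule", "update", "--init", "--recursive"],
        # Copy the tree into a fresh release directory, leaving .git behind.
        ["rsync", "-a", "--exclude=.git", source, target],
    ]


def switch_current_symlink(deployments_dir, release_dir):
    """Point deployments/current at release_dir via an atomic rename."""
    current = Path(deployments_dir) / "current"
    tmp = Path(deployments_dir) / "current.tmp"  # assumed temporary name
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(release_dir)
    # Renaming over the old link avoids a moment where `current` is missing.
    os.replace(tmp, current)


def deploy(repo_dir, deployments_dir, revision):
    """Run steps 2-4 on this appserver; the rolling restart happens elsewhere."""
    release_dir = os.path.join(deployments_dir, revision)
    for command in build_deploy_commands(repo_dir, release_dir, revision):
        subprocess.run(command, check=True)  # stop at the first failure
    switch_current_symlink(deployments_dir, release_dir)
```

Building the command list separately from running it makes the sequence easy to inspect or log before anything touches the server.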

After building this logic into our deployment process, our average deploys went from 5-20 minutes to 2-3 minutes. Currently, the slowest part of the deploy process is the rake task that determines if a migration needs to be run (most of that is Rails startup time) and the rolling restart, which is slow on purpose (we don’t want all appservers to simultaneously be unable to serve traffic). We’ve put together a reference implementation of the code that needs to be run on each appserver. This should be used as a guideline for building your own, and is probably not robust enough as-is to use in your production environment (but it does cover submodules, which is something that seems to be lacking on other write-ups of this topic).
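
To illustrate why the rolling restart is slow on purpose, here is a minimal sketch of restarting appservers in small batches. The `restart` callable, the batch size, and the pause are hypothetical placeholders; how you actually restart a host and decide it is healthy again depends on your stack.

```python
import time


def rolling_restart(hosts, restart, batch_size=1, pause_seconds=0.0):
    """Restart hosts a batch at a time so the rest keep serving traffic.

    `restart` is any callable that restarts one host (an assumption here).
    Returns the batches in the order they were restarted.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        for host in batch:
            restart(host)
        batches.append(batch)
        # Give the restarted batch time to come back before moving on.
        if start + batch_size < len(hosts):
            time.sleep(pause_seconds)
    return batches
```

With `batch_size=1`, only one appserver is ever out of rotation, which is the safest and slowest setting; larger batches trade capacity during the deploy for speed.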

  • James Brown

    rsync is actually at least as good at transferring minimal diffs as git is. Generally, it’s better. So if you just keep the previous version of the code on your deployment server and use either --copy-dest or --link-dest with rsync, you’ll find that you have a deployment process which is as fast or faster, and no longer has a dependency on git (so it can be used to manage things that aren’t in git).

  • Lann Martin

    @James: Good point about rsync – I’d love to see a performance comparison. My sense is that the transfer difference will be pretty minimal in most cases.

    Our deploy process already depends on git, so it’s simpler and more convenient to distribute the repository than to bounce the transfer through a deploy server (as with step 2 in Adam’s tarball deploy description).

    Thanks for pointing out --link-dest as well. Adding that to our deploy will make each release use less disk space and I/O.

  • James Brown

    Also, using git archive --git-dir may be safer/faster than just copying the git directory.

    In general, the advantage of using a deploy server in this case might be that it substantially reduces pressure on your git server when you end up having more than a couple of servers. You can, of course, use a different git replica for deployment, but at that point you might as well just have a full deployment server. ;-)

  • Jay Adkisson

    Awesome, glad to hear the deploy process has gotten a makeover!

    We’re using a similar strategy at GoodGuide, except that in place of reset --hard, we do git checkout -f origin/production. Using the production branch means that “merging into production” is exactly what it sounds like, and you don’t have to specify a revision when deploying. Also, while reset --hard tries to update the current local branch to point to the specified revision, checking out the remote branch does exactly what you want: namely, makes the current working tree the same as the last read of the remote repo.

    Glad to see the eng team posting here, it’s really refreshing :)

  • Dan

    Keep going and you will start having small tarballs of each piece and some scripts wrapped around them… eventually you will rebuild a package manager because you have to “keep state” of the system so you can roll back.

    Or you could use existing package manager frameworks like rpm and generate small packages automatically with something like hudson.

    Is it too easy?

  • Jay Adkisson

    @Dan, there are two ways to go with that. What Adam is saying, I think, is that their current deploy process is the complicated mess of tarballs that you mentioned.

    I think either a package manager based solution (as you mentioned) or a VCS solution (which is what Adam is writing about) are vast improvements, with different pros and cons. Both allow you to “roll back” without doing anything particularly special.

    I’ve never worked with a package-based deploy system, but one of the advantages Adam mentioned here was the performance boost of sending only incremental updates, esp. for large code repositories.
