Deploying Applications with zc.buildout

Background

For more than 10 years, we’ve been building Python-based content management systems and websites. These typically have lots of moving parts. From early on, we automated our deployments, initially for installation on customer servers and later in our own hosting environment.

Our deployments included their own copy of Python. Initially this was because Python wasn’t installed on customers’ machines. Later, as Python was typically installed in host environments, we continued to include our own, because system-provided Python installations are too unpredictable.

For a long time, we used make to automate deployments. This was problematic in a number of ways: make uses shell scripts to define rules, and the result was a huge maintenance burden.

In 2005, Benji York began a prototype build-out system that used a simple ini-style configuration with Python recipes. This version was used on a number of projects, worked very well, and was a big improvement over the make-based solutions. It was useful from development through deployment.

For years, we’d been frustrated with support for building and deploying applications with Python. The primary packaging tool, distutils, was geared toward installing individual packages into Python site packages. It didn’t deal with dependencies and generally didn’t provide the kinds of support we needed. In 2004, we began working on a packaging system on top of distutils that tried to address these problems. We used the system to generate a number of early Zope 3 releases. Around the same time, Phillip Eby was working on a similar system, called setuptools. It provided many improvements over what we’d been doing, but emphasized ad hoc installation of packages into Python installations.

In 2006, the zc.buildout project was started to create a non-prototype build-out solution based on lessons learned using our prototype and to make it easier to leverage setuptools in a more controlled manner.

We love our operations staff

They get called in the middle of the night if something goes wrong. They deal with hardware issues. They provide the hardware and systems deployments we need. They deal with a lot of issues that might not be exciting to us.

It’s important to us that they can do their jobs effectively. To do that, there are a number of things we as developers need to do:

  • Support the system packaging tools our operations staff uses.
  • Make sure our applications install files where operations staff expect to find them.
  • Make sure our applications have run-control scripts so they work properly with system startup and shutdown.
  • Make sure our applications work with systems monitoring tools.

Separating software installation and configuration

We separate software installation and configuration. It might be reasonable on an end-user system to create a running useful process as part of the software install, but for production deployments, we are going to configure the software in specific ways that vary independently from the software.

We deploy applications in 2 steps:

  • Install software via RPM packages.

  • Configure the installed software with “system buildouts”.

    A system buildout is a buildout that:

    • runs as root
    • runs in offline mode and so doesn’t download anything
    • uses the buildout directory provided by an application to get needed recipes.

Example: developing and deploying a typical CMS application

To show how the techniques and tools we’ve developed work, we’ll walk through the life-cycle of a CMS application. The example is highly simplified to make the artifacts easier to follow.

Imagine you’re developing a web-based CMS application. You’ll typically have one or more web applications, one or more databases, and possibly other tools. In our example, we have a web application and 2 databases: a main database and an indexing database. The details aren’t important. In the example, we’ll use a bobo-based application and two ZODB databases.

Development

During development, we’ll use a buildout configuration like the following:

[buildout]
develop = .
parts = test app-sw db-sw ctl buildout-source-release

[test]
recipe = zc.recipe.testrunner
eggs = cmsapp

[app-sw]
recipe = zc.recipe.egg
eggs = ${test:eggs}
       PasteScript
       zc.zodbwsgi
interpreter = py

[db-sw]
recipe = zc.recipe.egg
eggs = ZODB3
       zc.queue
       zope.app.keyreference
       zope.minmax
       zc.catalogqueue
       zc.zlibstorage

[ctl]
recipe = zc.recipe.rhrc
dest = ${buildout:bin-directory}
parts =
    main-db-server
    index-db-server
    app-server

[db-server]
recipe = zc.zodbrecipes:server
zeo.conf =
   <zeo>
      address ${:address}
   </zeo>
   %import zc.zlibstorage
   <zlibstorage>
     <filestorage>
        path ${:path}
     </filestorage>
   </zlibstorage>

[main-db]
recipe = zc.recipe.filestorage

[main-db-server]
<= db-server
address = :8100
path = ${main-db:path}

[index-db]
recipe = zc.recipe.filestorage

[index-db-server]
<= db-server
address = :8101
path = ${index-db:path}


[paste.ini]
recipe = zc.recipe.deployment:configuration
s =
text =
  ${:s}[app:main]
  ${:s}use = egg:bobo
  ${:s}bobo_resources = cmsapp
  ${:s}filter-with = zodb
  ${:s}
  ${:s}[filter:zodb]
  ${:s}use = egg:zc.zodbwsgi
  ${:s}configuration =
  ${:s}  <zodb main>
  ${:s}     <zeoclient>
  ${:s}        server ${main-db-server:address}
  ${:s}     </zeoclient>
  ${:s}  </zodb>
  ${:s}  <zodb index>
  ${:s}     <zeoclient>
  ${:s}        server ${index-db-server:address}
  ${:s}     </zeoclient>
  ${:s}  </zodb>
  ${:s}
  ${:s}[server:main]
  ${:s}use = egg:Paste#http
  ${:s}host = localhost
  ${:s}port = 8080

[app-server]
recipe = zc.zdaemonrecipe
program = ${buildout:bin-directory}/paster serve ${paste.ini:location}

[buildout-source-release]
recipe = zc.recipe.egg:scripts
eggs = zc.sourcerelease

We’ll walk through this rather quickly.

The first section is the buildout section, which defines, at a high level, what is to be built. A develop option says to treat the current directory as a development project. The directory contains a setup file and source files that define the Python web application. The parts option defines the top-level parts that make up the project. These include:

  • a test script
  • application software
  • database software
  • a process control script
  • a buildout-source-release script

In buildout, parts can be defined directly, in a parts option, or indirectly, by reference from other parts. For example, the main-db-server part is included by virtue of being referenced by the ctl part that we named in the parts option. The main-db part is included because it is referenced by the main-db-server part.
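
The inclusion rule can be sketched in a few lines of Python. This is a toy model, not buildout's implementation: sections are plain dictionaries, the included_parts helper is hypothetical, and it assumes a part refers to another part either through a ${part:option} substitution or by naming it in its own parts option (as the ctl part does).

```python
import re

# Matches ${section:option}. The empty-section form ${:option} is not matched,
# because it refers to the part itself rather than to another part.
REFERENCE = re.compile(r"\$\{([^:}]+):[^}]+\}")


def included_parts(sections, top_level):
    """Return every part reachable from the top-level parts, in visit order."""
    seen = []

    def visit(name):
        if name in seen or name not in sections:
            return
        seen.append(name)
        options = sections[name]
        # Parts named in this part's own "parts" option (assumed convention).
        for ref in options.get("parts", "").split():
            visit(ref)
        # Parts referenced through ${part:option} substitutions.
        for value in options.values():
            for ref in REFERENCE.findall(value):
                visit(ref)

    for name in top_level:
        visit(name)
    return seen


sections = {
    "ctl": {"recipe": "zc.recipe.rhrc", "parts": "main-db-server"},
    "main-db-server": {"address": ":8100", "path": "${main-db:path}"},
    "main-db": {"recipe": "zc.recipe.filestorage"},
    "unused": {"recipe": "zc.recipe.egg"},
}

print(included_parts(sections, ["ctl"]))
```

Only ctl is named at the top level, yet main-db-server and main-db are pulled in by reference, while the unused section is left out.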

The app-sw and db-sw parts build scripts for launching the application-server and database-server processes.

The ctl part builds run control scripts for the parts named in its parts option and a master run control script that starts processes in the order given. The -server parts define server instances for the 2 databases and the application. The database server parts leverage buildout’s simple macro mechanism to minimize the amount of duplication. The paste.ini part used by the app-server part leverages variables defined in the database server parts to avoid repeating server addresses.
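
To make the macro mechanism concrete, here is a rough Python sketch of what “<= db-server” amounts to. It is a simplified model under stated assumptions, not buildout’s code: the helper names are hypothetical, the macro reference is stored under the key "<", and only the ${:option} form of substitution is handled.

```python
import re


def expand_macro(sections, name):
    """Start from the options of the section named by "<", then apply our own."""
    options = dict(sections[name])
    base = options.pop("<", None)
    if base is not None:
        merged = expand_macro(sections, base)
        merged.update(options)  # the part's own options win over the macro's
        options = merged
    return options


def substitute_self(options):
    """Replace ${:option} with the value of that option in the same part."""
    def replace(match):
        return options[match.group(1)]

    return {key: re.sub(r"\$\{:([^}]+)\}", replace, value)
            for key, value in options.items()}


sections = {
    "db-server": {
        "recipe": "zc.zodbrecipes:server",
        "zeo.conf": "<zeo>\n  address ${:address}\n</zeo>",
    },
    "main-db-server": {"<": "db-server", "address": ":8100"},
}

part = substitute_self(expand_macro(sections, "main-db-server"))
print(part["zeo.conf"])
```

The main-db-server part ends up with the recipe and zeo.conf of db-server, and ${:address} is filled in from its own address option.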

We’ll have more to say later about buildout-source-release.

With this set up, we get development databases and process management. Moreover, we can check the configuration in and recreate the buildout later by checking it out and rebuilding.

Once we have the initial configuration, we’re ready to run the buildout. If we have buildout installed in our Python (or elsewhere), we can run the buildout command. If not, we need to bootstrap our buildout. To do that, we’ll grab a copy of the bootstrap script:

wget http://svn.zope.org/*checkout*/zc.buildout/trunk/bootstrap/bootstrap.py

and run it:

python bootstrap.py

It creates a local buildout script that we can then run:

bin/buildout

Deployment

So, you’ve done initial development and you’re ready to deploy to production. Let’s consider our system architecture. Our system is going to get a lot of use, so we’re going to spread the processes on separate nodes:

[Figure: system architecture, with processes spread over separate machines (machines1.png)]

What was initially a self-contained project in one directory of one machine will now be spread over multiple machines. We’ll deploy the databases to the database server and deploy the application to each of the application servers.

Below, we’ll look at the database deployment. The application deployment will have much the same issues and solutions.

We don’t build on production machines

We don’t build on production machines because:

  • Production machines don’t have compilers.
  • Building is slow.
  • We can’t risk being unable to download dependencies.
  • It doesn’t play well with system packaging tools.
  • It makes operations folks cranky.

Instead, we build system binary packages. Our systems use RPMs.

Creating a source release

The first step to building a system package is creating a source release, typically as a gzipped tar archive. The buildout-source-release tool automates creating source releases from buildouts. This script can be installed separately on your system, or you can include it in your buildout as we did here.

The first step is to create a buildout configuration file for the software. We’ll create separate distributions for the database and application. Here’s the database configuration:

[buildout]
extends = buildout.cfg
parts = db-sw extra-eggs sbo
relative-paths = true
develop = metarecipe

[extra-eggs]
recipe = zc.recipe.egg:eggs
eggs = zc.recipe.rhrc
       zc.zodbrecipes
       zc.recipe.deployment
       zc.zodbdgc

[sbo]
recipe = zc.recipe.egg
eggs = zc.sbo

We extend buildout.cfg, so we don’t have to repeat the information we already have. We limit the parts to the db-sw part, an extra-eggs part, and an sbo part. The extra-eggs part is used to include software, typically buildout recipes, that we’ll need for configuration later.

We overrode the develop option to point to a subdirectory that provides a meta-recipe that we’ll discuss later.

We added a relative-paths option to cause scripts to be generated in such a way that they still work if the buildout directory is moved. This is important because, when we build an RPM, we’ll build in a temporary directory and move the buildout to a destination directory at install time.
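
The idea behind relocatable scripts can be sketched as follows. This is an illustration of the technique, not the code buildout generates: the resolve_from_script helper and the egg file name are hypothetical. A script finds the buildout directory from its own location in bin and resolves everything else against it.

```python
import os


def resolve_from_script(script_path, relative_paths):
    """Resolve buildout-relative paths from the location of a script in bin/."""
    bin_directory = os.path.dirname(os.path.abspath(script_path))
    base = os.path.dirname(bin_directory)  # the buildout directory
    return [os.path.join(base, path) for path in relative_paths]


eggs = ["eggs/example.egg"]  # hypothetical egg name

# While building the RPM, the buildout lives under the temporary build root...
print(resolve_from_script("/tmp/cmsappdb/opt/cmsappdb/bin/zeopack", eggs))
# ...and after installation it lives under /opt. The same script works in both.
print(resolve_from_script("/opt/cmsappdb/bin/zeopack", eggs))
```

Because nothing in the script records an absolute path, moving the whole directory does not break it.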

We use the buildout-source-release script to build a source release:

bin/buildout-source-release -n cmsappdb-0.1.0 \
   svn+ssh://svn.zope.org/repos/main/Sandbox/J1m/pycon2011/dev dbsource.cfg

Buildout-source-release takes a URL, a configuration file name, and an optional name. The URL can be a subversion URL [1] or a file URL. If it’s a subversion URL, then the given subversion path is checked out into a temporary directory. Buildout is run with the given configuration and the result is used to produce a gzipped tar archive containing everything needed to produce a binary installation without network access. The name of the tar archive is taken from the name option, or from the last path segment in the URL.
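
The naming rule is simple enough to sketch in Python. The archive_name helper below is hypothetical, and the .tgz suffix is an assumption based on the Source line of the spec file shown later.

```python
def archive_name(url, name=None):
    """Pick the archive name: the explicit name, else the last URL segment."""
    if name is None:
        name = url.rstrip("/").rsplit("/", 1)[-1]
    return name + ".tgz"  # assumed suffix, matching the spec file's Source line


url = "svn+ssh://svn.zope.org/repos/main/Sandbox/J1m/pycon2011/dev"
print(archive_name(url, "cmsappdb-0.1.0"))
print(archive_name(url))
```

With the -n option used above, the archive is named after the release; without it, the name would fall back to the last path segment, dev.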

The source release can be used to distribute an application in source form directly. It’s typically used to create a system package, such as an RPM.

Creating an RPM

Once we have the source release, we’re ready to create an RPM. We need an RPM specification file:

Name: cmsappdb
Version: 0.1.0
Release: 0
Summary: Database Server for my CMS
Group: Database

Requires: cleanpython26
BuildRequires: cleanpython26
%define python /opt/cleanpython26/bin/python

##########################################################################
# Lines below this point normally shouldn't change

%define source %{name}-%{version}

Vendor: Zope Corporation
Packager: Zope Corporation <sales@zope.com>
License: ZVSL
AutoReqProv: no
Source: %{source}.tgz
Prefix: /opt
BuildRoot: /tmp/%{name}

%description
%{summary}

%prep
%setup -n %{source}

%build
rm -rf %{buildroot}
mkdir %{buildroot} %{buildroot}/opt
cp -r $RPM_BUILD_DIR/%{source} %{buildroot}/opt/%{name}
cd %{buildroot}/opt/%{name}
%{python} install.py bootstrap
%{python} install.py buildout:extensions=

%{python} -m compileall -q -f -d /opt/%{name}/eggs eggs \
   > /dev/null 2>&1 || true

rm -rf release-distributions

for egglink in develop-eggs/*.egg-link
do
    sed -i "s;%{buildroot};;" ${egglink}
done

cd /tmp

%clean
rm -rf %{buildroot}
rm -rf $RPM_BUILD_DIR/%{source}

%files
%defattr(-, root, root)
/opt/%{name}

This spec file is fairly generic. The variable parts are at the top. A few things to note:

  • Many of the details are RPM specific and will vary with other system-packaging systems.

  • The section:

    %prep
    %setup -n %{source}

    is a generic RPM macro invocation to unpack an archive.

  • The %build section contains a script for building a binary installation from the source release.

    The script we see here does most of its work by invoking the install.py script included in the source release.

    We recompile .py files to reflect the installed location.

    We have to edit develop-egg links to reflect the install location as well.

  • The software is built in a temporary directory (BuildRoot). When the RPM is installed, it will be copied to /opt/. Because the built software contains paths, we need to make sure the paths will work in the install directory. We can do that in a number of ways:

    1. Somehow pass alternate paths to the buildout. This is rather hard. Buildout, and, more importantly, recipes that you might use, don’t provide support for building with alternate paths.

      Also, this approach wouldn’t work if you wanted to create relocatable RPMs.

    2. Adjust paths during installation, typically by rerunning the buildout after a basic install. This is probably the most robust approach in that it will generally work with any recipes, but it complicates the spec file a fair bit.

    3. Use relative paths. Use the relative-paths buildout option to generate scripts with buildout-relative paths. This works well in most cases, and is the easiest to use.

      A downside of this approach is that some paths may not be handled correctly. For example, if your project uses develop eggs, you’ll need to adjust the paths in the egg links, either at the end of the build step, or during installation.

    Here, we’ve used the 3rd option.

  • We require cleanpython26.

    We (ZC) always use “clean” Pythons, which are Pythons built from source without any additions. We install these as separate packages alongside whatever Python is provided by the operating system.

    Of course, you can base your distributions on the operating-system-provided Python, if you like a little excitement in your life. :)
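
The egg-link adjustment done by the sed loop in the %build section can be expressed in Python as well. This is a minimal sketch: the strip_buildroot helper is hypothetical, the file content is illustrative, and, unlike sed, it treats the build root as a literal string rather than a pattern.

```python
def strip_buildroot(egg_link_text, buildroot):
    """Remove the first occurrence of the build root from each line."""
    return "\n".join(line.replace(buildroot, "", 1)
                     for line in egg_link_text.split("\n"))


# Illustrative content of a develop egg link written during the RPM build.
built = "/tmp/cmsappdb/opt/cmsappdb/metarecipe\n"
print(strip_buildroot(built, "/tmp/cmsappdb"))
```

After the rewrite, the link points at the location the software will have once the RPM is installed.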

To build an RPM, we need to:

  1. Copy the source tar archive to the RPM SOURCES directory

  2. Run rpmbuild:

    rpmbuild -ba cmsappdb.spec

    This will expand the archive and run through the build steps. If all goes well, it will produce a source RPM and a binary RPM.

The source RPM combines the source tar archive and the spec file in a single file that can be used to rebuild a binary RPM later, for different operating system versions or hardware architectures. Our operations folks really like getting source RPMs. They prefer to get source RPMs and build their own binary RPMs for deployment.

The binary RPM can be used to install the software directly. I typically install the RPM on the build machine for testing:

sudo rpm -i cmsappdb-0.1.0-0.x86_64.rpm

Configuration

Once the software is installed, we’re ready to configure it. For many applications, including database servers, you may need to have multiple configurations of the software on the same machine, say for different customers. Let’s call these “deployments”.

For each deployment, we’ll use buildout to automate generation of needed configuration and runtime files and directories. To help us with this, we’ll use the system-buildout script, sbo, provided by the zc.sbo project. Normally, we would install this separately, and the sbo project repository has an “rpm” configuration for building a source release and an RPM spec file for installing the sbo script in the system path. To make this example easier to follow, we’ve included the sbo script in our source buildout.

To set up a deployment, the sbo script takes two arguments, an application name, and a configuration name. The sbo script uses the buildout in the named application install directory (/opt/APPLICATION-NAME) to run the given configuration in offline mode. It looks for the configuration in /etc/APPLICATION-NAME/CONFIG-NAME.cfg.
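
As a rough sketch of the conventions just described, the following Python function maps an application name and a configuration name to a buildout invocation. It is not the sbo implementation: the function name is hypothetical, and passing buildout:directory on the command line is our assumption about how the installed buildout directory might be selected. The -o (offline) and -c (configuration file) options are standard buildout options.

```python
def system_buildout_command(application, configuration):
    """Build the command line for a system buildout (illustrative only)."""
    app_directory = "/opt/" + application
    config_path = "/etc/%s/%s.cfg" % (application, configuration)
    return [
        app_directory + "/bin/buildout",
        "-o",               # offline mode: never download anything
        "-c", config_path,  # the deployment's configuration file
        # Assumption: point buildout at the installed application directory
        # so that its eggs and recipes are used.
        "buildout:directory=" + app_directory,
    ]


print(" ".join(system_buildout_command("cmsappdb", "ample")))
```

The point of the convention is that operations staff only need to know two names, the application and the configuration, to set up a deployment.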

Let’s configure our CMS database for a customer, Ample Inc. We’ll create /etc/cmsappdb/ample.cfg:

[buildout]
parts = ctl pack

[deployment]
recipe = zc.recipe.deployment
name = ample
user = zope

[ctl]
recipe = zc.recipe.rhrc
deployment = deployment
chkconfig = 345 99 10
parts = main index

[server]
recipe = zc.zodbrecipes:server
deployment = deployment
zeo.conf =
   <zeo>
      address ${:address}
   </zeo>
   %import zc.zlibstorage
   <zlibstorage>
     <filestorage>
        path ${:path}
        pack-gc false
     </filestorage>
   </zlibstorage>

[main]
<= server
address = :8100
path = /var/databases/ample/main.fs

[index]
<= server
address = :8200
path = /var/databases/ample/index.fs

[gc-config]
recipe = zc.recipe.deployment:configuration
deployment = deployment
text =
    <zodb main>
      <zeoclient>
         server ${main:address}
      </zeoclient>
    </zodb>
    <zodb index>
      <zeoclient>
         server ${index:address}
      </zeoclient>
    </zodb>

[pack.sh]
recipe = zc.recipe.deployment:configuration
deployment = deployment
text =
  ${buildout:bin-directory}/zeopack -d3 -t00 ${main:address} ${index:address}

  ${buildout:bin-directory}/multi-zodb-gc -d3 -lERROR ${gc-config:location}

[pack]
recipe = zc.recipe.deployment:crontab
deployment = deployment
times = 1 2 * * 6
command = sh ${pack.sh:location}

This configuration looks a lot like the original configuration except that it doesn’t define any software. Some of the section names have been made shorter, because this configuration is more focused.

Some differences to note:

  • Because this is a production installation, we specify paths explicitly rather than letting a recipe compute them.
  • Because this is a production installation, we need to provide for packing and garbage collection. There might be cross-database references, so we need to use a garbage collector. We’ll perform packing and garbage collection as two separate steps. We’ve configured the databases to not perform garbage collection while packing. We generate a pack script that calls zeopack to pack the databases and then calls multi-zodb-gc to do garbage collection.

When we run sbo:

/opt/cmsappdb/bin/sbo cmsappdb ample

The buildout is run and a number of files get created in places that system people expect:

Path                            Description
/etc/ample                      Configuration directory
/etc/ample/gc-config            Garbage collection configuration
/etc/ample/index-zdaemon.conf   Index daemon configuration
/etc/ample/index-zeo.conf       Index server configuration
/etc/ample/main-zdaemon.conf    Main daemon configuration
/etc/ample/main-zeo.conf        Main server configuration
/etc/ample/pack.sh              Pack shell script
/var/log/ample                  Log directory
/var/run/ample                  Run-time directory
/etc/init.d/ample-main          Run-control script for main database
/etc/init.d/ample-index         Run-control script for index database
/etc/init.d/ample               Combined run-control script
/etc/cron.d/ample-pack          Cron job for packing
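
The naming pattern in this listing can be summarized in a short Python sketch. The deployment_paths helper is hypothetical, and the pattern is inferred from the listing above rather than taken from the recipes’ source. The gc-config and pack.sh files are left out because they come from parts specific to this configuration.

```python
def deployment_paths(name, servers, cron_jobs):
    """List the paths a deployment is expected to create (inferred pattern)."""
    paths = ["/etc/" + name, "/var/log/" + name, "/var/run/" + name]
    for server in servers:
        paths.append("/etc/%s/%s-zeo.conf" % (name, server))
        paths.append("/etc/%s/%s-zdaemon.conf" % (name, server))
        paths.append("/etc/init.d/%s-%s" % (name, server))
    paths.append("/etc/init.d/" + name)  # combined run-control script
    paths.extend("/etc/cron.d/%s-%s" % (name, job) for job in cron_jobs)
    return paths


for path in deployment_paths("ample", ["main", "index"], ["pack"]):
    print(path)
```

Everything is keyed on the deployment name, which is what lets several deployments of the same software coexist on one machine.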

This was pretty straightforward and a lot of configuration was done for us. If we make a mistake, we can supply the sbo uninstall option:

/opt/cmsappdb/bin/sbo cmsappdb -u ample

and the files we created will be uninstalled [2].

Meta-recipes

We get a contract with EX Partners and need to set up databases for them. We can create a configuration for EX Partners and it will look a lot like the configuration for Ample Inc. That’s a lot of repetition. It would be great if we could factor out all of the common parts of the configurations and just say what’s different. Something like [3]:

[buildout]
parts = ex

[ex]
recipe = cmsappdbmetarecipe
main-port = 8200
pack-time = 1 3 * * 6

The way we do this is with a meta recipe. A meta recipe is a buildout recipe that uses other recipes to get its work done. It uses a buildout API that allows configuration sections to be created and acted on dynamically. Because meta recipes are written in Python, they can express logic far beyond the capabilities of a macro system.

Here’s a meta recipe that automates the configuration of cmsappdb deployments:

class MetaRecipe:
    def __init__(self, buildout, name, options):

        def add_section(section_name, **values):
            if section_name in buildout._raw:
                raise KeyError("already in buildout", section_name)
            buildout._raw[section_name] = values
            buildout[section_name] # cause it to be added to the working parts

        deployment = name + '-deployment'

        add_section(deployment,
                    recipe = 'zc.recipe.deployment',
                    name=name,
                    user=options.get('user', 'zope'),  # default as in ample.cfg
                    )

        main_port = options['main-port']
        index_port = options.get('index-port', str(int(main_port) + 1))
        ports = main_port, index_port
        dbnames = 'main', 'index'
        servers = list(zip(dbnames, ports))  # a list: it is iterated several times
        for dbname, port in servers:
            add_section(name+'-'+dbname,
                        recipe = 'zc.zodbrecipes:server',
                        deployment = deployment,
                        **{'zeo.conf': zeo_conf % dict(
                            port=port,
                            customer=name,
                            dbname=dbname,
                            )})

        add_section(name+'-ctl',
                    recipe = 'zc.recipe.rhrc',
                    deployment= deployment,
                    chkconfig = '345 99 10',
                    parts = ' '.join(name+'-'+dbname
                                     for (dbname, _) in servers),
                    )

        add_section(name+'-gc.conf',
                    recipe = 'zc.recipe.deployment:configuration',
                    deployment = deployment,
                    text='\n'.join(gc_conf % dict(dbname=dbname, port=port)
                                   for (dbname, port) in servers),
                    )

        add_section(name+'-pack.sh',
                    recipe = 'zc.recipe.deployment:configuration',
                    deployment = deployment,
                    text = pack_sh % dict(
                        customer=name,
                        addresses = ' '.join(':'+port for port in ports),
                        gcconf=name+'-gc.conf',
                        ),
                    )

        add_section(name+'-pack',
                    recipe = 'zc.recipe.deployment:crontab',
                    deployment = deployment,
                    times = options.get('pack-time', '1 2 * * 6'),
                    command = 'sh ${%s-pack.sh:location}' % name
                    )

    update = install = lambda self: ()

zeo_conf = """
<zeo>
   address :%(port)s
</zeo>
%%import zc.zlibstorage
<zlibstorage>
  <filestorage>
     path /var/databases/%(customer)s/%(dbname)s.fs
     pack-gc false
  </filestorage>
</zlibstorage>
"""

gc_conf = """
    <zodb %(dbname)s>
      <zeoclient>
         server :%(port)s
      </zeoclient>
    </zodb>
"""

pack_sh = """
  ${buildout:bin-directory}/zeopack -d3 -t00 \
     %(addresses)s

  ${buildout:bin-directory}/multi-zodb-gc -d3 -lERROR \
     ${%(gcconf)s:location}
"""

Meta recipes do all of their work in their constructors. Buildout processes parts in 2 phases:

  • It initializes parts in the first phase by calling recipe constructors. This is when part data is fully formed, often by referencing data across parts, and it is the phase in which parts can be added.
  • In the second phase, recipe methods are called to install or update parts.
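
The two phases, and the reason meta recipes work in their constructors, can be illustrated with a toy model. Nothing here is buildout’s real API: ToyBuildout, Echo, and Meta are hypothetical stand-ins, and the raw attribute only mimics the role that _raw plays in the meta recipe above.

```python
class ToyBuildout(dict):
    """Toy stand-in for a buildout: maps section names to option dicts."""

    def __init__(self, raw, recipes):
        super().__init__()
        self.raw = dict(raw)    # unprocessed section data
        self.recipes = recipes  # recipe name -> recipe class
        self.parts = []         # (name, recipe) in initialization order
        self.log = []

    def __missing__(self, name):
        # The first reference to a section initializes the part.
        options = self[name] = dict(self.raw[name])
        recipe = self.recipes[options["recipe"]](self, name, options)
        self.parts.append((name, recipe))
        return options


class Echo:
    def __init__(self, buildout, name, options):
        self.buildout, self.name = buildout, name
        buildout.log.append("init " + name)

    def install(self):
        self.buildout.log.append("install " + self.name)


class Meta:
    def __init__(self, buildout, name, options):
        buildout.log.append("init " + name)
        child = name + "-child"
        buildout.raw[child] = {"recipe": "echo"}
        buildout[child]  # referencing the new section creates the part

    def install(self):
        pass  # all of the work happened in the constructor


def run(buildout, top_level):
    for name in top_level:                # phase 1: initialize parts
        buildout[name]
    for name, recipe in buildout.parts:   # phase 2: install parts
        recipe.install()


buildout = ToyBuildout({"ex": {"recipe": "meta"}},
                       {"echo": Echo, "meta": Meta})
run(buildout, ["ex"])
print(buildout.log)
```

The child part did not exist in the configuration, yet it is initialized during the first phase and installed during the second, because the meta recipe added it before the first phase ended.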

Meta recipes are a fairly recent development and they are supported by a barely public API that was private until recently. Buildout objects have a _raw attribute that contains section data read directly from configuration files. A meta recipe can add parts by adding sections to this mapping object. A part is created when this data is referenced. This API isn’t pretty, but it will be supported indefinitely. In the future, cleaner APIs will be provided. In this example, we provide a small helper function to make adding sections easier.

The meta recipe adds parts corresponding to each of the parts in the original configuration. It adds its own name as a prefix to each of the part names to make their names unique. It didn’t create a part corresponding to the server macro because it was able to accomplish the same abstraction through a simple Python loop.

Unless an index-port option is given, it calculates the index database server port by adding 1 to the main database server port.

We created our meta recipe in a subdirectory so we could make it a separate project with its own dependencies. We did this so we wouldn’t drag along application server dependencies for the recipe to be used on the database servers. Here’s the meta recipe’s setup.py script [4]:

from setuptools import setup
setup(
    name = 'cmsappdbmetarecipe',
    py_modules = ['cmsappdbmetarecipe'],
    entry_points = {'zc.buildout': ['default = cmsappdbmetarecipe:MetaRecipe']},
    install_requires = ['setuptools'],
    zip_safe = False,
    )

Note the use of the entry_points option. To use the meta recipe, we have to create a buildout entry point in our project.
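
The connection between a recipe option and an entry point can be sketched as follows. The parse_recipe_spec helper is hypothetical, and it only models the naming convention visible in the configurations above: the text before the colon names the project, the text after it names the entry point in the zc.buildout group, and the entry point named default is used when no colon is present.

```python
def parse_recipe_spec(spec):
    """Split a recipe option into (project, entry point name)."""
    if ":" in spec:
        project, entry_point = spec.split(":", 1)
    else:
        project, entry_point = spec, "default"
    return project, entry_point


print(parse_recipe_spec("zc.recipe.deployment:configuration"))
print(parse_recipe_spec("cmsappdbmetarecipe"))
```

This is why the setup script registers MetaRecipe under the name default: the configuration can then say simply recipe = cmsappdbmetarecipe.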

As mentioned earlier, the dbsource.cfg file overrides the develop option to include the meta recipe development egg.

Locking down versions

Once you get everything working the first time, you’ll typically want to lock down the versions of the projects you use. Buildout lists the versions you use when run in verbose mode and lets you list versions in a section of the configuration. Buildout will then use the listed versions.
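
As an illustration, here is a small Python sketch that renders such a section from a mapping of project names to versions. The versions_section helper is hypothetical and the version numbers are placeholders, not the versions used in this example. It relies on the buildout convention that a versions option in the buildout section names the section that holds the pins.

```python
def versions_section(picked, section="versions"):
    """Render configuration text that pins the given project versions."""
    lines = ["[buildout]", "versions = " + section, "", "[" + section + "]"]
    lines.extend("%s = %s" % item for item in sorted(picked.items()))
    return "\n".join(lines)


# Placeholder version numbers, for illustration only.
picked = {"zc.zlibstorage": "0.1.0", "ZODB3": "3.10.2"}
print(versions_section(picked))
```

The resulting text can be kept in the main configuration or in a separate file that the configuration extends.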

Future work: makemea

Buildout recipes help with automating installation at a low level. There are a growing number of recipes available on PyPI that provide a toolbox for a variety of configuration tasks. Meta recipes make it easy to specify a complex configuration at a high level.

There’s still an important problem to be solved. In this example, we deployed database servers to support an application. We can deploy the application following a similar process. Even doing so at a high level, there are a number of aspects that require humans to think hard:

  • Managing information about how the high-level parts fit together (host names and ports)
  • Automating assembly of the whole system
  • Resource allocation
  • Monitoring the system as a whole rather than just a single machine. For example, we don’t need to wake someone up if one application of many redundant application servers fails.

There are systems that can help with some of these problems. Systems like Puppet and BCFG2 can automate system assembly and avoid duplicate specification of shared configuration, such as service addresses. However, these systems use generic configuration mechanisms that express the application model only indirectly. It’s up to the deployer to arrange the configuration so that it implements the desired application.

We’re working on a system to automate whole-system deployments using application-defined models expressed in Python. The basic architecture includes:

  • Application-defined models that express the components of a system at a high level. For example, a CMS model expresses that a CMS has a load balancer and multiple application instances and uses multiple databases. Models include data persisted in the database and rules for implementing them in a pseudo-transactional way using actions and undo actions.
  • A database that models computation resources (machines, disks, processors, etc.), and deployed applications.
  • Command-line and GUI, but not web-based, configuration programs for accessing the database and effecting changes such as deploying new applications.
  • Buildout recipes and meta-recipes.
  • Transactionish.

The configuration programs will use ssh commands running with the user’s keys to install software and run system buildouts. They will assign addresses and decide where applications will run.

Summary

We’ve presented a strategy for deploying applications:

  • development buildouts
  • source distributions
  • system packages
  • system configuration

And described a vision for knitting application components to define full systems.

[1] Support for other version control systems will be added in the future. For now, you can work around this by checking out ahead of time and using a file URL.
[2] Some directories, most notably log directories, won’t be uninstalled if they contain files not created by the buildout.
[3] It would be even better if we didn’t have to specify the buildout section. It’s fairly common to have buildout configurations with a single section. Eventually, buildout will handle this common case by treating a configuration with a single section as a buildout with one part.
[4] This is a wildly simplified setup file. It’s just an example.