Python Packaging

Packaging applications properly is one of the most important ways in which we can improve user experience, and indeed make our own lives easier. Python offers excellent methods to easily package and deploy applications in a cross-platform way. This post demonstrates how we can package up a Python application using the current best-practices.

To illustrate this, I've made a simple Python package called kingman, which simulates the classical single-locus Kingman's Coalescent. The simulation itself is pretty trivial, and something that anyone could reproduce themselves in minutes. The package is available on PyPI and GitHub, and has documentation at ReadTheDocs.

This tutorial is based on current best practises as given in the Python Packaging User Guide. See also the example package from the Python Packaging authority for an alternative framework. This package is based on my own experience, and may not fully conform to what others may think. Please file an issue on GitHub, or send a pull request if you see a problem!

Python versions

Python 3 is here to stay, and we should aim to support it fully in any module that we release. Supporting both Python 2.7 and Python 3.x is actually quite straightforward in the majority of cases, once we follow a few simple rules. For the more complex cases, the six module provides a very useful compatability layer.

In the kingman module, every file starts with the lines

from __future__ import print_function
from __future__ import division

Including these two lines will ensure that the vast majority of the code you write is compatible with both Python 2.7 and 3.x. We also ensure a good level of compatibility by included several Python versions in our Travis Continuous integration tests (see below). See here for further information on issues regarding Python version compatibility.

Code Layout

In our git repository, we have the following files:

  • kingman The directory holding the code for the kingman package.
  • tests The directory containing the unit tests.
  • docs The directory containing the Sphinx documentation.
  • README.txt The README file for the project, written in reStructuredText format. The reason for using reStructuredText is so we can have nicely formatted information on both GitHub and PyPI generated directly from the same README.
  • README.rst A soft link to the README.txt file above for convenience.
  • LICENSE The file setting out the terms under which the software is shared. Should be an OSI approved open source license.
  • MANIFEST.in A file describing the extra files to include in the source distribution. See here for more details.
  • requirements.txt A file describing the packages dependencies. See below for further details.
  • setup.py The main setup configuration file. See below for discussion.
  • ez_setup.py A file that we copy in to the source distribution so we can bootstrap setuptools, if we need to.
  • cli_dev.py This is a simple shim to allow us work with the command line application more easily during development, so we can call python cli_dev.py from the shell to run the application during development.

The setup.py file

The setup.py file is the most important file in defining our package. Following the advice of the Python Packaging Authority, we use setuptools to create the distribution. This complicates things a little, as some users may not have setuptools installed. To work around this, we first attempt to import setuptools, and if it doesn't exist fall back on bootstrapping setuptools using ez_setup.py. This seems to work well enough in practice.

Requirements

One of the most useful aspects of using pip to install packages is the automatic installation of dependencies. To enable this we need to tell pip what packages we depend on. There are two distict scenarious in which we need to be explicit about our dependencies, which are handled in different ways:

  1. In testing and development, we need to list all the packages we require when we want to run our unit tests (e.g. on Travis). This is done using the requirements.txt file.
  2. When installing the package on a users machine, we need to tell the installer what extra packages we depend on for normal use. This is done in setup.py using the install_requires argument. See the setuptools documentation for more details.

Unit tests

Unit testing is an essential part of developing reliable software. We use the nose testing framework for Python. If you follow some simple naming rules for your test cases, nose can autodiscover and run them for you. For example, we can run:

$ nosetests -v
test_random_seed (test_cli.TestInput) ... ok
test_sample_size (test_cli.TestInput) ... ok
test_sample_size (test_cli.TestOutput) ... ok
test_bad_sample_size (test_simulate.TestInput) ... ok
test_random_seed (test_simulate.TestOutput) ... ok
test_sample_size (test_simulate.TestOutput) ... ok

----------------------------------------------------------------------
Ran 6 tests in 0.016s

One minor point to note is that nose will not run unit tests by default in files that are marked executable.

Continuous Integration Testing

Continuous integration testing is one of the most effective ways of ensuring that your package is always in a useful state. Using Travis CI we can test our code after every push to git, which means many bugs are caught earlier than they might otherwise be. To use Travis in the kingman package, we do the following:

  1. Enable Travis integration with GitHub;
  2. Create the .travis.yml file;
  3. Push an update to GitHub.

Code linting

To ensure that our code follows the PEP 8 style guide and to catch many common programming mistakes, we use the Flake8 tool. We ensure this is run every time we push code to GitHub by running it in our .travis.yml file.

Test Coverage

To ensure we maintain good test coverage, we use the nose coverage plugin. We enable this in the .travis.yml file and set a minimum coverage threshold of 85%. If our test coverage falls below this, the Travis run will fail.

Command line interface development and deployment.

A very common task in Bioinformatics is to develop a standalone program with a command line interface which performs some well defined task. Python provides some excellent tools to make developing and deploying CLIs very simple. To write our CLI, we use the argparse module from the Python standard library. This replaces the old optparse module, which should not be used in new code.

Uploading to PyPI

Uploading our package to the Python Package index makes it available for anyone to install easily. After creating a PyPI account and setting up the credentials, we first register our package:

$ python setup.py register

Once we have registered, we can then upload versions of the package as follows:

$ python setup.py sdist
$ twine upload kingman-{VERSION}.tar.gz

We use the Twine package to securely upload the package.

Documentation

A package is only as good as its documentation, and documentation needs to look good to keep users engaged. Sphinx and Read The Docs allow us to easily write beautiful looking documentation that is automatically updated when we push new updates to GitHub.

Sphinx

The documentation is built using Sphinx. See the sphinx tutorial for a quick introduction to using Sphinx. To get started, we simply run

$ sphinx-quickstart

from within the docs directory, taking the default for the majority of the options.

We then add two new reStructuredText files to the docs directory, api.rst and cli.rst, which hold the documentation for the Python API and the command line interface, respectively. For the API documentation, we use the Sphinx autodoc extension. This allows us to import the API documentation directly from the source code.

ReadTheDocs

Read The Docs allows us to automatically publish the Sphinx generated documentation to the web. It tracks changes from GitHub, so that our documentation is always up to date. See the tutorial to get started with Read The Docs. After importing the kingman project from GitHub, we then have automatically updated documentation.

Written on June 1, 2015