Thursday, May 6, 2010

Mercurial and Managing Third Party Code in Your Project

Overview

Managing third-party code in your source repository can be a tricky business. Many projects simply choose not to address the problem at all: you are required to download all the dependencies and build them yourself, or perform the equivalent installation of binaries using your distribution's package manager. It's typically up to you to ensure that you are using the correct versions.

What if you want to ensure you know exactly what's going into your application so you don't get broken by something that changes in the package you rely on? What if you know that you are going to have local customizations to those packages that are needed for your application, but may not be accepted back upstream?

The approach that I've seen used successfully is to track most if not all of the dependencies you have in your source tree. At this point several people are saying "What the hell? That will bloat my source tree significantly!" Yes it can and almost certainly will, but there are definite advantages:
  • When users download your application source, it's just all there; they don't have to get each and every dependency individually.
  • You can include local changes to those packages and not have to invent complicated workarounds for bugs.

The simplest way to manage this is with subrepositories, which aren't particularly well supported by most SCM tools at the moment. The usual workaround is a wrapper script that knows how to work with multiple repositories at the same time. The drawback to that approach is that people using your package now have to get the wrapper before they can correctly download everything. They also have to remember to use your wrapper for certain operations, such as cloning, pushing, and pulling. Mercurial does have some support for subrepos, but it isn't a "fully baked" feature at this time.

There is a compromise approach that will work with standard Mercurial and not require any fancy scripts. It's not perfect, but it does cover most of the bases. In essence, you will keep pure copies of the third party source in separate clones of your repository and merge them back into your main repository. Local changes will be kept in the main repository. When you want to upgrade the third party code, you update the pure repository and merge it back into your main repository again. Your local changes are in their own changesets, and you will hopefully be able to merge them cleanly with the updated code. Without having those local changesets, and changesets for the updates to the pure code, it can be very difficult to track which change is which.

Again at this point, some people are likely freaking out and saying "Mercurial doesn't require a master repository, this violates one of the main DVCS principles!" Yes and no. Most projects do have a main repository that everyone gets their initial clones from and that is considered the "golden master." Even for those that do not, this technique will still work reasonably well.

Note that the MQ extension to Mercurial will often do everything most people want when managing third-party packages and local patches. The approach described here is an alternative if you feel MQ doesn't quite fit your needs or your way of doing things.

Before you begin

One of the bigger drawbacks to this approach is that you need to work from a "clean" repository.  If you already have a bunch of code, this isn't going to work very well.

At this point, you are probably saying, "But if I'm just starting out, how do I know what third party packages I'm going to need?"  Fair enough, but I'm betting that you are going to have at least one already in mind, and maybe more. To get us started, let's assume we are starting a new project and that we are going to use Lua for our scripting and configuration file needs.  This is a non-trivial package that is quite common in a large number of projects.

Basic How To

The basic series of tasks will look like:
  1. Create your main repository.
  2. Set up your pure "starter" repository, called 'pure-stem', from the main repo.
  3. Create a pure package repository from the pure-stem repo.
  4. Import the third party code into the pure repo and tag it.
  5. Pull and merge the pure repo into your main repo.

Create your main repository

This will depend on your hosting provider, but let's assume you are managing everything locally. You need to add a simple file to the root of the repository and make a single commit. If you don't, the pure-stem repo you create later could end up with a different initial changeset identifier, and that identifier is what Mercurial uses to decide whether repositories are related. If the repos aren't related, this technique doesn't work.

$ hg init myproject
$ cd myproject/
$ echo "Let's take over the world." > README
$ hg commit --addremove --message "Adding initial project README."
adding README

Set up your pure-stem repository

We need a repository that all of our third-party pure repos will derive from. This really needs to be done before pretty much anything else, as it needs to be completely empty of non-third-party source.

$ cd ..
$ ls
myproject
$ hg clone myproject pure-stem
updating to branch default
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ ls
myproject  pure-stem
$ cd pure-stem
$ mkdir -p src/packages
$ echo "All third-party source packages are located here." > src/packages/README
$ hg commit --addremove --message "Setup of pure-stem repository."
adding src/packages/README

To make life a little easier later, let's pull the contents of the pure-stem repo into our main repo:

$ cd ../myproject/
$ hg incoming ../pure-stem 
comparing with ../pure-stem
searching for changes
changeset:   1:2b2ed8ef6cba
tag:         tip
user:        Glenn McAllister 
date:        Thu May 06 15:57:39 2010 -0400
summary:     Setup of pure-stem repository.
$ hg pull ../pure-stem 
pulling from ../pure-stem
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
(run 'hg update' to get a working copy)
$ hg update
1 files updated, 0 files merged, 0 files removed, 0 files unresolved

Create and Populate the Third Party Pure Repo

Our first third-party package to import will be Lua. For the sake of the later examples, we are going to start with Lua 5.1.3 and upgrade to Lua 5.1.4 later on. Assuming we have already downloaded lua-5.1.3.tar.gz to /tmp:

$ cd ..
$ hg clone pure-stem pure-lua
updating to branch default
2 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ cd pure-lua/src/packages
$ tar zxf /tmp/lua-5.1.3.tar.gz --transform s/lua-5.1.3/lua/
$ ls
lua  README
$ hg commit --addremove --message "Initial import of Lua 5.1.3"
... lots of files were added ...
$ hg tag lua-5.1.3

Note that we tag the repo with the Lua release information so we know which version we have at a given point in time.
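
The --transform option used in the tar command is specific to GNU tar. If your tar lacks it, a roughly equivalent result can be had by extracting into a directory you name yourself and dropping the archive's top-level directory. This is a sketch under the assumption that everything in the tarball lives under a single top-level directory (as with lua-5.1.3/); the function name unpack_as is hypothetical.

```shell
# Hypothetical helper: unpack a gzipped tarball into a directory of our
# choosing, dropping the archive's own top-level directory (e.g. lua-5.1.3/).
# Assumes every entry in the tarball sits under one top-level directory.
unpack_as() {
    tarball=$1
    target=$2
    mkdir -p "$target" &&
        tar -xzf "$tarball" -C "$target" --strip-components=1
}

# Example, run from src/packages in the pure repository:
#   unpack_as /tmp/lua-5.1.3.tar.gz lua
```

Either way, the point is that the package always lands in the same version-free directory name, so a later import shows up as changes to existing files rather than as a brand new tree.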

Pull the Third Party Repo into the Main Repo

At this point, we want to pull the new pure-lua repo into our main repo:

$ cd ../../../myproject/
$ hg incoming ../pure-lua 
comparing with ../pure-lua
searching for changes
changeset:   2:ec222e7c1372
tag:         lua-5.1.3
user:        Glenn McAllister 
date:        Thu May 06 16:09:04 2010 -0400
summary:     Initial import of Lua 5.1.3

changeset:   3:3dc42e785d44
tag:         tip
user:        Glenn McAllister 
date:        Thu May 06 16:09:39 2010 -0400
summary:     Added tag lua-5.1.3 for changeset ec222e7c1372

$ hg pull ../pure-lua 
pulling from ../pure-lua
searching for changes
adding changesets
adding manifests
adding file changes
added 2 changesets with 104 changes to 104 files
(run 'hg update' to get a working copy)
$ hg update
104 files updated, 0 files merged, 0 files removed, 0 files unresolved

At this point, you now have the Lua source code in your main repository.

Making Local Changes


Now that we have some third-party source, let's make some changes to it. This isn't going to be a 100% realistic example; it's just enough to show the principle. Let's change the Lua README file to add our own comment:

$ cd src/packages/lua/
$ echo "Local change in main repository." >> README
$ hg status
M src/packages/lua/README
$ hg commit --message "Local change to Lua README file."

And that's it. In theory, you can make as many changes as you want to the package. In practice, you won't. You will typically try to make the minimum changes necessary, if any, to get your project to work.

Upgrading the Third Party Package

Lua's current stable release is actually 5.1.4, not the version we are using. We've looked at the release notes, bug reports, etc. and decided that upgrading to the current version is a good idea. The basic steps we want to follow are:

  1. Blow away the existing files.
  2. Untar/unzip the update into the repo.
  3. Commit the changes and tag them.
  4. Merge the changes back into the main repo.

Rather than going into the same level of detail as in the sections above, here is the whole sequence:

$ cd ../../../../pure-lua/
$ hg locate -0 --include src/packages/lua | xargs -0 rm
$ cd src/packages/
$ tar zxf /tmp/lua-5.1.4.tar.gz --transform s/lua-5.1.4/lua/
$ hg commit --addremove --message "Import Lua 5.1.4."
$ hg tag lua-5.1.4
$ hg tags
tip                                5:a71637075ea6
lua-5.1.4                          4:7ada377f7533
lua-5.1.3                          2:ec222e7c1372
$ cd ../../../myproject/
$ hg incoming ../pure-lua
comparing with ../pure-lua
searching for changes
changeset:   4:7ada377f7533
tag:         lua-5.1.4
user:        Glenn McAllister 
date:        Thu May 06 16:30:34 2010 -0400
summary:     Import Lua 5.1.4.

changeset:   5:a71637075ea6
tag:         tip
user:        Glenn McAllister 
date:        Thu May 06 16:30:53 2010 -0400
summary:     Added tag lua-5.1.4 for changeset 7ada377f7533

$ hg pull ../pure-lua
pulling from ../pure-lua
searching for changes
adding changesets
adding manifests
adding file changes
added 2 changesets with 15 changes to 15 files (+1 heads)
(run 'hg heads' to see heads, 'hg merge' to merge)
$ hg heads
changeset:   6:a71637075ea6
tag:         tip
user:        Glenn McAllister 
date:        Thu May 06 16:30:53 2010 -0400
summary:     Added tag lua-5.1.4 for changeset 7ada377f7533

changeset:   4:cf8dc98b5be3
user:        Glenn McAllister 
date:        Thu May 06 16:21:40 2010 -0400
summary:     Local change to Lua README file.

$ hg merge
15 files updated, 0 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
$ hg status
M .hgtags
M src/packages/lua/Makefile
M src/packages/lua/doc/manual.html
M src/packages/lua/doc/readme.html
M src/packages/lua/etc/lua.pc
M src/packages/lua/src/lapi.c
M src/packages/lua/src/lbaselib.c
M src/packages/lua/src/ldebug.c
M src/packages/lua/src/loadlib.c
M src/packages/lua/src/lobject.h
M src/packages/lua/src/lstrlib.c
M src/packages/lua/src/ltablib.c
M src/packages/lua/src/lua.h
M src/packages/lua/src/luaconf.h
M src/packages/lua/src/lundump.c
$ hg commit --message "Pull in Lua 5.1.4"

The use of the hg locate command above is worth commenting on. We use it to list the files Mercurial is tracking so that we can delete them, but we want to remove only the Lua-related files, not the src/packages/README file, which is why the command is restricted with --include src/packages/lua. With that and the --addremove option to commit, we let Mercurial do the work of figuring out which files have been added, removed, and changed.

Typically your merges are going to be more complicated. Also, when you have multiple third-party repositories, merging the .hgtags file gets to be a pain. Just remember: when merging that file, keep the lines from both versions.
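
Taking the lines from both versions of .hgtags can be done by hand, or with something like the sketch below. The function name union_lines and the temporary file names in the usage comment are made up for illustration, and the approach assumes the two sides added different tags; if the same tag name was moved on both sides, line order matters and the result deserves a manual look.

```shell
# Hypothetical helper: print the lines of both files, each distinct line
# once, in the order in which lines are first seen.
union_lines() {
    awk '!seen[$0]++' "$1" "$2"
}

# Possible use while resolving a conflicted .hgtags during a merge:
#   hg cat --rev 'p1()' .hgtags > /tmp/hgtags.local
#   hg cat --rev 'p2()' .hgtags > /tmp/hgtags.other
#   union_lines /tmp/hgtags.local /tmp/hgtags.other > .hgtags
#   hg resolve --mark .hgtags
```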

Summary

This technique allows you to manage third-party packages in a controlled manner in your main repository, and ensures your local changes will be managed correctly. It has drawbacks compared to other methods, but this is often a good solution from an end-user's perspective, as they don't have to do a bunch of work to get your project's dependencies.