Tuesday, October 26, 2010

Understanding Mercurial Subrepositories

UPDATE: Subrepositories proved to be a bit unwieldy. They worked, but they weren't worth the hassle. What finally made all our pieces fall into place was NuGet. We keep the repositories separate from each other and publish any shared binaries to our own internal NuGet server. We then use NuGet to pull in dependencies. It is working quite well. (We have also switched to Git, but that doesn't matter too much for this post.)

My client has slowly started moving towards Mercurial for source control. I've only used Hg for small personal projects in the past, so it will be interesting to see how things work.

One hurdle that we hit pretty early was how to best structure our repository. The general recommendation is to avoid one large monolithic repository and break your source into multiple subrepositories. The suggestion is to use one subrepository per "project", but what exactly is a project?

Our approach is to divide our source code into medium-sized groups. Each group can contain several Visual Studio projects -- some groups output only one or two assemblies when built, while others could be as large as 20 or so assemblies. Each group is its own repository. These repositories can then be organized into larger main repositories.

For example, we have some shared assemblies that contain simple POCOs representing the message classes used in different layers of our application. The Visual Studio projects for these message classes live in one repository. This repository is cloned as a subrepo into other repositories where needed, for example the Windows client repository or the Silverlight client repository.

The whole concept of subrepositories did not quite make sense to me at first, but I think I am beginning to understand it. You have to think of each subrepository as a completely isolated unit of versioning, with its own history.

For example, say you have a directory/repository structure somewhat like this:
\client\src\
\client\shared\
\client\shared\Messages\

If you make a change to \client\shared\Messages\test.txt, two steps occur:
1. You commit the change to the Messages subrepository.
2. You commit that the parent repository should point at the new version of the subrepository.

(This works automatically when you commit from the command line. For some reason, I can't get it to work when using TortoiseHg.)
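To make the two steps concrete, here is a toy model in Python. It is only a sketch of the idea, not how Mercurial is implemented: the ToyRepo class, the commit_subrepo_change helper, and the fake commit ids are all made up for illustration.

```python
import hashlib


class ToyRepo:
    """A toy stand-in for a repository: just a list of commit ids."""

    def __init__(self, name):
        self.name = name
        self.commits = []  # commit ids, oldest first

    def commit(self, content):
        # Derive a fake commit id from the previous id and the new content.
        previous = self.commits[-1] if self.commits else ""
        commit_id = hashlib.sha1((previous + content).encode("utf-8")).hexdigest()
        self.commits.append(commit_id)
        return commit_id


def commit_subrepo_change(parent, subrepo, subrepo_path, change):
    # Step 1: the change itself is committed to the subrepository.
    sub_id = subrepo.commit(change)
    # Step 2: the parent commits only a pointer to the new subrepo revision.
    parent_id = parent.commit(f"{sub_id} {subrepo_path}")
    return sub_id, parent_id


client = ToyRepo("client")
messages = ToyRepo("Messages")
sub_id, parent_id = commit_subrepo_change(
    client, messages, "shared/Messages", "edit test.txt"
)
print("subrepo commit:", sub_id[:12])
print("parent commit: ", parent_id[:12])
```

The point of the model is that the parent never stores the content of test.txt. It stores only which subrepo revision it expects, so each repository ends up with its own separate commit.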

Step 2 was the one that confused me at first. The parent repository keeps a record of each subrepository and which version of it is being used. This lives in a file called '.hgsubstate' (the subrepositories themselves are declared in a companion file, '.hgsub'). The hg tooling updates '.hgsubstate' for you, and this is how the parent repository points to a specific version of the child repository. So when you commit the parent repository in step 2, this file is what is being committed. The tooling just hides it from you a little.
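If you want to see what the parent is pointing at, you can read '.hgsubstate' yourself. Each line holds a revision id followed by the subrepo path. Below is a minimal Python sketch that parses that layout; the parse_hgsubstate name and the sample revision id are my own inventions for illustration, not something from our repositories.

```python
def parse_hgsubstate(text):
    """Map each subrepo path to the revision id recorded for it."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<revision id> <path>"; split once so that
        # paths containing spaces stay in one piece.
        revision, path = line.split(" ", 1)
        state[path] = revision
    return state


# Hypothetical file content; the revision id is made up.
sample = "0123456789abcdef0123456789abcdef01234567 shared/Messages\n"
state = parse_hgsubstate(sample)
print(state["shared/Messages"])
```

In a real working copy you would pass in the contents of the '.hgsubstate' file at the root of the parent repository instead of the sample string.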

Please keep in mind that I am not an expert in Mercurial; this is really the first time I am using it in a real scenario. I am still learning, and if I have time I will post more as we go forward with this change.