To manage a Drupal project properly, you have to master the art of managing projects within projects. A Drupal site is really made up of numerous components, such as modules, features and themes, that are each projects in their own right.
The challenge becomes determining the best way to version control the code. The most straightforward way to manage a project is to make the entire site a single monolithic repo. However, this becomes tricky when you want to commit changes in modules back to their own repos, particularly if you want to maintain the commit history for that specific module.
Git is a fantastic version control system, but it does not handle the project-within-a-project scenario well. Git does have a submodule feature designed to solve this; however, the submodule approach often causes more problems than it solves.
In particular, the problem is that a submodule's code does not automatically become part of the main repo: the main repo only records which commit of the submodule to use. You have to remember to update each submodule and commit the new pointer. This creates a lot of room for error and often leads to head-scratching discrepancies between the code on your local and target servers.
Luckily, there is a new(ish) Git feature called git-subtree that solves this problem more elegantly. Subtree lets a module be part of both the parent repo and its own repo at the same time. A subtree module’s code is merged directly into the main repo, so you manage the main repo as a single, homogeneous project, yet you can still commit, push and pull specific modules to and from their own repos. You can even keep a distinct history so that the module maintains a log of all the changes that pertain directly to it.
Git subtree setup
There are several different ways to set up a subtree. In this example I will take the more robust route. While it involves more steps, it will prove more convenient down the road.
To start, I am assuming we have a Drupal project set up in Git that contains Drupal core. We want to add a module to that project that we are also maintaining as a separate project. For this example I will use the Intelligence module, which I have been doing a lot of work on recently.
Step 1: Remote setup (optional)
Using Git Bash from the root of my Drupal project, I first set up a Git remote for the module with the command:
git remote add -f [remote name] [repository url]
git remote add -f intel [username]@git.drupal.org:project/intel.git
git remote add -f intel http://git.drupal.org/project/intel.git
Use the first (SSH) form if you are a maintainer and want to push back to the repo. Use the second form for anonymous, read-only checkouts.
Setting up a remote is optional. It essentially creates an alias, so that whenever we want to push or pull in the future we can use the remote label “intel” instead of the full remote URL.
Note the -f option. It causes a fetch to be executed when the remote is set up, so that in the next step the local repo knows which branches are available.
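To try Step 1 without touching drupal.org, here is a minimal offline sketch. It assumes a local bare repository (intel.git) standing in for the Intelligence module's remote. The directory names, file contents and commit identity are placeholders invented for the example. Only the git remote add -f line corresponds to the step itself.

```shell
#!/bin/sh
# Offline sketch of Step 1: a local bare repo stands in for the module's
# remote on drupal.org. Paths and the commit identity are placeholders.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Stand-in module project with a 7.x-1.x branch, published as a bare repo.
git init -q intel-src
(
  cd intel-src
  git checkout -q -b 7.x-1.x
  echo "name = Intelligence" > intel.info
  git add intel.info
  git commit -q -m "Initial module commit"
)
git clone -q --bare intel-src intel.git

# Main Drupal project; core is reduced to one empty commit here.
git init -q site
cd site
git commit -q --allow-empty -m "Drupal core"

# Step 1: register the remote; -f fetches it immediately.
git remote add -f intel "$work/intel.git"

# The module's branches are now known locally (intel/7.x-1.x).
git branch -r
```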
Step 2: Add the module using subtree
To add the module so that it can be managed as a subtree, use the command:
git subtree add --prefix=[path to module] [remote] [branch] --squash
git subtree add --prefix=sites/all/modules/intel intel 7.x-1.x --squash
This command is where the magic happens. It adds the module at the path given by --prefix, where it behaves transparently like any other code in the main project repo. However, you can also push and pull that module’s code to and from the module’s own project using:
git subtree pull --prefix=[path to module] [remote] [branch] --squash
git subtree push --prefix=[path to module] [remote] [branch]
git subtree push --prefix=sites/all/modules/intel intel 7.x-1.x
Note the use of the --squash flag when adding and pulling. It folds the module’s incoming history into a single commit, so the module’s full commit history is not merged into the main project’s commit history. This is generally what you want for multipurpose modules. However, you may not want this option if your module is closely coupled with your main site, e.g. a module implementing custom features for that specific site.
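The whole Step 2 round trip can be sketched offline, again assuming a local bare repository in place of the module's drupal.org remote and assuming git subtree is available in your Git install. Paths, file contents and the commit identity are placeholders. The sketch adds the module as a squashed subtree, commits a change inside sites/all/modules/intel, pushes it back, and then pulls an upstream change. --squash appears only on add and pull, because git subtree documents it as valid for add, merge and pull.

```shell
#!/bin/sh
# Offline sketch of Step 2; a local bare repo stands in for the module's
# remote. Paths and the commit identity are placeholders.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no   # never open an editor for merge messages
work=$(mktemp -d)
cd "$work"

# Stand-in module project on branch 7.x-1.x, published as a bare repo.
git init -q intel-src
(
  cd intel-src
  git checkout -q -b 7.x-1.x
  echo "name = Intelligence" > intel.info
  git add intel.info
  git commit -q -m "Initial module commit"
)
git clone -q --bare intel-src intel.git

# Main project; subtree add needs an existing commit to merge into.
git init -q site
cd site
git commit -q --allow-empty -m "Drupal core"
git remote add -f intel "$work/intel.git"

# Add the module under sites/all/modules/intel as one squashed commit.
git subtree add --prefix=sites/all/modules/intel intel 7.x-1.x --squash

# Change the module inside the main project and commit as usual.
echo "core = 7.x" >> sites/all/modules/intel/intel.info
git commit -q -am "Declare core version"

# Send that change back to the module's own repository.
git subtree push --prefix=sites/all/modules/intel intel 7.x-1.x

# Simulate another maintainer committing to the module upstream.
git clone -q -b 7.x-1.x "$work/intel.git" "$work/upstream"
(
  cd "$work/upstream"
  echo "Intelligence module" > README.txt
  git add README.txt
  git commit -q -m "Add README"
  git push -q origin 7.x-1.x
)

# Pull the upstream change back into the subtree, squashed again.
git subtree pull --prefix=sites/all/modules/intel intel 7.x-1.x --squash
```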
Step 3: Splitting commits (optional)
The last step lets the module maintain a commit history that covers only changes to its own code. While the command is called split, it is really more like a filter. It creates a synthetic history for the module that includes only commits that contain changes to the module’s code.
The command to enable this functionality is:
git subtree split --prefix=[path to module] --annotate="(annotation prefix) " --branch [branch name]
git subtree split --prefix=sites/all/modules/intel --annotate="(split) " --branch intel
This command creates a “hanging” branch in your main project, one whose history is not connected to the main project’s history, where the separate synthetic history is kept. Now when you do a git subtree push to your module, any commit made to the main repo that includes changes to the module will be added to the module repo.
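A small offline sketch shows what split actually keeps. It builds a throwaway main repo with three commits, only two of which touch the module directory. File names, contents and commit messages are placeholders. After the split, the intel branch holds just those two commits, with paths relative to the module root and each message prefixed by the annotation.

```shell
#!/bin/sh
# Offline sketch of Step 3 (split). Files and the identity are placeholders.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/site"
cd "$work/site"

# Commit 1 touches both core and the module.
mkdir -p sites/all/modules/intel
echo "<?php // core" > index.php
echo "name = Intelligence" > sites/all/modules/intel/intel.info
git add .
git commit -q -m "Add core and module"

# Commit 2 touches core only, so split leaves it out.
echo "<?php // core, patched" > index.php
git commit -q -am "Patch core"

# Commit 3 touches the module only.
echo "core = 7.x" >> sites/all/modules/intel/intel.info
git commit -q -am "Update module"

# Build the synthetic module history on a new branch named "intel".
git subtree split --prefix=sites/all/modules/intel --annotate="(split) " --branch intel

# Show the filtered history: two commits, each prefixed with "(split) ".
git log --format=%s intel
```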
So far the subtree feature seems to be a significant step in the right direction compared with submodules, but there are a few pain points. One of the main problems I kept running into is that it was sometimes tricky to set up the split so that the history is initialized properly.
Another issue is that you have to be in the root of the main Git repo to push/pull the subtree modules, and it gets tedious always typing out the long paths to Drupal’s modules.
The original author of the subtree feature was working on two very handy features that seem to have been abandoned: a push/pull all command, which would be a big time saver, and a .gitsubtrees file that, somewhat like a Drush make file, would list all the modules managed as subtrees along with their pinned versions. Hopefully these features will get worked out, as they would be very helpful.
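Until something like that exists, a rough stand-in is a plain-text list of subtrees and a loop over it. The file name .gitsubtrees is borrowed from the abandoned proposal. The format here, one line per subtree with prefix, remote and branch, is invented for this sketch and is not the format that feature planned. The function name subtree_all is also hypothetical. The demo sets up one module offline, with a local bare repository as its remote, and then pushes every listed subtree.

```shell
#!/bin/sh
# Hypothetical "push/pull all": reads .gitsubtrees lines of the made-up form
#   <prefix> <remote> <branch>
# and runs git subtree for each one, stopping at the first failure.
subtree_all() (
  cmd=$1
  case $cmd in pull|push) ;; *) echo "usage: subtree_all pull|push" >&2; exit 2 ;; esac
  top=$(git rev-parse --show-toplevel) || exit 1
  cd "$top" || exit 1
  # Read the list on fd 3 so git commands cannot swallow the loop's input.
  while read -r prefix remote branch <&3; do
    case $prefix in ''|'#'*) continue ;; esac   # skip blank and comment lines
    if [ "$cmd" = pull ]; then
      git subtree pull --prefix="$prefix" "$remote" "$branch" --squash || exit 1
    else
      git subtree push --prefix="$prefix" "$remote" "$branch" || exit 1
    fi
  done 3< .gitsubtrees
)

# Demo: one module, with a local bare repo as its remote (placeholders).
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q intel-src
(
  cd intel-src
  git checkout -q -b 7.x-1.x
  echo "name = Intelligence" > intel.info
  git add intel.info
  git commit -q -m "Initial module commit"
)
git clone -q --bare intel-src intel.git
git init -q site
cd site
git commit -q --allow-empty -m "Drupal core"
git remote add -f intel "$work/intel.git"
git subtree add --prefix=sites/all/modules/intel intel 7.x-1.x --squash

# Record the subtree in the made-up manifest.
printf '%s\n' "sites/all/modules/intel intel 7.x-1.x" > .gitsubtrees
git add .gitsubtrees
git commit -q -m "List managed subtrees"

# Change the module, then push every listed subtree to its own repo.
echo "core = 7.x" >> sites/all/modules/intel/intel.info
git commit -q -am "Declare core version"
subtree_all push
```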
One last note: there is not a ton of documentation about the subtree command. To make things even more complex, there is a somewhat similar concept called the Git subtree merge strategy. While similar in purpose, the two techniques are different, which can make Googling for help a bit tricky.
*Cool tree image by sniffette.