If you are anything like me, you probably have a bunch of web project git repositories in a folder on your computer, each with its own copy of bootstrap set up as a git submodule. That is all well and good, but you might not know that each of those bootstrap copies takes up 104.8 MiB (as of writing)! If you manage lots of sites like I do, that easily adds up to a few GiB. This article goes through a method of limiting the space these copies take up, which also saves you some time when you set up new projects.
The depth method and why it’s bad
Before we go into the details of this method I would like to outline another method I’ve seen talked about around the net. This is the depth method.
This method uses the --depth option when checking out a submodule (using add or update), which makes git fetch only the number of revisions given by --depth, creating a shallow clone of the repository.
[code lang="sh"]
# add clones the submodule straight away, so --depth has to be passed here
git submodule add --depth 3 https://github.com/twbs/bootstrap.git bootstrap
[/code]
The above example only fetches the 3 latest revisions from bootstrap.
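If you want to see what a depth of 3 actually does without downloading bootstrap, here is a minimal sketch that builds a throwaway local repository and shallow clones it. The upstream repository, the demo identity and the temporary directory are all made up for illustration; the only real point is the --depth option.

```sh
set -e
tmp=$(mktemp -d)
cd "$tmp"

# A stand-in upstream repository with five (empty) commits
git init -q upstream
cd upstream
for i in 1 2 3 4 5; do
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "commit $i"
done
cd ..

# --depth is ignored for plain local paths, hence the file:// URL
git clone -q --depth 3 "file://$tmp/upstream" shallow

full_count=$(git -C upstream rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)
echo "upstream: $full_count commits, shallow clone: $shallow_count commits"
```

The upstream repository reports 5 commits, while the shallow clone only knows about the latest 3.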
This might seem like a great way to fix the issue when you first start a project, but if you were to apply the same solution to an older project it will most likely all fall apart. If your project uses an older version of bootstrap, its submodule references a revision older than the latest 3 revisions (or however many you chose), and updating the submodule results in a nasty error:
[code lang="sh"]
$ git submodule update --depth 3 bootstrap
Cloning into '/home/x/repo/bootstrap'...
error: no such remote ref 0b9c4a4007c44201dce9a6cc1a38407005c26c86
Fetched in submodule path 'bootstrap', but it did not contain 0b9c4a4007c44201dce9a6cc1a38407005c26c86. Direct fetching of that commit failed.
[/code]
This means that the depth method will only work if you intend on always keeping your submodules on the bleeding-edge, which in my case was not maintainable.
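The failure comes down to the pinned commit simply not existing in a shallow clone. A rough way to check this yourself, using a made-up local repository rather than bootstrap, is to ask git whether the commit object is present with git cat-file -e. Everything named here (upstream, shallow, the commit messages) is an assumption for the demo.

```sh
set -e
tmp=$(mktemp -d)
cd "$tmp"

git init -q upstream
cd upstream
# Demo-only helper: an empty commit with a throwaway identity
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}
commit "old release"
pinned=$(git rev-parse HEAD)   # the revision an older project would pin
commit "newer 1"
commit "newer 2"
commit "newer 3"
cd ..

# Only the latest 3 commits make it into the shallow clone
git clone -q --depth 3 "file://$tmp/upstream" shallow

# cat-file -e exits non-zero when the object is not in the repository
if git -C shallow cat-file -e "$pinned^{commit}" 2>/dev/null; then
    pinned_status=present
else
    pinned_status=missing
fi
echo "pinned commit $pinned is $pinned_status in the shallow clone"
```

The pinned commit sits four revisions back, so a depth of 3 never fetches it, which is the situation an older project ends up in.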
Enter the reference method
Since the depth method did not work out for me, I had to find an alternative. What I found was that you can make a submodule use a reference repository for its revisions. What this means is that you only have the submodule’s referenced revision checked out in your main repository. All other revisions are stored externally, which makes it possible for multiple repositories to use the same reference repository for their submodules, thus saving a fair bit of space. In my case the bootstrap submodule with the reference took up 11.6 MiB instead of 104.8 MiB, which is almost 90% smaller.
Another benefit of using a reference repository is that you don’t have to download the whole repository each time you start a new project. That is usually not much of an issue thanks to my work’s fancy internet connection, but connection speeds do drop sometimes, and living in Australia makes that far too common.
To set up bootstrap as a referenced submodule you first need to clone bootstrap into a common location outside of your project.
[code lang="sh"]
mkdir reference-repos
# clone into the new directory, not next to it
git clone https://github.com/twbs/bootstrap.git reference-repos/bootstrap
[/code]
Then, if you are setting up the reference in a new project, you add bootstrap as a submodule as usual, with one addition. Now here comes the magic: pass the --reference option set to the reference repository’s path. Because git submodule add clones the submodule straight away, the option has to go on the add command; a later git submodule update --reference does nothing for a submodule that has already been cloned.
[code lang="sh"]
git submodule add --reference ../reference-repos/bootstrap https://github.com/twbs/bootstrap.git bootstrap
[/code]
And that’s it. The submodule is now using the reference repository as the source for its revisions.
Now you can update any other submodules as normal.
[code lang="sh"]
git submodule update
[/code]
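If you would like to try the whole flow without touching the network, the sketch below replays it with local stand-ins. The names upstream and lib and the g helper are assumptions for the demo, and protocol.file.allow=always is only there because newer git versions refuse local file:// submodules by default. It passes --reference to git submodule add and then prints the alternates file, which is where git records the reference repository.

```sh
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Demo-only wrapper: throwaway identity, and allow local file:// submodules
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

# A tiny stand-in for the real upstream repository
g init -q upstream
echo "demo" > upstream/README
g -C upstream add README
g -C upstream commit -q -m "initial commit"

# The shared reference clone, kept outside of the project
mkdir reference-repos
g clone -q "file://$tmp/upstream" reference-repos/lib

# A new project whose submodule borrows objects from the reference clone
g init -q project
cd project
g commit -q --allow-empty -m "start project"
g submodule --quiet add --reference "$tmp/reference-repos/lib" \
    "file://$tmp/upstream" lib

# git records the reference repository in the submodule's alternates file
alternates=$(cat .git/modules/lib/objects/info/alternates)
echo "submodule borrows objects from: $alternates"
```

The printed path points into reference-repos/lib, which is a quick way to confirm that a submodule really is using the reference.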
Changing existing submodules to use reference method
It is more likely that you already have a bunch of repositories with bootstrap set up as a submodule the normal way. So you need to know how to convert them over to use the reference method.
First you need to deinit the bootstrap submodule, which clears the bootstrap directory.
[code lang="sh"]
git submodule deinit bootstrap/
[/code]
Then you need to remove any cached repository data for the submodule from your current repository. Each submodule in your repository has its revision history and other repository data stored in the .git/modules directory. So you need to remove the bootstrap one from there, otherwise the submodule update command will just use that data again instead of the reference.
[code lang="sh"]
rm -rf .git/modules/bootstrap/
[/code]
And now you can init and update the submodules again, remembering to update the bootstrap one separately with the --reference option.
[code lang="sh"]
git submodule init
git submodule update --reference ../reference-repos/bootstrap/ bootstrap/
git submodule update
[/code]
And there you have it. Your existing bootstrap submodule is now referenced as well.
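Since I have to do this for a lot of repositories, the three steps can be wrapped in a small function. This is only a sketch under a few assumptions: convert_to_reference and g are names I made up, the submodule’s name is assumed to match its path (the default), and the demo below runs against local stand-in repositories rather than bootstrap. In a real project you would call plain git instead of the g wrapper.

```sh
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Demo-only wrapper: throwaway identity, and allow local file:// submodules
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

# Convert one submodule; run from the top of the project.
# $1 = submodule path (assumed equal to its name), $2 = reference repository
convert_to_reference() {
    g submodule --quiet deinit -- "$1"
    rm -rf ".git/modules/$1"
    g submodule --quiet update --init --reference "$2" -- "$1"
}

# Stand-in upstream, reference clone and a project set up the normal way
g init -q upstream
echo "demo" > upstream/README
g -C upstream add README
g -C upstream commit -q -m "initial commit"
mkdir reference-repos
g clone -q "file://$tmp/upstream" reference-repos/lib
g init -q project
cd project
g submodule --quiet add "file://$tmp/upstream" lib
g commit -q -m "add lib as a normal submodule"

convert_to_reference lib "$tmp/reference-repos/lib"

# count-objects -v prints an "alternate:" line for each borrowed object store
report=$(g -C lib count-objects -v)
echo "$report"
```

After the conversion the report should include an alternate line pointing at the reference repository.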
Downsides with the method
When researching this, I happened upon an article which referred to this method as harmful. The reason is that the reference is stored as an absolute path in a file in the submodule’s repository data under .git/modules (the objects/info/alternates file). This means that if you ever back up your repositories to a place where the reference repository is not present, or if you ever remove the reference repository, you will not be able to use the submodule.
To fix this, you would need to deinit the submodule and then set up the reference again, or just set the submodule up like normal.
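Before moving or restoring a backup it can help to know which submodules depend on a reference that has gone missing. The function below is a rough sketch of such a check; check_reference is a made-up name, it assumes the alternates file holds absolute paths, and the demo fakes the directory layout instead of using real repositories.

```sh
set -e

# Report whether a submodule's reference repository is still reachable.
# $1 = submodule name; run from the top of the project.
check_reference() {
    alt_file=".git/modules/$1/objects/info/alternates"
    if [ ! -f "$alt_file" ]; then
        echo "standalone"
        return 0
    fi
    while IFS= read -r dir; do
        # Skip blank lines and comment lines
        case "$dir" in ''|'#'*) continue ;; esac
        if [ ! -d "$dir" ]; then
            echo "broken: $dir"
            return 0
        fi
    done < "$alt_file"
    echo "ok"
}

# Demo with a faked-up directory layout, no real repositories involved
tmp=$(mktemp -d)
mkdir -p "$tmp/project/.git/modules/bootstrap/objects/info"
mkdir -p "$tmp/reference-repos/bootstrap/.git/objects"
echo "$tmp/reference-repos/bootstrap/.git/objects" \
    > "$tmp/project/.git/modules/bootstrap/objects/info/alternates"
cd "$tmp/project"

before=$(check_reference bootstrap)
rm -rf "$tmp/reference-repos"      # simulate losing the reference repository
after=$(check_reference bootstrap)
echo "before: $before"
echo "after: $after"
```

A result of broken means the submodule needs the deinit and setup treatment described above.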
For me this is not a huge issue, because I keep all my repositories in one directory, including the reference repositories, and I always back up that entire directory. Yes, the place I back it up to will most likely not have the same directory structure, but if I ever need to restore stuff it should all work. Also, I am fully aware that my bootstrap submodules are set up using a reference, so I will never remove the reference repository, at least not on purpose anyway.