Hey, there! Here at Plataformatec we like to do project rotations: every three months or so, developers can swap projects. This has lots of benefits, like working with different people, getting out of the comfort zone, and sharing skills and knowledge. The best one: a new developer can spot problems that people who have been on the project longer may no longer see, since they are used to them.
But it comes at a cost. Each project has its own setup process, which may slow down development. We’re using Boxen from GitHub to solve this problem. It works very well and lets us set up a project environment quickly.
But recently we ran into a problem that Boxen couldn’t solve. We had a project with multiple repositories, some of them very large: it would take a long time just to git clone the ones over 3GB.
Our first thought was to create a tar file with gzip or lzma compression. The problem would come when extracting it, since file ownership, permissions, and symlinks could all cause trouble. So the solution was to git clone the smallest repos and git bundle the larger ones. Git bundle ships with git, but only a few people know about it.
The workflow we have is simple. Someone who already has the repo cloned and up to date with origin types the following command:
$ git bundle create .bundle master
It will create a file called
Now that you have the file in hand, it takes two steps to get a working repository. The first one is extracting the bundle into a cloned repository. This can be achieved with:
$ git clone .bundle -b master
In case you want to clone into a different path other than
$ git remote show origin
* remote origin
Fetch URL: /path/to/.bundle
Push URL: /path/to/.bundle
…
It is pointing to the bundle file, so every time you fetch or push, git will try to do so against this bundle file. To fix that, we go to the second step, which is setting the proper remote URL.
$ git remote set-url origin
That’s it. You’re ready to go. If you have multiple repositories to share, you can create a script to automate the cloning and the URL setting for origin. You can share all the repos together with this script, for faster and easier setups.
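A minimal sketch of what such a script could look like, not the one we actually use. The function name `clone_from_bundle`, the `demo` paths, and the `https://example.com/demo.git` URL are all hypothetical. To keep the sketch runnable offline, it builds a throwaway repository and bundles it first; in practice you would call the function once per bundle file you were given.

```shell
#!/bin/sh
# Sketch: clone a repository from a bundle file, then point origin at
# the real remote. All names and URLs below are hypothetical.
set -eu

# clone_from_bundle BUNDLE BRANCH TARGET ORIGIN_URL
clone_from_bundle() {
    bundle=$1
    branch=$2
    target=$3
    origin_url=$4

    # Step 1: extract the bundle into a regular working clone.
    git clone -q -b "$branch" "$bundle" "$target"
    # Step 2: replace the bundle path with the real remote URL.
    git -C "$target" remote set-url origin "$origin_url"
}

# --- Demo with a throwaway repository, so the sketch runs offline ---
workdir=$(mktemp -d)

# Build a tiny source repository with one commit on master.
git init -q "$workdir/source"
git -C "$workdir/source" symbolic-ref HEAD refs/heads/master
echo "hello" > "$workdir/source/README"
git -C "$workdir/source" add README
git -C "$workdir/source" -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q -m "Initial commit"

# Pack master into a bundle, as the developer sharing the repo would.
git -C "$workdir/source" bundle create "$workdir/demo.bundle" master

# Clone from the bundle and fix the remote in one call.
clone_from_bundle "$workdir/demo.bundle" master "$workdir/demo" \
    "https://example.com/demo.git"

# Print the URL origin now points at.
git -C "$workdir/demo" config --get remote.origin.url
```

For several repositories, the same function can be called once per bundle, each with its own branch, target directory, and remote URL. The temporary directory is left in place so you can inspect the result.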
Example
The Ruby language repository is about 200MB. That is not big enough to require a bundle, but it is a nice fit for an example.
The first step is cloning the repo:
$ time git clone https://github.com/ruby/ruby.git
Cloning into 'ruby'...
remote: Finding bitmap roots...
remote: Reusing existing pack: 269821, done.
remote: Counting objects: 2813, done.
remote: Compressing objects: 100% (1403/1403), done.
remote: Total 272634 (delta 1707), reused 2213 (delta 1390)
Receiving objects: 100% (272634/272634), 136.77 MiB | 1.20 MiB/s, done.
Resolving deltas: 100% (210263/210263), done.
Checking connectivity... done
Checking out files: 100% (4187/4187), done.
real 4m51.501s
user 1m38.837s
sys 0m18.808s
As you can see, it takes almost five minutes to clone the full repository – the time may vary depending on your bandwidth. So, now we’re gonna create a bundle file and then clone a new repo from it:
$ cd ruby/
# Just a reminder, the main branch of ruby repo is not master, it's trunk.
$ git bundle create /tmp/ruby.bundle trunk
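Before handing a bundle to someone else, it can be worth checking it. The sketch below uses `git bundle verify` and `git bundle list-heads`, both of which ship with git. It is not part of our original workflow, and it runs against a throwaway repository (an assumption made so the commands work offline) rather than the Ruby repo; the `demo.bundle` name is hypothetical.

```shell
#!/bin/sh
# Sketch: inspect a bundle before sharing it. The repository below is a
# throwaway stand-in so the commands run offline; it is not the Ruby repo.
set -eu

workdir=$(mktemp -d)

# Tiny repository with a single commit on a branch named trunk.
git init -q "$workdir/source"
git -C "$workdir/source" symbolic-ref HEAD refs/heads/trunk
echo "hello" > "$workdir/source/README"
git -C "$workdir/source" add README
git -C "$workdir/source" -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q -m "Initial commit"

git -C "$workdir/source" bundle create "$workdir/demo.bundle" trunk

# verify exits non-zero if the bundle is corrupt or depends on commits
# that the current repository does not have.
git -C "$workdir/source" bundle verify "$workdir/demo.bundle"

# list-heads prints the refs packed into the bundle.
heads=$(git -C "$workdir/source" bundle list-heads "$workdir/demo.bundle")
echo "$heads"
```

If the branch you expect to pass to `git clone -b` does not show up in the `list-heads` output, the bundle was created from a different ref.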
Now that we’ve created a bundle file and placed it in /tmp, we just need to clone it:
$ cd /tmp/
$ time git clone /tmp/ruby.bundle -b trunk ruby
Cloning into 'ruby'...
Receiving objects: 100% (206590/206590), 84.16 MiB | 22.43 MiB/s, done.
Resolving deltas: 100% (158583/158583), done.
Checking connectivity... done
Checking out files: 100% (4187/4187), done.
real 0m46.490s
user 1m9.339s
sys 0m10.021s
Cloning from the bundle file was much faster: it took less than a minute. Now, in order to fetch and pull changes, you need to set the remote URL:
$ git remote set-url origin https://github.com/ruby/ruby.git
Enjoyed this post? Was it as useful for you as for us? Tell us your stories in the comments below! See you!
For comparison, how long does a local git clone take? If it’s faster than the bundle packing / unpacking, then you could just clone locally from another developer’s machine…
Hi, Petteri!
Cloning from a developer’s machine over the intranet is faster than cloning over the internet. However, since git bundle does not require an internet connection, cloning from a bundle file is faster than cloning from another developer’s repository. Also, it does not require any developer to be around when setting up a new project environment.
You still need to get the bundle copied from somewhere, so you can’t compare just the clone with a clone over the internet. If you keep a bundle around somewhere (to avoid needing another developer), for example on the local network, you might as well have a clone there.
Actually, Petteri, we have a flash drive available for that. So we can clone directly from the flash drive to someone’s computer (there is a script on the flash drive that does all the dirty work).