Sharing large repositories with your team

Hey, there! Here at Plataformatec we like to do project rotations. It means that every three months or so, developers can swap projects. It has lots of benefits like working with different people, getting out of the comfort zone, sharing skills and knowledge, and the best one: a new developer can spot problems that people working for a longer time in the project may not see, since they are used to it.

But, it comes at a cost. Each project has its own setup process and may slow down the development. We’re using Boxen from GitHub to solve this problem. It works very well and allows us to have a project environment quickly set up.

But recently we have run into a problem that Boxen couldn’t solve. We had a project which has multiple repositories and some of them are too large. It would take some time just to git clone their > 3GB size repos.

Our first thought was creating a tar file with gzip or lzma compression. The problem with it would be when extracting, since file ownership and permissions on it could be a problem just like symlinks. So, the solution was to git clone the smallest repos and git bundle the larger ones. Git bundle is shipped with git, but only a few people know about.

The workflow we have is simple. Someone with the repo already cloned and updated to the origin, type the following command:

$ git bundle create .bundle master

It will create a file called .bundle with a compressed version of the repository containing the master history. So, this file can be shared via a flash drive, AirDrop or netcat (using the internal network) among developers, and it will take way less time than cloning it.

Now that you have the file in hands, it requires two steps to work properly. The first one is extracting the bundle into a cloned repository. This can be achieved by:

$ git clone .bundle -b master

In case you want to clone into a different path other than in the current working directory, you can pass the path after the -b master option. As you can see, the repo is cloned from a bundle file just like it could be an URL or some bare git repo in your machine. That said, just take a look at your origin remote.

$ git remote show origin

* remote origin
  Fetch URL: /path/to/.bundle
  Push  URL: /path/to/.bundle
…

It is pointing to the bundle filename, so every time you fetch or push, it will try to do so in this bundle file. To fix that, we go to the second step which is setting the proper remote URL.

$ git remote set-url origin 

That’s it. You’re ready to go. If you have multiple repositories to share, you can create a script to automate the cloning and the url setting for the origin. You can share all the repos with this script, for faster and easier setups.

Example

The Ruby language repository size is about 200MB. It is not big enough to require a bundle, but just as an example I guess it would be a nice fit.

The first step is cloning the repo:

$ time git clone https://github.com/ruby/ruby.git
Cloning into 'ruby'...
remote: Finding bitmap roots...
remote: Reusing existing pack: 269821, done.
remote: Counting objects: 2813, done.
remote: Compressing objects: 100% (1403/1403), done.
remote: Total 272634 (delta 1707), reused 2213 (delta 1390)
Receiving objects: 100% (272634/272634), 136.77 MiB | 1.20 MiB/s, done.
Resolving deltas: 100% (210263/210263), done.
Checking connectivity... done
Checking out files: 100% (4187/4187), done.

real	4m51.501s
user	1m38.837s
sys	0m18.808s

As you can see, it takes almost five minutes to clone the full repository – the time may vary depending on your bandwidth. So, now we’re gonna create a bundle file and then clone a new repo from it:

$ cd ruby/
# Just a reminder, the main branch of ruby repo is not master, it's trunk.
$ git bundle create /tmp/ruby.bundle trunk

Now that we’ve created a bundle file and placed it in /tmp, we just need to clone it:

$ cd /tmp/
$ time git clone /tmp/ruby.bundle -b trunk ruby
Cloning into 'ruby'...
Receiving objects: 100% (206590/206590), 84.16 MiB | 22.43 MiB/s, done.
Resolving deltas: 100% (158583/158583), done.
Checking connectivity... done
Checking out files: 100% (4187/4187), done.

real	0m46.490s
user	1m9.339s
sys	0m10.021s

Cloning from a bundle file was much faster and has not taken a minute. Now, in order to pull and fetch changes, you need to set the remote URL:

$ git remote set-url origin https://github.com/ruby/ruby.git

Enjoyed this post? Was it as useful for you as for us? Tell us your stories on the comments below! See you!

4 responses to “Sharing large repositories with your team”

  1. Petteri Räty says:

    For comparison how long does a local git clone take? If it’s faster then the bundle packing / unpacking then you could just clone locally from another developer machine…

  2. Gustavo Dutra says:

    Hi, Petteri!

    It is faster cloning from a developer machine over intranet then over internet. Although, as git bundle does not require internet connection, cloning from a bundle package is faster then cloning from another developer repository. Also, it does not require any developer to be around when setting up a new project environment.

  3. Petteri Räty says:

    You still need to get the bundle copied from somewhere so you can’t compare just the clone with a clone over the internet. If you keep a bundle around, (to avoid needing an other developer) in for example local network somewhere, you might as well have a clone there.

  4. Gustavo Dutra says:

    Actually, Patteri, we have a flash drive available for that. So we can clone directly from the flash drive to someone’s computer (there is a script that do all the dirty work inside this flash drive).