Tapptic - Highlights of the Git Merge Conference

What did you miss?

On February 2nd and 3rd, the Git Merge Conference 2017 was held at The Egg in Brussels. The first day unfortunately sold out ridiculously fast, so it was on the second day that Tapptic could step in, check it out and come back to you with this summary. Cédric Goffoy and Sofiane Hassaini attended and will try to make you relive it as if you were there. A dozen speakers covered a variety of topics. Not all of them were relevant to us, so to spare you a 3,000-word article, we’ll focus on what we think could be useful to us in the future.

Top Ten Worst Repositories to host on GitHub

Carlos Martin Nieto from GitHub was there to present 10 of the worst repositories they have had to handle, and how their tools and approaches evolved to cope with the most unexpected use cases. Interestingly, in most cases the issue was educational: once users were taught proper ways of working, the problems disappeared.

In short, many of the issues came down to the sheer quantity of data and the way clients tried to fetch it massively and regularly, instead of splitting the work, organising who can access what and setting up a clean framework.

Even if Tapptic projects are currently nowhere near such critical file sizes or file counts, we can avoid this problem in the future by adopting, or continuing to apply, recognised good practices: smaller nested directories, pull requests, efficient fetches and load spreading.

The next talk was directly linked to these load issues. Saeed Noursalehi from Microsoft came to talk about the out-of-this-world size of their repos (270 GB for Windows, for example).

Now factor in the number of people who need to access and modify these repos: when you reach 35,000 users, what can possibly go wrong? The answer is simple. You want to clone? Come back in 12 hours. Git status? Get a coffee, you’ve got 8 minutes to kill. You want to check out? Congratulations, you get your morning off.

Microsoft’s answer was to build its own layer on top of Git, in a seamless way so that Git users would still feel at home. This layer is GVFS (Git Virtual File System), a virtualized file system that cuts down the load. It is still in development, though you can already check it out. GVFS downloads only the files that are needed at a given moment, which makes commands such as git checkout or git status faster. It has been announced as open source, and you can find blog articles about it from people more qualified than us to go deeper into the subject.

Link to the GVFS GitHub repository: https://github.com/Microsoft/GVFS

What’s wrong with Git?

Santiago Perez De Rosso from MIT came next to explain how they have tried to fix recurring design issues with Git. After thorough research, 3 major issues were identified as being encountered by most (novice) users, with no clear and clean process to handle them:

Switching branches in a non-clean state
Fixing a detached HEAD
Handling untracked files

It would take too long to tackle each point, because that is a two-hour lecture on its own and Mr Perez De Rosso has already done a splendid job of it: http://people.csail.mit.edu/sperezde/onward13.pdf

This PDF lays out the philosophy behind the tool they developed to overcome Git’s conceptual misfits. The tool is Gitless and, as you have guessed, you can read more about it and even download it and try it out. Gitless could be an easier way for novices to learn Git, or a tool for daily use. Remember that Gitless is built on top of Git, so you can always fall back to Git, and you can use it seamlessly even if your collaborators use classic Git.

The Gitless documentation is available here: http://www.gitless.com

Scaling Mercurial at Facebook: Insights from the Other Side

Durham Goode from Facebook has seen his company deal with scaling issues of unmatched proportions. Though he could not give actual numbers, he is confident that the Facebook monorepo is the biggest out there, and it comes with a wide range of challenges.

Facebook uses Mercurial, but the lessons about scaling projects still apply. When the repo size and the number of users increase exponentially and you have not planned ahead, you are in for your share of trouble.

The Facebook approach is the following:

A monorepo;
No feature branches;
Rebase, no merge;
A single commit per push;
Code review on each commit.

No more complex branching: every time a user wants to push a commit to master, they enter a secure review process (both human and automated), ensuring that when the push is finally accepted and goes through, everything will be OK. Every action taken on the commit is logged.

Better still, some checks are triggered at the commit stage, helping the user avoid the most common mistakes when committing their work and making the time spent on human review further down the line more valuable.

If you want a more technical and precise account, see the associated blog post written by Durham Goode himself: Scaling Mercurial at Facebook.

Git Aliases of the Gods!

We didn’t see the time pass and, as we came back from the last break of the day, a lighter kind of talk was about to start: Git and aliases.

Even though many Git commands are one-liners, you may be confronted with more complex commands, and Tim Pettersen from Atlassian is confident that aliases are the best way to deal with them. The simple example he gave was the stash command. If you didn’t know, there are 4 variants, each covering more or fewer kinds of files (a stash that leaves the index in place, the classic stash, a stash including untracked files and a stash of everything).

Instead of writing “git stash --include-untracked” for example, he aliased the commands to reflect the amount of data he wants to include, from the least to the most: git stsh, git stash, git staash and git staaash. It’s a silly example, but you get the philosophy behind it.

On top of that, you can share your aliases through a config file, which gives you a portable system for your team to use.
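To make the stash aliases concrete, here is a minimal sketch of how they could be written into a shareable config file. The exact flags behind each alias are our assumption about what was shown in the talk, and ALIAS_FILE is just a hypothetical temporary path used for illustration. Plain git stash needs no alias, so only the three variants are defined.

```shell
# Hypothetical location for the shared alias file (a temp file in this sketch).
ALIAS_FILE="$(mktemp)"

# From the least to the most inclusive variant.
# Assumption: stsh keeps staged changes in the index while stashing.
git config --file "$ALIAS_FILE" alias.stsh 'stash --keep-index'
# staash also stashes untracked files.
git config --file "$ALIAS_FILE" alias.staash 'stash --include-untracked'
# staaash stashes everything, ignored files included.
git config --file "$ALIAS_FILE" alias.staaash 'stash --all'

# List the aliases stored in the file.
git config --file "$ALIAS_FILE" --get-regexp '^alias\.'
```

A teammate could then load the file from their own configuration with something like git config --global include.path followed by the path where the team keeps it.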

Git for Pair Programming

The last meaningful talk of the day was about pair programming and how Git currently fails to handle this work style. For those who may not know what pair programming is, imagine 2 developers and one computer (nothing weird, I promise). You can have 2 keyboards and 2 mice, or just share one set.

The idea is to have a driver and a copilot, and it relies mostly on communication. Developer A handles the keyboard while Developer B reviews the code in real time, commenting and offering alternative solutions as they come up.

After a while, the two developers switch roles, and so on throughout the day. There are immense advantages to working this way:

Real-time reviewing, which reduces the number of bugs and the time spent tracking them, and thus the QA workload;
Knowledge sharing, which lets a junior profile learn by watching a more experienced developer in action, or lets two experienced developers share their different knowledge and experience;
Team building, since communicating a lot helps team members bond;
Time saving, which seems counter-intuitive since 2 people are seemingly doing the work of one, but in practice it has been shown that reducing bugs and trial-and-error code gets the job done quicker and cleaner.

To promote this work style, Cornelius Schumacher was there to give us a sneak peek at a feature he is developing, which lets you co-sign commits and easily identify who was the driver and who was the copilot for a specific commit. It is still a WIP and, if you are interested, you can join him and contribute to the project.

This tool is called Git Duet and lets you specify a team inside a Git config file. After that, for each commit you can select the authors who worked on it. The GitHub repo is here: https://github.com/git-duet/git-duet

Keep in mind that this is a WIP, so feel free to give feedback, open pull requests, etc.
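To illustrate the idea of recording both people on a commit, here is a rough sketch using only plain Git, not Git Duet’s own commands. It assumes the driver is stored as the commit author and the copilot as the committer, which is one possible convention. The names, emails, commit message and REPO_DIR are all placeholders.

```shell
# Throwaway repository so the sketch does not touch a real project.
REPO_DIR="$(mktemp -d)"
git init --quiet "$REPO_DIR"

# Assumption: driver = author, copilot = committer (placeholder identities).
GIT_AUTHOR_NAME="Dev A" GIT_AUTHOR_EMAIL="dev.a@example.com" \
GIT_COMMITTER_NAME="Dev B" GIT_COMMITTER_EMAIL="dev.b@example.com" \
git -C "$REPO_DIR" commit --quiet --allow-empty -m "Pair on the login screen"

# Show who drove and who reviewed the last commit.
git -C "$REPO_DIR" log -1 --format='driver: %an, copilot: %cn'
```

Setting these variables by hand for every commit quickly gets tedious, which is exactly the kind of friction a dedicated tool is meant to remove.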

Sofiane Hassaini

Cédric Goffoy

Android developers @ Tapptic