When Microsoft acquired GitHub, a lot of open-source GitHub users weren't happy. At least 100,000 of them were upset enough to move to a leading GitHub rival, GitLab. Now, GitLab is moving its code repositories from Microsoft Azure to Google Cloud Platform (GCP).
Andrew Newdigate, GitLab's Google Cloud Platform Migration Project Lead, explained GitLab was making the move to improve the service's performance and reliability.
Specifically, the company is making the move because it believes Kubernetes is the future. In GitLab's words, Kubernetes "makes reliability at massive scale possible." GCP was the natural choice because GitLab wants to run on Kubernetes. After all, Google invented Kubernetes, and, in GitLab's view, Google Kubernetes Engine (GKE) has the most robust and mature Kubernetes support.
Once the migration has taken place, GitLab will focus on "bumping up the stability and scalability of GitLab.com, by moving our worker fleet across to Kubernetes using GKE. This move will leverage our Cloud Native charts, which with GitLab 11.0 are now in beta."
To make this happen, GitLab will use its Geo product. Geo enables users to create full, read-only mirrors of GitLab instances. Geo instances can also be used for cloning, fetching projects, and, in this case, migrating GitLab projects.
GitLab is not making this move to distance itself from Microsoft. GitLab was already working on this before Microsoft bought GitHub.
Long before that deal was finalized, Newdigate wrote, "we have maintained a Geo secondary site of GitLab.com, called gprd.gitlab.com, running on Google Cloud Platform. This secondary keeps an up-to-date synchronized copy of about 200TB of Git data and 2TB of relational data in PostgreSQL. Originally we also replicated Git LFS, File Uploads and other files, but this has since been migrated to Google Cloud Storage object storage, in a parallel effort."
For logistical reasons, GitLab is using GCP's us-east1 site in South Carolina. Its current Azure datacenter is in US East 2, in Virginia. This is a round-trip distance of 800 km, or 3 light-milliseconds. In internet practice, this translates into a 30ms ping time between the two sites.
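The "3 light-milliseconds" figure can be checked with a little arithmetic. The short Python sketch below is only an illustration: the function and variable names are made up for this example, and the only inputs taken from the article are the 800 km round trip and the 30ms ping. It works out how long light needs to cover that distance in a vacuum and how much slower the real-world ping is by comparison.

```python
# Speed of light in a vacuum, in kilometers per second.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458


def light_time_ms(distance_km: float) -> float:
    """Return the time, in milliseconds, light needs to travel distance_km in a vacuum."""
    return distance_km / SPEED_OF_LIGHT_KM_PER_S * 1000.0


# Figures quoted in the article.
round_trip_km = 800.0
measured_ping_ms = 30.0

ideal_ms = light_time_ms(round_trip_km)
overhead_factor = measured_ping_ms / ideal_ms

print(f"Ideal round trip: {ideal_ms:.2f} ms")  # Ideal round trip: 2.67 ms
print(f"Measured ping is about {overhead_factor:.0f}x the ideal")  # about 11x
```

The ideal figure rounds to the 3 milliseconds cited above. The measured ping is roughly 11 times longer, which is typical of real networks: signals travel more slowly in fiber than in a vacuum, cables rarely follow a straight line, and every router along the way adds delay.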
Newdigate continued, "Because of the huge amount of data we need to synchronize between Azure and GCP, we were initially concerned about this additional latency and the risk it might have on our Geo transfer. However, after our initial testing, we realized that network latency and bandwidth were not bottlenecks in the transfer."
Simultaneously, GitLab is migrating all file artifacts to Google Cloud Storage (GCS), Google's managed object storage implementation. That's about 200TB of data.
Until recently, GitLab stored these files on Network File System (NFS) servers. An NFS server is a single point of failure and can be difficult to scale. By switching to GCS, GitLab can take advantage of GCS's built-in redundancy and multi-region capabilities. This, in turn, will help improve GitLab's availability and eliminate single points of failure. The switch is part of a longer-term strategy of leaving NFS behind.
The Gitaly project, a GitLab Git RPC service, is part of the same initiative. This effort to migrate GitLab.com off NFS is also a prerequisite for the plans to move GitLab to Kubernetes.
According to Newdigate, "Our absolute top priority for the failover is to ensure that we protect the integrity of our users' data. We will only conduct the failover once we are completely satisfied that all serious issues have been ironed out, that there is no risk of data loss, and that our new environment on Google Cloud Platform is ready for production workloads."
If all goes well -- and so far it has -- GitLab will be making the move on Saturday, July 28, 2018.