On August 25th, 1991, Finnish grad student Linus Torvalds told the Minix Usenet newsgroup that he was starting work on his new free operating system, which would be "just a hobby, won't be big and professional like gnu for 386(486) AT clones." Twenty-nine years later, Linux rules the computing world. In the 2020 Linux Kernel History Report, The Linux Foundation tells the story of the kernel from its first days to August 2020's Linux kernel 5.8 release.
While the Foundation has issued several Linux kernel history reports before, this one is unique. That's because, thanks to the work of Dr. Daniel German and his cregit tool, it's now possible to track all three of the kernel's different development stages: Pre-version control, September 1991 until February 4, 2002; BitKeeper, February 4, 2002 to April 15, 2005; and git, April 16, 2005 to today. Cregit enables developers and researchers to track who's responsible for significant source code changes.
If you're new to Linux, you may not know that version control was a hot-button issue in the 2000s. For over a decade, Linux had no version control system (VCS) at all. You'd post your patch to the mailing list, and if Torvalds accepted it he'd apply it to his own source tree and then post a new release of the whole tree.
There were VCSs available, such as Concurrent Versions System (CVS) and Subversion, but Torvalds didn't like any of them. Thanks to community pressure, however, Torvalds finally picked one: BitKeeper.
This was, to put it mildly, not a popular choice. BitKeeper was a closed-source commercial system. Torvalds argued that free software was all well and good, but what he needed was the best possible VCS and BitKeeper was it.
For years, heated discussions continued. Eventually, lead Samba developer Andrew Tridgell reverse-engineered BitKeeper's networking protocols to create an open-source, BitKeeper-compatible VCS. BitKeeper's creator, Larry McVoy, had warned that he would stop letting Linux developers use his program if someone did this, and he followed through on that warning.
That left Linux without a VCS. In response, Torvalds made his own: Git. He didn't want to. He would say later, "I really never wanted to do source control management at all and felt that it was just about the least interesting thing in the computing world."
Boring or not, after ten days of work, Torvalds finished git. Today, git is easily open-source's most popular VCS and the foundation of such programming sites as GitHub and GitLab.
Getting back to the code: in the beginning, the linux-0.01.tar.Z kernel, the operating system that would become known as Linux, was only 88 files and 10,239 lines of code, and it ran on a single hardware architecture, i386. There have been some changes since then. Today, the v5.8 kernel consists of 69,325 files and 28,442,673 lines of code, and it runs on over 30 major hardware architectures. Torvalds himself said, "5.8 looks to be one of our biggest releases of all time."
Some of the code from that first day lives on in today's Linux. The vsprintf routine, which formats output into a string buffer, is still in the code. This piece, Torvalds said, "was co-written with Lars Wirzenius." This made Wirzenius, who was Torvalds's friend at university, the first Linux collaborating developer.
The MAINTAINERS file, which lists the programmers primarily developing and maintaining the kernel, wouldn't make its first appearance until January 1996's v1.3.68 release. Then, there were only three maintainers: Alan Cox, Jon Naylor, and Linus Torvalds. Fast forward to 2020 and, with 5.8, there are 1,501 maintainers.
Of course, to really dig deep into the Linux kernel's history you need to look at the early Linux development mailing lists. Linux has always been discussed and designed on mailing lists. "Unfortunately," the report notes, "only partial records of the discussions are publicly available before 1997, as Linux development took place across multiple mailing lists and USENET groups."
Even the Linux Kernel Mailing List (LKML), which was hosted on vger.rutgers.edu in its early days, was only one of several important lists where work happened in the '90s. USENET groups were also important through the '90s. One archive source, Indiana University's Linux kernel archives, contains LKML archives going back to 1995, but it has key gaps. If you have access to the early Linux discussion threads, The Linux Foundation would love for you to contribute them to the Linux Kernel Archives administrators, who can add the missing threads to the kernel archives.
One thing is certainly clear over the kernel's history: Change keeps coming faster and faster. From 2005 to 2008, there was an average of 2 commits per hour. By 2019, it was 9.4 commits per hour. With the latest 5.8 kernel, the average was 10.7 commits per hour. Can you say fast? I knew you could.
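To make those rates concrete, here is a minimal Python sketch of the arithmetic behind a "commits per hour" figure. It is not how the report computed its numbers; the function name `commits_per_hour` and the 1,000-commit, 100-hour window are hypothetical, chosen only for illustration. The three rates in `REPORTED_RATES` are the ones quoted above.

```python
from datetime import datetime


def commits_per_hour(commit_count: int, start: datetime, end: datetime) -> float:
    """Average number of commits per hour between start and end."""
    hours = (end - start).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("end must be after start")
    return commit_count / hours


# Hypothetical window (not from the report): 1,000 commits in 100 hours.
start = datetime(2020, 1, 1, 0, 0)
end = datetime(2020, 1, 5, 4, 0)  # 4 days and 4 hours later = 100 hours
print(f"{commits_per_hour(1000, start, end):.1f} commits per hour")

# Rates quoted in the article, converted to commits per day.
REPORTED_RATES = {"2005-2008": 2.0, "2019": 9.4, "5.8 cycle": 10.7}
for label, rate in REPORTED_RATES.items():
    print(f"{label}: about {rate * 24:.0f} commits per day")
```

Put another way, 10.7 commits per hour works out to roughly 257 commits landing every day, around the clock.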
Despite that fast pace, the Linux release cycle has become quite predictable. Each release cycle starts with a two-week "merge window." This is when new functionality is added to the git repository for the next release. Once this release is tagged rc1, the integration testing, debugging, and stabilizing cycle begins. Multiple release candidates are tagged until Linus and his lead maintainers think the code is good and stable enough for a release. After that, the cycle begins again with the next merge window.
The rumor persists to this day that Linux, and other open-source software, is written by amateurs hanging out in their moms' basements. While that may be true for a handful of programmers, most Linux kernel developers work for major IT companies. And it's been that way since at least 2008. In that year, 74.2% of the Linux kernel was written by programmers drawing a paycheck.
True, there are a significant number of volunteer Linux developers. In the last 12 years, the single largest group of developers, 11.95%, are writing for the love of the program. That said, over half, 52%, of the Linux kernel is written by companies and consultants.
From top to bottom, the leading corporate contributors are: Intel, Red Hat, IBM, SUSE, Linaro, Google, Samsung, AMD, Renesas, Texas Instruments, and Oracle.
In part because companies, especially Linux distributors, such as Canonical, Red Hat, and SUSE, are vital to Linux's health, Linux started producing longterm release kernels in 2006.
These start as stable kernels, and the developers commit to maintaining them for a long time. Then, when bugs are found in the stable kernel, they're fixed upstream and backported to the longterm release kernels. At the moment, these are the longterm kernel releases:
Some Linux users want to see even longer support windows. The Civil Infrastructure Platform (CIP), which works on open-source software for fundamental infrastructure such as electric power generation and transport, oil and gas distribution, and water and wastewater management, is supporting 4.4 and 4.19 as Super Long-Term Support (SLTS) kernel releases. These will be supported for ten or more years.
This won't be as hard as it might have been earlier, because the kernel developers have been working much harder on automated debugging tools in the last few years. Static analysis tools such as sparse, smatch, and coccicheck are run daily on Linux kernel trees by the autotest bots 0-day and Huawei's Hulk Robot. Fuzz testers such as Trinity and syzkaller have also become more popular. The end result is that Linux's code is cleaner than ever.
"The focus of the kernel community," The Linux Foundation states, is "to maintain a common goal of having a high quality operating system with no regressions, willingness to create new processes and tools as needed to help them be more efficient." Torvalds, and his thousands of co-workers, backed by today's biggest tech companies, have been successful in reaching that goal. Not bad for "just a hobby operating system."