Cleaning up the Linux kernel's 'Dependency Hell': This developer is proposing 2,200 commit changes

Cleaning up decades of accumulated code mess isn't for the faint of heart, but leading Linux kernel developer Ingo Molnar is giving it the old college try.
Written by Steven Vaughan-Nichols, Senior Contributing Editor

Last year, Linux's source code came to a whopping 27.8 million lines of code. It's only gotten bigger since then. Like any 30-year-old software project, Linux has picked up its fair share of cruft over the years. Now, after months of work, senior Linux kernel developer Ingo Molnar is releasing his first stab at cleaning it up at a fundamental level with his "Fast Kernel Headers" project.

The object? Nothing less than a comprehensive clean-up and rework of the Linux kernel's header hierarchy and header dependencies. Linux contains many header (.h) files. To be exact, there are about 10,000 main .h headers in the Linux kernel within the include/ and arch/*/include/ hierarchies. As Molnar explained, "Over the last 30+ years they have grown into a complicated & painful set of cross-dependencies we are affectionately calling 'Dependency Hell'."

To bring rhyme and reason to all this, Molnar is proposing 2,200 commits. That's a lot of commits! Why so many? Well, Molnar continued, it turns out there's a lot more mess in all that code than he thought there was when he started his clean-up project in late 2020. In his words:

- When I started this project, late 2020, I expected there to be maybe 50-100 patches. I did a few crude measurements that suggested that about 20% build speed improvement could be gained by reducing header dependencies, without having a substantial runtime effect on the kernel. Seemed substantial enough to justify 50-100 commits.

- But as the number of patches increased, I saw only limited performance increases. By mid-2021 I got to over 500 commits in this tree and had to throw away my second attempt (!), the first two approaches simply didn't scale, weren't maintainable and barely offered a 4% build speedup, not worth the churn of 500 patches and not worth even announcing.

- With the third attempt I introduced the per_task() machinery which brought the necessary flexibility to reduce dependencies drastically, and it was a type-clean approach that improved maintainability. But even at 1,000 commits I barely got to a 10% build speed improvement. Again this was not something I felt comfortable pushing upstream or even announcing. :-/

- But the numbers were pretty clear: 20% performance gains were very much possible. So I kept developing this tree, and most of the speedups started arriving after over 1,500 commits, in the fall of 2021. I was very surprised when it went beyond 20% speedup and more, then arrived at the current 78% with my reference config. There's a clear super-linear improvement property of kernel build overhead, once the number of dependencies is reduced to the bare minimum.

So, today, his cleaned-up "fast-headers tree offers a +50-80% improvement in absolute kernel build performance on supported architectures, depending on the config. This is a major step forward in terms of Linux kernel build efficiency & performance."
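To put those percentages in wall-clock terms, here is a small arithmetic sketch. It assumes that "build performance" means throughput (builds completed per unit of time), which the article does not spell out; the function names are illustrative. Under that assumption, a gain of g percent means each build takes 1 / (1 + g/100) of the time it used to.

```c
#include <stdio.h>

/*
 * Fraction of the old wall-clock time one build takes after a
 * throughput gain of gain_percent (e.g. 78.0 for "+78%").
 * Assumption: "performance" is builds per unit of time.
 */
double build_time_fraction(double gain_percent)
{
    return 1.0 / (1.0 + gain_percent / 100.0);
}

/* Print the time fraction for the gains quoted in the article. */
void print_time_savings(void)
{
    const double gains[] = { 50.0, 78.0, 80.0 };

    for (size_t i = 0; i < sizeof gains / sizeof gains[0]; i++) {
        double frac = build_time_fraction(gains[i]);
        printf("+%.0f%% throughput -> %.1f%% of the old build time\n",
               gains[i], 100.0 * frac);
    }
}
```

Read this way, a +50% gain cuts the build to about two-thirds of its former time, and +78% cuts it to roughly 56%.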

A 50 to 80% improvement is well worth the time and trouble. The speedup comes from shrinking the default headers by one to two orders of magnitude; in the fast-headers tree, they mostly contain only type definitions.

But, wait, those 2,200 commits are only the tip of the iceberg. Those changes ripple through much of the kernel source. Altogether, Molnar estimates that "in addition to the aforementioned 25 sub-trees and 2,200 commits, the fast-headers tree modifies over half of all kernel source files in existence." In total, the tree changes 25,288 files, with 178,024 insertions and 74,720 deletions. In other words, "Yeah, so this is probably the largest single feature announcement in LKML's [Linux Kernel Mailing List] history. Not by choice! :-/"

On top of this, making these changes doable will require aggressive decoupling of high-level headers; type and API header decoupling; automated dependency addition to .h and .c files; and optimizing headers. This will not be easy. So, before pulling the trigger and starting to make these changes, Molnar is gathering feedback from his fellow maintainers and, in particular, he'd love to hear from "Linus [Torvalds] & Andrew [Morton] and the other maintainers of the biggest subsystems affected by these changes."
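For readers who don't live in C, here is a rough idea of what "type and API header decoupling" can look like. This is a minimal single-file sketch, not code from Molnar's tree: the file names in the comments and the widget and consumer names are all hypothetical. The point is that code which only passes a pointer around needs nothing more than a forward declaration, so it never has to pull in the heavier header that defines the functions.

```c
/* ---- consumer.h (hypothetical) -------------------------------- */
/* Only stores a pointer, so a forward declaration is enough and
 * no other header has to be included here. */
struct widget;

struct consumer {
    struct widget *w;
};

/* ---- widget_types.h (hypothetical): type definitions only ----- */
struct widget {
    int id;
    int refcount;
};

/* ---- widget_api.h (hypothetical): needs the full type --------- */
static inline void widget_get(struct widget *w)
{
    w->refcount++;
}

static inline int widget_put(struct widget *w)
{
    return --w->refcount;   /* returns the remaining references */
}

/* ---- consumer.c (hypothetical) -------------------------------- */
/* Only this file dereferences the widget, so only it would include
 * the types and API headers. */
int consumer_release(struct consumer *c)
{
    return widget_put(c->w);
}
```

In a real tree each section would be its own file. The fewer headers that have to drag in full type and function definitions, the less text the compiler has to parse for every .c file.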

Greg Kroah-Hartman, the maintainer of the Linux stable branch, thinks "This is 'interesting,' but how are you going to keep the kernel/sched/per_task_area_struct_defs.h and struct task_struct_per_task definition in sync?" In short, who gets to bell the cat of maintaining all these changes?

Molnar replied that he's willing to tackle this job and that he doesn't think it will be that much trouble. Kroah-Hartman then gave Molnar's efforts his blessing and remarked, "I'll leave all of this up to the scheduler developers, but it still looks odd to me. The mess we create trying to work around issues in C :)"

He's not wrong. This is one reason why there are efforts afoot to make Rust Linux's second language.

If adopted, users won't see any real changes. But Linux kernel and distro developers will be able to compile Linux faster than ever. The result will be to make it easier and quicker than ever to improve, patch, and add features to Linux.
