The top 1,000 open-source libraries

The Linux Foundation and Harvard's Lab for Innovation Science list the most important open-source application libraries.
Written by Steven Vaughan-Nichols, Senior Contributing Editor

When you think of important open-source projects, you almost certainly recall Linux, the Apache Web Server, LibreOffice, and so on. These are vital, but beneath them are the critical software libraries that empower hundreds of thousands of other programs, and these are far less well known. That's why the Harvard Laboratory for Innovation Science (LISH) and the Linux Foundation's Open Source Security Foundation (OpenSSF) recently put together a comprehensive survey of these under-the-hood critical programs: Census II of Free and Open Source Software - Application Libraries.

This is the second such study. The first, 2020's "Vulnerabilities in the Core," a preliminary report and Census II of open-source software, focused on the lower-level critical operating system libraries and utilities. This new report aggregates data from over half a million observations of free and open-source software (FOSS) libraries used in production applications at thousands of companies.

The data for this report came from the Software Composition Analysis (SCA) scans of codebases of thousands of companies. This data was provided by Snyk, the Synopsys Cybersecurity Research Center (CyRC), and FOSSA.

The purpose of this, besides simply wanting to know which open-source application libraries, packages, and components are the most popular, is to help secure these projects. Until you know what's important, you can't know what you need to secure first.

For example, the heretofore relatively unknown log4j logging package became a massive security problem when the Log4Shell zero-day was revealed. Jen Easterly, the director of the United States Department of Homeland Security (DHS) Cybersecurity and Infrastructure Security Agency (CISA), called it "the most serious vulnerability I've seen in my decades-long career." This bug affected tens or hundreds of millions of devices and programs.

Kevin Wang, FOSSA's Founder and CEO, observed, "The ubiquitous nature of OSS means that severe vulnerabilities, such as Log4Shell, can have a devastating and widespread impact. Mounting a comprehensive defense against supply chain threats starts with establishing strong visibility into software." Only by understanding our "open source dependencies can we improve transparency and trust in the software supply chain."

Mike Dolan, the Linux Foundation's senior vice president of Projects, added, "Understanding what FOSS packages are the most critical to society allows us to proactively support projects that warrant operations and security support. Open-source software is the foundation upon which our day-to-day lives run, from our banking institutions to our schools and workplaces."

This census breaks down the 500 most used FOSS packages in eight different areas. These areas are different slices of the data: versioned or version-agnostic, npm or non-npm package manager, and direct calls only or direct and indirect calls. For example, the top 10 version-agnostic npm JavaScript packages that are called directly are:

  1. lodash

  2. react

  3. axios

  4. debug

  5. @babel/core

  6. express

  7. semver

  8. uuid

  9. react-dom

  10. jquery

These, and the other top libraries, need to be closely watched for any security issues. 
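
To make the "eight areas" concrete: they appear to be every combination of the three two-way splits described above (2 x 2 x 2 = 8). The short Python sketch below is only an illustration, not the census's own tooling. The labels, the `census_slices` and `count_usage` helper names, and the sample observations are all made up for this example. It also shows what "version-agnostic" counting means in practice: every version of a package is folded into a single tally.

```python
from collections import Counter
from itertools import product

# The three two-way splits named in the article; labels are illustrative.
VERSIONING = ("versioned", "version-agnostic")
PACKAGE_MANAGER = ("npm", "non-npm")
CALL_TYPE = ("direct", "direct and indirect")


def census_slices():
    """Return every combination of the three two-way splits."""
    return list(product(VERSIONING, PACKAGE_MANAGER, CALL_TYPE))


def count_usage(observations, version_agnostic):
    """Tally (name, version) observations.

    When version_agnostic is True, all versions of a package share one count;
    otherwise each name@version pair is counted separately.
    """
    counts = Counter()
    for name, version in observations:
        key = name if version_agnostic else f"{name}@{version}"
        counts[key] += 1
    return counts


# Hypothetical observations, purely for illustration (not census data).
SAMPLE = [
    ("lodash", "4.17.20"),
    ("lodash", "4.17.21"),
    ("react", "17.0.2"),
]

if __name__ == "__main__":
    for combo in census_slices():
        print(" / ".join(combo))
    print(count_usage(SAMPLE, version_agnostic=True).most_common())
    print(count_usage(SAMPLE, version_agnostic=False).most_common())
```

In this toy data, lodash is counted twice in the version-agnostic tally but appears as two separate entries in the versioned one, which is why the same package can rank differently from one list to the next.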

Besides simply listing them, the survey's authors, from Harvard University, made five overall findings:

1) There's a need for a standardized naming schema for software components. As it is, the names aren't random, but there's not a lot of rhyme or reason to them either. 

2) We need to clean up the complexities of package versioning. Can you tell at a glance what version a package is? You can if you work on that program, but if you just use it as a brick in your higher-level software, it can be a mystery. 

3) Much of the most widely used FOSS is developed by only a handful of contributors. Everyone knows the XKCD cartoon of a giant software stack that all depends on a single developer in Nebraska. The sad and funny thing about this is that it's not a joke. We still depend on code that relies on a sole programmer.  

4) Improving individual developer account security is becoming critical. With hacking attacks on developers becoming more common, we must protect their accounts like the crown jewels of development they are.

5) Legacy software in the open-source space needs to be cleaned up. Usually, we think of legacy software in terms of that one guy we all know who's still running Windows XP. But old, crufty code lives on in open-source repositories as well.

That said, while this survey is useful, the work is far from done. All the participants in this report are planning on working on another study. This is only a precursor to more exhaustive studies to come to better understand these critical pillars of our information infrastructure.
