How to judge open-source projects

We'd like to have an easy way of judging open-source programs. It can be done. But easily? Dream on!

Plenty of people have put together systems that collect information about open-source projects, such as their popularity, reliability, and activity, and then use it to judge and evaluate them. But they all have flaws.

Take that oldest of metrics: Lines of code (LoC). Yes, it's easy to measure. But it's also profoundly misleading. As programming genius Edsger Dijkstra observed in 1988, LoC gives us "the reassuring illusion that programs are just devices like any others, the only difference admitted being that their manufacture might require a new type of craftsmen, viz. programmers. From there it is only a small step to measuring 'programmer productivity' in terms of 'number of lines of code produced per month.' This is a very costly measuring unit because it encourages the writing of insipid code."

We've gotten better since then, haven't we? Maybe not.

Take GitHub stars, for example. GitHub, the biggest open-source Git hosting service, describes them as just a way to keep track of projects people find interesting. But many developers treat them as a measure of an open-source project's popularity and reputation. In short, the more stars, the better the program. Or is it?

Just listen to Solomon Hykes, Docker's co-founder: "GitHub stars are a scam. This bullshit metric is so pervasive, and GitHub's chokehold on the open-source community so complete, that maintainers have to distort their workflows to fit the 'GitHub model' or risk being publicly shamed by industry analysts. What a disgrace."

Hykes isn't the only one who sees them that way. Fintan Ryan, a Gartner senior director, calls them a game, one that many projects play. Ralph Squillace, a Microsoft project manager for open-source development on Azure, tweeted: "In my opinion and for Microsoft project eng and management they are worthless. [But] There are always people who seize on them anyway."

And that's the problem. People love easy metrics. They want a quick, one-glance answer to their coding problems. Spoiler alert: There is no such thing.

It would be great if there were such a thing, because the stakes keep rising. In 2005, open source was already taking over the developer world, and RedMonk analyst Stephen O'Grady observed that open-source software was taking over the enterprise in a major power shift. So can anyone really be surprised that Red Hat recently found that 95% of enterprise IT leaders thought open-source software was "strategically important"? But an easy way to work out which of the tens of thousands of projects are the vital, important ones, a software Yelp if you will, doesn't exist. It may never exist.

Sure, we can identify the most important open-source projects. Some are tech household names such as Apache, Firefox, and Linux. 

Or can we? Others -- and this should concern you -- are vitally important but hidden as components deep inside better-known programs. In a 2020 study, Vulnerabilities in the Core, the Linux Foundation's Core Infrastructure Initiative (CII) and the Laboratory for Innovation Science at Harvard (LISH) found that the most commonly used programs include such components as Httpcomponents-core and Lodash, which are unknown to anyone except developers.

To find these, the researchers used Software Composition Analysis (SCA) tools from developer security companies Snyk and the Synopsys Cybersecurity Research Center to take deep dives into tens of thousands of open-source codebases. This is not trivial to do. But it's the only way to determine which pieces of code really are popular and frequently used.

An easier way to judge an open-source program's quality is simply to look at the number and quality of its developers. Mike Volpi, a well-known venture capitalist and Index Ventures partner, said that since "software is never sold, it is adopted by the developers who appreciate the software more because they can see it and use it themselves rather than being subject to it based on executive decisions." Therefore, "open-source software permeates itself through the true experts," and . . . "the developers . . . vote with their feet."

If the programmers are leaving, the maintainers aren't responding to patch requests, and the code is growing moldy, it's time to bid that program good-bye. Or, if it's essential to you, take it over yourself.

You can also gauge a project's health by how easy, or hard, it makes it for others to participate. Ed Warnicke, a Cisco Distinguished Consulting Engineer, believes successful open-source communities lower the barriers to useful participation. He poses several questions to ask of a project, and a "no" to any of them signals a barrier to participation and is a red flag. These include:

  • Is it easy to figure out where to get the source code?
  • Can experts easily contribute to and modify the code?
  • Is the process to contribute changes back upstream clear and straightforward?
  • And are all of these true even if you aren't already part of the "in crowd" for the project?
  • Is the community open to communication and participation from outsiders?
  • Is it clear and obvious how to become part of the conversation around the development of the software? Is it easy?
  • Is the project using a well-known license approved by the Open Source Initiative (OSI), such as the GPL, Apache, or BSD license?

A project might succeed without these, but Warnicke said, "The higher the barriers to collaboration a community raises, the harder it is for it to become successful."

Another way of judging open-source projects is by how many people actually use them. For example, while Apache is the web server everyone knows, Nginx has been the most widely deployed web server since June 2019, by the count of Netcraft's web server survey.

A related method is to see whether a given program is included in other programs, toolsets, or operating systems. If a project is included in a major Linux distribution, for instance, it's clearly a successful program.

That's not always clear. To look at one case, a continuing problem with the Linux desktop is how to make it easier for independent software vendors (ISVs) to deliver programs to users. Until recently, you had to hand-craft Linux desktop programs for each specific distro and each of its releases. Most ISVs don't bother, which is why, as Nextcloud founder Frank Karlitschek said, there are only 400 or 500 significant Linux desktop apps compared to tens of thousands on macOS and Windows.

The solution, Linux desktop developers agree, is to replace traditional ways of delivering Linux desktop apps, such as the DEB and RPM package management systems, with the containerized packaging systems Flatpak and Snap. Both have their supporters: Red Hat and its allies back Flatpak, while Canonical and friends support Snap.

So, which one should you invest in for your applications? We don't know. If you're not invested in this containerized app approach, which also has roles to play in server and IoT deployments, you can only wait, watch closely, and see. If you're in the middle of it, you must pick a side and hope that you've picked the right one for your program.

Sometimes you can see which way the technology wind is blowing by looking at which technologies other tech companies invest in. Three years ago, we all knew containers were the future. What we didn't know was how we'd manage them. Apache Mesos, Docker swarm mode, Marathon, and Kubernetes were all, briefly, contenders.

I say "briefly" because, as everyone knows, Kubernetes is overwhelmingly the most popular container orchestration program. Even its former rivals, D2iQ (formerly Mesosphere) and Docker, now offer Kubernetes to their customers. You could see it coming, though. Users, companies, and developers flocked to Kubernetes.

If in your specific field you see people moving to a particular toolkit, QA program, or what have you, you must consider it closely. True, sometimes the masses back the wrong horse, but you need to be aware of what's hot and what's not in your field.

Sure, it would be lovely if there were some easy way of working this all out. But there's not. Only by staying on top of the open-source projects that matter to you, and watching how they're seen and used, can you ever really know which ones will work well for you.

It's hard work, but it's worth it. 
