One of the things hampering the whole open source movement today is our failure to count and report our successes. As I put it in something I wrote for Linuxinsider almost two years ago:
In management you get what you measure; in volume sales, you get what the press reports. In the case of Linux installs, what the press reports is license sales numbers -- and that's a terrible mistake hurting everyone involved with open source products like Linux.
It hasn't gotten any better since, and it's probably well past time to do something about it. I imagine that lots of people have ideas about what that should be -- but here's mine.
Installation scripts, whether for distributions, applications, or anything else that qualifies for the open source label, should have a self-reporting option that's enabled by default. On installation, whether that's an interactive process or not, this should send a minimal installation report to a central repository. That repository should then be publicly queryable.
It's not obvious what information should be collected. My own view is that less is better, because transmission of the data has to be voluntary and the more data you ask for, the more people will turn it off.
For example (although I didn't know about this until Joe Brockmeier mentioned it in an email) Debian's popcon project does some of this, but extends far beyond merely counting installs to provide weekly reports on application usage. That's great stuff, and clearly valuable, but it comes at the cost of turning a lot of people off because of the apparent intrusion into normal operations. From the popcon project's perspective, that's not a decisive issue; from ours it is, because high participation rates are key to achieving our purpose: driving open source acceptance by showcasing its success.
The Pine people, I think, get closer to the ideal I have in mind: theirs is a one-shot installation report that you can turn off, but most people leave on. As a result their installation numbers are among the most credible in the industry.
Thus I'd suggest asking for an absolute minimum and making sure that anything we ask for can be automatically collected by the installation script. We might, for example, ask for what is being installed, whether it's an upgrade, what hardware architecture it's being installed on, and perhaps the Ethernet address of the machine (or of its primary NIC if it's a PC). That last one, of course, could be sensitive, but we need something that will let us catch intentional frauds -- people sending us the install message hundreds or thousands of times without actually doing the indicated installs.
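To make the "absolute minimum" concrete, here is a rough sketch of what an installer might gather without asking the user anything. The field names and the function name build_install_report are my own inventions for illustration, not part of any existing installer, and the package name is a placeholder.

```python
import json
import platform
import uuid


def build_install_report(package, version, is_upgrade):
    """Collect only fields the installer can determine on its own.

    Hypothetical helper: the field names are illustrative assumptions.
    """
    # uuid.getnode() returns a hardware address as a 48-bit integer; if it
    # cannot find one it falls back to a random number, so the value is a
    # hint for spotting duplicates, not proof of identity.
    node = uuid.getnode()
    mac = ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
    return {
        "package": package,
        "version": version,
        "upgrade": bool(is_upgrade),
        "arch": platform.machine(),  # e.g. the CPU architecture string
        "mac": mac,
    }


if __name__ == "__main__":
    # "examplepkg" is a placeholder package name.
    print(json.dumps(build_install_report("examplepkg", "1.0", False), indent=2))
```

Five fields, all machine-collected, is about as far as I'd want to go; each extra field is another reason for someone to switch the report off.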
As I see it, the installer would use the email queue to send its message to the repository for automated processing -- and obviously would not expect a reply. Where that repository should be is another matter: ideally a university or large institution with highly reliable network connections should take it on -- ZDNet, for example, has those resources and one could hope that some senior people would see this as a nice way to gather news while giving back to the open source community.
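As a sketch of the fire-and-forget mail idea, the code below composes a report message and hands it to the local mail queue. Several things here are assumptions on my part: the repository address is a made-up placeholder, the JSON body is just one possible encoding, and the installer is assumed to find a sendmail-compatible binary at /usr/sbin/sendmail (if it doesn't, the report is silently skipped so the install never fails because of it).

```python
import json
import subprocess
from email.message import EmailMessage

# Hypothetical address: no such repository exists yet.
REPOSITORY_ADDRESS = "installs@repository.example"


def compose_report_message(report, sender="installer@localhost"):
    """Wrap the report dictionary in a plain-text email, body encoded as JSON."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = REPOSITORY_ADDRESS
    msg["Subject"] = "install-report"
    msg.set_content(json.dumps(report, sort_keys=True))
    return msg


def queue_report(msg, sendmail_path="/usr/sbin/sendmail"):
    """Hand the message to the local mail queue; never wait for a reply.

    Returns True if the local mailer accepted the message, False otherwise.
    """
    try:
        # -t takes recipients from the headers; -oi stops a line containing
        # only "." from ending the message early.
        subprocess.run([sendmail_path, "-t", "-oi"],
                       input=msg.as_bytes(), check=True)
        return True
    except (OSError, subprocess.CalledProcessError):
        return False
```

Because the message just sits in the queue until the machine can deliver it, this works the same way for interactive and unattended installs, and for machines that are offline at install time.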
Repository-side code development shouldn't be a big deal -- in fact I'll volunteer, if there's enough interest among application and distribution developers, to write, test, and initially maintain at least release 0.1 of a PostgreSQL-based application to handle that side of things.
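To give a feel for how small the repository side could be, here is a sketch that stores reports and refuses to count the same package, version, and Ethernet address twice. To keep it runnable anywhere it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the table layout and function names are assumptions of mine, and a real PostgreSQL version would use INSERT ... ON CONFLICT DO NOTHING in place of SQLite's INSERT OR IGNORE.

```python
import sqlite3


def open_repository(path=":memory:"):
    """Open (or create) the report store. SQLite stands in for PostgreSQL."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS installs (
               package TEXT NOT NULL,
               version TEXT NOT NULL,
               arch    TEXT NOT NULL,
               upgrade INTEGER NOT NULL,
               mac     TEXT NOT NULL,
               UNIQUE (package, version, mac)
           )"""
    )
    return conn


def record_install(conn, report):
    """Store one report; an exact repeat from the same machine is ignored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO installs (package, version, arch, upgrade, mac) "
            "VALUES (?, ?, ?, ?, ?)",
            (report["package"], report["version"], report["arch"],
             int(report["upgrade"]), report["mac"]),
        )


def count_installs(conn, package):
    """Publicly queryable number: distinct recorded installs of a package."""
    row = conn.execute(
        "SELECT COUNT(*) FROM installs WHERE package = ?", (package,)
    ).fetchone()
    return row[0]
```

The uniqueness constraint only stops the naive fraud of replaying one message over and over; anyone willing to forge Ethernet addresses would get past it, which is part of why the audit and control question is worth discussing.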
Readers are asked to comment, therefore, on all aspects of this: whether this should be done, how it should be done, who should host it, and what information, including audit and control information, should be collected.