Hadoop veteran Ted Dunning: When open source is anything but open

Some projects observe the letter of the open-source movement but neglect its spirit, according to MapR expert Ted Dunning.

MapR chief application architect Ted Dunning: Now we have open source, closed community. Image: MapR

Today's crop of open-source projects ranges from the essential to the obscure. But one open-source veteran and leading Hadoop expert believes that, despite all the effort and innovation, a few projects have strayed from the ethos of open product and open community.

Ted Dunning is chief application architect at Hadoop distribution firm MapR, a committer and PMC member of the Apache Mahout, ZooKeeper and Drill projects, and a mentor for Apache Storm, DataFu, Flink and Optiq. He thinks some open source is almost a modern form of proprietary software.

"It's absolutely true. It isn't quite as closed as closed was. But closed source wasn't as closed as closed might seem. If you go back in the day, with Digital Equipment, and even IBM, everybody had access to the source code who needed it. And you could push back changes into Hasp or VMS. There was very much a communal feeling about then - but it was within a single vendor," he said.

"Those were vital communities. [They] were not officially open but they were effectively open. They were closed source, open community in some bizarre way. Now we have open source, closed community."

Those types of projects, which fail to observe the community aspect of open source, are even present in the confusingly large number of initiatives in the ecosystem surrounding big-data framework Apache Hadoop, according to Dunning.

"I said to myself, 'OK, I'm a big-time Apache guy. I must know about this. There must be 15 or 20 Hadoop-related projects'. So I went and counted them. I went through all the 200-plus Apache projects and all the incubators and I came up with nearly 40," he said.

"I'm supposedly in the know and I misestimated by two to one. There were twice as many as I thought there were, just in Apache - and then there are things outside that are uncountable. So there's a lot and it's clearly confusing. If you don't make a full time job of tracking it, there's no chance you'll be able to enumerate all the projects involved."

But that level of innovation is a good thing. It is only natural for people with good ideas to want to test them out in the community.

"That's the kind of froth that should be there but we should skim that froth off a little bit when presenting to people who have to get things done," Dunning said.

"A second kind of the source of the froth, which is not as good, is when people don't really embrace open source. It's hard to [embrace open source] because as a business you have to have some value that you bring to the game, and as open source you have to be very happy that people take some of your work without credit. Those two things are fundamentally irreconcilable in the same moment."

Dunning argues that companies lacking a core value in technology or intellectual property have a much harder time defining their relationship to a project.

"They try to own an Apache project. They're not supposed to be doing that. But if you have that conflict between, 'I want to have a core value' and 'I want to be open', then you end up with these conflicted situations where you can't be clear in your mind about whether you're open or closed," he said.

"You end up with open source but closed community: the 'I'm sorry, we're the only developers' sort of thing. That's not happening with Hadoop per se but other Apache projects have 90 percent of the core committee from one Hadoop vendor. That means it's not a community project. It isn't really a closed community but it's not really an open community either. I see that as a disaster."

He argues that this lack of openness makes it harder to reach a true consensus on standards for the open-source project in question.

"It's important to be totally clear when you do something: is it open or is it closed? It's very hard to tread a middle line," Dunning said.

Some projects may end up being monopolised by a single company simply because they exclude others, because they have cornered the market in that area of expertise or because nobody else is that interested.

"There are definitely projects where people are trying to push it but nobody cares. That's one option. Another one is where they're trying to monopolise it and they say things like, 'We do this project' with the implication that they are that project. That's not the way to build community. Building community is the point of open source," he said.

Dunning's association with open-source software stretches back over decades, with March this year marking a 40-year involvement.

"It's changed a lot. But I've seen over and over again what is most important and I'm so happy to see Apache because Apache accords exactly with this conclusion that community is more important than code," he said.

"A good community can rebuild the code very quickly. Code - excuse my French - doesn't do s**t. It's dead. It's people that make it live and people are the community. So if you say you own a community, well, indentured service went out by law a long time ago."

In some of these projects, the dominant force in development simply excludes others from making significant contributions.

"If I start a project and put it on GitHub and say I'm approving all changes, what are you going to do about it? You can copy my software if it's open source, you can build your own, but you can't really force me to take your changes. If you've got something that's important to you but it's not important to me, tough luck," Dunning said.

He reckons that, of the 40 or so Hadoop-related projects, probably between half a dozen and a dozen are really core and important. Of the less important ones, many may simply be new and have not yet had a chance to prove their value.

"That's not pejorative; they're just not important yet. Often those are effectively closed communities because they're still growing - that's OK. For instance, Pivotal just donated a project called Geode which is the database underneath GemFire [NoSQL in-memory database]. That's effectively a one-company show," Dunning said.

"You can't fault them for that. You could fault them if three years from now it has high adoption and it's still one company. The guys running that understand that and are going in the right direction. There are others that are unimportant. They haven't built a community because they haven't wanted to build a community. They wanted to keep it too tight."
