An estimated 20-25% of a developer's time is spent looking for answers to problems others have already encountered (and perhaps solved). That's the time Darren Rush, CEO and co-founder of Koders.com, is trying to help you save. In this exclusive interview, Darren shares his insights on code searching, open source license proliferation, and what it's like to have your niche invaded by a big company like Google.
[Dev Connection] What does a special code search site give you that a regular web search engine doesn't?
[Darren Rush] There are two fundamental differences between code and general text content which require specialization by search providers. First, the best quality code is stored in version control systems (CVS, Subversion, ClearCase, etc.), which are not generally crawled by conventional search engines. Second, unlike a web page, source code is very structured text content. A well-designed code search system exploits this structure to increase result relevance, facilitate content navigation, and integrate with other developer tools, such as IDEs.
Google recently started their own free code search site. How do Koders and Google Code Search (GCS) compare in terms of their reach and the number of source files indexed?
Koders was initially launched by looking at the code stored within version control systems at popular open source hosting sites such as SourceForge, the Apache Foundation and others. It appears Google has started out by indexing primarily zip files that previously represented 'the end of the line' for their web crawling system. So for the time being, our results don't overlap that much. Koders currently has more than 425 million lines of code from more than a hundred repositories. We have found that code managed in version control systems has implicitly better quality than code found in ad-hoc archives posted on the web.
How does your list of supported languages compare with Google? Is this something you're working on?
We are closely matched in this area. We have a few they don't and vice versa. In this arena, more is better and we take cues from our users about where to go next. A great example is ActionScript which we've seen good demand for and we will be rolling out very shortly.
One of the complaints about GCS has been mistakes in figuring out the license used by a piece of code. How does Koders figure out the code's licensing terms, and does it do a better job than Google?
We look inside individual files to establish the license. We can roll this up to the project level - and we see that many times open source projects have a variety of licenses within a single project. This [license proliferation] is one of the points of confusion and contention for end users.
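To make the file-level approach concrete, here is a minimal sketch of scanning each file's header for a license marker and then rolling the results up to the project level. It is an illustration only, not Koders' actual implementation: the marker table, the `detect_license` and `roll_up` names, and the 40-line header window are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical marker phrases (assumption); a real detector would need a
# much larger table and would tolerate comment prefixes and line wrapping.
LICENSE_MARKERS = {
    "GPL": re.compile(r"GNU General Public License", re.IGNORECASE),
    "LGPL": re.compile(r"GNU Lesser General Public License", re.IGNORECASE),
    "Apache": re.compile(r"Apache License", re.IGNORECASE),
    "MIT": re.compile(r"Permission is hereby granted, free of charge", re.IGNORECASE),
}


def detect_license(source_text, header_lines=40):
    """Return the first license whose marker appears in the file header."""
    header = "\n".join(source_text.splitlines()[:header_lines])
    for name, pattern in LICENSE_MARKERS.items():
        if pattern.search(header):
            return name
    return "unknown"


def roll_up(files):
    """Count files per detected license; `files` maps path -> source text."""
    return Counter(detect_license(text) for text in files.values())


if __name__ == "__main__":
    project = {
        "a.c": "/* Licensed under the GNU General Public License v2 */\nint main(){}",
        "b.java": "// Licensed under the Apache License, Version 2.0\nclass B {}",
        "c.py": "print('hi')",
    }
    for license_name, count in roll_up(project).items():
        print(license_name, count)
```

A project whose roll-up contains more than one license name is exactly the mixed-license case described above.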
Opensource.org lists about 50 open source licenses. What do you think about that?
We think there is a definite need for both consolidation and clarification in the open source license arena. Unfortunately, it looks like the next generation of GPL will fall short of achieving some of the necessary requirements for 'business compatibility'. But the good news is that people are aware of the issue and groups such as the Apache Foundation are working to drive towards fewer standard licenses and are paying attention to the needs of businesses.
[Dev Connection] In some unscientific tests, GCS seemed to return results faster than Koders.com. Is that a fair assessment, and is it something you're working on improving?
[Darren Rush] Google has helped to validate the code search category and GCS has increased awareness of Koders. This increase in recent traffic is driving acceleration of infrastructure improvements - which is happening as we speak. We're our first customers, so getting results fast is as important to us as it is to users - and fortunately our scalable architecture lets us grow with market demand.
GCS supports regular expressions and Koders has its own search syntax including stemming. What are the pros and cons of these approaches?
Developers are power users, and we're seeing demand for a variety of advanced search features - regular expressions are just one of them. Many search scenarios are supported with our existing search syntax, and regular expression support is one of several incremental search capabilities we are developing. Our approach is to identify the highest value use cases for developers and enable those first.
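The trade-off in the question can be illustrated with a small sketch that contrasts the two styles of matching. It reflects neither Koders' nor Google's actual search syntax: `naive_stem`, `stem_search`, `regex_search`, and the suffix list are hypothetical, and the stemmer is deliberately crude.

```python
import re

# Crude suffix list (assumption); production stemmers use far richer rules.
SUFFIXES = ("ing", "ers", "er", "ed", "es", "e", "s")


def naive_stem(word):
    """Lowercase a word and strip one common suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_search(query, lines):
    """Return lines containing any word that shares a stem with the query."""
    target = naive_stem(query)
    return [
        line
        for line in lines
        if any(naive_stem(tok) == target for tok in re.findall(r"[A-Za-z]+", line))
    ]


def regex_search(pattern, lines):
    """Return lines matching a regular expression written by the user."""
    compiled = re.compile(pattern)
    return [line for line in lines if compiled.search(line)]


if __name__ == "__main__":
    code = [
        "def parse(text):",
        "parser = Parser()",
        "# parsing is done here",
        "x = sparse_matrix()",
    ]
    print(stem_search("parsing", code))
    print(regex_search(r"^def\s+\w+\(", code))
```

With stemming, one plain word such as "parsing" also finds "parse" and "parser" without any special syntax. A regular expression can instead target structure, such as lines that begin a function definition, but the user has to spell out every variant by hand.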
How do you handle the problem of sensitive information, such as passwords, being left accidentally in source files? Do you worry that you're making these things more accessible to intruders?
Insofar as the Koders.com index focuses more on open source repositories, and less on ad-hoc archives, we've seen that project owners are generally diligent about managing confidential information within their open source projects. Unmanaged archives are more at risk. It certainly could be a problem for developers who may have posted 'hidden' zips for partners which Google has now picked up.
If you had to pick one killer feature that Koders has that nobody else has, what would it be?
SmartSearch is one of the killer features available in our IDE plugins for Eclipse and Visual Studio. This feature works proactively in the background as developers write code and SmartSearch sends alerts when it finds existing code, services and components. SmartSearch enables well-written reusable code to find developers before they hit a road block or begin re-inventing the wheel. The Koders IDE plugins with SmartSearch are available for web site and enterprise users.
The same technology that powers Koders.com and the plugins is available to our enterprise customers for use inside the firewall. Koders Enterprise Edition 1.2 is now available with fixed pricing for workgroups under 100 seats, and per-user pricing beyond that.
If Google had been doing code search in 2004 (when you started the site) would you have undertaken it anyway, and why?
Yes. Koders is dedicated to providing the most sophisticated and specialized search engine for code. Moreover, Koders is not only focused on code found on the web, but also code found in the enterprise. Feedback from customers has been tremendous because Koders is solving big problems for their development organizations. With Koders inside the firewall, customers are finding code stored in heterogeneous repositories, discovering internal branching and redundant bugs, reducing the learning curve for developers, and enabling service discovery (SOA). Koders will continue to evolve our suite of products and services to provide seamless code access for professional developers everywhere.