Build your own Google




In an earlier post I talked a bit about a start-up called Kickfire who are releasing an accelerated database appliance for MySQL applications in the 1TB-3TB range that gives comparable performance to an equivalent Oracle RDBMS setup for around a quarter of the price.

Kickfire sounds compelling for utility LAMP computing applications that need to be fast and cheap, but it doesn’t address ultra-scalable and cheap, at least not yet. If you want to build some big Web 2.0-type application, such as a Facebook, a Yahoo or a Google, or even something very storage and database intensive like a bioinformatics application or geophysical data modeling, you are going to need to store very, very large amounts of data, in the hundreds of terabytes.

Bee Movie
Credit: Dreamworks Animation LLC

Another new company, Aster Data, which was formed by a group of Computer Science PhDs from Stanford University, is taking an Open Source technology-enabled appliance approach similar to Kickfire's, but is applying it to horizontal database scalability for roll-your-own data warehousing applications.

Aster applies to databases the same scale-out philosophy that “Beowulf” clusters apply to compute-intensive work such as CGI render farms, DNA sequencing, high-energy physics and weather simulation. In a Beowulf cluster, application memory and CPU cycles are distributed over a large number of nodes using a message-passing parallelism API like MPI-2: slave nodes boot over a high-speed network (such as Myrinet, multiple bonded gigabit interfaces or 10GigE) from a master node that issues instructions, and they use shared storage. Aster instead takes a massively parallel “Beehive” approach to database storage, in which “Worker” nodes, each with their own storage, CPU and memory, are provisioned and controlled by “Queen” and “Loader” nodes.

The “Queen” is a Linux appliance with all the software needed to give birth to new Workers, which are tweaked-out PostgreSQL drones that each hold part of a distributed copy of your database, much like the way striped and parity data are stored across RAID drive arrays. All your IT staff has to do is add a bunch of totally virgin commodity servers to your network and boot them via PXE from the Queen, and she does all the work of installing a new OS on each Worker and building out your database schema. The result is a highly distributed and highly available database that can scale easily into the hundreds of terabytes.


Google and other large Web 2.0 operations have had to design totally proprietary, specialized systems in order to scale their databases that large. With Aster, all that scalability is completely transparent, because your front-end apps and your middleware business logic are the same as they always were on monolithic systems. Applications communicate with the Queen via standard ODBC and JDBC interfaces, so you don’t have to do a ton of re-coding to stand up your own Hive. The “Loader” nodes are responsible for partitioning and loading datasets onto the Workers, and they communicate with data federation services.
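
To make the Queen/Loader/Worker division of labor a little more concrete, here is a toy Python sketch. It is not Aster's code or API: the class names, the hash-by-key partitioning rule and the sum query are all assumptions made for illustration, since the only things described above are that Loaders partition data onto Workers and that applications talk to the Queen.

```python
from zlib import crc32


class Worker:
    """Hypothetical worker node holding its own slice of the rows."""

    def __init__(self, name):
        self.name = name
        self.rows = []

    def partial_sum(self, column):
        # Each worker aggregates only the rows it stores locally.
        return sum(row[column] for row in self.rows)


class Loader:
    """Hypothetical loader: assigns each row to a worker by hashing a key."""

    def __init__(self, workers):
        self.workers = workers

    def load(self, rows, key):
        for row in rows:
            # crc32 is deterministic across runs, unlike hash() on strings.
            index = crc32(str(row[key]).encode("utf-8")) % len(self.workers)
            self.workers[index].rows.append(row)


class Queen:
    """Hypothetical coordinator: fans a query out and merges partial results."""

    def __init__(self, workers):
        self.workers = workers

    def total(self, column):
        return sum(worker.partial_sum(column) for worker in self.workers)


workers = [Worker(f"worker-{i}") for i in range(4)]
rows = [{"user_id": i, "bytes": i * 10} for i in range(1000)]
Loader(workers).load(rows, key="user_id")
queen = Queen(workers)
print(queen.total("bytes"))
```

The point of the sketch is that the caller only ever asks the Queen a question; which Worker holds which row is hidden behind it, which is the transparency the ODBC/JDBC front end is meant to provide.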

Aster Beehives are such a compelling technology that they have already been noticed by the big guys. MySpace already uses one for storing large amounts of data and issuing huge numbers of transactions, and Sequoia Capital, one of the original financial backers of Google, has financed the company through its A-round of venture capital financing.

Now, if only Kickfire and Aster could join forces. Then you'd really have something.

Do you think your enterprise can put this type of distributed database technology to work? Talk Back and let me know.



Jason Perlow, Sr. Technology Editor at ZDNet, is a technologist with over two decades of experience integrating large heterogeneous multi-vendor computing environments in Fortune 500 companies. Jason is currently a Partner Technology Strategist with Microsoft Corp. His expressed views do not necessarily represent those of his employer.



  • A similarity that cannot be ignored...

    That bee looks like my brother Tom.
  • Maybe if acquired by a large vendor

    I'd like to see both companies acquired and merged together by a larger vendor, say HP. These two companies are the future of hardware because they contain open source software at an enterprise level.
    • No, they don't.

      Enterprise level needs substantially more power than MySQL can provide. The limitations of the database require costly balancing that seriously increases the usage price of the platform.
      • MySQL has plenty of power...

        to handle Enterprise data. Terabytes worth.
  • Yeah, except...

    If you're building an enterprise application, 1-3TB isn't going to cut it. The synchronous allocation needs to be substantially more than that.

    Also, MySQL still doesn't come close to Oracle's level of database security, which has always been the bigger issue.
  • Another similarity (or patent infringement)

    This seems extremely (possibly patent infringingly) like the Teradata DBMS.

    Where Aster has "Queen Nodes" and "Worker Nodes" Teradata has "Parsing Engines" and "AMPs".

    Even if it's not infringing, the only thing new about it is that it's open source.

    • The distinction may be in the automation

      Even SQL Server allows partitions across multiple servers. The problem has been in organizing the data: what data goes on what partition. They seem to have this automated, where a system like SQL Server requires it to be set up.
  • Did Google reroute the phone lines?

    What computer system is capable of doing all this?
  • RE: Build your own Google

    McGoogle! I like it. time to start building!

  • RE: Build your own Google

    If Kickfire and Aster joined forces, would the new company be called KickAsster or FireAsster?
  • RE: Build your own Google

    An alternative approach to database partitioning is decoupling the database from the application and using in-memory-data-grid as front-end data store. You can find more details here:

    I found the title a bit misleading, though, as I was expecting to find something equivalent to Google App Engine and not yet another discussion of database partitioning technology. IMO, the right equivalent would be a platform that scales the entire application (business logic, web tier, messaging tier) and addresses the entire life cycle of application deployment, including self-healing in case of failure.
  • RE: Build your own Google

    Hi Jason, thanks for including us in this post. It's a great piece. Just to set the record straight, Kickfire is actually one-tenth the price of Oracle or less. Kickfire's price-performance is $0.89 USD/QphH@300GB on the TPC-H 300 GB benchmark.

    Oracle Database 10g Release 2, Enterprise Edition with Oracle Real Application Clusters and Partitioning and Red Hat Enterprise Linux running on the HP BladeSystem ProLiant BL460c IB Cluster 16P DC has price-performance of $12.57 USD/QphH@300GB.

    To comply with the TPC organization's policies, here are all the relevant details of the benchmarks. The Kickfire Database Appliance Series 2400 delivers 54,895 QphH@300GB (Queries per hour on the TPC-H benchmark) propelling Kickfire to world leadership in query performance (non-clustered systems) on the 300GB TPC-H benchmark. Kickfire is also number one in price/performance at $0.89 USD/QphH@300GB on the 300GB benchmark. Moreover, Kickfire delivers this record breaking performance with a 3 year total system cost of only $48,790 USD. The Kickfire system availability is 10/14/08.

    As of June 10, 2008, Oracle running on the HP BladeSystem ProLiant delivers 39,614 QphH@300GB and price-performance of $12.57 USD/QphH@300GB with a 3 year total system cost of $497,869 USD.

    TPCH, QphH and $/QphH are trademarks of the TPC. For additional information on the TPCH benchmark, please visit the Transaction Processing Performance Council's Web site at
  • RE: Build your own Google

    I agree. Here is another custom 'google':
    It searches for 3D objects. Rocket fast
  • RE: Build your own Google

    Really nice 3D.
    This article is very interesting. Thank you very much for sharing .