Build your own Google
In an earlier post I talked a bit about a start-up called Kickfire, which is releasing an accelerated database appliance for MySQL applications in the 1TB-3TB range that delivers performance comparable to an equivalent Oracle RDBMS setup for around a quarter of the price.
Kickfire sounds compelling for utility LAMP computing applications that need to be fast and cheap, but it doesn’t address applications that must be ultra-scalable and cheap, at least not yet. If you want to build a big Web 2.0-type application such as a Facebook, a Yahoo or a Google, or even something very storage- and database-intensive like a bioinformatics application or geophysical data modeling, you are going to need to store very, very large amounts of data, in the hundreds of terabytes.
(Image credit: Dreamworks Animation LLC)
Another new company, Aster Data, which was formed by a group of computer science PhDs from
Aster applies to databases the same scaling philosophy that “Beowulf” clusters apply to compute-intensive work such as CGI render farms, DNA sequencing, high-energy physics and weather simulation. A Beowulf cluster distributes application memory and CPU cycles across a large number of nodes using a message-passing parallelism API such as MPI-2; its slave nodes boot over a high-speed network (such as Myrinet, multiple bonded gigabit interfaces or 10GigE) from a master node that issues instructions, and they use shared storage. Aster instead takes a massively parallel “Beehive” approach to database storage, in which “Worker” nodes, each with its own storage, CPU and memory, are provisioned and controlled by “Queen” and “Loader” nodes.
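To make the shared-nothing Beehive idea concrete, here is a minimal Python sketch. The `Worker` class and the `loader_partition` and `queen_average` functions are hypothetical illustrations of the roles described above, not Aster's actual software or API, and the CRC32 hash-partitioning scheme is an assumption chosen for simplicity. Each Worker holds only its own rows; the Loader role routes every row to exactly one Worker, and the Queen role fans a query out and merges the partial results.

```python
from dataclasses import dataclass, field
from zlib import crc32


@dataclass
class Worker:
    """Hypothetical shared-nothing Worker: owns its rows, storage and CPU."""
    name: str
    rows: list = field(default_factory=list)

    def partial_sum_count(self, column):
        # Runs only against this Worker's local rows; nothing is shared.
        values = [row[column] for row in self.rows]
        return sum(values), len(values)


def loader_partition(workers, rows, key):
    """Hypothetical Loader role: hash-partition rows so each lands on one Worker."""
    for row in rows:
        # crc32 is stable across processes, unlike Python's randomized str hash.
        idx = crc32(str(row[key]).encode()) % len(workers)
        workers[idx].rows.append(row)


def queen_average(workers, column):
    """Hypothetical Queen role: scatter a query, then merge partial results."""
    partials = [w.partial_sum_count(column) for w in workers]  # scatter
    total = sum(s for s, _ in partials)  # gather: only two numbers per Worker
    count = sum(c for _, c in partials)
    return total / count if count else None


workers = [Worker(f"worker-{i}") for i in range(4)]
loader_partition(workers, [{"user_id": i, "spend": i * 10} for i in range(100)], key="user_id")
print(queen_average(workers, "spend"))  # 495.0
```

Each Worker returns a (sum, count) pair rather than its own average, because averaging per-Worker averages gives the wrong answer when partitions hold different numbers of rows.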
The “Queen” is a Linux appliance with all the software needed to give birth to new Workers, which are tweaked-out PostgreSQL drones that share a distributed copy of your database, much as striped and parity data is spread across a RAID drive array. All your IT staff has to do is add a bunch of totally virgin, brand-new commodity servers to your network and have them boot via PXE from the Queen; she does all the work of installing an OS on each Worker and building out your database schema. The result is a highly distributed, highly available database that can scale easily into the hundreds of terabytes.
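The post doesn't say exactly how Workers protect their share of the data, so the following Python sketch only illustrates the RAID concept the analogy refers to: data is striped across several nodes, and an XOR parity block per stripe lets any single lost block be rebuilt from the survivors. The function names and block sizes are hypothetical, and this is not a description of Aster's storage format.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks: the RAID-5 parity operation."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe(data, n_data_nodes, block_size):
    """Split data into stripes of n_data_nodes blocks plus one parity block."""
    stripe_bytes = n_data_nodes * block_size
    stripes = []
    for start in range(0, len(data), stripe_bytes):
        # Pad the final stripe with zero bytes so every block is the same size.
        chunk = data[start:start + stripe_bytes].ljust(stripe_bytes, b"\0")
        blocks = [chunk[i * block_size:(i + 1) * block_size] for i in range(n_data_nodes)]
        stripes.append(blocks + [xor_blocks(blocks)])
    return stripes


def rebuild(stripe_blocks, lost_index):
    """Recover one lost block by XOR-ing every surviving block in the stripe."""
    survivors = [b for i, b in enumerate(stripe_blocks) if i != lost_index]
    return xor_blocks(survivors)


stripes = stripe(b"customer rows spread over workers", n_data_nodes=3, block_size=4)
print(rebuild(stripes[0], lost_index=1))  # b'omer'
```

Losing the parity block itself is handled by the same `rebuild` call, since XOR-ing the data blocks simply regenerates the parity.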
Google and other large Web 2.0 operations have had to design totally proprietary, specialized systems in order to scale their databases that large. With Aster, all that scalability is completely transparent, because your front-end apps and middleware business logic stay the same as they always were on monolithic systems: applications communicate with the Queen via standard ODBC and JDBC interfaces, so you don’t have to do a ton of re-coding to stand up your own Hive. The “Loader” nodes are responsible for partitioning datasets and loading them onto the Workers, and they communicate with data federation services.
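The point about standard interfaces can be illustrated in Python, whose DB-API 2.0 plays a role similar to ODBC and JDBC. In this minimal sketch, the hypothetical `top_customers` function depends only on the generic connection/cursor interface, so the same code could in principle receive a connection from an ODBC-backed driver such as pyodbc. Here the standard-library `sqlite3` module is used purely as a local stand-in; the `orders` table and its data are invented for the example, and a real cluster's SQL dialect may still need adjustments.

```python
import sqlite3


def top_customers(conn, limit=3):
    """Business logic written only against the generic DB-API 2.0 interface."""
    cur = conn.cursor()
    # sqlite3 and pyodbc both use the "qmark" (?) parameter style; other
    # drivers may expect a different placeholder (check module.paramstyle).
    cur.execute(
        "SELECT customer, SUM(amount) AS total FROM orders "
        "GROUP BY customer ORDER BY total DESC LIMIT ?",
        (limit,),
    )
    rows = cur.fetchall()
    cur.close()
    return rows


# sqlite3 is only a local stand-in for a standard-driver connection to a cluster.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", 120.0), ("globex", 80.0), ("acme", 30.0), ("initech", 95.0)],
)
print(top_customers(conn, limit=2))  # [('acme', 150.0), ('initech', 95.0)]
```

Swapping the back end then means changing how `conn` is created, not rewriting `top_customers`.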
Aster Beehives are such a compelling technology that they have already been noticed by the big guys. MySpace already uses Aster for storing large amounts of data and issuing huge numbers of transactions, and Sequoia Capital, one of the original financial backers of Google, has financed the company through its A round of venture capital financing.
Now, if only Kickfire and Aster could join forces. Then you'd really have something.
Do you think your enterprise can put this type of distributed database technology to work? Talk Back and let me know.

Talkback
A similarity that cannot be ignored...
Weird.
Maybe if acquired by a large vendor
No, they don't.
MySQL has plenty of power...
Yeah, except...
Also, MySQL still doesn't come close to Oracle's level of database security, which has always been the bigger issue.
Another similarity (or patent infringement)
Where Aster has "Queen Nodes" and "Worker Nodes" Teradata has "Parsing Engines" and "AMPs".
Even if it's not infringing, the only thing new about it is that it's open source.
See: http://www.teradata.com/t/page/87083/index.html
The distinction may be in the automation
Did Google reroute the phone lines?
RE: Build your own Google
http://natishalom.typepad.com/nati_shaloms_blog/2008/03/scaling-out-mys.html
I found the title a bit misleading, though, as I was expecting to find something equivalent to Google App Engine and not yet another discussion of database partitioning technology. IMO, the right equivalent would be a platform that scales the entire application (business logic, web tier, messaging tier) and addresses the entire life cycle of application deployment, including self-healing in case of failure.
RE: Build your own Google
Oracle Database 10g Release 2, Enterprise Edition with Oracle Real Application Clusters and Partitioning and Red Hat Enterprise Linux running on the HP BladeSystem ProLiant BL460c IB Cluster 16P DC has price-performance of $12.57 USD/QphH@300GB.
To comply with the TPC organization's policies, here are all the relevant details of the benchmarks. The Kickfire Database Appliance Series 2400 delivers 54,895 QphH@300GB (Queries per hour on the TPC-H benchmark) propelling Kickfire to world leadership in query performance (non-clustered systems) on the 300GB TPC-H benchmark. Kickfire is also number one in price/performance at $0.89/QphH@300GB USD on the 300GB benchmark. Moreover, Kickfire delivers this record breaking performance with a 3 year total system cost of only $48,790 USD. The Kickfire system availability is 10/14/08.
As of June 10, 2008, Oracle running on the HP BladeSystem ProLiant delivers 39,614 QphH@300GB and price-performance of $12.57 USD/QphH@300GB with a 3 year total system cost of $497,869 USD.
TPC-H, QphH and $/QphH are trademarks of the TPC. For additional information on the TPC-H benchmark, please visit the Transaction Processing Performance Council's Web site at http://www.tpc.org/.
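The price/performance figures quoted in this comment follow from dividing each system's 3-year total system cost by its QphH@300GB score. A short Python check using only the numbers given in the comment (the `price_performance` helper is a hypothetical name):

```python
def price_performance(total_cost_usd, qphh):
    """TPC-H price/performance: total system cost divided by QphH@Size."""
    return total_cost_usd / qphh


# Figures quoted in the comment (300GB scale, 3-year total system cost in USD).
systems = {
    "Kickfire Series 2400": (48_790, 54_895),
    "Oracle 10gR2 RAC on HP BladeSystem": (497_869, 39_614),
}
for name, (cost, qphh) in systems.items():
    print(f"{name}: ${price_performance(cost, qphh):.2f}/QphH@300GB")
# Kickfire Series 2400: $0.89/QphH@300GB
# Oracle 10gR2 RAC on HP BladeSystem: $12.57/QphH@300GB
```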
RE: Build your own Google
It searches for 3D objects. Rocket fast