Amazon Web Services will launch data warehousing as a service in a bid to cut customers' hardware, software and administration costs. AWS is now previewing Amazon Redshift, a data warehousing service.
"Large companies feel like they are paying too much. And small companies can't afford data warehousing solutions. As a result, they throw out some of their data," said Andy Jassy, senior vice president of AWS.
Jassy made the comments at Amazon Web Services' inaugural re:Invent conference in Las Vegas. The powwow was designed to bring together developers and customers to talk cloud migration and other key topics.
The move is likely to be a headache for traditional data warehousing vendors that bundle hardware and software. Key players include Teradata, IBM and Oracle, among others.
Jassy's key points:
- Amazon has tested Redshift internally, running it against 2 billion rows of data.
- Two 16 TB nodes on Redshift cost $3.65 an hour, or about $32,000 a year, and delivered faster queries at a tenth of the cost.
- Redshift will work with existing business intelligence tools.
- On-demand pricing is 85 cents an hour for a 2 TB node. Annual commitments and reserved instances are cheaper.
- By comparison, companies pay $19,000 to $25,000 a year per TB for traditional data warehousing.
- A limited preview starts today, with the full launch in 2013.
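The bullets mix hourly and annual figures. This Python sketch converts between them, assuming a node runs around the clock for 365 days and ignoring reserved-instance discounts, data transfer and any other charges. The function names are illustrative and not part of any AWS tool.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def annual_cost(hourly_rate: float) -> float:
    """Cost of running at a flat hourly rate, 24 hours a day, for one year."""
    return hourly_rate * HOURS_PER_YEAR


def cost_per_tb_year(hourly_rate: float, capacity_tb: float) -> float:
    """Annual cost spread over the storage capacity it buys."""
    return annual_cost(hourly_rate) / capacity_tb


# Rates quoted by Jassy.
two_16tb_nodes = annual_cost(3.65)            # two 16 TB nodes, $3.65/hour combined
two_16tb_per_tb = cost_per_tb_year(3.65, 32)  # spread over 2 x 16 TB
one_2tb_per_tb = cost_per_tb_year(0.85, 2)    # one 2 TB node at $0.85/hour

print(f"Two 16 TB nodes: ${two_16tb_nodes:,.0f}/year (${two_16tb_per_tb:,.0f} per TB)")
print(f"One 2 TB node:   ${one_2tb_per_tb:,.0f} per TB per year")
```

On those assumptions, $3.65 an hour comes to about $31,974 a year, in line with the roughly $32,000 quoted, or just under $1,000 per terabyte across the two 16 TB nodes. A single 2 TB node at 85 cents an hour works out to about $3,723 per terabyte per year before discounts. Both are well below the $19,000 to $25,000 per TB figure in the list.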
In a follow-up interview, Adam Selipsky, vice president of product marketing, sales, and product management at AWS, said he expected Redshift to disrupt both high-end and midrange data warehouse systems. "From a pricing perspective, Redshift will be competitive with both. Redshift is a tenth of the price of a high-end data warehouse and significantly cheaper than midrange offerings," said Selipsky.
Selipsky added that the drumbeat of customers wanting data warehousing as a service was picking up, and that the key catalyst is big data. He also said Redshift is likely to be one of a wave of data-warehousing-as-a-service offerings.
Indeed, companies like BitYota will take a software-as-a-service approach to data warehousing and abstract away the hardware layer completely. AWS will provide the data warehousing infrastructure, and a number of other players, possibly building on AWS, are likely to add value in the space.
AWS also detailed Redshift on its blog. Here's a look at the pricing.
As for the architecture, Amazon said:
An active instance of Amazon Redshift is called a Data Warehouse Cluster, or just a cluster for short. You can create single node and multi-node clusters. Single node clusters store up to 2 TB of data and are a great way to get started. You can convert a single node cluster to a multi-node cluster as your needs change. Each multi-node cluster must include a Leader Node and two or more Compute Nodes. A Leader Node manages connections, parses queries, builds execution plans, and manages query execution in the Compute Nodes. The Compute Nodes store data, perform computations, and run queries as directed by the Leader Node.
Amazon Redshift nodes come in two sizes, the hs1.xlarge and hs1.8xlarge, which hold 2 TB and 16 TB of compressed data, respectively. An Amazon Redshift cluster can have up to 32 hs1.xlarge nodes for up to 64 TB of storage or 100 hs1.8xlarge nodes for up to 1.6 PB of storage. We currently set a maximum cluster size of 40 nodes (640 TB of storage) by default. If you need to store more than 640 TB of data, simply fill out [this form] to request a limit increase. Keep in mind that Amazon Redshift is a column-oriented database and that it is able to compress data to a higher degree than is generally possible with a traditional row-oriented database.
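To make the division of labor in Amazon's description concrete, here is a toy, single-process Python model of a multi-node cluster: a leader that places rows on compute nodes, pushes a filtered sum down to each of them, and adds up the partial results. It illustrates the leader/compute split only and is not how Redshift is implemented; the class and method names and the round-robin row placement are assumptions.

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Any

Row = dict[str, Any]


@dataclass
class ComputeNode:
    """Holds a slice of the table and runs the work the leader sends it."""
    rows: list[Row] = field(default_factory=list)

    def partial_sum(self, column: str, predicate: Callable[[Row], bool]) -> float:
        # Each compute node scans only its own slice of the data.
        return sum(row[column] for row in self.rows if predicate(row))


@dataclass
class LeaderNode:
    """Accepts queries, plans them, fans work out and combines partial results."""
    compute_nodes: list[ComputeNode]

    def __post_init__(self) -> None:
        # Per the description: a multi-node cluster needs two or more compute nodes.
        if len(self.compute_nodes) < 2:
            raise ValueError("a multi-node cluster needs at least two compute nodes")

    def load(self, rows: list[Row]) -> None:
        # Round-robin placement is an assumption made for this sketch.
        for i, row in enumerate(rows):
            self.compute_nodes[i % len(self.compute_nodes)].rows.append(row)

    def sum_where(self, column: str, predicate: Callable[[Row], bool]) -> float:
        # The "plan": push the filter and a partial SUM down to every compute
        # node, then add the partial results together on the leader.
        return sum(node.partial_sum(column, predicate) for node in self.compute_nodes)


leader = LeaderNode([ComputeNode() for _ in range(3)])
leader.load([
    {"region": "us", "sales": 10.0},
    {"region": "eu", "sales": 5.0},
    {"region": "us", "sales": 7.5},
    {"region": "us", "sales": 2.5},
])
print(leader.sum_where("sales", lambda r: r["region"] == "us"))  # 20.0
```

Pushing the filter and partial aggregate to the nodes that hold the data means only one small number per node travels back to the leader, which is the general reason this kind of scatter-gather design scales with node count.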
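The size limits in the excerpt can be checked with a small Python validator. It counts only data-storing nodes, which is the reading under which 32 × 2 TB gives 64 TB. The two-node minimum for hs1.8xlarge is an inference from "Single node clusters store up to 2 TB", and the function and field names are hypothetical.

```python
# Figures from the AWS blog excerpt. min_nodes for hs1.8xlarge is inferred
# from "Single node clusters store up to 2 TB".
NODE_TYPES = {
    "hs1.xlarge": {"capacity_tb": 2, "min_nodes": 1, "max_nodes": 32},
    "hs1.8xlarge": {"capacity_tb": 16, "min_nodes": 2, "max_nodes": 100},
}
DEFAULT_NODE_LIMIT = 40  # default cap; can be raised on request


def cluster_capacity_tb(node_type: str, node_count: int, limit_raised: bool = False) -> int:
    """Return compressed storage for a proposed cluster, or raise ValueError if not allowed."""
    spec = NODE_TYPES[node_type]
    if node_count < spec["min_nodes"]:
        raise ValueError(f"{node_type} needs at least {spec['min_nodes']} node(s)")
    if node_count > spec["max_nodes"]:
        raise ValueError(f"{node_type} clusters top out at {spec['max_nodes']} nodes")
    if node_count > DEFAULT_NODE_LIMIT and not limit_raised:
        raise ValueError(f"more than {DEFAULT_NODE_LIMIT} nodes needs a limit increase")
    return spec["capacity_tb"] * node_count


print(cluster_capacity_tb("hs1.xlarge", 32))                       # 64
print(cluster_capacity_tb("hs1.8xlarge", 40))                      # 640
print(cluster_capacity_tb("hs1.8xlarge", 100, limit_raised=True))  # 1600
```

With the default 40-node cap, the largest hs1.8xlarge cluster holds 40 × 16 = 640 TB; with the cap lifted, 100 × 16 = 1,600 TB, or 1.6 PB counting 1 PB as 1,000 TB, matching the figures Amazon gives.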
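The closing point about column-oriented compression can be illustrated with run-length encoding, one simple compression scheme chosen here for illustration; the excerpt does not say which encodings Redshift uses. Storing each column's values together puts repeated values next to each other, so the same made-up rows collapse into fewer runs than when whole rows are stored one after another.

```python
from itertools import groupby


def run_length_encode(values: list) -> list[tuple]:
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Made-up sample rows: (date, region, orders).
rows = [
    ("2012-11-28", "us-east", 10),
    ("2012-11-28", "us-east", 12),
    ("2012-11-28", "us-east", 9),
    ("2012-11-28", "eu-west", 7),
    ("2012-11-29", "eu-west", 11),
    ("2012-11-29", "eu-west", 11),
]

# Row-oriented layout: values from different columns are interleaved,
# so neighbouring values rarely repeat.
row_layout = [value for row in rows for value in row]

# Column-oriented layout: each column's values are stored together,
# so repeated dates and regions sit side by side.
column_layout = [list(column) for column in zip(*rows)]

row_runs = len(run_length_encode(row_layout))
column_runs = sum(len(run_length_encode(col)) for col in column_layout)
print(f"row layout:    {len(row_layout)} values -> {row_runs} runs")
print(f"column layout: {len(row_layout)} values -> {column_runs} runs")
```

Here the 18 stored values stay at 18 runs in the row layout but shrink to 9 runs when stored by column. Real column stores combine several encodings, but the underlying effect is the same: similar values stored together compress better.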