Google has a new cloud service in beta aimed at simplifying data analysis on Hadoop and Spark.
Offered as a managed service via the Google Cloud Platform, Cloud Dataproc is geared toward open-source users looking to automate the management of their data clusters.
"Cloud Dataproc automation helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don't need them," Google Cloud Platform product manager James Malone wrote in a blog post on the new service. "With less time and money spent on administration, you can focus on your jobs and your data."
The service is similar to offerings already available on other cloud platforms, such as Amazon Web Services and Microsoft Azure, so Google is essentially catching up. But the company is trying to compete on price.
Cloud Dataproc costs 1 cent per virtual CPU per hour in the cluster, on top of the cost of the underlying Compute Engine resources. Clusters can also include preemptible instances, which carry lower compute prices and cut costs further. And while many providers round usage up to the nearest hour, Cloud Dataproc bills by the minute, with a 10-minute minimum.
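The billing rules above come down to simple arithmetic. The Python sketch below estimates only the Dataproc service fee at the quoted 1 cent per vCPU-hour; it leaves out the underlying machine charges and preemptible discounts, and it assumes partial minutes are rounded up to the next whole minute. The function names are hypothetical, chosen for illustration.

```python
import math

# Quoted Dataproc service fee in USD. This excludes the underlying
# virtual machine charges, which are billed separately.
DATAPROC_FEE_PER_VCPU_HOUR = 0.01
MINIMUM_BILLED_MINUTES = 10


def billed_minutes(runtime_minutes: float) -> int:
    """Minutes billed for one cluster run: whole minutes, at least 10.

    Rounding a partial minute up is an assumption made for this sketch.
    """
    return max(MINIMUM_BILLED_MINUTES, math.ceil(runtime_minutes))


def dataproc_fee(vcpus: int, runtime_minutes: float) -> float:
    """Per-minute service fee in USD for a cluster with `vcpus` virtual CPUs."""
    hours = billed_minutes(runtime_minutes) / 60
    return vcpus * DATAPROC_FEE_PER_VCPU_HOUR * hours


def hourly_rounded_fee(vcpus: int, runtime_minutes: float) -> float:
    """The same fee if usage were rounded up to whole hours, for comparison."""
    hours = math.ceil(runtime_minutes / 60)
    return vcpus * DATAPROC_FEE_PER_VCPU_HOUR * hours


if __name__ == "__main__":
    # Compare a short job and a 75-minute job on a hypothetical 40-vCPU cluster.
    for minutes in (5, 75):
        print(
            f"{minutes} min on 40 vCPUs: "
            f"per-minute ${dataproc_fee(40, minutes):.2f}, "
            f"hourly ${hourly_rounded_fee(40, minutes):.2f}"
        )
```

Under these assumptions, a 75-minute run on 40 vCPUs costs $0.50 in service fees with per-minute billing, versus $0.80 if it were rounded up to two full hours. A 5-minute run is billed as 10 minutes.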
Google said companies can run Spark and Hadoop clusters without help from an administrator or special software. Instead, they can manage clusters and Spark or Hadoop jobs through the Google Developers Console, the Google Cloud SDK or the Cloud Dataproc REST API. When a cluster is no longer needed, it can be shut down so it stops incurring charges.
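The create, run and shut-down cycle can be scripted around the Cloud SDK's `gcloud` tool. The Python sketch below assumes today's `gcloud dataproc` command syntax; the command group and flags available during the beta may have differed. The cluster name, region, example class and JAR path are hypothetical placeholders. By default it only prints the commands (a dry run) rather than executing them.

```python
from __future__ import annotations

import shlex
import subprocess


def create_cluster_cmd(cluster: str, region: str) -> list[str]:
    """Command to create a cluster with default settings."""
    return ["gcloud", "dataproc", "clusters", "create", cluster, f"--region={region}"]


def submit_spark_job_cmd(
    cluster: str, region: str, main_class: str, jar: str, *job_args: str
) -> list[str]:
    """Command to submit a Spark job; arguments after "--" go to the job itself."""
    return [
        "gcloud", "dataproc", "jobs", "submit", "spark",
        f"--cluster={cluster}", f"--region={region}",
        f"--class={main_class}", f"--jars={jar}",
        "--", *job_args,
    ]


def delete_cluster_cmd(cluster: str, region: str) -> list[str]:
    """Command to delete the cluster; --quiet skips the confirmation prompt."""
    return [
        "gcloud", "dataproc", "clusters", "delete", cluster,
        f"--region={region}", "--quiet",
    ]


def run_ephemeral_job(
    cluster: str, region: str, main_class: str, jar: str, *job_args: str,
    dry_run: bool = True,
) -> list[list[str]]:
    """Create a cluster, run one Spark job, then delete the cluster.

    The delete step sits in a `finally` block so the cluster is removed
    even if the job fails. Returns every command issued, in order.
    """
    issued: list[list[str]] = []

    def step(cmd: list[str]) -> None:
        issued.append(cmd)
        if dry_run:
            print(shlex.join(cmd))  # show the command instead of running it
        else:
            subprocess.run(cmd, check=True)  # raise if gcloud exits non-zero

    step(create_cluster_cmd(cluster, region))
    try:
        step(submit_spark_job_cmd(cluster, region, main_class, jar, *job_args))
    finally:
        step(delete_cluster_cmd(cluster, region))
    return issued


if __name__ == "__main__":
    # Hypothetical names: the SparkPi example class and the JAR path are assumptions.
    run_ephemeral_job(
        "example-cluster", "us-central1",
        "org.apache.spark.examples.SparkPi",
        "file:///usr/lib/spark/examples/jars/spark-examples.jar",
        "1000",
    )
```

Deleting the cluster in a `finally` block mirrors the article's point about cost: an idle cluster that is left running keeps accruing charges, so the script tears it down whether or not the job succeeds.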
Cloud Dataproc is integrated with the rest of Google's cloud services, including BigQuery, Cloud Storage, Cloud Bigtable, Cloud Logging and Cloud Monitoring. Clusters in the current release run Spark 1.5 and Hadoop 2.7.1.