Google's Cloud Dataproc service exits beta

Now generally available, the Cloud Dataproc service is geared toward open-source users looking to automate the management of their data clusters.


Google said Tuesday that its Cloud Dataproc service is now generally available.

Offered as a managed service via the Google Cloud Platform, Cloud Dataproc is geared toward open-source users looking to automate the management of their data clusters.

Since the service was introduced in September, Google said, it has added features aimed at simplifying data analysis on Hadoop and Spark.

"While in beta, Cloud Dataproc added several important features including property tuning, VM metadata and tagging, and cluster versioning," Google product manager James Malone wrote in a blog post.

The service is similar to offerings already available on other cloud platforms, such as Amazon Web Services and Microsoft Azure, so Google is essentially catching up. But the company is trying to stay competitive on price.

Cloud Dataproc costs 1 cent per virtual CPU per hour in the cluster. Clusters can also include preemptible instances, which have even lower compute prices and reduce costs further. And while many providers round usage up to the nearest hour, Cloud Dataproc bills by the minute, with a 10-minute minimum.
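To make the billing difference concrete, the following Python sketch compares per-minute billing with a 10-minute minimum against hourly rounding, using only the 1-cent-per-vCPU-hour rate quoted above. It assumes a partial minute is rounded up to a whole minute and ignores any separate compute charges, including the lower preemptible prices; the function names and the 24-vCPU, 25-minute scenario are hypothetical.

```python
import math

# Rate quoted in the article: 1 cent per virtual CPU per hour.
RATE_PER_VCPU_HOUR = 0.01
# Minimum billing period quoted in the article, in minutes.
MIN_BILLED_MINUTES = 10


def per_minute_fee(vcpus: int, runtime_seconds: float) -> float:
    """Fee in dollars with per-minute billing and a 10-minute minimum.

    Assumption: a partial minute is rounded up to a whole minute.
    Separate compute (VM) charges are not modeled.
    """
    billed_minutes = max(MIN_BILLED_MINUTES, math.ceil(runtime_seconds / 60))
    return vcpus * RATE_PER_VCPU_HOUR * billed_minutes / 60


def hourly_fee(vcpus: int, runtime_seconds: float) -> float:
    """Same rate, but rounding usage up to whole hours, for comparison."""
    billed_hours = max(1, math.ceil(runtime_seconds / 3600))
    return vcpus * RATE_PER_VCPU_HOUR * billed_hours


if __name__ == "__main__":
    # Hypothetical scenario: a 24-vCPU cluster running a 25-minute job.
    runtime = 25 * 60
    print(f"per-minute: ${per_minute_fee(24, runtime):.2f}")  # per-minute: $0.10
    print(f"hourly:     ${hourly_fee(24, runtime):.2f}")      # hourly:     $0.24
```

Under these assumptions, a short job costs less than half as much with per-minute billing; for jobs that run for many hours, the relative difference shrinks.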

Google said companies can use Spark and Hadoop clusters without the assistance of an administrator or special software. Instead, they can interact with clusters and with Spark or Hadoop jobs through the Google Developers Console, the Google Cloud SDK, or the Cloud Dataproc REST API. When a cluster is no longer in use, it can be shut down to avoid needless spending.
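For readers curious what working with the REST API might look like, the following Python sketch builds, but does not send, the two requests that bracket a cluster's life: creating it, and deleting it once it is no longer needed, which is how this sketch models turning a cluster off. The endpoint paths and JSON field names are assumed to follow the Cloud Dataproc v1 REST API and should be checked against Google's API reference; authentication (an OAuth 2.0 access token) is deliberately left out. The helper functions and the project, region, cluster name, and machine type are hypothetical.

```python
import json
import urllib.request

# Assumed root of the Cloud Dataproc v1 REST API; verify against Google's reference.
API_ROOT = "https://dataproc.googleapis.com/v1"


def create_cluster_request(project: str, region: str, name: str,
                           workers: int, machine_type: str) -> urllib.request.Request:
    """Build (but do not send) a hypothetical request that creates a small cluster.

    Field names (masterConfig, workerConfig, numInstances, machineTypeUri)
    are assumptions about the v1 API's JSON schema.
    """
    body = {
        "projectId": project,
        "clusterName": name,
        "config": {
            "masterConfig": {"numInstances": 1, "machineTypeUri": machine_type},
            "workerConfig": {"numInstances": workers, "machineTypeUri": machine_type},
        },
    }
    url = f"{API_ROOT}/projects/{project}/regions/{region}/clusters"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_cluster_request(project: str, region: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a hypothetical request that deletes an idle cluster."""
    url = f"{API_ROOT}/projects/{project}/regions/{region}/clusters/{name}"
    return urllib.request.Request(url, method="DELETE")


if __name__ == "__main__":
    # Hypothetical identifiers; no network call is made and no credentials are attached.
    req = delete_cluster_request("my-project", "global", "example-cluster")
    print(req.get_method(), req.full_url)
    # DELETE https://dataproc.googleapis.com/v1/projects/my-project/regions/global/clusters/example-cluster
```

Keeping request construction separate from sending makes it easy to inspect what would be submitted before attaching credentials and issuing the call with an HTTP client of your choice.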