/>
X

Google BigQuery: Self-service cloud data analysis, from your iPad or desktop

Google made its BigQuery service publicly available last month. So I decided to put it through its paces, and compare it to Microsoft’s Excel and PowerPivot.
andrew-brust.jpg
By Andrew Brust, Contributor on
6367072.jpg
1 of 11 Andrew Brust/ZDNet

Google made its BigQuery service public on May 1st.  Though the name sounds like “Big Data,” Google’s offering is really a self-service Business Intelligence (BI) solution, hosted in the cloud.  In this gallery, I'll tell you how it works.

BigQuery is a cloud-based data analytics system from Google.  It lets you upload data, then analyze it using SQL (Structured Query Language) as the query interface.  BigQuery lets you query up to 100GB of data per month for free.  Just create a Google Developer account for yourself, create a new project and enable BigQuery within it.

Click on any image to enlarge.

Source: Google

6367073.jpg
2 of 11 Andrew Brust/ZDNet

BigQuery offers a simple, easy to master browser console, providing for dataset browsing on the left and SQL querying on the right.  When queries are returned, options appear to let you save the results in alocal CSV file or create a new table containing the result set’s content.

A simple query against the sample github_timeline table is shown here.  The first few rows from the results of the query appear on the bottom-right of the screen, along with navigation controls that allow you to page through the data.  Note the “Save as Table” and “Download as CSV” options, which work nicely in Chrome and FireFox.  Unfortunately, the "Save as Table" option is not available in Internet Explorer (nor is a file upload option we'll look at shortly).  Everything in BigQuery also works nicely in Safari on the iPad, though you can’t save or upload local files there either.

6367074.jpg
3 of 11 Andrew Brust/ZDNet

BigQuery data is stored in tables, much as in a relational database.  Tables, in turn, are stored within datasets.  Datasets serve as a unit of security, allowing for sharing with specific users or the overall public.  Google supplies the publicdata:samples dataset which is added to every BigQuery project.  This allows you to examine and query tables right away.

At the left side of the screen, you can drill down on a dataset to see the tables it contains.  Select one of those tables and its schema will appear on the right-hand side of the console.  The github_timeline table’s schema is displayed in this figure.  Notice that its label on the left side of the screen appears in bold text with a red bar beside it.  The “Click to preview table data” link does what it says, but you can also write your own SQL queries.
 

6367076.jpg
4 of 11 Andrew Brust/ZDNet

To run a Query, type it in, then click the “RUN QUERY” button or just tap Ctrl-Enter on your keyboard.  While the query is running, the query text area is disabled and the elapsed query time clock runs up, right next to the “Query running” label.

BigQuery does not permit “SELECT *”-style queries; instead, you must specify all column names.  And although you’ll be querying large datasets, you will want to keep your result sets small.  To do that, make use of aggregating queries (using aggregate functions and GROUP BY) and/or the LIMIT n clause at the end of your query as was done here (i.e. “LIMIT 200” appears at the end of the query).

Tables are identified using a syntax of datasetname.tablename.  If you reference any table from the samples dataset, you’ll need to use the “publicdata:” prefix before the “samples” dataset name.

6367077.jpg
5 of 11 Andrew Brust/ZDNet

BigQuery isn’t just for big desktop and laptop computers.  It runs very well on the iPad, for example, as shown here.  And BigQuery is hip to tablet use too: rotate your iPad from landscape to portrait and the page rendering adjusts, showing you several extra rows of data (illustrated nicely here).

6367078.jpg
6 of 11 Andrew Brust/ZDNet

You can create your own tables too, of course.  Just hover over your dataset and click the “+” sign that appears to the right of its name to bring up the “Create Table” form, shown above.  In the form, you need only supply an ID (name) for the table, its schema (expressed as a list of column names and data types) and point to the source file containing the data.  Then click OK.

If the file is 10MB or below in size, you can select it right from your own computer’s hard drive. If it’s bigger, you’ll need to push it up to Google Cloud Storage first, and then supply a link to the file, using “gs://” at the beginning of the URI, as seen here.

BigQuery assumes it will be importing from a CSV (comma separated values) text file with no initial row containing column names.  If your file uses a non-comma delimiter, or its first row contains column names instead of data, you can tell BigQuery what delimiter to expect and to skip the first row (or first several rows).

This particular CSV file has historical baby name data for six states in the USA.  The data was taken from the U.S. Social Security Administration’s namesbystate.zip data collection file, which contains data for all fifty states, each one in a separate file.  The single file I built with the just the six states’ data nonetheless has over 1 million rows.
 

6367079.jpg
7 of 11 Andrew Brust/ZDNet

While your table’s being created, you can monitor the progress of the data load task by clicking the “Job History” link at the upper-the left of the screen.  This brings up the "Recent Jobs" screen pictured here.  Despite the reference to “history” in the link, you can view running jobs too.  Click one and you’ll get the detail shown here.

6367080.jpg
8 of 11 Andrew Brust/ZDNet

Of course, if you’ve got data in CSV format, nothing stops you from opening it in Excel.  From there, you could save the data in Excel’s own file format, then start crunching its numbers.

6367081.jpg
9 of 11 Andrew Brust/ZDNet

Google describes BigQuery as a column store database that works well for OLAP (OnLine Analytical Processing) queries.  Microsoft PowerPivot, a free add-in for Excel 2010, precisely fits that description as well and can import from CSV files just fine.  Here’s what the data looks like once it’s in PowerPivot.  Throwing large datasets in PowerPivot will give you far better performance than storing the same data directly in a spreadsheet.

One very powerful aspect of PowerPivot is its use of data compression.  The CSV file containing the six states’ baby data was over 20MB. The Excel workbook containing the PowerPivot model with that same data was less than 1MB.

6367082.jpg
10 of 11 Andrew Brust/ZDNet

Once your data is in PowerPivot, you can easily visualize it back in Excel using PivotTables, charts or both.  Here we see a chart bound to the PowerPivot model with our baby name data, featuring several “slicers” above and to the left of the chart.

Slicers let you filter your data and see the chart update automatically.  Here we have filtered the query to show data only for the names Abigail and Allison, in the states of AZ, CA, CO and NY, between 1995 and 2005.  The slicer at the top contains only girls’ names because we selected “F” (female) in the "Gender" slicer toward the upper-left.

6367083.jpg
11 of 11 Andrew Brust/ZDNet

Here’s the same result in BigQuery.  There’s no visualization of the data and rather than just clicking slicers, the WHERE clause in the query had to be written carefully to get the same result.  Query times were comparable with PowerPivot's.

Going the PowerPivot route offers a lot of advantages and will likely make more sense for many users.  But BigQuery is cloud-based and can be accessed from any Internet-connected computer or device.

Related Galleries

Linux turns 30: The biggest events in its history so far
05-debian.jpg

Related Galleries

Linux turns 30: The biggest events in its history so far

31 Photos
Say hello to the early days of web browsers
netscape-shutterstock-189041855.jpg

Related Galleries

Say hello to the early days of web browsers

9 Photos
Parallels Toolbox 5.0 for Windows and Mac, in pictures
Mac Dashboard

Related Galleries

Parallels Toolbox 5.0 for Windows and Mac, in pictures

12 Photos
Parallels Toolbox 4.5 for PCs, M1 and Intel Macs
Parallels Toolbox

Related Galleries

Parallels Toolbox 4.5 for PCs, M1 and Intel Macs

39 Photos
Parallels Toolbox 4 for Windows and Mac
parallels-toolbox-show-desktop-macos-screenshot

Related Galleries

Parallels Toolbox 4 for Windows and Mac

10 Photos
How to perform a clean install of Windows 10: Here's a step-by-step checklist
00-before-you-start.jpg

Related Galleries

How to perform a clean install of Windows 10: Here's a step-by-step checklist

17 Photos
Linux survival guide: These 21 applications let you move easily between Linux and Windows
apps-for-linux-and-windows.jpg

Related Galleries

Linux survival guide: These 21 applications let you move easily between Linux and Windows

22 Photos