Getting big data right is about more than the size of your database

Big data isn't about size; it's about scale - and that might be bigger than your business.
Written by Mary Branscombe, Contributor

Every few months the argument about how 'big' data needs to be in order to count as 'big data' rolls around again. As a term becomes more familiar it also gets misused, and it's easy to think of big data as simply a technology that makes a data warehouse achievable even for a small business.

That misses the point, though.

Big data isn't big because you have lots of it. It's big because it spans many areas where you can find insights that your regular data set - however large - doesn't reach.

You can do some clever things using only your own data when you have enough of it. JJ Food Service Limited took three years of sales data from Microsoft Dynamics AX (25 million transactions, totalling 6GB), plus details of how customers use its ordering website, and put them through the Azure Machine Learning service. The result is a system that pre-populates the shopping cart for each customer, making it faster for them to place an order - and less likely that they'll forget something and buy it elsewhere.
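The article doesn't describe how the Azure ML model behind this works. As a rough illustration of the idea only, here is a minimal Python sketch that swaps in a much simpler, hypothetical heuristic: suggest the products a customer has bought in at least a given share of their past orders. The function name `suggest_cart` and the `min_share` threshold are illustrative assumptions, not JJ Food Service's or Azure ML's actual method.

```python
from collections import Counter


def suggest_cart(past_orders, min_share=0.5):
    """Hypothetical frequency heuristic (not the Azure ML model): return
    products that appear in at least min_share of a customer's past
    orders, most frequent first.

    past_orders: list of orders, each an iterable of product IDs.
    """
    if not past_orders:
        return []
    # Count how many orders each product appears in (not total quantity),
    # so a product listed twice in one order still counts once.
    counts = Counter()
    for order in past_orders:
        counts.update(set(order))
    threshold = min_share * len(past_orders)
    # Sort by frequency (descending), then by product ID for a stable result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, n in ranked if n >= threshold]


if __name__ == "__main__":
    orders = [
        ["flour", "oil", "rice"],
        ["flour", "oil"],
        ["flour", "sugar"],
        ["oil", "rice", "flour"],
    ]
    print(suggest_cart(orders))  # prints ['flour', 'oil', 'rice']
```

A trained recommendation model would also weigh things like seasonality, order timing and what similar customers buy; this sketch only captures the "you usually order this" part of the idea.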

If you have data about your business, it's a waste not to try running predictive analytics or a recommendation engine to see what you can find out that you don't already know. Azure ML is like the Excel macros of business intelligence: you'll need to learn how to use it, but you don't need a PhD and two years' experience programming in R to get useful analysis out of it.

Offering this as a service also makes it easier to take the second step of big data: once you've found interesting results that you didn't already know about, automating them as part of your business. At this point they stop being big data and turn into regular old business insights, so you don't want to have to do a lot of custom processing to use them: they need to fit in with the way you run your other business processes.

At the very least, stop using email quotas that force employees to delete old mail or move it into PST files, which get lost when they upgrade computers or even leaked if they lose a laptop. Instead, turn that mail into an asset they can search.

An archive service like Mimecast (which plugs into Exchange and Office 365), or Office 365 tools like Delve - or even the search box in Outlook Web App - turns old mail into a way to find out who you have relationships with.

Old emails and documents aren't cruft that forces you to buy more drives for storage; they're a huge potential source of big data once we get more tools for extracting information from unstructured and semi-structured data. Those tools are still mostly pipe dreams and research projects, but the sooner you start thinking of business information of all kinds as a resource rather than a millstone, the more you'll get out of the big data future.

But the most interesting big data often turns out to be correlations with information that your business doesn't already have. If you want to know whether someone will be a good insurance risk, someone who will pay their premiums on time and not make many claims, what you really want to know is whether they use scuff protectors on their chairs - because people who use scuff protectors are good insurance risks. (That raises some interesting privacy questions: your insurance company isn't going to tell you that it's charging you a higher rate because you don't use scuff protectors.)

Similarly, if you want a quick feel for how healthy a business is, find out how many parcels it sends and receives (and whether that has changed significantly over time). Most businesses receive supplies and ship products, so changes in the volume of parcels going in and out of a company tell you a lot about how the business is doing, which is just what you want to know if you're deciding whether to give it a loan.
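As a rough sketch of the kind of signal described above, assuming you have monthly counts of parcels sent plus received for a business, one simple measure is the percentage change in average monthly volume between an earlier window and a recent window. The function name `parcel_volume_change`, the three-month default window and the example figures are hypothetical; a real lender or shipping analytics service would almost certainly use richer models.

```python
from statistics import mean


def parcel_volume_change(monthly_counts, window=3):
    """Hypothetical health signal: percentage change in average monthly
    parcel volume (sent plus received) between the first `window` months
    and the last `window` months of the series.
    """
    if window <= 0 or len(monthly_counts) < 2 * window:
        raise ValueError("need at least two non-overlapping windows of data")
    earlier = mean(monthly_counts[:window])
    recent = mean(monthly_counts[-window:])
    if earlier == 0:
        raise ValueError("earlier window has no parcels; change is undefined")
    return (recent - earlier) / earlier * 100.0


if __name__ == "__main__":
    # Illustrative made-up figures: average volume falls from 100 to 75 a month.
    print(parcel_volume_change([100, 110, 90, 80, 70, 75]))  # prints -25.0
```

Comparing window averages rather than single months smooths out one-off spikes, such as a seasonal rush, that would otherwise look like a sudden change in the business.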

One large package shipping company, which was already analysing all its shipping information to put warehouses in the right places and recruit enough couriers, realised this. It started a side business that takes all that package history, runs analytics on it and provides the results to financial companies that manage loans and credit ratings.

That's reminiscent of what might be the earliest known use of big data: setting petrol prices.

When a commercial petrol card company realised that what it had wasn't just a record of what customers owed at the end of the month, but a huge geographical database of what petrol cost in locations all over the UK, it went to Esso and asked whether the oil company would like to know exactly how much it could charge and still have the cheapest petrol locally.

Esso jumped at the opportunity. Now all the oil companies buy feeds of pump prices, but for several years - while Esso had exclusive access to the data - it had a big advantage over its competitors.

The lesson is that what you most need to know about your market or your customers may not be in the data you have - and that the data you have might contain information some other company could use.
