India aims for world's big data

Growing demand for big data skills will see companies looking to outsourcing to plug the gap, presenting opportunities for India. But it must first address several fundamental challenges.
Written by Mahesh Sharma, Correspondent

Growth in India's data analytics market can hasten the adoption of big data technologies among local organizations, which significantly lag their global counterparts, but the country faces challenges in terms of infrastructure and data collection.

In its report "Big data: The next big thing", Indian IT services industry group Nasscom expects the country's big data industry to grow from US$200 million in 2012 to US$1 billion in 2015. The biggest challenge--and opportunity--is to satisfy the demand for data scientists. Avendus Capital, for one, estimates the United States will suffer a shortage of up to 200,000 data scientists by 2018, a gap that will most likely be filled by outsourcing.

Big data typically encompasses a wider range and more varied forms of data than traditional analytics, which is usually limited to specifically structured data.

Numerous global big data players have sprouted in India over the past two years. Sears, for instance, established wholly-owned subsidiary MetaScale to service healthcare and entertainment customers with revenues between US$1 million and US$10 million. A @WalmartLabs facility also opened in Bangalore in April to develop social media and analytics and big data infrastructure. And in July, Yahoo set up a grid computing lab at the IIT-Madras campus.

On the supply side, India churns out more than 2.5 million university graduates and 750,000 post-graduates every year, of whom 700,000 graduate in maths and science.

However, local demand for big data remains poor. Indian companies, specifically in telecom, retail, and banking, face hurdles in tapping business insights by analyzing large amounts of unstructured data.

"India's domestic demand for big data analytics is at a nascent stage, since most Indian organizations still consider big data as mere hype," the Nasscom report noted.

Tapping Indian resources to meet world demand
Big data services provider Mu Sigma, though, has already tapped Indian resources to meet U.S. demand for data analytics. Three quarters of its 2,500 data scientists are based in Bangalore, India, and according to founder Dhiraj Rajaram, its clients include 75 Fortune 500 companies, among them Microsoft.

This army of Indian analysts starts crunching numbers once they graduate from Mu Sigma University, a three-month "MBA for data analysts", Rajaram said. India, he added, produces a unique breed of analysts. "You need people in analytics who have a combination of maths, business, and technology, and there's no place in the world that's more suited to produce this sort of talent than in India."

Raised and educated in India, Rajaram founded the company in 2005 in Chicago, where he studied at the University of Chicago and subsequently worked as a consultant at Booz Allen Hamilton. His then-employer could not accommodate his idea for a "bionic man" big data ecosystem, which he said aimed to solve complex problems for businesses through a combination of processes, software, and human analysis. He decided then to strike out on his own.

In late 2011, Mu Sigma secured US$108 million in Series C round funding, on top of an earlier US$25 million investment from Google investor Sequoia Capital.

According to Rajaram, companies can analyze data to ease India's infrastructure and information bottlenecks. "In a developing country like India, there is so much inefficiency because we don't have the right infrastructure, the right government, [and] we have corruption," he said. "Information doesn't clearly flow from producers to suppliers to consumers."

"Big data and analytics can infuse behavior and intelligence that make these processes more efficient," he noted.

The Indian corporate customer is not very mature and has very little demand for such sophisticated information analysis, he said. This is similar to other geographies, he added, but Indian companies need to capture and store data with the same rigor as their American counterparts.

"Certain businesses will have to mature," Rajaram said. "It's not just an issue of skillset, toolset, or data set. It's a mindset issue. Indian CEOs must have a mindset that data matters and being information-efficient matters."

BMC Software APAC CTO Suhas Kelkar leads the IT vendor's research and development facility based in Pune, just outside of Mumbai. He said large Indian companies struggle to satisfy the huge market demands of 1 billion people, and do not have the capability, or incentive, to use big data to grow profits or revenues.

This, though, will change due to one major factor: retail foreign investment.

According to Kelkar, U.S. retail giant Walmart will soon open its first chain of stores in the Asian country, after the Indian government opened its doors to foreign investment. Indian consumers will soon benefit from the technology-optimized service and logistics operations which have made Walmart one of the world's most valuable companies, he said.

He believes that within two to three years, Indian companies can use new technologies to operate more efficiently.

Data first needs to be scrubbed
Meanwhile, more effort is needed to ensure the data gathered is accurate, noted New Delhi-based freelance data scientist Vivek Sharma, who is ranked third on the popular big data crowdsourcing site Kaggle.

Sharma won a US$10,000 prize from Kaggle for formulating algorithms and methods to efficiently use a dataset to solve a specific problem--in this case, to "scrub the filth of the Internet away in one pass".

The Kaggle model, which has facilitated 70 competitions, is much "looser" than the systematic consulting approach, where a contract defines exactly what will be delivered, how this will be done, and who will do it. Instead, Kaggle attracts data scientists keen to use their own methods, knowledge, and flair to solve a problem.

Sharma, who previously worked at Goldman Sachs in New York, wanted to test different ideas he read about in books and rank himself against his peers.

However, his model will not work in India today because data is still not reliable. For example, GDP statistics were collected and stored in an Excel spreadsheet which was constantly edited. "They might have said inflation was 7 percent last quarter, but next quarter they'll revise that to 9 percent...[without] tracking why it's 9 percent, and not 7 percent, and what went wrong," Sharma explained.

The Indian government has addressed the issue, he said, but it will take two to five years before data collection techniques are clean and can produce more accurate information. 

"Those challenges are much more fundamental," Sharma said, pointing to the need to set up the basic infrastructure and establish a good data-collection system.

Once the Indian government and local businesses provide free access to clean data, the country can realize the huge opportunities in data analytics.

"Some of [the] big data challenges in India are bigger than big data in the United States," Sharma said. "A mobile carrier in India will need to use much more advanced techniques than a U.S. company just because the economics are different--in India, the scale is huge."

Mahesh Sharma is a freelance IT writer based in Australia.
