Going data-driven on a budget

How can you think and do data analysis globally, aiming to act and have an impact locally, when the data you need is scattered and incomplete and your resources are limited? And what does an analysis on the affordability of data plans around the world show?

The Web Foundation is an organization many people are familiar with, partly because it is led by Sir Tim Berners-Lee, who is credited with inventing the Web, and partly because of its central role in the Web's development.

Although the Alliance for Affordable Internet (A4AI) is not as well known, this coalition of organizations is led by the Web Foundation and its mission is a complementary one: to advocate for policies for affordable internet access everywhere in the world.

A4AI is a data-driven organization, collecting, integrating, and analyzing data on a global scale while working on a budget. In a way, this is fitting for an organization that advocates on behalf of those with little or no access to data. Case in point: recent results from A4AI show that the majority of the world's population does not have access to affordable internet.

The process of concretely defining and measuring something as vague as affordability, and using this as an instrument to communicate and advocate change on a global scale while working with limited resources, is one that may have interesting lessons to teach. ZDNet discussed this with Dhanaraj Thakur, Senior Research Manager at A4AI.

Defining affordability and lying with statistics

To begin with, what does affordability mean and who gets to define it? As Thakur explained, the working definition of affordability as proposed by the UN, or more specifically, the ITU, was that internet in a country is affordable if 500 MB of mobile data access for one month does not cost more than 5 percent of a person's income.

That's not a very good definition though, for a number of reasons. To begin with, as Thakur points out, 500 MB is hardly adequate -- you can easily spend it all by watching one video online. The 5 percent threshold is not a very good one either. Why?

Because, as Thakur says, doing a percentile analysis on income for countries where data is available reveals something interesting. If we apply the 5 percent threshold to average income in a country, it may seem like this criterion is met, and therefore, as per the above definition, internet access is affordable. But what does average income mean?

To give a simplistic example, if a country's population consists of 10 people, one of whom has an income of 1 million while each of the remaining nine has an income of 1, the average income in that country is about 100K. That is in no way representative of the income distribution in that fictitious country.

Using the wrong metrics in the wrong context and interpreting them erroneously has been called lying with statistics, and average income is clearly not a good indicator of the buying power of the majority of a population. Any data-literate person realizes that, and the people at A4AI are no exception.
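The gap in this toy example can be checked in a few lines. The sketch below is only an illustration, with variable names of our own choosing rather than anything A4AI uses. It compares the average with the median, a statistic not mentioned above but commonly used for skewed distributions like this one.

```python
from statistics import mean, median

# Fictitious country from the example: one person earns 1 million,
# the other nine earn 1 each.
incomes = [1_000_000] + [1] * 9

average_income = mean(incomes)   # pulled up by the single high earner
median_income = median(incomes)  # income of the "middle" person

print(f"average: {average_income:,.1f}")
print(f"median:  {median_income:,.1f}")
```

The average lands near 100K while the median stays at 1, which is what nine out of ten people in this country actually earn.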

This is why they tried to come up with a more realistic metric, and ended up using what they call "1 for 2": for A4AI, internet access in a certain country is affordable if 1 GB of data over the period of one month does not cost more than 2 percent of the average national income.

That's not a perfect metric either, but as Thakur says, their data analysis showed it comes closer. 1 GB is still not a whole lot of data to go by, considering the average use at the moment is closer to 2.5 GB. And then there is still the dreaded "average" there. So why not use a more realistic cap on data, and segmentation criteria such as percentiles?

Data in and of the developing world

Thakur explains that the data A4AI uses for income comes from the World Bank (WB), and the WB does not publish detailed data on income distribution. Why that is the case is a question for the WB, but that's just how the situation is at the moment.

As for the 1 GB cap, Thakur said they considered this good enough for developing countries, which is what A4AI's focus is on. But how does A4AI get pricing data for 1 GB data plans around the world, and how is that combined with average income to calculate the affordability metric?

The data collection part is, as Thakur explains, the biggest part of this effort. It is at this point a manual task which consists of many steps. Researchers have to initially identify every data plan provider in the countries of interest. Then for each provider they have to identify all their data plans, find the ones that are at least 1 GB per month, and choose the cheapest one among them as the basis for calculating the metric.
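The selection step described above can be sketched in code. This is a minimal illustration under our own assumptions: the `DataPlan` structure, the provider names, and the prices are all hypothetical, and A4AI's actual process is, as noted, manual.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataPlan:
    provider: str
    name: str
    monthly_data_gb: float
    monthly_price: float  # in local currency


def cheapest_qualifying_plan(plans: List[DataPlan],
                             min_gb: float = 1.0) -> Optional[DataPlan]:
    """Return the cheapest plan offering at least min_gb per month, if any."""
    qualifying = [p for p in plans if p.monthly_data_gb >= min_gb]
    return min(qualifying, key=lambda p: p.monthly_price, default=None)


# Hypothetical plans collected for one country.
plans = [
    DataPlan("Provider A", "Mini", 0.5, 3.0),
    DataPlan("Provider A", "Standard", 1.0, 6.0),
    DataPlan("Provider B", "Basic", 2.0, 5.5),
    DataPlan("Provider B", "Plus", 5.0, 12.0),
]

basis = cheapest_qualifying_plan(plans)
print(basis)
```

Note that the 0.5 GB plan is cheaper overall but is filtered out, so the 2 GB plan at 5.5 becomes the basis for the metric in this made-up market.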

Again, this is ITU methodology, and far from perfect. For one, it fails to account for market share. So if, for example, data plan X is the cheapest in country C but is only used by 1 percent of the population, it still forms the basis of the calculation.

Data analysis based on percentiles, such as the one shown here, is one way to overcome loss of information in metrics such as average. But this requires data availability and more thorough analysis. Image: A4AI

Pricing data is by nature highly volatile, so unless it is updated regularly, calculations will soon be out of date. Even if A4AI manages to keep the pricing data up to date, the WB data is only updated once per year.
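To see how much market share could matter, here is a rough sketch comparing the cheapest price with a market-share-weighted average price. The weighting is our own illustrative alternative, not something A4AI or the ITU is described as using, and the plans, prices, and shares are invented.

```python
# Hypothetical qualifying plans in country C: (name, monthly price, market share).
plans = [
    ("X", 5.0, 0.01),   # cheapest, but used by only 1 percent
    ("Y", 9.0, 0.59),
    ("Z", 11.0, 0.40),
]

# Cheapest-plan approach: market share is ignored.
cheapest_price = min(price for _, price, _ in plans)

# Alternative: weight each plan's price by its share of users.
total_share = sum(share for _, _, share in plans)
weighted_price = sum(price * share for _, price, share in plans) / total_share

print(f"cheapest price:        {cheapest_price:.2f}")
print(f"share-weighted price:  {weighted_price:.2f}")
```

With these made-up numbers, what people typically pay is close to double the cheapest price, so an affordability figure based on plan X alone would look much rosier than the market as a whole.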

To give a real-world example, let's consider what a back-of-the-envelope calculation would show for Germany. Although Germany, like the US, is not listed as a developing country and is not monitored by A4AI, it is estimated that over 7 million people in Germany work in mini-jobs, earning no more than 450 euros per month.

The cheapest 1 GB data plan in Germany at this time costs 10 euros per month. The average income for Germany according to WB data is a little under 45K euros per annum, or about 3.75K euros per month, so the plan costs roughly 0.27 percent of average monthly income and internet in Germany would be considered affordable based on A4AI's definition. But the same calculation for the 450 euro per month earners shows that the cheapest 1 GB data plan costs about 2.2 percent of their monthly income, which is above the 2 percent threshold.
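The arithmetic can be reproduced with a short script. It is a sketch using the figures quoted above; the function and variable names are ours, and 45,000 euros is a rounded stand-in for the "a little under 45K" WB figure.

```python
THRESHOLD_PCT = 2.0  # A4AI's "1 for 2": 1 GB for at most 2 percent of monthly income


def share_of_income_pct(plan_price: float, monthly_income: float) -> float:
    """Price of the plan as a percentage of monthly income."""
    return 100.0 * plan_price / monthly_income


plan_price = 10.0  # cheapest 1 GB plan, euros per month

monthly_incomes = {
    "average earner": 45_000 / 12,  # rounded annual figure spread over 12 months
    "mini-job earner": 450.0,
}

results = {}
for label, income in monthly_incomes.items():
    pct = share_of_income_pct(plan_price, income)
    results[label] = pct
    verdict = "affordable" if pct <= THRESHOLD_PCT else "not affordable"
    print(f"{label}: {pct:.2f}% of monthly income -> {verdict}")
```

The same plan passes the test against the national average and fails it for a mini-job earner, which is the problem with averages in a nutshell.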

Data analysis on a budget

Thakur is very much aware of the ways in which A4AI's analysis is imperfect. He acknowledges the shortcomings, but he views A4AI's efforts as a first step towards advocacy for affordable internet access.

Thakur is also pragmatic as to how much A4AI can achieve with its current resources. A4AI has a total of 35 people in its workforce, and considering all the administrative, advocacy and education tasks it is involved in, there are no more than 5 or 6 people left to do the actual research and analysis.

Taking into account what coming up with the findings described here entails, and keeping in mind that these are just a small part of the work A4AI does, may give an idea of the challenges A4AI has to deal with.

A4AI's target audience is mostly policy makers, and its main instrument for advocacy is the Affordability Report. This report features policy indicators derived from A4AI's surveys and analyses, and supply side indicators based on data from sources such as WB, ITU, and GSMA.

Collecting and assessing those metrics requires a lot of manual work, and this is just the beginning of the analysis. All the metrics that A4AI uses are available through a self-service data portal, but according to Thakur, most policy makers are mainly interested in reports and policy recommendations focused on their own country.

One of the recommended policies for internet affordability, shared infrastructure, could also make sense for NGOs working on related topics. Image: A4AI

These reports are currently compiled manually as well, so A4AI is looking into ways of automating this data-driven storytelling, at least to some extent. There are other parts of A4AI's work that could also be automated, mostly in terms of data collection. Thakur also mentions replacing costly offline surveys with online ones.

The biggest impediment to A4AI's work, according to Thakur, is the lack of publicly available high-quality data. Thakur emphasizes that producing such data is a key part of A4AI's mission, especially considering the fact that a big part of its funding comes from public sources.

In the end, many of the challenges faced by initiatives working in similar domains, such as Mozilla's Internet Health Report or the Global Open Data Initiative, overlap. Some of the challenges are technical, others have to do with resources and priorities. An overarching initiative may have something to offer in terms of technical solutions, but a change in priorities and funding distribution would still be essential to move forward with this kind of work.
