Govt, hurry up with releasing data

Summary: A programmer scraped data from the My School website to make some really cool heat maps showing regions of smart schools — no thanks to the government, which didn't supply the data in any useful kind of format.

Joel Pobar, a former Microsoft employee, showed on his blog how he combined the scraped data with Google Maps to produce heat maps of which regions had the best-performing schools.

He marked schools across the state as either green or red on Google Maps, depending on how good each school's average was. The result was an interesting map that, at first glance, revealed some telling patterns visually, such as city schools having better averages than rural schools.
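To make the green/red idea concrete, here is a minimal Python sketch of how a school's average might be turned into a marker colour. Pobar's post doesn't describe his code or thresholds, so the function names, the benchmark comparison and the score range below are assumptions for illustration only.

```python
# Hypothetical sketch: turn a school's average into a map-marker colour.
# The benchmark and the low/high range are illustrative assumptions, not
# values from My School or from Pobar's code.


def marker_colour(average: float, benchmark: float) -> str:
    """Two-colour version: green at or above the benchmark, red below it."""
    return "green" if average >= benchmark else "red"


def gradient_colour(average: float, low: float, high: float) -> str:
    """Heat-map version: blend linearly from red (at `low`) to green (at `high`).

    Returns an RGB hex string such as "#ff0000".
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    # Clamp to [0, 1] so scores outside the range still get a valid colour.
    t = max(0.0, min(1.0, (average - low) / (high - low)))
    red = round(255 * (1 - t))
    green = round(255 * t)
    return f"#{red:02x}{green:02x}00"
```

A two-colour split is the easiest to read at a glance; the gradient shows how far above or below the range a school sits, which is closer to a true heat map.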

NSW with the My School data applied
(Credit: Joel Pobar)

But these images weren't easy to produce. Pobar had a lot of trouble getting the raw data (data straight from its original source). He ended up "scraping" the data from the My School website, something he likely didn't have permission to do.

Data scraping, as defined by Wikipedia, is a technique in which a computer program extracts data from human-readable output (such as a web page).
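As a rough illustration of the technique, the sketch below uses Python's standard-library `html.parser` to pull the cell text out of an HTML table. The page fragment, column names and the `TableScraper` class are hypothetical; this is not the My School site's markup or Pobar's scraper.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Hypothetical scraper: collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None   # cells of the row being read
        self._cell: list[str] | None = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (indentation, headings, etc.) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical fragment standing in for a results page.
SAMPLE_HTML = """
<table>
  <tr><th>School</th><th>Suburb</th><th>Average</th></tr>
  <tr><td>Example Public School</td><td>Sydney</td><td>512</td></tr>
  <tr><td>Sample High</td><td>Dubbo</td><td>478</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
scraper.close()
rows = scraper.rows
```

After `feed()` and `close()`, `rows` holds one list of strings per table row, header first. Everything comes back as text, so turning "512" into a number is a separate step.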

In Pobar's case, he needed to extract data from the My School site and export it into a format his code could understand. It is, however, something most programmers would rather avoid, because if the data isn't extracted correctly you can end up with all sorts of nasties.
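One way to keep those nasties out is to validate the scraped rows before exporting them. A minimal sketch, assuming the rows arrive as lists of strings with a hypothetical School/Suburb/Average layout: it checks the header, rejects rows with the wrong number of cells or a non-numeric score, and writes the rest out as CSV with the standard `csv` module.

```python
import csv
import io

EXPECTED_HEADER = ["School", "Suburb", "Average"]  # hypothetical column layout


def clean_rows(rows):
    """Validate scraped rows and convert the score column to a number.

    Returns (good_rows, rejects) so problems are reported rather than silently exported.
    """
    first = rows[0] if rows else None
    if first != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {first!r}")
    good, rejects = [], []
    for row in rows[1:]:
        if len(row) != len(EXPECTED_HEADER):
            rejects.append(row)  # a missing or extra cell shifts every column after it
            continue
        school, suburb, average = row
        try:
            score = float(average.replace(",", ""))
        except ValueError:
            rejects.append(row)  # e.g. "n/a" or stray text picked up from the page
            continue
        good.append([school, suburb, score])
    return good, rejects


def to_csv(rows):
    """Serialise cleaned rows, with the header, as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(EXPECTED_HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical scraped input, including two malformed rows.
scraped = [
    ["School", "Suburb", "Average"],
    ["Example Public School", "Sydney", "512"],
    ["Sample High", "Dubbo", "n/a"],
    ["Broken Row", "478"],
]
good, rejects = clean_rows(scraped)
csv_text = to_csv(good)
```

Returning the rejects instead of dropping them quietly makes it easier to tell whether the scraper or the source data is at fault.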

The scraping process took him around four hours, four hours of his life he could have had back if the government had provided the data for developers to use. "Why didn't the government just offer up the raw data and let the programmers of Australia mash it up ... or at least give me a feed of the raw data to save me some time," Pobar said on his blog.

You see, government website data is not, by default, licensed under a Creative Commons licence (oh, how nice that would be!). Although we pay taxes to the government, we don't own the information it produces: that data is Crown data, which we need permission to reproduce. So if Pobar wished to publish his work, he would need to seek permission to do so. If he wanted to earn money from the work, well, that's another kettle of fish.

The My School website's copyright statement says:

Copyright in the content and design of this website, including publications and logos, is owned by or licensed to the Australian Curriculum, Assessment and Reporting Authority (ACARA).

Subject to uses permitted under the Copyright Act 1968, you may only download, display, print and reproduce this material in unaltered form only for your personal, non-commercial educational use or non-commercial educational use within your organisation. However, unless otherwise indicated, this permission does not extend to reproduction, communication to the public, publication or other use of the work (in whole or in part) on an external website, intranet site or equivalent media.

This is an issue the Government 2.0 Taskforce attempted to fix late last year by creating a contest designed to entice programmers to use government data.

As part of the competition, it also launched a new website, data.australia.gov.au, which lists a whole bunch of raw data sets available for people to use.

This is a great step forward, but we need more of it. At the time of writing, the most recent data release on the site was dated 11 December. That's last year! Also, some of the data wasn't raw: it came as Excel spreadsheets rather than comma-separated files, which makes it harder to use in mashups. That definitely needs improvement.

Of course, I understand that not all data can be released.

I was at a mashup event last year where an Australian Bureau of Statistics employee faced down angry developers calling for the release of data. He said that if the bureau were to release most of its raw data, people could use it to work out sensitive information about other individuals or companies.

"Some people are that brilliant that they can work out how much companies earn, what their profit margins are and all of that — and that's something that we have to kind of avoid," said Anthony Zuza, quality assurance manager at the ABS.

I guess finding the right balance is going to be tough, and that is part of what's slowing the government down in releasing information in the right format. Hopefully we can get past this, so that developers can start doing more cool things.
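The ABS concern above is easier to see with numbers. One well-known way aggregates leak individual figures is a differencing attack: if two published totals differ by exactly one contributor, subtracting them reveals that contributor's figure. The sketch below shows that arithmetic, plus a simple minimum-contributor suppression rule of the general kind statistical agencies use. The threshold of 3, the `publish` function and the figures are assumptions for illustration; they are not the ABS's actual rules.

```python
# Hypothetical illustration only: the threshold and figures are made up.

MIN_CONTRIBUTORS = 3  # assumed minimum number of contributors per published cell


def publish(cells):
    """cells maps a category to the individual values it aggregates.

    Returns category -> total, or None where the total is suppressed.
    """
    published = {}
    for category, values in cells.items():
        if len(values) < MIN_CONTRIBUTORS:
            published[category] = None  # too few contributors: the total would expose them
        else:
            published[category] = sum(values)
    return published


# Differencing: releasing both "all firms" and "all firms except X" totals
# lets anyone recover firm X's figure by subtraction.
all_firms = [120, 340, 95, 410]  # firm X is the first entry
all_but_x = all_firms[1:]
revealed = sum(all_firms) - sum(all_but_x)  # firm X's own figure

table = publish({"Region A": [120, 340, 95], "Region B": [410]})
```

Suppression on its own isn't a complete defence, since a hidden cell can sometimes be recovered from the row and column totals around it, which is part of why the balance is hard to strike.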

Talkback

  • Another Missed Opportunity

    Another point is: why isn't the My School website already showing this information? They have the raw data, so it would be pretty easy to do, and it adds value.

    I do also think that the information should be released in some format that developers can get to as well.
    anonymous
  • another scraper

    I'm also working on a scraper for the site, in case anyone wants the data without the extra work of scraping: http://github.com/andrewharvey/myschool
    anonymous
  • league tables

    Because, danger danger, this means people could make league tables of schools, and oh no, they don't want this to happen, do they? Of course, Mr Scraper has managed to do this, so let's wait for the next bit of government spin on this flawed-concept site.
    anonymous
  • Govt "making money" from products is fiction

    There is a crazy idea that the govt has that it can "make money" and reduce taxes by selling govt information back to the same people and companies that pay the taxes. It is a zero sum game.

    All it means is that citizens either:
    a) Pay higher taxes and get the freedom to use govt data as they wish
    b) Pay lower taxes and a "hidden tax" to use the data they already own (either directly or buried in the costs of products that have to license the data)

    Government- stay after school and write a hundred times on the blackboard "I am not a private company, I am not a private company".
    anonymous
  • Govt zero sum game

    Sorry on re-reading there were some typos...

    There is a crazy idea that the govt has in its head that it can "make money" (and reduce taxes) by selling govt information back to the same people and companies that pay the taxes. It is a zero sum game.

    All it means is that citizens either:
    a) Pay higher taxes and get the freedom to use govt data as they wish
    b) Pay lower taxes and a "hidden tax" to use the data (either directly or buried in the costs of products that have to license the data)

    Government- stay after school and write a hundred times on the blackboard "I am not a private company, I am not a private company".
    anonymous
  • Government *is* planning to release data under CC licence

    ... coincidentally, just after reading this, I stumbled across the following statement from within Vic Government (where I work).

    "The Victorian Government supports the release of PSI for re-use with the expectation it will lead to increased commercial activity, provide primary data to researchers in a wide range of disciplines, and increase transparency of government in Victoria.

    The committee's finding that it is likely that Creative Commons licences could be appropriately applied to around 85 per cent of government PSI (== 'public sector information') underscores the scale and significance of the task the Victorian Public Service has ahead of it."

    http://www.diird.vic.gov.au/diird-projects/access-to-public-sector-information

    ... I suspect other governments are intending to do the same. It just takes a *long* time to do anything in Government, largely because the public expect that we can't make any mistakes, so we have to be very, very careful :-).

    (obligatory "I don't speak for the Vic Gov" etc. etc.)
    anonymous
  • Wow, this is such a shame.

    I worked on the Gov 2.0 project, looking at online video, but I was there for the free-the-data discussions.

    It is just crazy that this could have happened. Such a high-profile site.

    I am sure the folks who built the site were very rushed, so I think we can forgive them.

    Let us hope that we start to see less of this and we see machine readable data be the norm.

    It is perhaps a coincidence that the last data.gov release was just before the Gov 2.0 Taskforce packed up.

    I would love to know what is happening with the Gov 2.0 report.

    Jimi Bostock
    PUSH Agency
    Brisbane | Canberra | Sydney | Australia
    jimi@pushagency.net
    JimiBostock
  • Ah, yes, it is this making-mistakes thing. It actually takes me back to when the Gov 2.0 project started.

    Right at the start they decided to do some crowdsourcing for their logo. Well, the natives went ape. People poured scorn on "govt getting free work".

    Me, I was sitting and watching that and I thought, hey, we are supposed to be the release-early, release-often, perpetual-beta crowd, and here we were yelling at the govt for making a mistake.

    So, I posted that on the Gov 2.0. Other people agreed. Then the Taskforce came on and said, hey, look, it was probably a mistake but we were trying to be cool. Everyone calmed down and away we all went on what was a nearly year-long ride.

    This risk issue is surely the biggie in any Gov 2.0 caper.

    Maybe we can ask the people. Hey, we could do it by social media / social networking. We could just ask if the people think that govt should free the data and if they will be more forgiving of mistakes by governments if that happens.

    Just a crazy thought, let's ask the people :)

    JimiBostock