Avoiding your own Logies leak moment

Summary: Melbourne newspaper the Herald Sun told the world who had won the Gold Logie before it had even been announced. It's hardly the world's most serious data breach, but it was easily avoidable.

This morning, a war of words seemed to be breaking out over who was to blame for "leaking" the result of Australia's top award for television.

"At no time did the Herald Sun publish the name of the winner on its website, iPad app or in Twitter," the newspaper reported. "A link to an embargoed story naming the winner was momentarily created and published by Google."

However, Google pointed the finger back at the Herald Sun, cheekily tweeting a link to information on the Robots Exclusion Standard and the robots.txt file that would have prevented the news story from being indexed against the newspaper's wishes.
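The file Google pointed to is simply a plain-text file served from the site root. A minimal sketch of the kind of rule that would have told crawlers to stay away (the /embargoed/ path is illustrative, not the Herald Sun's actual URL structure):

```
# robots.txt, served from the site root
# Path below is hypothetical, for illustration only
User-agent: *
Disallow: /embargoed/
```

Note that a Disallow rule only stops well-behaved crawlers from fetching the content; on its own it does not guarantee a URL stays out of search results.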

On this week's Patch Monday podcast, web developer Dave Hall, principal engineer at Technocrat, explains how robots.txt and another file, sitemap.xml, could have prevented the Herald Sun's problem, and speculates about what went wrong.
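A sitemap.xml is the flip side of robots.txt: it lists the pages you do want crawled and indexed, so anything not yet ready for publication simply isn't on the list. A minimal sketch, with an illustrative URL:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- URL below is illustrative, not a real Herald Sun address -->
  <url>
    <loc>https://www.heraldsun.com.au/entertainment/logies-story</loc>
  </url>
</urlset>
```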

These files should be part of every well-run website, he says.
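Rules like these are easy to sanity-check before they go live. Python's standard library ships a robots.txt parser; a minimal sketch, using a hypothetical /embargoed/ rule and example.com URLs:

```python
# Check whether a polite crawler may fetch a URL, using Python's
# standard-library robots.txt parser.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# parse() accepts the robots.txt body as a list of lines;
# the rule below is illustrative, not the Herald Sun's real file.
rp.parse([
    "User-agent: *",
    "Disallow: /embargoed/",
])

print(rp.can_fetch("*", "https://example.com/embargoed/logies-winner"))  # False
print(rp.can_fetch("*", "https://example.com/news/front-page"))          # True
```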

You'll also hear how Herald Sun editor Simon Pristel had already admitted on radio station Triple M that it was all down to a technical glitch at the newspaper's end. The speculation that Google was somehow inside the Herald Sun's systems came from the Triple M presenters.

To leave an audio comment on the program, Skype to stilgherrian, or phone Sydney 02 8011 3733.

Running time: 22 minutes, 21 seconds

About

Stilgherrian is a freelance journalist, commentator and podcaster interested in big-picture internet issues, especially security, cybercrime and hoovering up bulldust.

He studied computing science and linguistics before a wide-ranging media career and a stint at running an IT business. He can write iptables firewall rules, set a rabbit trap, clear a jam in an IBM model 026 card punch and mix a mean whiskey sour.

Talkback

2 comments
  • robots.txt would not necessarily have prevented the leak from occurring. If there was a link to the story's URL somewhere, the story could be indexed, without being crawled.

    From Google:
    "Note: Pages may be indexed despite never having been crawled: the two processes are independent of each other. If enough information is available about a page, and the page is deemed relevant to users, search engine algorithms may decide to include it in the search results despite never having had access to the content directly. That said, there are simple mechanisms such as robots meta tags to make sure that pages are not indexed."
    https://developers.google.com/webmasters/control-crawl-index/docs/getting_started
    JB2-b1d11
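The robots meta tag mechanism quoted above is a one-line addition to the page itself, and unlike robots.txt it asks search engines not to index the page even if they learn of its URL from elsewhere. A minimal sketch (note the page must remain crawlable for the tag to be seen):

```html
<!-- In the page's <head>: keep this page out of search results -->
<meta name="robots" content="noindex">
```

For non-HTML responses, the same instruction can be sent as an HTTP response header: X-Robots-Tag: noindex.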
    • That's an important clarification, thank you -- especially the point that a page can still be indexed even if the robots aren't the ones who found it.

      I guess it comes back to the point that the page shouldn't have been on the web in the first place.
      stilgherrian