Google: Is robots.txt really a copyright infringement defense?

Summary: Can Google really rely on robots.txt?

Danny Sullivan headlines his post “Google loses in Belgium newspaper case,” but nevertheless asserts in his opening paragraph: “Google may have to pay a fine, but the ruling is far more positive for the company.”

How so? He offers paragraph after paragraph of “evidence” for why he concludes that: 

This case was never about getting content out. It was about trying to blackmail Google into including content.

Sullivan does not share how he knows that leading European publishers are “trying to blackmail Google,” but he cites Google in asserting that:

The content could have been removed through the use of robots.txt files or meta robots files such as explained on the Google Blog recently.
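The “meta robots” mechanism mentioned here is a tag placed in a page's HTML head, such as `<meta name="robots" content="noindex">`. As a rough illustration only, here is a minimal Python sketch of how a crawler might honour such a tag, using the standard library's `html.parser`. The names `RobotsMetaParser` and `may_index` are hypothetical, and real crawlers apply further rules (for example, tags addressed to one specific bot such as `googlebot`).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, nofollow".
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def may_index(html: str) -> bool:
    """Return False if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" not in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(may_index(page))  # False
```

Self-closing tags such as `<meta ... />` are handled too, because `HTMLParser` routes them through `handle_starttag` by default.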

Google today waves the robots.txt flag once again on its blog:

If publishers do not want their websites to appear in search results, technical standards like robots.txt and metatags enable them automatically to prevent the indexation of their content. 
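The robots.txt standard Google points to is a plain-text file at the root of a site that lists which paths each crawler may fetch. A minimal sketch, assuming a hypothetical newspaper site at `www.example.be` that wants to keep Googlebot out of an `/archives/` section, using Python's standard `urllib.robotparser` to show the check a compliant crawler performs before fetching a page. It is an illustration of the mechanism, not a claim about how Google's own crawler is built.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a newspaper could publish at https://www.example.be/robots.txt:
# Googlebot may not fetch anything under /archives/; every other crawler is unrestricted
# (an empty Disallow line means "allow everything").
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archives/

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://www.example.be/archives/2006/article.html"
    print("Googlebot", allowed("Googlebot", article))        # Googlebot False
    print("SomeOtherBot", allowed("SomeOtherBot", article))  # SomeOtherBot True
```

The exclusion is opt-out by design: unless the file names a crawler (or `*`) and a path, the crawler treats the page as fetchable.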


In layman's terms, Google’s robots.txt defense is similar to a shoplifter saying “If stores do not want their merchandise to be taken without payment, locksmiths enable them automatically to prevent the theft of their products.”

Just as a shoplifter’s “if you didn’t want me to take your merchandise, you shouldn’t have made it so tempting” defense has no merit, Google’s “if you didn’t want us to take your content, you shouldn’t have made it so tempting” defense has no merit either, as the Belgian courts suggest:

Brussels court said Google Inc. violated copyright laws by publishing links to Belgian newspapers without permission and ordered the company to remove them (see “Will Google pay for content?”).

ALSO: Google gets defensive, all over the world

  • A poor analogy

    Your 'stealing from a shopkeeper' analogy is a poor reflection of how the internet is used. Any information posted to a public web site is, by definition, public knowledge. Copyright still applies; however, Google is not duplicating the article, merely indexing it so people can find the original. If you prefer shopkeepers, a better analogy might be a sale on ice cream, where the shopkeeper decides to sue the fellow who comes in intending to buy 50 cartons. Nothing illegal was done, but as a stop-gap the shopkeeper may put up a sign saying, 'Limit 2 per customer'. In a way, this is what robots.txt does: it politely asks people not to use a resource in a way the provider disapproves of, and Google respects that.
    • I Agree

      The analogy is indeed flawed. But the major difference is that we are talking about automated systems performing the caching, where there is no known way of automatically recognizing whether an infringement occurs during the process. The robots.txt file exists specifically so that *robots* can recognize this. Humans have other means of determining it, and thus do not need the flag per se.

      At least in the US, copyright law allows for caching in this way when performed by automated systems, but the rightsholder can still demand that a particular item be removed from the cache. If the law did not allow for this, then even transient copies of data could not be made during a communication unless the machine could determine whether an infringement would occur. The law in the US is designed specifically to prevent the chilling effect this would have on the development and use of communication technologies.
  • Actually

    Google's defense is perfectly valid, and has been standard practice for years. (see:

    If Google did not have that basic defense, along with standard fair use laws, then no large-scale search engine could have become as useful as search engines are today.

    Your comparison to a shoplifter is flawed, as you're comparing a criminal to someone who didn't commit a crime; you're comparing private property to something far more public.

    Pretend you're at a public pool with your friends, and you see a beach ball that nobody is using floating in the water next to you. You pick it up and toss it to your friend, then proceed to play a vigorous game of toss-the-beach-ball-back-and-forth. The ball was right next to you, nobody had laid immediate claim to it, there were no rules posted saying the use of items in the public pool was off-limits, and nobody told you that you couldn't use it. Is it your fault when someone comes back pissed that you were enjoying yourself with their toy?

    What Google did was perfectly in the realm of fair use.
    • Just realized

      I just realized that part of the issue here is related to Google caching content that was previously allowed to be searchable.
      Section 14.9 of the HTTP/1.1 specification (RFC 2616) explains how to tell proxies/search engines what they aren't allowed to cache. Again, it's up to the implementors of the site to make sure that data they want private/uncached is marked properly.
  • Very Poor Analogy

    In a store, it is normal to pay, not to steal.

    In a search engine, it is normal to crawl, since users of the search engine want as much information from the Internet as it can get, and search engines are the best tools for searching the vast Internet.

    In a store, it is normal for the common buyer to pay to get a service.

    With a search engine, it is normal to use it for free as a tool to search for anything you want to find out. Websites are pretty much like TV, radio and newspapers. You watch them, you read them, you hear them. Websites do all three.

    TV news reports everything. Sometimes people don't want to be exposed in the news, but the news still gets the story. Same thing with the web.
  • Very poor analogy - No merit

    If you don't want your site to be on Google, use robots.txt or metatags. I don't see a problem with that. What are you expecting? What more do you want from them? I just don't understand the objection, or see a better solution.

    Google isn't hiding behind robots.txt and metatags, but they are the ONLY efficient way to communicate with a spider like Google's. If you don't speak up, then Google will crawl your site, as it should.
  • Should have been titled: Is Goggle infringing a Copyright?

    This article is typical of someone who knows very little about copyrights and the fair uses associated with them. I will observe that in most countries, including the US, this would likely not have been found to be copyright infringement, but fair use of publicly displayed material. It is similar to taking a picture of a billboard. The material on the billboard is copyrighted, but your photo of grandma next to the billboard would only be considered infringement if you reproduced it for profit. This is likely the reason that Google does not display advertising when you view a cached page. The best analogy is the summary or index guides to published material that you have found in every library for the past 100 years or so. They are designed to help you find something that was published for the purpose of being found.
    • Sorrry, should have been Google...

      I nevuh deid spel tou gud...
    • Well then, I guess the difference is

      much of what was chiseled, block printed, or handwritten in the past was not copyrighted, whereas many of today's authors copyright their material.

      Fair use does not trump a copyright when it comes to reprinting or publishing without their consent.
      John Zern
  • Is It Illegal To Sell A Map That Shows Points Of Interest?

    By your logic, it is.

    After all, the map is "profiting" by pointing out the location of points of interest without compensating that point of interest.

    That is a much better analogy than your ridiculous one. Google is acting as the map, sending traffic to these sites. Despite your claims, Google *is* paying these sites. It's paying them in traffic.

    It's not using their content without permission; it's directing people to the content.

    The value that Google provides is as a guide. It's not making money off the *content* but in better guiding people to content.
  • The case wasn't about copyright infringement

    Donna, along with those paragraphs of "evidence," I also link to plenty of my past articles that document my coverage of the case. If you go to the key one here:

    I explain exactly how I know they are trying to blackmail Google. From that article, some selected parts to enlighten you:

    I had a very long conversation about the permissions issue with Margaret Boribon, secretary general of Copiepresse, to try and better understand how they wanted Google to operate. Why not use commonly understood and effective mechanisms such as robots.txt files or meta robots tags to prevent indexing?

    "If you do so, you admit that Google does what they want, and if you don't agree, you have to contact them. This is not the legal framework of copyright," Boribon said....

    I asked Boribon about this, how her group would propose search engines undertake such a task.

    "I'm sure they can find a very easy system to send an email or a document to alert the site and ask for permission or maybe a system of opt-in or opt-out," she said.

    Would it be OK for such a system to work automatically, I asked? Yes, that would be fine. A machine-to-machine connection would be OK, she said. So then, I asked, why not use the existing robots.txt or meta robots systems?....

    Boribon rejected the existing solutions. One issue she had was that they weren't legally endorsed....

    "Our purpose is not to be excluded. Of course, we want to be in the system, but on a legal basis," said Boribon. "We want to be remunerated."

    Got it? I don't know how much clearer to make it. They don't want to be excluded from Google -- yet they've sued Google for copyright infringement because they refused to use systems that would have kept them out, if that was the goal (which they themselves say it is not).

    On the chance you somehow missed that article when researching your accusations -- or from your past readings of my work -- I've now linked to it for a second time from my most recent article and made it even more explicit where the background for this is coming from.

      Google is proud that it has “chosen to ignore conventional wisdom in designing its business.”

      Perhaps it would be best for all parties involved, however, if $150 billion market cap Google started playing by good “old-fashioned” content licensing rules, instead of trying to skirt by on a DMCA and fair-use powered no-fee required content acquisition business model.

      I wrote yesterday "Google gets defensive, all over the world."

      It seems that Google's most ardent believers are getting defensive as well!

      • Not defensive, just factual

        > It seems that Google's most ardent believers are getting defensive as well!

        Shall I flip it around and say that Google's most ardent attackers will do anything to drive up page views, including trying to bait me by suggesting that I'm following a Google party line rather than having talked at length to the opposing side (which I have; you haven't) and drawn my own analysis?

        I don't "believe" in Google. If they do something I disagree with, I disagree with them. If I agree, I agree with them. In this case, I'm more concerned with the overriding issue at stake -- is it an infringement for someone to link to an article and describe it in a few sentences? I think not. It's up to each country and its own laws to make that determination, of course.

        In Belgium -- and in this specific case -- Google was found to have violated copyright. But that doesn't take away from the fact that if the case was about not being included, the newspapers could have used the effective means to block spiders that existed before Google was a bee in your bonnet.

        It's convenient for you to reduce things to black and white. They aren't. If you examine the facts of the case closely, it's pretty clear that few would consider some of them a copyright violation, in particular linking to articles and describing them in a few sentences.

        The caching of pages is a much bigger issue, and something I have previously said I think Google shouldn't do. Of course, in the US, such caching has so far actually been ruled completely legal.

        The use of thumbnail images is more gray. Again, this is a case where I've argued that Google and other search engines should NOT operate under an opt-out basis.

        This article covers these issues in more depth:

        Legal issues aside, I don't think you can ignore that in all of these cases, content owners can very easily exclude their material from Google if they want. It is not some type of grab Google makes without any control. This case never, ever had to go to court if the content owners simply didn't want to be in Google. It went to court quite simply because they wanted to pressure Google into paying them for inclusion.

        In the end, if you disagree with the idea of Google asking people to opt-out of inclusion, then please get a robots.txt file or a meta noindex tag up and get your own content out of Google.
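Caching, which Sullivan calls the bigger issue and which an earlier commenter ties to section 14.9 of HTTP/1.1 (the `Cache-Control` header), is signalled separately from indexing. A minimal sketch, assuming a plain Python WSGI application (the names `article_app` and `fetch_headers` are hypothetical), of a page that asks not to be stored: `Cache-Control: no-store` is addressed to HTTP caches such as proxies, while the robots `noarchive` value is the page-level request search engines have historically honoured for their cached copies. Whether a given search engine applies either one is that engine's decision.

```python
from wsgiref.util import setup_testing_defaults


def article_app(environ, start_response):
    """Hypothetical WSGI app serving one article that asks not to be cached or archived."""
    body = (
        "<html><head>"
        # Page-level request to search engines: index me, but keep no cached copy.
        '<meta name="robots" content="noarchive">'
        "<title>Article</title></head>"
        "<body><p>Article text.</p></body></html>"
    ).encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # HTTP/1.1 Cache-Control (RFC 2616, section 14.9): caches must not store this response.
        ("Cache-Control", "no-store"),
    ]
    start_response("200 OK", headers)
    return [body]


def fetch_headers(app):
    """Call a WSGI app directly, without a server, and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal GET request for "/"
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    status, headers, body = fetch_headers(article_app)
    print(status, headers["Cache-Control"])  # 200 OK no-store
```

Calling `fetch_headers(article_app)` runs the app without starting a server, which makes the headers easy to check; under a real server such as `wsgiref.simple_server.make_server`, the same app would be served unchanged.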