
Google's newest service, Picasa Web Albums, lets you upload photos as "public" or "unlisted" to a website viewable from anywhere. Until now, unlisted albums were just that, but Philipp Lenssen discovered some by doing a simple search on Google.
Even though the robots.txt file for Picasa Web Albums disallows crawling, unlisted albums are still showing up in search results as bare links (minus the content). I thought robots.txt was intended to prevent search engines from listing anything about "disallowed" pages. In my opinion, the exclusion should not be limited to content.
Consider a private site that is not meant to be seen by the public but is hosted on a public web server. In that situation, the webmaster would likely create a robots.txt file to stop search engines from crawling it, theoretically eliminating most unwanted exposure.
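To make the scenario concrete, here is a minimal sketch of what such a robots.txt file asks of a crawler, using Python's standard `urllib.robotparser`. The file contents, the `ExampleBot` user agent, the `private.example.com` URL, and the `is_crawl_allowed` helper are all assumptions made up for illustration; none of them come from Picasa Web Albums.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a private site: ask every crawler to stay out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A well-behaved crawler would skip this URL entirely.
    print(is_crawl_allowed(ROBOTS_TXT, "ExampleBot",
                           "http://private.example.com/albums/"))
```

Note that the check only answers whether a page may be fetched. It says nothing about whether a search engine may list a URL it learned about from a link elsewhere, which is the gap described here.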
With Google publishing links to "disallowed" pages in its results, it would be very easy to accidentally expose such a website. A single link on an obscure website could be enough to do the damage.
How do the other search engines stack up? From what I can tell, Yahoo is the only one that seems to be following robots.txt in a way that makes sense. MSN shows search results similar to Google's.