Cyber porn and search engines - Report

A new study from the U.S. has debunked one mainstream myth about cyberporn -- and confirmed many a surfer's sneaking suspicions about search engines.

First, the porn myth.

According to a study conducted by Dr. Steve Lawrence and Dr. C. Lee Giles for the NEC Research Institute, the Web contains about 800 million pages encompassing about 15 terabytes of data and about 180 million images. Contrary to popular opinion that the Web's a haven for porn, though, the study found that only 1.5 percent of Web sites contain pornographic content.

"The sex sites were much less than you would have thought," Lawrence said. In fact, the study, which will be published in the July 8 issue of Nature magazine, found that commercial sites have taken over the Web: 83 percent of sites contain commercial content, and 6 percent contain scientific or educational content.

Lawrence said the study gauged the Web's content by random sample -- the study manually surveyed and categorised the content of 2,500 sites whose IP addresses had been randomly selected.
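To give a rough sense of how much precision a sample of 2,500 sites can offer, here is a minimal sketch of the sampling error on the reported percentages. It assumes a simple random sample and uses the textbook normal approximation for a proportion; the article does not say how, or whether, the researchers computed error margins, so the function name and the 95 percent confidence level are illustrative choices only.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z=1.96 corresponds to a 95 percent confidence level (an assumption,
    not something stated in the article).
    """
    return z * math.sqrt(p * (1 - p) / n)

SAMPLE_SIZE = 2500  # sites surveyed, as reported in the article

# Shares of sites reported in the article
reported = {
    "pornographic": 0.015,
    "commercial": 0.83,
    "scientific/educational": 0.06,
}

for label, share in reported.items():
    moe = margin_of_error(share, SAMPLE_SIZE)
    print(f"{label}: {share:.1%} +/- {moe:.2%}")
```

Under these assumptions, the 1.5 percent figure carries a margin of roughly half a percentage point either way, so the broad conclusion that porn sites are a small minority would not hinge on sampling noise.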

The study's other key finding won't be news to regular search engine or portal users. According to the study, search engine coverage of the Web has decreased substantially since December 1997, with no search engine indexing more than 16 percent of the Web's indexable sites. That means, for surfers navigating the Web via search engines, the Web's 15 terabytes of data is more than ever like an iceberg -- largely submerged. And, for e-commerce sites, not being indexed by the search engines could be the difference between sinking and swimming.

"That could have a substantial impact on their economic viability," Lawrence said. "Because the situation now is relatively unequal, in the sense that ... the more well known sites are the ones getting indexed."

Lawrence says the reason for decreasing coverage of the Web is simple -- the search engines just can't keep up with the explosive growth in indexable pages -- but, he assures, "that trend is going to reverse."

Lawrence explained: "At the moment you have a lot of information out there that's not available on the Web." But, once all that information is available on the Web, the avalanche of indexable information getting posted on the Web will slow, allowing the search engines to catch up. And how long will it take for that information avalanche to ease? Lawrence hasn't done precise calculations, but hazards an educated guess: "10, 20 years."

"Engines will be able to improve their coverage over time, but the question is, will they really want to?"

Other findings in the study:

  • Search engines are more likely to index sites that have more links to them (more 'popular' sites).

  • They are more likely to index U.S. sites.

  • Search sites are more likely to index commercial sites than educational sites.

  • Indexing of new or modified pages by just one of the major search engines can take months.

Are you surprised at these findings? Are you satisfied with the results your search engine provides? Tell the Mailroom