Is search broken?

Summary: Search engines say they use complex algorithms to help users find exactly what they want. Google's "I'm feeling lucky" button (btw, does anybody use it?), right below the search box, implies that very thing.


Search engines say they use complex algorithms to help users find exactly what they want. Google's "I'm feeling lucky" button (btw, does anybody use it?), right below the search box, implies that very thing.

The legions of Ph.D.s working for the search engines publish oodles of scientific papers on complex mathematical concepts related to search.

 Recent Papers Written by Googlers

It all looks very impressive, but it seems to have more to do with feeding the mythology surrounding search--that it is very complex and scientific--than with the actual reality of how search is done.

From my vantage point as an online publisher, it is clear that search is increasingly "people-powered" rather than machine-powered. There are millions of people helping the searchbots find information.

Here are some examples and gripes:

- Many publishers try to make sure their headlines catch the attention of the search engines rather than the attention of readers. The same is true for content: editors increasingly optimize it for the search engines rather than for the readers.

- Why should I have to tag my content, and tag it in the specific formats that Technorati and other search engines recommend? Aren't they supposed to do that? (There's a sketch of the tag format after this list.)

- Google relies on a tremendous amount of user-helped search. Websites are encouraged to create site maps and leave the XML file on their server so that the GOOGbot can find its way around. (A sitemap sketch follows the list.)

- The search engines ask website owners to mask off parts of their sites that are not relevant, such as the comment sections, with nofollow and noindex tags (also sketched below).

- Websites are encouraged to upload their content into the Google Base database. Nice--Google doesn't even need to send out a robot to index the site.

- Every time I publish something, I send out notification "pings" to dozens of search engines and aggregators (see the ping sketch below). Again, they don't have to send out their robots to check whether there is new content.

- Google asks users to create collections of sites within specific topics so that other users can use them to find specific types of information.

- The popularity of blogs rests partly on the fact that they gather lots of relevant links around a particular subject. Blogs are clear examples of people-powered search services.
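
To make these chores concrete, here are a few quick sketches--minimal illustrations with made-up names and URLs, not anyone's official code. First, the tagging. As far as I can tell, the format Technorati recommends is just an ordinary link carrying a rel="tag" attribute, the kind of thing a short script can stamp out:

    # A minimal sketch of Technorati-style tag links; the tag names are
    # made-up examples, and the "+" slug convention is my assumption.
    def tag_link(tag):
        """Render one tag as the rel="tag" link the tag crawlers look for."""
        slug = tag.replace(" ", "+")
        return f'<a href="http://technorati.com/tag/{slug}" rel="tag">{tag}</a>'

    for tag in ["search engines", "metadata", "blogging"]:
        print(tag_link(tag))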
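Second, the sitemap the GOOGbot wants waiting on the server. A minimal sketch, assuming the sitemaps.org XML protocol and an invented list of pages:

    # Writes a bare-bones sitemap.xml in the sitemaps.org format.
    # The URLs and dates are placeholders.
    from xml.sax.saxutils import escape

    pages = [
        ("http://example.com/", "2006-11-01"),
        ("http://example.com/archives", "2006-10-15"),
    ]

    entries = "".join(
        "  <url>\n"
        f"    <loc>{escape(url)}</loc>\n"
        f"    <lastmod>{lastmod}</lastmod>\n"
        "  </url>\n"
        for url, lastmod in pages
    )

    sitemap = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + entries
        + "</urlset>\n"
    )

    with open("sitemap.xml", "w") as f:
        f.write(sitemap)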
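Third, the masking. It boils down to two bits of standard markup: a robots meta tag to keep a whole page out of the index, and rel="nofollow" to tell crawlers to discount an individual link (the URL below is illustrative):

    # The two standard "masking" hints, rendered from Python.
    NOINDEX_META = '<meta name="robots" content="noindex, nofollow">'

    def nofollow(href, text):
        """Render a link that asks crawlers not to follow or credit it."""
        return f'<a href="{href}" rel="nofollow">{text}</a>'

    print(NOINDEX_META)  # goes in the page <head>
    print(nofollow("http://example.com/comment-42", "a commenter's link"))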
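And the pings. These typically ride the old weblogUpdates XML-RPC convention; the endpoint below is one well-known ping aggregator, and the blog details are placeholders:

    # A sketch of a publish-time ping via the weblogUpdates XML-RPC method.
    import xmlrpc.client

    ENDPOINTS = [
        "http://rpc.pingomatic.com/",  # an aggregator that fans pings out to many services
    ]

    def ping_all(blog_name, blog_url):
        for endpoint in ENDPOINTS:
            server = xmlrpc.client.ServerProxy(endpoint)
            # Returns a struct along the lines of {"flerror": False, "message": "..."}.
            result = server.weblogUpdates.ping(blog_name, blog_url)
            print(endpoint, result.get("message", result))

    ping_all("Example Blog", "http://example.com/")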

And there are many more examples. If the search engines are so great at doing what they do, then how come we have to do all of the above?

I resent the fact that I have to create all this content describing my content--the search engines should be creating this "metadata."

I just want to write stuff, and leave it up to the search engines to find it, classify it, index it, and do all the other things their mythology suggests that they do.

In the world of enterprise search, companies such as FAST, Vivisimo, and Autonomy have to find information without the benefit of such aids. Corporate documents have no PageRank, no tags, and little other metadata of any kind.

Yet in consumer search it seems as if nothing would be found without a huge amount of help from millions of people every day.

I wonder about the productivity cost to society from all this human labor--work that is supposed to be done by robots.

It's as if these searchbots are blind, and we have to lead them patiently along the street and point things out to them, while they tap away at the world with white canes.



Talkback

  • Need Strong AI

    Most of the human effort described here is necessary because we haven't yet developed machine intelligence that can do those things automatically. It's not that search engines are broken, just that they don't actually understand what people are really looking for.

    What made PageRank work so well is that it could infer, to a degree, the likelihood that a human would be interested in a given page, based on human-supplied information (such as links) on other pages. But that doesn't really tell you whether a given query should take you to a specific page. It just gives you a nice way to rank the matches from an unintelligent search algorithm.
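
    To make that concrete, here's a toy sketch of the idea--a made-up four-page link graph and the textbook damping factor, nothing like Google's production setup--showing how rank flows along the human-supplied links:

        # Toy PageRank by power iteration over a tiny, made-up link graph.
        links = {
            "a": ["b", "c"],
            "b": ["c"],
            "c": ["a"],
            "d": ["c"],
        }

        def pagerank(links, damping=0.85, iterations=50):
            n = len(links)
            ranks = {page: 1.0 / n for page in links}
            for _ in range(iterations):
                # Every page keeps a baseline share, then receives a damped
                # share of rank from each page that links to it.
                new_ranks = {page: (1.0 - damping) / n for page in links}
                for page, outlinks in links.items():
                    share = ranks[page] / len(outlinks)
                    for target in outlinks:
                        new_ranks[target] += damping * share
                ranks = new_ranks
            return ranks

        print(pagerank(links))  # "c" ends up highest: the most linked-to page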

    If and when strong AI becomes a reality, things will get a lot simpler for us humans. My opinion is that the field of AI has constantly been set back by unreasonable expectations and by trying to fly before we can crawl. We can't hope to automate human language if we can't even replicate the intelligence of other animals, or the experience of a human baby. We've known for some time that those things come first in the natural order, but our impatience directs research toward the immediately practical.

    Thus, progress towards true machine intelligence is slow, but I do believe we'll get there eventually. Hopefully, companies like Google recognize the need for more basic research and focus more effort there than has been done traditionally.
    sethhoyt@...
  • Search engines

    As an editor and author of technical books, I use search engines to determine what is available on the subject under review, read or review the material, and keep copies of those sources that provide value rather than opinion or product hawking. This material, plus my own knowledge, is combined into a publication that tries to provide useful information in an easily read form. The quality of existing technical reports tends to be either (1) simplistic or (2) unreadable. Equations are presented in non-SI units, unattributed, or simply incorrect. SI standardization is ignored; inches, and the rest of that terrible system, are still used in the world of electronics.
    dovbenhos
  • Pointy Hats with stars and moons...

    Tom,

    You're asking for magic. There ain't no magic.

    >Here are some examples and gripes:

    - There are many publishers that try to make sure their headlines catch the attention of the search engines rather than catch the attention of readers. The same is true for content, editors increasingly optimize it for the search engines rather than the readers.
    ---------
    Why do the editors optimize it for search engines? So that they can attract readers who use the search engines. Don't see that this is much of a problem, and it can be somewhat solved by meta-data (more below). The real gripe you seem to have is "Why doesn't a search engine act like 'Me'". Even if the search engines were replaced with small legions of humans, there would be bias in what they looked for, and with search the gateway to content, content providers will tune toward that bias.
    -------------
    - Why should I have to tag my content, and tag it according to the specific formats that Technorati, and other search engines recommend? Aren't they supposed to do that?
    ----------

    No! The meta-data you provide is what you want other humans to pick out as significant from your writing. Again, think of it as if you were asking another person to tag your writing accurately (I think sociologists refer to this as 'coding').

    It's hard for people. And it's a heck of a lot to ask of a machine, especially when you've hopefully got the keywords right at hand.
    --------
    - Google relies on a tremendous amount of user-helped search. Websites are encouraged to create site maps and leave the XML file on their server so that the GOOGbot can find its way around.
    -------------
    Sure. Manus manum lavat--one hand washes the other--and all that. It can get by without, but as above, who better to understand content structure than the creator? Ever try actually browsing the web? There really isn't a standard for webpage structure (though you can usually spot a 'well designed' page from a 'poorly designed' one). If I were a robot trying to figure out which subpages were most important, I'd have trouble and probably waste a lot of time without sitemaps. Again, it's information that should exist, and it helps out both parties.
    ---------------------
    - The search engines ask web site owners to mask-off parts of their sites that are not relevant, such as the comment sections, with no-follow and no-index tags.
    --------------
    This may be a legitimate gripe (and is getting some airing in Belgium, I believe). I tend to be suspicious of opt-out type things. However, as a search user, I'm more than happy that most folks allow indexing by default. More content for me to suck in.

    From another angle, you're publishing content for the world to see. If there are pieces that you want to hide from certain parties, it doesn't seem all that unreasonable to ask you to buy some window-shades.
    -----------------
    - Web sites are encouraged to upload their content into the Googlebase database. Nice--it doesn't even need to send out a robot to index the site.
    -----------------
    Once again, sure. Why not ask? It makes it easier for searchers and searchees alike. What's the gripe here? Should Search engines be able to read publisher's minds and 'know' when to send the robots out? I could see asking publishers to send search engines a schedule, but wouldn't that also be a gripe, when one party or the other missed a deadline?
    -----------------
    - Every time I publish something, I send out notification "pings" to dozens of search engines and aggregators. Again, they don't have to send out their robots to check if there is new content.
    --------------------
    Ibid. I don't really see the problem here. You're publishing. Do you NOT want your stuff indexed? If so, don't send out the notifications. If you do want it indexed, how are the search engines supposed to know?
    -------------
    - Google asks users to create collections of sites within specific topics so that other users can use them to find specific types of information.
    ----------------
    There's a lot of social information out there, and the fact that people are best at figuring out what's best for people shouldn't be that surprising. Try getting a machine to understand natural language for a few decades. You'll see what I mean.
    --------------------
    - The popularity of blogs is partly based on the fact that they find lots of relevant links around a particular subject. Blogs are clear examples of people-powered search services.

    And there are many more examples. If the search engines are so great at doing what they do, then how come we have to do all of the above?

    I resent the fact that I have to create all this content describing my content--the search engines should be creating this "metadata."

    I just want to write stuff, and leave it up to the search engines to find it, classify it, index it, and do all the other things their mythology suggests that they do.
    ------------------

    It's the people-created mythology that's the problem (of course it doesn't hurt the GOOG stock price, either). Even the "I'm feeling lucky" button is a tongue-in-cheek way of acknowledging that it is just mythology. You only expect it to work if you're 'lucky'--which is not promising very much.

    All the engines are asking is that you provide them with the bits and pieces that should fall out of your creative process anyway, in order to provide greater access to your content. Doesn't seem like much to resent.

    I mean, if you write something and then have no idea what its keywords should be, or if you create a website and have no idea what the sitemap is, then you've got a legitimate gripe. They're making you do more work than you would otherwise. But it seems to me that you're whining about having to put it in a specific format, and heck, that's a problem with the tools you're using.

    ------------------------------
    In the world of enterprise search, companies such as FAST, Vivisimo, Autonomy, etc, have to find information without the benefit of aids. Corporate documents have no pagerank or tags or much metadata of any kind.
    -----------------------------
    And how well does the automation work? It's an entirely different ballgame. Much, much smaller dataset. Much, much, much smaller variance in semantic content (which is what's going to affect the accuracy of the results the most), and the enterprise has top-down control (at least a little) of document structure.
    -----------
    Yet in consumer search it seems as if nothing would be found without a huge amount of help from millions of people every day.

    I wonder about the productivity cost to society from all this human labor--work that is supposed to be done by robots.
    -------------

    As I mentioned before, the productivity cost should be small, because they're asking for reformulations and reductions of existing content, not new content. Not only that, a motive for their asking many of the questions is to provide superior visibility and update time to the content providers (as well as the search users). Search can't read your mind and know when you decide to update. Search machines can't understand information the way people can, and as such are poorly positioned to create useful new content (which is what subject guides and keyword lists are).

    --jtmodel
    jtmodel
  • Desktop Search surely is broken

    I have been schooling the Microsoft experts on desktop search at

    http://channel9.msdn.com/showuserthreads.aspx?userid=31672

    Mine has flowcharts and related program logic for the highlight
    and line wrap.

    good thread. They won't like it.

    Spectate Swamp
    SpectateSwamp
  • I TYPE "CAR"

    --and the search engine kicks in. I suppose it would choose an alphabetic search here. Next I type in "Car with six cylinders" and once again the engine does its work. This search is done really fast. Where does the engine search? How is the Internet wired? The Internet uses the telephone system, and the search appears to take place instantly. That's the whole planet being searched! Sounds like a job for Harvard or Cambridge.
    BALTHOR
  • Google broken on purpose?

    I have searched the scientific journals since about 1975.
    Clearly, Google's standard search nowadays is often a joke when it comes to finding good content, compared to even simple scientific search programs.

    Seemingly, Google's algorithm has been optimized to display poor natural results (from eBay and the like) for commercially interesting keywords such as "lcd tv," fooling users into clicking on Google's sponsored AdWords links.

    Google used to give the best search results. Now there are clear indications that Google's search algorithm only works well for Google and associated stores!

    More details:
    http://www.axisnova.com/articles/google-page-ranking-problems.shtml
    Jim Olsson
  • More and More

    Yes! And Google is leading the breakdown.

    More and more lately when searching Google, I have been finding giant index sites at the top of the results. And more often than not, those do not include the information I am looking for.

    On these I have found myself following links from page to page and never finding anything related to my search at all. It's like walking down a hundred blind alleys.

    Searching for computer repair in different towns in western Mass, where I live, gives me lots of results in Boston and New York but not one near here. At least not until I get to about the third or fourth page of the results.

    Searches like these used to provide results but not anymore.

    They should have followed the advice: "if it ain't broke, don't fix it"!
    DistinctDispatches@...
  • Broken

    I've been on the thing (the Internet) since it started. When search engines first came on the scene, you could search for something and what you were looking for was at or near the top. Now you have to dig three or four or more pages deep. And like the author, I'm really tired of the SEO junk. I want to publish rich content, not search-engine-rich content.

    What gets me is that they're all the same, from Google to Yahoo to MSN or any other engine you can think of. What someone needs to do is start a paradigm shift: either fix the engines so they work the way they should, or start a new one.
    mcphoto
  • A Relevant Memory

    I've been noticing lately that Google is giving me more junk results. And now I get a cookie from the website that is at the top of the list? Before I even go to that site? Why is Google giving me the cookie?

    And what's with all the sites that steal my search terms and try to act like they have something to say? It seems like more and more Google is becoming a marketplace for advertisers and less like a universal library.

    I can remember when Google was about relevant results, and when I tried using Google, I found lots of information that went beyond what I had been thinking of. But lately I'm not finding much more than what I've already heard. Maybe I've finally seen the net!

    I wouldn't say that the engines are broken. But the results sure seem to be busted. I'm thinking it's the result of meta-data management getting more sophisticated.

    If only there was a way to use Google without all that...

    I can remember when Google found things...
    Shadetree Engineer