Thinking about recommenders

Summary: Perhaps the most interesting thing about Google as a content recommendation engine is that it leads directly to the internet echo chamber effect - and that there's probably not a lot we can do about it.

The following questions:

  • What's psradm do?

  • Where's the nearest pizza joint?

  • Which movie should I rent?

  • Which products should be placed near the cash registers?

are all simple examples of question types for which people have developed specialized "recommender" solutions - applications that rank answers, or sources of answers, to user questions in some order of recommendation.

Google and Bing - and their predecessors back to BRS and ORBIT - attempt to answer questions of the first type by enabling boolean word search across multiple documents.
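To make "boolean word search across multiple documents" concrete, here is a minimal sketch of the idea: an inverted index plus an AND query. It is only an illustration, not how any of these engines is actually built; the function names (`build_index`, `search_and`) and the three sample documents are invented for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_and(index, *terms):
    """Return the ids of documents that contain every term (boolean AND)."""
    if not terms:
        return set()
    result = None
    for term in terms:
        # Copy the postings so callers never mutate the index itself.
        postings = set(index.get(term.lower(), set()))
        result = postings if result is None else result & postings
    return result


# Hypothetical sample documents, invented for this sketch.
docs = {
    "man_psradm": "psradm changes the operational status of processors",
    "man_psrinfo": "psrinfo displays information about processors",
    "pizza": "the nearest pizza joint is two blocks away",
}
index = build_index(docs)
hits = search_and(index, "processors", "information")
```

Note that nothing in the query says who is asking or why: the same words return the same documents for everyone, which is the context-free property the next paragraph describes.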

In the most general sense this type of solution requires no contextual information about either the questioner or the question - so early Google home pages loaded with no embedded JavaScript and no server calls for customized information about the user.

The need to meet advertiser expectations changed that: Google added contextual recommender layers, starting with IP localization and now including search history, to deliver more targeted ads - so Google home pages now require significant load-time processing to produce less general search results than previous generations did.

The most obvious specialization here has been geo-location: using a chipset and some software on board the local device to supply the critical input needed to answer the second type of question.

Still, this kind of thing has its limits, and the communications burden during session set-up can be significant, so some companies turned back to using more general search engines and embedding contextual information in the queries sent to these engines. Thus Apple's Siri search application for iDevices is structured as an expert system interfacing user queries to the backend search engine through the application of customer-specific information stored on the "client", not the server. So if you use someone else's iPhone to query Siri about the fastest route home, the people there may be surprised to see you.

The role of social context and the usefulness of word clouds derived by textual analysis is obvious in some cases: the movie rental question is not, for example, generically different from the problems you'd face if asked to rank Facebook users in terms of the sales they're individually likely to generate if sent a free bottle of a new shampoo - and the contextual word cloud idea is pretty obviously where you'd start with that one.

Similarly, business intelligence, such as it is, is often concerned with applying contextual data to sales prediction; hence the perception that deciding what items to place in the customer's line of sight near the cash register is best done by combining sales histories with the word cloud surrounding products that have sold well in that position.

Unfortunately all of these recommender solutions suffer from a practical problem known as cold start: whether it's a physical product, a personal blog, or an entertainment, something that's new never makes it to the top of any recommender list unless its description copies or only vaguely extends an existing product or products.

You can, for example, analyze tweet word clouds to determine what sells shampoo and then advertise your new product accordingly, but this is just another form of "search engine optimization" and thus ultimately a fraud on the consumer. Basically, the bottom line on cold start is that the more your product, service, or idea differs from the mass, the less likely it is to be proffered by any of the existing recommender solutions - just as anyone writing a master's thesis is best advised to spend 96 pages praising others, two pages apologizing for offering a new idea, one paragraph describing the idea, and two pages disparaging it.
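The cold start effect described above is easy to reproduce in a toy. The sketch below assumes the simplest possible recommender, one that ranks items purely by how often they have already sold; the function name `rank_by_popularity`, the item names, and the purchase log are all made up for illustration, and real recommenders are far more elaborate.

```python
from collections import Counter


def rank_by_popularity(catalog, purchase_log):
    """Order catalog items by how many times each appears in the purchase log."""
    counts = Counter(purchase_log)
    # Items with no history count as zero, so they sort to the bottom.
    return sorted(catalog, key=lambda item: counts[item], reverse=True)


# Hypothetical data: two established products and one brand-new one.
catalog = ["shampoo_a", "shampoo_b", "new_shampoo"]
purchase_log = ["shampoo_a", "shampoo_b", "shampoo_a", "shampoo_a", "shampoo_b"]

ranking = rank_by_popularity(catalog, purchase_log)
```

Because the new product has no history it can only come last, and because nobody is shown the last item it never acquires a history: the ranking feeds on itself.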

Unfortunately this recommender engine behavior meshes perfectly with an aspect of human behavior as described by Festinger: specifically that we tend to actively seek out information confirming or supporting what we believe, and even more actively seek to avoid or repudiate contrary information.

Thus one result of the mutual support human nature and search engines provide each other is the internet echo chamber in which it's not currently possible to determine whether the "042-68-4425" story is true or not - largely because both the believers and the deniers just quote fellow partisans.

What we need to balance this is technology that doesn't actively support our willingness to delude ourselves: i.e. a way of asking questions which produces results objectively free of both perceiver and transmitter bias - and thus something that expands rather than reinforces our mental horizons.

Wolfram Alpha tries to do this by focusing on the factual context of the question - and for that reason both illustrates the cold start problem and demonstrates a possible solution to it with respect to quite a large set of questions.

But this won't work for all questions: there are many for which no practical approach free of external context is known. Consider what you'd have to do, for example, if given a million hours of recorded VoIP calls and asked to recommend the three minutes best worth an anti-terrorist team's time.

All the cues you need to do this are in the data, but that's theory: in practice there's no known way to do this without spending a lot of time on contextual information about the speakers. That's the limitation in all of today's recommender technologies: absent a general theory of information content, ordering, and transfer, we've worked out a lot of practical solutions to specific subsets of the problem - but they all depend on context, and context, as demonstrated by everything from Google to the parable about bulletproofing academic work, misleads as often as it serves.

Topics: Google, Browser, Enterprise Software

Talkback

10 comments
  • Interesting.

    Something to think about. Predicting human behavior and thought isn't as easy as some of the AI people used to think years ago. Nowadays you don't hear the term "AI" as much any more but "expert system", which is a tacit confession that we are a long way from computer intelligence. Expert systems codify a certain domain of knowledge and can be surprisingly useful for many tasks but still have inherent limitations.

    I haven't checked this lately but I remember an article about chess matches between computer and man. The sheer number-crunching power makes a machine a formidable opponent against even the best, but given time a human opponent playing many games against a computer can eventually win the majority of games because they can adapt to the machine.
    DevGuy_z
  • Thinking about futility

    "All the cues you need to do this are in the data, but that's theory: in practice there's no known way to do this without spending a lot of time on contextual information about the speakers."

    Same principle at work when trying to bring so many things closer to the hub or source. Take lawyers for example. Everyone universally despises them (and for good reason) except their own of course. This same thinking often extends to the ambulance chasers' elected kin, the politicians (a prime example of "artificial intelligence" in all its wayward glory), which in turn leads to incumbent bias. At that point everything becomes hopeless.

    "Human intelligence" creating the basis of "business intelligence" -- examples of the reason we invented the word oxymoron! And futility.
    klumper
  • how about reducing the number of duplicates??

    if a search engine can't determine when it's giving you the same result over and over again (by recommending the same site a thousand times in a given set of search results), how can we expect it to be "intelligent"?

    The sheer number of results is often so daunting that it's impossible to read them, except for the first few pages, so the real answer to your question may never be found. Recombing the results and reducing, if not eliminating, duplicates would go a long way to finding the data you need.
    sparkle farkle
  • RE: Thinking about recommenders

    Hey! You gonna stop blogging now??? YOU PROMISED!
    N_Bushnell
    • RE: Thinking about recommenders

      @N_Bushnell

      As much as I appreciate your earnestness, directness and enthusiasm it seems appropriate to suggest that in all likelihood there will be one last appeal before we can attempt to hold him to any statements regarding the case you would appear to be referring to.

      But nothing says you can't begin to prepare him for what is to come. So, I'll second you in that regard.
      Still Lynn
      • What about Sun Tzu?

        @Still Lynn
        Out of sight isn't keeping them closer . . .
        Roger Ramjet
  • Time to stop reading.

    Likely, he will NOT stop blogging, though the case is closed, Novell wins decisively, and the door is closed to any further litigation re: IBM et al. Personally, I'm giving him one shot at that crow pie he's got to eat, then it's much time to stop reading. Bye-bye RSS feed.
    dave.leigh@...
  • Stewart Rules: Novell Wins! CASE CLOSED!

    This matter came before the Court for trial on March 8, 2010, through March 26, 2010. Based on the Jury Verdict and the Court's Findings of Fact and Conclusions of Law, Final Judgment is entered as follows:

    1. Judgment is entered in favor of Novell and against SCO on SCO's claim for slander of title pursuant to the Jury Verdict.

    2. Judgment is entered in favor of Novell and against SCO on SCO's claim for specific performance pursuant to the Court's Findings of Fact and Conclusions of Law.

    3. Judgment is entered in favor of Novell and against SCO on Novell's claim for declaratory relief pursuant to the Court's Findings of Fact and Conclusions of Law. Specifically, the Court declares:

    a. Under § 4.16(b) of the APA, Novell is entitled, at its sole discretion, to direct SCO to waive its purported claims against IBM, Sequent and other SVRX licensees;

    b. Under § 4.16(b) of the APA, Novell is entitled to waive on SCO's behalf SCO's purported claims against IBM, Sequent and other SVRX licensees, when SCO refuses to act as directed by Novell; and

    c. SCO is obligated to recognize Novell's waiver of SCO's purported claims against IBM and Sequent.

    4. Judgment is entered in favor of Novell and against SCO on SCO's claim for breach of the implied covenant of good faith and fair dealing pursuant to the Court's Findings of Fact and Conclusions of Law. The Clerk of the Court is directed to close this case forthwith.

    SO ORDERED.

    DATED June 10, 2010.

    BY THE COURT:

    ______[signature]__________________
    TED STEWART
    United States District Judge
    junknstuff@...
    • Commonplace and long-expected

      @murph_z ...

      The only surprise is that it took this fscking long.
      dave.leigh@...
    • RE: Thinking about recommenders

      @murph_z
      Are you really stunned? C'mon you knew SCO were fsck'ed a while back eh.
      junknstuff@...