Singing search engines have it all wrong

By George Ou | July 29, 2007, 11:45pm PDT

On an off-topic note, I came across fellow blogger Roland Piquepaille’s post about search engines that let you find music by singing to the computer.  Midomi has just such a search engine: you sing to the computer to find the song you’re looking for.  While Midomi sounds interesting, it flat out doesn’t work in practice.  It’s not just the hardware requirements that get in the way; the human is the weak point.

One day I heard a song on the radio that I liked, but I couldn’t catch its name.  I recalled hearing about Midomi somewhere, so I fired up the webpage as soon as I got home.  I couldn’t really use it in the office because I didn’t have a microphone there, and even if I did I would have felt too embarrassed to sing.  I consider myself in the minority in having a working microphone hooked up, thanks to my Polycom Communicator; most people don’t, so that’s one major limitation of these types of search engines.  But even assuming that problem can be overcome, we’re still a long way off from a working solution.

So when I fired up Midomi and got my microphone working, I found my voice cracking because I hadn’t warmed it up, and it’s been about 17 years since I sang in a symphony chorus, so I’m out of practice.  I finally managed to sing the right tune into the computer, but had no luck finding the song because I only knew a few notes of it.  I ended up spending an hour typing the few words of the song that I did know into Google before I finally found it.  So while the concept was certainly interesting, it was utterly useless from a usability standpoint.

So that got me thinking about how I would approach the problem in a way that combines the best of the text and note search techniques in an easier-to-use interface.  While I’m certainly no pianist or anything close to one, I know I can hunt and peck out a few notes, and I’ll bet most people can.  If Midomi had something like a Flash-based piano, for instance, I wouldn’t need a Mic hooked up and I wouldn’t need to warm up my voice.  Even people who can’t carry a tune would have a chance to hunt and peck the notes.  Of course it wouldn’t need to be in the right key: the search engine could transpose through every key to search for the right song, and you’d be able to adjust the notes one at a time.
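The "transpose through every key" idea can be sketched in a few lines. This is only an illustration under assumptions that are not in the post: notes are written as MIDI-style semitone numbers, the two-song `SONGS` catalog and the function names are hypothetical, and a match means the pecked-out phrase appears as an exact contiguous run. It compares pitch classes (note modulo 12), so the octave is ignored.

```python
from typing import Dict, List

# Hypothetical catalog: song title -> melody as MIDI-style semitone numbers.
SONGS: Dict[str, List[int]] = {
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62, 60, 60, 62, 64, 64, 62, 62],
    "Twinkle Twinkle Little Star": [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62, 62, 60],
}


def contains_phrase(melody: List[int], phrase: List[int]) -> bool:
    """Return True if `phrase` occurs as a contiguous run inside `melody`."""
    n = len(phrase)
    return any(melody[i:i + n] == phrase for i in range(len(melody) - n + 1))


def search_all_keys(query: List[int]) -> List[str]:
    """Try the query in all twelve keys and collect every song that matches."""
    hits = []
    for title, melody in SONGS.items():
        # Compare pitch classes (note mod 12) so the octave does not matter.
        melody_pc = [note % 12 for note in melody]
        for shift in range(12):
            shifted = [(note + shift) % 12 for note in query]
            if contains_phrase(melody_pc, shifted):
                hits.append(title)
                break  # one matching key is enough for this song
    return hits


# The opening of "Ode to Joy", pecked out a whole tone too high.
print(search_all_keys([66, 66, 67, 69, 69]))
```

Trying twelve shifts per song is the most literal reading of the idea; a real service would more likely store melodies in a key-independent form so that no shifting is needed at query time.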

Furthermore, if you can see the notes on screen, words could be attached directly to each note, which would greatly narrow the search parameters.  Even if all you had was five words matched to six notes, that would almost precisely pin the song down.  You don’t need a word for every note and blanks could be left in place, but the more information there is, the easier it is to narrow the search results.  While there may be songs that share similar sequences of notes or similar sequences of words, it is highly improbable that they would share the same words on the same notes.
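Here is a rough sketch of how words bound to notes, with blanks allowed, might narrow a search. Everything in it is an assumption made for illustration: the `CATALOG` entries, the use of one syllable per note, the semitone numbering, and the function names are all hypothetical. The two sample tunes share the same opening notes, so notes alone match both, while a single known syllable on the right note leaves only one.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical song entry: (semitone number, syllable sung on that note).
Song = List[Tuple[int, str]]
# Query entry: (semitone number, syllable, or None when the word is unknown).
Query = List[Tuple[int, Optional[str]]]

CATALOG: Dict[str, Song] = {
    "Twinkle Twinkle Little Star": [
        (60, "twin"), (60, "kle"), (67, "twin"), (67, "kle"),
        (69, "lit"), (69, "tle"), (67, "star"),
    ],
    "Alphabet Song": [
        (60, "a"), (60, "b"), (67, "c"), (67, "d"),
        (69, "e"), (69, "f"), (67, "g"),
    ],
}


def matches_at(song: Song, query: Query, start: int) -> bool:
    """Check the query against the song beginning at position `start`."""
    song_base = song[start][0]
    query_base = query[0][0]
    for offset, (q_note, q_word) in enumerate(query):
        s_note, s_word = song[start + offset]
        # Compare notes relative to the first one, so the key does not matter.
        if s_note - song_base != q_note - query_base:
            return False
        # A blank (None) word matches anything; a given word must agree.
        if q_word is not None and q_word.lower() != s_word.lower():
            return False
    return True


def search(catalog: Dict[str, Song], query: Query) -> List[str]:
    """Return titles that contain the query's notes and any words given."""
    if not query:
        return []
    hits = []
    for title, song in catalog.items():
        for start in range(len(song) - len(query) + 1):
            if matches_at(song, query, start):
                hits.append(title)
                break
    return hits


notes_only: Query = [(62, None), (62, None), (69, None), (69, None)]
one_word: Query = [(62, None), (62, None), (69, "twin"), (69, None)]
print(search(CATALOG, notes_only))  # both tunes share these notes
print(search(CATALOG, one_word))    # the known syllable narrows it down
```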

Now granted, not everyone will be able to hunt and peck on a piano keyboard, but every computer has a mouse while few have working Mics attached.  The most logical solution would be to offer both interfaces and let users choose whichever they’re comfortable with or limited to.  We’re not asking the user to play a whole symphony here; just a few notes. Any note can be wrong at first, and it won’t be committed to the search parameters until the user hears the right note and confirms it. A really smart search engine would start playing back possible results in real time as you peck out the notes.

So to the people at Midomi or whoever else may be reading this, how about it?  Can you give me this search engine that I’ve described?

Disclosure

George Ou

http://blogs.zdnet.com/Ou/?page_id=557

Biography

George Ou

George Ou, a former ZDNet blogger, is an IT consultant specializing in Servers, Microsoft, Cisco, Switches, Routers, Firewalls, IDS, VPN, Wireless LAN, Security, and IT infrastructure and architecture.

Talkback Most Recent of 25 Talkback(s)

  • So....?
    What song was it?
    Real World
    30th Jul 2007
  • Yeah, out with it,
    before we start making guesses
    JetJaguar
    31st Jul 2007
  • For lack of a reply, George,
    I'm going to have to assume it was an NSYNC song!
    Real World
    31st Jul 2007
  • LOL, please, NO!!!!!!!!!!! That's cruel!
    But seriously, be nice. Everybody seems to pick on those guys (or guys like them). The worst was in one of the X-Men movies where boy-band music came on in Cyclops's car, LOL.
    georgeou
    31st Jul 2007
  • And to top it off, they killed off the poor guy in the first few minutes
    And to top it off, they killed off the poor guy in the first few minutes of the last one. The guy was never brutalized like that in the comic.
    georgeou
    31st Jul 2007
  • See this is why I'm ducking this question
    See, this is why I'm ducking this question. It doesn't matter what I say; it's going to be laughed at by someone.
    georgeou
    31st Jul 2007
  • Interesting Off the normal topic
    However, voice recognition technologies have come a long way in the past decade. Still, I don't think a database like that would be useful because of its rate of growth and such. I like the idea, but I don't see how you could make it practical. Then again, perhaps that is why I am a sys admin and not a developer.
    nucrash
    30th Jul 2007
  • Not voice recognition.
    What you'd be looking for here isn't voice recognition. Rather, you're looking for particular tones and their relationship to one another. It's the "shape" of the sequence of tones that is of importance in identifying the song, since that shape is the same regardless of the musical key you're using (or even if you're singing "between the keys"). Rather than trying to make sense of it, you're just extracting frequency and duration from the sung notes. Then the raw notes are pretty easy to express in compact yet human-readable formats like ABC or Lilypond. Indexing it by the "shape" of the tune would be akin to indexing names using the Soundex algorithm. I'd be really surprised if that's not what's already being done.

    The rate of growth is far, far less than what Google normally handles.

    To address George's problem, I'd recommend something like a Flash version of iABC (http://abc.sourceforge.net/iabc/) in which you can tweak the tune until it's acceptable to your ear, then submit the search. The problem with a virtual keyboard is that hunting and pecking leaves you with the problem of expressing the notes' durations and musical rests. Not everybody knows the piano keyboard, and even those that do can't be expected to play it with a mouse.
    dave.leigh@...
    30th Jul 2007
  • True, but...
    Who would pay for the music library to compare to?

    How would that be licensed?

    Questions I have to ask. Although if you did something to the effect of directing a user to iTunes or Napster music stores, I am sure someone would come a long way with the technology. In fact, you might even get those companies to foot the development bill.
    nucrash
    30th Jul 2007
  • Oy, a barracks lawyer.
    "Who would pay for the music library to compare to?"

    Oh, you are kidding, right? Who pays for the (also-copyrighted) content indexed by Google? An index is not the same as the content being indexed. And as it is the purpose of a search engine to direct you to the original source of the information indexed, just as the website's purpose is to generate traffic, then the search facility and content providers' goals coincide. As a content provider, you'd have to be a complete blithering idiot to turn down a no-cost (to you) facility that directs wallet-carrying people to your door. Especially when you're getting the ad for free and just need to monetize the traffic you get. (I could understand objecting to a cache, but not the index.)

    "How would that be licensed?"

    You jolly joker, again with the kidding. And again the same argument applies. An index is not the same as the content. Nevertheless, if such things bother you and you prefer wallowing in obscurity over free traffic you could certainly tag your pages with <meta name="robots" content="NOINDEX, NOFOLLOW, NOCACHE, NOARCHIVE">

    "Questions I have to ask. Although if you did something to the effect of directing a user to iTunes or Napster music stores, I am sure someone would come a long way with the technology. In fact, you might even get those companies to foot the development bill."

    Considering that that's where most of the music is, and therefore that's where most of the traffic will go, I think they'd be fools to opt out. But hey! we already know that the music industry is rife with fools.

    But from a technical perspective I need to point out that the solution is the same whether you're searching the entire web or whether you're providing a new service exclusive to iTunes (or whatever) users. The issues you raise are not within the technical scope of the problem. "Can we solve the problem" is a fundamentally different question than "are the RIAA's lawyers cranio-rectally impacted?" For one thing, the first question is not rhetorical whereas the second question is.
    dave.leigh@...
    30th Jul 2007
  • You wouldn't be storing the music
    You wouldn't be storing the music; you would only be storing a representation of the music.

    For example: 0 4 5 7 4 could represent a "key-normalized" version of D F# G A F# or C E F G E. Then the notes you type in would also be key normalized and compared to the search database.
    georgeou
    30th Jul 2007
  • Kind of like voice Google search, sounds cool but hard to work
    Kind of like voice Google search, sounds cool but hard to work since the computer won't always recognize the right words and you can't assume the user has a Mic most of the time.

    As for the note duration, I really doubt that people can sing the right duration if they can't hunt and peck the right duration. While it would narrow the search parameters greatly, I don't think it's absolutely needed since you have a few notes bound to a few words.
    georgeou
    30th Jul 2007
  • They're even less likely to install software
    "To address George's problem, I'd recommend something like a Flash version of iABC (http://abc.sourceforge.net/iabc/) in which you can tweak the tune until it's acceptable to your ear, then submit the search. The problem with a virtual keyboard is that hunting and pecking leaves you with the problem of expressing the notes' durations and musical rests. Not everybody knows the piano keyboard, and even those that do can't be expected to play it with a mouse."

    They're even less likely to install software than to go out and connect a Mic to their computer. As I said in my earlier post, duration doesn't matter, and it's no more accurate when someone hums the tune. I asked someone who was initially scared of the virtual keyboard, and they told me they thought they could get used to it. But like I said, 100% of the users at least have mice and keyboards to use. I'd guess that only a small percentage of users have working Mics hooked up. Heck, even most of my Skype contacts are just for text chat anyway, since most of them don't have working Mics hooked up.
    georgeou
    30th Jul 2007
  • Funny thing about software
    People only seem to install it when you tell them it is bad for them. I don't know how many times I have had to uninstall WeatherBug and all that other crap.

    Make the software some sort of ad-ridden crap and everyone will want to install it.
    nucrash
    30th Jul 2007
  • No installation needed
    That's why I suggest "something like a Flash version" of that. I'm not suggesting anybody install iABC; that's just an example. But a Flash or Java applet doesn't need installation, and I'm sure something can be done in that vein to let you point and click to place the notes.

    It wouldn't even have to be traditional notation either... just bars to show that this note's higher than that one, and drag it out if it's longer. Play the note when you've dropped it. Play the phrase to confirm it all sounds good together. This would be like the "piano roll" editors on some MIDI software (like http://tinyurl.com/2vofty, but I'm not endorsing that either, just pointing out features).

    The specifics of the interface are less important than the concept. An applet to let you phrase your musical "query", a button to submit it. The search engine returns a list of songs containing the phrase, disregarding key or tempo. No mic, no client to install, no off-key humming, no need to learn a musical instrument just to find a song. Wouldn't that fit the bill?
    dave.leigh@...
    30th Jul 2007
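The key-normalized representation and the "shape" index discussed in the thread can be sketched roughly as follows. This is not how Midomi or any real service is known to work; the catalog, the function names, and the choice of four-interval keys are assumptions made for illustration. Note names carry no octave here, so differences are taken modulo 12, and durations are not stored at all, which is what makes the lookup ignore tempo.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

NOTE_TO_SEMITONE = {
    "C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11,
}


def key_normalize(names: List[str]) -> List[int]:
    """Express each note as semitones above the first note (mod 12)."""
    semis = [NOTE_TO_SEMITONE[name] for name in names]
    return [(s - semis[0]) % 12 for s in semis]


def shape(names: List[str]) -> Tuple[int, ...]:
    """Intervals between consecutive notes (mod 12): the tune's 'shape'."""
    semis = [NOTE_TO_SEMITONE[name] for name in names]
    return tuple((b - a) % 12 for a, b in zip(semis, semis[1:]))


def build_index(catalog: Dict[str, List[str]], n: int = 4) -> Dict[Tuple[int, ...], Set[str]]:
    """Map every run of n consecutive intervals to the songs containing it."""
    index: Dict[Tuple[int, ...], Set[str]] = defaultdict(set)
    for title, names in catalog.items():
        intervals = shape(names)
        for i in range(len(intervals) - n + 1):
            index[intervals[i:i + n]].add(title)
    return index


def lookup(index: Dict[Tuple[int, ...], Set[str]], names: List[str], n: int = 4) -> Set[str]:
    """Look up a query of at least n + 1 notes by its first n intervals."""
    return set(index.get(shape(names)[:n], set()))


# Hypothetical one-song catalog, stored in the key of C.
catalog = {"Example tune": ["C", "E", "F", "G", "E", "D", "C"]}
index = build_index(catalog)

print(key_normalize(["D", "F#", "G", "A", "F#"]))  # same numbers as C E F G E
print(lookup(index, ["D", "F#", "G", "A", "F#"]))  # query typed in the key of D
```

Normalizing against the first note only lines up phrases that start at the same point, whereas indexing runs of intervals lets a query land anywhere inside a song, much like the Soundex comparison made earlier in the thread.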
