Singing search engines have it all wrong

By George Ou | July 29, 2007, 11:45pm PDT

On an off-topic note, I came across fellow blogger Roland Piquepaille’s post about search engines that let you find music by singing to the computer. Midomi offers just such a search engine: you sing to the computer to find the song you’re looking for. While Midomi sounds interesting, it flat-out doesn’t work in practice. The hardware requirements aren’t the only problem; the human is the weak point.

One day I heard a song on the radio that I liked but couldn’t catch its name. I recalled hearing about Midomi somewhere, so I fired up the web page as soon as I got home. I couldn’t have used it at the office: I didn’t have a microphone there, and even if I had, I would have been too embarrassed to sing into it. At home I’m in the minority, since I have a working microphone hooked up for my Polycom Communicator. Most people don’t, and that’s one major limitation of this type of search engine. But even assuming that problem can be overcome, we’re still a long way from a working solution.

When I fired up Midomi and got my microphone working, I found my voice cracking because I hadn’t warmed it up, and it’s been about 17 years since I sang in a symphony chorus, so I’m out of practice. I finally managed to sing the right tune into the computer, but I had no luck finding the song because I only knew a few notes of it. I ended up spending an hour typing the few words of the song that I did know into Google before I finally found it. So while the concept was certainly interesting, it was utterly useless from a usability standpoint.

That got me thinking about how I would approach the problem in a way that combines the best of text search and note search in an easier-to-use interface. While I’m certainly no pianist, I know I can hunt and peck out a few notes, and I’ll bet most people can. If Midomi had something like a Flash-based piano, I wouldn’t need a mic hooked up and I wouldn’t need to warm up my voice. Even people who can’t carry a tune would have a chance to hunt and peck the notes. Of course, you wouldn’t need to play in the right key: the search engine could transpose through every key to find the song, and you’d be able to adjust the notes one at a time.
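The "transpose through every key" step is simple enough to sketch. The Python below is a minimal illustration under stated assumptions, not Midomi's method: notes are taken to be MIDI note numbers (60 = middle C), a stored song is just a list of them, and all function names are hypothetical. It shifts the pecked-out query by each of the twelve semitone offsets and checks whether any shifted copy appears in the song.

```python
# Sketch: try a hunt-and-peck query against a stored melody in all 12 keys.
# Assumption: notes are MIDI note numbers (60 = middle C); a melody is a list of them.

def transpose(notes, semitones):
    """Shift every note up (positive) or down (negative) by the same interval."""
    return [n + semitones for n in notes]

def contains(melody, phrase):
    """True if `phrase` appears as a contiguous run of notes inside `melody`."""
    k = len(phrase)
    return any(melody[i:i + k] == phrase for i in range(len(melody) - k + 1))

def matches_in_any_key(melody, query):
    """Return every semitone shift (-6..+5, i.e. all 12 keys) at which the query occurs."""
    return [s for s in range(-6, 6) if contains(melody, transpose(query, s))]

# The query was pecked out in C; the stored melody happens to be in D.
melody = [62, 66, 67, 69, 66, 64]   # D F# G A F# E
query = [60, 64, 65, 67, 64]        # C E F G E
print(matches_in_any_key(melody, query))  # [2]: found two semitones up
```

Shifts of -6 to +5 cover every key, but only near the octave the query was entered in; a query pecked an octave away would need a wider range of shifts or a key-independent representation of the notes.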

Furthermore, if you can see the notes, words could be attached directly to each note, which would greatly narrow the search. Even if all you had was five words matched to six notes, that would almost pin the song down. You don’t need a word for every note, and blanks could be left in place, but the more information there is, the easier it is to narrow the results. While some songs share similar sequences of notes or similar sequences of words, it is highly improbable that two songs would share the same words on the same notes.
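Attaching words to notes turns each query slot into a (note, word) pair. Here is a minimal sketch of that matching, assuming MIDI note numbers, one word or syllable per stored note, and blanks represented as `None`; the function names are hypothetical and the two songs are invented so the effect is easy to see. Both songs share the same melodic shape, so a notes-only query matches both, while two filled-in words are enough to single one out.

```python
# Sketch: a query is a list of (note, word) slots; word=None means "blank".
# Assumptions: notes are MIDI numbers, each stored song is a list of (note, word)
# pairs with one word or syllable per note, and the two songs below are invented.

def relative(notes):
    """Key-independent pitches: semitones above or below the first note."""
    return [n - notes[0] for n in notes]

def matches(song, query):
    """True if some stretch of the song has the query's melodic shape AND
    agrees with every word the user filled in (blanks match anything)."""
    k = len(query)
    q_shape = relative([n for n, _ in query])
    for i in range(len(song) - k + 1):
        window = song[i:i + k]
        if relative([n for n, _ in window]) != q_shape:
            continue
        if all(qw is None or qw.lower() == sw.lower()
               for (_, qw), (_, sw) in zip(query, window)):
            return True
    return False

song_a = list(zip([60, 62, 64, 62, 60], ["hold", "on", "to", "the", "night"]))
song_b = list(zip([65, 67, 69, 67, 65], ["walk", "in", "the", "rain", "now"]))

notes_only = [(55, None), (57, None), (59, None), (57, None)]
with_words = [(55, None), (57, "on"), (59, None), (57, "the")]
print(matches(song_a, notes_only), matches(song_b, notes_only))  # True True
print(matches(song_a, with_words), matches(song_b, with_words))  # True False
```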

Granted, not everyone will be able to hunt and peck on a piano keyboard, but every computer has a mouse while few have working mics attached. The most logical solution would be to offer both interfaces and let users choose whichever they’re comfortable with or limited to. We’re not asking the user to play a whole symphony here, just a few notes. A note can start out wrong; it won’t be committed to the search query until the user hears the right note and confirms it. A really smart search engine would even start playing back possible results in real time as you peck out the notes.
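Committing one confirmed note at a time also makes the real-time results idea cheap to prototype: after each confirmation, the engine re-filters its candidate list. This sketch assumes songs are indexed by the semitone steps between consecutive notes (which don't change with key) and uses a tiny invented index; the class and function names are hypothetical.

```python
# Sketch: re-filter candidates each time the user confirms a note.
# Assumptions: songs are indexed by the semitone steps between consecutive notes
# (steps don't change with key); the titles and steps below are invented.

def steps(notes):
    return [b - a for a, b in zip(notes, notes[1:])]

def has_run(seq, run):
    """True if `run` is a contiguous slice of `seq`; an empty run always is."""
    k = len(run)
    return any(seq[i:i + k] == run for i in range(len(seq) - k + 1))

class IncrementalSearch:
    def __init__(self, index):
        self.index = index      # title -> list of semitone steps
        self.confirmed = []     # notes the user has heard and accepted

    def confirm(self, note):
        """Commit one note and return the titles that still match, sorted."""
        self.confirmed.append(note)
        query = steps(self.confirmed)
        return sorted(t for t, s in self.index.items() if has_run(s, query))

def run_session(index, notes):
    """Replay a series of confirmations; return the candidate list after each."""
    search = IncrementalSearch(index)
    return [search.confirm(n) for n in notes]

index = {"Song A": [2, 2, -2, -2], "Song B": [2, 1, 2, -3], "Song C": [-1, -2, 3]}
for candidates in run_session(index, [60, 62, 64]):
    print(candidates)
```

With a single confirmed note there are no steps yet, so every song is still a candidate; each further note can only shrink the list, which is what would let the engine start auditioning likely results early.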

So to the people at Midomi or whoever else may be reading this, how about it?  Can you give me this search engine that I’ve described?

Biography

George Ou

George Ou, a former ZDNet blogger, is an IT consultant specializing in servers, Microsoft, Cisco, switches, routers, firewalls, IDS, VPN, wireless LAN, security, and IT infrastructure and architecture.

25 Comments

So....?
Real World 30th Jul 2007
What song was it?

Yeah, out with it,
JetJaguar 31st Jul 2007
before we start making guesses

For lack of a reply, George,
Real World 31st Jul 2007
I'm going to have to assume it was an NSYNC song!

But seriously, be nice. Everybody seems to pick on those guys (or guys like them). The worst was in one of the X-Men movies, where boy-band music came on in Cyclops's car, LOL.
And to top it off, they killed off the poor guy in the first few minutes of the last one. He was never brutalized like that in the comics.

See, this is why I'm ducking this question. It doesn't matter what I say; someone will be happy to laugh at it.

Interesting, off the normal topic
nucrash 30th Jul 2007
However, voice recognition technologies have come a long way in the past decade. Still, I wouldn't think a database like that would be useful because of its rate of growth and such. Although I like the idea, I don't see how you could make it practical. Then again, perhaps that is why I am a sysadmin and not a developer.

Not voice recognition.
dave.leigh@... 30th Jul 2007
What you'd be looking for here isn't voice recognition. Rather, you're looking for particular tones and their relationship to one another. It's the "shape" of the sequence of tones that identifies the song, since that shape is the same regardless of the musical key you're using (or even if you're singing "between the keys"). Rather than trying to make sense of the sound as speech, you're just extracting frequency and duration from the sung notes. The raw notes are then easy to express in compact yet human-readable formats like ABC or LilyPond. Indexing by the "shape" of the tune would be akin to indexing names with the Soundex algorithm. I'd be really surprised if that's not what's already being done.
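Dave's Soundex comparison has a well-known musical counterpart, the Parsons code, which keeps only whether each note goes up (U), down (D), or repeats (R). The sketch below shows one possible way to index tunes by "shape"; it is not a description of what Midomi actually does, and the tunes and sung frequencies are invented. Because only the direction of each step is kept, slightly off-key singing still produces the same code, and the units (MIDI numbers for stored tunes, Hz for the sung query) don't matter.

```python
# Sketch: index tunes by melodic contour (Parsons code), a Soundex-like "shape" key.
# Assumptions: stored tunes are MIDI note lists; the sung query is a list of detected
# frequencies in Hz. Only the direction of each step matters, so units don't.

def parsons_code(pitches):
    """'*' for the first note, then U (up), D (down) or R (repeat) for each step."""
    code = ["*"]
    for prev, cur in zip(pitches, pitches[1:]):
        code.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(code)

def build_index(songs):
    """Map each title to the Parsons code of its melody."""
    return {title: parsons_code(notes) for title, notes in songs.items()}

def search(index, sung):
    """Titles whose contour contains the sung contour anywhere."""
    query = parsons_code(sung)[1:]  # drop '*' so the match can start mid-song
    return sorted(title for title, code in index.items() if query in code)

songs = {"Tune 1": [60, 64, 65, 67, 64, 62], "Tune 2": [67, 65, 64, 62, 60]}
sung_hz = [262.0, 330.5, 349.0, 392.2, 329.0]  # slightly off-key C E F G E
print(search(build_index(songs), sung_hz))  # ['Tune 1']
```

A real system would need a small pitch threshold so that a wobbly repeated note still counts as R, and contour alone is coarse, so it would return many more false hits than a full interval or rhythm match.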

The rate of growth is far, far less than what Google normally handles.

To address George's problem, I'd recommend something like a Flash version of iABC (http://abc.sourceforge.net/iabc/) in which you can tweak the tune until it's acceptable to your ear, then submit the search. The problem with a virtual keyboard is that hunting and pecking leaves you with the problem of expressing the notes' durations and musical rests. Not everybody knows the piano keyboard, and even those that do can't be expected to play it with a mouse.

True, but...
nucrash 30th Jul 2007
Who would pay for the music library to compare to?

How would that be licensed?

Questions I have to ask. That said, if you did something like directing users to the iTunes or Napster music stores, I am sure someone would come along with the technology. In fact, you might even get those companies to foot the development bill.

Oy, a barracks lawyer.
dave.leigh@... 30th Jul 2007
Who would pay for the music library to compare to?

Oh, you are kidding, right? Who pays for the (also-copyrighted) content indexed by Google? An index is not the same as the content being indexed. And since the purpose of a search engine is to direct you to the original source of the indexed information, just as a website's purpose is to generate traffic, the goals of the search facility and the content providers coincide. As a content provider, you'd have to be a complete blithering idiot to turn down a no-cost (to you) facility that directs wallet-carrying people to your door, especially when you're getting the ad for free and just need to monetize the traffic you get. (I could understand objecting to a cache, but not the index.)

How would that be licensed?

You jolly joker, again with the kidding. And again the same argument applies. An index is not the same as the content. Nevertheless, if such things bother you and you prefer wallowing in obscurity over free traffic you could certainly tag your pages with <meta name="robots" content="NOINDEX, NOFOLLOW, NOCACHE, NOARCHIVE">

Questions I have to ask. That said, if you did something like directing users to the iTunes or Napster music stores, I am sure someone would come along with the technology. In fact, you might even get those companies to foot the development bill.

Considering that that's where most of the music is, and therefore that's where most of the traffic will go, I think they'd be fools to opt out. But hey, we already know that the music industry is rife with fools.

But from a technical perspective I need to point out that the solution is the same whether you're searching the entire web or whether you're providing a new service exclusive to iTunes (or whatever) users. The issues you raise are not within the technical scope of the problem. "Can we solve the problem" is a fundamentally different question than "are the RIAA's lawyers cranio-rectally impacted?" For one thing, the first question is not rhetorical whereas the second question is.

You wouldn't be storing the music
georgeou 30th Jul 2007
You wouldn't be storing the music; you would only be storing a representation of the music.

For example: 0 4 5 7 4 could represent a "key-normalized" version of D F# G A F# or C E F G E. The notes you type in would also be key-normalized and compared against the search database.

It's kind of like Google voice search: it sounds cool but is hard to make work, since the computer won't always recognize the right words and you can't assume the user has a mic most of the time.
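George's "key-normalized" representation can be sketched directly from his example: express each note as its distance in semitones above the first note. This version works on note names and wraps distances into one octave (mod 12), which reproduces 0 4 5 7 4 for both D F# G A F# and C E F G E. The mod 12 step is an assumption of this sketch (a real index would probably keep octaves so a step down can be told apart from a leap up), and the database contents and function names are invented for illustration.

```python
# Sketch of George's "key normalization", working on note names.
# Assumption: distances are wrapped into one octave (mod 12), as his example implies.
PITCH_CLASS = {"C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4,
               "F": 5, "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9,
               "A#": 10, "Bb": 10, "B": 11}

def key_normalize(names):
    """Semitones above the first note, e.g. D F# G A F# -> [0, 4, 5, 7, 4]."""
    pcs = [PITCH_CLASS[n] for n in names]
    return [(p - pcs[0]) % 12 for p in pcs]

def find(database, query_names):
    """Titles containing the query in any key. Each window of the stored melody is
    normalized against its own first note, so the match can start anywhere."""
    q = key_normalize(query_names)
    hits = []
    for title, names in database.items():
        for i in range(len(names) - len(q) + 1):
            if key_normalize(names[i:i + len(q)]) == q:
                hits.append(title)
                break
    return hits

# Invented example data.
database = {"Tune X": "A D F# G A F# E".split(), "Tune Y": "C D E F G".split()}
print(key_normalize("D F# G A F#".split()))  # [0, 4, 5, 7, 4]
print(find(database, "C E F G E".split()))   # ['Tune X']
```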

As for note duration, I really doubt that people who can't hunt and peck the right duration can sing it either. While duration would narrow the search greatly, I don't think it's absolutely needed, since you have a few notes bound to a few words.

"To address George's problem, I'd recommend something like a Flash version of iABC (http://abc.sourceforge.net/iabc/) in which you can tweak the tune until it's acceptable to your ear, then submit the search. The problem with a virtual keyboard is that hunting and pecking leaves you with the problem of expressing the notes' durations and musical rests. Not everybody knows the piano keyboard, and even those that do can't be expected to play it with a mouse."

People are even less likely to install software than to go out and connect a mic to their computer. As I said in my earlier post, duration doesn't matter, and it's no more accurate when someone hums the tune. I asked someone who was initially scared of the virtual keyboard, and they told me they thought they could get used to it. But like I said, 100% of users at least have a mouse and keyboard. I'd guess that only a small percentage of users have working mics hooked up. Heck, most of my Skype contacts are text-chat only, since most of them don't have working mics either.

Funny thing about software
nucrash 30th Jul 2007
People only seem to install it when you tell them it is bad for you. I've lost count of how many times I've had to uninstall WeatherBug and all that other crap.

Make the software some sort of ad-ridden crap and everyone will want to install it.

No installation needed
dave.leigh@... 30th Jul 2007
That's why I suggest "something like a Flash version" of that. I'm not suggesting anybody install iABC; that's just an example. But a Flash or Java applet doesn't need installation, and I'm sure something can be done in that vein to let you place notes by pointing and clicking.

It wouldn't even have to be traditional notation either... just bars to show that this note's higher than that one, and drag it out if it's longer. Play the note when you've dropped it. Play the phrase to confirm it all sounds good together. This would be like the "piano roll" editors on some MIDI software (like http://tinyurl.com/2vofty, but I'm not endorsing that either, just pointing out features).

The specifics of the interface are less important than the concept: an applet to let you phrase your musical "query" and a button to submit it. The search engine returns a list of songs containing the phrase, disregarding key or tempo. No mic, no client to install, no off-key humming, no need to learn a musical instrument just to find a song. Wouldn't that fit the bill?

I still think the traditional piano interface is easier than trying to come up with a whole new concept.

I agree with disregarding the key and tempo, but we also need to disregard the duration of each individual note for simplicity's sake. I suppose it could offer an advanced version where a user could specify note lengths, but I doubt most users will be able to handle quarter notes and dotted quarter notes.

Must try to appreciate. (ymmv)
dave.leigh@... 31st Jul 2007
I still think the traditional piano interface is easier than trying to come up with a whole new concept.

The piano roll concept probably has to be tried to be appreciated. It's only 110 years old, so I guess it is a little newfangled. ;-) However, I think a couple of minutes with the electronic version is enough to dispel doubts about users being able to handle durations. You drop bars representing notes next to the proper piano keys, then drag and stretch them until they sound right. It's very intuitive, and the traditional keyboard is right there. You're right that most users can't handle traditional notation, which is why looking at alternatives is worthwhile.

Don't underestimate the value of keeping the note durations. Try an experiment: you'll need a partner. Just tap a few songs out on the tabletop with your knuckles. Eliminate the tune entirely and keep only the durations. Jingle Bells, Theme from Star Wars... whatever. You'll probably be surprised at how many songs they recognize. Rhythm is as much a part of the song as the tune.
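Dave's knuckle-tapping experiment suggests that rhythm alone carries a lot of identifying information. One minimal way to compare a tapped rhythm against stored ones, independent of tempo, is to divide every gap between taps by the first gap and allow a little slack. The tap times, the stored beat pattern, the function names, and the 0.25 tolerance below are illustrative assumptions, not tuned values.

```python
# Sketch: rhythm-only matching of a tapped pattern, independent of tempo.
# Assumptions: taps are times in seconds; a stored rhythm is the list of gaps
# between note onsets in beats; the 0.25 tolerance is an illustrative guess.

def tap_gaps(times):
    return [b - a for a, b in zip(times, times[1:])]

def normalize(durations):
    """Divide by the first gap so only proportions remain (tempo drops out)."""
    first = durations[0]
    return [d / first for d in durations]

def same_rhythm(tap_times, beats, tolerance=0.25):
    """True if the tapped gaps have roughly the same proportions as `beats`."""
    if len(tap_times) < 2 or len(tap_times) - 1 != len(beats):
        return False
    gaps = normalize(tap_gaps(tap_times))
    ref = normalize(beats)
    return all(abs(g - r) <= tolerance for g, r in zip(gaps, ref))

pattern = [1, 1, 2, 1, 1, 2]                      # invented "short short long" pattern
slow = [0.0, 0.5, 1.0, 2.0, 2.5, 3.0, 4.0]        # tapped steadily
fast = [0.0, 0.31, 0.6, 1.22, 1.5, 1.8, 2.4]      # faster and a little uneven
print(same_rhythm(slow, pattern), same_rhythm(fast, pattern))  # True True
```

Normalizing by the first gap is the simplest choice but lets a sloppy first tap skew everything; dividing by the median gap would be sturdier.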

Not that you couldn't discard duration in search; I just wouldn't be so quick to do it if you can keep it easily. For one thing, including it would drastically reduce your false hit count. The number of songs that use the same sequence of notes is both amazing and inevitable, given that there are only a limited number of notes in a scale. (For instance, the theme from Independence Day strongly resembles "Das Lied der Deutschen" ("Deutschland über alles").) Besides, it helps distinguish the syncopated jazz version of a song you're looking for from the flood of traditional versions that might swamp your results.
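To see why keeping duration cuts false hits, here is a sketch that turns each pair of neighboring notes into a (semitone interval, length ratio) step, so the result ignores both key and tempo. Notes are assumed to be (MIDI pitch, length in beats) pairs, the function names are hypothetical, and the two "songs" are invented: they share one tune but have different rhythms, so a pitch-only search returns both while a pitch-plus-rhythm search returns one. `Fraction` keeps the ratios exact so equality tests are reliable.

```python
# Sketch: key- and tempo-independent signatures, with and without rhythm.
# Assumptions: a note is (MIDI pitch, length in beats); the two songs are invented
# and share one tune with different rhythms.
from fractions import Fraction

def signature(notes, use_rhythm=True):
    """One step per pair of neighboring notes: (semitone interval, length ratio)."""
    sig = []
    for (p1, d1), (p2, d2) in zip(notes, notes[1:]):
        if use_rhythm:
            sig.append((p2 - p1, Fraction(d2) / Fraction(d1)))
        else:
            sig.append((p2 - p1,))
    return sig

def contains(seq, run):
    k = len(run)
    return any(seq[i:i + k] == run for i in range(len(seq) - k + 1))

def search(songs, query, use_rhythm=True):
    q = signature(query, use_rhythm)
    return sorted(t for t, notes in songs.items()
                  if contains(signature(notes, use_rhythm), q))

songs = {
    "Waltz": [(60, 2), (64, 1), (67, 3), (64, 2), (60, 1)],
    "March": [(60, 1), (64, 1), (67, 2), (64, 1), (60, 1)],
}
query = [(65, 4), (69, 2), (72, 6)]  # Waltz rhythm, other key, slower tempo
print(search(songs, query, use_rhythm=False))  # ['March', 'Waltz']
print(search(songs, query))                    # ['Waltz']
```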

Anyway, just chuckin' ideas out there.

I think you may have a good point about a drag-and-drop interface. Flash would work, and it might be more intuitive than a piano for someone who's never touched one. However, I would strongly prefer to reserve duration for an "advanced mode". I agree with you that rhythm is important. However, I'd like to see how many normal people (who have never studied music) can express or understand the concept of dotted quarter notes and rests.

One more thing and I'll shut up.
dave.leigh@... 31st Jul 2007
However, I'd like to see how many normal people (who have never studied music) can express or understand the concept of dotted quarter notes and rests.

This sort of interface doesn't have either dotted quarter notes or rests. Or half or whole notes for that matter. It has bars. A longer bar is a longer note. You drag the note until it sounds right. Rests are just the spaces between the notes. (Actually, the thing resembles a Gantt chart as much as anything else and is edited similarly. Imagine a Gantt chart with a piano keyboard running up the left side. Vertical axis=pitch; horizontal axis=time)
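The piano-roll (or Gantt-chart) model described here maps naturally onto a small data structure: each bar has a pitch (which key it sits beside), a start time, and a length, and rests fall out as the gaps between bars. A minimal sketch, assuming MIDI pitches, times in beats, and no overlapping bars (chords are not handled); `Bar` and `to_events` are hypothetical names.

```python
# Sketch: bars on a piano roll -> an ordered list of notes and rests.
# Assumptions: MIDI pitches, times in beats, one note sounding at a time.
from dataclasses import dataclass

@dataclass(frozen=True)
class Bar:
    pitch: int      # vertical axis: the piano key the bar sits beside (60 = middle C)
    start: float    # horizontal axis: where the bar's left edge sits
    length: float   # how far the user stretched the bar

def to_events(bars):
    """Return ('note', pitch, length) and ('rest', None, length) tuples in time order.
    A rest is simply the empty space between one bar's end and the next bar's start."""
    events = []
    t = None
    for bar in sorted(bars, key=lambda b: b.start):
        if t is not None and bar.start > t:
            events.append(("rest", None, bar.start - t))
        events.append(("note", bar.pitch, bar.length))
        t = bar.start + bar.length
    return events

# Bars can be dropped in any order; sorting by start time recovers the phrase.
bars = [Bar(64, 1.0, 1.0), Bar(60, 0.0, 1.0), Bar(67, 3.0, 2.0)]
for event in to_events(bars):
    print(event)
```

The user never sees note values like "dotted quarter" here; those only exist as ratios of bar lengths, which is what a search could compare.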

Here's an example to show how important it is; I chose this one because the result is so striking and because the reuse is deliberate. In the musical The Music Man, "Goodnight, My Someone" and "Seventy-Six Trombones" share the same tune. The first is a lullaby in 3/4 time; the other is a march in 4/4 time. The difference is so striking that I've seen people slap their foreheads when I point out it's the same tune.

All that said, I agree with you that you'd want the option to search without regard to tempo or rhythm. Not only does it help in finding the songs you're specifically searching for, but it could be useful for other purposes: for instance, as a tool both for those searching for copyright-infringing tunes AND for those defending against such suits by showing how natural and prevalent such repetition and reuse is (as in, "I'm not infringing your copyright, I'm paying homage to the same Chopin tune YOU cribbed. Goober.").

Recognizing the notes that a user is singing and searching based on those notes are really two distinct steps, but singing search engines presently combine them into one. That, I think, is the crux of the problem with the current design of singing search engines. I think they're a great idea, but here's how I'd like to see them work: the user sings a snippet of a song into the computer, then an on-screen piano plays back what it _thinks_ the user sang. The user would make corrections at the piano as needed, and then submit the corrected snippet to the search engine.
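This two-step design separates transcription from search. Step one largely comes down to mapping each detected frequency to the nearest equal-tempered note, using the standard relation MIDI = 69 + 12·log2(f / 440); step two lets the user overwrite individual notes before submitting. This sketch assumes pitch detection has already turned the audio into one frequency per note (that part is out of scope here), the function names are hypothetical, and the sung frequencies are made up: 320 Hz, a flat E, comes out as D#4 until the user corrects it.

```python
# Sketch: step 1 (transcribe sung pitches) and step 2 (user corrections), kept separate.
# Assumption: pitch detection already produced one frequency in Hz per sung note.
import math

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_midi(freq_hz):
    """Nearest equal-tempered MIDI note (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

def midi_name(n):
    """MIDI number -> name with octave, e.g. 60 -> 'C4'."""
    return f"{NAMES[n % 12]}{n // 12 - 1}"

def transcribe(freqs):
    """What the on-screen piano would play back: the engine's best guess."""
    return [hz_to_midi(f) for f in freqs]

def apply_corrections(guess, corrections):
    """The user fixes individual notes (index -> MIDI note) before searching."""
    fixed = list(guess)
    for i, note in corrections.items():
        fixed[i] = note
    return fixed

guess = transcribe([261.63, 320.0, 392.0])            # C4, a flat E4, G4
print([midi_name(n) for n in guess])                  # ['C4', 'D#4', 'G4']
query = apply_corrections(guess, {1: 64})             # user drags the 2nd note up to E4
print([midi_name(n) for n in query])                  # ['C4', 'E4', 'G4']
```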

I also like the idea of a simplified musical scale, or some other visual representation of the music. As others have suggested, it could be something as simple as dots and bars: a user could click them in place on the "scale" with the mouse and drag them up/down (to increase or decrease pitch) or right/left (to increase or decrease duration). If the user knows any words to the song they're looking for, such an interface would also make it easy to attach them beneath the "notes."

http://www.songtapper.com/
JetJaguar 31st Jul 2007
There's this thing. You tap the rhythm of the song on your keyboard and it tries to find it. Not pitches, just rhythm. http://www.songtapper.com/

It's a subscription service and I haven't tried it, so I don't know how well it works. It's called Song IDentity and it's in the Get New Ringtones apps.

This is amazing
Mark Miller 1st Aug 2007
I was thinking about this very problem recently. About a month ago I heard a song at the intro to the D conference interview with Bill Gates and Steve Jobs. I thought, "Oh! Oh! I remember this song," but I couldn't remember what it was called, or what the name of the group was. And the music portion played had no lyrics for me to go on. A while later I thought, "Gosh, I wish I could hum some notes into a music search engine and try to find it that way," because I knew the tune. I've had this problem on a few other occasions.

Then I started thinking about the technical implications of that. What if I hummed off key? What if I got the rhythm wrong? It would demand too much of me. The answer I came up with was just what you said: set up a keyboard on the search site. It's much more precise and easier for a computer to interpret. The real complication is that since most people aren't musicians, they'd need a short tutorial on what notes can be generated and how, and a bit of practice without the app recording what they play. Once they got it down, they could record their notes in sequence, then tell the app, "Okay, find this." If they don't know the words, it might help to have the app optionally record rhythm as well. That would require more of a literal keyboard interface: each key on the music keyboard could match a key on the computer keyboard, so the user could tap out a tune.
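Mapping piano keys to computer keys, with timing recorded so rhythm comes along for free, might look like this. The home-row mapping to a C major scale is a hypothetical choice for illustration, as are the function names; each note's length is taken as the time until the next key press, so the last note's length is unknown.

```python
# Sketch: tap out a tune on the computer keyboard and record pitch plus timing.
# Hypothetical mapping: the home row plays one octave of C major (C4..C5).
KEYMAP = dict(zip("asdfghjk", [60, 62, 64, 65, 67, 69, 71, 72]))

def record(keystrokes):
    """keystrokes: (key, time_in_seconds) pairs in the order pressed.
    Returns (midi_note, seconds_until_next_press) pairs; the last length is None.
    Keys outside the mapping are ignored."""
    played = [(KEYMAP[k], t) for k, t in keystrokes if k in KEYMAP]
    out = []
    for i, (note, t) in enumerate(played):
        nxt = played[i + 1][1] if i + 1 < len(played) else None
        out.append((note, None if nxt is None else nxt - t))
    return out

print(record([("a", 0.0), ("d", 0.5), ("x", 0.7), ("g", 1.0)]))
# [(60, 0.5), (64, 0.5), (67, None)]: C, E, G; the stray 'x' is ignored
```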

In an ideal situation it would be easier for the person to just hum the notes, but I think a computer would have just as hard a time trying to figure out what the person really meant to hum, as a real person would, trying to figure it out.

I think it might even be harder than a computer trying to interpret speech. Tone matters there, but at least a computer can get most of the speech correct by finding phonemes in the stream. Here, pitch really matters.

You can always normalize the key out of the music. Take a look at this post I made.

http://talkback.zdnet.com/5208-10533-0.html?forumID=1&threadID=36698&messageID=676325&start=-9977

Nonsense.
lhgm Updated - 13th Dec
So you find it easier for the average internet user to identify and play the right notes on a piano than to have a mic on his computer and just whistle the song?
I'll die before I can understand some people's way of thinking.
