
Develop a VoiceXML solution using BeVocal

VoiceXML (VXML) is a markup language like HTML, but VXML is rendered by a phone browser rather than a Web browser. Synthesized speech and recorded audio files form the interface between the application and the user, and the user responds either by speaking to the phone browser or by pressing the touch-tone keys on the phone.

Written by ZDNet Staff, Contributor April 26, 2005 at 9:00 a.m. PT

To serve VXML "pages" to a caller, you need a VXML gateway, which interprets the VXML and plays the resulting speech over the phone. I'll demonstrate how to create a VXML 2.0 solution that uses BeVocal as its gateway, connects to a database, and reads back a message based on user input.

I chose BeVocal because it has a nice and free development environment, which allows me to connect to a remote URL for serving up VXML files. This is necessary for processing user input and querying data from a database server. The BeVocal site requires registration, but I think that's a small price to pay to get the speech application development functionality I need.

In my solution, I want to prompt the user for a PIN and a passcode. Once the user's PIN and passcode are verified, I'll use this information to look up event information for that user from an organizer database. The event information is then read back to the user.

Event information is stored in a database in two tables: a user information table and an event table. The user information table contains each user's PIN and passcode along with the user's name. The event table contains the event information: pin and passcode, event (type), event_title, date, and time. The pin and passcode combination identifies a unique user. The event column holds the event type, such as meeting or reminder, and event_title holds a short description of the event. The date and time columns are varchar fields whose text is worded so that the text-to-speech (TTS) engine can read it back easily.
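The article's listings use PHP against an unnamed database server. As a rough stand-in, the schema can be sketched in Python with the standard sqlite3 module. The VXML_USERS and VXML_EVENTS table names come from the listings described below; the name column, the column types, and the sample row are assumptions for illustration only.

```python
import sqlite3

def create_schema(conn: sqlite3.Connection) -> None:
    """Create the two tables described in the article (types are assumptions)."""
    conn.executescript(
        """
        CREATE TABLE VXML_USERS (
            pin      VARCHAR(16) NOT NULL,
            passcode VARCHAR(16) NOT NULL,
            name     VARCHAR(64) NOT NULL   -- hypothetical column name
        );
        CREATE TABLE VXML_EVENTS (
            pin         VARCHAR(16)  NOT NULL,
            passcode    VARCHAR(16)  NOT NULL,
            event       VARCHAR(32)  NOT NULL,  -- event type: meeting, reminder, ...
            event_title VARCHAR(128) NOT NULL,  -- short description of the event
            date        VARCHAR(64)  NOT NULL,  -- text worded for TTS, e.g. 'May third'
            time        VARCHAR(64)  NOT NULL   -- text worded for TTS
        );
        """
    )

# Hypothetical sample data for trying the sketches in this article.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute(
    "INSERT INTO VXML_USERS VALUES (?, ?, ?)", ("1234", "12345678", "Pat")
)
conn.execute(
    "INSERT INTO VXML_EVENTS VALUES (?, ?, ?, ?, ?, ?)",
    ("1234", "12345678", "meeting", "budget review", "May third", "two thirty p m"),
)
conn.commit()
```

Storing date and time as pre-worded text, as the article does, means the page can drop them straight into a sentence without any date formatting.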

Listing A shows how I prompt the user for his PIN. Like an HTML page, the VXML document contains a form. The form contains a field, which works like an HTML input, and the field's prompt element converts the text it contains to speech.

When the user enters or says his PIN, the field's filled event fires. In this example, the filled handler submits the form via POST to the URL given in the next attribute of the submit element; the namelist attribute lists the fields to pass. If the user says nothing, the noinput event fires, and its handler speaks the contained text and reprompts the user for his PIN.
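Listing A itself isn't reproduced here, so the following Python sketch generates a comparable VXML 2.0 PIN form with the standard xml.etree.ElementTree module. The form id, the prompt wording, and the use of VXML's built-in digits field type are assumptions; the submit target reuses the placeholder URL from the article.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

def build_pin_form(next_url: str) -> str:
    """Return a VXML 2.0 document that asks for a PIN and POSTs it to next_url."""
    root = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(root, "form", {"id": "get_pin"})
    # type="digits" is VXML's built-in digit grammar (an assumption about Listing A).
    field = ET.SubElement(form, "field", {"name": "pin", "type": "digits"})
    ET.SubElement(field, "prompt").text = "Please say or enter your PIN."
    # <filled> runs once the caller has supplied a PIN.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(
        filled,
        "submit",
        {"next": next_url, "method": "post", "namelist": "pin"},
    )
    # <noinput> handles silence: speak a message, then repeat the prompt.
    noinput = ET.SubElement(field, "noinput")
    noinput.text = "Sorry, I did not hear you."
    ET.SubElement(noinput, "reprompt")
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")

print(build_pin_form("http://someurl/get_passcode.php"))
```

Generating the markup with an XML library rather than string concatenation guarantees well-formed output; the resulting document has not been checked against BeVocal.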

Once filled, the pin variable is passed to the URL defined in the submit element, i.e., http://someurl/get_passcode.php. The get_passcode.php page uses the PIN to query the database and retrieve the user's passcode. This design has a security flaw: anyone can request the page from a regular Web browser, pass in a PIN, and read the matching passcode from the returned VXML. I accepted that flaw because I wanted to show how to query the database for the passcode that matches the user's PIN, and because I wanted to use two pages that each return VXML.
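The lookup that get_passcode.php performs might look like this in Python (a sketch, not the author's PHP). The table and column names follow the article; the in-memory database and sample row are hypothetical. Binding the PIN as a query parameter keeps a crafted PIN from altering the SQL, but it does not fix the design flaw just described, because the passcode still ends up in the VXML that the page returns.

```python
import sqlite3
from typing import Optional

def lookup_passcode(conn: sqlite3.Connection, pin: str) -> Optional[str]:
    """Return the passcode stored for pin in VXML_USERS, or None if there is none."""
    # The PIN is bound as a parameter (?), never pasted into the SQL text.
    row = conn.execute(
        "SELECT passcode FROM VXML_USERS WHERE pin = ?", (pin,)
    ).fetchone()
    return row[0] if row is not None else None

# Hypothetical in-memory data for trying the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VXML_USERS (pin TEXT, passcode TEXT, name TEXT)")
conn.execute("INSERT INTO VXML_USERS VALUES ('1234', '12345678', 'Pat')")
print(lookup_passcode(conn, "1234"))  # 12345678
print(lookup_passcode(conn, "9999"))  # None
```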

Listing B shows the get_passcode.php page. First, the code sets the content type of the response to application/voicexml+xml, which identifies the document as VXML. Next, it uses the posted PIN to query the VXML_USERS table for the passcode. If a result is found, the passcode is written into the grammar of the resulting VXML, and it is also used to test the passcode the user enters or speaks.

Inside the form are two fields: pin and passcode. The pin field holds the posted PIN, which is assigned through the field's expr attribute. I'll need this value in my next PHP page, so I want it posted along with the passcode. The passcode field prompts the user, and its grammar tag specifies which values the field accepts. Here's the syntax of my grammar tag:

<grammar type="application/x-nuance-gsl">
[
    (12345678)
    (dtmf-1 dtmf-2 dtmf-3 dtmf-4 dtmf-5 dtmf-6 dtmf-7 dtmf-8)
]
</grammar>

In Nuance GSL, square brackets list alternatives and each pair of parentheses encloses a sequence, so this grammar accepts the passcode either spoken or keyed in as the touch-tone sequence 1 2 3 4 5 6 7 8.
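Because get_passcode.php fills this grammar from the database, the page has to generate it for each user. A hypothetical Python helper, build_passcode_grammar, can produce the same layout for any digit-only passcode; rejecting non-digit values also keeps stray markup out of the generated VXML.

```python
def build_passcode_grammar(passcode: str) -> str:
    """Return an inline GSL grammar accepting passcode spoken or keyed in.

    Follows the article's layout: [...] holds the alternatives and each
    (...) is one sequence.
    """
    # Only plain ASCII digits are allowed, so nothing from the database can
    # inject markup or GSL syntax into the generated VXML.
    if not (passcode.isascii() and passcode.isdigit()):
        raise ValueError("passcode must contain only the digits 0-9")
    keyed = " ".join(f"dtmf-{digit}" for digit in passcode)
    return (
        '<grammar type="application/x-nuance-gsl">\n'
        "[\n"
        f"    ({passcode})\n"
        f"    ({keyed})\n"
        "]\n"
        "</grammar>"
    )

print(build_passcode_grammar("12345678"))
```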

When the user speaks or enters the passcode, the filled handler compares it with the expected passcode in the cond attribute of its if element. If the values match, the pin and passcode are submitted to the get_event.php page. If not, the code throws a nomatch event, whose handler simply reprompts the user to enter his passcode. Once the user supplies the correct passcode, the browser moves on to get_event.php.
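Putting the pieces of Listing B together, a Python sketch of the VXML that get_passcode.php returns might look like the following. It mirrors the structure described in the article (a pin field set through expr, a passcode field with an inline GSL grammar, and a filled handler that either submits or throws nomatch), but the function name, form id, prompt wording, exact cond expression, and the get_event.php URL on the article's placeholder host are assumptions, and it has not been run against BeVocal. Whatever serves the page should send it with the application/voicexml+xml content type.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"
VXML_CONTENT_TYPE = "application/voicexml+xml"  # HTTP Content-Type for the page

def build_passcode_form(pin: str, passcode: str, next_url: str) -> str:
    """Return VXML that carries pin forward and checks the caller's passcode."""
    for value in (pin, passcode):
        # Digits only, so the values are safe inside the ECMAScript and GSL text.
        if not (value.isascii() and value.isdigit()):
            raise ValueError("pin and passcode must contain only the digits 0-9")
    root = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(root, "form", {"id": "get_passcode"})
    # A field initialized through expr counts as already filled, so the
    # caller is not asked for it again; it just rides along to the next page.
    ET.SubElement(form, "field", {"name": "pin", "expr": f"'{pin}'"})
    field = ET.SubElement(form, "field", {"name": "passcode"})
    ET.SubElement(field, "prompt").text = "Please say or enter your passcode."
    grammar = ET.SubElement(field, "grammar", {"type": "application/x-nuance-gsl"})
    keyed = " ".join(f"dtmf-{d}" for d in passcode)
    grammar.text = f"[ ({passcode}) ({keyed}) ]"
    filled = ET.SubElement(field, "filled")
    check = ET.SubElement(filled, "if", {"cond": f"passcode == '{passcode}'"})
    ET.SubElement(
        check, "submit",
        {"next": next_url, "method": "post", "namelist": "pin passcode"},
    )
    ET.SubElement(check, "else")  # everything after <else/> is the else branch
    ET.SubElement(check, "throw", {"event": "nomatch"})
    # The nomatch handler reprompts for the passcode.
    nomatch = ET.SubElement(field, "nomatch")
    nomatch.text = "Sorry, that passcode was not recognized."
    ET.SubElement(nomatch, "reprompt")
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")

print(build_passcode_form("1234", "12345678", "http://someurl/get_event.php"))
```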

Listing C demonstrates how the get_event.php page selects the events from the VXML_EVENTS table and reads them back to the user.

Text placed directly in a block tag is spoken just like a prompt: the interpreter reads the block's text back to the user. My example builds simple sentences that read the event information back to the user. After they are read, the BeVocal system hangs up.
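Listing C's logic can be sketched the same way: fetch the caller's rows from VXML_EVENTS and speak them from a single block. The helper names, the sentence template, and the message for a caller with no events are hypothetical; ElementTree escapes characters such as & in event titles, so the generated VXML stays well-formed. When the block finishes and the document has nothing left to execute, the interpreter exits, which is why the call ends after the events are read.

```python
import sqlite3
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

def fetch_events(conn: sqlite3.Connection, pin: str, passcode: str) -> list:
    """Return (event, event_title, date, time) rows for one user."""
    return conn.execute(
        "SELECT event, event_title, date, time FROM VXML_EVENTS "
        "WHERE pin = ? AND passcode = ? ORDER BY rowid",
        (pin, passcode),
    ).fetchall()

def build_event_page(events: list) -> str:
    """Return VXML whose single <block> speaks one sentence per event."""
    root = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(root, "form", {"id": "read_events"})
    block = ET.SubElement(form, "block")
    if events:
        sentences = [
            f"Your {event}, {title}, is on {date} at {time}."
            for event, title, date, time in events
        ]
    else:
        sentences = ["You have no events."]  # hypothetical wording
    # Text inside <block> is spoken like a prompt; ElementTree escapes & and <.
    block.text = " ".join(sentences)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE VXML_EVENTS (pin TEXT, passcode TEXT, event TEXT, "
    "event_title TEXT, date TEXT, time TEXT)"
)
conn.execute(
    "INSERT INTO VXML_EVENTS VALUES (?, ?, ?, ?, ?, ?)",
    ("1234", "12345678", "meeting", "budget review", "May third", "two thirty p m"),
)
print(build_event_page(fetch_events(conn, "1234", "12345678")))
```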

To test this code, enter the VXML from the supplied default.vxml file in the BeVocal Café, BeVocal's integrated environment for developing VXML solutions, and host the included PHP pages on your own PHP-capable Web server.
