News

Speech technology has long languished in the no-man's land between sci-fi fantasy ("Computer, engage warp drive!") and disappointing reality ("For further assistance, please say or press 1 ..."). But that's about to change, as advances in computing power make voice recognition the next big thing in electronic security and user-interface design. A whole host of highly advanced speech technologies, including emotion and lie detection, are moving from the lab to the marketplace.

"This is not a new technology," says Daniel Hong, an analyst at Datamonitor who specializes in speech technology. "But it took a long time for Moore's Law to make it viable." Hong estimates that the speech technology market is worth more than $2 billion, with plenty of growth in embedded and network apps.

It's about time. Speech technology has been around since the 1950s, but only recently have computer processors grown powerful enough to handle the complex algorithms that are required to recognize human speech with enough accuracy to be useful. There are already several capable voice-controlled technologies on the market. You can issue spoken commands to devices like Motorola's Mobile TV DH01n, a mobile TV with navigation capabilities, and TomTom's GO 920 GPS navigation boxes. Microsoft recently announced a deal to slip voice-activation software into cars manufactured by Hyundai and Kia, and its TellMe division is investigating voice-recognition applications for the iPhone. And Indesit, Europe's second-largest home appliances manufacturer, just introduced the world's first voice-controlled oven.

Yet as promising as this year's crop of voice-activated gadgets may be, they're just the beginning. Speech technology comes in several flavors, including the speech recognition that drives voice-activated mobile devices; network systems that power automated call centers; and PC applications like the MacSpeech Dictate transcription software I'm using to write this article.

Voice biometrics is a particularly hot area. Every individual has a unique voice print that is determined by the physical characteristics of his or her vocal tract. By analyzing speech samples for telltale acoustic features, voice biometrics can verify a speaker's identity either in person or over the phone, without the specialized hardware required for fingerprint or retinal scanning.
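To make the idea concrete, here is a minimal Python sketch of speaker verification (a toy illustration, not any vendor's actual system). It reduces each recording to a crude spectral "fingerprint" and accepts the speaker only if two fingerprints are sufficiently similar; every function name and threshold here is invented for illustration, and production systems use far richer features and statistical models.

```python
import numpy as np

def voice_features(signal, frame_size=512):
    """Crude acoustic fingerprint: the average magnitude spectrum over
    fixed-size frames. Real systems use richer features (e.g. MFCCs)."""
    n_frames = len(signal) // frame_size
    frames = signal[:n_frames * frame_size].reshape(n_frames, frame_size)
    spectra = np.abs(np.fft.rfft(frames * np.hanning(frame_size), axis=1))
    return spectra.mean(axis=0)

def same_speaker(sig_a, sig_b, threshold=0.9):
    """Verify identity via cosine similarity between two fingerprints."""
    a, b = voice_features(sig_a), voice_features(sig_b)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return cos >= threshold

# Toy demo: two signals sharing the same "vocal tract" (same frequency
# mix) versus one with a different mix, each with a little noise.
t = np.linspace(0, 1, 16000)
rng = np.random.default_rng(0)
speaker1_a = np.sin(2*np.pi*120*t) + 0.5*np.sin(2*np.pi*700*t) + 0.01*rng.standard_normal(t.size)
speaker1_b = np.sin(2*np.pi*120*t) + 0.5*np.sin(2*np.pi*700*t) + 0.01*rng.standard_normal(t.size)
speaker2   = np.sin(2*np.pi*220*t) + 0.5*np.sin(2*np.pi*1400*t) + 0.01*rng.standard_normal(t.size)

print(same_speaker(speaker1_a, speaker1_b))  # matching voices: True
print(same_speaker(speaker1_a, speaker2))    # different voices: False
```

The appeal for phone-based authentication is visible even in this sketch: all it needs is an audio signal, not a fingerprint reader or retinal scanner.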

The technology can also have unanticipated consequences. When the Australian social services agency Centrelink began using voice biometrics to authenticate users of its automated phone system, the software started to identify welfare fraudsters who were claiming multiple benefits -- something a simple password system could never do.

The Federal Financial Institutions Examination Council has issued guidance requiring stronger security than simple ID and password combinations, which is expected to drive widespread adoption of voice verification by U.S. financial institutions in coming years. Ameritrade, Volkswagen and European banking giant ABN AMRO all employ voice-authentication systems already.

Speech recognition systems that can tell if a speaker is agitated, anxious or lying are also in the pipeline. Computer scientists have already developed software that can identify emotional states and even truthfulness by analyzing acoustic features like pitch and intensity, and lexical ones like the use of contractions and particular parts of speech. And they are honing their algorithms using the massive amounts of real-world speech data collected by call centers.
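Here is a hedged Python sketch of what such cues look like in practice (the functions and the demo signals are invented for illustration, not taken from any published system). It computes two standard acoustic cues -- intensity and a rough pitch estimate -- plus one lexical cue, the speaker's rate of contractions.

```python
import re
import numpy as np

def acoustic_cues(signal, sample_rate=16000):
    """Two classic acoustic cues: intensity (RMS energy) and a rough pitch
    estimate from the autocorrelation peak in a 75-400 Hz search band."""
    intensity = float(np.sqrt(np.mean(signal ** 2)))
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]  # ac[k] = lag k
    lag_lo, lag_hi = sample_rate // 400, sample_rate // 75
    lag = lag_lo + int(np.argmax(ac[lag_lo:lag_hi]))
    return intensity, sample_rate / lag

def contraction_rate(text):
    """One lexical cue of the kind researchers examine: how often the
    speaker uses contractions like "can't" or "I'm"."""
    words = text.split()
    hits = [w for w in words if re.search(r"'(t|s|re|ve|ll|d|m)\b", w)]
    return len(hits) / max(len(words), 1)

# Toy demo: a quiet, low-pitched signal vs. a loud, higher-pitched one.
t = np.linspace(0, 0.25, 4000, endpoint=False)  # 0.25 s at 16 kHz
calm = 0.2 * np.sin(2 * np.pi * 120 * t)
agitated = 0.8 * np.sin(2 * np.pi * 220 * t)

print(acoustic_cues(calm))      # lower intensity, pitch near 120 Hz
print(acoustic_cues(agitated))  # higher intensity, pitch near 220 Hz
print(contraction_rate("I can't believe this still isn't working"))
```

A real classifier would feed hundreds of such features into a statistically trained model rather than eyeballing two numbers, which is where the call centers' mountains of labeled speech data come in.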

A reliable, speech-based lie detector would be a boon to law enforcement and the military. But broader emotion detection could be useful as well. For example, a virtual call center agent that could sense a customer's mounting frustration and route her to a live agent would save time and money -- and preserve customer loyalty. "It's not quite ready, but it's coming pretty soon," says James Larson, an independent speech application consultant who co-chairs the W3C Voice Browser Working Group.

Companies like Autonomy eTalk claim to have functioning anger and frustration detection systems already, but experts are skeptical. "The systems in place are typically not ones that have been scientifically tested," says Julia Hirschberg, a computer scientist at Columbia University. Lab-grade systems, she says, can currently detect anger with accuracy rates in "the mid-70s to the low 80s." They are even better at detecting uncertainty, which could be helpful in automated training contexts. (Imagine a computer-based tutorial savvy enough to drill you in the areas you seemed unsure of.)

Lie detection is a tougher nut to crack, but progress is being made. In a study funded by the National Science Foundation and the Department of Homeland Security, Hirschberg and several colleagues used software tools developed by SRI to scan statements that were known to be either true or false. The software scanned for 250 different acoustic and lexical cues. "We were getting accuracy maybe around the mid- to upper-60s," she says. That may not sound so hot, but it's a lot better than the commercial speech-based lie detection systems currently on the market. According to independent researchers, such "voice stress analysis" systems are no more reliable than a coin toss.

It may be a while before industrial-strength emotion and lie detection come to a call center near you. But make no mistake: They are coming. And they will be preceded by a mounting tide of gadgets that you can talk to -- and argue with. Don't be surprised if, some day soon, your Bluetooth headset tells you to calm down. Or informs you that your last caller was lying through his teeth.