Until recently, a conversation with your computer would have been one-sided at best, nonsensical at worst.
“It’s always been promised as, ‘It’s going to work really well, really soon,’ ” said James Landay, a professor of computer science at Stanford University, “but over the years it was never really ready.”
Until now, that is.
Landay, an expert in human-computer interaction, credits a convergence of breakthroughs — everything from cloud and AI to advanced algorithms and cheap noise-canceling microphones — for making speech recognition a soon-to-be game changer.
“A big thing happened three or four years ago,” said Landay, “which is the switch to using deep learning in the speech recognition algorithms. Right now, most of the processing — AI and machine learning — is running in the cloud. That has really pushed the technology beyond a certain level.”
The chattier our computers get, the bigger the implications.
Not only in the home, where assistants like Siri and Alexa already offer day-to-day help, but in the enterprise, where business leaders and workers should prepare for a new dimension in human-machine interaction.
Already, speech recognition is outpacing typing and texting, as a 2016 Stanford study showed. “We did this study in both English and Mandarin Chinese,” Landay said, “and found pretty much the same results — 2.8 times faster to be accurate.”
Next up? Computers that go beyond fast typing and simple responses to comprehending, learning, and holding up their end in a true dialogue.
“I think the part of just getting speech recognition right, that part is working really well now,” Landay said. “The language understanding is really the next part, that is the hard problem.”
After all, as impressive as Alexa and Siri are, we’ve all experienced those comic moments when they don’t have a clue what we’re asking.
“I do think people run into a wall very quickly,” Landay said, “and find what these devices cannot do. The speech recognition works, but now it’s the language understanding issue.”
Landay expects cloud-based AI to eventually comprehend nuances of communication like double-meanings, context, and non-verbal gestures.
“The next step is to do more complex things,” he said. “Can you have interfaces that understand what you’re speaking about and what you’re gesturing with your body, what we call multi-modal?”
Landay predicts such complex interactions within five to ten years, with “intelligent assistants” that master the subtleties of your voice and changing needs. “Emotion even,” he added, “understanding your tone.”
But speech recognition is already a player in business today. And bots may be “manning” a helpdesk near you.
“We already see it in customer support,” said Landay. “Maybe you have a bot at the front-end that tries to figure out what the customer’s problem is. And if that’s just a standard thing we hear over and over, here’s the solution.”
For more complicated interactions, the bot can hand the question off to a human. “We see it as a triage kind of thing,” Landay added. “In the end, a person still has to figure out what to do.”
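The triage pattern Landay describes can be sketched in a few lines. This is a hypothetical illustration, not any vendor’s implementation: the canned answers and the naive keyword match (standing in for real language understanding) are assumptions made for the example.

```python
# Sketch of a front-end support bot that answers questions it recognizes
# and escalates everything else to a human agent.

# Hypothetical set of "standard things we hear over and over" and their answers.
CANNED_ANSWERS = {
    "reset password": "You can reset your password from the account settings page.",
    "billing cycle": "Invoices are issued on the first of each month.",
}

def triage(message: str) -> tuple[str, str]:
    """Return ('bot', answer) for a recognized request, else ('human', message)."""
    text = message.lower()
    for phrase, answer in CANNED_ANSWERS.items():
        if phrase in text:  # naive keyword match stands in for real NLU
            return ("bot", answer)
    # Anything the bot can't classify is handed off to a person.
    return ("human", message)
```

In practice the keyword check would be replaced by an intent classifier, but the hand-off structure — bot first, human for the hard cases — is the same.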
Speech recognition can also play a role wherever hands-free interactions are needed.
“It’s a really good interface when you’re busy doing something else,” Landay said. “We could imagine speech working in certain factory situations or driving situations. Obviously noise has been an issue in a factory environment. But that’s being taken care of by noise-canceling microphones.”