Voice-command-triggered ignition is technically possible – after all, phones can be programmed to dial the number of any person whose name you speak – but there are issues of practicality. For one, the computer needs to be able to understand what you’re saying. For security purposes, you’d also want the car to respond to your voice only, and only when you’re speaking to it, not merely speaking in or near it.
“A car key provides three functions,” notes Blade Kotelly, a lecturer in the Gordon-MIT Engineering Leadership Program. “Access to the car itself, access to the ignition mechanism, and the ability to actually start the car. For a voice command to replace the use of a physical key, two speech technologies could be used: one to decode what you say, and one to verify that you are an authorized user.”
The challenge with the first speech technology is that people enunciate differently and have different accents. Just as an American may find it difficult to understand someone with a heavy French accent, a computer can struggle with unfamiliar pronunciations. However, the human brain can compare what it thinks it heard with what it expected to hear and decide accordingly. And as it turns out, computers can do that, too.
“In a typical use-case scenario, a speech system attempts to recognize what a person is saying from a known list of words,” says Kotelly. “For example, if a computer asks ‘What size pizza do you want?’ it would be programmed to try to match what a person says to a list of expected words (‘small’ or ‘medium’ or ‘large’).”
The computer does this by calculating a confidence score for each word, so if a person clearly says “medium” the computer might produce a list like this:
- Medium (98%)
- Large (23%)
- Small (12%)
The computer would then deduce that the person said “medium”. However, if the person said “tedium”, which sounds similar, the computer might produce scores like this:
- Medium (88%)
- Large (32%)
- Small (20%)
“Those numbers are still good enough for the computer to assume that the person might have said ‘medium’,” says Kotelly. “But if the person said ‘rutabaga’ then none of the scores would be high enough to imply a match. The computer would probably be programmed to prompt the user to say one of the three words again, and possibly cancel the transaction outright after two more unsuccessful attempts.”
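The matching logic Kotelly describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article gives no numeric cutoff, so `ACCEPT_THRESHOLD` is a guess chosen so that 88% passes, and the function names `best_match` and `take_order` are hypothetical. The confidence scores are supplied by hand here; a real recognizer would compute them from audio.

```python
from typing import Dict, List, Optional

# Assumed cutoff: the article only says 88% is "good enough" and that
# "rutabaga" would score too low. The exact value is a guess.
ACCEPT_THRESHOLD = 0.80
# One initial try plus "two more unsuccessful attempts" before cancelling.
MAX_ATTEMPTS = 3


def best_match(scores: Dict[str, float],
               threshold: float = ACCEPT_THRESHOLD) -> Optional[str]:
    """Return the highest-scoring expected word, or None if no word
    reaches the threshold."""
    word, score = max(scores.items(), key=lambda item: item[1])
    return word if score >= threshold else None


def take_order(attempts: List[Dict[str, float]]) -> Optional[str]:
    """Go through successive utterances (each already scored against the
    expected words) and return the first confident match. Returns None
    if the transaction should be cancelled."""
    for scores in attempts[:MAX_ATTEMPTS]:
        choice = best_match(scores)
        if choice is not None:
            return choice
        # No confident match: a real system would re-prompt the user here.
    return None


clear = {"medium": 0.98, "large": 0.23, "small": 0.12}
tedium = {"medium": 0.88, "large": 0.32, "small": 0.20}
rutabaga = {"medium": 0.10, "large": 0.08, "small": 0.05}  # made-up scores

print(best_match(clear))
print(best_match(tedium))
print(best_match(rutabaga))
```

With these assumed numbers, both “medium” and “tedium” resolve to `medium`, while “rutabaga” yields no match and would trigger a re-prompt.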
The second technology is called speaker verification and it relies on measurable characteristics of the human voice – called a voiceprint or voice biometrics – that are based on the unique configurations of a person’s mouth and throat. These characteristics can be expressed mathematically and enable the computer to accurately identify the speaker.
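To make “expressed mathematically” a little more concrete, here is a minimal sketch that treats a voiceprint as a list of numbers and compares two of them with cosine similarity. All of this is an assumption for illustration: the article does not say how voiceprints are represented or compared, the three-number vectors are invented, and the 0.95 threshold is arbitrary. Real systems derive far richer features from audio.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors
    (1.0 means they point in the same direction)."""
    if len(a) != len(b):
        raise ValueError("voiceprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("voiceprints must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(enrolled: Sequence[float], sample: Sequence[float],
                 threshold: float = 0.95) -> bool:
    """Hypothetical check: accept the sample if it is close enough to
    the voiceprint recorded when the owner enrolled."""
    return cosine_similarity(enrolled, sample) >= threshold


owner = [1.0, 2.0, 3.0]        # invented enrolled voiceprint
owner_again = [1.1, 2.0, 2.9]  # same person, slightly different reading
stranger = [3.0, -1.0, 0.5]    # invented voiceprint of someone else

print(same_speaker(owner, owner_again))
print(same_speaker(owner, stranger))
```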
“By combining these two technologies,” Kotelly says, “a computer in a car could listen to a person saying something like ‘My voice is my passphrase’ and know it’s the owner of the car with 99.8% accuracy (meaning that some people might be able to fool the system with their voice, but not many). When voice biometrics are combined with a passphrase that only the owner knows, the system becomes a very secure way for a person to get access to the car, or to start the engine.”
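The combination Kotelly describes amounts to requiring two independent checks to pass at once. The sketch below assumes each technology reports a score between 0 and 1; the `VoiceAttempt` class, the `grant_access` function, and both thresholds are hypothetical and not taken from any real in-car system.

```python
from dataclasses import dataclass

# Assumed cutoffs, chosen only for illustration.
PASSPHRASE_THRESHOLD = 0.80  # how sure the recognizer is about the words
SPEAKER_THRESHOLD = 0.90     # how sure the verifier is about the voice


@dataclass
class VoiceAttempt:
    passphrase_confidence: float  # from speech recognition (what was said)
    speaker_score: float          # from speaker verification (who said it)


def grant_access(attempt: VoiceAttempt) -> bool:
    """Unlock or start the car only if the right words were spoken
    AND the voice matches the enrolled owner."""
    return (attempt.passphrase_confidence >= PASSPHRASE_THRESHOLD
            and attempt.speaker_score >= SPEAKER_THRESHOLD)


print(grant_access(VoiceAttempt(0.95, 0.97)))  # owner, correct passphrase
print(grant_access(VoiceAttempt(0.95, 0.40)))  # correct words, wrong voice
print(grant_access(VoiceAttempt(0.30, 0.97)))  # owner's voice, wrong words
```

Requiring both checks means that an impostor who overhears the passphrase still has to pass the voice check, and a recording of the owner saying something else fails the passphrase check.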
A student project in MIT’s Engineering Innovation and Design (ESD.051/6.902) course, taught by Kotelly and Gordon-MIT Engineering Leadership Program co-director Joel Schindall, recently took home the $10,000 grand prize in the OnStar Student Developer Challenge. Entrants were required to submit voice-enabled applications that would provide improved connectivity to OnStar subscribers.
The winning team – comprising sophomores Drew Dennison, Isaac Evans, Sarah Sprague, and Marie Burkland – developed an application called EatOn, which allows drivers to identify and navigate to nearby restaurants, listen to reviews, and make reservations using just their voice. — Jason M. Rubin
Thanks to Allen Prasad from Tamilnadu, India, for this question.