If Guillermo Cecchi wants to figure out if you've taken MDMA or meth, all he needs is a computer and a recording of your voice. Cecchi is a computer scientist at IBM, and part of a growing community of scientists who think our voices can reveal far more than our sex, age, or cultural origins. He thinks the voice can also unlock the mind — and the various psychological and neurological states our brains may be experiencing at any given time.
"This is exactly what psychiatrists do every day: they talk to the patients," Cecchi says, "but we used machine learning and mathematics to replicate it."
In a study published earlier this year, Cecchi used recordings of short interviews to determine which drug his test subjects had been given prior to the experiment. His results rely largely on language and its meaning. "What we did on the analytics side was to use machine learning techniques that can measure things like semantic distance" — the symbolic distance between words with related meanings. "Chair" and "table" are semantically closer than "chair" and "flower," for instance. "We can identify individual interviews with high accuracy with regards to the drugs they took just by computing the semantic distance between a handful of concepts."
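The semantic-distance idea can be sketched with a toy example. Real systems derive word vectors from large text corpora (with models like word2vec or GloVe); the three-dimensional vectors below are hand-picked purely for illustration and have nothing to do with IBM's actual model — but they show the basic computation, cosine similarity, where a higher score means two words are semantically closer.

```python
from math import sqrt

# Hand-picked toy vectors (illustrative only): "chair" and "table"
# are placed near each other, "flower" far from both.
vectors = {
    "chair":  [0.90, 0.80, 0.10],
    "table":  [0.85, 0.75, 0.15],
    "flower": [0.10, 0.20, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1 means
    the words are semantically closer (smaller semantic distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(vectors["chair"], vectors["table"]))   # close to 1
print(cosine_similarity(vectors["chair"], vectors["flower"]))  # much lower
```

An interview transcript could then be scored by averaging the similarity of its words to a handful of target concepts, such as "friendliness" or "empathy."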
people on ecstasy don’t say "like" and "you know" as often
With regards to MDMA, those concepts were friendliness, rapport, and empathy. "There was a higher similarity to these words in the interviews with a high dose of ecstasy," Cecchi says. He also found that people on MDMA used fewer "catchphrases" and jargon. When contemporaries talk to each other, "the word ‘like’ is typically 10 percent of the words." But people on ecstasy don’t use terms such as "like" and "you know" as often. Their speech, he says, is much more fluid.
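A measure like the "catchphrase" frequency Cecchi describes is easy to picture: count how often a filler word appears as a fraction of all words spoken. The function below is an illustrative proxy, not Cecchi's actual metric, and the sample sentence is invented.

```python
import re

def filler_fraction(transcript, filler="like"):
    """Fraction of tokens equal to the filler word -- a simple
    proxy for the catchphrase measure described in the article."""
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0
    return words.count(filler) / len(words)

print(filler_fraction("it was like really like intense you know"))  # 0.25
```

By this measure, Cecchi's observation would show up as a lower `filler_fraction` in interviews recorded under a high dose of MDMA.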
Yet Cecchi’s drug-related work represents only one example of the information that our voices contain. He’s also used voice recordings to measure speech disturbances in manic-depressive patients and in people who suffer from schizophrenia. Moreover, in recent years, scientists have begun to investigate the voice’s potential for diagnosing Parkinson’s disease, Alzheimer’s disease, sleepiness, depression, and even ADHD.
From hyperactive to sleepy voices
Jorg Langner is a mathematician and musicologist at a Berlin-based company called AudioProfiling. He thinks ADHD isn’t just about movement or the ability to focus. That’s why his team is working on diagnosing children with ADHD using voice recordings. The "speech rhythm of an ADHD child" is different from that of a child without ADHD, he says. "The length of syllables are less equal in length." This is just one of the measurements his team takes, and he says that, so far, his team has classified 1,000 previously diagnosed children with "above 90 percent" accuracy.
The "speech rhythm of an ADHD child" is different
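The syllable-length measure Langner describes can be sketched as a coefficient of variation: the standard deviation of syllable durations divided by their mean. This is a minimal illustration, not AudioProfiling's method, and the duration values below are hypothetical.

```python
from statistics import mean, stdev

def duration_variability(syllable_durations):
    """Coefficient of variation of syllable durations (seconds).
    A higher value means syllables are less equal in length."""
    return stdev(syllable_durations) / mean(syllable_durations)

# Hypothetical syllable durations: fairly even for one speaker,
# noticeably uneven for the other.
even_speech   = [0.20, 0.22, 0.19, 0.21, 0.20]
uneven_speech = [0.12, 0.30, 0.18, 0.35, 0.10]

print(duration_variability(even_speech) < duration_variability(uneven_speech))  # True
```

A classifier would combine many such rhythm features, extracted from a recording's syllable boundaries, rather than rely on any single number.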
Langner is also developing technology that will detect when someone is too sleepy to drive. When we’re tired, he says, our "speech rhythm isn’t so precise, it’s inexact." It’s also "not very pronounced."
Jarek Krajewski, a psychologist at the Rhenish University of Applied Sciences Cologne, is working on a similar project — except his team wants to apply sleepiness detection to air traffic controllers. "Sleepiness can be detected with a classification accuracy of about 75-80 percent on unseen speaker, and 80-85 percent on known speaker" in a matter of seconds, he wrote in an email to The Verge.
But detecting sleepy air traffic controllers is just the start for Krajewski. "We have developed a depression-detection system based on 200 subjects," he said. "Another phonetic approach deals with measuring alcoholization, anxiety, confidence, leadership states or personality." He also wants to build a dataset for vocal influenza detection.
Other researchers are taking a more neurological approach. "Our studies essentially looked at speech patterns in patients with Parkinson’s disease," says Rahul Shrivastav, a speech scientist at Michigan State University. "People with Parkinson’s experience changes in their voice quality, in the way they produce their sound, so vowels and consonants aren’t clear," he says. "These are very subtle, they aren’t obvious just listening to it, but with a computer you can do much more."
Shrivastav’s team is in the early stages. So far, they’ve characterized the vocal changes that occur when the disease is more advanced, but they hope to replicate the findings in newly diagnosed patients. This is important, he says, because there’s "no gold standard test. There’s a whole variety of symptoms that a neurologist will look at and a lot of time they will give the right drugs for Parkinson’s and if the symptoms go away, then that’s what you have." That process means that patients can go more than a decade without being diagnosed — a reality that voice diagnosis, Shrivastav hopes, will be able to change.
some people can't travel to see a neurologist. Voice recordings can help
Max Little, a research fellow at MIT and the director of the Parkinson’s Voice Initiative, is also working on developing vocal diagnostic techniques for Parkinson’s. His team can obtain 99 percent accuracy in lab-based diagnostic tests, but Little notes that getting that level of accuracy isn’t "nearly as easy" with telephone-quality voice recordings. The group is now working on accurate telephone-based diagnostics. This is crucial, Little says, because many people can’t travel to a neurologist. "For them, a piece of software running on a smartphone would be perhaps the only lifeline they have to get useful information about their symptoms."
Alzheimer’s disease might also hold a future with vocal diagnostics, said Karmele Lopez de Ipiña, a computer scientist at The University of the Basque Country in Spain, in an email to The Verge. "The deterioration of spoken language immediately affects the patient’s ability to interact naturally with his or her social environment," she said, "and is usually also accompanied by alterations in emotional responses." Her team used spontaneous speech analysis to identify features, like speech fluency, to detect Alzheimer’s disease. Combined with an emotional response test, the technique boasts over 90 percent accuracy in discriminating Alzheimer’s patients from healthy controls. The ultimate goal of the research, Lopez de Ipiña said, is to identify the disease before the first clinical symptoms appear.
The work done by these researchers differs from Cecchi’s because it relies more heavily on sounds — and the rhythms at which they’re emitted — than on language. Ultimately, however, both approaches rely on computers to analyze the connections that we make in our brains. "What our studies show is that we can measure mental states analytically without the intervention of a psychiatrist looking at the interview," Cecchi says.
eliminating psychologists and physicians isn’t the objective
Of course, eliminating psychologists and physicians isn’t the objective. For Cecchi, the goal is to "codify" medical interviews for future use, so doctors at different hospitals in different cities, for instance, can make use of the data when a patient moves. "Psychiatrists don’t have the time to codify or measure in a way that can be used by different psychiatrists," Cecchi says, adding that "we aren’t talking about therapy here, but the decision that is made or the diagnosis that’s made after an interview that happens in 30 minutes."
As for Langner and Shrivastav, both believe that their research will help strengthen previous diagnostic procedures by supplying an additional layer of objective testing. "The goal is to prevent misdiagnosis," Langner says. At the moment, a kid diagnosed with ADHD will have been tested using questionnaires and interviews with a doctor. In these instances, Langner says, a doctor’s impressions are crucial. "In many cases, these are good impressions," he says, "but it still has a great subjective component to it."
Private exchanges in foreign tongues
Despite promising results, many challenges remain. One limitation is that some speech features are very personal and specific to an individual, Cecchi says. Another is culture. "European languages have a lot of things in common, not just language, but also culturally," he says, adding that his group has done voice analyses on people who speak Portuguese, English, and Spanish with similar results. "Now, what will happen with Chinese — we don’t know."
Langner hypothesizes that results will vary widely. "More problems occur when we go to Arabic, Farsi, and Mandarin," he says. "I think if you want to work with these languages, major adjustments will have to be done." But Shrivastav isn’t so sure. "There are differences across languages," he says, "but there are some hallmarks of certain diseases that will impact all of the layers in all the conditions, so the trick is to find those changes."
But the most worrisome aspect of this sort of research is probably the hit to privacy. "With more and more mobile phones, so much speech is being recorded and analyzed, it becomes such an easy signal to access," Shrivastav says. "I think in the next several years you will see a lot more neat things — not just for speech diagnosis." This is exactly the attitude that some critics worry about: already, researchers are working on a phone app to help doctors predict when someone with bipolar disorder might have a manic episode, so it’s possible that technology will soon be used by the public, and the government.
"An Orwellian 1984 world where our sleepiness state is no longer private."
"We could suffer from an Orwellian 1984 world where our sleepiness state is no longer private," Krajewski said. He thinks health and safety concerns may one day legitimize the use of this technology to monitor emotional and physical states. Someone who has a cold and is waiting for a bus might not be allowed inside the vehicle, for example. "According to a public-health regulation you will not be allowed to enter public transport — the bus door remains closed for you."
But the potential for that scenario remains years off. It’ll take a lot of time, and myriad willing participants, to unlock the information that our voices carry, Langner says. "Our brain is a giant network where everything is connected." This means that if we have problems in one location, it will have consequences in other regions of the brain, "especially in the parts that control speech projection," he says. "From that we hope that we can find traces of many other illnesses in speech sounds — but to find those solutions will be a very long process."