In Vegas this January, Dan Emodi had an emotional breakthrough.
He was showing off a new product at the Consumer Electronics Show, doing back-to-back meetings and demos for days on end, promoting his company Beyond Verbal to anyone who would listen. Beyond Verbal's product is an unusual one, a program that listens to streams of audio and sends back a quick summary of the speaker's emotional state. The company has already used it as an enterprise product, outfitting call centers with an easy way to assess callers, but Emodi wanted to get the capability to consumers, moving to a larger stage than they'd ever seen before.
After a few days, the grind took its toll. At one late-afternoon demo, Emodi tried to project enthusiasm only to have the app spit back a fatigue reading, with undercurrents of loneliness. He didn't understand: "I said, 'I'm sorry, there must be a problem with our servers.' And I pressed the play button and I heard my voice and at that point I understood that the problem was not the app, the problem was myself. I was washed out."
This, roughly speaking, is the promise of emotionally intelligent software. Natural-language processors like Siri and Google Voice ignore intonation entirely, breaking a given statement down into written text, but they lose a huge amount of information that way. When human beings talk, our intonation carries most of the crucial details. How urgent is that project, really? Is now a good time to ask for a favor? Tapping into that data opens all sorts of new capabilities in empathetic design, anticipatory advertising and even basic computer interaction. If a Siri-like interface could pick up on emotional cues, it could anticipate needs in a completely new way. It's also a crowded field, with face-reading emotion software being developed in university labs and a recent phone prototype from Samsung's R&D group.
Beyond Verbal's system relies entirely on the voice, to the point that the app itself comes off as disarmingly simple. It's called Moodies, arriving on iOS today, and if you talk to it for 20 seconds, it will produce an on-the-spot emotional diagnosis. Testing it out around The Verge offices, the big surprise was how much the diagnosis varied depending on what a person was talking about. Call someone into the boss's office and you're most likely to get Self Control. If someone was excited about a project, it showed up as Love, with undertones of Command and Belonging. If someone was reading from a book, as we tried with various early experiments, it threw off the readings entirely. And while the results seemed obvious on a second listen, they were almost always a surprise at the time. More than anything, emotion-readers seem to tell us that we don’t really know how we’re feeling.
Naturally, the guts of the program are more complex than the app would suggest. Beyond Verbal's scheme is based on the work of Yeshiva University psychologist Robert Plutchik, who broke down emotions into eight basic components. Even complex and ambiguous emotions can be represented by combining or intensifying the same eight elements. The result is a strange kind of emotional math. Anxiety is coded as "anticipation + fear." Disappointment is "surprise + sadness." Beyond Verbal's algorithms add an extra layer of complexity, connecting those eight central emotions to a patchwork of more than 400 vocal patterns. A pattern of stammering might link up with fear while a certain quavering vocal tone signifies anticipation. If the machine notices both of them together, it follows Plutchik's math to diagnose anxiety.
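As a rough illustration of that emotional math, here is a minimal Python sketch. It is not Beyond Verbal's code: the pattern labels `stammering` and `quavering_tone` and the `diagnose` function are hypothetical names invented for this example, and the tables hold only the two vocal cues and two combinations described above, not the company's 400-plus patterns.

```python
# The eight basic emotions in Plutchik's model.
BASIC_EMOTIONS = {
    "joy", "trust", "fear", "surprise",
    "sadness", "disgust", "anger", "anticipation",
}

# Only the two combinations named in the article.
COMBINATIONS = {
    frozenset({"anticipation", "fear"}): "anxiety",
    frozenset({"surprise", "sadness"}): "disappointment",
}

# Hypothetical labels for vocal patterns (assumption: a real system
# would detect these from audio; here they are plain strings).
PATTERN_TO_EMOTION = {
    "stammering": "fear",
    "quavering_tone": "anticipation",
}


def diagnose(detected_patterns):
    """Map detected vocal patterns to basic emotions, then combine them."""
    emotions = {
        PATTERN_TO_EMOTION[p]
        for p in detected_patterns
        if p in PATTERN_TO_EMOTION
    }
    # Report a compound emotion when both of its components are present.
    compounds = sorted(
        name for pair, name in COMBINATIONS.items() if pair <= emotions
    )
    # Otherwise fall back to whatever basic emotions were found.
    return compounds or sorted(emotions)


print(diagnose(["stammering", "quavering_tone"]))
print(diagnose(["stammering"]))
```

Fed both cues, the sketch reports anxiety; fed stammering alone, it falls back to plain fear. The hard part, recognizing those patterns in a live voice, is exactly what the sketch leaves out.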
Of course, the theory is just the beginning. On top of that, Emodi's team added almost a decade of data collection and fine-tuning. For four years, Beyond Verbal ran a kind of public beta on its website, taking voice samples from random users and asking them to grade the results. In total, the team worked through more than 40,000 voices before taking it to an even bigger stage. In 2009, the company began selling its software to call centers, where it could pull not just voice recordings but intricately structured data on the circumstances and resolution of each call. That data also meant the team could show the program working. They built a special module for the call centers, showing a red light when a caller was angry and a green light when they were happy enough to be forwarded to the sales department. Operators were monitored too, and guided towards more personable emotional cues. According to Emodi, the average center saw productivity gains of 10 to 40 percent. The more data the program swallowed, the more powerful it became.
The upcoming partnerships, begun by Emodi's fatigue-ridden CES meetings, would make the program even stronger. Building a voice-sensing program into a multipurpose app or a mobile OS would unleash a torrent of new data, letting the algorithm grow even more intuitive about the cues of human feeling. It's a virtuous cycle, with good data feeding better results, which feed back into even more data. Taking this route, the path to emotionally intelligent programs is just a straight line through larger and larger quantities of audio.
An important and inconvenient question: What are these tests actually measuring? Behind the highbrow theory and ground-level research, what does Beyond Verbal's diagnosis actually mean? Emodi will tell you it comes down to brute biology. He traces the vocal inflections to changes in the limbic system, the physical equivalent of the sensations we know as emotion. On the theory side, Plutchik goes even deeper, tracing the component emotions to human reactions to the basic problems of life, like identity, hierarchy, and temporality. We feel joy in gaining something, and sadness in losing it; as long as human beings are gaining and losing, Plutchik tells us we will feel something akin to joy and sadness. It’s a powerful idea, especially when it’s combined with data-driven voice analysis. Following the assumptions of Plutchik and Beyond Verbal, you can trace the thread of a vocal tic all the way back to the elemental struggles of human existence.
But from another angle, that tic could mean nothing at all. Plutchik has a good theory but it's just a theory, and it's not the only one. As critics point out, his theory doesn't have much to say about emotions like pride or shame. Beyond Verbal had help from results on the ground: they really did see improved performance at the call centers. But why? Had they tapped into the basic emotions of the callers, or did they just train the operators to respond more effectively? Is Moodies scanning for emotion, or just measuring and cataloging the quirks of a person's voice? Can the two be separated at all? They're hard questions to dismiss entirely, especially as emotion sensors like Beyond Verbal make their play for top-dollar acquisitions. Taking a stand on emotional intelligence also means taking a stand on the nature of emotion itself, which can make for shaky ground.
At the same time, the trick itself isn't getting old. There's a rush that comes with putting a word to another person's emotional state, perhaps knowing more about them than they know themselves. "You can see all the emotion, you can follow it from one emotion to another," Emodi says. "You can take that into cars, TVs, and smart appliances." After years of trying out the tech, he's addicted. He uses it to practice pitches, screen potential hires, and monitor his tone when he needs to scold his kids. And why wouldn't you? For Emodi, it's the best kind of information there is. "We say what we want to say, but what's really important is how we feel."