A sound expert explains the past and future of the human voice


From evolution to artificial intelligence

Illustration by Alex Castro / The Verge

“Talking is just something you do, isn’t it?” asks Trevor Cox. “We take chatting for granted, and yet it’s such an important part of our lives and such a complicated thing.”

Cox is a professor of acoustic engineering at the University of Salford and an expert in the science of sound. He’s also the author of Now You’re Talking: Human Conversation from Neanderthals to Artificial Intelligence.

The Verge spoke to Cox about the history of the human voice, AI, and the strangeness of the inner voice.

This interview has been lightly edited for clarity.

Photo: University of Salford

To begin, can you tell me about the evolution of human language? From your book, it seems like we still don’t know exactly when or how it happened.

We don’t know for sure exactly how spoken language developed. The controversy at the moment is about whether Homo sapiens were the only people who ever spoke or whether Neanderthals did, too. And the evidence is pointing more and more to Neanderthals having spoken, too.

From my point of view, it seems quite likely that they could talk, and I’d place the advent of language at maybe half a million years old. There are people looking for key events — what triggered the start of speech — and it seems likely that the important trigger was us getting bigger brains. It might have nothing to do with speech. We could have gotten bigger brains because of the need to control our hands, and they just created higher intelligence, which was the starting point for speech. Our primate ancestors cried and made calls, and we would have had that, too.

What is special about the human voice? What is lacking?

Of course, animals like birds can do some quite remarkable things. What sets us apart from primates, our nearest cousins, is that we have very complicated control from our brain. Our brains are really good at controlling the hundred or so muscles needed to talk. We can communicate incredibly rapidly, and that really comes from how we control all those muscles. Part of that is better brains, part is physiology. We have a lower larynx that frees up our tongue to change our words incredibly quickly.

You’ve talked about the efforts of people to preserve their voices. There are some surgical options, but what are the noninvasive ones?

One of the remarkable things about the human voice is that it ages remarkably slowly. I’m just over 50 and I have gray hair, but my voice actually isn’t that different from when I was younger. I think the key is just to keep talking.

Really, your voice is quite robust as long as you’re not shouting or screaming. There are lots of muscles involved, and there are lots of connections in your brain to those muscles that need to be kept working. The more you use them, the more they keep refreshing themselves. It’s like going down to the gym and moving weights.

Joining a choir is a good idea, not only because you need to exercise the vocal system but because music-making prevents some of the isolation of old age, when you stop chatting with people.

One part of your book is about the “inner voice.” Can you tell me more about that?

The inner voice is fascinating. It’s something that’s with us all our lives, but we rarely think about it. One interesting project I learned about involves researchers who talked to authors about how they use voices in their work. And they said that they have to almost eavesdrop on the conversations of their characters to find their “voice” to be able to write it. So it’s not simple dictation. There’s something about actually hearing the characters in your head. It’s really complicated: sometimes it’s like you’re verbalizing thoughts, but other times, it’s not quite you.

You’re working on a project about intelligibility issues in live theatre, TV, and film. Why would we have these issues?

Acting styles and singing styles have changed over the past 120 years. People don’t try to project their voice. They don’t talk in this actory way. If you don’t have to do that, then you can whisper and do different accents. That’s great for freeing up the actors, but it makes people less audible and less intelligible. So you get this situation where people are acting more naturally, but it’s harder to pick up the words. In most conversations, you miss some of the words anyway. But when you’re watching TV, people find it unacceptable that you can’t hear a few words. So I’m involved in projects about remixing drama.

How will AI change voices and how we interact with technology?

As soon as you give something a voice, it seems to have agency. I think it’ll make us treat computers differently. It’ll seem to have its own character. We’ll start treating them more like humans. We won’t think of computers as dumb slaves. Maybe they’ll be more like a pet.

Artificial speech is getting better, and we’re going to have more and more people using it as a scam. You’ll get phone calls that you think are from a human but are from a computer. You know those phishing scams that are from friends saying, “I’m lost in some country, can you help me?” Imagine if that was a voice message on a phone. It would be so much more potent. I’m sure we’re going to see fake uses, and people will try to use that as a way to extract money from us. There are lots of consequences down the line.