Inside the world of audio branding with Skype’s new pings, bounces, and pops
By Adi Robertson | Illustrations by Peter Steineck
The year that Skype launched its calling service, the world was in the midst of a sonic crisis: the ringtone.
Mobile phones — to which Skype was an indirect competitor — were becoming ubiquitous, and so were the personalized sounds that went with them. Shortly before the company put out the first of several betas in August of 2003, an analyst report predicted that ringtone sales would soon bring in more money than CD singles.
"In 2003, it seems that a person’s most valued and public expression of self seems to be embodied in the customized features of his cell phone," wrote one woman in a BBC opinion poll. "With priorities like these, it’s no wonder we have so many problems in the world today."
Ringtones weren’t just a signal that someone wanted to talk to you — they said something about who you were. And they were a sign of how profoundly a simple interface choice could change an entire environment.
Where cellphones had severed the link between telephones and landlines, Skype went a step further: it separated voice calling from the telephone entirely. Developed by two Scandinavians who’d previously worked on file-sharing service Kazaa, Skype wasn’t the first company to offer voice over internet protocol (or VoIP) services. But it was free, simple, and released at a time when internet speeds were climbing. By the time it introduced a "version 2.0" with video calls and a new design in 2005, it boasted 54 million registered users worldwide.
People went to Skype to hear someone’s voice, and the sounds that accompanied the application were central to the user’s experience. They were carefully designed to reflect a mix of pleasant, familiar noises: a call was marked by the traditional, pulsing ring of a telephone, while other actions triggered a combination of whimsical bounces, pops, whispers, and zooms. But each one was also supposed to say something about Skype — and to help the company’s name become synonymous with online calling.
Audio branding is as old as jingles or the MGM lion’s roar, but it’s only been recognized as a specific field more recently. The Audio Branding Academy, an industry group founded in 2009, says it was aware of 145 agencies worldwide in 2013, up from 126 in 2010. Companies might come to these agencies for everything from a handful of recordings to a sonic identity — a whole catalog of sounds that can be remixed for commercials, online videos, or user interfaces.
This year, Skype is revamping its sonic identity for the first time in 10 years, and it’s turning to a New York-based sonic branding agency called Listen. While reimagining noises like incoming chat pings, call sounds, and error notifications, the small team of audio engineers and designers needs to integrate new apps like Skype for Business, formerly known as work messaging app Lync. It needs to fit Skype into the larger scheme of Windows and Microsoft, another of its clients. And it needs to do so while preserving the identity of one of the most recognizable online communication tools in the world.
For the overwhelming majority of humanity’s existence, the tools we’ve used have come with their own set of audio signals, often unintentional ones. Hammers aren’t designed to give acoustic feedback, but we can hear when we hit a nail squarely. An axe is a rudimentary object, but its aural cues tell us not only whether it’s successfully biting into a tree, but how deep it’s going, and how close it is to striking through.
As our tools and machines have become increasingly digital, they’ve also become increasingly silent — and many of those natural cues and signals have disappeared. Instead, we rely on noises that have been selected or created to give a specific effect. Electric cars with silent motors mimic noisy gas-powered vehicles, for example, because a motor gives bystanders surprisingly complex warnings — how near a car is, how powerful it might be, and how fast it’s going. While physical keyboards opt for silent rubber buttons instead of clicky mechanical springs, we put time and energy into creating sounds for the digital keyboards on our touchscreen devices.
There’s no such thing as a "natural" computer-interface sound. But for decades, an entire industry of musicians, engineers, and advertisers has devoted itself to creating these acoustic signifiers, from the moment we boot up a machine to the moment we shut it down.
In the 1970s and 1980s, one of the most influential computing achievements was the graphical user interface — the switch from entering text commands to arranging tools and folders on a metaphorical "desktop." But there was no equivalent revolution in audio interfaces. Sound is ill-suited to the kinds of interactions we expect from computers. Unlike our eyes, our ears don’t let us shut out irrelevant input or save information until we’re ready to pay attention.
But William Gaver, who did some of the earliest work on sound and personal computing, thought that "auditory icons" could convey as much information as visual ones. As a graduate student in the early 1980s, Gaver studied under engineer and psychologist Donald Norman, whose work focused on how humans interacted with the objects around them. Gaver began his own work on audio design, which led to an internship with Apple in 1986. There, he began an ambitious project: creating an audio counterpart to the Macintosh computer’s recently introduced file manager, Finder.
Gaver argued that users were already accustomed to relying on a computer’s unintended noises: They estimated system activity by listening to the whir of a hard drive, diagnosed printer malfunctions based on their clicking, and used the sound of a modem to tell when they’d gotten online. He imagined extending that idea to complement the Finder interface, a system he called "Sonic Finder." With the help of the Finder team, Gaver went through the code and applied recordings of real-world actions like tapping a metal container and breaking dishes to digital actions like dragging files and opening documents. Gaver eschewed the easiest metaphors. The action of copying a file could make the sound of a photocopier, for example, but a computer file didn’t have separate "pages." So he decided it made more sense to represent progress with the sound of water pouring into a glass, the frequency changing as it got closer to finishing.
More ambitiously, the Sonic Finder differentiated between different types of files and elements of the desktop. The sound of moving a big file, for instance, would be lower-pitched than moving a small one, like dragging a heavy object compared to a light one.
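The size-to-pitch idea is simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions, not Gaver's implementation: the function name, the frequency range, the size limits, and the logarithmic mapping are all invented here to make bigger files sound lower.

```python
import math

# Hypothetical values for illustration; the article does not give
# the Sonic Finder's actual pitch range or size thresholds.
HIGH_HZ = 880.0            # pitch for the smallest files
LOW_HZ = 110.0             # pitch for the largest files
MIN_BYTES = 1_000
MAX_BYTES = 1_000_000_000


def pitch_for_size(size_bytes: int) -> float:
    """Map a file size to a frequency: the bigger the file, the lower the pitch."""
    clamped = min(max(size_bytes, MIN_BYTES), MAX_BYTES)
    # Position of the file on a log scale, from 0.0 (smallest) to 1.0 (largest).
    span = math.log(MAX_BYTES) - math.log(MIN_BYTES)
    t = (math.log(clamped) - math.log(MIN_BYTES)) / span
    # Interpolate geometrically so equal size ratios give equal musical intervals.
    return HIGH_HZ * (LOW_HZ / HIGH_HZ) ** t


if __name__ == "__main__":
    for size in (10_000, 10_000_000, 1_000_000_000):
        print(size, round(pitch_for_size(size), 1))
```

The logarithmic scale is a design choice rather than anything the article specifies: file sizes span many orders of magnitude, and a linear mapping would squeeze almost every ordinary file into the top of the range.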
But Sonic Finder didn’t actually become part of Finder, and Gaver left Apple for one of the epicenters of computing research: the Xerox Palo Alto Research Center, or PARC. Programmers at PARC had created the original "desktop" interface, and while Gaver moved on from audio design to other forms of human-computer interaction, other PARC researchers were looking for new ways to let people interact with computers using sound.
A project called Audio Aura played with the audio equivalent of augmented-reality glasses. Relying on wireless headphones and infrared location-tracking badges, Audio Aura dropped sound clips around an office, alerting employees (ideally subtly) to new emails or to how long a coworker had been away from their desk. Its creators imagined a combination of voices, musical snippets, and "earcons" that sounded like waves and birds. The project was a rough prototype, and the sounds could trend towards the bizarre: The cry of seagulls, for example, might mean new emails, with a volume of birds proportionate to the unread messages.
As personal computers became ubiquitous in the 1990s, most people’s experience of audio interfaces would be far more mundane. But the decade also produced some of the most iconic sounds in computing history: things like the Apple chime, Windows 95’s start-up fanfare, and the five-note "Intel Inside" sequence.
One of these — the lush major chord that plays whenever a Mac is turned on — was a simple fix for a serious aural misstep. It was created by Apple sound designer Jim Reekes to replace the Macintosh II’s tritone boot sound — an unsettling chord sometimes known as the "devil’s interval." The new tone was supposed to act as a refreshing "palate cleanser" on startup, putting users at ease. ("I was thinking about, you know, you’re going to hear the sound every time it crashes," he later said.) Though Apple’s larger developer team apparently wouldn’t agree to add the sound, Reekes said he surreptitiously added it to the Mac OS firmware anyway. It’s been part of the operating system ever since, with only slight modifications.
Although Sonic Finder never got traction at Apple, the idea of unified soundscapes did. In the mid-’90s, another of Apple’s designers, Jim McKee, started developing audio palettes for one of the company’s new big ideas: multiple, customizable Mac OS themes. Touching or scrolling through just about anything would make a sound, mixed from over a hundred small office noises like binder clips snapping, pencils scraping, and drafting implements being dragged around.
The design was supposed to be subtle enough that users barely noticed. "It added texture to the UI, more than sound," says McKee. "It made it feel like you were actually touching things and moving things." It was even set to slightly vary the pitch and volume of each button click, in the same way that tapping a real-world object doesn’t produce the same sound every time.
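The trick of never playing quite the same click twice can be approximated with a little randomness. The sketch below is an assumption-laden illustration, not McKee's code: the function name, the base pitch and volume, and the 5 percent spread are all made up for the example.

```python
import random


def humanized_click(base_pitch_hz=1000.0, base_volume=0.8, spread=0.05, rng=random):
    """Return a (pitch, volume) pair nudged slightly away from the base values.

    `spread` is the maximum fractional change in either direction, so 0.05
    means each click lands within 5 percent of the base pitch and volume.
    """
    pitch = base_pitch_hz * (1 + rng.uniform(-spread, spread))
    volume = base_volume * (1 + rng.uniform(-spread, spread))
    # Keep volume inside the usual 0.0 to 1.0 range for audio gain.
    return pitch, min(1.0, max(0.0, volume))


if __name__ == "__main__":
    for _ in range(3):
        print(humanized_click())
```

Passing in a seeded `random.Random` instance as `rng` makes the variation repeatable, which is handy when checking that the values stay inside the intended range.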
Unfortunately, Steve Jobs had just retaken the helm at Apple, and the company was in chaos. McKee designed five palettes for Mac OS 8.5, which were supposed to include customizable visual and audio themes. Apple stripped all but one of those themes out at the last minute. "Steve returned, and pretty much shut it down," McKee says. "What I heard someone say was that he came and reviewed it and said ‘Nobody wants sound coming out of their computers.’"
But if there was sound, companies decided, it should convey exactly what they stood for. The perfect time for this was during the computer’s start-up, which essentially displayed a static advertisement already. When Microsoft approached ambient music pioneer Brian Eno during the development of Windows 95, for instance, it wanted an entire company manifesto packed into the space of a boot screen. "The thing from the agency said, ‘We want a piece of music that is inspiring, universal, blah-blah, da-da-da, optimistic, futuristic, sentimental, emotional,’ this whole list of adjectives," Eno said in a 1996 interview. "And then at the bottom it said, ‘And it must be 3 1/4 seconds long.’" He apparently liked the idea so much he created 84 of them.
Microsoft and Skype’s current sonic branding seems equally complicated. In July, I headed to the New York office of audio-branding studio Listen, where Steve Milton and a handful of other designers have spent six months defining the sounds of Skype’s next iteration.
Milton, who co-founded Listen in 2012, owes his career in part to an airplane trip. When the plane’s speakers dinged to tell passengers they’d reached cruising altitude, Milton noticed that the chord they were playing was a minor third. "I remember being scared and interpreting it as a negative thing, because everyone in Western culture understands a minor third to be sad," he says, as we sit in one of Listen’s modest conference rooms. With a half-step difference, he thought, he would have heard a friendly alert, instead of a sinister warning. "And I remember thinking, Why that sound? Who made that decision? Why is it that way?"
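The half step Milton describes is easy to put in numbers. The sketch below assumes twelve-tone equal temperament, where each half step multiplies frequency by the twelfth root of two; the article does not say how the airplane chime was tuned, and the 440 Hz starting note and the function name are arbitrary choices for the example.

```python
# Ratio between two notes one half step apart in equal temperament.
SEMITONE = 2 ** (1 / 12)


def note_above(root_hz: float, half_steps: int) -> float:
    """Frequency of the note `half_steps` half steps above `root_hz`."""
    return root_hz * SEMITONE ** half_steps


root = 440.0                         # A4, chosen arbitrarily
minor_third = note_above(root, 3)    # three half steps up
major_third = note_above(root, 4)    # four half steps up

print(round(minor_third, 2), round(major_third, 2))  # 523.25 554.37
```

On these assumptions, the gap between the "sad" interval and the "friendly" one is about 31 Hz at this pitch, a change of roughly 6 percent in the upper note.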
Friendliness, in fact, was central to Skype’s original sound interface. The company’s current design director Steve "Buzz" Pearce remembers the pitch for the program as "the landline of the future," but the team wanted Skype to feel as natural as answering an old phone. When users got a notification that someone was calling, they wanted it to be an intimate, unobtrusive extension of the person at the other end of the line.
Skype’s original sounds, recorded by outside studio Soundtree Music, eschewed the style and tone of the chintzy cellphone ring. "We didn’t want it to become like a brainworm," says Pearce, humming the "Grande Valse" — Nokia’s famous and sometimes tooth-grinding ringtone. "Not like that."
Skype and Soundtree opted to use original sounds as basic building-block elements. "All the actual components [were] recorded organic sounds like wind, water, pops, people’s voices," says Pearce. Wind, he says, provided the white noise in a notification. A bubble pop could be recorded from a ketchup bottle, a glass, or a human gasp or gulp. "We don’t like technical things, even though we are a technical company," he adds.
Once recorded, the sounds were layered on top of each other, creating something abstract but acoustically natural. Skype’s most memorable element was the five-beat incoming call notice, mixed from recordings of a human breath, water, and voices. "If you actually ask people to hum or sing the Skype ringtone, they can’t, accurately," says Pearce. "We did that on purpose, because we don’t want it sticking."
When Listen took on the task of redesigning Skype’s sounds, Milton knew there was a high bar to meet. "The old brand director would talk about how whenever the Skype ringtone would occur, his kids would come running in, and they would anticipate seeing or hearing grandma," he says. "Having that sound and knowing an association is important, so we don’t want to lose the essence of that."
Like other brands, Skype has its own set of key "identity" words, which the interface is supposed to embody — the service is supposed to evoke terms like "clean" and "optimistic," compared to Microsoft’s "trustworthiness" and "security." ("That’s not to say Skype wasn’t trusted," clarifies Maria Ramos, who until recently managed Skype’s brand. "But Skype was seen very much as a quirky, fun brand that you use occasionally.") This identity unifies everything from the full musical ringtone to the short blips of an incoming text message — and provided a roadmap for Listen’s audio designers to follow.
Like Skype’s original audio, the new interface sounds were recorded from organic sources like voices and bubbles bursting, then mixed into individual elements with names like "Whale" (a voice-based whoosh, not an actual whale) and "Pop." When I visit, a couple of these elements are mapped to keys on one of Listen’s keyboards — you can play a song, albeit a very simple one, with them. Some of Skype’s new interface cues, like the ringtone, just feel like higher-quality re-recordings. Others feel more muted and less attention-grabbing than their original counterparts. A call in the old Skype ended with two lively, slightly metallic pops; in the new one, it’s a short burst that quickly trails off.
The differences between services are starker. When Milton plays me the new tones from Skype and Skype for Business, there’s a clear distinction: Skype’s default ringtone is bouncy, while Skype for Business is a smooth ripple that only indirectly suggests the original. I can see the latter feeling more professional, less intrusive.
But it’s harder to link up a concept as anthropomorphic as "trustworthy." Can a couple of chimes and pops actually convey something that complex? When success just means making something people like to hear, is trying to nail down the sound of optimism and creativity worth the extra effort?
To someone who knows almost nothing about music theory, there are moments in Milton’s analysis that seem both clear and clever. Shortly into his explanation of Skype, he points out the roundness of the visual interface, full of bubbles and curves. Then, he plays Skype’s ringtone and pulls up the rolling half-circles of a sine wave. "Literally, that sound looks like this," he says. When he plays the same sound re-filtered to match the more jagged sawtooth wave, it’s tinnier, more abrasive. Skype is circles all the way down.
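The contrast between the two shapes can be reproduced without any audio gear. This is a rough sketch, not Listen's process: the sample rate, the 200 Hz pitch, and the function names are assumptions, and the sawtooth here is the naive textbook ramp rather than whatever filtering Milton applied. It measures "jaggedness" as the biggest leap between neighboring samples.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
FREQ = 200.0         # pitch of both test tones in Hz (assumed)


def sine(t: float) -> float:
    """Smooth wave: the rolling half-circles on Milton's screen."""
    return math.sin(2 * math.pi * FREQ * t)


def sawtooth(t: float) -> float:
    """Naive sawtooth: ramps from -1 up to 1, then drops straight back down."""
    phase = (FREQ * t) % 1.0
    return 2.0 * phase - 1.0


def largest_jump(wave, seconds: float = 1.0) -> float:
    """Biggest difference between two neighboring samples of `wave`."""
    count = int(SAMPLE_RATE * seconds)
    samples = [wave(i / SAMPLE_RATE) for i in range(count)]
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


if __name__ == "__main__":
    print("sine:", round(largest_jump(sine), 3))
    print("sawtooth:", round(largest_jump(sawtooth), 3))
```

The sine never moves far from one sample to the next, while the sawtooth falls nearly its full height in a single step once per cycle. That sudden drop is what adds the extra high-frequency content the ear hears as tinny or abrasive.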
At other times, sonic branding sounds like a set of obvious rules mixed with an attempt at subliminal messaging. As I ask more about the system, Milton pulls up a 1990 meta-study that maps certain elements of sound to the emotions they could provoke in marketing. The chart, whose conclusions he doesn’t necessarily follow, cross-references musical elements like pitch with emotions like "serene" or "sentimental." For, say, a "humorous" sound, you might want to combine a high pitch, flowing rhythm, major mode, and medium volume.
None of this is unexpected or necessarily nefarious, but seeing it clearly laid out before me provokes a reflexive, almost cliché moment of suspicion — the feeling that I’m being manipulated. Shouldn’t I be the one deciding whether I can trust a brand?
I call Southern Illinois University professor emeritus Gordon Bruner II, the author of the study, and ask. "This idea that we control people is ridiculous," says Bruner. While audio design can elicit certain emotional responses, he suggests, changing a customer’s relationship to a company through sound design is an unrealistic expectation. If I am manipulated, the ultimate effect may be negligible. "We barely can influence them, especially in an economy or a situation where you are constantly bombarded by all kinds of stuff," he says.
At Listen, the words describing a brand’s identity are so abstract that they ultimately feel like a creative prompt. A term can suggest which basic musical guidelines a designer might want to follow, or it could draw them towards specific cultural associations — Listen approached one of its non-Skype compositions by dissecting an "anthemic" Queen song. "This is just creative strategy," Milton tells me, once he’s shown me the sine and saw waves. "This is not a science."
On a second visit to Listen’s recording room, I have a confession. "I think I eventually replaced all the sounds in my phone," I tell Milton. Actually, I swapped out the notifications on an old phone (with some truly grating video game sounds) and then turned them off forever. I don’t actually remember most apps’ audio interfaces, because I’ve spent years going out of my way to avoid them. When I’m talking to McKee, the reason snaps into focus.
One of the only audio interface elements I really like, I tell him, is the paper-crunching sound of emptying the trash in Mac OS X. It doesn’t evoke nostalgia for tossing paper in trash bins — something I’ve never done in real life — but I get a little rush of satisfaction whenever I do it. It’s a well-designed sound, but that’s not all that’s going on.
In addition to the trash crunch, people like the "send mail" sound a lot too, says McKee. "It’s a redundant thing. They end up liking it because it gives you that feedback that ‘Yes, you’ve done something.’"
Clearing the trash means I’ve just made my computer a little more organized. The send email sound means I’ve checked a task off my to-do list. The noises I turn off are usually the ones that give me more things to do: new email notifications, phone calls that interrupt my day. It doesn’t matter how well-designed those noises are — I’ve ruined some of my favorite Android and iOS sounds by using them as morning alarms.
Pearce and Milton both recognize that part of the reason people like Skype’s alerts isn’t the sounds themselves; it’s what a Skype call represents. "It happens in the comfort of people’s homes. It’s a very intimate event, generally speaking, with your loved ones," says Pearce. For many people, it’s been the rare service that’s used mostly to talk to people you know and like, and only when you’ve both agreed to talk to each other.
The new sounds for Skype will be rolling out over the coming months. In general, Pearce says the platform is more about evolution than sudden change. But when he speculates on the future, it sounds radically different. "In many ways, my mission at Skype is to move away from what we call ‘the calling paradigm,’" he says. He imagines a world where you can connect with people instantly instead — "Much like me simply lifting up my head and seeing you straight away."
And a new kind of calling platform would require a different approach to sound branding. If Skype is supposed to feel like interacting with a person instead of a computer, the ultimate goal should be making the brand itself inaudible. "We could easily envisage a world where everyone has their own sonic identity, that you effectively carry for life," says Pearce. We’d be back to the days of the personal ringtone — except that instead of changing a tone on your phone or computer, you’d make a unique sound on everyone else’s. "Skype is not about human-computer interaction, it’s about human-to-human interaction," he says. "Actually, Skype endeavors to get out of the way as much as possible."
And Milton, for his part, sees Listen’s larger mission as simple. "Everything, more and more, will need sound," he says — from planes to computers to cars. "One of the reasons that we started doing this is because we really just wanted to make things sound better," he tells me, as we finish going through the sounds they’ve made for Skype. "Make the world sound better. So it’s like — how can we help to do that?"