New Voices For The Voiceless: Synthetic Speech Gets An Upgrade

Ever since she was a small child, Samantha Grimaldo has had to carry her voice with her.

Grimaldo was born with a rare disorder, Perisylvian syndrome, which means that though she's physically capable in many ways, she's never been able to speak. Instead, she's used a device to speak. She types in what she wants to say, and the device says those words out loud. Her mother, Ruane Grimaldo, says that when Samantha was very young, the voice she used came in a heavy gray box.

The text-to-speech iPhone app that Samantha Grimaldo uses has three voice options for her to choose from. (Ellen Webber for NPR)

"She used to have to carry this device around that was at least 4 or 5 pounds," Ruane says, "and she was only, like, 70 pounds herself. The poor thing had to carry this back and forth to school every day on the bus." It was miserable having to lug her voice around that way — a clunky box sitting on the seat next to her.

Today, fortunately, Samantha's voice takes up much less space. She types into a special program on an iPhone or iPad, and a synthesized voice in the program says the words aloud. The voice, one of several types on the market, is called "Heather." That's a nice enough name — easygoing and accessible — but Grimaldo doesn't like to use the voice if she can help it.

Her mother has noticed that when the family goes out to restaurants, Samantha prefers to write out her menu choices. Apparently, as she explains to her mother, this is because Samantha has some reservations about the voice itself — the cold metal sound of it.

"Because [it's] weird," Samantha says of the mechanical voice — speaking in the voice itself.

It's not just that the voice is artificial and disjointed. It sounds, Samantha says, "older." Samantha is only 17, and the sound of the voice — deep, methodical, mature — doesn't exactly align with her sense of herself. Like any teenager, she feels self-conscious about it.

"I don't want [people to] hear," she says.

The Voice For The Voiceless

If you don't have a voice, who speaks for you? Today there are more than 60 different options for people who need to use synthetic voices to communicate, but for the majority of people who use them, there is a single answer to that question: "Perfect Paul."

Rupal Patel, a speech scientist at Northeastern University, estimates that between 50 and 60 percent of the people who use synthetic voices use the same one — the Perfect Paul voice. If you have ever heard Stephen Hawking speak, or listened to the weather radio, you have heard the voice of Perfect Paul.

Perfect Paul is used so widely because some studies have shown that his voice is easiest to understand in a variety of situations, including classrooms and public outdoor spaces. Still, some in the community of people who rely on synthetic voices have found the Perfect Paul version frustrating — not because it's a bad voice, but because it's limiting.

In fact, it was through confronting the clear limits of Perfect Paul that speech scientist Patel came to the conclusion that people like Samantha Grimaldo needed new options.

It happened around 10 years ago when Patel was at a conference for the makers and users of synthetic voices.

Rupal Patel is a speech scientist at Northeastern University. (Courtesy of Mary Knox Merrill/Northeastern University)

"I was watching a demonstration of a new technology, and someone came up and said something in their synthesized voice, and then someone else came up," Patel says.

Both spoke in the same voice — Perfect Paul's. Then a third person arrived, and another.

"It was the same voice saying different things," says Patel. "And sometimes they were saying the same phrase, but off by a few seconds ... so it felt like it was this echo going on. It was just a strange thing."

Standing there, in the middle of all these radically different people with the exact same voice, Patel had an idea: Isn't there something we can do to make these voices more individuated?

So, around seven years ago, Patel started working to change synthetic voices. When a person speaks, two things are happening. First, the voice box, the source of speech, vibrates to produce sound. Then, the mouth shapes those sounds into speech.

In many people who have speech disorders, it's mainly the second part of the system that doesn't work. "In people with speech disorders, the source is pretty preserved," Patel says. "I thought, 'That's where the melody is — that's where someone's identity is, in terms of their vocal identity.' "

So Patel decided to capture the melody of a voice. She primarily works with kids, and so she asked kids with speech disorders who can still make some sounds to come into her lab and do something really simple. "We just need them to say a sustained sound, like ahhhhh," she says.

Patel can take that sound, run it through a computer and find out all kinds of things about how that person would sound if he or she could speak words. "We can determine their pitch, the loudness, the breathiness of their voice, the changes in clarity," she says.
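To give a rough sense of what "running a sound through a computer" can mean, here is a minimal Python sketch that estimates two of the properties Patel mentions, pitch and loudness, from a sustained tone. It is not Patel's method, which the article does not describe in detail. The function names, the autocorrelation approach, the 8,000-samples-per-second rate and the synthetic stand-in for a recorded "ahhhhh" are all assumptions made for illustration, and breathiness and clarity are left out.

```python
import math

def synth_vowel(freq_hz, sample_rate=8000, duration_s=0.1, amplitude=0.5):
    """Hypothetical stand-in for a recorded 'ahhhhh': a tone plus one weaker overtone."""
    n = int(sample_rate * duration_s)
    step = 2 * math.pi * freq_hz / sample_rate
    return [amplitude * (math.sin(step * t) + 0.3 * math.sin(2 * step * t))
            for t in range(n)]

def estimate_pitch_hz(samples, sample_rate=8000, min_hz=75.0, max_hz=400.0):
    """Estimate pitch as the time shift (lag) at which the signal best matches itself."""
    min_lag = int(sample_rate / max_hz)   # shortest period considered
    max_lag = int(sample_rate / min_hz)   # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation: multiply the signal by a shifted copy of itself and sum.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def rms_loudness(samples):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

if __name__ == "__main__":
    ahh = synth_vowel(200.0)  # assumed 200 Hz test tone, not real speech
    print("estimated pitch (Hz):", estimate_pitch_hz(ahh))
    print("RMS loudness:", round(rms_loudness(ahh), 3))
```

A real recording is noisier than this test tone, so a practical analysis would add steps such as windowing and smoothing across many short frames. The sketch only shows that a single held sound carries measurable information about a voice.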

She then records what she calls a "healthy donor" (for example, a child who is roughly the same age as the child she's trying to help) saying a large number of words. That gives her samples of the sounds the donor produces when talking. She then combines the donor's voice with the pitch, breathiness and other characteristics of the child with the voice disorder.

Patel played me examples of two different voices she's created. If you listen, you can clearly hear different pitch and clarity in the different voices.

These voices Patel can make are unique for each individual. Which brings us back to Samantha Grimaldo.

'You Need A Voice'

When Patel was getting started, Samantha was one of the first kids with a voice disorder who came to her lab to give a voice sample. At the time, Patel wasn't at the stage where she was actually constructing voices. But she's since figured it out, and recently, she created a new voice using Samantha's ahhhhh sample.

Last week, she gave the personalized voice to Ruane and Samantha so they could hear it. The voice was constructed from a sample taken when Samantha was much younger. For a current version of Samantha's voice, you'd need to take a new sample. Still, it was the first time that Samantha and her mother had heard anything close to Samantha's voice.

Ruane had listened earlier in the day, when Samantha was still at school, and was clearly deeply moved by the experience. It made her realize in a fresh way, she says, how difficult it had been for her to never hear her daughter's voice.

"When I heard it, I thought, 'Yeah! This could be it!' " Ruane says through tears. To her ear, the voice had a sweetly familiar quality. "My son — my son Nicholas — I could hear some of his voice in it," she says.

And so, when Samantha got home from school that afternoon, they sat down together to listen. Samantha's young voice, it turns out, is clear and light.

Ruane told me that when Samantha heard the voice, her eyes lit up and a smile broke out on her face. Both thought that the voice sounded happy.

Personalized voices like these aren't yet available to everyone. Patel has figured out how to do it, but not how to make it work on all of the different electronic devices that people use to play a synthetic voice. But Ruane Grimaldo hopes that voices like these will be available one day, very soon.

"You need a voice," she says. "You need a voice."

Copyright 2021 NPR. To see more, visit https://www.npr.org.

Alix Spiegel has worked on NPR's Science Desk for 10 years covering psychology and human behavior, and has reported on everything from what it's like to kill another person, to the psychology behind our use of function words like "and", "I", and "so." She began her career in 1995 as one of the founding producers of the public radio program This American Life. While there, Spiegel produced her first psychology story, which ultimately led to her focus on human behavior. It was a piece called 81 Words, and it examined the history behind the removal of homosexuality from the Diagnostic and Statistical Manual of Mental Disorders.