Mispronunciations Aside, Computer Speech Has Come A Long Way

Published October 5, 2015 at 4:27 PM EDT

KELLY MCEVERS, HOST:

Well, mispronunciations aside, you'd think computer speech would have come a long way in the last few decades. So let's find out. The first mention of the technology we could find from this program was back in 1984.

ROBERT SIEGEL, HOST:

So we went to the archives to listen to a report from then-science correspondent Ira Flatow. Here's a bit of it.

(SOUNDBITE OF ARCHIVED BROADCAST)

COMPUTER-GENERATED VOICE #1: I am very pleased to introduce our next guest for you. Today, Beautiful Betty (ph) will sing "All Of Me."

BEAUTIFUL BETTY: (Singing) All of me. Why not take all of me? Can't you see?

IRA FLATOW, BYLINE: This virtuoso performance is not the product of a rising young star. Neither is the bad imitation of Lawrence Welk.

COMPUTER-GENERATED VOICE #1: Fantastic, just fantastic.

SIEGEL: Actually, I thought it was a pretty good Lawrence Welk for a robot, anyway.

MCEVERS: Again, this was 1984, and Ira Flatow reported on possible uses for machine voices.

(SOUNDBITE OF ARCHIVED BROADCAST)

FLATOW: If you want to leave messages for people now, what do you do? You have to call their homes. You have to call their secretaries. Sometimes you get their answering machines. You never know where your message is going to end up. But if you have a message system like the ADF system developed at IBM's Thomas Watson Research Center in Yorktown, N.Y., you can do this.

You dial the person's office. The computer gets on the line and says...

COMPUTER-GENERATED VOICE #2: Please key press your last name. John Connor (ph) - please key press your password.

MCEVERS: Sounds just like voicemail these days.

SIEGEL: Ira's report on computer-generated voices makes clear the basics are still the basics. Each word is constructed by stacking up the building blocks of speech called phonemes.

(SOUNDBITE OF ARCHIVED BROADCAST)

FLATOW: Let's synthesize a familiar phrase from scratch as they do at the Watson Research Center in Yorktown, N.Y. We'll begin with just the loudness and timing. Remember, this is just the loudness and timing.

COMPUTER-GENERATED VOICE #3: (Unintelligible).

FLATOW: Sounds more like Morse code than a phrase. Now we'll add the overall pitch of the sentence.

COMPUTER-GENERATED VOICE #3: (Unintelligible).

FLATOW: Now one of the most important factors - the resonance made by the vocal tract...

COMPUTER-GENERATED VOICE #3: (Unintelligible).

FLATOW: ...And some of the formants.

COMPUTER-GENERATED VOICE #3: To be or not to be? That is the question.

FLATOW: Should be familiar by now. Incidentally, you may think you heard it say the word to or the T. But it didn't, says Thomas.

JOHN THOMAS: Your brain did something called phoneme restoration, and you imagined that the ta (ph) was there, but it's really not.

FLATOW: OK. Let's put the ta in there. That's a fricative, a hard sound. We'll add a little bit of nasalization. Your nose does influence the sound. And finally, you wind up with a phrase that even Shakespeare might recognize.

COMPUTER-GENERATED VOICE #3: To be or not to be? That is the question.

SIEGEL: And the answer is that computer-generated speech was very much to be. But it was just taking baby steps toward permeating our lives 31 years ago.

MCEVERS: That's when Ira Flatow talked to IBM researcher John Thomas for that report on ALL THINGS CONSIDERED.

Transcript provided by NPR, Copyright NPR.