[bookshare-discuss] Synthesizing human emotions

  • From: "Shelley L. Rhodes" <juddysbuddy@xxxxxxxxxxxx>
  • To: <k1000@xxxxxxxxxxxxxxxxxxxxxxxx>, <bookshare-discuss@xxxxxxxxxxxxx>, <nabs@xxxxxxxxxxxxxxxxx>
  • Date: Thu, 9 Dec 2004 13:05:35 -0500

Things have come a long way from the days of the Apple Echo synthesizer (I 
had one of those) to modern natural-sounding synths like NeoSpeech and the 
AT&T Natural Voices. Though sometimes they still sound annoyed at being 
forced to read for us, smile. This would be quite neat.



    AP Worldstream
Monday, November 29, 2004

Synthesizing human emotions

By Michael Stroh, Sun Staff

Speech: Melding acoustics, psychology and linguistics, researchers teach 
computers to laugh and sigh, express joy and anger.

Shiva Sundaram spends his days listening to his computer laugh at him. 
Someday, you may know how it feels.

The University of Southern California engineer is one of a growing number of 
researchers trying to crack the next barrier in computer speech synthesis - 
emotion. In labs around the world, computers are starting to laugh and sigh, 
express joy and anger, and even hesitate with natural ums and ahs.

Called expressive speech synthesis, "it's the hot area" in the field today, 
says Ellen Eide of IBM's T.J. Watson Research Center in Yorktown Heights, 
N.Y., which plans to introduce a version of its commercial speech 
synthesizer that incorporates the new technology.

It is also one of the hardest problems to solve, says Sundaram, who has 
spent months tweaking his laugh synthesizer. And the sound? Mirthful, but 
still machine-made.

"Laughter," he says, "is a very, very complex process."

The quest for expressive speech synthesis - melding acoustics, psychology, 
linguistics and computer science - is driven primarily by a grim fact of 
electronic life: The computers that millions of us talk to every day as we 
look up phone numbers, check portfolio balances or book airline flights 
might be convenient but, boy, can they be annoying.

Commercial voice synthesizers speak in the same perpetually upbeat tone 
whether they're announcing the time of day or telling you that your 
retirement account has just tanked. David Nahamoo, overseer of voice 
synthesis research at IBM, says businesses are concerned that as the 
technology spreads, customers will be turned off. "We all go crazy when we 
get some chipper voice telling us bad news," he says.

And so, in the coming months, IBM plans to roll out a new commercial speech 
synthesizer that feels your pain. The Expressive Text-to-Speech Engine took 
two years to develop and is designed to strike the appropriate tone when 
delivering good and bad news.

The goal, says Nahamoo, is "to really show there is some sort of feeling 
there." To make it sound more natural, the system is also capable of 
clearing its throat, coughing and pausing for a breath.

Scientist Juergen Schroeter, who oversees speech synthesis research at AT&T 
Labs, says his organization wants not only to generate emotional speech but 
to detect it, too.

"Everybody wants to be able to recognize anger and frustration 
automatically," says Julia Hirschberg, a former AT&T researcher now at 
Columbia University in New York.

For example, an automated system that senses stress or anger in a caller's 
voice could automatically transfer a customer to a human for help, she says. 
The technology also could power a smart voice mail system that prioritizes 
messages based on how urgent they sound.
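
To make the routing idea concrete, here is a purely illustrative Python 
sketch: it flags a caller as stressed from crude loudness statistics and 
escalates to a person when the numbers look bad. The 20-millisecond framing, 
the features and the thresholds are demonstration assumptions, not the logic 
of any deployed system.

    import numpy as np

    def sounds_stressed(audio, sr=8000):
        """Guess stress from crude loudness statistics (toy heuristic)."""
        frame = sr // 50                                # 20 ms frames
        if len(audio) < frame:
            return False
        n = len(audio) // frame * frame
        energy = (audio[:n].reshape(-1, frame) ** 2).mean(axis=1)
        jitter = energy.std() / (energy.mean() + 1e-9)  # erratic loudness
        return energy.mean() > 0.05 and jitter > 1.0    # made-up thresholds

    def route_call(audio):
        """Escalate apparently angry or stressed callers to a human."""
        return "human agent" if sounds_stressed(audio) else "automated menu"

A real system would train a classifier on labeled calls rather than use 
hand-set thresholds, but the escalation decision would sit at the same point.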

Hirschberg is developing tutoring software that can recognize frustration 
and stress in a student's voice and react by adopting a more soothing tone 
or by restating a problem. "Sometimes, just by addressing the emotion, it 
makes people feel better," says Hirschberg, who is collaborating with 
researchers at the University of Pittsburgh.

So, how do you make a machine sound emotional?

Nick Campbell, a speech synthesis researcher at the Advanced 
Telecommunications Research Institute in Kyoto, Japan, says it first helps 
to understand how the speech synthesis technology most people encounter 
today is created.

The technique, known as "concatenative synthesis," works like this: 
Engineers hire human actors to read into a microphone for several hours. 
Then they dice the recording into short segments, each measured in 
milliseconds and often barely the length of a single vowel.

When it's time to talk, the computer picks through this audio database for 
the right vocal elements and stitches them together, digitally smoothing any 
rough transitions.
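
In code, the splicing step reduces to a few lines. Below is a minimal Python 
sketch of the idea, assuming NumPy; the toy unit database stands in for the 
hours of studio recordings, and each join is smoothed with a short linear 
crossfade. It is not any vendor's engine, which would also weigh which 
candidate unit best fits its neighbors.

    import numpy as np

    SR = 16000   # sample rate in Hz
    FADE = 80    # crossfade length in samples (~5 ms at 16 kHz)

    def crossfade_concat(units):
        """Splice audio units end to end, linearly crossfading each join."""
        out = units[0]
        ramp = np.linspace(0.0, 1.0, FADE)
        for u in units[1:]:
            blended = out[-FADE:] * (1.0 - ramp) + u[:FADE] * ramp
            out = np.concatenate([out[:-FADE], blended, u[FADE:]])
        return out

    # Stand-in database: label -> short waveform (noise here, vowels in life).
    rng = np.random.default_rng(0)
    database = {p: rng.standard_normal(SR // 10) * 0.1
                for p in ["h", "eh", "l", "ow"]}

    # "Speaking" means looking up the right units and splicing them together.
    utterance = crossfade_concat([database[p] for p in ["h", "eh", "l", "ow"]])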

Commercialized in the 1990s, concatenative synthesis has greatly improved 
the quality of computer speech, says Campbell. And some companies, such as 
IBM, are going back to the studio and creating new databases of emotional 
speech from which to work.

But not Campbell.

"We wanted real happiness, real fear, real anger, not an actor in the 
studio," he says.

So, under a government-funded project, he has spent the past four years 
recording Japanese volunteers as they go about their daily lives.

"It's like people donating their organs to science," he says.

His audio archive, with about 5,000 hours of recorded speech, holds samples 
of subjects experiencing everything from earthquakes to childbirth, from 
arguments to friendly phone chat. The next step will be using those sounds 
in a software-based concatenative speech engine.

If he succeeds, the first customers are likely to be Japanese auto and toy 
makers, who want to make their cars, robots and other gadgets more 
expressive. As Campbell puts it, "Instead of saying, 'You've exceeded the 
speed limit,' they want the car to go, 'Oy! Watch it!'"

Some researchers, though, don't want to depend on real speech. Instead, they 
want to create expressive speech from scratch using mathematical models. 
That's the approach Sundaram uses for his laugh synthesizer, which made its 
debut this month at the annual meeting of the Acoustical Society of America 
in San Diego.

Sundaram started by recording the giggles and guffaws of colleagues. When he 
ran them through his computer to see the sound waves represented 
graphically, he noticed that the sound waves trailed off as the person's 
lungs ran out of air. It reminded him of how a weight behaves as it bounces 
to a stop on the end of a spring. Sundaram adopted the mathematical 
equations that explain that action for his laugh synthesizer.
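
The analogy translates directly into math: a weight settling on a spring 
loses amplitude roughly as a decaying exponential. A toy Python rendering, 
with an arbitrary decay rate and a 220-hertz buzz standing in for the voice 
(these are demonstration values, not Sundaram's actual model), might look 
like this:

    import numpy as np

    SR = 16000
    t = np.arange(int(SR * 1.2)) / SR       # 1.2 seconds of "laughter"

    decay = 3.0                             # arbitrary damping constant
    envelope = np.exp(-decay * t)           # spring-like amplitude decay
    carrier = np.sin(2 * np.pi * 220 * t)   # crude stand-in for voiced sound

    laugh_like = envelope * carrier         # energy fades as lungs empty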

But Sundaram and others know that synthesizing emotional speech is only part 
of the challenge. Another is determining when and how to use it.

"You would not like to be embarrassing," says Jurgen Trouvain, a linguist at 
Saarland University in Germany who is working on laughter synthesis.

Researchers are turning to psychology for clues. Robert R. Provine, a 
psychologist at the University of Maryland, Baltimore County who pioneered 
modern laughter research, says the truth is sometimes counterintuitive.

In one experiment, Provine and his students listened in on discussions to 
find out when people laughed. The big surprise?

"Only 10 to 15 percent of laughter followed something that's remotely 
jokey," says Provine, who summarized his findings in his book Laughter: A 
Scientific Investigation.

The one-liners that elicited the most laughter were phrases such as "I see 
your point" or "I think I'm done" or "I'll see you guys later." Provine 
argues that laughter is an unconscious reaction that has more to do with 
smoothing relationships than with stand-up comedy.

Provine recorded 51 samples of natural laughter and studied them with a 
sound spectrograph. He found that a typical laugh is composed of expelled 
breaths chopped into short, vowel-like "laugh notes": ha, ho and he.

Each laugh note lasted about one-fifteenth of a second, and the notes were 
spaced one-fifth of a second apart.
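
Those two numbers are enough to sketch a laugh-note train. In the toy Python 
version below, a windowed sine wave stands in for each vowel-like note; real 
laugh notes would come from recorded or modeled speech, and the 260-hertz 
pitch and per-note fade are invented for illustration:

    import numpy as np

    SR = 16000
    NOTE_LEN = SR // 15     # each laugh note lasts about 1/15 s
    SPACING = SR // 5       # note onsets fall about 1/5 s apart

    t = np.arange(NOTE_LEN) / SR
    note = np.sin(2 * np.pi * 260 * t) * np.hanning(NOTE_LEN)  # one "ha"

    train = np.zeros(SPACING * 5 + NOTE_LEN)
    for k in range(5):                                  # "ha ha ha ha ha"
        start = k * SPACING
        train[start:start + NOTE_LEN] += note * 0.9**k  # each note softer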

In 2001, psychologists Jo-Anne Bachorowski of Vanderbilt University and 
Michael Owren of Cornell found more surprises when they recorded 1,024 
laughter episodes from college students watching the films Monty Python and 
the Holy Grail and When Harry Met Sally.

Men tended to grunt and snort, while women generated more songlike laughter. 
When some subjects cracked up, they hit pitches in excess of 1,000 hertz, 
roughly high C for a soprano. And those were just the men.

Even if scientists can make machines laugh, the larger question is how humans 
will react to machines capable of mirth and other emotions.

"Laughter is such a powerful signal that you need to be cautious about its 
use," says Provine. "It's fun to laugh with your friends, but I don't think 
I'd like to have a machine laughing at me."


--------------------------------------------------------------------------------

To hear clips of synthesized laughter and speech, visit 
www.baltimoresun.com/computer

The first computer speech synthesizer was created in the late 1960s by 
Japanese researchers. AT&T wasn't far behind. To hear how the technology 
sounded in its infancy, visit 
http://sal.shs.arizona.edu/~asaspeechcom/PartD.html

Today's most natural sounding speech synthesizers are created using a 
technique called "concatenative synthesis," which starts with a prerecorded 
human voice that is chopped up into short segments and reassembled to form 
speech. To hear an example of what today's speech synthesizers can do, all 
you need to do is dial 411. Or visit this AT&T demo for its commercial 
speech synthesizer: http://www.naturalvoices.com/demos/

Many researchers are now working on the next wave of voice technology, 
called expressive speech synthesis. Their goal: to make machines that can 
sound emotional. In the coming months, IBM will roll out a new expressive 
speech technology. To hear an early demo, visit http://www.research.ibm.com/tts/

For general information on speech synthesis research, visit 
http://www.aaai.org/AITopics/html/speech.html

Copyright © 2004, The Baltimore Sun

http://www.baltimoresun.com/news/health/bal-te.voice29nov29,1,550833.story?coll=bal-news-nation




--
BlindNews mailing list

Archived at: http://GeoffAndWen.com/blind/
Address message to list by sending mail to: BlindNews@xxxxxxxxxxxxxxxxxxxx

Access your subscription info at: 
http://blindprogramming.com/mailman/listinfo/blindnews_blindprogramming.com 


