Topic: Religions > Atheism
User: "Uncle Clover"
Date: 05 Sep 2006 10:04:59 PM
Object: OT: Computer speech algorithm
I've been formulating an idea for software that would produce entirely
natural-sounding human speech. I wanted to bounce some ideas around to see if
they have any merit, or even if they've already been done.
Basically, the program would consist of a set of sliders for the various
components of oral speech. A virtual larynx would produce sound, and the
sliders would modify it in various ways.
Here are some of the slider parameters I feel would be needed:
<begin parameters>
Lips (a) - 1. Fully closed; 2. Slightly open; 3. Open; 4. Wide open;
5. Fully open.
Tongue tip longitude (b) - 1. On the floor of the mouth; 2. Slightly
raised; 3. Raised; 4. Against the roof of the mouth.
Tongue tip latitude (c) - 1. Against the lips; 2. On the teeth; 3.
Behind the teeth; 4. Middle of mouth; 5. Back of mouth.
Tongue middle longitude (d) - 1. Front of mouth (unable to reach lips or
teeth with that part of the tongue); 2. Middle of mouth; 3. Back of
mouth/throat.
Tongue middle latitude (e) - 1. Resting; 2. Slightly raised; 3. Raised;
4. Elevated.
Voice (f) - 1. None; 2. Whisper; 3. Soft; 4. Normal; 5. Loud; 6.
Screaming.
Stut/hard stop (g) - 1. None (like s, f, r, etc.); 2. Slight (like
German "ch"); 3. Full (like the letter "t").
Nasal (h) - 1. Fully open (fully nasal like French "n"); 2. Partially
open (like for the letter "m"); 3. Fully closed (like for "k").
<end parameters>
So for the letter "K", for instance, the string of values would be something
like this (in order of a - h):
2 (lips slightly open);
1 (tip of the tongue on the floor of the mouth);
3 (tip of the tongue behind the teeth);
3 (middle of the tongue at the back of the throat);
4 (middle of the tongue against the roof of the mouth);
1 (no voice);
3 (full stut);
3 (fully closed nasal passages).
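As a rough sketch, the sliders and the "K" setting above could be written down in Python along these lines. The names `LEVELS`, `Articulation`, and `validate` are made up for illustration; only the slider letters, the number of positions per slider, and the "K" values come from the lists above.

```python
from typing import NamedTuple

# Number of positions per slider, taken from the parameter list (a - h).
LEVELS = {"a": 5, "b": 4, "c": 5, "d": 3, "e": 4, "f": 6, "g": 3, "h": 3}


class Articulation(NamedTuple):
    """One setting of all eight sliders (hypothetical container)."""
    a: int  # lips
    b: int  # tongue tip longitude
    c: int  # tongue tip latitude
    d: int  # tongue middle longitude
    e: int  # tongue middle latitude
    f: int  # voice
    g: int  # stut/hard stop
    h: int  # nasal


def validate(setting: Articulation) -> bool:
    """Return True if every slider value lies within 1..max for that slider."""
    return all(1 <= value <= LEVELS[name]
               for name, value in setting._asdict().items())


# The "K" example, in order a - h.
K = Articulation(2, 1, 3, 3, 4, 1, 3, 3)
```

Here `validate(K)` only checks that each value is a legal slider position; it says nothing about whether the combination is physically pronounceable.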
The emotional tone of the voice is less a consideration for me than simply
getting natural-sounding speech. The program would produce a sine wave, each
parameter having a unique effect upon the sound. It would produce more
natural-sounding speech because the sliders wouldn't instantly adjust to each
new individual sound. Rather, they would slide from value to value, mimicking
the movement of the human vocal apparatus as it moves from sound to sound.
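A minimal sketch of the sliding idea, assuming plain linear interpolation between two slider settings over a fixed number of steps. The function name `glide`, the step count, and the second setting are hypothetical; real articulators would not necessarily move linearly or all at the same speed.

```python
def glide(start, end, steps):
    """Yield intermediate slider settings moving linearly from start to end.

    start, end: equal-length sequences of slider values (a - h).
    steps: number of settings to yield, including both endpoints (>= 2).
    """
    if len(start) != len(end):
        raise ValueError("settings must have the same number of sliders")
    if steps < 2:
        raise ValueError("steps must be at least 2")
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the start, 1.0 at the end
        yield [s + (e - s) * t for s, e in zip(start, end)]


# Slide from the "K" setting to a made-up second setting in 5 steps.
k_setting = [2, 1, 3, 3, 4, 1, 3, 3]
next_setting = [4, 1, 3, 2, 2, 4, 1, 3]  # hypothetical, not from the post
frames = list(glide(k_setting, next_setting, 5))
```

Each entry in `frames` is one in-between position of all eight sliders, so the fractional values are what would drive the sound while the "mouth" is in motion.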
Furthermore, the sliders could actually be represented by a series of concentric
circular dials, the "Northern" axis of the dials representing the combination of
dial values which are presently being spoken. This would enable some rather
interesting tricks in that you could adjust the location of the "voice" part of
the dial. In so doing, you can make it say something intelligible at one place
on the dial, then turn the "voice" around to other locations on the dial. The
normal speech can only be heard at one location on the dial; all other locations
would produce a completely different series of sounds which would bear a
mathematical relationship to the original. I think it'd be -really- interesting
to take a spoken sentence and listen to it from the opposite side of the dial
from which it's intelligible - such would be "reverse speech" in a sense.
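One possible reading of the dial idea, sketched in Python. It assumes each dial's positions are evenly spaced around the full circle and that moving the "voice" simply shifts every slider by a whole number of positions, wrapping around. That layout, and the names `LEVELS` and `read_at`, are assumptions for illustration; other arrangements of the dials would give different results.

```python
LEVELS = [5, 4, 5, 3, 4, 6, 3, 3]  # positions per dial, sliders a - h


def read_at(setting, angle_degrees):
    """Read every dial at a given angle clockwise from "North".

    Assumes each dial's positions are evenly spaced around the full circle,
    so moving the read point shifts each value by a whole number of positions.
    """
    shifted = []
    for value, n in zip(setting, LEVELS):
        offset = round(angle_degrees / 360 * n)
        shifted.append((value - 1 + offset) % n + 1)  # keep values in 1..n
    return shifted


k_setting = [2, 1, 3, 3, 4, 1, 3, 3]
opposite = read_at(k_setting, 180)  # "K" heard from the far side of the dial
```

Reading at 0 or 360 degrees gives back the original setting, while any other angle gives a different but mechanically related combination, which is the "mathematical relationship" described above.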
Anyway, there's the idea. Whatcha' think? :-?
--
L8r,
Uncle Clover
************************************************
The true mark of a civilized society is when its
citizens know how to hate each other peacefully.
************************************************
"A disappearance is when someone has vanished.
A tragedy is when they were photogenic."
- a.t-c's Bo Raxo, paraphrased.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"If you look at the whole life of the planet,
man has only been around for a few blinks of an
eye. So if the infection wipes us all out,
that _is_ a return to normality..."
- Sergeant Farrell, "28 Days Later"
++++++++++++++++++++++++++++++++++++++++++++++++