Program to Turn Speech to Phonemes (or Visemes)



Does a program exist that will turn recorded audio into a text file
containing all the phonemes (or visemes) spoken, with time tags?

E.g. "[00:00:00] b, [00:00:01] ah"

Or should I look to the SAPI API to write one? :-)
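
For reference, the time-tagged output asked for above could be produced by something like the following. This is only a minimal sketch: the `segment` struct and `format_tag` function are hypothetical names invented for illustration, not part of Sphinx-2 or SAPI, and it assumes the recognizer can report a start time in milliseconds for each phoneme.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one recognized segment. The field names are
   illustrative only and do not come from Sphinx-2 or SAPI. */
struct segment {
    const char   *phoneme;   /* phoneme label, e.g. "b" */
    unsigned long start_ms;  /* start time in ms from the beginning of the audio */
};

/* Write "[HH:MM:SS] label" into buf.
   Returns what snprintf returns: the length the full string needs,
   excluding the terminating NUL. */
int format_tag(char *buf, size_t size, const struct segment *seg)
{
    unsigned long total_s;

    assert(buf != NULL && seg != NULL && seg->phoneme != NULL);
    total_s = seg->start_ms / 1000;  /* the tag format only has 1 s resolution */
    return snprintf(buf, size, "[%02lu:%02lu:%02lu] %s",
                    total_s / 3600, (total_s / 60) % 60, total_s % 60,
                    seg->phoneme);
}
```

Calling `format_tag` on a segment `{ "ah", 1000 }` writes `[00:00:01] ah` into the buffer. Whole seconds are probably too coarse for real phoneme timing, so a practical version would likely add a fractional part to the tag.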
Reply matturn (6) 11/26/2004 2:03:18 AM

matt cook wrote:

> Does a program exist that will turn recorded audio into a text file
> containing all the phonemes (or visemes) spoken, with time tags?

A few exist, and none succeed with the kind of accuracy you probably
want.  For example, download Sphinx-2 and use "allphone" mode.

> Or should I look to the SAPI API to write one? :-)

Yes, please.  I've not tried that since they got rid of phoneme
segmentation in the SAPI 4a-to-SAPI 5 "upgrade."  However, you
might still be able to define a single word whose pronunciation is
every phoneme (including silence) in parallel, and try to recognize
zero or more repetitions of that word, but I'm not sure whether you
would get the actual phonemes back in the result.  You might instead
have to define one word per phoneme and try to recognize zero or more
repetitions of a parallel choice among all those word-phonemes.  Even
then you might run into trouble with the engine's assumptions about
inter-word durations, not to mention its other assumptions, since
most people don't try that.

Sincerely,
James Salsman
-- 
www.readsay.com - maker of the ReadSay PROnounce English literacy system
  400 MHz PDA included:  $499 --  http://www.readsay.com/PROnounce.html
Reply James 11/27/2004 10:54:20 PM


> > Does a program exist that will turn recorded audio into a text file
> > containing all the phonemes (or visemes) spoken, with time tags?
>

Have a look at the Babel2Lips product from Babel Technologies/Acapela Group;
you can get phonemes (not visemes directly) with little delay. It's used
in real-time 3D face-animation software. It also works on recorded
files.

http://www.babeltech.com

of course, it's not freeware...


Reply Moi 11/29/2004 1:49:13 PM

James Salsman <james.at.readsay.com@nospam.net> wrote in message news:<wA7qd.677177$8_6.637456@attbi_s04>...
> matt cook wrote:
> 
> > Does a program exist that will turn recorded audio into a text file
> > containing all the phonemes (or visemes) spoken, with time tags?
> 
> A few exist, and none succeed with the kind of accuracy you probably
> want.  For example, download Sphinx-2 and use "allphone" mode.

The accuracy required is low. Thanks for the tip. But unfortunately I
don't have access to a Linux box :( I might have to dual-boot
something...
Reply matturn 11/30/2004 7:21:53 AM

"Moi" <moi@moi.tk.invalid> wrote in message news:<cof935$mbl$1@ikaria.belnet.be>...
> > > Does a program exist that will turn recorded audio into a text file
> > > containing all the phonemes (or visemes) spoken, with time tags?
> >
> 
> Have a look at the Babel2Lips product from Babel Technologies/Acapela Group;
> you can get phonemes (not visemes directly) with little delay. It's used
> in real-time 3D face-animation software. It also works on recorded
> files.
> 
> http://www.babeltech.com
> 
> of course, it's not freeware...

Thanks for that. It looks promising.
Reply matturn 11/30/2004 7:25:22 AM

matt cook wrote:

> Does a program exist that will turn recorded audio into a text file
> containing all the phonemes (or visemes) spoken, with time tags?
> 
> E.g. "[00:00:00] b, [00:00:01] ah"
> 
> Or should I look to the SAPI API to write one? :-)

Been there, done that.  :)

I wrote a little command-line exe which uses the MS dictation engine to 
try to extract the phonemes (and timings) from a recorded prompt (wav 
format).  It's amusingly inaccurate in its transcriptions, but if you're 
using the output for rough lipsyncing, the results can be surprisingly 
effective.
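
One plausible reason rough transcriptions can still lipsync well is that many phonemes share a mouth shape, so confusing "p" with "b" makes no visible difference. The sketch below illustrates that collapsing step. It is not the code from my exe: the viseme classes, the phoneme labels, and the `phoneme_to_viseme` name are all assumptions made up for the example, and a real animation rig would define its own set.

```c
#include <assert.h>
#include <string.h>

/* Illustrative viseme classes. This grouping is an assumption for the
   sketch, not a standard set and not the one used by any product
   mentioned in this thread. */
enum viseme {
    VIS_REST, VIS_CLOSED_LIPS, VIS_LIP_TEETH, VIS_OPEN, VIS_ROUNDED, VIS_OTHER
};

struct map_entry {
    const char *phoneme;  /* hypothetical phoneme label */
    enum viseme viseme;   /* mouth shape it collapses to */
};

static const struct map_entry viseme_table[] = {
    { "sil", VIS_REST },
    { "p", VIS_CLOSED_LIPS }, { "b", VIS_CLOSED_LIPS }, { "m", VIS_CLOSED_LIPS },
    { "f", VIS_LIP_TEETH },   { "v", VIS_LIP_TEETH },
    { "aa", VIS_OPEN },       { "ah", VIS_OPEN },
    { "uw", VIS_ROUNDED },    { "ow", VIS_ROUNDED },   { "w", VIS_ROUNDED },
};

/* Linear lookup; phonemes missing from the table fall back to VIS_OTHER. */
enum viseme phoneme_to_viseme(const char *phoneme)
{
    size_t i;

    assert(phoneme != NULL);
    for (i = 0; i < sizeof viseme_table / sizeof viseme_table[0]; i++) {
        if (strcmp(viseme_table[i].phoneme, phoneme) == 0)
            return viseme_table[i].viseme;
    }
    return VIS_OTHER;
}
```

With this table, "p", "b" and "m" all map to the same closed-lips shape, so a recognizer that mixes them up still drives the face correctly.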

Email me if you want me to send you the exe (only 28KB).

Cheers,
James Anderson
Reply James 11/30/2004 4:04:10 PM

>>> Does a program exist that will turn recorded audio into a text file
>>> containing all the phonemes (or visemes) spoken, with time tags?
>>
>> A few exist, and none succeed with the kind of accuracy you probably
>> want.  For example, download Sphinx-2 and use "allphone" mode.
> 
> The accuracy required is low. Thanks for the tip. But unfortunately I
> don't have access to a Linux box....

Sphinx-2 easily compiles and runs under Win32 and Unix, and I even
have it running under Mac OS 9, using a prefix file for Metrowerks
CodeWarrior with these eleven lines:

typedef char *         caddr_t;
typedef unsigned char  u_char;
typedef unsigned int   u_int;
typedef unsigned long  u_long;
typedef unsigned short u_short;
#define CARBONLIB     1
#define FAST8B        1
#define MAXPATHLEN 1024
#ifndef _MAC
#define _MAC 1
#endif

For Windows, you can use MSVC to compile Sphinx-2 much more easily.

Sincerely,
James Salsman
Reply James 12/1/2004 12:48:43 AM

"matt cook" <matturn@gmail.com> wrote in message
news:2fd338e.0411292321.2aa97a09@posting.google.com...
> James Salsman <james.at.readsay.com@nospam.net> wrote in message news:<wA7qd.677177$8_6.637456@attbi_s04>...
> > matt cook wrote:
> >
> > > Does a program exist that will turn recorded audio into a text file
> > > containing all the phonemes (or visemes) spoken, with time tags?
> >
> > A few exist, and none succeed with the kind of accuracy you probably
> > want.  For example, download Sphinx-2 and use "allphone" mode.
>
> The accuracy required is low. Thanks for the tip. But unfortunately I
> don't have access to a Linux box :( I might have to dual-boot
> something...

If by any chance you're interested in animating faces from speech
(through either ASR or TTS), have a look at the Digital Vanity products
too; they have some good 3D demos:
http://www.digital-vanity.com
By the way, they use the Babel Technologies engine for the Voice2Video demos.


Reply Moi 12/1/2004 9:06:23 AM
