
Re: Lojban Text to Speech



I'm in Atlanta, and have lived most of my life in the South. But I'm 
not responsible for the apparent southern accent (which I had indeed 
noticed myself).

How did I get FestVox to speak Lojban? I decided, about a month or so 
before Thanksgiving, that I could learn Lojban a lot faster if I could 
listen to it on cassette while traveling, walking, etc. But there are 
not a lot of speech files out there. So it occurred to me that I might be 
able to make my own. I had never done TTS before, so I decided it 
would be a hoot to try. I discovered Sun's JSDK, and (after a brief 
investigation of MS SDK) I followed the links ultimately to FestVox 
(which has some support for JSDK, although I haven't tried it out). 
It was open source, whereas MS SDK was more of a free (for now) black 
box.

Well, after a lot of RTFM, I first tried recording all the diphones 
(1000+, based on info in the reference grammar). But, I was coming 
down with a cold, and my voice wasn't up to it. I didn't end the day 
with the same voice I started with.

So I gave that up in disgust for a while, until the idea struck me 
over the Thanksgiving holiday: Why not at least try to fake it with 
already available synthetic voices? So when I returned home, I 
decided on Festival's kal_diphone (US male voice) mixed with a little 
of their el_diphone (Spanish male voice). The Spanish voice provides 
the much prized Lojban x diphone--the Spanish j in juan, of course, 
and I think only about 50 diphones came from that voice. The 
combination of phonemes fills the slate, and theoretically stays within 
Lojban's phonemic pronunciation parameters, although most of the r's 
are untrilled (except for a few that slipped in from Spanish). However, 
this meant I didn't have to feel very guilty about giving short 
shrift to r as a syllabic consonant.
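
To illustrate the mixing idea, here is a minimal sketch of routing each diphone to a source voice. The voice names are the ones mentioned above; the rule (anything touching x comes from the Spanish voice, everything else from the US voice) and the function names are assumptions made for illustration, not the actual Festival setup.

```python
def source_voice(left, right):
    """Pick the voice that supplies the diphone left-right (assumed rule)."""
    # Only the Spanish voice has the x sound, so any diphone that
    # touches x is taken from it; everything else uses the US voice.
    return "el_diphone" if "x" in (left, right) else "kal_diphone"


def split_inventory(phones):
    """Return (total diphones, diphones taken from el_diphone)."""
    pairs = [(a, b) for a in phones for b in phones]
    from_el = [p for p in pairs if source_voice(*p) == "el_diphone"]
    return len(pairs), len(from_el)


# Toy three-phone inventory, just to show the bookkeeping.
print(split_inventory(["a", "k", "x"]))
```

With N phones, this rule sends 2N - 1 diphones to the Spanish voice, so the borrowed share stays small as the inventory grows.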

So most of the voice is kal_diphone, and kal_diphone by itself did not 
sound particularly Southern. The accent must be an emergent 
phenomenon! I first tried it out on northwind.txt, which includes the word 
'darlu.' (Imagine 'get along little darlu.') I think the untrilled 
r's contribute a lot to the impression of a Southern accent, by the 
way.

Yes, I had to set up a Lojban phoneme set, and write the code to 
convert the Lojban into the phonemes, but there couldn't be an easier 
language for that. I also had to write the stress rules and code, but 
again, that's very regular. I decided not to stress cmavo at all, 
since I didn't have to. I also wrote rules for parsing numeric stuff, 
but haven't tested it much (seemed to work though).
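
For anyone curious what such a frontend involves, here is a rough sketch in Python (not the Festival Scheme code itself). The phone names ("sh", "zh", "@", "pau"), the "1" stress marker, and all function names are hypothetical. The brivla test is deliberately simplified to "ends in a vowel and contains a consonant cluster", and names are not handled.

```python
VOWELS = set("aeiou")
CONSONANTS = set("bcdfgjklmnprstvxz")
DIPHTHONGS = {"ai", "ei", "oi", "au"}

# Hypothetical phone names for letters that are not their own phone.
SPECIAL = {"c": "sh", "j": "zh", "y": "@", "'": "h", ".": "pau"}

# Lojban digit cmavo, indexed by digit value.
DIGITS = ["no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so"]


def to_phones(word):
    """Map Lojban spelling to a list of phone names, one letter at a time."""
    w = word.lower().replace(",", "")
    phones, i = [], 0
    while i < len(w):
        if w[i:i + 2] in DIPHTHONGS:
            phones.append(w[i:i + 2])
            i += 2
        else:
            phones.append(SPECIAL.get(w[i], w[i]))
            i += 1
    return phones


def is_brivla(word):
    """Simplified test: ends in a vowel and has two adjacent consonants."""
    w = word.lower().strip(".").replace(",", "")
    if not w or w[-1] not in VOWELS:
        return False
    return any(a in CONSONANTS and b in CONSONANTS for a, b in zip(w, w[1:]))


def transcribe(word):
    """Phones for one word, with penultimate stress on brivla only."""
    phones = to_phones(word)
    if is_brivla(word):
        # Nuclei are vowels and diphthongs; y ("@") is not counted.
        nuclei = [i for i, p in enumerate(phones) if p[0] in VOWELS]
        if len(nuclei) >= 2:
            phones[nuclei[-2]] += "1"
    return phones


def digits_to_cmavo(text):
    """Spell a string of decimal digits as Lojban digit words."""
    return [DIGITS[int(ch)] for ch in text if ch.isdigit()]


print(transcribe("darlu"))    # ['d', 'a1', 'r', 'l', 'u']
print(transcribe("mi"))       # ['m', 'i'] (cmavo, left unstressed)
print(digits_to_cmavo("42"))  # ['vo', 're']
```

Because the spelling is phonemic, the whole letter-to-sound step is a table lookup; the only context needed is the two-letter check for diphthongs.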

I originally synthesized at 16 kHz, but it occurred to me that a lot 
of people might have modem dialup connections, so I decimated it to 
8 kHz to cut the file size in half. As for quality, I'm frankly surprised 
that I got away with mixing synthesized voices at all! I'd be happy 
for someone else with leather tonsils, a quiet recording studio, 
plenty of time, patience, and consistency to try it again. They could 
use my diphones for generating prompts.
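
As a rough illustration of the decimation step, here is a sketch that halves the sample rate of a 16-bit mono WAV file. It is not the tool actually used; the function names are made up, and averaging neighbouring samples is only a crude stand-in for a proper low-pass filter.

```python
import struct
import wave


def decimate_by_two(samples):
    """Average each pair of neighbouring samples and keep one value per pair."""
    return [(samples[i] + samples[i + 1]) // 2
            for i in range(0, len(samples) - 1, 2)]


def halve_wav_rate(src_path, dst_path):
    """Write a copy of a 16-bit mono WAV file at half its sample rate."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono audio")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    # WAV stores samples as little-endian signed 16-bit integers.
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    out = decimate_by_two(samples)
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate // 2)
        dst.writeframes(struct.pack("<%dh" % len(out), *out))


def bytes_per_second(rate, sample_width=2, channels=1):
    """Size of one second of uncompressed PCM audio."""
    return rate * sample_width * channels


print(bytes_per_second(16000), bytes_per_second(8000))
```

At 16 bits per sample, one second of mono audio takes 32,000 bytes at 16 kHz and 16,000 bytes at 8 kHz, which is where the halving of the file size comes from.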

By the way, the first time I heard it, I thought it was fairly 
horrendous, and I could barely make out a word. But, after listening 
with the text in front of me, it seemed to begin to "clear up" after 
several listenings. Although I had begun to be fairly proficient at 
reading Lojban, it's a different experience to try to follow it 
aurally, in real-time. Different parts of the brain involved, no 
doubt. Then after some familiarity and practice, I was able to record 
it onto a cassette, and listen to it in the car or on a Walkman. All 
in all, I think it helps a lot to hear it "spoken."

Thanks for the response,
-Jack

--- In lojban@y..., "randl. nortmn." <lojbanlists@w...> wrote:
> On Thu, Jan 03, 2002 at 06:23:13PM -0000, buzzwyrd wrote:
> [...]
> > I have added a link to a wav file that I generated from the Lojban
> > text. It was made with CMU's Festvox TTS system. It's by no means
> > perfect, but it does seem to help me to hear spoken lojban, even if
> > it is a little gravelly.
> [...]
> 
> I'm interested in knowing how you got Festvox to speak Lojban. Did
> you actually write the frontend that parses Lojban text into phonemes,
> or did you first convert the Lojban to English text, and then send
> that through the standard English frontend? If the former, then all
> that remains is to record a set of high-quality diphones for Lojban.
> (Also, where did you get the fricative 'k' ('x' in Lojban) sound?)
> 
> By the way, for those that have commented on the poor quality and
> apparent Southern twang in the sound file, both are due to the fact
> that very old, poor quality 8kHz diphone samples are being used, and
> probably the English ones. With those samples, even English sounds
> terrible. I think the Southern twang is just an accident, largely due
> to the fact that the samples are not of the pure Lojban vowels, but
> the American sliding diphthongy vowels.
> 
> > Oh, and believe me, if you want to record all the diphones, I hope
> > you've got a leather larynx.
> 
> Agreed! This is why I pawned that off on others to do. And since you
> seem to have done my part of the job already, I get to sit back and
> enjoy the fruits of others' labor! ;-)
> 
> Seriously, good job on both the SHRDLU dialog translation and the
> TTS. That translation is much more ambitious than anything I've
> undertaken so far.
> 
> mu'omi'e randl.