From: "buzzwyrd" <jcrossco@bellsouth.net>
Date: Fri, 04 Jan 2002 05:18:11 -0000
To: lojban@yahoogroups.com
Subject: Re: Lojban Text to Speech
In-Reply-To: <20020104015851.GA545@aerosol>

I'm in Atlanta, and have lived most of my life in the South. But I'm not responsible for the apparent Southern accent (which I had indeed noticed myself).

How did I get FestVox to speak Lojban? I decided, about a month or so before Thanksgiving, that I could learn Lojban a lot faster if I could listen to it on cassette while traveling, walking, etc. But there aren't a lot of speech files out there, so it occurred to me that I might be able to make my own. I had never done TTS before, so I decided it would be a hoot to try. I discovered Sun's JSDK, and (after a brief investigation of MS SDK) I followed the links ultimately to FestVox (which has some support for JSDK, although I haven't tried it out). It was open source, whereas MS SDK was more of a free (for now) black box.

Well, after a lot of RTFM, I first tried recording all the diphones (1000+, based on info in the reference grammar). But I was coming down with a cold, and my voice wasn't up to it.
I didn't end the day with the same voice I started with. So I gave that up in disgust for a while, until the idea struck me over the Thanksgiving holiday: Why not at least try to fake it with already available synthetic voices?

So when I returned home, I decided on Festival's kal_diphone (US male voice) mixed with a little of their el_diphone (Spanish male voice). The Spanish voice provides the much prized Lojban x diphone (the Spanish j in juan, of course), and I think only about 50 diphones came from that voice. The combination of phonemes fills the slate, and theoretically stays within Lojban phonemic pronunciation parameters, although most of the r's are untrilled, except for a few that slipped in from Spanish. (On the other hand, this let me not feel too guilty about giving short shrift to r as a syllabic consonant.)

So most of the voice is kal_diphone, and kal_diphone by itself did not sound particularly Southern. The accent must be an emergent phenomenon! I first tried it out on northwind.txt, which includes the word 'darlu.' (Imagine 'get along little darlu.') I think the untrilled r's contribute a lot to the impression of a Southern accent, by the way.

Yes, I had to set up a Lojban phoneme set, and write the code to convert the Lojban into the phonemes, but there couldn't be an easier language for that. I also had to write the stress rules and code, but again, that's very regular. I decided not to stress cmavo at all, since I didn't have to. I also wrote rules for parsing numeric stuff, but haven't tested them much (they seemed to work, though).

I originally synthesized at 16 kHz, but it occurred to me that a lot of people might have modem dialup connections, so I decimated it to 8 kHz to cut the file size in half.

As for quality, I'm frankly surprised that I got away with mixing synthesized voices at all! I'd be happy for someone else with leather tonsils, a quiet recording studio, plenty of time, patience, and consistency to try it again.
They could use my diphones for generating prompts.

By the way, the first time I heard it, I thought it was fairly horrendous, and I could barely make out a word. But, after listening with the text in front of me, it seemed to begin to "clear up" after several listenings. Although I had begun to be fairly proficient at reading Lojban, it's a different experience to try to follow it aurally, in real time. Different parts of the brain involved, no doubt. Then, after some familiarity and practice, I was able to record it onto a cassette and listen to it in the car or on a Walkman. All in all, I think it helps a lot to hear it "spoken."

Thanks for the response,
-Jack

--- In lojban@y..., "randl. nortmn." wrote:
> On Thu, Jan 03, 2002 at 06:23:13PM -0000, buzzwyrd wrote:
> [...]
> > I have added a link to a wav file that I generated from the Lojban
> > text. It was made with CMU's Festvox TTS system. It's by no means
> > perfect, but it does seem to help me to hear spoken lojban, even if
> > it is a little gravelly.
> [...]
>
> I'm interested in knowing how you got Festvox to speak Lojban. Did
> you actually write the frontend that parses Lojban text into phonemes,
> or did you first convert the Lojban to English text, and then send
> that through the standard English frontend? If the former, then all
> that remains is to record a set of high-quality diphones for Lojban.
> (Also, where did you get the fricative 'k' ('x' in Lojban) sound?)
>
> By the way, for those that have commented on the poor quality and
> apparent Southern twang in the sound file, both are due to the fact
> that very old, poor quality 8kHz diphone samples are being used, and
> probably the English ones. With those samples, even English sounds
> terrible. I think the Southern twang is just an accident, largely due
> to the fact that the samples are not of the pure Lojban vowels, but
> the American sliding diphthongy vowels.
>
> > Oh, and believe me, if you want to record all the diphones, I hope
> > you've got a leather larynx.
>
> Agreed! This is why I pawned that off on others to do. And since you
> seem to have done my part of the job already, I get to sit back and
> enjoy the fruits of others' labor! ;-)
>
> Seriously, good job on both the SHRDLU dialog translation and the
> TTS. That translation is much more ambitious than anything I've
> undertaken so far.
>
> mu'omi'e randl.
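
For readers curious about the "very regular" stress rule mentioned in the message, here is a minimal sketch in Python. It is not the Festival code that was actually used (that code does not appear in this thread). It assumes the usual Lojban rule of stressing the penultimate syllable, counts only a, e, i, o, u and the diphthongs ai, ei, oi, au as syllable nuclei, and uses a deliberately crude test for cmavo (no consonant pair anywhere in the word) so that they are left unstressed. The names `nuclei` and `stress` are hypothetical, and stress is shown simply by upper-casing the stressed vowel.

```python
import re

VOWELS = "aeiou"                        # 'y' is skipped: it never carries stress
DIPHTHONGS = ("ai", "ei", "oi", "au")   # each counts as a single syllable nucleus
# Assumption: a word with no adjacent consonant pair is treated as a cmavo.
CLUSTER = re.compile(r"[bcdfgjklmnprstvxz]{2}")


def nuclei(word):
    """Return (start, end) index spans of the stress-eligible syllable nuclei."""
    spans = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in DIPHTHONGS:
            spans.append((i, i + 2))
            i += 2
        elif word[i] in VOWELS:
            spans.append((i, i + 1))
            i += 1
        else:
            i += 1
    return spans


def stress(word):
    """Upper-case the penultimate nucleus; leave cmavo-like words untouched."""
    spans = nuclei(word)
    if not CLUSTER.search(word) or len(spans) < 2:
        return word
    start, end = spans[-2]
    return word[:start] + word[start:end].upper() + word[end:]


if __name__ == "__main__":
    print(" ".join(stress(w) for w in "mi klama le zarci".split()))
```

A real frontend would need a proper morphological classification of words (cmavo, brivla, names) rather than the consonant-pair shortcut used here, and would have to respect explicitly marked stress in names.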