From arntrich@stud.ntnu.no Fri Feb 16 06:28:01 2001
Return-Path: <arntrich@stud.ntnu.no>
X-Sender: arntrich@stud.ntnu.no
X-Apparently-To: lojban@onelist.com
Received: (EGP: mail-7_0_3); 16 Feb 2001 14:27:57 -0000
Received: (qmail 51626 invoked from network); 16 Feb 2001 14:27:57 -0000
Received: from unknown (10.1.10.27) by l8.egroups.com with QMQP; 16 Feb 2001 14:27:57 -0000
Received: from unknown (HELO due.stud.ntnu.no) (129.241.56.71) by mta2 with SMTP; 16 Feb 2001 14:27:57 -0000
Received: from localhost (localhost [127.0.0.1]) by due.stud.ntnu.no (Postfix) with ESMTP id 47D3217A6D for <lojban@onelist.com>; Fri, 16 Feb 2001 15:27:21 +0100 (CET)
Received: from hff103-26 (dhcp-29183.stud.hf.ntnu.no [129.241.29.183]) by due.stud.ntnu.no (Postfix) with SMTP id EC72117A79 for <lojban@onelist.com>; Fri, 16 Feb 2001 15:26:12 +0100 (CET)
Message-Id: <3.0.5.32.20010216144733.01071380@pop.stud.ntnu.no>
X-Sender: arntrich@pop.stud.ntnu.no
X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.5 (32)
Date: Fri, 16 Feb 2001 14:47:33 +0100
To: lojban@yahoogroups.com
Subject: Re: [lojban] speech synthesizer
In-Reply-To: <Pine.GSO.4.10_heb2.08.10102151918500.6869-100000@sunshine>
References: <3.0.5.32.20010215162902.01094cd0@pop.stud.ntnu.no>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Virus-Scanned: by AMaViS perl-10
From: Arnt Richard Johansen <arntrich@stud.ntnu.no>
X-Yahoo-Message-Num: 5503

[Word-based Open Source Speech Synthesis]

>Isn't this somewhat, well, not the way to do it, regarding Lojban's
>phonology/morphology? I mean, it would be much easier just to program
>simple syllable-making functions, and let the program do the (amazing
>difficult) task of splitting the words into syllables.

I shall be the first one to admit that a project like this is futile.
However, doing it The Right Way(tm) is beyond our capabilities as well.

Splitting Lojban *text* into syllables is trivial.  What is highly
difficult, however, is segmenting recorded speech in such a way that it
sounds reasonably natural when it is pieced back together.  This has
nothing to do with how complex a language is, but with the way the
phones (speech sounds) blend into each other in any language.  You can't
take the "n" of "kantu", the "e" of "sevzi", and the "i" of "mi",
splice them together, and end up with "nei".  In normal speech, the word
"nei" consists of continuously changing frequencies, and you can't really
tell where the "n" ends and the "e" begins; or where the "e" ends, and the
"i" begins.
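
To make the "trivial" half of that concrete, here is a rough Python sketch
of text-level syllable splitting.  The function name syllabify and the
splitting rule are illustrative assumptions, not the official algorithm: a
single consonant between vowels goes with the following syllable, a cluster
is split after its first consonant, and only the diphthongs ai, ei, oi, au
are kept together.  Stress, syllabic consonants, and the rest of the
morphology are ignored.

```python
VOWELS = "aeiouy"
# Assumption: only these four diphthongs are treated as a single nucleus.
DIPHTHONGS = ("ai", "ei", "oi", "au")


def syllabify(word):
    """Split one lowercase Lojban word into rough syllables (simplified)."""
    # Step 1: locate every syllable nucleus (a vowel or a diphthong).
    nuclei = []  # list of (start, end) index pairs into word
    i = 0
    while i < len(word):
        if word[i] in VOWELS:
            end = i + 2 if word[i:i + 2] in DIPHTHONGS else i + 1
            nuclei.append((i, end))
            i = end
        else:
            # Consonants and the apostrophe are both skipped here, so the
            # apostrophe is treated like any other consonant.
            i += 1

    if not nuclei:
        return [word]  # no vowel at all: give the input back unsplit

    # Step 2: choose one cut point between each pair of adjacent nuclei.
    cuts = [0]
    for (_, prev_end), (next_start, _) in zip(nuclei, nuclei[1:]):
        gap = next_start - prev_end  # consonants between the two nuclei
        # Zero or one consonant: cut right after the previous nucleus.
        # A cluster: the first consonant closes the previous syllable.
        cuts.append(prev_end if gap <= 1 else prev_end + 1)
    cuts.append(len(word))

    return [word[a:b] for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    for w in ("kantu", "sevzi", "mi", "nei"):
        print(w, syllabify(w))
```

With these rules "kantu" splits into kan + tu and "sevzi" into sev + zi,
while "mi" and "nei" each stay a single syllable.  None of this touches the
hard part, which is cutting and rejoining the recorded audio.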

Anyone interested in the topic of high-quality speech synthesis might want
to take a look at http://tcts.fpms.ac.be/synthesis/mbrola.html.

>I mean, you can't
>teach the program all lujvo, or all fu'ivla, or even all cmavo
>combinations.

If I'm very bored, I just might! :)
-- 
Arnt Richard Johansen      | - Hvorfor snakker man engelsk på Internet?
http://people.fix.no/arj/  | - Har du hørt om "minste felles nevner"?
arj@fix.no                 |