From lojbab@lojban.org Fri Sep 01 17:56:48 2000
Return-Path: <lojbab@lojban.org>
Received: (qmail 20601 invoked from network); 2 Sep 2000 00:56:45 -0000
Received: from unknown (10.1.10.27) by m3.onelist.org with QMQP; 2 Sep 2000 00:56:45 -0000
Received: from unknown (HELO stmpy-3.cais.net) (205.252.14.73) by mta2 with SMTP; 2 Sep 2000 00:56:45 -0000
Received: from bob (ppp51.net-A.cais.net [205.252.61.51]) by stmpy-3.cais.net (8.10.1/8.9.3) with ESMTP id e820utQ41140 for <lojban@egroups.com>; Fri, 1 Sep 2000 20:56:55 -0400 (EDT) (envelope-from lojbab@lojban.org)
Message-Id: <4.2.2.20000901203826.00ac86c0@127.0.0.1>
X-Sender: vir1036/pop.cais.com@127.0.0.1
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.2 
Date: Fri, 01 Sep 2000 20:54:01 -0400
To: <lojban@egroups.com>
Subject: Re: [lojban] Re: vowel counts
In-Reply-To: <026201c01423$21e2daa0$22191bc1@rus.ger.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
From: "Bob LeChevalier (lojbab)" <lojbab@lojban.org>

At 04:44 PM 09/01/2000 +0200, Daniel Gudlat wrote:
>coi rodo
>.i la jildicnen cu cusku di'e
> > I wrote a short perl script to count the vowels in the gismu in the
> > official word list and it came up with this count:
> >
> > ending i = 448
> > ending a = 335
> > ending u = 251
> > ending e = 158
> > ending o = 150
> >
> > midvowel a = 510
> > midvowel i = 353
> > midvowel u = 201
> > midvowel e = 186
> > midvowel o = 92
> >
> > total a = 845
> > total i = 801
> > total u = 452
> > total e = 344
> > total o = 242
> >
> > 'a' and 'i' win by a fair margin over the others... i wonder why that
> > is.
>
>Several possible reasons come to mind:
>
>a) vowel distribution in the source languages: I don't know anything
>about the vowel distribution in Chinese, Hindi or Russian, but Arabic only
>has a, i, u, AFAIK. So this would tend to temper the English prevalence
>of e quite a bit, I imagine.

Ah, but the English prevalence of "e" is in spelling, not in 
sound. Remember that the Lojban "e" maps only the SHORT e of "bet". The 
long "e" of "meet" maps to Lojban "i", and the schwa of "the" maps to 
"y", although in gismu making we mapped schwa to "a".
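
To make the sound-versus-spelling point concrete, here is a minimal sketch 
of the three mappings just described. The dictionary labels and the 
function name are my own inventions for illustration; only the three 
sound-to-letter pairs and the schwa exception come from the text above.

```python
# Sound-to-letter pairs as described above; the key labels are hypothetical.
SOUND_TO_LOJBAN = {
    "short e (bet)": "e",
    "long e (meet)": "i",
    "schwa (the)": "y",
}

def lojban_vowel(sound, gismu_making=False):
    """Return the Lojban letter for one of the labelled English vowel sounds."""
    letter = SOUND_TO_LOJBAN[sound]
    # In gismu making, schwa was written as "a" rather than "y".
    if gismu_making and letter == "y":
        return "a"
    return letter

for sound in SOUND_TO_LOJBAN:
    print(sound, "->", lojban_vowel(sound),
          "| in gismu making:", lojban_vowel(sound, gismu_making=True))
```

Note that two of the three sounds English often spells with "e" do not 
end up as Lojban "e" at all.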

>b) maximal separation of sounds: As far as vowels are concerned, a and i
>(and u) are maximally separated and thus make for easier word
>recognition in noisy environments. So this may have been a design
>choice.

Not a conscious factor.

>c) Diphthongs: ai, ei, oi, and au are the Lojban standard diphthongs and
>strongly favor i and a.

Not directly relevant, but close. In making gismu, we rewrote source 
language words using Lojban phonemes, and those 4 diphthongs are far more 
common than others. More importantly, pretty much all diphthongs have an 
"i" or "u", heightening those sound frequencies.

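As a rough illustration of the diphthong point, this sketch tallies the 
letters in just the four diphthongs named above. It says nothing about 
how often each one actually occurred in the source words, which is data 
not given in this thread; the variable names are illustrative only.

```python
from collections import Counter

# Only the four diphthongs named in this thread.
DIPHTHONGS = ["ai", "ei", "oi", "au"]

# Count how often each vowel letter appears across the four diphthongs.
tally = Counter("".join(DIPHTHONGS))

# Which of them contain an "i" or a "u"?
with_i_or_u = [d for d in DIPHTHONGS if "i" in d or "u" in d]

for vowel, n in tally.most_common():
    print(vowel, n)
print(len(with_i_or_u), "of", len(DIPHTHONGS), "contain i or u")
```

Every one of the four contains an "i" or a "u", and "i" and "a" are the 
only letters that appear more than once.
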
>Any other takers?

One other factor is that when we mapped Chinese to Lojban, I used a table 
found in a Chinese government publication on IPA mappings of the Chinese 
sounds. Relatively few Chinese sounds map to something in Lojban that 
contains an "o", so "o" in particular is underrepresented in the 
language. Instead, the sounds that might have given an "o" mapped to 
schwa, which we were habitually mapping to 
"a" at that point. The schwa mapping to "a" enhanced that letter's 
frequency and nearly killed "o" as a vowel, since most Lojban words get one 
if not both vowels from the Chinese. (English also does not use a long 
"o" sound all that often.)

If we were doing the whole thing over again, we might make different rules 
on how to map sounds and spellings, and build gismu so as to heighten 
contrasts between sounds in the source language to maximize recognizability 
in written text, rather than merely mimicking pronunciation norms to 
maximize recognition of spoken sounds. But such a remake would be 
unthinkable at this point other than for intellectual curiosity (and anyone 
with the time for it probably has a million other things more worthwhile to 
do; even with very fast computers, this would be a long and tedious job).
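
Going back to the counts quoted at the top: for anyone who wants to 
repeat the tally, here is a minimal sketch in Python. It is not a 
reconstruction of jildicnen's Perl script, which was not posted. It 
assumes every gismu is five letters long, ends in a vowel, and has its 
other vowel in the second or third position; the function name and the 
six-word sample are placeholders for the official word list. The second 
half only checks that the quoted figures add up.

```python
from collections import Counter

VOWELS = set("aeiou")

def count_vowels(words):
    """Tally final and non-final vowels in gismu-shaped words."""
    ending, mid = Counter(), Counter()
    for w in words:
        w = w.strip().lower()
        if len(w) != 5 or w[-1] not in VOWELS:
            continue  # skip anything that is not gismu-shaped
        ending[w[-1]] += 1
        # The other vowel sits at index 1 (CVCCV) or index 2 (CCVCV).
        for ch in w[1:3]:
            if ch in VOWELS:
                mid[ch] += 1
    return ending, mid

# A tiny sample, standing in for the official word list.
sample = ["klama", "prami", "gismu", "bangu", "lojbo", "cmene"]
ending, mid = count_vowels(sample)
total = ending + mid
print("sample totals:", dict(total))

# The figures quoted in this thread.
QUOTED_ENDING = {"i": 448, "a": 335, "u": 251, "e": 158, "o": 150}
QUOTED_MID = {"a": 510, "i": 353, "u": 201, "e": 186, "o": 92}
QUOTED_TOTAL = {"a": 845, "i": 801, "u": 452, "e": 344, "o": 242}

# Each total should be the sum of the ending and midvowel counts.
consistent = all(
    QUOTED_ENDING[v] + QUOTED_MID[v] == QUOTED_TOTAL[v] for v in VOWELS
)
grand_total = sum(QUOTED_TOTAL.values())
shares = {v: QUOTED_TOTAL[v] / grand_total for v in QUOTED_TOTAL}
print("quoted counts consistent:", consistent)
for v, s in shares.items():
    print(v, f"{s:.1%}")
```

The quoted ending and midvowel columns each sum to 1342, one of each per 
word, and "a" and "i" together fill roughly 61% of all vowel slots.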

lojbab
--
lojbab lojbab@lojban.org
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA 703-385-0273
Artificial language Loglan/Lojban: http://www.lojban.org