From: Gleki Arxokuna <gleki.is.my.name@gmail.com>
To: "lojban@googlegroups.com"
Date: Mon, 10 Nov 2014 10:07:58 +0300
Subject: Re: [lojban] Re: se klani be lo kafkylerfu


2014-11-10 0:42 GMT+03:00 TR NS <transfire@gmail.com>:
> On Saturday, November 8, 2014 9:31:49 AM UTC-5, la gleki wrote:


>> 2014-11-08 17:15 GMT+03:00 TR NS <tran...@gmail.com>:


>>> On Sunday, November 2, 2014 12:26:12 PM UTC-5, la gleki wrote:
>>>> Just for the record, some stats.
>>>> We take the IRC logs, keeping only sentences in Lojban.
>>>> We sum the frequencies of the words that contain a given letter and divide by the summed frequencies of all words. If the same letter occurs more than once in a word, we count it as a single occurrence. We limit ourselves to the 3000 most frequent words.

>>>> We get:
>>>> [x] - found in 2.7% of all words spoken in the IRC logs
>>>> ['] - 16.8%
>>>> [c] - 13.9%
>>>> [cx'] - 31.75% (at least one of these letters in each word)
>>>> [x'] - 19.47% (at least one of these letters in each word)
>>>> [cx] followed by a consonant - 2.11%

>>>> to'u (in short): one word in three contains at least one of the three letters ['], [x] or [c].
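The counting procedure described in the quoted stats can be sketched in Python. This is a minimal illustration under stated assumptions, not the script that produced these numbers. It assumes the IRC log has already been tokenized into a `collections.Counter` of Lojban words. `letter_share` is a hypothetical helper, and the toy counts are made up for the demonstration.

```python
from collections import Counter


def letter_share(word_counts: Counter, letters: str, top_n: int = 3000) -> float:
    """Percentage of word tokens, among the top_n most frequent word types,
    whose spelling contains at least one of `letters`.

    Each word counts once no matter how many matching letters it has,
    and it is weighted by its frequency in the corpus.
    """
    top = word_counts.most_common(top_n)           # restrict to the top_n words
    total = sum(freq for _, freq in top)           # frequency-weighted total
    if total == 0:
        return 0.0
    hits = sum(freq for word, freq in top if any(ch in word for ch in letters))
    return 100.0 * hits / total


# Toy corpus (hypothetical counts, not the real IRC data).
counts = Counter({"mi": 5, "do": 2, "xu": 2, "ku'i": 1})
print(letter_share(counts, "x"))    # 20.0
print(letter_share(counts, "'"))    # 10.0
print(letter_share(counts, "cx'"))  # 30.0
```

With real IRC counts, a call such as `letter_share(counts, "cx'")` would correspond to the [cx'] row above. The "[cx] followed by a consonant" row would need a pattern match rather than a simple membership test.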

>>> So what can be done about it? I think it's clear that there are too many "static noise" sounds in the language. And as a logical language, there is no reason for it to rank so low in sound quality (one need only look at online polls to see that Chinese, which has similar qualities, never ranks well).

>> What? Impossible. Mandarin has two layers of protection against static noise, tones and the rest of the sound system, which overlap to some degree.


> Hmm... I don't mean static as in "hard to understand"; I just mean the nature of the sound, which doesn't rank high as aesthetically pleasant. Chinese consistently ranks near the top of worst-sounding-language polls.

Okay, I thought you were talking about the signal-to-noise ratio.
As for Chinese being aesthetically unpleasing, that was certainly a biased poll. Otherwise, why would more than 1 billion speakers of various dialects still use it? Why don't they start speaking, say, English instead? :D
And what if I tell you that I find it aesthetically pleasing?


But back to Lojban: the solution could be to allow /^CiV/ and /^CuV/ as new alternative pronunciations of the corresponding cmavo with an apostrophe, while preserving the existing pronunciation. Also, if dropping the ' from /V'V/ gives a /VV/ that is not an allowed diphthong, then /V'V/ may also be pronounced as /VV/. This would lead to the following options for pronouncing words:

{ku'i} => {kui}/{ku'i} (choose the pronunciation you like).
{o'e} => {oe}/{o'e} (choose the pronunciation you like).
However,
{i'e} => {i'e} (since /ie/ is an allowed diphthong).
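The two proposed rules can be sketched in Python as an illustration only. `alternative_pronunciations` is a hypothetical helper. The rules are applied only to whole cmavo of the shapes Ci'V/Cu'V and V'V, as in the examples above. The "allowed diphthongs" are assumed to be the standard Lojban vowel pairs ai, ei, oi and au, plus the pairs that start with i or u.

```python
VOWELS = set("aeiou")  # assumption: 'y' is ignored here

# Vowel pairs assumed to already exist in Lojban; dropping the ' must not create one.
ALLOWED_DIPHTHONGS = {
    "ai", "ei", "oi", "au",
    "ia", "ie", "ii", "io", "iu",
    "ua", "ue", "ui", "uo", "uu",
}


def alternative_pronunciations(word: str) -> set[str]:
    """Return every pronunciation the proposal would accept for `word`."""
    options = {word}  # the existing pronunciation is always preserved
    # Rule 1: a Ci'V / Cu'V cmavo may also be pronounced CiV / CuV.
    if (len(word) == 4 and word[0] not in VOWELS and word[0] != "'"
            and word[1] in "iu" and word[2] == "'" and word[3] in VOWELS):
        options.add(word[0] + word[1] + word[3])
    # Rule 2: V'V may also be pronounced VV when VV is not an allowed diphthong.
    if (len(word) == 3 and word[0] in VOWELS and word[1] == "'"
            and word[2] in VOWELS and word[0] + word[2] not in ALLOWED_DIPHTHONGS):
        options.add(word[0] + word[2])
    return options


for w in ["ku'i", "o'e", "i'e"]:
    print(w, sorted(alternative_pronunciations(w)))
```

Running the sketch gives {kui} as an extra option for {ku'i} and {oe} for {o'e}. {i'e} keeps only its current form, because dropping the ' would produce the existing diphthong /ie/.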

I tried counting how many ' can be removed this way. Words whose ' becomes optional make up 7.8% of all words. Thus 16.8 - 7.8 = 9, i.e. about 9% of words would still contain a '. However, I may have missed some words.
I don't know whether 9% of words with ' would be acceptable to you.
Another advantage of such an approach is that 7.8% of words could then be pronounced shorter.
Also, many ' are found in lujvo. If rafsi containing ' are dispreferred, the number of ' will decrease even more. There is nothing wrong with saying {selprami} instead of {selpa'i}, unless someone uses the latter as a nickname.


As for [x], it occurs in only 2.7% of all words. This letter could probably be eliminated from gismu by replacing it with {k}, short rafsi with {x} could be eliminated altogether, and the corpus could be corrected, since it may contain mistakes of other kinds anyway. For example, {xrula} could get an alternative pronunciation such as {flora}.

If such alternatives make people happier, why not use them?
However, notice that {xu} alone accounts for 0.9% of all words, and the word {xamgu} is No. 86 in the frequency list.

i xu la'edi'u xamgu da'i do (Would that be good for you?)

--
You received this message because you are subscribed to the Google Groups "lojban" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lojban+unsubscribe@googlegroups.com.
To post to this group, send email to lojban@googlegroups.com.
Visit this group at http://groups.google.com/group/lojban.
For more options, visit https://groups.google.com/d/optout.