
Re: [lojban] Etymology of future gismu (if they are to be created)

To: lojban@googlegroups.com
Subject: Re: [lojban] Etymology of future gismu (if they are to be created)
From: Robert LeChevalier <lojbab@lojban.org>
Date: Mon, 14 May 2012 05:03:39 -0400

gleki wrote:

I know from experience that any and all translation programs are horrid at translation.
Furthermore, I don't see any need to include more languages into the algorithm.


Transliteration *may be* horrid indeed (especially in case of Arabic). However, audio recordings can solve this issue.
The algorithm was chosen to make people from all over the world learn words quicker.

That wasn't quite the reason, though certainly JCB believed that it was true. The primary reason was to create a lexicon that was (at least apparently) NOT biased in favor of any one language to an extent that exceeded its natural influence. "Cultural neutrality" was the watchword. There were and are a lot of problems with how JCB formulated the problem, and the dominance of American English semantics on the MEANINGS of the words is what I most fear, but that is what we are stuck with.

We did attempt to gather information using the old LogFlash program to determine whether indeed recognition scores were predictive of word learning. We got maybe a dozen data sets from different people, but my lack of time and statistical analysis skills leaves the analysis of that old data as one of my never-done tasks.

I suspect that there will be some correlation, but it might only exist on those words with higher recognition scores. Since more languages would lower the average score, learnability would likely be hurt.

If so why limit the number of source languages to 6?

Because any more than 6 was counterproductive, leading to essentially random words, and even then Arabic in 6th place had very little Lojbanic significance (in part because of the nature of Arabic morphology). The extreme population dominance of Chinese and English (including 2nd language speakers), and the existence of short roots in those languages, mean that most Lojban words are basically an amalgamation of those two languages, with sometimes a little coloration of one of the other languages.

Remember that a word has to match at least 2 letters (and if only 2, they must be in the right place) in order to contribute to a Lojban recognition score.
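To make the threshold concrete, here is a rough Python sketch of a weighted recognition score. It is not the original Turbo Pascal program: the use of a longest common subsequence to count matching letters, the division by the source word's length, and the names `lcs_length` and `recognition_score` are all assumptions made for illustration, and the positional rule for two-letter matches is not modelled at all.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def recognition_score(candidate, source_words, weights, min_match=2):
    """Sum the weighted contribution of every language whose source word
    shares at least `min_match` letters (in order) with the candidate.

    Assumption: a language contributes weight * matched / len(source word).
    """
    total = 0.0
    for lang, word in source_words.items():
        matched = lcs_length(candidate, word)
        if matched >= min_match:  # fewer than 2 matching letters: no contribution
            total += weights[lang] * matched / len(word)
    return total


# Toy strings and weights, not real etymologies.
toy_sources = {"lang_x": "ace", "lang_y": "e"}
toy_weights = {"lang_x": 0.6, "lang_y": 0.4}
toy_score = recognition_score("abcde", toy_sources, toy_weights)
```

In the toy run only `lang_x` contributes, because `lang_y` matches a single letter and falls below the threshold. That is the mechanism by which a low-weight language with few matching letters drops out of a word entirely.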

I suspect that any rigorous study would show that the Lojban morphology cannot effectively represent contributions from more than 3 language families (in essence, three languages with other languages possibly reinforcing those three when their roots are similar, which happens most often when they are in the same language family, or when there has been significant borrowing). Most often, only two languages/families are represented.

A couple percentage points different, and Lojban would look like an amalgamation of Chinese and Hindi. Indeed, per the numbers below, that is what would probably happen now.

We did experiments with more languages, ranging up to 12, but additional languages merely gave lower recognition scores (sometimes leading to tie scores between entirely different strings), and rarely, a letter might change because it gave a couple more points.

If I had it to do over again, I would make a couple changes in Chinese transliteration (which would give us more "o" and less "a" in the language), and perhaps try to find a way to decrease the reinforcing of fricative sounds that aren't really alike in Chinese. And I would use entirely different rules for Arabic, because vowels count so little in their roots compared to consonants, but the Lojban algorithm weights consonants and vowels more or less equally.

At one point in the 90s, I fiddled with the program to try to do this, but the original program no longer works properly (parts had been coded in assembler to speed up the innermost loops back in the 8086 era when a single word run would take several minutes rather than a few seconds) and I was a little too rusty on my coding skills.

Russian is no longer among first 6.

Actually, I think it still is, though I haven't done the calculations in recent years. The last time I did so, in 2004, it had dropped from 5th into 6th place, but it was still solidly ahead of Bengali because of second language speakers; it is probably closer now because Bengali continues to grow, while Russian is stagnant or waning; both are probably in the neighborhood of 250 million total speakers. But Russian isn't very influential in the wordmaking any more than Arabic is, though it is primarily because Russian roots are quite long. Bengali would likely have a little more influence, but only to the extent that its roots reinforce Hindi roots, skewing the language more towards the Chinese/Hindi amalgam mentioned above.

Next after Bengali is Portuguese, because Indonesian is still primarily a second language for most people who speak it, and second language speakers are halved.
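The halving of second language speakers can be sketched in Python as follows. The function name `speaker_weights`, the normalisation so that the weights sum to 1, and the speaker counts are all assumptions for illustration; the counts are made-up placeholders, not real census figures.

```python
def speaker_weights(first_language, second_language):
    """Turn speaker counts into language weights.

    Second language speakers count at half value. Normalising the result
    so the weights sum to 1 is an assumption of this sketch.
    """
    effective = {
        lang: first_language[lang] + second_language.get(lang, 0) / 2
        for lang in first_language
    }
    total = sum(effective.values())
    return {lang: count / total for lang, count in effective.items()}


# Made-up placeholder counts (millions of speakers) for two unnamed languages.
placeholder_first = {"A": 600, "B": 300}
placeholder_second = {"A": 200, "B": 0}
placeholder_weights = speaker_weights(placeholder_first, placeholder_second)
```

With these placeholders, language A ends up with an effective 700 million speakers against 300 million for B. A language spoken mainly as a second language gets only half credit for most of its speakers, which is how it can rank below a language with fewer speakers in total.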


The 2004 weighting would have been
Chinese .33
Hindi   .21
English .18
Spanish .12
Arabic  .09
Russian .07

The 1987 weights were
Chinese .36
Hindi   .16
English .21
Spanish .12
Arabic  .07
Russian .09
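
The shift between the two tables can be checked with a few lines of Python. The weights are copied from the tables above and held as integer hundredths to avoid floating-point noise; the helper names `ranking` and `change` are illustrative only.

```python
# Weights in hundredths, copied from the 1987 and 2004 tables above.
WEIGHTS_1987 = {"Chinese": 36, "Hindi": 16, "English": 21,
                "Spanish": 12, "Arabic": 7, "Russian": 9}
WEIGHTS_2004 = {"Chinese": 33, "Hindi": 21, "English": 18,
                "Spanish": 12, "Arabic": 9, "Russian": 7}


def ranking(weights):
    """Languages ordered from heaviest to lightest weight."""
    return sorted(weights, key=weights.get, reverse=True)


def change(old, new):
    """Per-language difference (new minus old), in hundredths."""
    return {lang: new[lang] - old[lang] for lang in old}


top_two_1987 = ranking(WEIGHTS_1987)[:2]
top_two_2004 = ranking(WEIGHTS_2004)[:2]
shift = change(WEIGHTS_1987, WEIGHTS_2004)
```

The two heaviest languages go from Chinese and English in 1987 to Chinese and Hindi in 2004, with Hindi gaining five hundredths and English losing three.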

If Bengali replaced Russian or were added, this would slightly strengthen Hindi. But its weight would be on the same order as Russian, not enough to actually participate in word-making except where it reinforces the weight of a Hindi root. Even Spanish has insufficient weight to participate in many words, except when it reinforces an English root.

Portuguese would probably significantly reinforce Spanish, perhaps enough to enable it to match English in weight, but otherwise would never make any contribution.

Indonesian wouldn't reinforce anything except where it uses a borrowed word, and thus would have even less effect than Arabic.

And do those 6 languages really represent the majority of the population of the planet?

Actually yes, but not by much. (In 2004, the 6 languages represented 2.7 billion first language speakers and 1.5 billion 2nd language speakers, with some overlap, especially in Hindi/English speakers, but probably not so much as to not exceed half of the current 7 billion.)


But that wasn't the intent.
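
For what it is worth, the head count above can be checked with simple arithmetic, sketched here in Python. The three figures are the ones quoted above (2004 speaker counts set against the current 7 billion population); the variable names are illustrative.

```python
# Figures quoted above, in billions.
FIRST_LANGUAGE_SPEAKERS = 2.7
SECOND_LANGUAGE_SPEAKERS = 1.5
WORLD_POPULATION = 7.0

combined = FIRST_LANGUAGE_SPEAKERS + SECOND_LANGUAGE_SPEAKERS
half_of_world = WORLD_POPULATION / 2

# How much double counting (people listed under two of the six languages)
# the total can absorb before it falls to half the population.
overlap_margin = combined - half_of_world
```

The combined count is about 4.2 billion against a halfway mark of 3.5 billion, so the overlap would have to reach roughly 0.7 billion people before the six languages stopped covering a majority.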

The most trustworthy answer is the following.
If adding more languages changes the resulting sounding then 6 languages are not enough.

Redoing the words with the current Hindi weighting would produce a big change in the language. So would the change in Chinese transliteration. Any Arabic change would probably help some, but not enough to significantly change the sound of the language. Adding additional languages would probably not change the words much (though there might be some randomization effects), but would lower the recognition scores.

(Masochists who know old Turbo Pascal might be able to do something with the program, including running some trials with different weightings. The source is still floating around somewhere on my machine. But IIRC, the code is poorly-enough documented so that a good programmer could write something from scratch almost as fast, that would allow them to try additional languages and see for themselves that it doesn't buy much.)


lojbab
