From LOJBAN@CUVMB.BITNET Sat Mar 6 22:46:07 2010
Reply-To: Logical Language Group
Sender: Lojban list
Date: Tue Dec 19 02:34:18 1995
From: Logical Language Group
Subject: Re: The gismu creation algorithm
X-To: BARRETO%VELAHF@ECCSA.TR.UNISYS.COM
X-cc: lojban@cuvmb.cc.columbia.edu
To: John Cowan

>la lojbab di'e cusku
>> It is certainly a coincidence. If you think you have a better match
>> for a word, I still have all 20 meg of gismu data runs around [...]
>
>Well, only some bytes are enough :-) I'm not saying different gismu
>would have better scores, only that the current gismu seem to have
>higher scores when letter order is not taken into account.
>
>Let me illustrate my point with one example for each source language;
>this is interesting even if only a coincidence. It seems that if there
>is an ordered match, then a longer unordered match is likely, and you
>didn't have to code a more complex algorithm.
>
>   gismu   etymology         score w/ order   score w/o order
>   -----   ---------------   --------------   ---------------
>   jdari   Chinese 'jian'          2                 3
>   fagri   English 'fair'          3                 4
>   palta   Hindi 'tal'             2                 3
>   canre   Spanish 'aren'          3                 4
>   kabri   Russian 'kubak'         2                 3
>   sumji   Arabic 'juml'           2                 3

Well, in these cases, it is clear that neither the Spanish nor the English
example could be remade so as to get a higher score, since the
out-of-order phoneme is the second vowel, which has to be in final
position.

The Arabic example is clearly coincidence, since the main etymological
components were "sum" from English (probably reinforced by Spanish, I
guess without verifying) and "ji" from Chinese. Arabic always loses
against the other languages %^( jumji would have reduced the English
score to benefit Arabic, and sumli would have reduced the Chinese score
to benefit Arabic.
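
The two score columns in the quoted table can be reproduced with a small sketch. This is only an illustration under an assumption: it takes "score w/ order" to be the length of the longest common subsequence of letters, and "score w/o order" to be the number of shared letters regardless of position. The function names are hypothetical, and this is not the scoring code used for the actual gismu data runs, which is not shown in this message.

```python
from collections import Counter

def ordered_score(gismu, source):
    """Assumed 'score w/ order': longest common subsequence length."""
    prev = [0] * (len(source) + 1)
    for g in gismu:
        cur = [0]
        for j, s in enumerate(source, start=1):
            if g == s:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def unordered_score(gismu, source):
    """Assumed 'score w/o order': shared letters, counting repeats."""
    return sum((Counter(gismu) & Counter(source)).values())

# The six examples from the quoted table (gismu, source-language form).
EXAMPLES = [
    ("jdari", "jian"),   # Chinese
    ("fagri", "fair"),   # English
    ("palta", "tal"),    # Hindi
    ("canre", "aren"),   # Spanish
    ("kabri", "kubak"),  # Russian
    ("sumji", "juml"),   # Arabic
]

for gismu, source in EXAMPLES:
    print(gismu, source,
          ordered_score(gismu, source),
          unordered_score(gismu, source))
```

Under the same assumption, ordered_score("jumji", "juml") is 3 while ordered_score("jumji", "sum") is 2, against 3 for ordered_score("sumji", "sum"), which matches the remark that jumji would have helped Arabic at the expense of English.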

JCB observed a long time ago that most of the gismu consisted of jamming
the English and Chinese together optimally, with the other languages
serving to make minor adjustments. This is still essentially true,
though on occasion it is Chinese and Hindi, especially when the English
is not reinforced by a Spanish or Russian near cognate.

lojbab