From LOJBAN%CUVMB.BITNET@UBVM.CC.BUFFALO.EDU Mon Dec 18 07:56:05 1995
Reply-To: BARRETO%VELAHF@ECCSA.TR.UNISYS.COM
Date: Mon Dec 18 07:56:05 1995
Sender: Lojban list
From: Paulo Barreto
Subject: Re: The gismu creation algorithm
X-To: lojban%cuvmb.cc.columbia.edu@TRSVR.BITNET
To: John Cowan
Status: OR
Message-ID: <-wDlyQc0-wF.A.3EG.7u0kLB@chain.digitalkingdom.org>

mi di'e cusku

> I frequently feel as if letter order was not really considered
> in that process.

la lojbab di'e cusku

> It is certainly a coincidence.  If you think you have a better match
> for a word, I still have all 20 meg of gismu data runs around [...]

Well, a few bytes are enough :-)  I'm not saying that different gismu
would have better scores, only that the current gismu seem to have
higher scores when letter order is not taken into account.  Let me
illustrate my point with one example for each source language; this is
interesting even if it is only a coincidence.  It seems that if there is
an ordered match, then a longer unordered match is likely, and you
didn't have to code a more complex algorithm.

gismu   etymology          score w/ order   score w/o order
-----   ---------------    --------------   ---------------
jdari   Chinese 'jian'     2                3
fagri   English 'fair'     3                4
palta   Hindi 'tal'        2                3
canre   Spanish 'aren'     3                4
kabri   Russian 'kubak'    2                3
sumji   Arabic 'juml'      2                3

co'o mi'e paulos.

Paulo S. L. M. Barreto -- Software Analyst -- Unisys Brazil
*** Alternative e-mail address: ***
Standard disclaimer applies ("I do not speak for Unisys", etc.)
e'osai ko sarji la lojban.
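
The two score columns in the table above can be reproduced with a short sketch. It rests on assumptions that the message itself does not state: the score "w/ order" is taken to be the length of the longest common subsequence of the gismu and the source word, and the score "w/o order" is taken to be the number of letters the two words share when counted as multisets. The function names `ordered_score` and `unordered_score` are hypothetical, and this is not claimed to be the scoring actually used when the gismu were created; it only happens to agree with the six rows listed.

```python
from collections import Counter


def ordered_score(gismu: str, source: str) -> int:
    """Length of the longest common subsequence (letters matched in order).

    Assumption: this is what "score w/ order" means in the table.
    """
    prev = [0] * (len(source) + 1)
    for g in gismu:
        cur = [0]
        for j, s in enumerate(source, start=1):
            if g == s:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def unordered_score(gismu: str, source: str) -> int:
    """Number of shared letters, ignoring position (multiset intersection).

    Assumption: this is what "score w/o order" means in the table.
    """
    return sum((Counter(gismu) & Counter(source)).values())


# The six gismu / source-word pairs from the table.
EXAMPLES = [
    ("jdari", "jian"),
    ("fagri", "fair"),
    ("palta", "tal"),
    ("canre", "aren"),
    ("kabri", "kubak"),
    ("sumji", "juml"),
]

if __name__ == "__main__":
    for gismu, source in EXAMPLES:
        print(gismu, source,
              ordered_score(gismu, source),
              unordered_score(gismu, source))
```

Under these assumptions the unordered score can never be lower than the ordered one, since every in-order match is also a shared letter. That fits the remark that an ordered match makes a longer unordered match likely.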