From LOJBAN%CUVMB.BITNET@UBVM.CC.BUFFALO.EDU Mon Dec 18 07:56:05 1995
Reply-To: BARRETO%VELAHF@ECCSA.TR.UNISYS.COM
Date: Mon Dec 18 07:56:05 1995
Sender: Lojban list <LOJBAN@CUVMB.BITNET>
From: Paulo Barreto <BARRETO%VELAHF@ECCSA.TR.UNISYS.COM>
Subject:      Re: The gismu creation algorithm
X-To:         lojban%cuvmb.cc.columbia.edu@TRSVR.BITNET
To: John Cowan <cowan@LOCKE.CCIL.ORG>
Message-ID: <-wDlyQc0-wF.A.3EG.7u0kLB@chain.digitalkingdom.org>

I wrote:
> I frequently feel as if letter order was not really considered
> in that process.

lojbab wrote:
> It is certainly a coincidence.  If you think you have a better match
> for a word, I still have all 20 meg of gismu data runs around [...]

Well, only a few bytes are needed :-) I'm not saying that different gismu
would have scored better, only that the current gismu seem to score
higher when letter order is not taken into account.

Let me illustrate my point with one example from each source language;
this is interesting even if it is only a coincidence. It seems that
whenever there is an ordered match, a longer unordered match is likely
as well, so you didn't have to code a more complex algorithm.

    gismu   etymology         score w/ order   score w/o order
    -----   ---------------   --------------   ---------------
    jdari   Chinese 'jian'    2                3
    fagri   English 'fair'    3                4
    palta   Hindi   'tal'     2                3
    canre   Spanish 'aren'    3                4
    kabri   Russian 'kubak'   2                3
    sumji   Arabic  'juml'    2                3
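To make the comparison concrete, here is a small Python sketch that
recomputes both columns. It assumes the ordered score is the length of
the longest common subsequence and the unordered score is the number of
letters the two words share, counted with multiplicity. Both definitions
and the function names are hypothetical simplifications chosen because
they reproduce the figures above; the real gismu-scoring rules may impose
extra conditions.

```python
from collections import Counter


def ordered_score(gismu: str, source: str) -> int:
    """Longest common subsequence length: shared letters kept in order.

    Assumption: a simplified stand-in for the actual gismu scoring rule.
    """
    # prev[j] is the LCS length of the gismu letters seen so far vs source[:j]
    prev = [0] * (len(source) + 1)
    for g in gismu:
        curr = [0] * (len(source) + 1)
        for j, s in enumerate(source, start=1):
            if g == s:
                curr[j] = prev[j - 1] + 1  # extend a match ending before both letters
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a letter in either word
        prev = curr
    return prev[-1]


def unordered_score(gismu: str, source: str) -> int:
    """Shared letters ignoring order (multiset intersection size)."""
    common = Counter(gismu) & Counter(source)  # keeps the minimum count per letter
    return sum(common.values())


# The (gismu, source word) pairs from the table above.
EXAMPLES = [
    ("jdari", "jian"),
    ("fagri", "fair"),
    ("palta", "tal"),
    ("canre", "aren"),
    ("kabri", "kubak"),
    ("sumji", "juml"),
]

for gismu, source in EXAMPLES:
    print(f"{gismu}  {source:6}  {ordered_score(gismu, source)}  "
          f"{unordered_score(gismu, source)}")
```

Since every common subsequence is also a set of shared letters, the
unordered score can never be lower than the ordered one; the question is
only how often it is strictly higher.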

Goodbye, this is Paulos.

    Paulo S. L. M. Barreto  --  Software Analyst  --  Unisys Brazil
    ***  Alternative e-mail address:  <pbarreto@unisys.com.br>  ***
    Standard disclaimer applies ("I do not speak for Unisys", etc.)
                       Please, do support Lojban!