Return-Path: LOJBAN%CUVMB.BITNET@vms.dc.LSOFT.COM
Received: from SEGATE.SUNET.SE (segate.sunet.se [192.36.125.6])
    by xiron.pc.helsinki.fi (8.7.1/8.7.1) with ESMTP id OAA09940
    for ; Mon, 18 Dec 1995 14:39:37 +0200
Message-Id: <199512181239.OAA09940@xiron.pc.helsinki.fi>
Received: from listmail.sunet.se by SEGATE.SUNET.SE (LSMTP for OpenVMS v1.0a)
    with SMTP id BA9D335B ; Mon, 18 Dec 1995 13:39:33 +0100
Date: Mon, 18 Dec 1995 07:38:00 LCL
Reply-To: BARRETO%VELAHF@ECCSA.TR.UNISYS.COM
Sender: Lojban list
From: Paulo Barreto
Subject: Re: The gismu creation algorithm
X-To: lojban%cuvmb.cc.columbia.edu@TRSVR.BITNET
To: Veijo Vilva
Content-Length: 1409
Lines: 32

mi di'e cusku
> I frequently feel as if letter order was not really considered
> in that process.

la lojbab di'e cusku
> It is certainly a coincidence.  If you think you have a better match
> for a word, I still have all 20 meg of gismu data runs around [...]

Well, a few bytes are enough :-)  I'm not saying that different gismu
would have better scores, only that the current gismu seem to have
higher scores when letter order is not taken into account.  Let me
illustrate my point with one example for each source language; this is
interesting even if it is only a coincidence.  It seems that whenever
there is an ordered match, a longer unordered match is likely, and you
didn't have to code a more complex algorithm to get it.

gismu   etymology         score w/ order   score w/o order
-----   ---------------   --------------   ---------------
jdari   Chinese 'jian'          2                 3
fagri   English 'fair'          3                 4
palta   Hindi   'tal'           2                 3
canre   Spanish 'aren'          3                 4
kabri   Russian 'kubak'         2                 3
sumji   Arabic  'juml'          2                 3

co'o mi'e paulos.

Paulo S. L. M. Barreto -- Software Analyst -- Unisys Brazil
*** Alternative e-mail address: ***
Standard disclaimer applies ("I do not speak for Unisys", etc.)
e'osai ko sarji la lojban.
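
The two score columns in the table above can be reproduced with a short sketch. It rests on assumptions that the message does not state: "score w/ order" is taken to be the length of the longest common subsequence of the gismu and the source word, and "score w/o order" the number of letters the two words share regardless of position. The function names `ordered_score` and `unordered_score` are hypothetical, and this is not necessarily how the actual gismu creation algorithm scores candidates; it only happens to agree with the six rows listed.

```python
from collections import Counter


def ordered_score(gismu: str, source: str) -> int:
    """Assumed 'score w/ order': length of the longest common subsequence."""
    prev = [0] * (len(source) + 1)
    for g in gismu:
        curr = [0]
        for j, s in enumerate(source, start=1):
            if g == s:
                # Extend the best match that ends before both letters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbours.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def unordered_score(gismu: str, source: str) -> int:
    """Assumed 'score w/o order': letters shared by both words, any position."""
    shared = Counter(gismu) & Counter(source)  # multiset intersection
    return sum(shared.values())


# The six gismu / source-word pairs from the table.
PAIRS = [
    ("jdari", "jian"),
    ("fagri", "fair"),
    ("palta", "tal"),
    ("canre", "aren"),
    ("kabri", "kubak"),
    ("sumji", "juml"),
]

for gismu, source in PAIRS:
    print(gismu, source, ordered_score(gismu, source), unordered_score(gismu, source))
```

Under these definitions the unordered score can never be lower than the ordered one, since every letter of an in-order match is also a shared letter; that is consistent with the observation that an ordered match tends to come with a longer unordered match.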