
Re: [lojban] Historical "finprims" gismu algorithm weights and scores

To: lojban@googlegroups.com
Subject: Re: [lojban] Historical "finprims" gismu algorithm weights and scores
From: Robert LeChevalier <lojbab@lojban.org>
Date: Thu, 06 Mar 2014 20:58:06 -0500
In-reply-to: <Z4y71n00V56Cr6M014y8Sr>
References: <Z4y71n00V56Cr6M014y8Sr>
Reply-to: lojban@googlegroups.com
Sender: lojban@googlegroups.com
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0

On 3/3/2014 11:57 AM, Riley Martinez-Lynch wrote:

 2. Can anyone confirm the weights that I derived from finprims, or
    alternately, identify issues in the methodology I'm using to
    generate scores?

I'll have to get back to you. The programs are in Turbo Pascal 3 (and originally were in TP1 or 2) and haven't been run in 20 years or so. I vaguely recall that they are correct, and that mamta generated a less than 100 score because of rounding errors.

Update: I think the numbers you derived are correct for the early words. At some point we realized the rounding error implicit in the weights that you derived, and changed things so that the weights were normalized to 200 instead of 100, allowing for 1/2 percent accuracy, which allowed the weights to sum properly. The actual weights from the final version of the program were


     Weight[1]  := 67; { Chinese   }
     Weight[2]  := 36; { English   }
     Weight[3]  := 33; { Spanish   }
     Weight[4]  := 25; { Hindi     }
     Weight[5]  := 24; { Russian   }
     Weight[6]  := 15; { Arabic    }

and if you divide each of those by 2 and round down, you get the numbers you derived.
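
As a quick check of that arithmetic, here is a minimal sketch in Python (not the original Turbo Pascal; the names WEIGHTS_200 and halve_and_floor are invented for illustration). It only halves the six weights listed above with integer division and compares the two totals: the full weights sum to 200, while the halved, rounded-down values sum to 98, which would be consistent with a score falling short of 100.

```python
# Weights as listed above from the final program (normalized to 200).
WEIGHTS_200 = {
    "Chinese": 67,
    "English": 36,
    "Spanish": 33,
    "Hindi": 25,
    "Russian": 24,
    "Arabic": 15,
}


def halve_and_floor(weights):
    """Integer-divide each weight by 2, discarding any half point."""
    return {lang: w // 2 for lang, w in weights.items()}


# Hypothetical name for the earlier, 100-based weights.
WEIGHTS_100 = halve_and_floor(WEIGHTS_200)

if __name__ == "__main__":
    print(sum(WEIGHTS_200.values()))  # total of the 200-based weights
    print(WEIGHTS_100)                # per-language halved weights
    print(sum(WEIGHTS_100.values()))  # total after rounding down
```

Four of the six weights are odd, so rounding down drops half a point four times, which accounts for the two missing points.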

 3. If these weights are confirmed, is there a record of how they were
    derived? Have they been previously published?


If there is a record, then I have it.  Finding it may be non-trivial.

Update: I have two "final" versions of the program, in source and executable, but cannot recall what the difference is. The first was almost certainly used for all the 1987 prim runs, while we may have used the second one for the words added later.

I also think I have the full set of outputs of the data runs, which gives the numbers that eventually went into finprims. (There were a couple of intermediate steps - finprims was generated by me manually after all the word runs were made, and I had picked the "winners".)

 4. Does anyone with a memory of the gismu-making process remember how
    decimal precision and rounding were handled in calculating the
    scores?

Erroneously %^). There was a bug that we found later that explained the mamta numbers adding up to less than 100.

For example, the letter sequence length scores (2-5) for
    each input word are divided by the length of each corresponding
    input word. I'd be curious to know how the precision of these
    numbers was handled before they were multiplied by the language
    weights. I'd also like to know how the precision of the products was
    handled, before or after they were summed to make the scores.

    Thank you for your consideration. I'm enjoying getting to know lojban!

I'm making a guess based on 25-year-old memories, but I think we were using integer arithmetic because it ran too slowly otherwise. (My brother-in-law eventually recoded the inner loop in assembler, which sped things up by an order of magnitude, but it was still incredibly slow by today's standards: 5-100 minutes per source-word trial.)

IIRC, we handled the decimals by shifting two places and dividing the total weight by 100, but we were using integer arithmetic which introduced some errors.
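
To illustrate the kind of error this can introduce, here is a rough Python sketch. It is not the original Turbo Pascal, and the formula is an assumption: it takes one possible reading of "shift two places, then divide by 100", applied to the per-word ratio (matched letters over word length) described in the question. The function names, the sample match data, and the choice of the halved weights are all hypothetical.

```python
from fractions import Fraction

# Assumption: the 200-based weights listed earlier, halved and rounded down.
WEIGHTS = [33, 18, 16, 12, 12, 7]


def score_exact(matches, weights):
    """Sum of weight * matched / length, carried as exact fractions."""
    return sum(Fraction(w * m, n) for (m, n), w in zip(matches, weights))


def score_integer(matches, weights):
    """Same sum using only integer arithmetic (hypothetical reading)."""
    total = 0
    for (m, n), w in zip(matches, weights):
        pct = (m * 100) // n   # shift two places, truncate the ratio
        total += pct * w
    return total // 100        # undo the shift, truncating again


# Made-up (matched_letters, word_length) pairs, one per language.
SAMPLE = [(2, 3), (2, 3), (1, 3), (2, 5), (3, 7), (0, 4)]

if __name__ == "__main__":
    print(float(score_exact(SAMPLE, WEIGHTS)))  # exact value as a float
    print(score_integer(SAMPLE, WEIGHTS))       # truncated integer value
```

Under these assumptions the integer version can only lose value at each truncation, so it never exceeds the exact score. That kind of error lowers scores slightly without necessarily reordering the candidates.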

If you are willing to wade into the old Turbo Pascal code, I may be able to find it and send it to you. But we may have fixed the bug (deciding not to rerun the erroneous ones, since the error was a scaling error that would change the scores but not likely the resulting order). I don't know if the archived code is that which ran most of the data. (We actually kept track of such things at the time, but no one has asked questions like this in 20 years, so I think a lot of old versions have been discarded.)


lojbab

--

You received this message because you are subscribed to the Google Groups "lojban" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lojban+unsubscribe@googlegroups.com.

To post to this group, send email to lojban@googlegroups.com.
Visit this group at http://groups.google.com/group/lojban.
For more options, visit https://groups.google.com/d/optout.


