
Re: [lojban] Historical "finprims" gismu algorithm weights and scores



On 3/3/2014 11:57 AM, Riley Martinez-Lynch wrote:
 2. Can anyone confirm the weights that I derived from finprims, or
    alternately, identify issues in the methodology I'm using to
    generate scores?

I'll have to get back to you. The programs are in Turbo Pascal 3 (originally TP1 or 2) and haven't been run in 20 years or so. I vaguely recall that they are correct, and that mamta got a score of less than 100 because of rounding errors.

Update: I think the numbers you derived are correct for the early words. At some point we realized the rounding error implicit in the weights you derived, and changed things so that the weights were normalized to 200 instead of 100. That allowed for half-percent precision, so the weights summed properly. The actual weights from the final version of the program were

     Weight[1]  := 67; { Chinese   }
     Weight[2]  := 36; { English   }
     Weight[3]  := 33; { Spanish   }
     Weight[4]  := 25; { Hindi     }
     Weight[5]  := 24; { Russian   }
     Weight[6]  := 15; { Arabic    }
and if you divide each of those by 2 and round down, you get the numbers you derived.
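The arithmetic is easy to check. The snippet below is only an illustration, written in Python rather than the original Turbo Pascal, and the names WEIGHTS_200 and WEIGHTS_100 are made up for it. Halving the final weights with floor division gives back the percentages derived from finprims. It also shows where the missing points go: four of the six weights (67, 33, 25, 15) are odd, each loses half a point, and the halved weights sum to 98 rather than 100.

```python
# Final weights from the program, normalized to 200.
WEIGHTS_200 = {
    "Chinese": 67,
    "English": 36,
    "Spanish": 33,
    "Hindi": 25,
    "Russian": 24,
    "Arabic": 15,
}

# Halving with floor division reproduces the percentage weights derived
# from finprims; every odd weight loses half a point in the process.
WEIGHTS_100 = {lang: w // 2 for lang, w in WEIGHTS_200.items()}

if __name__ == "__main__":
    print(sum(WEIGHTS_200.values()))  # 200
    print(WEIGHTS_100)  # {'Chinese': 33, 'English': 18, 'Spanish': 16, 'Hindi': 12, 'Russian': 12, 'Arabic': 7}
    print(sum(WEIGHTS_100.values()))  # 98
```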

 3. If these weights are confirmed, is there a record of how they were
    derived? Have they been previously published?

If there is a record, then I have it.  Finding it may be non-trivial.

Update: I have two "final" versions of the program, in both source and executable form, but cannot recall how they differ. The first was almost certainly used for all the 1987 prim runs, while we may have used the second one for the words added later.

I also think I have the full set of outputs of the data runs, which give the numbers that eventually went into finprims. (There were a couple of intermediate steps: I generated finprims manually after all the word runs were made and I had picked the "winners".)


 4. Does anyone with a memory of the gismu-making process remember how
    decimal precision and rounding was handled in calculating the
    scores?

Erroneously %^). There was a bug, found later, that explained why the mamta numbers added up to less than 100.

    For example, the letter sequence length scores (2-5) for
    each input word are divided by the length of each corresponding
    input word. I'd be curious to know how the precision of these
    numbers was handled before they were multiplied by the language
    weights. I'd also like to know how the precision of the products was
    handled, before or after they were summed to make the scores.

Thank you for your consideration. I'm enjoying getting to know lojban!
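To make the question concrete, here is a minimal sketch of the score as described above, computed with Python's exact fractions so that no rounding happens at any step. The function name exact_score and its arguments are hypothetical. The per-language letter-sequence scores are taken as given inputs rather than computed from a candidate word, and the weighted sum is divided by the weight total so the result is a percentage (with weights summing to 200, that is the same as halving it).

```python
from fractions import Fraction

# Final 200-based weights, in the order of the Pascal listing:
# Chinese, English, Spanish, Hindi, Russian, Arabic.
WEIGHTS_200 = [67, 36, 33, 25, 24, 15]


def exact_score(seq_scores, word_lengths, weights=WEIGHTS_200):
    """Weighted score as an exact percentage (hypothetical helper).

    seq_scores[i]   -- letter-sequence length score for language i (given)
    word_lengths[i] -- length of the language-i input word
    Each ratio is kept as an exact fraction, weighted, summed, and scaled
    so that a ratio of 1 in every language gives 100.
    """
    if not (len(seq_scores) == len(word_lengths) == len(weights)):
        raise ValueError("need one score and one word length per language")
    weighted_sum = sum(
        w * Fraction(s, n) for w, s, n in zip(weights, seq_scores, word_lengths)
    )
    return weighted_sum * 100 / sum(weights)


if __name__ == "__main__":
    print(exact_score([4] * 6, [4] * 6))  # 100
    print(exact_score([2] * 6, [3] * 6))  # 200/3
```

With exact arithmetic a ratio of 2/3 in every language scores exactly 200/3 (about 66.67), which gives a baseline for judging how much any fixed-point scheme loses.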

I'm making a guess based on 25-year-old memories, but I think we were using integer arithmetic because it ran too slowly otherwise. (My brother-in-law eventually recoded the inner loop in assembler, which sped things up by an order of magnitude, but it was still incredibly slow by today's standards: 5-100 minutes per source-word trial.)

IIRC, we handled the decimals by shifting two places and dividing the total weight by 100, but because we were using integer arithmetic, this introduced some errors.
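The following is a hypothetical reconstruction of that kind of integer scheme, not the original code. It assumes that "shifting two places" means scaling each ratio by 100 and truncating to a whole percent before weighting, with a second truncation when the weighted sum is divided by the weight total; integer_score and the test inputs are illustrative. Under those assumptions the floored 100-based weights cannot reach 100 even for a perfect match, while the 200-based weights can. This is one way a perfect match can fall short of 100, not necessarily the bug that affected mamta.

```python
# Final 200-based weights (Chinese, English, Spanish, Hindi, Russian, Arabic)
# and the floored 100-based weights derived from them.
WEIGHTS_200 = [67, 36, 33, 25, 24, 15]
WEIGHTS_100 = [w // 2 for w in WEIGHTS_200]  # [33, 18, 16, 12, 12, 7]


def integer_score(seq_scores, word_lengths, weights, weight_total):
    """Hypothetical integer-only score, as a whole percentage.

    Each ratio seq_score / word_length is scaled by 100 ("shifted two
    places") and truncated; the weighted sum is then divided by
    weight_total, truncating a second time.
    """
    if not (len(seq_scores) == len(word_lengths) == len(weights)):
        raise ValueError("need one score and one word length per language")
    total = 0
    for w, s, n in zip(weights, seq_scores, word_lengths):
        pct = (s * 100) // n  # first truncation: ratio to a whole percent
        total += w * pct
    return total // weight_total  # second truncation


if __name__ == "__main__":
    perfect = ([4] * 6, [4] * 6)     # full match in every language
    two_thirds = ([2] * 6, [3] * 6)  # ratio 2/3 everywhere; exact score 200/3
    print(integer_score(*perfect, WEIGHTS_100, 100))     # 98
    print(integer_score(*perfect, WEIGHTS_200, 200))     # 100
    print(integer_score(*two_thirds, WEIGHTS_100, 100))  # 64
    print(integer_score(*two_thirds, WEIGHTS_200, 200))  # 66
```

In the two-thirds case the exact score is about 66.67, so under these assumptions truncation costs under one point with the 200-based weights and about 2.7 points with the floored weights. Because every candidate is shifted down in a similar way, rankings would mostly survive, which fits the judgment that the error changed scores more than order.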

If you are willing to wade into the old Turbo Pascal code, I may be able to find it and send it to you. But we may have fixed the bug (deciding not to rerun the erroneous ones, since it was a scaling error that would change the scores but probably not the resulting order). I don't know whether the archived code is the version that ran most of the data. (We actually kept track of such things at the time, but no one has asked questions like this in 20 years, so I think a lot of old versions have been discarded.)

lojbab

--
You received this message because you are subscribed to the Google Groups "lojban" group. To unsubscribe from this group and stop receiving emails from it, send an email to lojban+unsubscribe@googlegroups.com.
To post to this group, send email to lojban@googlegroups.com.
Visit this group at http://groups.google.com/group/lojban.
For more options, visit https://groups.google.com/d/optout.