Received-SPF: neutral (google.com: 68.230.241.218 is neither permitted nor denied by best guess record for domain of lojbab@lojban.org) client-ip=68.230.241.218;
Message-ID: <531927AE.70506@lojban.org>
Date: Thu, 06 Mar 2014 20:58:06 -0500
From: Robert LeChevalier <lojbab@lojban.org>
Reply-To: lojban@googlegroups.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0
MIME-Version: 1.0
To: lojban@googlegroups.com
Subject: Re: [lojban] Historical "finprims" gismu algorithm weights and scores
References: <Z4y71n00V56Cr6M014y8Sr>
In-Reply-To: <Z4y71n00V56Cr6M014y8Sr>
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
Sender: lojban@googlegroups.com
Content-Type: text/plain; charset=UTF-8; format=flowed
X-Spam_score: -0.0
X-Spam_score_int: 0
X-Spam_bar: /

On 3/3/2014 11:57 AM, Riley Martinez-Lynch wrote:
>  2. Can anyone confirm the weights that I derived from finprims, or
>     alternately, identify issues in the methodology I'm using to
>     generate scores?

I'll have to get back to you.  The programs are in Turbo Pascal 3 (and 
originally were in TP1 or 2) and haven't been run in 20 years or so.  I 
vaguely recall that they are correct, and that mamta generated a score 
of less than 100 because of rounding errors.

Update:  I think the numbers you derived are correct for the early 
words.  At some point we realized that the weights you derived carry an 
implicit rounding error, and changed things so that the weights were 
normalized to 200 instead of 100.  That gave 1/2 percent precision, 
which let the weights sum properly.  The actual weights from the 
final version of the program were

      Weight[1]  := 67; { Chinese   }
      Weight[2]  := 36; { English   }
      Weight[3]  := 33; { Spanish   }
      Weight[4]  := 25; { Hindi     }
      Weight[5]  := 24; { Russian   }
      Weight[6]  := 15; { Arabic    }

If you divide each of those by 2 and round down, you get the numbers 
you derived.
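
As a quick check of that arithmetic, here is a small Python sketch (not the original Turbo Pascal; the variable names are made up for illustration). It halves each weight with integer division and compares the two totals:

```python
# Weights from the final version of the program, normalized to 200.
WEIGHTS_200 = {
    "Chinese": 67,
    "English": 36,
    "Spanish": 33,
    "Hindi": 25,
    "Russian": 24,
    "Arabic": 15,
}

# Halve each weight and round down (integer division).
halved = {lang: w // 2 for lang, w in WEIGHTS_200.items()}

total_200 = sum(WEIGHTS_200.values())    # 200
total_halved = sum(halved.values())      # 98, not 100

print(halved)
print(total_200, total_halved)
```

The halved weights come out as 33, 18, 16, 12, 12 and 7, which sum to 98 rather than 100, so a score computed on the 100 scale with rounded-down weights cannot reach 100 even for a perfect match.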

>  3. If these weights are confirmed, is there a record of how were they
>     derived? Have they been previously published?

If there is a record, then I have it.  Finding it may be non-trivial.

Update: I have two "final" versions of the program, in source and 
executable, but cannot recall what the difference is.  The first was 
almost certainly used for all the 1987 prim runs, while we may have used 
the second one for the words added later.

I also think I have the full set of outputs of the data runs, which 
gives the numbers that eventually went into finprims.  (There were a 
couple of intermediate steps - finprims was generated by me manually 
after all the word runs were made, and I had picked the "winners".)


>  4. Does anyone with a memory of the gismu-making process remember how
>     decimal precision and rounding was handled in calculating the
>     scores?

Erroneously %^).  There was a bug that we found later that explained the 
mamta numbers adding up to less than 100.

> For example, the letter sequence length scores (2-5) for
>     each input word are divided by the length of each corresponding
>     input word. I'd be curious to know how the precision of these
>     numbers were handled before they were multiplied by the language
>     weighs. I'd also like to know how the precision of the products was
>     handled, before or after they were summed to make the scores.
>
> Thank you for your consideration. I'm enjoying getting to know lojban!

I'm making a guess based on 25-year-old memories, but I think we were 
using integer arithmetic because it ran too slowly otherwise.  (My 
brother-in-law eventually recoded the inner loop in assembler, which 
sped things up by an order of magnitude, but it was still incredibly 
slow by today's standards: 5-100 minutes per source-word trial.)

IIRC, we handled the decimals by shifting two places and dividing the 
total weight by 100, but we were using integer arithmetic, which 
introduced some errors.
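
To illustrate how that kind of error can creep in, here is a hypothetical Python sketch. It is not a reconstruction of the actual program: the function names, the order of operations, and the sample data are all assumptions, chosen only to show integer truncation next to exact arithmetic.

```python
from fractions import Fraction

def scaled_int_score(matches, weights):
    """Hypothetical integer-only scoring that truncates at each step."""
    total = 0
    for (matched, length), weight in zip(matches, weights):
        ratio_x100 = matched * 100 // length   # shift two places, truncating
        total += ratio_x100 * weight // 100    # scale back down, truncating again
    return total

def exact_score(matches, weights):
    """The same sum computed with exact fractions, for comparison."""
    total = Fraction(0)
    for (matched, length), weight in zip(matches, weights):
        total += Fraction(matched, length) * weight
    return total

# Halved, rounded-down weights (Chinese, English, Spanish, Hindi, Russian, Arabic).
weights = [33, 18, 16, 12, 12, 7]

# Made-up data: 2 of 3 letters matched in every language.
matches = [(2, 3)] * 6

int_result = scaled_int_score(matches, weights)
exact_result = exact_score(matches, weights)

print(int_result)            # 60
print(float(exact_result))   # about 65.33
```

With this made-up input the truncating version gives 60 where the exact value is 196/3, about 65.33. Every term loses a little in the same direction, which is consistent with an error that changes the scores but is unlikely to change their order.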

If you are willing to wade into the old Turbo Pascal code, I may be able 
to find it and send it to you.  But we may have fixed the bug (deciding 
not to rerun the erroneous ones since the error was a scaling error that 
would change the scores but not likely the resulting order).  I don't 
know if the archived code is the version that ran most of the data.  (We 
actually kept track of such things at the time, but no one has asked 
questions like this in 20 years, so I think a lot of old versions have 
been discarded.)

lojbab

-- 
You received this message because you are subscribed to the Google 
Groups "lojban" group.
To unsubscribe from this group and stop receiving emails from it, send 
an email to lojban+unsubscribe@googlegroups.com.
To post to this group, send email to lojban@googlegroups.com.
Visit this group at http://groups.google.com/group/lojban.
For more options, visit https://groups.google.com/d/optout.

