Message-ID: <531927AE.70506@lojban.org>
Date: Thu, 06 Mar 2014 20:58:06 -0500
From: Robert LeChevalier
Reply-To: lojban@googlegroups.com
To: lojban@googlegroups.com
Subject: Re: [lojban] Historical "finprims" gismu algorithm weights and scores

On 3/3/2014 11:57 AM, Riley Martinez-Lynch wrote:
> 2. Can anyone confirm the weights that I derived from finprims, or
>    alternately, identify issues in the methodology I'm using to
>    generate scores?

I'll have to get back to you. The programs are in Turbo Pascal 3 (and
originally were in TP1 or 2) and haven't been run in 20 years or so. I
vaguely recall that they are correct, and that mamta generated a score
of less than 100 because of rounding errors.

Update: I think the numbers you derived are correct for the early words.
At some point we realized the rounding error implicit in the weights
that you derived, and changed things so that the weights were normalized
to 200 instead of 100, allowing for 1/2 percent accuracy, which allowed
the weights to sum properly. The actual weights from the final version
of the program were

    Weight[1] := 67; { Chinese }
    Weight[2] := 36; { English }
    Weight[3] := 33; { Spanish }
    Weight[4] := 25; { Hindi }
    Weight[5] := 24; { Russian }
    Weight[6] := 15; { Arabic }

and if you divide each of those by 2 and round down, you get the numbers
you derived.

> 3. If these weights are confirmed, is there a record of how they were
>    derived? Have they been previously published?

If there is a record, then I have it. Finding it may be non-trivial.

Update: I have two "final" versions of the program, in source and
executable, but cannot recall what the difference is. The first was
almost certainly used for all the 1987 prim runs, while we may have used
the second one for the words added later. I also think I have the full
set of outputs of the data runs, which gives the numbers that eventually
went into finprims.
(There were a couple of intermediate steps - finprims was generated by
me manually after all the word runs were made, and I had picked the
"winners".)

> 4. Does anyone with a memory of the gismu-making process remember how
>    decimal precision and rounding was handled in calculating the
>    scores?

Erroneously %^). There was a bug that we found later that explained the
mamta numbers adding up to less than 100.

>    For example, the letter sequence length scores (2-5) for
>    each input word are divided by the length of each corresponding
>    input word. I'd be curious to know how the precision of these
>    numbers were handled before they were multiplied by the language
>    weights. I'd also like to know how the precision of the products was
>    handled, before or after they were summed to make the scores.
>
> Thank you for your consideration. I'm enjoying getting to know lojban!

I'm making a guess based on 25-year-old memories, but I think we were
using integer arithmetic because it ran too slowly otherwise (my
brother-in-law eventually recoded the inside loop in assembler, which
sped things up by an order of magnitude, but it was still incredibly
slow by today's standards, 5-100 minutes per source-word trial).

IIRC, we handled the decimals by shifting two places and dividing the
total weight by 100, but we were using integer arithmetic, which
introduced some errors.

If you are willing to wade into the old Turbo Pascal code, I may be able
to find it and send it to you. But we may have fixed the bug (deciding
not to rerun the erroneous ones since the error was a scaling error that
would change the scores but not likely the resulting order). I don't
know if the archived code is that which ran most of the data. (We
actually kept track of such things at the time, but no one has asked
questions like this in 20 years, so I think a lot of old versions have
been discarded.)

lojbab

-- 
You received this message because you are subscribed to the Google Groups "lojban" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lojban+unsubscribe@googlegroups.com.
To post to this group, send email to lojban@googlegroups.com.
Visit this group at http://groups.google.com/group/lojban.
For more options, visit https://groups.google.com/d/optout.
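
The rounding effect discussed in the reply above (weights normalized to 200, then halved and rounded down) can be checked with a few lines of arithmetic. This is only a minimal sketch in Python, not the original Turbo Pascal. The names WEIGHTS_200, weights_100 and best_possible_score are illustrative and do not come from the program, and the "best possible score" assumes a hypothetical candidate that earns the full weight for every language.

```python
# Weights quoted in the message, normalized to 200 (half-percent units).
WEIGHTS_200 = {
    "Chinese": 67,
    "English": 36,
    "Spanish": 33,
    "Hindi": 25,
    "Russian": 24,
    "Arabic": 15,
}

# Halve each weight with integer (floor) division to get whole-percent weights.
weights_100 = {lang: w // 2 for lang, w in WEIGHTS_200.items()}


def best_possible_score(weights):
    """Score of a hypothetical candidate that earns every language's full weight."""
    return sum(weights.values())


if __name__ == "__main__":
    print(best_possible_score(WEIGHTS_200))  # 200, i.e. exactly 100 percent
    print(weights_100)
    print(best_possible_score(weights_100))  # 98, two points short of 100
```

Three of the six weights are odd, so flooring drops half a point each from Chinese, Spanish, Hindi and Arabic when they are halved; that is two full points in total, which is consistent with whole-percent weights being unable to sum to 100.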