Message-ID: <53192C33.3000901@lojban.org>
Date: Thu, 06 Mar 2014 21:17:23 -0500
From: "Bob LeChevalier, President and Founder - LLG"
Reply-To: lojban@googlegroups.com
Organization: The Logical Language Group, Inc.
To: lojban@googlegroups.com
Subject: Re: [lojban] Historical "finprims" gismu algorithm weights and scores

On 3/6/2014 8:58 PM, Robert LeChevalier wrote:
> Update: I have two "final" versions of the program, in source and
> executable, but cannot recall what the difference is. The first was
> almost certainly used for all the 1987 prim runs, while we may have used
> the second one for the words added later.

The two programs differ in only a couple of lines.

Most Chinese source words were 2-3 letters long, while Russian, at the
other extreme, often had much longer words, possibly as many as 10
characters. We therefore tried normalizing all inputs as if they were 5
characters long, so that a Chinese 2/2 character match would get the much
lower weight of 2/5, and a 10-character Russian word with 5 matching
characters would get 5/5 rather than 5/10.

I don't recall whether we actually used these altered weightings or only
ran trials to see the difference. If we did use them, it would show up in
the words made after 1987, though I might not have noted it in Finprims.

We tried other experiments to improve the results, but I haven't found
them.
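As an illustration of the normalization described above, here is a minimal Python sketch. It is not taken from either version of the program. The function names, the constant name, and the assumption that a match weight is simply matched characters divided by a length are hypothetical, chosen only to reproduce the two examples in the message. Whether the real program capped weights above 5/5 is not stated, so the sketch does not cap them.

```python
from fractions import Fraction

# Assumption: the fixed length every source word is treated as having,
# per the "as if they were 5 characters long" description.
NORMALIZED_LENGTH = 5


def original_weight(matched: int, source_length: int) -> Fraction:
    """Weight a match by the source word's actual length (hypothetical helper)."""
    return Fraction(matched, source_length)


def normalized_weight(matched: int, assumed_length: int = NORMALIZED_LENGTH) -> Fraction:
    """Weight a match as if the source word had the fixed assumed length.

    The real source-word length is deliberately ignored. No upper cap is
    applied, because the message does not say whether one was used.
    """
    return Fraction(matched, assumed_length)


if __name__ == "__main__":
    # (label, matched characters, source-word length) from the two examples.
    examples = [("Chinese", 2, 2), ("Russian", 5, 10)]
    for label, matched, length in examples:
        print(label, original_weight(matched, length), normalized_weight(matched))
```

For the two cases in the message, this gives 2/2 versus 2/5 for the Chinese word and 5/10 versus 5/5 for the Russian word, printed in lowest terms.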
lojbab