From: Pierre Abbat
To: lojban@googlegroups.com
Subject: Re: [lojban] Need some jbovlaste programming help.
Date: Wed, 27 Mar 2013 06:34 -0400

On Tuesday, March 26, 2013 16:13:16 Robin Lee Powell wrote:
> The problems are mostly in Russian, due to an import script I fucked
> up. You can see the issue thuswise:
>
> select word,meaning,langid from natlangwords where word in (select word from
> natlangwords group by word, meaning, langid having count(*) > 1) order by
> langid;

I've loaded the database and verified the problem.
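The following is a minimal sketch, not jbovlaste code. It uses Python's built-in sqlite3 module and a hypothetical in-memory table that keeps only the three columns named in the query above; the sample rows are made up for illustration. It shows three things: the "word in (...)" form matches on the word alone, so rows in other languages that merely share a duplicated word come back too; grouping on (word, meaning, langid) with HAVING count(*) > 1 lists only the groups that really repeat; and one way to delete the extra copies while keeping one row per group. The rowid trick is SQLite-specific, so the production database may need a different row identifier.

```python
import sqlite3

# Hypothetical miniature of natlangwords: only the three queried columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE natlangwords (word TEXT, meaning TEXT, langid INTEGER)")
conn.executemany(
    "INSERT INTO natlangwords VALUES (?, ?, ?)",
    [
        ("язык", None, 5),
        ("язык", None, 5),           # exact duplicate
        ("язык", "часть тела", 5),
        ("язык", "речь", 5),
        (".001", None, 5),
        (".001", None, 5),           # exact duplicate in langid 5
        (".001", None, 2),           # single row in langid 2, same word
    ],
)

# Same shape as the quoted query: the subquery result is matched on word
# alone, so every row with that word is returned, whatever its langid.
by_word = conn.execute(
    """SELECT word, meaning, langid FROM natlangwords
       WHERE word IN (SELECT word FROM natlangwords
                      GROUP BY word, meaning, langid HAVING count(*) > 1)
       ORDER BY langid, word"""
).fetchall()

# Variant: report each (word, meaning, langid) group that has more than
# one row, together with its row count. GROUP BY treats NULLs as equal.
dup_groups = conn.execute(
    """SELECT word, meaning, langid, count(*) FROM natlangwords
       GROUP BY word, meaning, langid HAVING count(*) > 1
       ORDER BY langid, word"""
).fetchall()

# Cleanup sketch: keep the lowest rowid in each group and delete the rest.
conn.execute(
    """DELETE FROM natlangwords WHERE rowid NOT IN
       (SELECT min(rowid) FROM natlangwords GROUP BY word, meaning, langid)"""
)
remaining = conn.execute("SELECT count(*) FROM natlangwords").fetchone()[0]

print(by_word)
print(dup_groups)
print(remaining)
```

Here the langid 2 row for ".001" shows up in by_word even though it occurs only once, while dup_groups lists just the two repeated groups in langid 5.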
I added ",word" to the end of the query and got this:

 word         | meaning    | langid
 язык         |            | 5
 язык         |            | 5
 язык         | часть тела | 5
 язык         | речь       | 5
 язык (орган) |            | 5
 язык (орган) |            | 5

(язык = tongue/language, часть тела = body part, речь = speech, орган = organ)

Besides the duplicates, which I'll get rid of, "язык (орган)" ("tongue (organ)") shouldn't be in there at all; it means the same as "язык | часть тела" ("tongue" in the sense "body part").

The query produces fake duplicates for numbers in English:

 .001  |  | 2
 1E12  |  | 2
 1E-12 |  | 2
 1E15  |  | 2
 1E-15 |  | 2
 1E18  |  | 2
 1E-18 |  | 2
 1E21  |  | 2
 1E-21 |  | 2
 1E24  |  | 2
 1E-24 |  | 2
 1E6   |  | 2
 1E-6  |  | 2
 1E-9  |  | 2
 MEX   |  | 2

Apparently the database treats those as equal for sorting purposes. The same entries are genuinely duplicated in Russian.

Pierre