Date: Tue, 7 Sep 2010 21:08:43 -0700
From: Robin Lee Powell
To: lojban-list@lojban.org
Subject: [lojban] CLL diffs
Message-ID: <20100908040843.GL5990@digitalkingdom.org>
In-Reply-To: <20100908035951.GM38255@alice.local>

On Tue, Sep 07, 2010 at 09:59:51PM -0600, Alan Post wrote:
> My favorite change so far is the following:
>
>   [-forbidden.-] {+forbilien .+}
>
> Someone changed forbidden to forbilien, twice no less.
>
> My largest challenge in this project is the fact that I did not
> get consistent conversion of non-ASCII characters, so the wdiff
> patch is very noisy--anytime a non-ASCII character, or an ASCII
> character with a non-ASCII representation (e.g., single and double
> quotes), appears, it shows up as a diff.  I've managed to remove
> certain classes of these, and am still finding patterns as I go.

There's a Unix command called "recode" which can almost certainly
fix those problems, just so you know.

> How would you like to proceed?  Looking at my numbers, I'm wondering
> if I've taken the wrong strategy regarding character encoding and
> questioning whether I should revisit my pipeline and try to
> systematically solve the character encoding problem, rather than
> fixing the results of it.

That seems like a good plan, yes.

> I don't know how many changes to expect to see at the end.  My
> guess is that I should see 500 legitimate changes, but the error
> bar on that makes the number a bit specious.

*nod*

Not sure how I can help, but if you want to send me the two things
you're working from, I can try my own hand at the encoding problems.

-Robin

-- 
http://singinst.org/ :  Our last, best hope for a fantastic future.
Lojban (http://www.lojban.org/): The language in which "this parrot
is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
is "na nei".   My personal page: http://www.digitalkingdom.org/rlp/
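
Editor's illustration: the thread suggests solving the encoding problem once, at the start of the pipeline, rather than cleaning up the wdiff output afterwards. The tool actually named in the message is "recode"; what follows is not that tool, just a minimal Python sketch of the same idea. It assumes the two inputs are a mix of UTF-8 and Windows-1252/Latin-1 bytes (a guess, the message does not say), and the names PUNCT_MAP, decode_best_effort, and normalize_for_diff are hypothetical, made up for this example.

```python
import unicodedata

# Hypothetical mapping: typographic punctuation -> ASCII equivalents, so that
# curly vs. straight quotes no longer register as differences in a word diff.
PUNCT_MAP = {
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
    "\u2013": "-", "\u2014": "--",  # en dash, em dash
    "\u2026": "...",                # ellipsis
    "\u00a0": " ",                  # no-break space
}
_TABLE = str.maketrans(PUNCT_MAP)


def decode_best_effort(raw: bytes) -> str:
    """Decode as UTF-8 if valid, else Windows-1252, else Latin-1.

    Assumption: the inputs use one of these encodings. Latin-1 maps every
    byte to a code point, so the final fallback cannot fail.
    """
    for enc in ("utf-8", "cp1252"):
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")


def normalize_for_diff(raw: bytes) -> str:
    """Return text in one canonical form, suitable as input to a diff tool."""
    text = decode_best_effort(raw)
    # NFC makes precomposed and combining-accent spellings compare equal.
    text = unicodedata.normalize("NFC", text)
    # Fold typographic punctuation to ASCII; other non-ASCII text is kept.
    return text.translate(_TABLE)
```

Running both source texts through normalize_for_diff before handing them to wdiff should leave mostly the genuine wording changes in the output, though how much noise it removes depends on what encodings the real files contain.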