Date: Tue, 7 Sep 2010 22:19:49 -0600
From: Alan Post
To: lojban-list@lojban.org
Subject: Re: [lojban] CLL diffs
Message-ID: <20100908041949.GB55480@alice.local>
In-Reply-To: <20100908040843.GL5990@digitalkingdom.org>
X-Original-Sender: alanpost@sunflowerriver.org

On Tue, Sep 07, 2010 at 09:08:43PM -0700, Robin Lee Powell wrote:
> Not sure how I can help, but if you want to send me the two things
> you're working from I can try my own hand at the encoding problems.
>

Please take the CLL Word document and convert it to a text-based
format. I'd like to see whether you can create a better version of
that than I could.

After I got it into roughly working form, I had to strip poorly
formatted HTML (particularly index tags). I think my sed command
(below) was an attempt to work around character encoding problems
(and can therefore be ignored), and I had to remove some internal
markers (the mkhtml stuff). I'm publishing that part of my pipeline
below, but only to document what I'm *not* having trouble with:

<++> Makefile
CLL.2.txt: CLL.1.txt
	grep -v -e 'Revision:' -e 'mkhtml' < CLL.1.txt > CLL.2.txt

CLL.1.txt: CLL.0.txt
	sed "s/\\?\\([a-zA-Z \']*\\)\\?/\\\"\\1\\\"/g" < CLL.0.txt > CLL.1.txt

CLL.0.txt: strip CLL.txt
	./strip CLL.txt > CLL.0.txt

strip: main.o strip.o
	csc -o strip main.o strip.o

main.o: main.scm
	csc -c -o main.o main.scm

strip.o: strip.scm
	csc -c -o strip.o strip.scm
<-->

Excuse my Lisp code written like C code. This strips HTML tags from
an input file, which I needed to do to clean up the .doc file.

<++> strip.scm
(declare (unit main))

(use srfi-13)
(use extras)

(define-syntax 1+
  (syntax-rules ()
    ((1+ n) (+ 1 n))))

(define-syntax 1-
  (syntax-rules ()
    ((1- n) (- n 1))))

(define *i* 0)
(define *n* 4096)
(define *buffer* (make-string (1+ *n*) #\nul))

;; read the first chunk of |port| into the buffer and
;; mark the end of the data with |#\nul|.
;;
(define (lex port)
  (string-set! *buffer* (read-string! *n* *buffer* port) #\nul))

;; move any characters in the buffer to the front,
;; and fill the rest of the buffer from |port|.
;;
(define (fill port)
  (string-copy! *buffer* 0 *buffer* *i* *n*)
  (string-set! *buffer* (read-string! *i* *buffer* port (- *n* *i*)) #\nul)
  (set! *i* 0))

;; return the next character from |buffer|.
;;
;; |fill| must be called before |getc| will return
;; a value.  if |getc| returns |#\nul|, |fill| may
;; be called to refill the buffer.
;;
(define (peek)
  (string-ref *buffer* *i*))

(define (getc)
  (let ((i *i*))
    (set! *i* (1+ *i*))
    (string-ref *buffer* i)))

(define (ungetc)
  (set! *i* (1- *i*)))

;; copy text to standard output until a tag opens
;; or the input is exhausted.
;;
(define (next port)
  (let ((ch (getc)))
    (case ch
      ((#\nul) ; eof
       (ungetc)
       (fill port)
       (if (not (char=? #\nul (peek)))
           (next port)))
      ((#\<)
       (next-tag port (getc)))
      (else ; text
       (print* ch)
       (next port)))))

;; skip characters up to and including the closing |>|
;; of a tag, then resume copying text.
;;
(define (next-tag port ch)
  (case ch
    ((#\>)
     (next port))
    ((#\nul)
     (ungetc)
     (fill port)
     (if (not (char=? #\nul (peek)))
         (next-tag port (getc))))
    (else
     (next-tag port (getc)))))

(define (html port)
  (lex port)
  (next port))

(define (main file)
  (call-with-input-file file html))
<-->

> -Robin
>
> --
> http://singinst.org/ :  Our last, best hope for a fantastic future.
> Lojban (http://www.lojban.org/): The language in which "this parrot
> is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
> is "na nei".   My personal page: http://www.digitalkingdom.org/rlp/
>
> --
> You received this message because you are subscribed to the Google Groups "lojban" group.
> To post to this group, send email to lojban@googlegroups.com.
> To unsubscribe from this group, send email to lojban+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/lojban?hl=en.
>

-- 
ko djuno fi le do sevzi