
Re: [lojban] CLL diffs



On Tue, Sep 07, 2010 at 09:08:43PM -0700, Robin Lee Powell wrote:
> Not sure how I can help, but if you want to send me the two things
> you're working from I can try my own hand at the encoding problems.
> 

Please take the CLL Word document and convert it to a text-based
format.  I'd like to see whether you can produce a better version
than I did.  Once I had it in roughly working form, I had to strip
poorly formatted HTML (particularly the index tags).  I think my sed
command (below) was an attempt to work around a character encoding
problem (and can therefore be ignored), and I also had to remove some
internal markers (the mkhtml stuff).  I'm publishing that part of my
pipeline below, but only to document what I'm *not* having trouble
with:

<++> Makefile
CLL.2.txt: CLL.1.txt
        grep -v -e 'Revision:' -e 'mkhtml' < CLL.1.txt > CLL.2.txt

CLL.1.txt: CLL.0.txt
        sed "s/\\?\\([a-zA-Z \']*\\)\\?/\\\"\\1\\\"/g" < CLL.0.txt > CLL.1.txt

CLL.0.txt: strip CLL.txt
        ./strip CLL.txt > CLL.0.txt

strip: main.o strip.o
        csc -o strip main.o strip.o

main.o: main.scm
        csc -c -o main.o main.scm

strip.o: strip.scm
        csc -c -o strip.o strip.scm
<-->
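
For reference, a minimal sketch of what the grep filter in the
CLL.2.txt rule does.  The sample lines are made up for illustration
(they are an assumption, not taken from the real CLL.1.txt): any line
containing 'Revision:' or 'mkhtml' is dropped, and everything else
passes through unchanged.

```shell
# Hypothetical sample input; the real CLL.1.txt lines may look different.
sample=$(printf '%s\n' \
  'Chapter 1' \
  '$Revision: 1.0 $' \
  'mkhtml: internal marker' \
  'Lojban is a constructed language.')

# Same filter as the CLL.2.txt rule: drop every line matching either pattern.
filtered=$(printf '%s\n' "$sample" | grep -v -e 'Revision:' -e 'mkhtml')

printf '%s\n' "$filtered"
```

Only the 'Chapter 1' line and the final sentence should survive the
filter.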

Excuse my Lisp code written like C code.  This strips HTML tags from
an input file, which I needed to do to clean up the .doc file.

<++> strip.scm
(declare (unit main))
(use srfi-13)
(use extras)

(define-syntax 1+
  (syntax-rules ()
    ((1+ n)
     (+ 1 n))))

(define-syntax 1-
  (syntax-rules ()
    ((1- n)
     (- n 1))))

(define *i* 0)                  ; index of the next unread character
(define *end* 0)                ; index of the #\nul terminator
(define *n* 4096)
(define *buffer* (make-string (1+ *n*) #\nul))

(define (lex port)
  (set! *end* (read-string! *n* *buffer* port))
  (string-set! *buffer* *end* #\nul))

;; move any unread characters in the buffer to the front,
;; and fill the rest of the buffer from |port|.
;;
(define (fill port)
  (let ((kept (- *end* *i*)))
    (string-copy! *buffer* 0
                  *buffer* *i* *end*)
    (set! *end*
          (+ kept (read-string! (- *n* kept) *buffer* port kept)))
    (string-set! *buffer* *end* #\nul)
    (set! *i* 0)))

;; |peek| returns the next character from |*buffer*| without
;; consuming it; |getc| returns it and advances.
;;
;; |lex| (or |fill|) must be called before |getc| will return
;; a value.  if |getc| returns |#\nul|, |fill| may be called
;; to refill the buffer.
;;
(define (peek)
  (string-ref *buffer* *i*))

(define (getc)
  (let ((i *i*))
    (set! *i* (1+ *i*))
    (string-ref *buffer* i)))

(define (ungetc)
  (set! *i* (1- *i*)))

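;; copy text to standard output, skipping everything between
;; |<| and |>|.  a |#\nul| means the buffer is exhausted: refill
;; it, and stop if nothing more could be read.
;;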
(define (next port)
  (let ((ch (getc)))
    (case ch
      ((#\nul)                   ; eof
       (ungetc)
       (fill port)
       (if (not (char=? #\nul (peek)))
           (next port)))
      ((#\<)
       (next-tag port (getc)))
      (else                      ; text
       (print* ch)
       (next port)))))

(define (next-tag port ch)
  (case ch
    ((#\>)
     (next port))
    ((#\nul)
     (ungetc)
     (fill port)
     (if (not (char=? #\nul (peek)))
         (next-tag port (getc))))
    (else
     (next-tag port (getc)))))

(define (html port)
  (lex port)
  (next port))

(define (main file)
  (call-with-input-file file html))
<-->
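
For comparison only, a rough shell approximation of what strip does.
This is a sketch, not the program above: the sample input is made up,
and the sed expression assumes every tag opens and closes on the same
line, whereas strip.scm keeps skipping across line breaks until it
sees '>'.

```shell
# Hypothetical sample input, not taken from the CLL document.
input='<p>Hello <b>world</b></p>'

# Delete every "<...>" span that starts and ends on the same line.
stripped=$(printf '%s\n' "$input" | sed -e 's/<[^>]*>//g')

printf '%s\n' "$stripped"
```

A tag split over two lines would slip through this version, which is
one reason to keep the character-by-character approach.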


> -Robin
> 
> -- 
> http://singinst.org/ :  Our last, best hope for a fantastic future.
> Lojban (http://www.lojban.org/): The language in which "this parrot
> is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
> is "na nei".   My personal page: http://www.digitalkingdom.org/rlp/
> 
> -- 
> You received this message because you are subscribed to the Google Groups "lojban" group.
> To post to this group, send email to lojban@googlegroups.com.
> To unsubscribe from this group, send email to lojban+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/lojban?hl=en.
> 

-- 
ko djuno fi le do sevzi
