From: Robin Lee Powell
Reply-To: rlpowell@digitalkingdom.org
To: lojban@yahoogroups.com
Date: Fri, 30 Apr 2004 16:39:19 -0700
Subject: [lojban] My parser and ZOI

I've decided that having a pre-processor just to handle ZOI wasn't compromising my principles too badly.
The entire pre-processor is as follows:

- cut -
#!/bin/sh
cd /home/rlpowell/www/hobbies/lojban/grammar/rats
cat - | \
perl -p -e "s/(zoi[\s.]+|la'o[\s.]+)([a-zA-Z']+)([\s.]+.*[\s.]+)(\2)([\s.]+|\z)/\1QZOIMARKER\2\3\4\5QZOIMARKER/" | \
/usr/local/java/bin/java xtc/parser/PParser /dev/stdin
- cut -

The Perl isn't nearly as bad as it looks; there's nothing there that egrep can't do, except for the replacement and \z.

What this does is replace[1] the zoi boundary word (i.e. gy in "zoi gy whee! gy") with the string QZOIMARKER. That string was chosen to be descriptive. The Q is there to make sure it's not valid Lojban, so that the pre-processor's output can never be misconstrued as valid for some reason other than processing ZOI correctly.

-Robin

[1]: Technically, it prepends the marker to the first boundary word and appends it after the second (and after any whitespace or dots that immediately follow it); this is so that the entire original text is recoverable.

-- 
http://www.digitalkingdom.org/~rlpowell/ *** I'm a *male* Robin.
"Many philosophical problems are caused by such things as the simple
inability to shut up." -- David Stove, liberally paraphrased.
http://www.lojban.org/ *** loi pimlu na srana .i ti rokci morsi
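
Addendum: to see what the substitution does without running the Java parser, here is a minimal sketch of the same regex ported to Python. It is an illustration only, not the script from the post. The function names mark_zoi and unmark_zoi are made up for this example, and Python's \Z stands in for Perl's \z (both match only at the very end of the string).

```python
import re

# Port of the Perl pattern from the post. Groups:
#   1: "zoi" or "la'o" plus following whitespace/dots
#   2: the boundary word
#   3: the quoted text, including its surrounding whitespace/dots
#   4: the boundary word again (backreference to group 2)
#   5: trailing whitespace/dots, or end of string
ZOI_PATTERN = re.compile(
    r"(zoi[\s.]+|la'o[\s.]+)([a-zA-Z']+)([\s.]+.*[\s.]+)(\2)([\s.]+|\Z)"
)


def mark_zoi(line: str) -> str:
    """Mark the first ZOI quote on a line (count=1, like s/// without /g)."""
    return ZOI_PATTERN.sub(r"\1QZOIMARKER\2\3\4\5QZOIMARKER", line, count=1)


def unmark_zoi(text: str) -> str:
    """Recover the original text by deleting the markers (hypothetical helper)."""
    return text.replace("QZOIMARKER", "")


if __name__ == "__main__":
    sample = "zoi gy whee! gy"
    marked = mark_zoi(sample)
    print(marked)
    print(unmark_zoi(marked) == sample)
```

Because the markers are only ever added, never substituted for anything, deleting every QZOIMARKER gives back the input unchanged, which is the recoverability property described in footnote [1].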