
Re: [lojban] zoi bug in camxes?

To: lojban@googlegroups.com
From: ".alyn.post." <alyn.post@lodockikumazvati.org>
Date: Tue, 25 Jan 2011 06:48:46 -0700

On Mon, Jan 24, 2011 at 11:20:40PM -0300, Jorge Llambías wrote:
> On Mon, Jan 24, 2011 at 10:03 PM, Robin Lee Powell
> <rlpowell@digitalkingdom.org> wrote:
> >
> > zoi gy gyrate gy fails in camxes; that seems like a bug (in camxes)
> > to me.  It seems to me that the final zoi delimiter must have a
> > pause on both ends.  But I haven't read the relevant CLL bit in
> > quite some time; what does it say about that?
> 
> CLL: "The cmavo “zoi” (of selma'o ZOI) is a quotation mark for quoting
> non-Lojban text. Its syntax is “zoi X. text .X”, where X is a Lojban
> word (called the delimiting word) which is separated from the quoted
> text by pauses, and which is not found in the written text or spoken
> phoneme stream."
> 
> It doesn't say that the first X need be preceded by a pause, nor that
> the final X need be followed by a pause.
> 
> But even the pauses that CLL does mention aren't always needed. For
> example camxes probably approves of "zoidadida".
> 
> > Certainly for
> >
> >  zoi gy. gyrations .gy.
> >
> > to "work" but
> >
> >  zoi gy gyrate gy
> >
> > to "not work" is a bug in camxes by my standards; it needs to be one
> > or the other.
> 
> Why? From a Lojbanic perspective "gyrations" is a single word, while
> "gyrate" are three words, so there doesn't seem to be a reason (unless
> you know English, but the Lojban parser doesn't) to treat it as one.
> 

I might not be able to forgive you, xorxes, for making me download
and read the source code to the official parser.  Looking at it, I
a) think we can do better and b) think I better understand why the
CLL is confusingly worded.

In the technical description of the parser, the following statement
is made:

    a. If the Lojban word "zoi" (selma'o ZOI) is identified, take the
   following Lojban word (which should be end delimited with a pause for
   separation from the following non-Lojban text) as an opening delimiter.
   Treat all text following that delimiter, until that delimiter recurs
   *after a pause*, as grammatically a single token (labelled 'anything_699'
   in this grammar).  There is no need for processing within this text
   except as necessary to find the closing delimiter.

This seems pretty clear-cut to me, but it has almost nothing to do
with the implementation, which contradicts the opening example in
this thread in how it processes anything_699.

(BTW, I'm not clear as to whether a pause is both space and '.', or
whether it is only '.'.  Help?)
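To make that question concrete, here is a minimal sketch of the rule
as the technical description states it.  It is not code from the
official parser: the names is_pause and find_closing_delim are
hypothetical, whether a space counts as a pause is left as a
parameter, and requiring a pause (or end of text) after the closing
delimiter is my own assumption, since the description only mentions
the pause before it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: '.' always counts as a pause.  Whether ' ' does is the
 * open question, so the caller decides via space_is_pause. */
static int is_pause(char c, int space_is_pause)
{
    return c == '.' || (space_is_pause && c == ' ');
}

/* text is everything that follows the opening delimiter.  Returns the
 * offset of the first occurrence of delim that comes right after a
 * pause and is followed by a pause or the end of the text (the second
 * condition is an assumption, see above).  Returns -1 if none exists. */
static long find_closing_delim(const char *text, const char *delim,
                               int space_is_pause)
{
    size_t dlen, i;
    char after;

    assert(text != NULL && delim != NULL);
    dlen = strlen(delim);
    if (dlen == 0 || text[0] == '\0') return -1;

    for (i = 1; text[i] != '\0'; i++) {
        if (!is_pause(text[i - 1], space_is_pause)) continue;
        if (strncmp(text + i, delim, dlen) != 0) continue;
        after = text[i + dlen];
        if (after == '\0' || is_pause(after, space_is_pause))
            return (long)i;
    }
    return -1;
}
```

Under these assumptions, the text " gyrate gy" with delimiter "gy"
closes at the final "gy" only when a space counts as a pause.  If only
'.' is a pause, the closing delimiter is never found, while
". gyrations .gy." closes either way.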

The implementation is contained in filter.c, in particular the
following lines:

        case ZOI_START_MODE:
                tok = lex();
                if (isEnd(tok)) return tok;
                tok->type = any_word_698;
                mode = ZOI_STRING_MODE;
                delim = tok;
                return tok;
        case ZOI_STRING_MODE:
                result = newtoken();
                result->type = anything_699;
                for (;;) {
                        tok = lex();
                        if (isEnd(tok)) return tok;
                        if (strcmp(tok->text, delim->text) == 0) break;
                        tok->type = -1;
                        add(result, tok);
                        }
                mode = ZOI_END_MODE;
                return result;
        case ZOI_END_MODE:
                /* note: token has already been read */
                tok->type = any_word_698;
                mode = NORMAL_MODE;
                return tok;

If you follow lex(), you find getword(), which is the low-level
tokenizer in the parser.  It reads ' ' or '.' delimited strings,
which means it considers "pano" a single token.

As a result, it behaves much like camxes does with "gyrations", but
I believe it would differ from camxes in parsing "gyrate", which
at this level of processing it would insist on treating as a single
token rather than three Lojban words.

In no case does it go looking for the delimiter inside individual
tokens, a behavior which camxes matches.

The code has the effect of treating everything between the delimiter
words as a single token, but misses edge cases because of the way
the tokenizer works.
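
The token-level behavior described above can be sketched in a few
lines.  This is an illustration only, not the code from filter.c: the
helper names next_token and zoi_quoted_tokens are hypothetical, the
64-byte buffers are an arbitrary choice, and the tokenizer is assumed
to split on ' ' and '.' and nothing else.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the low-level tokenizer: copies the next
 * ' ' or '.' delimited token from *cursor into buf and advances
 * *cursor past it.  Returns 0 when no token is left.  Overlong tokens
 * are truncated to fit buf, which is good enough for a sketch. */
static int next_token(const char **cursor, char *buf, size_t bufsize)
{
    const char *p = *cursor;
    size_t n = 0;

    assert(bufsize > 0);
    while (*p == ' ' || *p == '.') p++;      /* skip separators */
    if (*p == '\0') { *cursor = p; return 0; }
    while (*p != '\0' && *p != ' ' && *p != '.') {
        if (n + 1 < bufsize) buf[n++] = *p;
        p++;
    }
    buf[n] = '\0';
    *cursor = p;
    return 1;
}

/* text is everything after "zoi".  The first token is taken as the
 * delimiter, then whole tokens are compared against it, never the
 * inside of a token.  Returns the number of tokens quoted, or -1 if
 * there is no delimiter or it never recurs as a whole token. */
static int zoi_quoted_tokens(const char *text)
{
    char delim[64], tok[64];
    int count = 0;

    if (!next_token(&text, delim, sizeof delim)) return -1;
    while (next_token(&text, tok, sizeof tok)) {
        if (strcmp(tok, delim) == 0) return count;
        count++;
    }
    return -1;
}
```

With this sketch, both "gy gyrate gy" and "gy. gyrations .gy." quote
exactly one token, because "gyrate" is compared as a whole and the
"gy" inside it is never examined.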

-Alan
