Re: [lojban] zoi bug in camxes?
On Mon, Jan 24, 2011 at 11:20:40PM -0300, Jorge Llambías wrote:
> On Mon, Jan 24, 2011 at 10:03 PM, Robin Lee Powell
> <rlpowell@digitalkingdom.org> wrote:
> >
> > zoi gy gyrate gy fails in camxes; that seems like a bug (in camxes)
> > to me. It seems to me that the final zoi delimiter must have a
> > pause on both ends. But I haven't read the relevant CLL bit in
> > quite some time; what does it say about that?
>
> CLL: "The cmavo “zoi” (of selma'o ZOI) is a quotation mark for quoting
> non-Lojban text. Its syntax is “zoi X. text .X”, where X is a Lojban
> word (called the delimiting word) which is separated from the quoted
> text by pauses, and which is not found in the written text or spoken
> phoneme stream."
>
> It doesn't say that the first X need be preceded by a pause, nor that
> the final X need be followed by a pause.
>
> But even the pauses that CLL does mention aren't always needed. For
> example camxes probably approves of "zoidadida".
>
> > Certainly for
> >
> > zoi gy. gyrations .gy.
> >
> > to "work" but
> >
> > zoi gy gyrate gy
> >
> > to "not work" is a bug in camxes by my standards; it needs to be one
> > or the other.
>
> Why? From a Lojbanic perspective "gyrations" is a single word, while
> "gyrate" are three words, so there doesn't seem to be a reason (unless
> you know English, but the Lojban parser doesn't) to treat it as one.
>
I might not be able to forgive you, xorxes, for making me download
and read the source code to the official parser. Looking at it, I
a) think we can do better and b) think I better understand why the
CLL is confusingly worded.
In the technical description of the parser, the following statement
is made:
a. If the Lojban word "zoi" (selma'o ZOI) is identified, take the
following Lojban word (which should be end delimited with a pause for
separation from the following non-Lojban text) as an opening delimiter.
Treat all text following that delimiter, until that delimiter recurs
*after a pause*, as grammatically a single token (labelled 'anything_699'
in this grammar). There is no need for processing within this text
except as necessary to find the closing delimiter.
This seems pretty clear-cut to me, but it has almost nothing to do
with the implementation, whose handling of anything_699 contradicts
the opening example in this thread.
(BTW, I'm not clear as to whether a pause is both space and '.', or
whether it is only '.'. Help?)
The implementation is contained in filter.c, in particular the
following lines:
    case ZOI_START_MODE:
      tok = lex();
      if (isEnd(tok)) return tok;
      tok->type = any_word_698;
      mode = ZOI_STRING_MODE;
      delim = tok;
      return tok;
    case ZOI_STRING_MODE:
      result = newtoken();
      result->type = anything_699;
      for (;;) {
        tok = lex();
        if (isEnd(tok)) return tok;
        if (strcmp(tok->text, delim->text) == 0) break;
        tok->type = -1;
        add(result, tok);
      }
      mode = ZOI_END_MODE;
      return result;
    case ZOI_END_MODE:
      /* note: token has already been read */
      tok->type = any_word_698;
      mode = NORMAL_MODE;
      return tok;
If you follow lex(), you find getword(), which is the low-level
tokenizer in the parser. It reads ' ' or '.' delimited strings,
which means it considers "pano" a single token.
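To make that tokenizing rule concrete, here is a minimal sketch of a
splitter that breaks input on ' ' or '.' only. It is not the parser's
getword(); next_token and count_tokens are hypothetical names I made up
for illustration, and the 64-byte buffer is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT the parser's getword(): copies the next
   ' '- or '.'-delimited token from *cursor into buf and advances
   *cursor past it. Returns 0 when no token is left. */
static int next_token(const char **cursor, char *buf, size_t bufsize)
{
    const char *p = *cursor;
    size_t len, copy;

    assert(bufsize > 0);
    p += strspn(p, " .");           /* skip leading spaces and dots */
    len = strcspn(p, " .");         /* length of the token itself */
    if (len == 0)
        return 0;                   /* nothing but separators left */
    copy = len < bufsize ? len : bufsize - 1;  /* truncate if too long */
    memcpy(buf, p, copy);
    buf[copy] = '\0';
    *cursor = p + len;
    return 1;
}

/* Counts the tokens such a space/dot splitter would see in text. */
static int count_tokens(const char *text)
{
    char buf[64];
    int n = 0;

    while (next_token(&text, buf, sizeof buf))
        n++;
    return n;
}
```

With this splitter, count_tokens("pano") sees one token and
count_tokens("pa no") sees two, since nothing at this level knows
where Lojban word boundaries fall inside a run of letters.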
As a result, it behaves much like camxes does with "gyrations", but
I believe it would differ from camxes in parsing "gyrate", which
at this level of processing it would insist on treating as a single
token rather than three Lojban words.
In no case does it go looking for the delimiter inside individual
tokens, a behavior which camxes matches.
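The code has the effect of treating everything between the delimiter
words as a single token, but it only recognizes the closing delimiter
when it appears as a complete ' '- or '.'-delimited token, so it never
checks for the pause that the technical description calls for.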
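As a rough illustration of the three ZOI modes quoted above, here is a
self-contained sketch that applies the same whole-token comparison to a
plain string. It is my own stand-in, not code from filter.c: zoi_quoted
and token_len are hypothetical names, tokens are assumed to be split on
' ' and '.' only, and the quoted tokens are joined with single spaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Skips separators (' ' and '.') and returns the length of the token
   that *p then points at; 0 means end of input. */
static size_t token_len(const char **p)
{
    *p += strspn(*p, " .");
    return strcspn(*p, " .");
}

/* Hypothetical stand-in for the ZOI modes: finds the token "zoi",
   takes the next token as the delimiter, then collects tokens into
   out until one equals the delimiter exactly. Returns 1 if the
   closing delimiter was found, 0 otherwise. */
static int zoi_quoted(const char *text, char *out, size_t outsize)
{
    const char *p = text, *delim;
    size_t len, dlen, used = 0;

    assert(outsize > 0);
    out[0] = '\0';

    for (;;) {                           /* NORMAL_MODE: look for "zoi" */
        len = token_len(&p);
        if (len == 0) return 0;
        if (len == 3 && strncmp(p, "zoi", 3) == 0) { p += len; break; }
        p += len;
    }

    dlen = token_len(&p);                /* ZOI_START_MODE: delimiter */
    if (dlen == 0) return 0;
    delim = p;
    p += dlen;

    for (;;) {                           /* ZOI_STRING_MODE */
        len = token_len(&p);
        if (len == 0) return 0;          /* input ended, no closer */
        if (len == dlen && strncmp(p, delim, dlen) == 0)
            return 1;                    /* whole-token match only */
        if (used + len + 2 > outsize) return 0;  /* out is too small */
        if (used > 0) out[used++] = ' ';
        memcpy(out + used, p, len);
        used += len;
        out[used] = '\0';
        p += len;
    }
}
```

Under those assumptions, "zoi gy gyrate gy" yields the quoted text
"gyrate" as one piece, and "zoi gy. gyrations .gy." yields "gyrations",
because the delimiter is only ever compared against whole tokens and
the presence or absence of the '.' makes no difference.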
-Alan