Date: Tue, 25 Jan 2011 06:48:46 -0700
From: ".alyn.post."
To: lojban@googlegroups.com
Subject: Re: [lojban] zoi bug in camxes?
Message-ID: <20110125134846.GA29851@234.sub-69-96-210.myvzw.com>
References: <20110124162504.GB27137@alice.local> <20110125010310.GA26224@digitalkingdom.org>

On Mon, Jan 24, 2011 at 11:20:40PM -0300, Jorge Llambías wrote:
> On Mon, Jan 24, 2011 at 10:03 PM, Robin Lee Powell wrote:
> >
> > zoi gy gyrate gy fails in camxes; that seems like a bug (in camxes)
> > to me. It seems to me that the final zoi delimiter must have a
> > pause on both ends. But I haven't read the relevant CLL bit in
> > quite some time; what does it say about that?
>
> CLL: "The cmavo “zoi” (of selma'o ZOI) is a quotation mark for quoting
> non-Lojban text. Its syntax is “zoi X. text .X”, where X is a Lojban
> word (called the delimiting word) which is separated from the quoted
> text by pauses, and which is not found in the written text or spoken
> phoneme stream."
>
> It doesn't say that the first X need be preceded by a pause, nor that
> the final X need be followed by a pause.
>
> But even the pauses that CLL does mention aren't always needed. For
> example camxes probably approves of "zoidadida".
>
> > Certainly for
> >
> >  zoi gy. gyrations .gy.
> >
> > to "work" but
> >
> >  zoi gy gyrate gy
> >
> > to "not work" is a bug in camxes by my standards; it needs to be one
> > or the other.
>
> Why? From a Lojbanic perspective "gyrations" is a single word, while
> "gyrate" are three words, so there doesn't seem to be a reason (unless
> you know English, but the Lojban parser doesn't) to treat it as one.
>

I might not be able to forgive you, xorxes, for making me download
and read the source code to the official parser.

Looking at it, I a) think we can do better and b) think I better
understand why the CLL is confusingly worded.

In the technical description of the parser, the following statement
is made:

  a. If the Lojban word "zoi" (selma'o ZOI) is identified, take the
     following Lojban word (which should be end delimited with a pause
     for separation from the following non-Lojban text) as an opening
     delimiter. Treat all text following that delimiter, until that
     delimiter recurs *after a pause*, as grammatically a single token
     (labelled 'anything_699' in this grammar). There is no need for
     processing within this text except as necessary to find the
     closing delimiter.

This seems pretty clear-cut to me, but it has almost nothing to do
with the implementation, which contradicts the opening example in
this thread in how it processes anything_699.

(BTW, I'm not clear as to whether a pause is both space and '.', or
whether it is only '.'. Help?)
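To pin down how I read statement (a), here is a minimal sketch in C.
It is not code from either parser: zoi_span and is_pause are names I
made up for illustration, and it assumes that both whitespace and '.'
count as a pause, which is exactly the point I'm unsure about.
Statement (a) only asks for a pause before the closing delimiter, so
the need_pause_after flag switches between that literal reading and a
stricter one that also wants a pause (or end of input) after it.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: whitespace and '.' both count as a pause. */
static int is_pause(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '.';
}

/*
 * Hypothetical helper, not taken from any parser.  `s` points just past
 * the word "zoi".  On success it stores the opening delimiter and the
 * quoted text (as pointer/length pairs into `s`) and returns 1.  It
 * returns 0 if there is no pause-terminated delimiter or the delimiter
 * never recurs after a pause.
 */
int zoi_span(const char *s, int need_pause_after,
             const char **delim, size_t *delim_len,
             const char **quote, size_t *quote_len)
{
    const char *p = s;
    const char *q;
    const char *end;

    while (is_pause(*p)) p++;                  /* pauses before the delimiter */
    *delim = p;
    while (*p != '\0' && !is_pause(*p)) p++;   /* delimiter runs to a pause */
    *delim_len = (size_t)(p - *delim);
    if (*delim_len == 0 || !is_pause(*p)) return 0;

    while (is_pause(*p)) p++;                  /* pauses before the quoted text */
    *quote = p;

    for (q = p; *q != '\0'; q++) {
        if (!is_pause(q[-1])) continue;        /* must recur *after a pause* */
        if (strncmp(q, *delim, *delim_len) != 0) continue;
        if (need_pause_after &&
            q[*delim_len] != '\0' && !is_pause(q[*delim_len]))
            continue;                          /* stricter reading only */
        end = q;
        while (end > *quote && is_pause(end[-1])) end--;  /* trim pauses */
        *quote_len = (size_t)(end - *quote);
        return 1;
    }
    return 0;                                  /* no closing delimiter */
}
```

Given the text " gy gyrate gy" (everything after "zoi"), the literal
reading (need_pause_after set to 0) closes the quote at the "gy" that
begins "gyrate" and leaves an empty quote, while the stricter reading
(set to 1) yields "gyrate". Under this sketch the same holds for
" gy. gyrations .gy.": only the stricter reading yields "gyrations".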
The implementation is contained in filter.c, in particular the
following lines:

    case ZOI_START_MODE:
        tok = lex();
        if (isEnd(tok)) return tok;
        tok->type = any_word_698;
        mode = ZOI_STRING_MODE;
        delim = tok;
        return tok;
    case ZOI_STRING_MODE:
        result = newtoken();
        result->type = anything_699;
        for (;;) {
            tok = lex();
            if (isEnd(tok)) return tok;
            if (strcmp(tok->text, delim->text) == 0) break;
            tok->type = -1;
            add(result, tok);
        }
        mode = ZOI_END_MODE;
        return result;
    case ZOI_END_MODE:
        /* note: token has already been read */
        tok->type = any_word_698;
        mode = NORMAL_MODE;
        return tok;

If you follow lex(), you find getword(), which is the low-level
tokenizer in the parser. It reads ' ' or '.' delimited strings,
which means it considers "pano" a single token. As a result, it
behaves much like camxes does with "gyrations", but I believe it
would differ from camxes in parsing "gyrate", which at this level of
processing it would insist on treating as a single token rather
than three Lojban words.

In no case does it go looking for the delimiter inside individual
tokens, a behavior which camxes matches. The code has the effect of
treating everything between the delimiter words as a single token,
but misses edge cases because of the way the tokenizer works.

-Alan
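P.S. To make the tokenizer point concrete, here is a toy model in C of
the behavior described above. It is a sketch, not the parser's code:
next_token is a hypothetical stand-in for lex()/getword() that assumes
tokens are simply ' ' or '.' delimited, and zoi_token_count is a name
I made up. It counts the tokens gathered into the quote before a later
token compares equal to the opening delimiter, the same whole-token
strcmp() test that ZOI_STRING_MODE uses.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for lex()/getword(): copies the next ' '- or
 * '.'-delimited token from *pp into buf (truncating to cap - 1 chars)
 * and advances *pp.  Returns 0 once the input is exhausted.
 */
int next_token(const char **pp, char *buf, size_t cap)
{
    const char *p = *pp;
    size_t n = 0;

    while (*p == ' ' || *p == '.') p++;        /* skip separators */
    if (*p == '\0') {
        *pp = p;
        return 0;
    }
    while (*p != '\0' && *p != ' ' && *p != '.') {
        if (n + 1 < cap) buf[n++] = *p;
        p++;
    }
    buf[n] = '\0';
    *pp = p;
    return 1;
}

/*
 * `s` points just past the word "zoi".  Returns how many tokens sit
 * between the opening delimiter and the first later token equal to it,
 * or -1 if there is no delimiter or it never recurs as a whole token.
 */
int zoi_token_count(const char *s)
{
    char delim[64];
    char tok[64];
    int count = 0;

    if (!next_token(&s, delim, sizeof delim)) return -1;
    while (next_token(&s, tok, sizeof tok)) {
        if (strcmp(tok, delim) == 0) return count;  /* whole-token match only */
        count++;
    }
    return -1;
}
```

In this model " gy gyrate gy" and " gy. gyrations .gy." both give one
quoted token, because the delimiter is never searched for inside
"gyrate" or "gyrations". One edge case it exposes: in
" gy. foo.gy.bar .gy." the '.' splitting turns the inner "gy" into its
own token, so the quote closes after "foo".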