[lojban] Re: PEG grammar issues
- To: lojban-list@lojban.org
- Subject: [lojban] Re: PEG grammar issues
- From: "Jorge Llambías" <jjllambias@gmail.com>
- Date: Mon, 16 Jun 2008 13:34:33 -0300
- In-reply-to: <737b61f30806151939g53bbd8a1s3480b51573d433a1@mail.gmail.com>
- References: <737b61f30806151939g53bbd8a1s3480b51573d433a1@mail.gmail.com>
- Reply-to: lojban-list@lojban.org
- Sender: lojban-list-bounce@lojban.org
On Sun, Jun 15, 2008 at 11:39 PM, Chris Capel <pdf23ds@gmail.com> wrote:
>
> First, the top-level production should fail if it can't parse the
> whole string. Currently 'text' ends with an EOF?, which makes it never
> fail.
I think that was on purpose: parse as much as you can parse, and
discard anything unparsable that follows.
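
To make the trade-off concrete, here is a minimal Python sketch of the two behaviours. It is a toy model, not the real grammar: the word list and the function names (parse_prefix, parse_lenient, parse_strict) are made up for illustration, and "parsing" is reduced to accepting a run of known words.

```python
# Toy model only: the "grammar" accepts any run of known words.
# KNOWN and all function names here are hypothetical, chosen for illustration.
KNOWN = {"mi", "klama", "le", "zarci"}

def parse_prefix(tokens):
    """Consume the longest leading run of known words; return its length."""
    n = 0
    while n < len(tokens) and tokens[n] in KNOWN:
        n += 1
    return n

def parse_lenient(tokens):
    """Models 'text <- ... EOF?': never fails, silently drops the tail."""
    n = parse_prefix(tokens)
    return tokens[:n]

def parse_strict(tokens):
    """Models 'text <- ... EOF': returns None unless all input is consumed."""
    n = parse_prefix(tokens)
    return tokens[:n] if n == len(tokens) else None

if __name__ == "__main__":
    sample = "mi klama xyz le".split()
    print(parse_lenient(sample))  # keeps the parsable prefix
    print(parse_strict(sample))   # reports failure instead
```

The lenient version is convenient for recovering whatever can be parsed, while the strict version is what a caller needs if it must know that the whole string was accepted.
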
> Second, selbri-3 should parse its child selbri-4 into left-associative
> groups. Currently it just parses them all into one group, which is
> misleading and possibly wrong, depending on your interpretation. I
> tried to figure out a way to fix this, but couldn't find a way to do
> so and avoid left recursion in the definition. So I gave up and added
> a post-parsing step in my own parser to group them properly.
The same applies to statement-1, bridi-tail-1 and sumti-2, right?
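
For illustration, a post-parsing regrouping step of the kind described above could look roughly like this in Python. This is only a sketch under the assumption that the parser hands back the children as a flat list; it is not Chris's actual code, and group_left is a hypothetical name.

```python
from functools import reduce

def group_left(items):
    """Fold a flat list [a, b, c, ...] into left-nested pairs ((a, b), c)...

    A single item is returned unchanged, since there is nothing to group.
    """
    if not items:
        raise ValueError("expected at least one item")
    return reduce(lambda left, right: (left, right), items)

if __name__ == "__main__":
    print(group_left(["a", "b", "c", "d"]))
```

Doing the grouping after the parse sidesteps left recursion entirely, at the cost of the raw parse tree not showing the grouping by itself.
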
> Third, tenses that probably ought to be parsed as part of the bridi
> are currently being parsed as head terms, because of the term-1
> production:
>
> term-1 <- sumti / ( !gek (tag / FA-clause free*) (sumti /
> KU-clause? free*) ) / termset / NA-clause KU-clause free*
>
> {mi} {pu} <klama le zarci>
>
> (In braces are term-1 matches, and in angle brackets is the
> bridi-tail.) 'term-1' matches "mi", and then it matches "pu". 'sumti'
> fails on "pu", but '!gek' and 'tag' succeed, and then since
> 'KU-clause' and 'free' are both optional, the second option of
> 'term-1' succeeds. I'm not exactly sure how this one needs to be
> fixed, but what about this:
>
> term-1 <- sumti / term-2 / termset / NA-clause KU-clause free*
>
> term-2 <- !gek (tag (sumti / KU-clause free*) / FA-clause free*
> (sumti / KU-clause? free*) )
>
> Here, 'term-2' is the second option of the original 'term-1', except
> that the third item in the sequence has been factored into the second,
> and the ? removed from 'KU-clause' after 'tag'. It seems to work in my
> parser!
That would also make it impossible to omit {ku} after a tag in other
positions where omitting it should be fine. For example,
{mi ka'e pu klama} would fail.
How about "!gek !selbri" instead of just "!gek" in the original rule?
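
The three variants can be compared on a drastically simplified toy model in Python. Everything here is an assumption made for illustration: the word lists, the function names, and in particular the simplification that a selbri is at most one tag followed by a single brivla and that a bridi-tail is just a selbri. The real rules are far richer, so this only shows the shape of the argument.

```python
# Toy model only; word lists and rule shapes are simplifying assumptions.
SUMTI = {"mi", "do"}
TAGS = {"pu", "ka'e"}
BRIVLA = {"klama"}

def selbri(toks, i):
    """Toy 'selbri <- tag? brivla'. Returns the end index, or None."""
    j = i + 1 if i < len(toks) and toks[i] in TAGS else i
    return j + 1 if j < len(toks) and toks[j] in BRIVLA else None

def term(toks, i, variant):
    """Match one term at position i. Returns the end index, or None.

    variant is one of:
      "original"    - tag (sumti / ku?)          ku may be elided
      "ku_required" - tag (sumti / ku)           ku is mandatory
      "guarded"     - !selbri tag (sumti / ku?)  lookahead added
    """
    if i >= len(toks):
        return None
    if toks[i] in SUMTI:
        return i + 1
    if toks[i] not in TAGS:
        return None
    if variant == "guarded" and selbri(toks, i) is not None:
        return None  # !selbri fails: leave the tag for the bridi-tail
    j = i + 1
    if j < len(toks) and (toks[j] in SUMTI or toks[j] == "ku"):
        return j + 1
    return None if variant == "ku_required" else j

def parse(text, variant):
    """Greedy 'term* selbri EOF'. Returns (terms, bridi_tail) or None."""
    toks = text.split()
    i, terms = 0, []
    while True:
        j = term(toks, i, variant)
        if j is None:
            break
        terms.append(toks[i:j])
        i = j
    end = selbri(toks, i)
    if end is None or end != len(toks):
        return None
    return terms, toks[i:end]

if __name__ == "__main__":
    for variant in ("original", "ku_required", "guarded"):
        for text in ("mi pu klama", "mi ka'e pu klama"):
            print(variant, "|", text, "|", parse(text, variant))
```

In this toy, the original rule turns every tag into a head term, the ku-required rule rejects {mi ka'e pu klama} outright, and the guarded rule keeps {ka'e} as a term with elided {ku} while leaving {pu} attached to {klama}.
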
> Fourth, 'term-sa' only appears to match one term sa under some
> conditions. For instance, it doesn't match this:
>
> mi ba klama lo sa lo sa do
>
> which one might imagine could be said by someone with a stutter.
> Here's one possible fix:
>
> term-sa <- term-start (!term-1 (sa-word / SA-clause !term-1) )*
> SA-clause &term-1
SA ought to be ditched or completely reformulated, IMHO.
mu'o mi'e xorxes