From nessus@free.fr Fri Dec 13 01:38:47 2002
Return-Path: <nessus@free.fr>
X-Sender: nessus@free.fr
X-Apparently-To: lojban@yahoogroups.com
Received: (EGP: mail-8_2_3_0); 13 Dec 2002 09:38:47 -0000
Received: (qmail 51016 invoked from network); 13 Dec 2002 09:38:46 -0000
Received: from unknown (66.218.66.218)
  by m10.grp.scd.yahoo.com with QMQP; 13 Dec 2002 09:38:46 -0000
Received: from unknown (HELO mel-rto6.wanadoo.fr) (193.252.19.25)
  by mta3.grp.scd.yahoo.com with SMTP; 13 Dec 2002 09:38:46 -0000
Received: from mel-rta8.wanadoo.fr (193.252.19.79) by mel-rto6.wanadoo.fr (6.7.015)
  id 3DF633E0001D62AB for lojban@yahoogroups.com; Fri, 13 Dec 2002 10:38:46 +0100
Received: from tanj (80.9.201.146) by mel-rta8.wanadoo.fr (6.7.015)
  id 3DF62F8400183C05 for lojban@yahoogroups.com; Fri, 13 Dec 2002 10:38:46 +0100
Message-ID: <001f01c2a28b$6facbaa0$92c90950@tanj>
To: <lojban@yahoogroups.com>
Subject: parsing fa'o cmavo
Date: Fri, 13 Dec 2002 10:36:29 +0100
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2800.1106
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
From: "Lionel Vidal" <nessus@free.fr>
X-Yahoo-Group-Post: member; u=47678341
X-Yahoo-Profile: cmacinf

This is really a bit of nitpicking, but a formal specification of the
word-breaking algorithm does need to cover all the cases.

The problem is that the grammar does not require a pause after {fa'o}
(if it does, this whole message is meaningless, so please ignore it),
which can make {fa'o} difficult to parse: you may find yourself lost
parsing whatever follows {fa'o} (which may not even be Lojban text) just
to identify it.
The following examples are not really problems,
but they can make the parser's life difficult :-)
- {fa'ojustatest}: is that a name? I would say yes, if we always
parse the longest possible unit.
- {fa'onow}: end of text followed by some English words? OK, but most
parsers will simply bark at the unknown letter 'w' and incorrectly report
an error. You may have fun trying other things like {fa'omzmz} or
{fa'o<Kanji>}.

And now for something more difficult: a legal brivla that includes fa'o and
is followed by something illegal, like {fa'oFTEmicoy}:
the current backward algorithm rejects it because of the final {oy}, while
a forward algorithm accepts {fa'oFTEmi} as a fu'ivla and barks at {coy}.
But you could just as well say this is legal: {fa'o} followed by anything
that is not Lojban.

To summarize, the problem is that you may end up deciding that any
parsing error *after* {fa'o} makes it the true cmavo, which would
defeat its very purpose... that is, to stop parsing!
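
To make the worry concrete, here is a minimal Python sketch of that
fallback policy. It is only an illustration, not any existing parser:
classify_chunk and the is_valid_word callback are hypothetical names, and
whatever validator you pass in stands in for the real morphology rules.

```python
from typing import Callable, Optional, Tuple

FAHO = "fa'o"


def classify_chunk(
    chunk: str, is_valid_word: Callable[[str], bool]
) -> Tuple[str, Optional[str]]:
    """Classify one pause-delimited chunk that starts with fa'o.

    Returns ("word", None) when the whole chunk is accepted as a single
    unit, or ("end-of-text", tail) when fa'o is read as the cmavo and
    `tail` is the text left unparsed after it.

    `is_valid_word` is an assumed, caller-supplied predicate; this sketch
    does not implement Lojban morphology.
    """
    if not chunk.startswith(FAHO):
        raise ValueError("chunk does not start with fa'o")

    # A bare fa'o is unambiguous.
    if chunk == FAHO:
        return ("end-of-text", "")

    # Longest-unit rule: prefer reading the whole chunk as one word.
    if is_valid_word(chunk):
        return ("word", None)

    # Fallback: any failure after fa'o turns it into the end-of-text
    # cmavo. Note that the validator had to look at the tail first,
    # which is exactly the work fa'o was supposed to save.
    return ("end-of-text", chunk[len(FAHO):])
```

With a toy validator that accepts only {fa'ojustatest}, the chunk
{fa'onow} comes back as end of text with "now" left over, but only after
the validator has already examined "now".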

It may well be that I missed something obvious, or that
I misunderstood {fa'o} itself, but otherwise I think
this point should be clarified (or corrected) in the grammar.

-- Lionel