On Thu, Jun 21, 2012 at 11:08:21AM +0300, Veijo Vilva wrote:
> I've also added rules to bracket sumti tcita and
> zei lujvo. I had to add rules to the morphology PEG in order to
> keep any quoted non-Lojban text intact - now the quoted text is
> sent as a single non-L word to the grammar PEG.
For all your changes that you believe do not change the language,
can you comment on them at
http://www.lojban.org/tiki/BPFK+Section%3A+Formal+Grammar ?
Mere bracketing shouldn't change the language, but I'll of course triple-check additions like this before submitting them.
> My present, very simple pretty printer is quite flexible. It can
> produce either the full parse tree, which is probably required
> only for checking the parser, or omit the numbered sub-rules
> (sumti-1,...) or omit any user-defined set of intermediate levels
> from the tree. It would be trivial to add glosses for cmavo and
> gismu to the output. I've also given some thought to passing the
> lujvo split from the morphology PEG.
FWIW, the trick that I've used for programmatic tree pruning, that
works very well, is to prune anything that has only one child.
That, IIRC, is the entire difference between camxes and camxes -v.
That trick definitely simplifies the exclusion rules but doesn't necessarily make them completely superfluous, and sometimes an additional inclusion list may also help.
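To make the idea concrete, here is a rough sketch of single-child pruning combined with an inclusion list. It is only an illustration in Python, not the code of camxes or of my pretty printer: the tuple-based tree shape, the `prune` function, the `keep` parameter and the sample tree are all assumptions made up for this example.

```python
# Hypothetical tree shape: an inner node is (rule_name, [children]),
# a leaf is a plain string holding the word.

def prune(node, keep=frozenset()):
    """Collapse every node that has exactly one child.

    Rule names listed in `keep` (the inclusion list) survive even
    when they have a single child.
    """
    if isinstance(node, str):
        return node
    name, children = node
    # Prune bottom-up so that chains of single-child nodes collapse fully.
    children = [prune(child, keep) for child in children]
    if len(children) == 1 and name not in keep:
        return children[0]
    return (name, children)


# Made-up sample tree for illustration only.
tree = ("sentence", [
    ("sumti", [("sumti-1", [("sumti-6", ["mi"])])]),
    ("bridi-tail", [
        ("selbri", ["klama"]),
        ("sumti", [("sumti-6", ["ti"])]),
    ]),
])

pruned = prune(tree)
pruned_keep = prune(tree, keep={"sumti"})
```

Without an inclusion list the sumti nodes disappear entirely, since each has only one child; passing `keep={"sumti"}` retains them while still dropping the numbered sub-rules.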
> I'll have to do some more testing before releasing the program for
> general consumption.
You might as well throw it up on github for people to play with,
no?
I'll check the PEG -> LPeg translation first, as it is all too easy to make a few mistakes when mechanically replacing hundreds of concatenations and slashes with the corresponding LPeg operators. The rules which I've rewritten for speed reasons must be triple-checked for logic errors - or commented out and replaced with the original versions at this stage.
Veijo