[lojban] LLLP (Lua LPeg Lojban Parser), alpha version available for testing

I've decided to release an alpha version of my parser for testing and comments as I must stop developing it for a while - it is causing me too much loss of sleep and disturbance of domestic peace :)

A package containing the parser scripts and a LuaJIT Linux binary with a built-in LPeg library is currently available on my web site: http://galactinus.net/vilva/lllp.tgz (235 kB). Mac and Windows users will have to get either Lua or LuaJIT 2 and the LPeg library from elsewhere.

I've checked the code once more and cleaned it up a little. There shouldn't be any major problems, but as I haven't yet been able to do extensive testing, I decided to publish the program as an alpha version and only on my own site. There may be obscure errors in the PEG, so the output of this version shouldn't be taken as gospel. THIS ISN'T A CERTIFIED PARSER. Anyway, it handles most of "Alice".

I've appended the README file from the package.

Veijo

LLLP = Lua LPeg Lojban Parser

(Version = alpha)

Requirements:

Lua 5.1.x (or LuaJIT 2) and the LPeg library (either built-in or external)

LuaJIT doesn't seem to offer any benefit for the PEG but makes a difference
in auxiliary operations. However, for reasonably sized texts the difference
is negligible.

lua    : http://www.lua.org
luajit : http://luajit.org
LPeg   : http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html
         http://www.inf.puc-rio.br/~roberto/docs/peg.pdf (theoretical basis)

LLLP files:

lllp.lua             the main program script
lllp_morphology.lua  the Lojban morphology PEG
lllp_syntax_r.lua    the Lojban syntax PEG, a reduced output version
lllp_syntax_f.lua    the Lojban syntax PEG, a "full" output version

The reduced output version of the syntax PEG omits the numbered intermediate
rules (e.g. term-1) from the output because the depth of the "full" parse
tree can exceed the maximum number of syntax levels an unmodified lua/luajit
interpreter can handle (200 levels), and increasing the limit can be unsafe.
While parsing "Alice" using the "full" output version, the program hit the
limit at three points. I've set the program to use the reduced output version
as the full output isn't usually required. The version to use can be changed
by editing the main program script.
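
To illustrate why dropping the numbered intermediate rules makes the tree shallower, here is a minimal sketch. The node shape ({ rule = "name", child, ... } with plain strings as leaves) and the helper names reduce and depth are assumptions made for this example only; the actual LLLP scripts may build and prune their output quite differently.

```lua
-- Hypothetical node shape: { rule = "name", child1, child2, ... }.
-- Leaves are plain strings. This is NOT taken from the LLLP sources.

-- True for rule names that end in "-<digits>", e.g. "term-1".
local function is_numbered(rule)
  return rule:match("%-%d+$") ~= nil
end

-- Returns a list of nodes that replaces `node` in its parent.
-- A numbered node is replaced by its (already reduced) children.
local function reduce(node)
  if type(node) ~= "table" then return { node } end
  local kids = {}
  for _, child in ipairs(node) do
    for _, r in ipairs(reduce(child)) do
      kids[#kids + 1] = r
    end
  end
  if is_numbered(node.rule) then
    return kids
  end
  kids.rule = node.rule
  return { kids }
end

-- Number of nested rule levels above the deepest leaf.
local function depth(node)
  if type(node) ~= "table" then return 0 end
  local d = 0
  for _, child in ipairs(node) do
    d = math.max(d, depth(child))
  end
  return d + 1
end

local full = {
  rule = "term",
  { rule = "term-1",
    { rule = "sumti",
      { rule = "sumti-1", "mi" } } },
}
local reduced = reduce(full)[1]
print(depth(full), depth(reduced))
```

The sketch assumes the root rule itself is not numbered; otherwise reduce would return the root's children rather than a single tree.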

A luajit linux binary with built-in LPeg library is included in the package.

Running:

luajit lllp.lua lojban_file_name

NB. the output goes to STDOUT and can be redirected as required
NB. the output parameters can only be set by editing lllp.lua
NB. input text is sliced at blank lines and handled block by block.
    This means that terminated structures MUST NOT span blank lines!
NB. punctuation handling is still deficient (this is an alpha version)
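
The block slicing described above can be pictured with the following sketch. The function name split_blocks is hypothetical, and treating whitespace-only lines as blank is an assumption; lllp.lua may decide these details differently.

```lua
-- Split a text into blocks separated by one or more blank lines.
-- Assumption: a line containing only whitespace counts as blank.
local function split_blocks(text)
  local blocks, current = {}, {}
  -- Normalise CRLF and make sure the last line ends with a newline.
  text = text:gsub("\r\n", "\n") .. "\n"
  for line in text:gmatch("(.-)\n") do
    if line:match("^%s*$") then
      -- A blank line closes the block collected so far, if any.
      if #current > 0 then
        blocks[#blocks + 1] = table.concat(current, "\n")
        current = {}
      end
    else
      current[#current + 1] = line
    end
  end
  if #current > 0 then
    blocks[#blocks + 1] = table.concat(current, "\n")
  end
  return blocks
end

-- A quotation opened with "lu" and closed with "li'u" after a blank
-- line ends up in two different blocks, so neither block is complete.
local blocks = split_blocks("lu mi klama\n\nle zarci li'u\n")
for i, b in ipairs(blocks) do
  print(i, b)
end
```

This is why a structure with its own terminator has to be kept within one block: each block is handed to the parser separately.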

Lua commenting conventions can be used within the Lojban files:
  --    two or more adjacent dashes mark the rest of the line as a comment
  --[[  starts a multi-line comment ending at --]] or EOF
A space between -- and [[ can be used to de-activate the commenting
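
As a rough illustration of these conventions, the sketch below removes comments from a text before it would be parsed. The function name strip_comments is hypothetical and this is not the code used in lllp.lua. One known simplification: a run such as ---[[ is treated here as the start of a multi-line comment, which may not match the real behaviour.

```lua
-- Remove "--[[ ... --]]" comments first, then "--" line comments.
local function strip_comments(text)
  local out, pos = {}, 1
  while true do
    -- Plain (non-pattern) search for the multi-line opener.
    local s = text:find("--[[", pos, true)
    if not s then
      out[#out + 1] = text:sub(pos)
      break
    end
    out[#out + 1] = text:sub(pos, s - 1)
    local _, e = text:find("--]]", s + 4, true)
    if not e then break end  -- unterminated: comment runs to EOF
    pos = e + 1
  end
  local result = table.concat(out)
  -- Two or more dashes: drop everything up to the end of the line.
  result = result:gsub("%-%-[^\n]*", "")
  return result
end

print(strip_comments("mi klama -- a remark\ndo cliva"))
```

With a space inserted, "-- [[" no longer matches the multi-line opener and is removed as an ordinary line comment, so the text that follows stays active; the matching "--]]" line is then also just a line comment.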

The output can be fine-tuned quite extensively by editing lllp.lua. It is
also possible to add processing stages at various points.

** The program gathers statistics about word usage.

There is no error handling. The processing of a block terminates when a
syntax error is found, and the program continues with the next block, if any.
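
The block-by-block behaviour can be sketched as below. Both parse_all and the stand-in parse_block are hypothetical names invented for this example; the stand-in simply rejects any block containing "??" and is in no way a Lojban parser. In a real LPeg-based program, a failed lpeg.match returns nil, which is the convention the stand-in imitates.

```lua
-- Stand-in parser: returns a tree, or nil when the block is rejected.
-- Purely illustrative; it only rejects blocks that contain "??".
local function parse_block(block)
  if block:find("??", 1, true) then return nil end
  return { rule = "text", block }
end

-- Parse every block independently. A failure ends the work on that
-- block only, and the loop carries on with the next one.
local function parse_all(blocks, parse)
  local trees, failed = {}, {}
  for i, block in ipairs(blocks) do
    local tree = parse(block)
    if tree then
      trees[#trees + 1] = { index = i, tree = tree }
    else
      failed[#failed + 1] = i
    end
  end
  return trees, failed
end

local trees, failed = parse_all(
  { "mi klama", "?? broken", "do cliva" }, parse_block)
print(#trees, #failed)
```

Keeping the indices of the failed blocks is an addition made for the example; it is one simple way to see afterwards which parts of a long text did not pass.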

I've tested the parser with both single sentences and the full "Alice", and
there don't seem to be any major problems. Alice does contain a number of
blocks which don't pass the parser, but most do. On a decent PC the process
takes about one minute, and the reduced tree output interleaved with the
source text blocks is about 1,200 A4 pages long (1,300 Letter size); the full
tree would be about 16,000 pages.

web site: http://galactinus.net/vilva/

on Google+: https://plus.google.com/106533767817816079660/posts