Date: Thu, 21 Jun 2012 11:08:21 +0300
Subject: [lojban] Testing Lua/LPeg version of the Lojban PEG
From: Veijo Vilva
To: lojban@googlegroups.com, lojban-list@lojban.org

I've now got a preliminary version of the full parser running.

Basically the parser is just the re-formatted PEG; the rest is about 150 lines of quite ordinary Lua code for glue between the stages, some very small helper functions and pretty printing. The source files (the driver program, the morphology PEG and the grammar PEG) presently total 2560 lines (incl. the comments and the empty separator lines), about 78 kbytes. There is no binary for the parser, which is compiled for each run. The compilation time is about 100 ms on any decent PC, which has helped a lot during the testing and refinement stage.

The parser outputs a full parse tree in the form of a Lua table definition instead of something intended for immediate human consumption or interpretation by some external program. The Lua interpreter, which is available in any case, is used to compile the definition into a table, which can then be recursively traversed with (very) simple routines to produce any desired kind of output.
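
To illustrate the idea of a parse tree delivered as a Lua table definition, here is a minimal sketch. It is not the actual program: the node layout (`{label, child, ...}`), the sample text in `output`, and the `render` function are all assumptions made for illustration.

```lua
-- Hypothetical parser output: a Lua table definition as plain text.
-- The node layout {label, ...} is an assumption: tables are child nodes,
-- strings after the label are the words of a leaf.
local output = [[
return {"sentence",
  {"sumti", {"KOhA", "mi"}},
  {"selbri", {"BRIVLA", "gismu", "klama"}},
}
]]

-- Compile the definition with the Lua interpreter itself.
-- loadstring is the Lua 5.1 name; load accepts strings from 5.2 on.
local compile = loadstring or load
local tree = assert(compile(output))()

-- Recursively walk the tree, collecting one indented line per node.
local function render(node, depth, lines)
  local words, children = { node[1] }, {}
  for i = 2, #node do
    local item = node[i]
    if type(item) == "table" then
      children[#children + 1] = item
    else
      words[#words + 1] = item
    end
  end
  lines[#lines + 1] = string.rep("| ", depth) .. table.concat(words, " ")
  for _, child in ipairs(children) do
    render(child, depth + 1, lines)
  end
  return lines
end

local lines = render(tree, 0, {})
print(table.concat(lines, "\n"))
```

Because the output is ordinary Lua, no separate reader for the tree format is needed; `loadstring`/`load` does all the parsing, and the traversal routine stays a few lines long.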

I've omitted the erasure handling rules, as they seemed to cause too much slowdown, and rewritten some rules to speed up the parsing process. I've also added rules to bracket sumti tcita and zei lujvo. I had to add rules to the morphology PEG in order to keep any quoted non-Lojban text intact; the quoted text is now sent as a single non-Lojban ("non-L") word to the grammar PEG.
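
The handling of quoted non-Lojban text can be pictured with a small stand-alone sketch. This is not the morphology PEG itself but a plain Lua approximation under loud assumptions: words are separated by whitespace, only `zoi` quotes are handled, and the delimiter word is compared verbatim. The function name `tokenize` and the `kind` field are hypothetical.

```lua
-- Split text into words, but pass zoi-quoted foreign text on as ONE token.
-- Simplified sketch: whitespace-separated words, delimiter matched verbatim.
local function tokenize(text)
  local tokens, pos = {}, 1
  local function add(kind, s)
    tokens[#tokens + 1] = { kind = kind, text = s }
  end
  while true do
    local s, e = text:find("%S+", pos)
    if not s then break end
    local word = text:sub(s, e)
    add("word", word)
    pos = e + 1
    if word == "zoi" then
      -- The word after zoi is the delimiter.
      local ds, de = text:find("%S+", pos)
      if not ds then break end
      local delim = text:sub(ds, de)
      add("word", delim)
      -- Scan forward word by word for the closing delimiter.
      local scan, close_s, close_e = de + 1, nil, nil
      while true do
        local ws, we = text:find("%S+", scan)
        if not ws then break end
        if text:sub(ws, we) == delim then
          close_s, close_e = ws, we
          break
        end
        scan = we + 1
      end
      -- Everything in between is kept verbatim (only the edges are trimmed).
      local raw = text:sub(de + 1, (close_s or #text + 1) - 1)
      add("non-lojban", (raw:gsub("^%s+", ""):gsub("%s+$", "")))
      if close_s then
        add("word", delim)
        pos = close_e + 1
      else
        pos = #text + 1
      end
    end
  end
  return tokens
end

local tokens = tokenize("mi cusku zoi gy. hello,  there! gy. doi djan")
for _, t in ipairs(tokens) do print(t.kind, t.text) end
```

The quoted span, including its internal spacing and punctuation, reaches the next stage as a single `non-lojban` token, so the grammar never has to look inside it.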

The parser will handle multiple-paragraph text, but I don't yet have any ideas about error recovery or meaningful error messages. Presently the program just produces output up to the last structure that passes the parser, which isn't very helpful, especially as there is sometimes no output whatsoever.

My present, very simple pretty printer is quite flexible. It can produce the full parse tree (probably required only for checking the parser), omit the numbered sub-rules (sumti-1, ...), or omit any user-defined set of intermediate levels from the tree. It would be trivial to add glosses for cmavo and gismu to the output. I've also given some thought to passing the lujvo split on from the morphology PEG.
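
A pruning pretty printer of this kind might look roughly like the following. It is a sketch only, not the actual printer: the node layout (`{label, ...}`), the option names `omit` and `omit_numbered`, and the toy tree are assumptions. An omitted node prints nothing and its children move up to its level.

```lua
-- Assumed node layout: {label, ...}; tables are child nodes and strings
-- are the words of a leaf. The tree below is a toy example.
local tree =
  {"sentence",
    {"terms", {"term", {"sumti", {"sumti-1", {"KOhA", "mi"}}}}},
    {"bridi tail", {"selbri", {"tanru unit", {"BRIVLA", "gismu", "klama"}}}}}

-- opts.omit          : set of rule names to leave out
-- opts.omit_numbered : also leave out numbered sub-rules such as "sumti-1"
local function render(node, depth, opts, lines)
  local label = node[1]
  local skip = opts.omit[label]
    or (opts.omit_numbered and label:match("%-%d+$") ~= nil)
  local words, children = { label }, {}
  for i = 2, #node do
    local item = node[i]
    if type(item) == "table" then
      children[#children + 1] = item
    else
      words[#words + 1] = item
    end
  end
  if not skip then
    lines[#lines + 1] = string.rep("| ", depth) .. table.concat(words, " ")
    depth = depth + 1
  end
  -- Children of a skipped node are printed at the skipped node's depth.
  for _, child in ipairs(children) do
    render(child, depth, opts, lines)
  end
  return lines
end

local full = render(tree, 0, { omit = {}, omit_numbered = false }, {})
local pruned = render(tree, 0, {
  omit = { terms = true, term = true, ["bridi tail"] = true, ["tanru unit"] = true },
  omit_numbered = true,
}, {})

print(table.concat(full, "\n"))
print()
print(table.concat(pruned, "\n"))
```

Keeping the list of omitted rules as a plain set means a different view of the same tree is just a different table passed to the same routine.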

I'll have to do some more testing before releasing the program for general consumption. I also haven't yet given any thought to the user interface. Presently all the parameters are set by editing the driver program, which works because the compilation time is no problem.

  Veijo




Some examples of the present output:
a) a "full" tree (without the numbered sub-rules)

text
| paragraphs
| | paragraph
| | | statement
| | | | sentence
| | | | | terms
| | | | | | term
| | | | | | | sumti
| | | | | | | | KOhA mi
| | | | | | term
| | | | | | | tag
| | | | | | | | tense modal
| | | | | | | | | simple tense modal
| | | | | | | | | | time
| | | | | | | | | | | time offset
| | | | | | | | | | | | PU ba
| | | | | bridi tail
| | | | | | selbri
| | | | | | | tanru unit
| | | | | | | | BRIVLA gismu zgana
| | | | | | tail terms
| | | | | | | terms
| | | | | | | | term
| | | | | | | | | sumti
| | | | | | | | | | description
| | | | | | | | | | | LE le
| | | | | | | | | | | sumti tail
| | | | | | | | | | | | selbri
| | | | | | | | | | | | | tanru unit
| | | | | | | | | | | | | | abstraction
| | | | | | | | | | | | | | | NU du'u
| | | | | | | | | | | | | | | subsentence
| | | | | | | | | | | | | | | | sentence
| | | | | | | | | | | | | | | | | terms
| | | | | | | | | | | | | | | | | | term
| | | | | | | | | | | | | | | | | | | sumti
| | | | | | | | | | | | | | | | | | | | name
| | | | | | | | | | | | | | | | | | | | | LA la
| | | | | | | | | | | | | | | | | | | | | CMENE djan
| | | | | | | | | | | | | | | | | | | | joik ek
| | | | | | | | | | | | | | | | | | | | | ek
| | | | | | | | | | | | | | | | | | | | | | A ji
| | | | | | | | | | | | | | | | | | | | | | indicators
| | | | | | | | | | | | | | | | | | | | | | | indicator
| | | | | | | | | | | | | | | | | | | | | | | | UI kau
| | | | | | | | | | | | | | | | | | | | name
| | | | | | | | | | | | | | | | | | | | | LA la
| | | | | | | | | | | | | | | | | | | | | CMENE djordz
| | | | | | | | | | | | | | | | | CU clause
| | | | | | | | | | | | | | | | | | CU cu
| | | | | | | | | | | | | | | | | bridi tail
| | | | | | | | | | | | | | | | | | selbri
| | | | | | | | | | | | | | | | | | | tanru unit
| | | | | | | | | | | | | | | | | | | | BRIVLA gismu zvati
| | | | | | | | | | | | | | | | | | tail terms
| | | | | | | | | | | | | | | | | | | terms
| | | | | | | | | | | | | | | | | | | | term
| | | | | | | | | | | | | | | | | | | | | sumti
| | | | | | | | | | | | | | | | | | | | | | description
| | | | | | | | | | | | | | | | | | | | | | | LE le
| | | | | | | | | | | | | | | | | | | | | | | sumti tail
| | | | | | | | | | | | | | | | | | | | | | | | selbri
| | | | | | | | | | | | | | | | | | | | | | | | | tanru unit
| | | | | | | | | | | | | | | | | | | | | | | | | | BRIVLA gismu panka

b) the same tree after omitting some intermediate levels (ad hoc pruning, done by giving a list of rules to omit)

paragraph
| statement
| | sentence
| | | sumti
| | | | KOhA mi
| | | tense modal
| | | | time
| | | | | time offset
| | | | | | PU ba
| | | selbri
| | | | BRIVLA gismu zgana
| | | sumti
| | | | description
| | | | | LE le
| | | | | selbri
| | | | | | abstraction
| | | | | | | NU du'u
| | | | | | | sentence
| | | | | | | | sumti
| | | | | | | | | name
| | | | | | | | | | LA la
| | | | | | | | | | CMENE djan
| | | | | | | | | ek
| | | | | | | | | | A ji
| | | | | | | | | | indicator
| | | | | | | | | | | UI kau
| | | | | | | | | name
| | | | | | | | | | LA la
| | | | | | | | | | CMENE djordz
| | | | | | | | CU clause
| | | | | | | | | CU cu
| | | | | | | | selbri
| | | | | | | | | BRIVLA gismu zvati
| | | | | | | | sumti
| | | | | | | | | description
| | | | | | | | | | LE le
| | | | | | | | | | selbri
| | | | | | | | | | | BRIVLA gismu panka

c) the tree can also indicate any elided terminators

paragraph
| statement
| | sentence
| | | sumti
| | | | KOhA mi
| | | tense modal
| | | | time
| | | | | time offset
| | | | | | PU ba
| | | *ELIDED KU
| | | *ELIDED CU
| | | selbri
| | | | BRIVLA gismu zgana
| | | sumti
| | | | description
| | | | | LE le
| | | | | selbri
| | | | | | abstraction
| | | | | | | NU du'u
| | | | | | | sentence
| | | | | | | | sumti
| | | | | | | | | name
| | | | | | | | | | LA la
| | | | | | | | | | CMENE djan
| | | | | | | | | ek
| | | | | | | | | | A ji
| | | | | | | | | | indicator
| | | | | | | | | | | UI kau
| | | | | | | | | name
| | | | | | | | | | LA la
| | | | | | | | | | CMENE djordz
| | | | | | | | CU clause
| | | | | | | | | CU cu
| | | | | | | | selbri
| | | | | | | | | BRIVLA gismu zvati
| | | | | | | | sumti
| | | | | | | | | description
| | | | | | | | | | LE le
| | | | | | | | | | selbri
| | | | | | | | | | | BRIVLA gismu panka
| | | | | | | | | | *ELIDED KU
| | | | | | | | *ELIDED VAU
| | | | | | | *ELIDED KEI
| | | | | *ELIDED KU
| | | *ELIDED VAU

d) a sumti tcita example (bracketing the sumti tcita will simplify enumerating the main sumti at some later stage)

paragraph
| statement
| | sentence
| | | sumti
| | | | KOhA mi
| | | selbri
| | | | BRIVLA gismu klama
| | | sumti
| | | | description
| | | | | LE le
| | | | | selbri
| | | | | | BRIVLA gismu zarci
| | | sumti tcita
| | | | tense modal
| | | | | time
| | | | | | time offset
| | | | | | | PU ca
| | | | sumti
| | | | | description
| | | | | | LE le
| | | | | | selbri
| | | | | | | abstraction
| | | | | | | | NU nu
| | | | | | | | sentence
| | | | | | | | | sumti
| | | | | | | | | | KOhA do
| | | | | | | | | selbri
| | | | | | | | | | BRIVLA gismu klama
| | | | | | | | | sumti
| | | | | | | | | | description
| | | | | | | | | | | LE le
| | | | | | | | | | | selbri
| | | | | | | | | | | | BRIVLA gismu zdani
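
Why the bracketing helps can be shown with a short sketch on the pruned tree of example d). The node layout (`{label, ...}`) and the helper names `words` and `main_sumti` are assumptions made for illustration, not part of the actual program. With the tagged sumti wrapped in its own `sumti tcita` node, the main sumti are simply the direct `sumti` children of the sentence.

```lua
-- Example d) as a table, in an assumed {label, ...} node layout.
local sentence =
  {"sentence",
    {"sumti", {"KOhA", "mi"}},
    {"selbri", {"BRIVLA", "gismu", "klama"}},
    {"sumti",
      {"description", {"LE", "le"},
        {"selbri", {"BRIVLA", "gismu", "zarci"}}}},
    {"sumti tcita",
      {"tense modal", {"time", {"time offset", {"PU", "ca"}}}},
      {"sumti",
        {"description", {"LE", "le"},
          {"selbri",
            {"abstraction", {"NU", "nu"},
              {"sentence",
                {"sumti", {"KOhA", "do"}},
                {"selbri", {"BRIVLA", "gismu", "klama"}},
                {"sumti",
                  {"description", {"LE", "le"},
                    {"selbri", {"BRIVLA", "gismu", "zdani"}}}}}}}}}}}

-- Collect the Lojban word (last entry) of every leaf below a node.
local function words(node, out)
  out = out or {}
  local is_leaf = true
  for i = 2, #node do
    if type(node[i]) == "table" then
      is_leaf = false
      words(node[i], out)
    end
  end
  if is_leaf then out[#out + 1] = node[#node] end
  return out
end

-- The main sumti are the direct "sumti" children of the sentence;
-- anything inside a "sumti tcita" bracket is skipped automatically.
local function main_sumti(sent)
  local result = {}
  for i = 2, #sent do
    local child = sent[i]
    if type(child) == "table" and child[1] == "sumti" then
      result[#result + 1] = table.concat(words(child), " ")
    end
  end
  return result
end

local found = main_sumti(sentence)
for i, s in ipairs(found) do print(i, s) end
```

Without the bracket, the tagged `le nu do klama le zdani` would sit next to the other sumti and would need a separate check for a preceding tense or modal tag.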

e) two zei lujvo examples from the CLL

paragraph
| statement
| | sentence
| | | selbri
| | | | zei lujvo
| | | | | CMAVO  NAhE na'e
| | | | | ZEI zei
| | | | | CMAVO  A a
| | | | | ZEI zei
| | | | | CMAVO  NAhE na'e
| | | | | ZEI zei
| | | | | CMAVO  BY by
| | | | BRIVLA lujvo livgyterbilma

paragraph
| statement
| | sentence
| | | selbri
| | | | zei lujvo
| | | | | CMAVO  NAhE na'e
| | | | | ZEI zei
| | | | | CMAVO  A a
| | | | | ZEI zei
| | | | | CMAVO  NAhE na'e
| | | | | ZEI zei
| | | | | CMAVO  BY by
| | | | | ZEI zei
| | | | | BRIVLA lujvo livgyterbilma

