Date: Thu, 28 Jun 2012 18:48:33 +0300
Subject: [lojban] LLLP (Lua LPeg Lojban Parser), alpha version available for testing
From: Veijo Vilva
To: lojban@googlegroups.com, lojban-list@lojban.org

I've decided to release an alpha version of my parser for testing and comments as I must stop developing it for a while - it is causing me too much loss of sleep and disturbance of domestic peace :)

A package containing the parser scripts and a luajit linux binary with a built-in LPeg library is available at this stage at my web site: http://galactinus.net/vilva/lllp.tgz (235 kb). Mac and Windows users will have to get either lua or luajit2 and the LPeg library from elsewhere.

I've checked the code once more and cleaned it a little bit. There shouldn't be any major problems, but as I haven't yet been able to do extensive testing, I decided to publish the program as an alpha version and only on my own site. There may be obscure errors in the PEG, so the output of this version shouldn't be taken as gospel. THIS ISN'T A CERTIFIED PARSER. Anyway, it handles most of "Alice".

I've appended the README file from the package.

   Veijo



                      LLLP = Lua LPeg Lojban Parser

                            (Version = alpha)



  Requirements:

     lua5.1x (or luajit2) and the LPeg library (either built-in or external)

 LuaJIT doesn't seem to offer any benefit for the PEG but makes a difference
 in auxiliary operations. However, for reasonably sized texts the difference
 is negligible.

  lua    : http://www.lua.org
  luajit : http://luajit.org
  LPeg   : http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html
           http://www.inf.puc-rio.br/~roberto/docs/peg.pdf (theoretical basis)

 LLLP files:

  lllp.lua                the main program script
  lllp_morphology.lua     the Lojban morphology PEG
  lllp_syntax_r.lua       the Lojban syntax PEG, a reduced output version
  lllp_syntax_f.lua       the Lojban syntax PEG, a "full" output version

  The reduced output version of the syntax PEG omits the numbered intermediate
  rules (e.g. term-1) from the output because the depth of the "full" parse
  tree can exceed the maximum number of syntax levels an unmodified lua/luajit
  interpreter can handle (200 levels), and increasing the limit can be unsafe.
  While parsing "Alice" using the "full" output version, the program hit the
  limit at three points. I've set the program to use the reduced output version
  as the full output isn't usually required. The version to use can be changed
  by editing the main program script.

  A luajit linux binary with built-in LPeg library is included in the package.

  Running:

     luajit lllp.lua lojban_file_name


  NB. the output goes to STDOUT and can be re-directed as required

  NB. the output parameters can only be set by editing lllp.lua

  NB. input text is sliced at blank lines and handled block by block.
      This means that terminated structures MUST NOT span blank lines!

  NB. punctuation handling is still deficient (this is an alpha version)


  Lua commenting conventions can be used within the Lojban files:

    --   two or more adjacent dashes mark the rest of the line as comment

    --[[ starts a multi-line comment ending at --]] or EOF
         A space between -- and [[ can be used to de-activate the commenting

  The output can be fine-tuned quite extensively by editing lllp.lua. It is
  also possible to add processing stages at various points.

  ** The program gathers statistics about word usage.

  There is no error handling. The processing of a block terminates when a
  syntax error is found, and the program continues with the next block if any.

  I've tested the parser with both single sentences and the full "Alice", and
  there don't seem to be any major problems. "Alice" does contain a number of
  blocks which don't pass the parser, but most do. On a decent PC the process
  takes about one minute, and the reduced tree output interleaved with the
  source text blocks is about 1,200 A4 pages long (1,300 Letter size); the full
  tree would be about 16,000 pages.
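
The input conventions described in the README (Lua-style comments, slicing at blank lines) can be illustrated with a minimal sketch. This is not code from the LLLP package: the function names strip_comments and split_blocks are hypothetical, and it assumes that comments are removed before the text is sliced and that a whitespace-only line counts as blank. The Lojban sample lines are only illustrative.

```lua
-- Sketch only: hypothetical helpers, not taken from lllp.lua.

-- Remove Lua-style comments as described in the README:
--   "--[[" starts a multi-line comment ending at "--]]" or EOF;
--   any other run of two or more dashes comments out the rest of the line
--   (so "-- [[" with a space is just a line comment).
local function strip_comments(text)
  local out, pos = {}, 1
  while true do
    local s = text:find("--", pos, true)        -- plain search for two dashes
    if not s then
      out[#out + 1] = text:sub(pos)             -- no more comments
      break
    end
    out[#out + 1] = text:sub(pos, s - 1)        -- keep text before the comment
    if text:sub(s, s + 3) == "--[[" then
      local _, e = text:find("--]]", s + 4, true)
      if not e then break end                   -- unterminated: runs to EOF
      pos = e + 1
    else
      local nl = text:find("\n", s, true)
      if not nl then break end                  -- comment on the last line
      pos = nl                                  -- keep the newline itself
    end
  end
  return table.concat(out)
end

-- Slice text into blocks at blank lines (assumption: whitespace-only
-- lines count as blank). Trailing whitespace on each line is dropped.
local function split_blocks(text)
  local blocks, current = {}, {}
  local function flush()
    if #current > 0 then
      blocks[#blocks + 1] = table.concat(current, "\n")
      current = {}
    end
  end
  for line in (text .. "\n"):gmatch("(.-)\r?\n") do
    if line:match("^%s*$") then
      flush()
    else
      current[#current + 1] = (line:gsub("%s+$", ""))
    end
  end
  flush()
  return blocks
end

local sample = table.concat({
  "coi rodo -- a line comment",
  "mi klama le zarci",
  "",
  "--[[ a multi-line comment that",
  "",
  "spans a blank line --]]",
  "mi prami do",
}, "\n")

for i, block in ipairs(split_blocks(strip_comments(sample))) do
  print(i, block)
end
```

With this sample the sketch yields two blocks: the first two sentences together, and the last sentence on its own. The blank line inside the multi-line comment does not split anything, because under the stated assumption the comment is removed first. Whether LLLP itself processes comments and blank lines in this order is not stated in the README.
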

--
web site:   http://galactinus.net/vilva/
on Google+: https://plus.google.com/106533767817816079660/posts

--
You received this message because you are subscribed to the Google Groups "lojban" group.
To post to this group, send email to lojban@googlegroups.com.
To unsubscribe from this group, send email to lojban+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/lojban?hl=en.