From: Veijo Vilva
To: lojban@googlegroups.com, lojban-list@lojban.org
Date: Wed, 13 Jun 2012 22:47:55 +0300
Subject: Re: [lojban] Other Lojban PEG parsers? (Alan)

On 1 June 2012 22:13, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote:

> Besides camxes, what have people gotten the Lojban PEG running in?

I've been working, rather intermittently, on a Lua[1] version using the LPeg library[2]. I started out of plain curiosity: this very lightweight combination (the Lua compiler, bytecode interpreter, VM and basic libraries total about 160 kB, and the LPeg library about 39 kB) lets you run a re-notated PEG as an ordinary Lua program, with neither a parser generator nor a purpose-written parser program. There is only one drawback: a parser can be relatively slow, because the library doesn't use Packrat parsing. LPeg was primarily designed for pattern matching, even in very large, mainly linear data sets, which would choke a Packrat-based parser.[3]
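To make the Packrat point concrete, here is a toy sketch in plain Lua. It does not use LPeg, which does not memoize. It assumes the made-up grammar `A <- B "x" / B "y"` with `B <- "a"+`. When the first alternative fails, ordered choice re-parses `B` at the same position. A Packrat-style cache, keyed by input position, lets the second attempt reuse the first result. The names `B`, `make_A`, `memoize` and `parse` are illustrative only.

```lua
-- Toy illustration of Packrat memoization (LPeg itself does not memoize).
-- Made-up grammar:  A <- B "x" / B "y"    B <- "a"+
local calls = 0                          -- how many times B's body runs

local function B(s, i)                   -- one or more "a"
  calls = calls + 1
  local j = i
  while s:sub(j, j) == "a" do j = j + 1 end
  if j > i then return j end             -- position after the match, or nil
end

-- Ordered choice: if the first alternative fails, the second one
-- parses B again at the same position i.
local function make_A(b)
  return function(s, i)
    local j = b(s, i)
    if j and s:sub(j, j) == "x" then return j + 1 end
    j = b(s, i)
    if j and s:sub(j, j) == "y" then return j + 1 end
  end
end

-- Packrat-style memo: cache a rule's result per input position.
-- The cache is only valid for a single subject string.
local function memoize(rule)
  local cache = {}                       -- position -> end position, or false
  return function(s, i)
    local r = cache[i]
    if r == nil then
      r = rule(s, i) or false
      cache[i] = r
    end
    return r or nil
  end
end

local function parse(a, s)
  calls = 0
  local r = a(s, 1)
  return r, calls
end

local r1, n1 = parse(make_A(B), "aaay")           -- plain backtracking
local r2, n2 = parse(make_A(memoize(B)), "aaay")  -- memoized B
```

On `"aaay"` both versions end at position 5. The plain version runs `B` twice and the memoized one runs it once. In deeply nested grammars this repeated work compounds, and Packrat parsing avoids it. The price is a cache entry per rule and input position, and that cache grows with the length of the input.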

Once the non-terminals have been defined for LPeg (a necessary step, so that Lua knows to apply LPeg's overloaded operators to them), the LPeg notation is a fairly simple transformation of the original PEG code. Here is an example:

    final_syllable = onset * -y * -stressed * nucleus * -cmene * #post_word,
    stressed_syllable = #stressed * syllable + syllable * #stress,
    stressed_diphthong = #stressed * diphthong + diphthong * #stress,
    stressed_vowel = #stressed * vowel + vowel * #stress,
    unstressed_syllable = -stressed * syllable * -stress + consonantal_syllable,
    unstressed_diphthong = -stressed * diphthong * -stress,
    unstressed_vowel = -stressed * vowel * -stress,
    stress = consonant^0 * y^-1 * syllable * pause,
    stressed = onset * comma^0 * S"AEIOU",
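In these rules the operators have the following meanings:

- `*` is sequence.
- `+` is ordered choice.
- Unary `-` is a negative lookahead (PEG `!`).
- `#` is a positive lookahead (PEG `&`).
- `^0` means zero or more repetitions, and `^-1` means at most one (optional).

The toy matcher below is a rough illustration of how Lua metatables make this notation possible: it overloads the same operators on plain tables. It is not LPeg's implementation; LPeg's real constructors are `lpeg.P`, `lpeg.S` and friends. The following are assumptions of this sketch:

- the helper names;
- the simplified `consonant`/`vowel`/`y` stand-ins;
- the Lua 5.2+ requirement, since `#` relies on `__len` for tables.

```lua
-- Toy PEG combinators in plain Lua (NOT LPeg), showing how a metatable
-- lets ordinary Lua operators build patterns. Assumes Lua 5.2+, where
-- the # operator honours __len on tables.
local mt = {}

-- A pattern wraps a function that, given subject s and position i,
-- returns the position just after its match, or nil on failure.
local function pat(fn) return setmetatable({ fn = fn }, mt) end

local function P(lit)                    -- literal string
  return pat(function(s, i)
    if s:sub(i, i + #lit - 1) == lit then return i + #lit end
  end)
end

local function S(set)                    -- any one character of set
  return pat(function(s, i)
    local c = s:sub(i, i)
    if c ~= "" and set:find(c, 1, true) then return i + 1 end
  end)
end

mt.__mul = function(a, b)                -- a * b: sequence
  return pat(function(s, i)
    local j = a.fn(s, i)
    if j then return b.fn(s, j) end
  end)
end

mt.__add = function(a, b)                -- a + b: ordered choice
  return pat(function(s, i) return a.fn(s, i) or b.fn(s, i) end)
end

mt.__unm = function(a)                   -- -a: succeeds, consuming nothing, if a fails
  return pat(function(s, i) if not a.fn(s, i) then return i end end)
end

mt.__len = function(a)                   -- #a: succeeds, consuming nothing, if a matches
  return pat(function(s, i) if a.fn(s, i) then return i end end)
end

mt.__pow = function(a, n)                -- a^0: zero or more; a^-1: optional
  if n == -1 then
    return pat(function(s, i) return a.fn(s, i) or i end)
  end
  assert(n == 0, "this toy only supports ^0 and ^-1")
  return pat(function(s, i)
    while true do                        -- greedy, never gives back (PEG semantics)
      local j = a.fn(s, i)
      if not j or j == i then return i end  -- stop on failure or empty match
      i = j
    end
  end)
end

local function match(p, s) return p.fn(s, 1) end

-- Simplified stand-ins, not the real morphology rules.
local consonant = S"bcdfgjklmnprstvxz"
local vowel = S"aeiou"
local y = P"y"
local onset_then_vowel = consonant^0 * -y * vowel   -- cf. "-y" in final_syllable
local before_vowel = consonant * #vowel             -- lookahead consumes nothing
```

For example, `consonant^0 * -y * vowel` matches `"kla"` and returns position 4, but fails on `"kya"`.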

To handle recursion, these statements are placed inside an associative array (Lua table) definition, which then serves as the grammar. The left-hand sides are used as keys and the right-hand sides as the corresponding values. This way the Lua interpreter doesn't need to know anything about the recursion. Everything is handled behind the scenes by the LPeg library, which starts from the grammar's initial rule and follows the non-terminal names in the right-hand sides, using them as keys to look up the corresponding rules. It is quite an ingenious system that takes advantage of Lua's built-in metamechanisms.
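The grammar-as-table idea can be sketched in the same toy style. In real LPeg it is written `lpeg.P{ "initial_rule", rule_name = ..., ... }`, with non-terminals created by `lpeg.V"rule_name"`; the entry at index 1 names the initial rule. The plain-Lua stand-in below is not LPeg's code, and its helpers `P`, `V`, `grammar` and `match` are hypothetical. It shows why recursion needs no special support: `V` only stores a name, and the rule is looked up in the table when matching reaches it.

```lua
-- Toy sketch (not LPeg): a grammar is a table of named rules, and a
-- non-terminal V"name" is resolved by table lookup at match time, so
-- rules may refer to each other, or to themselves, in any order.
local mt = {}
local function pat(fn) return setmetatable({ fn = fn }, mt) end

-- Every pattern function receives the subject s, position i and grammar g.
local function P(lit)                    -- literal string
  return pat(function(s, i, g)
    if s:sub(i, i + #lit - 1) == lit then return i + #lit end
  end)
end

mt.__mul = function(a, b)                -- sequence
  return pat(function(s, i, g)
    local j = a.fn(s, i, g)
    if j then return b.fn(s, j, g) end
  end)
end

mt.__add = function(a, b)                -- ordered choice
  return pat(function(s, i, g) return a.fn(s, i, g) or b.fn(s, i, g) end)
end

-- Non-terminal: only the name is stored; the rule is fetched from g later.
local function V(name)
  return pat(function(s, i, g)
    local rule = assert(g[name], "undefined rule: " .. name)
    return rule.fn(s, i, g)
  end)
end

-- As in LPeg grammars, the entry at index 1 names the initial rule.
local function grammar(rules)
  return pat(function(s, i)
    local start = assert(rules[rules[1]], "missing initial rule")
    return start.fn(s, i, rules)
  end)
end

local function match(p, s) return p.fn(s, 1) end

-- Self-recursive example:  nested <- "(" nested ")" / "x"
local nested = grammar{
  "nested",
  nested = P"(" * V"nested" * P")" + P"x",
}
```

Here `nested` matches `"((x))"` completely (returning position 6) and rejects `"((x)"`.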

I'm currently testing the morphology PEG, including the classification of cmavo. My present version seems to work quite decently unless fed lots of somewhat nasty strings like "rafytestudine". A three-year-old, fairly average office PC handles "Alice" in 20 seconds. The original Asus EeePC (with an 800 MHz Celeron) needs slightly less than 2 minutes, which is still quite decent for many purposes. The morphology test-sentence data set, which contains a lot of nasty words, takes 4.5 minutes on the office PC. The source text can be fed to the PEG in arbitrary slices, even the whole test-sentence data set as one block.

I made three small changes to the morphology PEG. Two of them ought not to matter in the parser context, even in theory. The third might, but it did not change the output even for the test-sentence data set. Together these changes gave a speedup of about 100%, but they might not matter when using a Packrat parser:

1) removed !cmene from the rule for cmavo
2) removed !gismu !fuhivla !cmavo from the rule for lujvo
3) moved !cmavo from the rule for brivla-head to the rule for fuhivla-head
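Why can such edits help a parser without memoization? The toy sketch below uses hypothetical, much simplified rule shapes; they are not the real morphology rules. It has `word <- cmene / cmavo`, with or without a `!cmene` guard in `cmavo`. Because `word` has already tried `cmene` at the same position, the guard is redundant there. Yet without memoization it evaluates `cmene` a second time for every word that is not a name.

```lua
-- Toy sketch with hypothetical, simplified rule shapes (NOT the real
-- Lojban morphology PEG):
--   word  <- cmene / cmavo
--   cmavo <- !cmene cmavo_form   (guarded)   or   cmavo_form   (unguarded)
-- word tries cmene first at the same position, so by the time cmavo runs,
-- cmene has already failed there and the guard cannot change the result.
local cmene_evals = 0

local function word_end(s, i)            -- index just past the current word
  return s:find(" ", i, true) or (#s + 1)
end

local function cmene(s, i)               -- toy test: the word ends in a consonant
  cmene_evals = cmene_evals + 1
  local j = word_end(s, i)
  if j > i and not s:sub(j - 1, j - 1):find("[aeiouy]") then return j end
end

local function cmavo_form(s, i)          -- toy: any non-empty word
  local j = word_end(s, i)
  if j > i then return j end
end

local function make_word(guarded)
  local function cmavo(s, i)
    if guarded and cmene(s, i) then return nil end   -- the !cmene guard
    return cmavo_form(s, i)
  end
  return function(s, i) return cmene(s, i) or cmavo(s, i) end
end

-- Parse words separated by single spaces; report how often cmene ran.
local function count_cmene_evals(guarded, text)
  cmene_evals = 0
  local word, i = make_word(guarded), 1
  while i <= #text do
    i = assert(word(text, i), "no parse") + 1       -- skip the space
  end
  return cmene_evals
end
```

For `"mi cu klama"` the guarded version evaluates `cmene` six times and the unguarded one three times, with the same parse. A Packrat parser would answer the second evaluation from its cache, which fits the remark that such changes might not matter there.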

The PEG script is compiled on every run, but that doesn't really matter, as compilation takes only about 50 ms on the office PC. The Lua interpreter is also available while the program is running and can be used to execute internally generated scripts, which often makes things much simpler. The very advanced LuaJIT compiler[4] is also available. It doesn't really help at the PEG stage, but it can offer a substantial speedup in other parts of the program system.

I must still check the conversion and do some tidying up before moving on to the syntax PEG and the glue between the processing stages.
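The remark about running internally generated scripts can be sketched with Lua's standard `load` function (for source strings, `loadstring` in Lua 5.1): build Lua source text at run time, compile it, and call the result. The selma'o data and the generated lookup function below are illustrative assumptions, chosen only to keep the example short.

```lua
-- Sketch: generate Lua source at run time and compile it.
-- load() accepts a source string from Lua 5.2 on; Lua 5.1 uses loadstring().
local compile = loadstring or load

-- Illustrative data only: a few cmavo grouped by selma'o.
local selmaho = { KOhA = { "mi", "do", "ko'a" }, CU = { "cu" } }

-- Generate the source of a chunk that returns a lookup function.
local parts = { "local t = {" }
for class, words in pairs(selmaho) do
  for _, w in ipairs(words) do
    parts[#parts + 1] = string.format("[%q] = %q,", w, class)
  end
end
parts[#parts + 1] = "} return function(w) return t[w] end"
local source = table.concat(parts, "\n")

local chunk = assert(compile(source))    -- compile the generated text
local classify = chunk()                 -- run it to obtain the function
```

Here `classify("ko'a")` returns `"KOhA"`, and a word not in the data gives `nil`. A real program would generate something more substantial, but the mechanism is the same.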

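Veijo

[1] http://www.lua.org
[2] http://www.inf.puc-rio.br/~roberto/lpeg/
[3] http://www.inf.puc-rio.br/~roberto/docs/peg.pdf
[4] http://luajit.org
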
--
You received this message because you are subscribed to the Google Groups "lojban" group.
To post to this group, send email to lojban@googlegroups.com.
To unsubscribe from this group, send email to lojban+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/lojban?hl=en.