Date: Wed, 8 Apr 2015 11:15:03 -0400
From: Alex Burka
To: bpfk-list@googlegroups.com
Message-ID: <166E1503C5E24261B88E4F7B41741A53@gmail.com>
Subject: Re: [bpfk] te sumti detection using PEG
Reply-To: bpfk-list@googlegroups.com
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="552545f7_431bd7b7_bb09"

--552545f7_431bd7b7_bb09
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

I'm impressed that you got it this far, but as I've said before, I really don't see the PEG as the place for this. First of all, it's mixing two separate steps in the interpretation of a sentence (namely parsing and sumti place resolution). And as you said, this basic functionality requires 60 new rules ... and it has severe limitations as a te sumti detector, since it gives up after the first explicit FA, doesn't interact with SE/GIhE/BE/JAI, etc. Like I said, good proof of concept, but I'd be surprised if this is the route to a general te sumti detector.

mu'o mi'e durkavore

On Wednesday, April 8, 2015 at 8:50 AM, Gleki Arxokuna wrote:
> Terminology:
> * FAM - a term taking a FA-position with FA explicitly filled with {faxiveimo'eko'a}, where mo'eko'a is a precise number (e.g. not {xo'e})
> * ZAM - a bare term taking a FA-position with FA omitted. Positioning rules can restore the exact value of ko'a in {faxiveimo'eko'a}
> * BAM - all other terms, e.g. prefixed with BAI or PU etc.
>
> So the issue of te sumti detection is to turn all ZAMs into FAMs in the syntax tree.
> Can we do that using PEG? I'm not so sure, because some brivla have an infinite number of places, e.g. {jutsi}. {du} is a special case since every te sumti of it can just take the {faxixo'e} position.
>
> Currently I'm unaware of any way to remember values of variables (te sumti numbers) in PEG.js, so we can't increment to an arbitrary number of te sumti without first hardcoding all of them in the PEG itself.
>
> However, if we limit ourselves to just 5 places and basic cases of omitting FA, then we can do that using PEG.
> The current version of my fork of camxes.js produces these outputs:
>
> 1. ([FAXIPA mi] [CU {prami <FAXIRE do> VAU}])
> 2. ([FAXIPA mi] [CU {djuno <fi do> VAU}])
> 3. ([FAXIPA mi] [CU {djica <FAXIRE (¹lo [nu {<FAXIPA (²lo plise KU²)> <cu (²farlu [FAXIRE mi] [FAXICI {lo tricu KU}] VAU²)>} KEI] KU¹)> VAU}])
>
> FAXIPA, FAXIRE, FAXICI are restored FA.
>
> This is how sentence looks now in my PEG:
> sentence = expr:(
> &(terms_1ZAM CU_elidible selbri terms_1ZAM terms_1ZAM !terms_1ZAM) (termsfa bridi_tail_t1fefi) / /* mi klama do ti*/
> &(terms_1ZAM CU_elidible selbri terms_1ZAM) (termsfa bridi_tail_t1fe) / /* mi klama do*/
> &(terms_1ZAM terms_1ZAM CU_elidible selbri !terms_1ZAM) (termsfa termsfe bridi_tail_t1) / /* mi do klama*/
> &(terms_1ZAM terms_1ZAM CU_elidible selbri terms_1ZAM !terms_1ZAM) (termsfa termsfe bridi_tail_t1fi) / /* mi do klama ti*/
> &(terms_1ZAM terms_1ZAM terms_1ZAM CU_elidible selbri !terms_1ZAM) (termsfa termsfe termsfi bridi_tail_t1) / /* mi do ti klama*/
> &(terms_1ZAM CU_elidible selbri) (termsfa bridi_tail_t1) / /* mi klama*/
> terms? bridi_tail_t1 (joik_jek bridi_tail / joik_jek stag? KE_clause free* bridi_tail KEhE_elidible free*)*) {return _node("sentence", expr);}
>
> Examples for each case are shown in comments.
> This addition to PEG required hardcoding terms for fa, fe, fi separately and every case of bridi tail like "selbri x2 x3", "selbri x2", "selbri x3". To hardcode them one also needs to hardcode all inner rules until you meet selbri at bridi_tail_3 level and till tense_modal for sumti (a modification of tense_modal where {fa} or {fe} or {fi} are hardcoded correspondingly).
>
> This required adding 60 new lines to PEG.
> Even for supporting these basic cases the work isn't done yet; optimizations of those copy-pasted strings might be possible.
>
> Anyway this is just a proof of concept.
>
> To test the current state of alta parser say "alta: mi prami do" on La Naxle page (http://vrici.lojban.org/~gleki/mediawiki-1.19.2/extensions/ilmentufa/ircbot/naxle.html).
>
> --
> You received this message because you are subscribed to the Google Groups "BPFK" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to bpfk-list+unsubscribe@googlegroups.com.
> To post to this group, send email to bpfk-list@googlegroups.com.
> Visit this group at http://groups.google.com/group/bpfk-list.
> For more options, visit https://groups.google.com/d/optout.

--552545f7_431bd7b7_bb09
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
I'm impressed that you got it this far, but as I've said before I really don't see the PEG as the place for this. First of all, it's mixing two separate steps in the interpretation of a sentence (namely parsing and sumti place resolution). And as you said, this basic functionality requires 60 new rules ... and it has severe limitations as a te sumti detector, since it gives up after the first explicit FA, doesn't interact with SE/GIhE/BE/JAI, etc. Like I said, good proof of concept, but I'd be surprised if this is the route to a general te sumti detector.
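
To make the "separate step" idea concrete, here is a minimal sketch of place resolution as a pass over terms that have already been parsed. It is not taken from camxes.js: the flat item list, the field names (kind, fa, tagged) and the function name resolvePlaces are all assumptions made for illustration. It only handles bare terms, explicit FA and tagged (BAI/tense) terms; SE, GIhE, BE and JAI are deliberately left out.

```javascript
// Hypothetical flat view of one bridi after parsing.
//   kind:   "selbri" or "sumti"
//   fa:     place number given by an explicit FA (fa=1, fe=2, ...), if any
//   tagged: true for BAI/tense-tagged terms, which take no numbered place
function resolvePlaces(items) {
  let next = 1; // next numbered place a bare sumti would fill
  const out = [];
  for (const item of items) {
    if (item.kind === "selbri") {
      // No numbered sumti before the selbri: x1 is skipped.
      if (next === 1) next = 2;
      out.push({ ...item });
    } else if (item.tagged) {
      out.push({ ...item }); // does not consume a place
    } else {
      // An explicit FA resets the counter; bare sumti continue from it.
      const place = item.fa !== undefined ? item.fa : next;
      next = place + 1;
      out.push({ ...item, place });
    }
  }
  return out;
}

// "mi klama fi do ti": counting continues after the explicit {fi}.
const resolved = resolvePlaces([
  { kind: "sumti", text: "mi" },
  { kind: "selbri", text: "klama" },
  { kind: "sumti", text: "do", fa: 3 },
  { kind: "sumti", text: "ti" },
]);
console.log(resolved.filter((i) => i.kind === "sumti").map((i) => i.text + "=x" + i.place));
```

Because the counter is ordinary program state, there is no fixed limit on the number of places and nothing stops after the first explicit FA, which are the two limits of the grammar-only approach mentioned above.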

mu'o mi'e durkavore

On Wednesday, April 8, 2015 at 8:50 AM, Gleki Arxokuna wrote:

Terminology:
* FAM - a term taking a FA-position with FA explicitly filled with {faxiveimo'eko'a}, where mo'eko'a is a precise number (e.g. not {xo'e})
* ZAM - a bare term taking a FA-position with FA omitted. Positioning rules can restore the exact value of ko'a in {faxiveimo'eko'a}
* BAM - all other terms, e.g. prefixed with BAI or PU etc.

So the issue of te sumti detection is to turn all ZAMs into FAMs in the syntax tree.
Can we do that using PEG? I'm not so sure, because some brivla have an infinite number of places, e.g. {jutsi}. {du} is a special case since every te sumti of it can just take the {faxixo'e} position.

Currently I'm unaware of any way to remember values of variables (te sumti numbers) in PEG.js, so we can't increment to an arbitrary number of te sumti without first hardcoding all of them in the PEG itself.

However, if we limit ourselves to just 5 places and basic cases of omitting FA, then we can do that using PEG.
The current version of my fork of camxes.js produces these outputs:

1. ([FAXIPA mi] [CU {prami <FAXIRE do> VAU}])
2. ([FAXIPA mi] [CU {djuno <fi do> VAU}])
3. ([FAXIPA mi] [CU {djica <FAXIRE (¹lo [nu {<FAXIPA (²lo plise KU²)> <cu (²farlu [FAXIRE mi] [FAXICI {lo tricu KU}] VAU²)>} KEI] KU¹)> VAU}])

FAXIPA, FAXIRE, FAXICI are restored FA.


This is how sentence looks now in my PEG:
sentence = expr:(
&(terms_1ZAM CU_elidible selbri terms_1ZAM terms_1ZAM !terms_1ZAM) (termsfa bridi_tail_t1fefi) / /* mi klama do ti*/
&(terms_1ZAM CU_elidible selbri terms_1ZAM) (termsfa bridi_tail_t1fe) / /* mi klama do*/
&(terms_1ZAM terms_1ZAM CU_elidible selbri !terms_1ZAM) (termsfa termsfe bridi_tail_t1) / /* mi do klama*/
&(terms_1ZAM terms_1ZAM CU_elidible selbri terms_1ZAM !terms_1ZAM) (termsfa termsfe bridi_tail_t1fi) / /* mi do klama ti*/
&(terms_1ZAM terms_1ZAM terms_1ZAM CU_elidible selbri !terms_1ZAM) (termsfa termsfe termsfi bridi_tail_t1) / /* mi do ti klama*/
&(terms_1ZAM CU_elidible selbri) (termsfa bridi_tail_t1) / /* mi klama*/
terms? bridi_tail_t1 (joik_jek bridi_tail / joik_jek stag? KE_clause free* bridi_tail KEhE_elidible free*)*) {return _node("sentence", expr);}

Examples for each case are shown in comments.
This addition to PEG required hardcoding terms for fa, fe, fi separately and every case of bridi tail like "selbri x2 x3", "selbri x2", "selbri x3". To hardcode them one also needs to hardcode all inner rules until you meet selbri at bridi_tail_3 level and till tense_modal for sumti (a modification of tense_modal where {fa} or {fe} or {fi} are hardcoded correspondingly).

This required adding 60 new lines to PEG.
Even for supporting these basic cases the work isn't done yet; optimizations of those copy-pasted strings might be possible.
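
The six sentence alternatives follow a regular pattern: some number of bare terms before the selbri and some number after it. As a rough sketch of how the copy-pasting could be automated, the script below generates those lookahead alternatives as strings. The rule names terms_1ZAM, CU_elidible, termsfa and bridi_tail_t1fefi come from the grammar in this message; the names for fo and fu (termsfo, bridi_tail_t1fo, and so on) are hypothetical extensions of the same naming, and ending every lookahead with !terms_1ZAM is an assumption (the grammar omits it in two of the six cases). The generated rules still depend on the hardcoded inner rules existing.

```javascript
// FA cmavo in place order; names beyond fi are a hypothetical extension.
const FA = ["fa", "fe", "fi", "fo", "fu"];

// Build one alternative for `before` bare terms ahead of the selbri
// and `after` bare terms following it.
function alternative(before, after) {
  const zam = "terms_1ZAM";
  const lookahead = [
    ...Array(before).fill(zam),
    "CU_elidible",
    "selbri",
    ...Array(after).fill(zam),
    "!" + zam, // assumption: always forbid a further bare term
  ].join(" ");
  const head = FA.slice(0, before).map((fa) => "terms" + fa);
  const tail = "bridi_tail_t1" + FA.slice(before, before + after).join("");
  return "&(" + lookahead + ") (" + [...head, tail].join(" ") + ")";
}

// All alternatives with at least one leading term and at most maxPlaces terms.
function alternatives(maxPlaces) {
  const out = [];
  for (let before = 1; before <= maxPlaces; before++) {
    for (let after = 0; before + after <= maxPlaces; after++) {
      out.push(alternative(before, after));
    }
  }
  return out;
}

console.log(alternatives(3).join(" /\n"));
```

With maxPlaces set to 3 this yields six alternatives, the same count as the hand-written rule; with 5 it yields fifteen, which gives a feel for how quickly the hardcoded grammar grows.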

Anyway this is just a proof of concept.

To test the current state of alta parser say "alta: mi prami do" on La Naxle page.

--552545f7_431bd7b7_bb09--