From: Gleki Arxokuna
Date: Wed, 8 Apr 2015 15:50:08 +0300
Subject: [bpfk] te sumti detection using PEG
To: bpfk-list@googlegroups.com

Terminology:
* FAM - a term taking a FA-position with FA explicitly filled with {faxiveimo'eko'a}, where mo'eko'a is a precise number (e.g. not {xo'e}).
* ZAM - a bare term taking a FA-position with FA omitted. Positioning rules can restore the exact value of ko'a in {faxiveimo'eko'a}.
* BAM - all other terms, e.g. those prefixed with BAI or PU etc.

So the issue of te sumti detection is to turn all ZAMs into FAMs in the syntax tree.
Can we do that using PEG? I'm not sure, because some brivla have an infinite number of places, e.g. {jutsi}. {du} is a special case, since every te sumti of it can just take the {faxixo'e} position.
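
To make the target concrete, the numbering rule itself can be sketched in plain JavaScript as a pass over the terms of one bridi, outside the grammar. This is only a minimal sketch under assumptions: the term objects ({kind, place, text}) and the names restoreFA and faLabel are hypothetical and not part of camxes.js, and the rule that the first bare term after the selbri takes x2 when nothing filled a place before it is assumed from the usual positional convention. A counter like this has no upper bound, so it would also cover brivla like {jutsi}.

```javascript
// Hypothetical term shape: { kind: "FAM" | "ZAM" | "BAM", place?: number, text: string }
// Lojban digit names, used to spell the place number after FAXI.
const DIGITS = ["no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so"];

// Spell a place number as a restored FA label, e.g. 1 -> "FAXIPA".
function faLabel(place) {
  return "FAXI" + String(place)
    .split("")
    .map(d => DIGITS[Number(d)].toUpperCase())
    .join("");
}

// `before` holds the terms preceding the selbri, `after` those following it.
function restoreFA(before, after) {
  let next = 1; // next place a bare term (ZAM) would take

  const number = term => {
    if (term.kind === "BAM") return { ...term };  // BAI/PU terms take no place
    if (term.kind === "FAM") {                    // explicit FA moves the counter
      next = term.place + 1;
      return { ...term };
    }
    const place = next++;
    return { kind: "FAM", place, label: faLabel(place), text: term.text };
  };

  const head = before.map(number);
  // Assumption: if nothing filled a place before the selbri, x1 is skipped.
  if (next === 1) next = 2;
  const tail = after.map(number);
  return { head, tail };
}
```

For {mi klama do ti} this would be called as restoreFA([{kind: "ZAM", text: "mi"}], [{kind: "ZAM", text: "do"}, {kind: "ZAM", text: "ti"}]).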

Currently I'm unaware of any way to remember the values of variables (te sumti numbers) in PEG.js, so we can't increment up to an arbitrary number of te sumti without first hardcoding all of them in the PEG itself.

However, if we limit ourselves to just 5 places and the basic cases of omitting FA, then we can do that using PEG.
The current version of my fork of camxes.js produces these outputs:

1. ([FAXIPA mi] [CU {prami <FAXIRE do> VAU}])
2. ([FAXIPA mi] [CU {djuno <fi do> VAU}])
3. ([FAXIPA mi] [CU {djica <FAXIRE (¹lo [nu {<FAXIPA (²lo plise KU²)> <cu (²farlu [FAXIRE mi] [FAXICI {lo tricu KU}] VAU²)>} KEI] KU¹)> VAU}])

FAXIPA, FAXIRE and FAXICI are the restored FA.


This is how the sentence rule looks now in my PEG:
sentence = expr:(
  &(terms_1ZAM CU_elidible selbri terms_1ZAM terms_1ZAM !terms_1ZAM) (termsfa bridi_tail_t1fefi) / /* mi klama do ti */
  &(terms_1ZAM CU_elidible selbri terms_1ZAM) (termsfa bridi_tail_t1fe) / /* mi klama do */
  &(terms_1ZAM terms_1ZAM CU_elidible selbri !terms_1ZAM) (termsfa termsfe bridi_tail_t1) / /* mi do klama */
  &(terms_1ZAM terms_1ZAM CU_elidible selbri terms_1ZAM !terms_1ZAM) (termsfa termsfe bridi_tail_t1fi) / /* mi do klama ti */
  &(terms_1ZAM terms_1ZAM terms_1ZAM CU_elidible selbri !terms_1ZAM) (termsfa termsfe termsfi bridi_tail_t1) / /* mi do ti klama */
  &(terms_1ZAM CU_elidible selbri) (termsfa bridi_tail_t1) / /* mi klama */
  terms? bridi_tail_t1 (joik_jek bridi_tail / joik_jek stag? KE_clause free* bridi_tail KEhE_elidible free*)*) {return _node("sentence", expr);}

An example for each case is shown in the comments.
This addition to the PEG required hardcoding terms for fa, fe and fi separately, as well as every case of bridi tail, like "selbri x2 x3", "selbri x2" and "selbri x3". To hardcode them, one also needs to hardcode all inner rules down to selbri at the bridi_tail_3 level, and down to tense_modal for sumti (a modified tense_modal in which {fa}, {fe} or {fi} is hardcoded correspondingly).

This required adding 60 new lines to the PEG.
Even for these basic cases the work isn't done yet, and it might be possible to optimize those copy-pasted strings.
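
One possible way to avoid the copy-pasting is to generate the lookahead alternatives of the sentence rule from a template. Below is a minimal JavaScript sketch, assuming the naming pattern of the existing rules simply continues: termsfo, termsfu, bridi_tail_t1fofu and the like are hypothetical and would still have to be defined in the grammar. sentenceAlternatives is a made-up helper, not something from camxes.js. Unlike the hand-written rule, every generated alternative ends with !terms_1ZAM, so their order does not matter.

```javascript
// The five FA cmavo, in place order (x1..x5).
const FA = ["fa", "fe", "fi", "fo", "fu"];

// Build one "&(lookahead) (body)" alternative for every split of up to
// maxPlaces bare terms into `before` terms ahead of the selbri and
// `after` terms behind it. At least one term must precede the selbri.
function sentenceAlternatives(maxPlaces = FA.length) {
  const alts = [];
  for (let before = 1; before <= maxPlaces; before++) {
    for (let after = 0; before + after <= maxPlaces; after++) {
      const lookahead = [
        ...Array(before).fill("terms_1ZAM"),
        "CU_elidible", "selbri",
        ...Array(after).fill("terms_1ZAM"),
        "!terms_1ZAM", // no further bare term may follow
      ].join(" ");
      // Terms before the selbri get termsfa, termsfe, ...
      const head = FA.slice(0, before).map(fa => "terms" + fa);
      // The bridi tail rule name encodes the places that follow the selbri.
      const tail = "bridi_tail_t1" + FA.slice(before, before + after).join("");
      alts.push(`&(${lookahead}) (${[...head, tail].join(" ")})`);
    }
  }
  return alts;
}

// Print the alternatives joined the way they appear inside the rule.
console.log(sentenceAlternatives(3).join(" /\n"));
```

With maxPlaces = 3 this yields six alternatives covering the same before/after combinations as the hand-written rule; the generated text would still need the fallback alternative and the _node action added around it.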

Anyway, this is just a proof of concept.

To test the current state of the alta parser, say "alta: mi prami do" on the La Naxle page.
