From: Gleki Arxokuna <gleki.is.my.name@gmail.com>
Date: Wed, 8 Apr 2015 18:27:11 +0300
Subject: Re: [bpfk] te sumti detection using PEG
To: bpfk-list@googlegroups.com
Reply-To: bpfk-list@googlegroups.com
In-Reply-To: <166E1503C5E24261B88E4F7B41741A53@gmail.com>

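mi pu cpedu lo nu sidju mi lo ka favgau lo djavaskripti tutci be tu'a lo te sumti i ku'i no da pu co'e i ja'ebo mi pu jai se bapli fai lo ka zukte si'unai

i fau ro da mi tadni sa'u PEG i lo nu go'i ka'e xamgu za'u da

[English: I asked for help developing a JavaScript tool for te sumti, but nobody did, so I was forced to act alone. In any case, I am simply studying PEG; doing that can be useful for more than one thing.]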

2015-04-08 18:15 GMT+03:00 Alex Burka <durka42@gmail.com>:
I'm impressed that you got it this far, but as I've said before I really don't see the PEG as the place for this. First of all, it's mixing two separate steps in the interpretation of a sentence (namely parsing and sumti place resolution). And as you said, this basic functionality requires 60 new rules ... and it has severe limitations as a te sumti detector, since it gives up after the first explicit FA, doesn't interact with SE/GIhE/BE/JAI, etc. Like I said, good proof of concept, but I'd be surprised if this is the route to a general te sumti detector.

mu'o mi'e durkavore

On Wednesday, April 8, 2015 at 8:50 AM, Gleki Arxokuna wrote:

Terminology:
* FAM - a term taking a FA-position with FA explicitly filled with {faxiveimo'eko'a}, where mo'eko'a is a precise number (e.g. not {xo'e}).
* ZAM - a bare term taking a FA-position with FA omitted. Positioning rules can restore the exact value of ko'a in {faxiveimo'eko'a}.
* BAM - all other terms, e.g. those prefixed with BAI or PU etc.
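
The three classes above can be sketched as a small JavaScript function. This is only an illustration: the term shape (`tagClass`, `place`) is hypothetical and is not the camxes.js tree format, and treating a FA with an imprecise number as BAM is an assumption drawn from "all other terms".

```javascript
// Hypothetical term shape (not from camxes.js):
//   tagClass: "FA", another tag class such as "BAI" or "PU", or null for a bare term
//   place:    the number given with FA, or null when it is imprecise ({xo'e})
function classifyTerm(term) {
  if (term.tagClass === "FA" && Number.isInteger(term.place)) {
    return "FAM"; // explicit FA with a precise place number
  }
  if (term.tagClass === null) {
    return "ZAM"; // bare term, FA omitted
  }
  // Everything else; putting FA + {xo'e} here is an assumption.
  return "BAM";
}

console.log(classifyTerm({ text: "fe do", tagClass: "FA", place: 2 }));
console.log(classifyTerm({ text: "mi", tagClass: null, place: null }));
console.log(classifyTerm({ text: "ri'a ko'a", tagClass: "BAI", place: null }));
```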

So the issue of te sumti detection is to turn all ZAMs into FAMs in the syntax tree.
Can we do that using PEG? I'm not so sure, because some brivla have an infinite number of places, e.g. {jutsi}. {du} is a special case, since each of its te sumti can just take the {faxixo'e} position.

Currently I'm unaware of any way to remember the values of variables (te sumti numbers) in PEG.js, so we can't increment up to an arbitrary number of te sumti without first hardcoding all of them in the PEG itself.
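
For comparison, a counter kept in ordinary JavaScript and run over the terms after parsing can number any amount of places without hardcoding them. A minimal sketch, assuming a hypothetical flat list of items (not the camxes.js tree format) and the usual conventions that counting resumes after an explicit FA and that x1 is skipped when no numbered term precedes the selbri:

```javascript
// Hypothetical item shapes (assumptions for illustration only):
//   { kind: "selbri", text }
//   { kind: "term", text }             bare term (ZAM)
//   { kind: "term", text, fa: 3 }      explicit FA with a precise place (FAM)
//   { kind: "term", text, bam: true }  any other tagged term (BAM)
function restorePlaces(items) {
  let next = 1; // place the next bare term will take
  return items.map((item) => {
    if (item.kind === "selbri") {
      // Assumed convention: no numbered term before the selbri means x1 is skipped.
      if (next === 1) next = 2;
      return item;
    }
    if (item.bam) return item; // BAM takes no numbered place
    if (Number.isInteger(item.fa)) {
      next = item.fa + 1; // counting resumes after an explicit FA
      return item;
    }
    return { ...item, fa: next++, restored: true }; // ZAM becomes FAM
  });
}

const result = restorePlaces([
  { kind: "term", text: "mi" },
  { kind: "selbri", text: "klama" },
  { kind: "term", text: "do" },
  { kind: "term", text: "ti" },
]);
console.log(result.filter((i) => i.kind === "term").map((i) => `${i.text}=x${i.fa}`).join(" "));
```

Because the counter is a plain variable, the same pass covers a brivla with many places; the price is that place resolution becomes a step separate from parsing.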

However, if we limit ourselves to just 5 places and the basic cases of omitted FA, then we can do that using PEG.
The current version of my fork of camxes.js produces these outputs:

1. ([FAXIPA mi] [CU {prami <FAXIRE do> VAU}])
2. ([FAXIPA mi] [CU {djuno <fi do> VAU}])
3. ([FAXIPA mi] [CU {djica <FAXIRE (¹lo [nu {<FAXIPA (²lo plise KU²)> <cu (²farlu [FAXIRE mi] [FAXICI {lo tricu KU}] VAU²)>} KEI] KU¹)> VAU}])

FAXIPA, FAXIRE, FAXICI are the restored FA.
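
The labels follow the pattern fa + xi + the Lojban digits of the place number. A small sketch of generating them, assuming the upper-case, unspaced form seen in the outputs above; the helper name `faLabel` is hypothetical:

```javascript
// Lojban digit words for 0 through 9.
const DIGITS = ["no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so"];

// Build a restored-FA label such as FAXIPA for place 1.
function faLabel(place) {
  if (!Number.isInteger(place) || place < 1) {
    throw new RangeError("place must be a positive integer");
  }
  const digits = String(place).split("").map((d) => DIGITS[Number(d)]);
  return ("faxi" + digits.join("")).toUpperCase();
}

console.log([1, 2, 3].map(faLabel).join(", ")); // FAXIPA, FAXIRE, FAXICI
```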

This is how the sentence rule looks now in my PEG:

sentence = expr:(
    &(terms_1ZAM CU_elidible selbri terms_1ZAM terms_1ZAM !terms_1ZAM) (termsfa bridi_tail_t1fefi) /                      /* mi klama do ti */
    &(terms_1ZAM CU_elidible selbri terms_1ZAM) (termsfa bridi_tail_t1fe) /                                               /* mi klama do */
    &(terms_1ZAM terms_1ZAM CU_elidible selbri !terms_1ZAM) (termsfa termsfe bridi_tail_t1) /                             /* mi do klama */
    &(terms_1ZAM terms_1ZAM CU_elidible selbri terms_1ZAM !terms_1ZAM) (termsfa termsfe bridi_tail_t1fi) /                /* mi do klama ti */
    &(terms_1ZAM terms_1ZAM terms_1ZAM CU_elidible selbri !terms_1ZAM) (termsfa termsfe termsfi bridi_tail_t1) /          /* mi do ti klama */
    &(terms_1ZAM CU_elidible selbri) (termsfa bridi_tail_t1) /                                                            /* mi klama */
    terms? bridi_tail_t1 (joik_jek bridi_tail / joik_jek stag? KE_clause free* bridi_tail KEhE_elidible free*)*
  ) {return _node("sentence", expr);}

An example for each case is shown in the comments.
This addition to the PEG required hardcoding terms for fa, fe, fi separately, as well as every case of bridi tail such as "selbri x2 x3", "selbri x2", "selbri x3". To hardcode them, one also needs to hardcode all the inner rules down to selbri at the bridi_tail_3 level, and down to tense_modal for sumti (a modification of tense_modal where {fa}, {fe} or {fi} is hardcoded correspondingly).

This required adding 60 new lines to the PEG.
Even for these basic cases the work isn't done yet; the copy-pasted rules could probably be optimized.
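
To get a feel for how the hardcoding grows, the six lookahead alternatives in the sentence rule correspond to the ways of splitting up to three bare terms around the selbri with at least one term in front. A rough sketch that enumerates such splits for a given limit, assuming one alternative per split (the helper name `zamSplits` is hypothetical):

```javascript
// List every [termsBeforeSelbri, termsAfterSelbri] pair with at least one
// term before the selbri and at most maxPlaces bare terms in total.
function zamSplits(maxPlaces) {
  const splits = [];
  for (let before = 1; before <= maxPlaces; before++) {
    for (let after = 0; before + after <= maxPlaces; after++) {
      splits.push([before, after]);
    }
  }
  return splits;
}

console.log(zamSplits(3).length); // 6, matching the six commented cases
console.log(zamSplits(5).length); // 15 for a 5-place limit
```

Under that assumption the count is n(n+1)/2 for a limit of n places, before any separate tail rules are counted.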

Anyway, this is just a proof of concept.

To test the current state of the alta parser, say "alta: mi prami do" on the La Naxle page.

--
You received this message because you are subscribed to the Google Groups "BPFK" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bpfk-list+unsubscribe@googlegroups.com.
To post to this group, send email to bpfk-list@googlegroups.com.
Visit this group at http://groups.google.com/group/bpfk-list.
For more options, visit https://groups.google.com/d/optout.