From: Gleki Arxokuna <gleki.is.my.name@gmail.com>
Date: Tue, 14 Apr 2015 11:09:17 +0300
Subject: Re: [bpfk] te sumti detection using PEG
To: bpfk-list@googlegroups.com


2015-04-08 18:15 GMT+03:00 Alex Burka <durka42@gmail.com>:

> I'm impressed that you got it this far, but as I've said before I really don't see the PEG as the place for this. First of all, it's mixing two separate steps in the interpretation of a sentence (namely parsing and sumti place resolution).

I think this is the same step, separated only in our minds. If it shouldn't be called "parsing", then okay.

What concerns me is that one can create any arbitrary system of te sumti resolution, especially when you put ZAMs after FAMs.
Programmers would then say {i'asai} to such a BPFK decision and implement that system on top of PEG, no matter how unnatural it would be for the human brain.

I used only PEG to study the core of the language itself, to see which system would require as few conceptually new rules as possible (copy-pasting existing rules doesn't count).
I already explained my view of some results of this analysis in the FA-autorestoration thread.

> And as you said, this basic functionality requires 60 new rules ... and it has severe limitations as a te sumti detector, since it gives up after the first explicit FA, doesn't interact with SE/GIhE/BE/JAI, etc.

It is no longer 60 rules: optimizations reduced the count, while support for new features increased it.
Currently everything except putting ZAMs after FAMs is supported, from x1 to x5.
Here are some more examples:

mi do ti bai do ta gau mi fe do gau ti klama gau tu mi =>
([{FA mi} {FE do} {FI ti} {bai do} {FO ta} {gau mi} {fe do} {gau ti}] [CU {klama SF} {<gau tu> <FIhA mi>} VAU])

cusku zo coi fi mi =>
([FA ZOhE] [CU {cusku SF} {FE <zo coi>} {fi mi} VAU])

mo mi ti do tu gi'e co'e ta tu =>
([FA ZOhE] [CU {mo SF} {FE mi} {FI ti} {FO do} {FU tu} VAU] [gi'e {CU <co'e SF> <FE ta> <FI tu> VAU} VAU])

Also, since CLL asserts that the bridi head can never be empty, this is implemented:

i carvi =>
(i [FA ZOhE] [CU {carvi SF} VAU])

This ensures x1 exists.
In English this carvi1 is "It" (it rains). In other languages it is zero-marked.
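
The positional behaviour visible in the outputs above can be sketched in plain JavaScript, outside the grammar. This is only an illustration of the rules as I read them from the examples, not the altatufa implementation (which does this inside the PEG). The function name `restorePlaces`, the term objects, and the choice to fill x1 with ZOhE whenever the head supplies no place are all assumptions made for this sketch.

```javascript
// Hypothetical illustration only: not part of camxes.js or the altatufa grammar.
const FA_TAGS = ["FA", "FE", "FI", "FO", "FU"]; // x1..x5

// Each term is { kind: "ZAM" | "FAM" | "BAM", text: string }.
// FAM and BAM texts already carry their own tag (e.g. "fe do", "bai do").
function restorePlaces(headTerms, tailTerms) {
  let next = 0;              // index into FA_TAGS of the next free place
  let sawExplicitFA = false; // ZAMs after a FAM are not resolved

  const label = (term) => {
    if (term.kind === "BAM") return term.text; // modal terms keep their tag
    if (term.kind === "FAM") { sawExplicitFA = true; return term.text; }
    // Bare term: positional only while no explicit FA has been seen
    // and places up to x5 remain; otherwise mark it as unresolved.
    if (sawExplicitFA || next >= FA_TAGS.length) return "FIhA " + term.text;
    return FA_TAGS[next++] + " " + term.text;
  };

  const head = headTerms.map(label);
  // Assumption: if the head supplied no place at all, x1 becomes an elided zo'e.
  if (next === 0 && !sawExplicitFA) { head.unshift("FA ZOhE"); next = 1; }
  const tail = tailTerms.map(label);
  return { head, tail };
}

const zam = (text) => ({ kind: "ZAM", text });
const fam = (text) => ({ kind: "FAM", text });
const bam = (text) => ({ kind: "BAM", text });

// cusku zo coi fi mi
console.log(restorePlaces([], [zam("zo coi"), fam("fi mi")]));
```

Under these assumptions the sketch reproduces the labels of the first three examples; it deliberately leaves a ZAM that follows a FAM unresolved (FIhA), as the parser output does.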

> Like I said, good proof of concept, but I'd be surprised if this is the route to a general te sumti detector.

PEG.js can indeed be the route to it, since here PEG is coupled with JavaScript. However, I didn't use any JavaScript except for outputting strings and nodes, as in the original camxes.js.

This is a proof of concept that variable memory is not necessary for the language to work, up to some point.
For detecting places higher than x5 it would be desirable to get memory by storing what's needed in JavaScript arrays; however, I doubt very much that the language needs more than 5 arguments in functions. If they are needed, e.g. when emulating programming languages, I suggest that FAMs be used instead of ZAMs.
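
To make the "memory in JavaScript arrays" option concrete, here is a minimal sketch of a place counter that is not capped at x5. It is not something camxes.js provides: `faxiTag` and `makePlaceCounter` are hypothetical names, and the tag spelling simply follows the FAXIPA / FAXIRE / FAXICI pattern by appending the Lojban digit cmavo for each decimal digit.

```javascript
// Hypothetical helper names; none of this exists in camxes.js.
const DIGITS = ["NO", "PA", "RE", "CI", "VO", "MU", "XA", "ZE", "BI", "SO"];

// Build a FAXI tag for place n, e.g. 1 -> "FAXIPA", 12 -> "FAXIPARE".
function faxiTag(n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("place must be a positive integer");
  }
  return "FAXI" + String(n).split("").map((d) => DIGITS[Number(d)]).join("");
}

// A counter kept in ordinary JavaScript state instead of in hardcoded rules.
function makePlaceCounter() {
  const assigned = []; // remembers every tag handed out so far
  return {
    next(text) {
      const tag = faxiTag(assigned.length + 1);
      assigned.push(tag);
      return tag + " " + text;
    },
    assigned,
  };
}

const counter = makePlaceCounter();
["mi", "do", "ti", "ta", "tu", "ko'a"].forEach((t) => console.log(counter.next(t)));
```

The trade-off is the one discussed above: the counter lifts the five-place limit, but the place logic then lives in JavaScript state rather than in the grammar itself.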

There have been different requests to allow empty {lo ... ku} sumti. This all results in some funny long outputs from very short inputs:

lo =>
([FA {lo <COhE SF> KU}] [CU {COhE SF} VAU])

lo lo =>
([FA {lo <(¹lo [COhE SF] KU¹) (¹COhE SF¹)> KU}] [CU {COhE SF} VAU])

lonunoi =>
([FA {lo <(¹[nu {<FA ZOhE> <CU (²COhE SF²) VAU>} KEI] SF¹) (¹noi [{FA ZOhE} {CU <COhE SF> VAU}] KUhO¹)> KU}] [CU {COhE SF} VAU])

I have no idea whether we need it. In some cases {fa zo'e} and selbri autorestoration required changing the order in which subrule choices are tested.
I also removed one rule treating {sa} in linkargs. If the community thinks {sa} is important, I will work on it.

At this point the ToDo list for my altatufa "parser" is empty, so my work is done unless new features are requested, unnoticed bugs are discovered, or optimizations or prettifications of the code are envisioned.




On Wednesday, April 8, 2015 at 8:50 AM, Gleki Arxokuna wrote:

Terminology:
* FAM - a term taking a FA-position with FA explicitly filled with {faxiveimo'eko'a}, where mo'eko'a is a precise number (e.g. not {xo'e}).
* ZAM - a bare term taking a FA-position with FA omitted. Positioning rules can restore the exact value of ko'a in {faxiveimo'eko'a}.
* BAM - all other terms, e.g. those prefixed with BAI or PU etc.

So the issue of te sumti detection is to turn all ZAMs into FAMs in the syntax tree.
Can we do that using PEG? I'm not so sure, because some brivla have an infinite number of places, e.g. {jutsi}. {du} is a special case, since every te sumti of it can just take the {faxixo'e} position.

Currently I'm unaware of any way to remember values of variables (te sumti numbers) in PEG.js, so we can't increment to any given number of te sumti without first hardcoding all of them in the PEG itself.

However, if we limit ourselves to just 5 places and basic cases of omitting FA, then we can do that using PEG.
The current version of my fork of camxes.js produces these outputs:

1. ([FAXIPA mi] [CU {prami <FAXIRE do> VAU}])
2. ([FAXIPA mi] [CU {djuno <fi do> VAU}])
3. ([FAXIPA mi] [CU {djica <FAXIRE (¹lo [nu {<FAXIPA (²lo plise KU²)> <cu (²farlu [FAXIRE mi] [FAXICI {lo tricu KU}] VAU²)>} KEI] KU¹)> VAU}])

FAXIPA, FAXIRE, FAXICI are restored FA.


This is how sentence looks now in my PEG:

sentence = expr:(
  &(terms_1ZAM CU_elidible selbri terms_1ZAM terms_1ZAM !terms_1ZAM) (termsfa bridi_tail_t1fefi) / /* mi klama do ti*/
  &(terms_1ZAM CU_elidible selbri terms_1ZAM) (termsfa bridi_tail_t1fe) / /* mi klama do*/
  &(terms_1ZAM terms_1ZAM CU_elidible selbri !terms_1ZAM) (termsfa termsfe bridi_tail_t1) / /* mi do klama*/
  &(terms_1ZAM terms_1ZAM CU_elidible selbri terms_1ZAM !terms_1ZAM) (termsfa termsfe bridi_tail_t1fi) / /* mi do klama ti*/
  &(terms_1ZAM terms_1ZAM terms_1ZAM CU_elidible selbri !terms_1ZAM) (termsfa termsfe termsfi bridi_tail_t1) / /* mi do ti klama*/
  &(terms_1ZAM CU_elidible selbri) (termsfa bridi_tail_t1) / /* mi klama*/
  terms? bridi_tail_t1 (joik_jek bridi_tail / joik_jek stag? KE_clause free* bridi_tail KEhE_elidible free*)*) {return _node("sentence", expr);}
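
As a reading aid, the ordered choice in the rule above can be mirrored in plain JavaScript. This is a hypothetical sketch, not part of the grammar: it assumes the sentence contains only bare (ZAM) terms around the selbri and that `terms_1ZAM` matches exactly one such term, so each lookahead reduces to a test on how many bare terms stand before and after the selbri. The function name `chooseSentenceCase` is invented for the illustration; the returned rule names are copied from the grammar.

```javascript
// Hypothetical mirror of the ordered choice; not part of the PEG.
// before/after: number of bare (ZAM) terms before and after the selbri.
function chooseSentenceCase(before, after) {
  // Alternatives are tried top to bottom, exactly as PEG ordered choice does.
  if (before === 1 && after === 2) return ["termsfa", "bridi_tail_t1fefi"];          // mi klama do ti
  if (before === 1 && after >= 1) return ["termsfa", "bridi_tail_t1fe"];             // mi klama do
  if (before === 2 && after === 0) return ["termsfa", "termsfe", "bridi_tail_t1"];   // mi do klama
  if (before === 2 && after === 1) return ["termsfa", "termsfe", "bridi_tail_t1fi"]; // mi do klama ti
  if (before === 3 && after === 0) return ["termsfa", "termsfe", "termsfi", "bridi_tail_t1"]; // mi do ti klama
  if (before === 1) return ["termsfa", "bridi_tail_t1"];                             // mi klama
  return ["terms?", "bridi_tail_t1"];                                                // generic fallback
}

console.log(chooseSentenceCase(2, 1).join(" "));
```

Note that the second alternative has no closing `!terms_1ZAM` in its lookahead, so in this reading it also accepts one head term followed by three or more tail terms.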

Examples for each case are shown in the comments.
This addition to the PEG required hardcoding terms for fa, fe, fi separately, and every case of bridi tail such as "selbri x2 x3", "selbri x2", "selbri x3". To hardcode them one also needs to hardcode all inner rules until reaching selbri at the bridi_tail_3 level, and down to tense_modal for sumti (a modification of tense_modal where {fa}, {fe} or {fi} is hardcoded correspondingly).

This required adding 60 new lines to the PEG.
Even for supporting these basic cases the work isn't done yet; optimizations of those copy-pasted strings might be possible.

Anyway, this is just a proof of concept.

To test the current state of the alta parser, say "alta: mi prami do" on the La Naxle page.

--
You received this message because you are subscribed to the Google Groups "BPFK" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bpfk-list+unsubscribe@googlegroups.com.
To post to this group, send email to bpfk-list@googlegroups.com.
Visit this group at http://groups.google.com/group/bpfk-list.
For more options, visit https://groups.google.com/d/optout.