Received-SPF: pass (google.com: domain of gleki.is.my.name@gmail.com designates 2a00:1450:400c:c00::229 as permitted sender) client-ip=2a00:1450:400c:c00::229;
MIME-Version: 1.0
From: Gleki Arxokuna <gleki.is.my.name@gmail.com>
Date: Wed, 8 Apr 2015 15:50:08 +0300
Message-ID: <CAO7bV+g3dr6p-1foeJWNHMmOnP+mYb5XVqwpFWGqjhEejfBo-Q@mail.gmail.com>
Subject: [bpfk] te sumti detection using PEG
To: bpfk-list@googlegroups.com
Content-Type: multipart/alternative; boundary=089e013c625cce4807051335f9cd
Reply-To: bpfk-list@googlegroups.com
Precedence: list
Mailing-list: list bpfk-list@googlegroups.com; contact bpfk-list+owners@googlegroups.com
Sender: bpfk-list@googlegroups.com
X-Spam_score: -1.7
X-Spam_score_int: -16
X-Spam_bar: -

--089e013c625cce4807051335f9cd
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Terminology:
* FAM - a term taking a FA-position with FA explicitly filled with
{faxiveimo'eko'a}, where mo'eko'a is a precise number (e.g. not {xo'e}).
* ZAM - a bare term taking a FA-position with FA omitted. Positioning rules
can restore the exact value of ko'a in {faxiveimo'eko'a}.
* BAM - all other terms, e.g. those prefixed with BAI, PU etc.

So the issue of te sumti detection is to turn all ZAMs into FAMs in the
syntax tree.
Can we do that using PEG? I'm not sure, because
*some brivla have an infinite number of places*, e.g. {jutsi}. {du} is a
special case, since every te sumti of it can simply take the {faxixo'e}
position.
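
The positioning rules themselves can be sketched outside the grammar. The
following is only an illustration in plain JavaScript, not part of the
camxes.js fork: the item format ({kind, text, place}) and the function names
are hypothetical, and it assumes the usual convention that a bare term after
the selbri starts at x2 when nothing has filled x1 before the selbri.

```javascript
// Lojban digit words, used to spell the restored place number.
const DIGITS = ["no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so"];

// Build a label such as FAXIPA (x1) or FAXIPARE (x12) for a place number.
function faLabel(place) {
  const digits = String(place).split("").map((d) => DIGITS[Number(d)]);
  return ("faxi" + digits.join("")).toUpperCase();
}

// items: flat list of { kind: "ZAM" | "FAM" | "BAM" | "selbri", text, place? }
// Returns the texts, with every ZAM prefixed by its restored FA label.
function restoreFa(items) {
  let next = 1; // place the next bare term will take
  return items.map((item) => {
    switch (item.kind) {
      case "selbri":
        // Assumed convention: x1 is skipped if nothing filled it so far.
        if (next === 1) next = 2;
        return item.text;
      case "FAM":
        // An explicit FA resets the counter; following ZAMs continue from it.
        next = item.place + 1;
        return item.text;
      case "ZAM":
        return faLabel(next++) + " " + item.text;
      default:
        // BAM: does not take a FA-position, so the counter is untouched.
        return item.text;
    }
  });
}

const demo = restoreFa([
  { kind: "ZAM", text: "mi" },
  { kind: "selbri", text: "prami" },
  { kind: "ZAM", text: "do" },
]);
console.log(demo.join(" ")); // FAXIPA mi prami FAXIRE do
```

The counter `next` is exactly the kind of remembered value discussed in this
post; the sketch keeps it in ordinary code rather than in the PEG.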

Currently I'm unaware of any way to remember the values of variables
(te sumti numbers) in PEG.js, so we can't count up to an arbitrary number
of te sumti without first hardcoding each of them in the PEG itself.

However, if we limit ourselves to just 5 places and basic cases of omitting
FA then we can do that using PEG.
The current version of my fork of camxes.js produces these outputs:

1. ([FAXIPA mi] [CU {prami <FAXIRE do> VAU}])
2. ([FAXIPA mi] [CU {djuno <fi do> VAU}])
3. ([FAXIPA mi] [CU {djica <FAXIRE (¹lo [nu {<FAXIPA (²lo plise KU²)> <cu
(²farlu [FAXIRE mi] [FAXICI {lo tricu KU}] VAU²)>} KEI] KU¹)> VAU}])

FAXIPA, FAXIRE, FAXICI are restored FA.


This is how the sentence rule looks now in my PEG:
sentence = expr:(
  /* mi klama do ti */
  &(terms_1ZAM CU_elidible selbri terms_1ZAM terms_1ZAM !terms_1ZAM)
    (termsfa bridi_tail_t1fefi) /
  /* mi klama do */
  &(terms_1ZAM CU_elidible selbri terms_1ZAM)
    (termsfa bridi_tail_t1fe) /
  /* mi do klama */
  &(terms_1ZAM terms_1ZAM CU_elidible selbri !terms_1ZAM)
    (termsfa termsfe bridi_tail_t1) /
  /* mi do klama ti */
  &(terms_1ZAM terms_1ZAM CU_elidible selbri terms_1ZAM !terms_1ZAM)
    (termsfa termsfe bridi_tail_t1fi) /
  /* mi do ti klama */
  &(terms_1ZAM terms_1ZAM terms_1ZAM CU_elidible selbri !terms_1ZAM)
    (termsfa termsfe termsfi bridi_tail_t1) /
  /* mi klama */
  &(terms_1ZAM CU_elidible selbri)
    (termsfa bridi_tail_t1) /
  /* everything else: unchanged fallback */
  terms? bridi_tail_t1 (joik_jek bridi_tail / joik_jek stag? KE_clause free*
    bridi_tail KEhE_elidible free*)*
) {return _node("sentence", expr);}

An example for each case is shown in its comment.
This addition to the PEG required hardcoding the terms for fa, fe, fi
separately, as well as every case of bridi tail, like "selbri x2 x3",
"selbri x2", "selbri x3". To hardcode them one also needs to hardcode all
the inner rules: for the bridi tail, down to the bridi_tail_3 level where
selbri is reached, and for sumti, down to tense_modal (a modified
tense_modal in which {fa}, {fe} or {fi} is hardcoded correspondingly).
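
The dispatch done by the lookahead predicates can be mimicked in plain
JavaScript, where the regex lookaheads (?=...) and (?!...) behave much like
PEG's & and !. This is only an analogy under simplifying assumptions, not
code from the fork: a sentence is reduced to a hypothetical "shape" string
in which Z stands for one bare term (terms_1ZAM) and S for the selbri, and
CU is ignored.

```javascript
// One entry per hardcoded alternative of the sentence rule, in the same order.
// head = FA restored before the selbri, tail = FA restored in the bridi tail.
const ALTERNATIVES = [
  { lookahead: /^ZSZZ(?!Z)/, head: ["fa"], tail: ["fe", "fi"] }, // mi klama do ti
  { lookahead: /^ZSZ/, head: ["fa"], tail: ["fe"] }, // mi klama do
  { lookahead: /^ZZS(?!Z)/, head: ["fa", "fe"], tail: [] }, // mi do klama
  { lookahead: /^ZZSZ(?!Z)/, head: ["fa", "fe"], tail: ["fi"] }, // mi do klama ti
  { lookahead: /^ZZZS(?!Z)/, head: ["fa", "fe", "fi"], tail: [] }, // mi do ti klama
  { lookahead: /^ZS/, head: ["fa"], tail: [] }, // mi klama
];

// Ordered choice: the first alternative whose lookahead succeeds wins.
// Returns null when only the unchanged fallback alternative would apply.
function pickAlternative(shape) {
  for (const alt of ALTERNATIVES) {
    if (alt.lookahead.test(shape)) return alt;
  }
  return null;
}

console.log(pickAlternative("ZZSZ").tail); // [ 'fi' ]
```

As in the grammar, the order matters only for the two alternatives without
a trailing negative lookahead; the others match exactly one shape each.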

This required adding 60 new lines to the PEG.
Even for these basic cases the work isn't done yet, and it might be
possible to optimize those copy-pasted strings.
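
One possible way to avoid maintaining the copy-pasted alternatives by hand
is to generate them. The sketch below is an assumption-laden illustration,
not something the fork does: the function name is hypothetical, the rule
names for fo and fu (termsfo, termsfu, bridi_tail_t1fo and so on) are
extrapolated from the naming used for fa, fe, fi, and every generated rule
name would still need its own definition in the PEG. Unlike the hand-written
version, it puts !terms_1ZAM at the end of every lookahead.

```javascript
const FA = ["fa", "fe", "fi", "fo", "fu"];

// Emit one hardcoded alternative per (terms before selbri, terms after selbri)
// combination that fits within maxPlaces places.
function sentenceAlternatives(maxPlaces) {
  if (maxPlaces < 1 || maxPlaces > FA.length) {
    throw new RangeError("maxPlaces must be between 1 and " + FA.length);
  }
  const Z = "terms_1ZAM";
  const lines = [];
  for (let before = 1; before <= maxPlaces; before++) {
    for (let after = 0; before + after <= maxPlaces; after++) {
      const lookahead = [
        ...Array(before).fill(Z),
        "CU_elidible",
        "selbri",
        ...Array(after).fill(Z),
        "!" + Z,
      ].join(" ");
      // Terms before the selbri get fa, fe, ... in order.
      const head = FA.slice(0, before).map((fa) => "terms" + fa).join(" ");
      // The bridi tail rule name lists the FA restored after the selbri.
      const tail = "bridi_tail_t1" + FA.slice(before, before + after).join("");
      lines.push(`&(${lookahead}) (${head} ${tail}) /`);
    }
  }
  return lines;
}

console.log(sentenceAlternatives(3).length); // 6
console.log(sentenceAlternatives(5).length); // 15
```

For n places this gives n(n+1)/2 alternatives, so the count grows
quadratically and a brivla with an unbounded number of places still cannot
be covered this way.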

Anyway this is just a proof of concept.

To test the current state of the alta parser, say "alta: mi prami do" on the
La Naxle page
<http://vrici.lojban.org/~gleki/mediawiki-1.19.2/extensions/ilmentufa/ircbot/naxle.html>.


--089e013c625cce4807051335f9cd--