On 4 June 2012 08:30, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote:

Sooner or later, we're going to need something that can go through
the corpus ( http://www.lojban.org/corpus/ ) and answer questions
like "show me all sentences in which the x3 of tubnu is filled", to
aid figuring out how to fix the various gismu list problems.

I've given some thought to this while working on my parser. There are problems, easy ones and increasingly difficult ones.

First of all, the corpus must be "cleaned" to pass a parser. Obvious mistakes can be corrected, incomprehensible sections removed and intentional deviations from the morphology and/or the grammar (like some stuff in Alice) commented out and provided with an alternative form passing the parser.

Forgetting lujvo, tanru and all the mess with connectives, things are pretty easy.

Enumerating the sumti in simple sentences with no FA/SE is a trivial exercise, and I've got a piece of code to do it at the output stage of my parser.

FA is slightly more involved but can probably also be done without any extra information from the syntax stage.
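
To make the two easy cases concrete, here is a minimal Python sketch of place numbering with and without FA. It is not the code from my parser; the function name enumerate_sumti and the pre-segmented (kind, text) input format are assumptions made up for illustration. It also assumes that untagged sumti following an FA-tagged one simply keep counting upward, and it makes no attempt to skip places that are already filled.

```python
# Sketch only: the input is assumed to be already segmented by a parser
# into ("sumti", text), ("fa", cmavo) and ("selbri", text) items.
FA_PLACE = {"fa": 1, "fe": 2, "fi": 3, "fo": 4, "fu": 5}


def enumerate_sumti(items):
    """Return a list of (place_number, sumti_text) for one simple bridi."""
    next_place = 1
    result = []
    for kind, text in items:
        if kind == "sumti":
            result.append((next_place, text))
            next_place += 1
        elif kind == "fa":
            # An FA tag names the place of the sumti that follows it.
            next_place = FA_PLACE[text]
        elif kind == "selbri":
            # With no sumti before the selbri, x1 is elided and the
            # first trailing sumti is x2.
            if not result:
                next_place = 2
        else:
            raise ValueError("unknown item kind: " + kind)
    return result


if __name__ == "__main__":
    bridi = [("sumti", "mi"), ("selbri", "klama"),
             ("fa", "fi"), ("sumti", "la .atlantas."),
             ("sumti", "le dargu")]
    for place, sumti in enumerate_sumti(bridi):
        print("x%d: %s" % (place, sumti))
```

Since only the flat sequence of items is consulted, this fits the guess that FA needs nothing extra from the syntax stage, at least for simple sentences.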

SE is more complicated as it involves backtracking when we want to get the sumti numbered relative to the base gismu.
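
As a rough illustration of numbering relative to the base gismu, the Python sketch below maps a place of a converted selbri back to the place of the unconverted one. The name base_place is hypothetical, and the sketch assumes that in a chain of SE cmavo the one nearest the selbri is applied first, so undoing the chain means walking it from the outermost SE inward.

```python
# Each SE cmavo swaps x1 with the place given here.
SE_SWAP = {"se": 2, "te": 3, "ve": 4, "xe": 5}


def base_place(se_chain, surface_place):
    """Map a place of the converted selbri to a place of the base selbri.

    se_chain lists the SE cmavo in the order they are written,
    e.g. ["se", "te"] for "se te klama".
    """
    place = surface_place
    for se in se_chain:  # outermost conversion is undone first
        target = SE_SWAP[se]
        if place == 1:
            place = target
        elif place == target:
            place = 1
    return place


if __name__ == "__main__":
    # Print the base place behind each surface place of "se te klama".
    for surface in range(1, 6):
        print(surface, "->", base_place(["se", "te"], surface))
```

Combined with a surface numbering of the sumti, this gives the base-gismu place for each sumti without touching the parse tree again.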

Doing anything with lujvo requires a split form to see what needs to be done, ranging from easy (just a SE-rafsi + a single gismu) via hairy to well nigh impossible.
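
The easy end of that range can be sketched in Python as follows. This assumes a lujvo splitter already exists and hands over the parts with the final rafsi expanded to its gismu, e.g. ["sel", "klama"]; the function name lujvo_base_place and that input format are hypothetical. Anything other than one SE-rafsi plus one gismu is deliberately refused.

```python
# SE-rafsi and the cmavo they stand for.
SE_RAFSI = {"sel": "se", "ter": "te", "vel": "ve", "xel": "xe"}
# Each SE cmavo swaps x1 with the place given here.
SE_SWAP = {"se": 2, "te": 3, "ve": 4, "xe": 5}


def lujvo_base_place(parts, surface_place):
    """Return (gismu, base_place) for a split SE-rafsi + gismu lujvo.

    Returns None for any other shape of lujvo, since those need
    knowledge this sketch does not have.
    """
    if len(parts) != 2 or parts[0] not in SE_RAFSI:
        return None
    gismu = parts[1]
    target = SE_SWAP[SE_RAFSI[parts[0]]]
    if surface_place == 1:
        return (gismu, target)
    if surface_place == target:
        return (gismu, 1)
    return (gismu, surface_place)


if __name__ == "__main__":
    print(lujvo_base_place(["sel", "klama"], 1))
    print(lujvo_base_place(["ter", "klama"], 3))
```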

In this context tanru are more or less like split lujvo.

I haven't yet thought at all about the effect of connectives, but the added complexity probably ranges from trivial to hairy.

I'd start with the easy bits, as it is always better to have something reasonably soon than a promised perfection probably never. I can add sumti enumeration including the FA/SE cases (excluding lujvo and tanru) to my parser pretty soon, and the simplest lujvo case (SE-rafsi + gismu) as soon as I get a lujvo splitter done. At this stage I'd draw the line there. Pretty soon means sometime in August, as I'll spend July doing something completely different, like attending 60+ chamber music concerts, which doesn't, however, mean I can necessarily completely stop tinkering with ideas in my head, alas.

  Veijo

--

  web site: http://galactinus.net/vilva/

  on Google+: https://plus.google.com/106533767817816079660/posts
