Received: from mail-pb0-f61.google.com ([209.85.160.61]:40065) by stodi.digitalkingdom.org with esmtps (TLSv1:RC4-SHA:128) (Exim 4.76) (envelope-from ) id 1SkWl1-0007eG-PH; Fri, 29 Jun 2012 01:41:00 -0700
Received: by pbbro2 with SMTP id ro2sf3079577pbb.16 for ; Fri, 29 Jun 2012 01:40:46 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlegroups.com; s=beta;
    h=x-beenthere:received-spf:mime-version:in-reply-to:references:date
    :message-id:subject:from:to:x-original-sender
    :x-original-authentication-results:reply-to:precedence:mailing-list
    :list-id:x-google-group-id:list-post:list-help:list-archive:sender
    :list-subscribe:list-unsubscribe:content-type;
    bh=UWpuNCzy5GeWrSSKR21jYGA8P3YVv7FMGxfOC8YYHis=;
    b=ME+Sie+FaiY4cVgDO4/9qvZeJovqsM41YvnnppfAQU3yRdLiUpKQQt+Yuxl1bVEjKl
    hbeiKiCWI2XbY5kNNrJ1PxCYW+i6GsVfqKKRcDqCKFKLA0oUNEFkXOCkv73QpyZ7DgiE
    YQFkmA2UyPW7ICcIKxW2ErqfuqM12ZasS8pZc=
Received: by 10.52.94.111 with SMTP id db15mr13116vdb.11.1340959245680; Fri, 29 Jun 2012 01:40:45 -0700 (PDT)
X-BeenThere: lojban@googlegroups.com
Received: by 10.220.157.82 with SMTP id a18ls2017462vcx.3.gmail; Fri, 29 Jun 2012 01:40:45 -0700 (PDT)
Received: by 10.52.89.129 with SMTP id bo1mr1524060vdb.0.1340959245222; Fri, 29 Jun 2012 01:40:45 -0700 (PDT)
Received: by 10.52.89.129 with SMTP id bo1mr1524057vdb.0.1340959245207; Fri, 29 Jun 2012 01:40:45 -0700 (PDT)
Received: from mail-vb0-f52.google.com (mail-vb0-f52.google.com [209.85.212.52]) by gmr-mx.google.com with ESMTPS id y4si354545vds.2.2012.06.29.01.40.45 (version=TLSv1/SSLv3 cipher=OTHER); Fri, 29 Jun 2012 01:40:45 -0700 (PDT)
Received-SPF: pass (google.com: domain of veijo.vilva@gmail.com designates 209.85.212.52 as permitted sender) client-ip=209.85.212.52;
Received: by vbzb23 with SMTP id b23so2389151vbz.11 for ; Fri, 29 Jun 2012 01:40:45 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.220.218.141 with SMTP id hq13mr409967vcb.8.1340959245042; Fri, 29 Jun 2012 01:40:45 -0700 (PDT)
Received: by 10.52.159.193 with
    HTTP; Fri, 29 Jun 2012 01:40:44 -0700 (PDT)
In-Reply-To: <20120604053049.GR8656@stodi.digitalkingdom.org>
References: <20120604053049.GR8656@stodi.digitalkingdom.org>
Date: Fri, 29 Jun 2012 11:40:44 +0300
Message-ID:
Subject: Re: [lojban] A tool we'll need
From: Veijo Vilva
To: lojban@googlegroups.com, lojban-list@lojban.org
X-Original-Sender: veijo.vilva@gmail.com
X-Original-Authentication-Results: gmr-mx.google.com; spf=pass (google.com: domain of veijo.vilva@gmail.com designates 209.85.212.52 as permitted sender) smtp.mail=veijo.vilva@gmail.com; dkim=pass header.i=@gmail.com
Reply-To: lojban@googlegroups.com
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
List-ID:
X-Google-Group-Id: 1004133512417
List-Post: ,
List-Help: ,
List-Archive:
Sender: lojban@googlegroups.com
List-Subscribe: ,
List-Unsubscribe: ,
Content-Type: multipart/alternative; boundary=14dae9cfc83075840c04c3986706
X-Spam-Score: 0.0 (/)
X-Spam_score: 0.0
X-Spam_score_int: 0
X-Spam_bar: /

--14dae9cfc83075840c04c3986706
Content-Type: text/plain; charset=ISO-8859-1

On 4 June 2012 08:30, Robin Lee Powell wrote:

>
> Sooner or later, we're going to need something that can go through
> the corpus ( http://www.lojban.org/corpus/ ) and answer questions
> like "show me all sentences in which the x3 of tubnu is filled", to
> aid figuring out how to fix the various gismu list problems.
>

I've given some thought to this while working on my parser. There are problems, easy ones and increasingly difficult ones.

First of all, the corpus must be "cleaned" to pass a parser. Obvious mistakes can be corrected, incomprehensible sections removed and intentional deviations from the morphology and/or the grammar (like some stuff in Alice) commented out and provided with an alternative form passing the parser.

Forgetting lujvo, tanru and all the mess with connectives, things are pretty easy.

Enumerating the sumti in simple sentences with no FA/SE is a trivial exercise, and I've got a piece of code to do it at the output stage of my parser.

FA is slightly more involved but can probably also be done without any extra information from the syntax stage.

SE is more complicated as it involves backtracking when we want to get the sumti numbered relative to the base gismu.

Doing anything with lujvo requires a split form to see what needs to be done, ranging from easy (just a SE-rafsi + a single gismu) via hairy to well nigh impossible.

In this context tanru are more or less like split lujvo.

I haven't yet at all thought about the effect of connectives, but the added complexity probably ranges from trivial to hairy.

I'd start with the easy bits as it is always better to have something reasonably soon than a promised perfection probably never. I can add sumti enumeration including the FA/SE cases (excluding lujvo and tanru) to my parser pretty soon and the simplest lujvo case (SE-rafsi + gismu) as soon as I get a lujvo splitter done. At this stage I'd draw the line here. Pretty soon means sometime in August as I'll spend July doing something completely different, like attending 60+ chamber music concerts, which doesn't, however, mean I can necessarily completely stop tinkering with ideas in my head, alas.

  Veijo

--

  web site: http://galactinus.net/vilva/
  on Google+: https://plus.google.com/106533767817816079660/posts

--
You received this message because you are subscribed to the Google Groups "lojban" group.
To post to this group, send email to lojban@googlegroups.com.
To unsubscribe from this group, send email to lojban+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/lojban?hl=en.

--14dae9cfc83075840c04c3986706
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
On 4 June 2012 08:30, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote:

> Sooner or later, we're going to need something that can go through
> the corpus ( http://www.lojban.org/corpus/ ) and answer questions
> like "show me all sentences in which the x3 of tubnu is filled", to
> aid figuring out how to fix the various gismu list problems.

I've given some thought to this while working on my parser. There are problems, easy ones and increasingly difficult ones.

First of all, the corpus must be "cleaned" to pass a parser. Obvious mistakes can be corrected, incomprehensible sections removed and intentional deviations from the morphology and/or the grammar (like some stuff in Alice) commented out and provided with an alternative form passing the parser.

Forgetting lujvo, tanru and all the mess with connectives, things are pretty easy.

Enumerating the sumti in simple sentences with no FA/SE is a trivial exercise, and I've got a piece of code to do it at the output stage of my parser.
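
A minimal Python sketch of this simple case follows. It is not the parser's actual code: the (kind, text) term list is a hypothetical stand-in for whatever the parser emits at its output stage, and the function name is made up for illustration. The only rule it encodes is that sumti are numbered in order of appearance, and that x1 is skipped when no sumti precedes the selbri.

```python
from typing import List, Tuple

# Hypothetical pre-parsed form of one simple bridi: (kind, text) pairs,
# where kind is "sumti" or "selbri". No FA tags and no SE conversion.
Term = Tuple[str, str]


def enumerate_sumti(terms: List[Term]) -> List[Tuple[int, str]]:
    """Return (place number, sumti text) pairs in order of appearance."""
    numbered = []
    next_place = 1
    for kind, text in terms:
        if kind == "selbri":
            # If no sumti came before the selbri, x1 is left unfilled and
            # the first trailing sumti is x2.
            next_place = max(next_place, 2)
        elif kind == "sumti":
            numbered.append((next_place, text))
            next_place += 1
        else:
            raise ValueError(f"unexpected term kind: {kind!r}")
    return numbered


# mi klama le zarci
print(enumerate_sumti([("sumti", "mi"), ("selbri", "klama"), ("sumti", "le zarci")]))
# klama le zarci (x1 left unfilled)
print(enumerate_sumti([("selbri", "klama"), ("sumti", "le zarci")]))
```

The first call yields mi as x1 and le zarci as x2; the second yields le zarci as x2 only.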

FA is slightly more involved but can probably also be done without any extra information from the syntax stage.
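
As a rough illustration of why FA needs nothing beyond the term list, here is a Python sketch under the same caveat: the (kind, text, fa_tag) triples and the function name are hypothetical, not the parser's real output. A FA tag (fa, fe, fi, fo, fu for x1 to x5) sets the place of its sumti, and untagged sumti after it continue counting from there.

```python
from typing import List, Optional, Tuple

# FA cmavo and the places they select.
FA_PLACE = {"fa": 1, "fe": 2, "fi": 3, "fo": 4, "fu": 5}

# Hypothetical term format: (kind, text, fa_tag); fa_tag is None when the
# sumti carries no FA tag, and is ignored for the selbri.
Term = Tuple[str, str, Optional[str]]


def enumerate_sumti_fa(terms: List[Term]) -> List[Tuple[int, str]]:
    """Number the sumti of one simple bridi, honouring FA tags."""
    numbered = []
    next_place = 1
    for kind, text, tag in terms:
        if kind == "selbri":
            # With nothing in front of the selbri, counting resumes at x2.
            next_place = max(next_place, 2)
            continue
        if tag is not None:
            # A FA tag resets the counter to the tagged place.
            next_place = FA_PLACE[tag]
        numbered.append((next_place, text))
        next_place += 1
    return numbered


# mi klama fi la .atlantas. le dargu
print(enumerate_sumti_fa([
    ("sumti", "mi", None),
    ("selbri", "klama", None),
    ("sumti", "la .atlantas.", "fi"),
    ("sumti", "le dargu", None),
]))
```

Here mi is x1, la .atlantas. is x3 because of fi, and le dargu follows on as x4.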

SE is more complicated as it involves backtracking when we want to get the sumti numbered relative to the base gismu.
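
One way to picture the renumbering is as a plain mapping from surface place to base-gismu place, sketched below in Python. This is an illustration only and not the backtracking approach described above; the function name and the idea of passing the SE cmavo as a list are assumptions. Each SE cmavo swaps x1 with one other place (se: x2, te: x3, ve: x4, xe: x5), and stacked conversions are undone from the outermost one inwards.

```python
from typing import Sequence

# Each SE cmavo exchanges x1 with the place listed here.
SE_SWAP = {"se": 2, "te": 3, "ve": 4, "xe": 5}


def base_place(surface_place: int, conversions: Sequence[str]) -> int:
    """Map a place of the converted selbri back to the base gismu.

    `conversions` lists the SE cmavo in the order they appear in front of
    the gismu, e.g. ["se", "te"] for "se te klama".
    """
    place = surface_place
    for cmavo in conversions:
        swapped = SE_SWAP[cmavo]
        if place == 1:
            place = swapped
        elif place == swapped:
            place = 1
        # Any other place is untouched by this conversion.
    return place


print(base_place(1, ["se"]))        # x1 of "se klama" is x2 of klama
print(base_place(2, ["se", "te"]))  # x2 of "se te klama" is x3 of klama
```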

Doing anything with lujvo requires a split form to see what needs to be done, ranging from easy (just a SE-rafsi + a single gismu) via hairy to well nigh impossible.
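
For the easy end of that range only, a Python sketch might look like the following. It is not a lujvo splitter: the three-entry rafsi table is a tiny illustrative sample, the function name is hypothetical, and hyphen letters and longer rafsi chains are ignored. It recognises a SE-rafsi (sel, ter, vel, xel) followed by one known rafsi and returns None for everything else.

```python
from typing import Dict, Optional, Tuple

# SE-rafsi and the SE cmavo they stand for.
SE_RAFSI = {"sel": "se", "ter": "te", "vel": "ve", "xel": "xe"}

# Tiny illustrative sample; a real tool would load the full rafsi list.
RAFSI_TO_GISMU: Dict[str, str] = {"bri": "bridi", "ski": "skicu", "ma'o": "cmavo"}


def split_se_lujvo(
    word: str, rafsi_to_gismu: Dict[str, str] = RAFSI_TO_GISMU
) -> Optional[Tuple[str, str]]:
    """Split 'SE-rafsi + one rafsi' lujvo into (SE cmavo, gismu).

    Returns None when the word does not fit this simplest pattern, which
    also covers gismu such as "selfu" that merely begin with "sel".
    """
    prefix, tail = word[:3], word[3:]
    se = SE_RAFSI.get(prefix)
    gismu = rafsi_to_gismu.get(tail)
    if se is None or gismu is None:
        return None
    return se, gismu


print(split_se_lujvo("selbri"))  # se + bridi
print(split_se_lujvo("velski"))  # ve + skicu
print(split_se_lujvo("brivla"))  # not the simple pattern
```

Once a word is split this way, its places can be numbered relative to the base gismu exactly as for an explicit SE conversion.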

In this context tanru are more or less like split lujvo.

I haven't yet at all thought about the effect of connectives, but the added complexity probably ranges from trivial to hairy.

I'd start with the easy bits as it is always better to have something reasonably soon than a promised perfection probably never. I can add sumti enumeration including the FA/SE cases (excluding lujvo and tanru) to my parser pretty soon and the simplest lujvo case (SE-rafsi + gismu) as soon as I get a lujvo splitter done. At this stage I'd draw the line here. Pretty soon means sometime in August as I'll spend July doing something completely different, like attending 60+ chamber music concerts, which doesn't, however, mean I can necessarily completely stop tinkering with ideas in my head, alas.

  Veijo

--

  web site: http://galactinus.net/vilva/

--
You received this message because you are subscribed to the Google Groups "lojban" group.
To post to this group, send email to lojban@googlegroups.com.
To unsubscribe from this group, send email to lojban+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/lojban?hl=en.
--14dae9cfc83075840c04c3986706--