From: Veijo Vilva <veijo.vilva@gmail.com>
To: lojban@googlegroups.com, lojban-list@lojban.org
Date: Fri, 29 Jun 2012 11:40:44 +0300
Subject: Re: [lojban] A tool we'll need
In-Reply-To: <20120604053049.GR8656@stodi.digitalkingdom.org>

On 4 June 2012 08:30, Robin Lee Powell <rlpowell@digitalkingdom.org> wrote:

> Sooner or later, we're going to need something that can go through
> the corpus ( http://www.lojban.org/corpus/ ) and answer questions
> like "show me all sentences in which the x3 of tubnu is filled", to
> aid figuring out how to fix the various gismu list problems.

I've given some thought to this while working on my parser. There are problems: easy ones and increasingly difficult ones.

First of all, the corpus must be "cleaned" to pass a parser. Obvious mistakes can be corrected, incomprehensible sections removed, and intentional deviations from the morphology and/or the grammar (like some of the material in Alice) commented out and provided with an alternative form that passes the parser.

Leaving lujvo, tanru and all the mess with connectives aside, things are pretty easy.

Enumerating the sumti in simple sentences with no FA/SE is a trivial exercise, and I've got a piece of code to do it at the output stage of my parser.
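The simple case can be sketched in a few lines. This is not the parser code mentioned above: it assumes the sumti have already been grouped into strings and split into those before and after the selbri (the name `enumerate_places` is made up for illustration), and it applies the usual positional rule, including the observative case where nothing precedes the selbri and the first trailing sumti fills x2.

```python
def enumerate_places(before, after):
    """Number the sumti of a simple bridi that has no FA or SE.

    `before` and `after` are the sumti (already grouped into strings)
    that precede and follow the selbri. Returns {place_number: sumti}.
    """
    places = {}
    n = 1
    for sumti in before:
        places[n] = sumti
        n += 1
    # Observative: with no sumti before the selbri, x1 is left
    # unexpressed and the first trailing sumti fills x2.
    if not before:
        n = 2
    for sumti in after:
        places[n] = sumti
        n += 1
    return places


# "mi klama le zarci" and the observative "klama le zarci"
print(enumerate_places(["mi"], ["le zarci"]))  # {1: 'mi', 2: 'le zarci'}
print(enumerate_places([], ["le zarci"]))      # {2: 'le zarci'}
```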

FA is slightly more involved but can probably also be done without any extra information from the syntax stage.
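A FA tag can be handled in the same single pass by letting the tag reset the place counter, with untagged sumti continuing from the tagged place. A minimal sketch under the same assumptions; the flat `items` list with a `"SELBRI"` marker is a hypothetical input format, not the output of any existing parser.

```python
# FA cmavo and the place each one selects.
FA = {"fa": 1, "fe": 2, "fi": 3, "fo": 4, "fu": 5}


def enumerate_places_fa(items):
    """Number the sumti of a simple bridi that may contain FA tags.

    `items` is in sentence order; each entry is either the string
    "SELBRI" (where the selbri stands) or a (tag, sumti) pair whose
    tag is a FA cmavo, or None for an untagged sumti.
    """
    places = {}
    n = 1
    seen_sumti = False
    for item in items:
        if item == "SELBRI":
            # Observative: nothing before the selbri means x1 is elided.
            if not seen_sumti:
                n = 2
            continue
        tag, sumti = item
        if tag is not None:
            n = FA[tag]  # the tag sets the place explicitly...
        places[n] = sumti
        n += 1  # ...and untagged sumti continue from there
        seen_sumti = True
    return places


# "mi klama fi le zdani": le zdani fills x3
print(enumerate_places_fa([(None, "mi"), "SELBRI", ("fi", "le zdani")]))
```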

SE is more complicated, as it involves backtracking when we want to get the sumti numbered relative to the base gismu.
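Once the SE cmavo in front of a selbri are known, mapping its places back to the base gismu is a simple permutation: se swaps x1 and x2, te x1 and x3, ve x1 and x4, xe x1 and x5. The sketch below assumes the SE cmavo have already been collected (it does not attempt the backtracking mentioned above); `base_place` and `renumber` are hypothetical names. It also shows how the tubnu x3 query from the quoted message could be checked for a single sentence.

```python
# Each SE cmavo swaps x1 with the place given here.
SE = {"se": 2, "te": 3, "ve": 4, "xe": 5}


def base_place(se_cmavo, n):
    """Map place n of a converted selbri to the place of the base gismu.

    `se_cmavo` lists the SE cmavo in sentence order, so "se te broda"
    is ["se", "te"]. The leftmost SE is the last conversion applied
    when building the selbri, so it is undone first.
    """
    for cmavo in se_cmavo:
        k = SE[cmavo]
        if n == 1:
            n = k
        elif n == k:
            n = 1
    return n


def renumber(places, se_cmavo):
    """Renumber a {place: sumti} dict relative to the base gismu."""
    return {base_place(se_cmavo, n): s for n, s in places.items()}


# "ko'a te tubnu": the x1 of "te tubnu" is the x3 of tubnu,
# so this sentence does fill the x3 of tubnu.
places = renumber({1: "ko'a"}, ["te"])
print(places)       # {3: "ko'a"}
print(3 in places)  # True
```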

Doing anything with lujvo requires a split form to see what needs to be done; the cases range from easy (just a SE-rafsi + a single gismu) via hairy to well-nigh impossible.
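For the easy lujvo case, a splitter only needs to recognise a leading SE-rafsi (sel, ter, vel, xel) followed by one more rafsi or a whole gismu. This sketch assumes the word is already known to be a lujvo and uses a tiny hypothetical rafsi table in place of the full rafsi list; anything it cannot match is returned as None and left for the harder cases.

```python
# CVC rafsi of the SE cmavo.
SE_RAFSI = {"sel": "se", "ter": "te", "vel": "ve", "xel": "xe"}

# Hypothetical excerpt of a rafsi -> gismu table; a real splitter would
# load the full rafsi list. A gismu may also stand whole as the final rafsi.
RAFSI = {"bri": "bridi", "vecnu": "vecnu"}


def split_se_lujvo(lujvo):
    """Split a two-part lujvo of the form SE-rafsi + rafsi/gismu.

    Returns (se_cmavo, gismu), or None if the word does not have that
    shape or its second part is missing from the table.
    """
    head, rest = lujvo[:3], lujvo[3:]
    if head not in SE_RAFSI:
        return None
    if rest.startswith("y"):  # drop a y-hyphen, if any
        rest = rest[1:]
    gismu = RAFSI.get(rest)
    if gismu is None:
        return None
    return SE_RAFSI[head], gismu


print(split_se_lujvo("selbri"))    # ('se', 'bridi')
print(split_se_lujvo("tervecnu"))  # ('te', 'vecnu')
print(split_se_lujvo("selci"))     # None: a gismu, not sel + rafsi
```

The returned SE cmavo can then be treated like a separate se/te/ve/xe in front of the gismu, so the SE place renumbering described above applies unchanged.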

In this context tanru are more or less like split lujvo.

I haven't yet thought at all about the effect of connectives, but the added complexity probably ranges from trivial to hairy.

I'd start with the easy bits, as it is always better to have something reasonably soon than a promised perfection that probably never arrives. I can add sumti enumeration including the FA/SE cases (excluding lujvo and tanru) to my parser pretty soon, and the simplest lujvo case (SE-rafsi + gismu) as soon as I get a lujvo splitter done. At this stage I'd draw the line here. "Pretty soon" means sometime in August, as I'll spend July doing something completely different, like attending 60+ chamber music concerts; that doesn't, however, necessarily mean I can completely stop tinkering with ideas in my head, alas.

Veijo

--

web site: http://galactinus.net/vilva/
on Google+: https://plus.google.com/106533767817816079660/posts

--
You received this message because you are subscribed to the Google Groups "lojban" group.
To post to this group, send email to lojban@googlegroups.com.
To unsubscribe from this group, send email to lojban+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/lojban?hl=en.