Date: Fri, 29 Oct 2010 11:44:11 -0600
From: ".alyn.post."
To: lojban@googlegroups.com
Subject: Re: [lojban] lujvo deconstruction
Message-ID: <20101029174411.GF47249@alice.local>
References: <20101029170344.GB47249@alice.local>

I think your message here contains the kernel of the solution,
namely that you can't just chop off three letters and call that
a rafsi, but you must grab three (four)
letters that form a valid rafsi or it isn't one.

The PEG grammar for Lojban morphology:

  http://www.lojban.org/tiki/tiki-index.php?page=BPFK+Section%3A+PEG+Morphology+Algorithm

shows what makes a valid lujvo, and the process is more subtle than
"grab three letters and pretend they're a rafsi." But if you follow
the formal grammar, you'll get an abstract syntax tree that fully
delimits each piece of the lujvo.

-Alan

On Fri, Oct 29, 2010 at 01:37:23PM -0400, Luke Bergen wrote:
> Actually I guess that was a bad example at the end because a lujvo ending
> with "rat" would definitely be wrong. But you get where I'm going with it.
>
> On Fri, Oct 29, 2010 at 1:34 PM, Luke Bergen <[1]lukeabergen@gmail.com> wrote:
>
> > Sorry, yes, I was providing very rough pseudocode for my script. I do
> > look from left to right. But since rafsi are always 3 letters (minus any
> > ' characters and excluding 4 letter rafsi), I take them in chunks of 3.
> >
> > an example with morsi would be "xamymro". My code would go like:
> >
> >   grab left most three chars, check for .y'ys and grab a fourth char if
> >   there is a .y'y
> >   look up the rafsi, chop off what you found to be the "leftmost" rafsi
> >   and loop again with what you have left
> >
> > Now we're looking at "ymro"
> >
> > Strip off "y" and we're left with "mro". Now because I'm assuming that
> > "r", "l", "m", or "n" followed by a consonant is a buffer vowel, I see
> > "mro" and think "ok, the 'm' is a buffer vowel so grab another char so
> > we're back to a 3 letter rafsi", I then try to grab whatever comes after
> > "o" and get a null-pointer or some such.
> >
> > It just occurred to me that I might deal with 4 letter rafsi by keeping
> > in mind that they always end with "y". So my revised "grab leftmost
> > rafsi" code would look something like:
> >
> >   word = xajmymro
> >   if (word = "....y") // where this is "word" = any 4 characters followed by an "y"
> >     return substring(word, 0, 4)
> >
> > Then in the calling function I just have to look for gismu of the form
> > rafsi+a, rafsi+e, etc...
> > till I find one that matches a gismu.
> >
> > I'm still stuck on the buffer consonant problem though.
> >
> > It feels wrong to use guesswork like "if you see [r|l|m|n]C then check
> > to see if it's a valid rafsi, if it's not, strip off the [r|l|m|n], grab
> > another char from the right, and look THAT up and see if it's a rafsi".
> >
> > Here's a non-code way to think of the problem. How would a parser figure
> > out whether "co'amrobratroci" is "co'a mro bra troci" or "co'a m rob rat
> > ro ci"?
> >
> > On Fri, Oct 29, 2010 at 1:03 PM, .alyn.post. <[2]alyn.post@lodockikumazvati.org> wrote:
> >
> > > On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen wrote:
> > > > When I first started learning lojban I wrote up a quick'n dirty script to
> > > > make looking up words faster and easier. gismu and cmavo were easy, but I
> > > > could never figure out lujvo. So I'm taking another stab at it. I
> > > > currently have something that works in the general cases of {bajdri},
> > > > {ba'udri}, and {bagypau}. But currently I'm not sure how to deal with 4
> > > > letter rafsi and non "y" buffer letters.
> > > >
> > > > To deal with the non "y" buffer letters I thought I could just say:
> > > >
> > > >   strip all "y" from the word
> > > >   get first three non "'" chars
> > > >   if the first letter is "r", "l", "m", or "n" and the second letter is a
> > > >   consonant, then chop off the first letter and grab another letter from the
> > > >   right
> > > >
> > > > (so if I was parsing "bacru zei bevri" = "ba'urbei" I would (after
> > > > handling ba'u in the first iteration) end up with "rbe" and due to the
> > > > above step, I'd strip off the "r" and grab the next letter thus ending
> > > > with "bei" which is the right result).
> > > >
> > > > But this produces strange results because there ARE cases where buffer
> > > > letters are followed by consonants (morsi for instance).
> > > >
> > > > Is there a way to un-ambiguously and algorithmically break a lujvo down
> > > > into its component gismu?
> > >
> > > I haven't rigorously looked at this, so please excuse me if I'm way
> > > off base.
> > >
> > > What if you start at the left side of the word and match characters
> > > until you get a matching rafsi, then look for optional buffer
> > > characters before matching your next rafsi, &c? You could be much
> > > more sophisticated by adding detection for valid lerfu clustering
> > > to throw out what would otherwise be an ambiguous case.
> > >
> > > It sounds like you're working top down on the problem rather than
> > > going from left to right, but I don't know what is wrong with my
> > > suggestion yet.
> > >
> > > I see you've provided 3 simple examples, but can you provide an
> > > example for morsi which you mention at the end?
> > >
> > > -Alan
> > > --
> > > .i ko djuno fi le do sevzi
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "lojban" group.
> > > To post to this group, send email to [3]lojban@googlegroups.com.
> > > To unsubscribe from this group, send email to
> > > [4]lojban+unsubscribe@googlegroups.com.
> > > For more options, visit this group at
> > > [5]http://groups.google.com/group/lojban?hl=en.
>
> References
>
> Visible links
> 1. mailto:lukeabergen@gmail.com
> 2. mailto:alyn.post@lodockikumazvati.org
> 3. mailto:lojban@googlegroups.com
> 4. mailto:lojban%2Bunsubscribe@googlegroups.com
> 5. http://groups.google.com/group/lojban?hl=en

--
.i ko djuno fi le do sevzi
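
For anyone following the thread, here is a rough Python sketch of the
left-to-right idea: match a known rafsi, optionally skip one hyphen
letter, and backtrack when a choice leads nowhere. It is not the PEG
morphology algorithm and it checks none of the consonant-cluster or
hyphen-placement rules, so it may accept strings that are not valid
lujvo. The names split_lujvo, RAFSI, GISMU and HYPHENS are made up for
this example, the two tables are a tiny hand-picked subset that should
be checked against the official gismu and rafsi lists, and treating
"y", "r" and "n" as the only hyphen letters is an assumption.

```python
# Hypothetical lookup tables covering only the words used in this thread.
# A real tool would load the complete gismu and rafsi lists instead.
RAFSI = {
    "ba'u": "bacru", "bei": "bevri", "mro": "morsi", "xam": "xamgu",
    "bra": "barda", "baj": "bajra", "dri": "badri", "bag": "bargu",
    "pau": "pagbu", "co'a": "co'a",
}
GISMU = {"bacru", "bevri", "morsi", "xamgu", "barda", "bajra",
         "badri", "bargu", "pagbu", "troci", "xajmi"}
HYPHENS = "yrn"  # assumed set of letters that may sit between two rafsi


def split_lujvo(word):
    """Return every split of `word` that the tables above permit."""
    results = []

    def walk(pos, parts):
        rest = word[pos:]
        # A full five-letter gismu may only appear as the last component.
        if parts and rest in GISMU:
            results.append(parts + [rest])
        # Short rafsi that match at the current position.
        candidates = [(len(r), g) for r, g in RAFSI.items()
                      if rest.startswith(r)]
        # Four-letter rafsi: a gismu minus its final vowel, followed by "y".
        candidates += [(4, g) for g in sorted(GISMU)
                       if len(rest) > 5 and rest.startswith(g[:4] + "y")]
        for length, meaning in candidates:
            end = pos + length
            if end == len(word):
                # A rafsi may end the word only if something precedes it.
                if parts:
                    results.append(parts + [meaning])
                continue
            # Try the next rafsi directly, then again after one hyphen.
            walk(end, parts + [meaning])
            if word[end] in HYPHENS:
                walk(end + 1, parts + [meaning])

    walk(0, [])
    return results


if __name__ == "__main__":
    for w in ["bajdri", "ba'urbei", "xamymro", "xajmymro",
              "co'amrobratroci"]:
        print(w, split_lujvo(w))
```

Because the function returns every split the tables allow, an ambiguous
word shows up as more than one result rather than a silent wrong guess.
With a full rafsi list, a reading such as "co'a m rob rat ro ci" would
still be rejected here only because a lone "m" is not a rafsi; the
finer rules belong to the formal grammar.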