Date: Fri, 29 Oct 2010 11:03:44 -0600
From: ".alyn.post."
To: lojban@googlegroups.com
Subject: Re: [lojban] lujvo deconstruction

On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen wrote:
> When I first started learning lojban I wrote up a quick'n dirty script to
> make looking up words faster and easier. gismu and cmavo were easy, but I
> could never figure out lujvo. So I'm taking another stab at it. I
> currently have something that works in the general cases of {bajdri},
> {ba'udri}, and {bagypau}. But currently I'm not sure how to deal with 4
> letter rafsi and non "y" buffer letters.
>
> To deal with the non "y" buffer letters I thought I could just say:
>
> strip all "y" from the word
> get first three non "'" chars
> if the first letter is "r", "l", "m", or "n" and the second letter is a
> consonant, then chop off the first letter and grab another letter from the
> right
>
> (so if I was parsing "bacru zei bevri" = "ba'urbei" I would (after
> handling ba'u in the first iteration) end up with "rbe" and due to the
> above step, I'd strip off the "r" and grab the next letter thus ending
> with "bei" which is the right result).
>
> But this produces strange results because there ARE cases where buffer
> letters are followed by consonants (morsi for instance).
>
> Is there a way to un-ambiguously and algorithmically break a lujvo down
> into its component gismu?
>

I haven't rigorously looked at this, so please excuse me if I'm way
off base.

What if you start at the left side of the word and match characters
until you get a matching rafsi, then look for optional buffer
characters before matching your next rafsi, &c?

You could be much more sophisticated by adding detection for valid
lerfu clustering to throw out what would otherwise be an ambiguous
case.

It sounds like you're working top down on the problem rather than
going from left to right, but I don't know what is wrong with my
suggestion yet. I see you've provided 3 simple examples, but can
you provide an example for morsi which you mention at the end?

-Alan
--
.i ko djuno fi le do sevzi
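The left-to-right matching suggested in the reply can be sketched roughly as below. This is only an illustration under stated assumptions, not a complete lujvo parser: the RAFSI table is a tiny hand-written sample whose gismu assignments should be checked against the official rafsi list, the function name split_lujvo is made up for this sketch, and the hyphen handling simply allows one optional "y", "r", or "n" after any rafsi without checking the consonant-cluster rules mentioned above.

```python
# Tiny sample table (rafsi -> gismu). Illustrative only: replace it with
# the full official rafsi list before relying on the results.
RAFSI = {
    "baj": "bajra",
    "dri": "badri",
    "ba'u": "bacru",
    "bag": "bargu",
    "pau": "pagbu",
    "bei": "bevri",
}

# Letters that may sit between two rafsi as a buffer (hyphen).
HYPHENS = ("y", "r", "n")


def split_lujvo(word, table=RAFSI):
    """Return every way to read `word` as rafsi from `table`, left to right.

    Each result is a list of rafsi. An empty list means no reading was
    found; more than one entry means the table alone leaves the word
    ambiguous.
    """
    results = []

    def walk(pos, parts):
        if pos == len(word):
            results.append(parts)
            return
        # Try every prefix starting at `pos` that is a known rafsi.
        for end in range(pos + 1, len(word) + 1):
            piece = word[pos:end]
            if piece not in table:
                continue
            matched = parts + [piece]
            # Case 1: the next rafsi starts right after this one.
            walk(end, matched)
            # Case 2: one buffer letter is skipped first. A buffer letter
            # cannot be the last character of the word.
            if end + 1 < len(word) and word[end] in HYPHENS:
                walk(end + 1, matched)

    walk(0, [])
    return results


if __name__ == "__main__":
    for lujvo in ("bajdri", "ba'udri", "bagypau", "ba'urbei"):
        for parts in split_lujvo(lujvo):
            print(lujvo, "=", " zei ".join(RAFSI[p] for p in parts))
```

Because a rafsi is matched before any buffer letter is considered, adding the 4-letter rafsi "mors" (for morsi) to the table lets its "r" be consumed as part of the rafsi rather than being mistaken for a buffer. Returning all readings instead of the first one makes any remaining ambiguity visible, which is where a check for valid consonant clusters could be added.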