From: Adam Lopresto
Date: Fri, 29 Oct 2010 12:31:53 -0500
Subject: Re: [lojban] lujvo deconstruction
To: lojban@googlegroups.com
In-Reply-To: <20101029170344.GB47249@alice.local>

When I attacked this in a Perl script some time ago (attached as
jvokatna.pl), I worked by creating regexps for the different classes of
rafsi, and then attempting to match the entire word against more and more
rafsi at a time until something fit. It has some false positives (it treats
illegal lujvo and cmavo clusters as though they were normal lujvo), but it
gets the job done.

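As a rough illustration of the approach described above (one regexp per class of rafsi, then trying one, two, three... leading rafsi until the whole word fits), here is a minimal Python sketch. It is not the attached Perl script: the function name `split_lujvo` and the exact patterns are assumptions made for this example. Like the approach it illustrates, it does not check that the result is a well-formed lujvo, so it will accept some invalid words.

```python
import re

# Building blocks (assumed character classes for this sketch).
C = "[bcdfgjklmnprstvxz]"          # any consonant
V = "[aeiou]"                      # any vowel
# Consonant pairs allowed at the start of a rafsi.
CC = ("(?:bl|br|cf|ck|cl|cm|cn|cp|cr|ct|dj|dr|dz|fl|fr|gl|gr|"
      "jb|jd|jg|jm|jv|kl|kr|ml|mr|pl|pr|sf|sk|sl|sm|sn|sp|sr|st|"
      "tc|tr|ts|vl|vr|xl|xr|zb|zd|zg|zm|zv)")
VV = "(?:ai|ei|oi|au)"             # diphthongs

RAFSI3V = f"(?:{CC}{V}|{C}{VV}|{C}{V}'{V})"    # short rafsi ending in a vowel
RAFSI3 = f"(?:{RAFSI3V}|{C}{V}{C})"            # any short rafsi
RAFSI4 = f"(?:{C}{V}{C}{C}|{CC}{V}{C})"        # gismu minus its final vowel
RAFSI5 = f"{RAFSI4}{V}"                        # full gismu


def split_lujvo(word):
    """Return the list of rafsi in word, or None if nothing fits."""
    word = word.replace("h", "'")              # accept h for the apostrophe
    for n in range(1, len(word) // 3 + 1):     # n leading rafsi + 1 final
        # A short rafsi may be followed by an n/r/y hyphen (tried lazily);
        # a 4-letter rafsi must be followed by y.
        leading = f"(?:({RAFSI3})[nry]??|({RAFSI4})y)" * n
        # The final rafsi has to end in a vowel.
        m = re.fullmatch(f"{leading}({RAFSI3V}|{RAFSI5})", word)
        if m:
            return [g for g in m.groups() if g is not None]
    return None


print(split_lujvo("bajdri"))      # ['baj', 'dri']
print(split_lujvo("ba'urbei"))    # ["ba'u", 'bei']
print(split_lujvo("bagypau"))     # ['bag', 'pau']
```

Each repetition of the leading pattern adds its own capture groups, so the groups that actually matched are exactly the rafsi in order. A bare gismu such as "morsi" returns None, because at least two rafsi are required.
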
On Fri, Oct 29, 2010 at 12:03 PM, .alyn.post. wrote:
> On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen wrote:
>> When I first started learning lojban I wrote up a quick'n dirty script to
>> make looking up words faster and easier. gismu and cmavo were easy, but I
>> could never figure out lujvo. So I'm taking another stab at it. I
>> currently have something that works in the general cases of {bajdri},
>> {ba'udri}, and {bagypau}. But currently I'm not sure how to deal with 4
>> letter rafsi and non "y" buffer letters.
>> To deal with the non "y" buffer letters I thought I could just say:
>> strip all "y" from the word
>> get first three non "'" chars
>> if the first letter is "r", "l", "m", or "n" and the second letter is a
>> consonant, then chop off the first letter and grab another letter from the
>> right
>> (so if I was parsing "bacru zei bevri" = "ba'urbei" I would (after
>> handling ba'u in the first iteration) end up with "rbe" and due to the
>> above step, I'd strip off the "r" and grab the next letter thus ending
>> with "bei" which is the right result).
>> But this produces strange results because there ARE cases where buffer
>> letters are followed by consonants (morsi for instance).
>> Is there a way to un-ambiguously and algorithmically break a lujvo down
>> into its component gismu?
>>
>
> I haven't rigorously looked at this, so please excuse me if I'm way
> off base.
>
> What if you start at the left side of the word and match characters
> until you get a matching rafsi, then look for optional buffer
> characters before matching your next rafsi, &c?  You could be much
> more sophisticated by adding detection for valid lerfu clustering
> to throw out what would otherwise be an ambiguous case.
>
> It sounds like you're working top down on the problem rather than
> going from left to right, but I don't know what is wrong with my
> suggestion yet.
>
> I see you've provided 3 simple examples, but can you provide an
> example for morsi which you mention at the end?
>
> -Alan
> --
> .i ko djuno fi le do sevzi
>
> --
> You received this message because you are subscribed to the Google Groups "lojban" group.
> To post to this group, send email to lojban@googlegroups.com.
> To unsubscribe from this group, send email to lojban+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/lojban?hl=en.

[Attachment: jvokatna.pl]

#!/usr/bin/perl -lw

$C = qr/[bcdfgjklmnprstvxz]/;
$V = qr/[aeiou]/;

# consonant pairs that may begin a rafsi
$CC = qr/(?:
    bl|br|
    cf|ck|cl|cm|cn|cp|cr|ct|
    dj|dr|dz|
    fl|fr|
    gl|gr|
    jb|jd|jg|jm|jv|
    kl|kr|
    ml|mr|
    pl|pr|
    sf|sk|sl|sm|sn|sp|sr|st|
    tc|tr|ts|
    vl|vr|xl|xr|zb|zd|zg|zm|zv
)/x;

# diphthongs
$VV = qr/(?:ai|ei|oi|au)/;

# rafsi classes: short forms ending in a vowel, all short forms,
# 4-letter forms (gismu minus final vowel), and full 5-letter gismu
$rafsi3v = qr/(?:$CC$V|$C$VV|$C$V'$V)/;
$rafsi3 = qr/(?:$rafsi3v|$C$V$C)/;
$rafsi4 = qr/(?:$C$V$C$C|$CC$V$C)/;
$rafsi5 = qr/$rafsi4$V/;

INPUT: for (@ARGV){
    # accept h as a stand-in for the apostrophe
    s/h/'/g;

    # try one leading rafsi, then two, and so on, until the whole word matches
    for my $i (1 .. length()/3){
        # short rafsi may take an n/r/y hyphen; 4-letter rafsi always take y
        my $re = "(?:($rafsi3)[nry]??|($rafsi4)y)" x $i;
        # the final rafsi must end in a vowel
        my @matches = /^$re($rafsi3v|$rafsi5)$/ or next;

        # print only the groups that actually matched, one rafsi per line
        print join "\n", map {defined($_) ? $_ : ()} @matches;
        next INPUT;
    }

    warn("Unrecognizable lujvo: $_");
}
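
Splitting a lujvo gives rafsi, while the quoted question asks for the component gismu, so one more lookup step is needed. The following is only a sketch of that step in Python. `SHORT_RAFSI`, `GISMU`, `rafsi_to_gismu` and `lujvo_gismu` are hypothetical names, and the tables hold just the four words that come up in this thread; a real tool would load the complete gismu list with all of its short rafsi.

```python
# Hypothetical, deliberately tiny tables covering only words from this thread.
SHORT_RAFSI = {
    "baj": "bajra",
    "dri": "badri",
    "ba'u": "bacru",
    "bei": "bevri",
}
GISMU = sorted(set(SHORT_RAFSI.values()))


def rafsi_to_gismu(rafsi):
    """Return the gismu a rafsi stands for, or None if it is not in the tables."""
    if "'" not in rafsi and len(rafsi) == 5:
        # A 5-letter rafsi is the gismu itself.
        return rafsi if rafsi in GISMU else None
    if "'" not in rafsi and len(rafsi) == 4:
        # A 4-letter rafsi is the gismu minus its final vowel.
        for gismu in GISMU:
            if gismu[:4] == rafsi:
                return gismu
        return None
    # Short rafsi (CVC, CCV, CVV, CV'V) need an explicit table.
    return SHORT_RAFSI.get(rafsi)


def lujvo_gismu(rafsi_list):
    """Map each rafsi of an already split lujvo to its gismu."""
    return [rafsi_to_gismu(r) for r in rafsi_list]


print(lujvo_gismu(["ba'u", "bei"]))   # ['bacru', 'bevri']
print(lujvo_gismu(["baj", "dri"]))    # ['bajra', 'badri']
```

Only the short rafsi need a table; the 4- and 5-letter forms can be resolved from the gismu list alone.
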