From: Jonathan Jones
To: lojban@googlegroups.com
Date: Mon, 1 Nov 2010 18:07:05 -0600
Subject: Re: [lojban] lujvo deconstruction

On Fri, Oct 29, 2010 at 10:08 AM, Luke Bergen <lukeabergen@gmail.com> wrote:
> When I first started learning lojban I wrote up a quick'n dirty script to make looking up words faster and easier.  gismu and cmavo were easy, but I could never figure out lujvo.  So I'm taking another stab at it.  I currently have something that works in the general cases of {bajdri}, {ba'udri}, and {bagypau}.  But currently I'm not sure how to deal with 4 letter rafsi and non "y" buffer letters.
>
> To deal with the non "y" buffer letters I thought I could just say:
>
> strip all "y" from the word
> get first three non "'" chars
> if the first letter is "r", "l", "m", or "n" and the second letter is a consonant, then chop off the first letter and grab another letter from the right
> (so if I was parsing "bacru zei bevri" = "ba'urbei" I would (after handling ba'u in the first iteration) end up with "rbe" and due to the above step, I'd strip off the "r" and grab the next letter thus ending with "bei" which is the right result).
>
> But this produces strange results because there ARE cases where buffer letters are followed by consonants (morsi for instance).
>
> Is there a way to un-ambiguously and algorithmically break a lujvo down into its component gismu?
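
For concreteness, here is a minimal Python sketch of the heuristic as described in the quoted message. The names `take_three` and `naive_split` are my own, and I am assuming the three steps simply repeat until the word is used up, which the description does not spell out.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")
BUFFER_LETTERS = set("rlmn")  # the four letters named in the quoted description


def take_three(word, start):
    """Return the end index of a chunk holding three letters, not counting apostrophes."""
    count = 0
    end = start
    while end < len(word) and count < 3:
        if word[end] != "'":
            count += 1
        end += 1
    return end


def naive_split(lujvo):
    """Chunk a lujvo with the quoted heuristic (hypothetical helper, not a real parser)."""
    word = lujvo.replace("y", "")  # step 1: strip all "y"
    pieces = []
    pos = 0
    while pos < len(word):
        end = take_three(word, pos)  # step 2: first three non-apostrophe letters
        chunk = word[pos:end]
        # Step 3: a leading r/l/m/n before a consonant is treated as a buffer letter.
        if len(chunk) >= 2 and chunk[0] in BUFFER_LETTERS and chunk[1] in CONSONANTS:
            pos += 1
            end = take_three(word, pos)
            chunk = word[pos:end]
        pieces.append(chunk)
        pos = end
    return pieces


for w in ("bajdri", "ba'udri", "bagypau", "ba'urbei", "mrobi'o"):
    print(w, "->", " + ".join(naive_split(w)))
```

This gives baj + dri, ba'u + dri, bag + pau and ba'u + bei as hoped, but for mrobi'o (mro + bi'o, using a rafsi of morsi) it drops the leading "m" and returns rob + i'o, because a real rafsi can itself begin with "m" followed by a consonant.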

Jvozba is a web front end to a program that can both form and deconstruct lujvo; in my experience it has never made a mistake at either task. When forming lujvo, it offers multiple suggestions along with their scores: http://jwodder.freeshell.org/lojban/jvozba.cgi

I believe this page is the source code for jvozba: http://jwodder.freeshell.org/lojban/jvozba.cgi?sehicta=1

You'll want to talk to the author about this, as he is the person best placed to help.
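
Until then, here is a rough sketch of one way to split a lujvo without guessing at hyphens letter by letter: match the rafsi shapes (CVC, CCV, CVV, CV'V, the 4-letter forms before a "y", and a full gismu at the end) and backtrack. This is my own illustration, not Jvozba's code. `split_lujvo` and the other names are made up, it checks shapes only (not that each piece is a real rafsi), it skips the finer morphology rules such as checks on consonant clusters between pieces, and it assumes the hyphen letters are "y", "r" and "n".

```python
import re

# The 48 consonant pairs that may begin a Lojban word.
INITIAL_PAIRS = (
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv "
    "kl kr ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr xl xr zb zd zg zm zv"
).split()

CONSONANTS = set("bcdfgjklmnprstvxz")
VOWELS = set("aeiou")
C, V = "[bcdfgjklmnprstvxz]", "[aeiou]"
CC = "(?:" + "|".join(INITIAL_PAIRS) + ")"

# Short rafsi: CV'V, CVV, CCV, CVC. The first three end in a vowel and may also come last.
SHORT = [re.compile(p) for p in (f"{C}{V}'{V}", f"{C}(?:ai|au|ei|oi)", f"{CC}{V}", f"{C}{V}{C}")]
# 4-letter rafsi (CVCC, CCVC), accepted only when followed by the hyphen "y".
LONG = [re.compile(p) for p in (f"{C}{V}{C}{C}", f"{CC}{V}{C}")]
# Full gismu shapes (CVCCV, CCVCV), accepted only as the last component.
GISMU = [re.compile(p) for p in (f"{C}{V}{C}{C}{V}", f"{CC}{V}{C}{V}")]


def split_lujvo(word):
    """Return every shape-valid split of `word` into rafsi, with hyphens dropped."""
    results = []

    def walk(pos, parts):
        rest = word[pos:]
        # Case 1: `rest` as a whole is the final component.
        if parts and any(p.fullmatch(rest) for p in GISMU + SHORT[:3]):
            results.append(parts + [rest])
        # Case 2: a 4-letter rafsi followed by "y".
        for pat in LONG:
            m = pat.match(rest)
            if m and rest[m.end():m.end() + 1] == "y":
                walk(pos + m.end() + 1, parts + [m.group()])
        # Case 3: a short rafsi, optionally followed by a hyphen letter.
        for pat in SHORT:
            m = pat.match(rest)
            if not m or m.end() == len(rest):
                continue
            piece, after = m.group(), rest[m.end():]
            nxt = pos + m.end()
            walk(nxt, parts + [piece])  # no hyphen
            if piece[-1] in CONSONANTS and after[0] == "y":
                walk(nxt + 1, parts + [piece])  # "y" hyphen after a CVC rafsi
            cvv_first = not parts and piece[1] in VOWELS and piece[-1] in VOWELS
            r_hyphen = after[0] == "r" and after[1:2] in CONSONANTS - {"r"}
            n_hyphen = after[:2] == "nr"
            if cvv_first and (r_hyphen or n_hyphen):
                walk(nxt + 1, parts + [piece])  # "r"/"n" hyphen after an initial CVV rafsi

    walk(0, [])
    return results


for w in ("bajdri", "ba'udri", "bagypau", "ba'urbei", "mrobi'o"):
    print(w, "->", [" + ".join(s) for s in split_lujvo(w)])
```

On the words from this thread it finds exactly one split each: baj + dri, ba'u + dri, bag + pau, ba'u + bei, and mro + bi'o. Turning those rafsi back into gismu still needs a lookup in the rafsi list, which this sketch leaves out.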

--
mu'o mi'e .aionys.

.i.a'o.e'e ko cmima le bende pe lo pilno be denpa bu .i doi.luk. mi patfu do zo'o
(Come to the Dot Side! Luke, I am your father. :D )
