From: ".alyn.post."
X-Original-Sender: alyn.post@lodockikumazvati.org
Date: Fri, 29 Oct 2010 12:13:12 -0600
To: lojban@googlegroups.com
Subject: Re: [lojban] lujvo deconstruction
Message-ID: <20101029181312.GG47249@alice.local>
References: <20101029170344.GB47249@alice.local> <20101029174411.GF47249@alice.local>

What language are you going to use, or are you already using?

I've been teaching myself about PEG grammars and packrat parsing, and my
experience so far has been quite positive. Bryan Ford's Master's thesis
is quite easy to read, and as a technique packrat parsing is probably
easier to understand than any other parsing technique.

There are a *lot* of parsing problems in the world that have half-baked
solutions, with someone trying to work around having to understand
parsing. By way of an example, have a look at how syntax highlighting
works in vim:

  http://vim.wikia.com/wiki/Creating_your_own_syntax_files

That is a stunning amount of work put into doing something the wrong
way, all presumably to avoid having to actually learn about parsing.
I can't imagine writing a syntax file for vim that would Do The Right
Thing(tm) with Lojban. You'd be fighting the code trying to make it
act like a parser.

You won't regret using Lojban's formal grammar for your project. :-)

-Alan

On Fri, Oct 29, 2010 at 01:55:42PM -0400, Luke Bergen wrote:
> well shite. I was hoping to get away with a shortcut that wouldn't
> require me to learn and implement a piece of the peg grammar. I don't
> even know PEG. I guess I have a good reason to now.
>
> On Fri, Oct 29, 2010 at 1:44 PM, .alyn.post.
> <alyn.post@lodockikumazvati.org> wrote:
> > I think your message here contains the kernel of the solution,
> > namely that you can't just chop off three letters and call that a
> > rafsi, but you must grab three (four) letters that form a valid
> > rafsi or it isn't one.
> >
> > The PEG grammar for Lojban morphology:
> >
> >   http://www.lojban.org/tiki/tiki-index.php?page=BPFK+Section%3A+PEG+Morphology+Algorithm
> >
> > Shows what makes a valid Lujvo, and the process is more subtle than
> > "grab three letters and pretend they're a rafsi." But if you follow
> > the formal grammar, you'll get an abstract syntax tree that fully
> > delimits each piece of the lujvo.
> >
> > -Alan
> >
> > On Fri, Oct 29, 2010 at 01:37:23PM -0400, Luke Bergen wrote:
> > > Actually I guess that was a bad example at the end because a lujvo
> > > ending with "rat" would definitely be wrong. But you get where I'm
> > > going with it.
> > >
> > > On Fri, Oct 29, 2010 at 1:34 PM, Luke Bergen
> > > <lukeabergen@gmail.com> wrote:
> > > > Sorry, yes, I was providing very rough pseudocode for my script.
> > > > I do look from left to right. But since rafsi are always 3
> > > > letters (minus any ' characters and excluding 4 letter rafsi), I
> > > > take them in chunks of 3. an example with morsi would be
> > > > "xamymro". My code would go like:
> > > >
> > > >   grab left most three chars, check for .y'ys and grab a fourth
> > > >   char if there is a .y'y
> > > >
> > > >   look up the rafsi, chop off what you found to be the "leftmost"
> > > >   rafsi and loop again with what you have left
> > > >
> > > > Now we're looking at "ymro"
> > > >
> > > > Strip off "y" and we're left with "mro". Now because I'm assuming
> > > > that "r", "l", "m", or "n" followed by a consonant is a buffer
> > > > vowel, I see "mro" and think "ok, the 'm' is a buffer vowel so
> > > > grab another char so we're back to a 3 letter rafsi", I then try
> > > > to grab whatever comes after "o" and get a null-pointer or some
> > > > such.
> > > >
> > > > It just occurred to me that I might deal with 4 letter rafsi by
> > > > keeping in mind that they always end with "y". So my revised
> > > > "grab leftmost rafsi" code would look something like:
> > > >
> > > >   word = xajmymro
> > > >   if (word = "....y") // where this is "word" = any 4 characters
> > > >                       // followed by an "y"
> > > >     return substring(word, 0, 4)
> > > >
> > > > Then in the calling function I just have to look for gismu of the
> > > > form rafsi+a, rafsi+e, etc... till I find one that matches a
> > > > gismu.
> > > >
> > > > I'm still stuck on the buffer consonant problem though.
> > > >
> > > > It feels wrong to use guesswork like "if you see [r|l|m|n]C then
> > > > check to see if it's a valid rafsi, if it's not, strip off the
> > > > [r|l|m|n], grab another char from the right, and look THAT up and
> > > > see if it's a rafsi".
> > > >
> > > > Here's a non-code way to think of the problem. How would a parser
> > > > figure out whether "co'amrobratroci" is "co'a mro bra troci" or
> > > > "co'a m rob rat ro ci"?
> > > >
> > > > On Fri, Oct 29, 2010 at 1:03 PM, .alyn.post.
> > > > <alyn.post@lodockikumazvati.org> wrote:
> > > > > On Fri, Oct 29, 2010 at 12:08:09PM -0400, Luke Bergen wrote:
> > > > > > When I first started learning lojban I wrote up a quick'n
> > > > > > dirty script to make looking up words faster and easier.
> > > > > > gismu and cmavo were easy, but I could never figure out
> > > > > > lujvo. So I'm taking another stab at it. I currently have
> > > > > > something that works in the general cases of {bajdri},
> > > > > > {ba'udri}, and {bagypau}. But currently I'm not sure how to
> > > > > > deal with 4 letter rafsi and non "y" buffer letters.
> > > > > >
> > > > > > To deal with the non "y" buffer letters I thought I could
> > > > > > just say:
> > > > > >
> > > > > >   strip all "y" from the word
> > > > > >
> > > > > >   get first three non "'" chars
> > > > > >
> > > > > >   if the first letter is "r", "l", "m", or "n" and the second
> > > > > >   letter is a consonant, then chop off the first letter and
> > > > > >   grab another letter from the right
> > > > > >
> > > > > > (so if I was parsing "bacru zei bevri" = "ba'urbei" I would
> > > > > > (after handling ba'u in the first iteration) end up with
> > > > > > "rbe" and due to the above step, I'd strip off the "r" and
> > > > > > grab the next letter thus ending with "bei" which is the
> > > > > > right result).
> > > > > >
> > > > > > But this produces strange results because there ARE cases
> > > > > > where buffer letters are followed by consonants (morsi for
> > > > > > instance).
> > > > > >
> > > > > > Is there a way to un-ambiguously and algorithmically break a
> > > > > > lujvo down into its component gismu?
> > > > >
> > > > > I haven't rigorously looked at this, so please excuse me if I'm
> > > > > way off base.
> > > > >
> > > > > What if you start at the left side of the word and match
> > > > > characters until you get a matching rafsi, then look for
> > > > > optional buffer characters before matching your next rafsi, &c?
> > > > > You could be much more sophisticated by adding detection for
> > > > > valid lerfu clustering to throw out what would otherwise be an
> > > > > ambiguous case.
> > > > >
> > > > > It sounds like you're working top down on the problem rather
> > > > > than going from left to right, but I don't know what is wrong
> > > > > with my suggestion yet.
> > > > >
> > > > > I see you've provided 3 simple examples, but can you provide an
> > > > > example for morsi which you mention at the end?
> > > > >
> > > > > -Alan
> > > > > --
> > > > > .i ko djuno fi le do sevzi
> >
> > --
> > .i ko djuno fi le do sevzi

--
.i ko djuno fi le do sevzi

--
You received this message because you are subscribed to the Google
Groups "lojban" group.
To post to this group, send email to lojban@googlegroups.com.
To unsubscribe from this group, send email to
lojban+unsubscribe@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/lojban?hl=en.
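
The left-to-right matching discussed in this thread can be sketched in
Python. This is only a rough illustration, not the BPFK PEG morphology
algorithm: the RAFSI table is a small hand-picked subset (a real tool
would load the full rafsi list), the names split_lujvo and is_cvv are
hypothetical, and the hyphen handling (a "y" between rafsi, an "r" or
"n" only after an initial CVV rafsi) is a simplifying assumption. The
point it shows is that backtracking over pieces that must be *known*
rafsi rules out a split like "co'a m rob rat ro ci" without any
guesswork about buffer letters.

```python
from typing import Dict, List

# Illustrative subset of a rafsi -> source word table (an assumption for
# this sketch; check entries against the official rafsi list).
RAFSI: Dict[str, str] = {
    "co'a": "co'a",
    "mro": "morsi",
    "bra": "barda",
    "rat": "ratcu",
    "xam": "xajmi",
    "ba'u": "bacru",
    "bei": "bevri",
    "baj": "bajra",
    "dri": "badri",
    "troci": "troci",  # a full gismu is allowed only as the final piece
}

VOWELS = set("aeiou")


def is_cvv(rafsi: str) -> bool:
    """True for CVV or CV'V shaped rafsi such as "bei" or "ba'u"."""
    letters = rafsi.replace("'", "")
    return (len(letters) == 3
            and letters[0] not in VOWELS
            and letters[1] in VOWELS
            and letters[2] in VOWELS)


def split_lujvo(word: str, first: bool = True) -> List[List[str]]:
    """Return every way to split `word` into known rafsi, left to right.

    This does not validate consonant clusters or any other morphology
    rule; it only refuses pieces that are not in the table.
    """
    if not word:
        return [[]]
    results: List[List[str]] = []
    # Pieces are 3 to 5 characters long (apostrophes count as characters).
    for end in range(3, min(len(word), 5) + 1):
        piece, rest = word[:end], word[end:]
        if piece not in RAFSI:
            continue
        if len(piece) == 5 and rest:
            continue  # a full gismu may only end the word
        # Possible continuations: no hyphen, a "y" hyphen, or (only after
        # an initial CVV rafsi) an "r"/"n" hyphen. A hyphen must be
        # followed by at least one more character.
        tails = [rest]
        if len(rest) > 1 and rest[0] == "y":
            tails.append(rest[1:])
        if first and is_cvv(piece) and len(rest) > 1 and rest[0] in "rn":
            tails.append(rest[1:])
        for tail in tails:
            for parse in split_lujvo(tail, first=False):
                results.append([piece] + parse)
    return results


if __name__ == "__main__":
    for lujvo in ("co'amrobratroci", "xamymro", "ba'urbei"):
        for parse in split_lujvo(lujvo):
            print(lujvo, "=", " + ".join(RAFSI[p] for p in parse))
```

With the full rafsi list, a word that still yields more than one split
would be a case for the cluster checks in the formal grammar; this
sketch just returns all candidates and leaves that decision to the
caller.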