From seidensticker@msn.com Fri Mar 09 22:19:57 2001
Return-Path: <seidensticker@msn.com>
X-Sender: seidensticker@msn.com
X-Apparently-To: lojban@yahoogroups.com
Date: Sat, 10 Mar 2001 06:19:54 -0000
To: lojban@yahoogroups.com
Subject: Re: How do you parse lujvo into the component rafsi?
Message-ID: <98ch2a+10601@eGroups.com>
In-Reply-To: <002d01c0a920$b720c680$825681ce@wlink.net>
From: seidensticker@msn.com
X-Yahoo-Message-Num: 5748

Assuming the grammar below is correct, I've written the following
algorithm, which I believe is equivalent to it.

if the remainder of the string is empty
then end
else if the remainder begins CVVr or CVVn or CVV or CVCy (tested in that order)
     then chop off that token and recurse
else if the remainder begins CCV
     then if the CCV is followed by Cy
          then chop off the CCVCy and recurse
          else if the CCV is followed by CV<eof>
               then chop off the terminal CCVCV and end
               else chop off the CCV and recurse
else if the remainder begins CVC
     then if the CVC is followed by Cy
          then chop off the CVCCy and recurse
          else if the CVC is followed by CV<eof>
               then chop off the terminal CVCCV and end
               else chop off the CVC and recurse
else fail (the string is not a well-formed lujvo)

Does this sound like the correct way to parse a lujvo into rafsi 
tokens?
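
To try the algorithm above on actual words, here is a minimal Python sketch of it. It is a sketch under stated assumptions, not a reference implementation: input is taken to be lowercase with no apostrophes, the consonant set is assumed to be Lojban's 17 consonants, and the names `starts_with_shape` and `split_lujvo` are hypothetical. It follows the pseudocode's greedy first-match order rather than checking the result against the grammar.

```python
# Assumed letter classes: lowercase Lojban letters, no apostrophes.
VOWELS = set("aeiou")
CONSONANTS = set("bcdfgjklmnprstvxz")


def starts_with_shape(s: str, pattern: str) -> bool:
    """True if s begins with the shape in pattern.

    'C' matches any consonant, 'V' any vowel; every other pattern
    character (here 'y', 'r' or 'n') must match literally.
    """
    if len(s) < len(pattern):
        return False
    for ch, p in zip(s, pattern):
        if p == "C":
            if ch not in CONSONANTS:
                return False
        elif p == "V":
            if ch not in VOWELS:
                return False
        elif ch != p:
            return False
    return True


def split_lujvo(word: str) -> list[str]:
    """Hypothetical helper: greedy left-to-right split mirroring the pseudocode."""
    tokens: list[str] = []
    s = word
    while s:  # base case: stop when nothing is left
        # Branch 1: the first of these shapes that matches is chopped off.
        for pat in ("CVVr", "CVVn", "CVV", "CVCy"):
            if starts_with_shape(s, pat):
                tokens.append(s[: len(pat)])
                s = s[len(pat):]
                break
        else:
            # Branches 2 and 3: a CCV start and a CVC start are handled alike.
            if not (starts_with_shape(s, "CCV") or starts_with_shape(s, "CVC")):
                raise ValueError(f"no rafsi shape matches {s!r}")
            rest = s[3:]
            if starts_with_shape(rest, "Cy"):
                # 4-letter rafsi plus y-hyphen: CCVCy or CVCCy
                tokens.append(s[:5])
                s = s[5:]
            elif len(rest) == 2 and starts_with_shape(rest, "CV"):
                # CV then end of word: terminal CCVCV or CVCCV
                tokens.append(s)
                s = ""
            else:
                tokens.append(s[:3])
                s = rest
    return tokens


print(split_lujvo("jbobau"))    # ['jbo', 'bau']
print(split_lujvo("gismyvla"))  # ['gismy', 'vla']
```

Because the branches are tried in a fixed order, this always returns exactly one split; on its own it cannot show whether the grammar allows other splits of the same word.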
--- In lojban@y..., "seidensticker" <seidensticker@m...> wrote:
> I'm working on an algorithm for breaking a lujvo into its component
> parts.  (My goal: given an unknown lujvo, break it up into parts and
> display the definitions of each of those parts.)  Chapter 4, section
> 11 of the grammar book ("The lujvo-making algorithm") talks about
> creating lujvo, but my question is about the reverse.  Is there a
> place where this is simply described?
> 
> If there's not, let me try this: I've tried to compose a grammar
> that defines a lujvo.  Could someone critique it?
> 
> lujvo  =  InitialRafsi  TerminalRafsi
> InitialRafsi  =  Rafsi  InitialRafsi  |  <null>
> Rafsi  =  4Rafsi  |  3Rafsi
> 
> TerminalRafsi  =  CCV | CVV | CVCCV | CCVCV
> 4Rafsi  =  CVCCy | CCVCy
> 3Rafsi  =  CVV | CCV | CVVr | CVVn | CVC | CVCy
> 
> Must the parsing of the unknown lujvo begin from the right?  Or from
> the left?  Or is it unambiguous regardless?  Given a 4Rafsi of the form
> CVCCy or CCVCy, I'm assuming that there's only one gismu with those
> first 4 letters -- right?
> 
> Any other suggestions for how to do the parsing?
> 
> Thanks.
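
On the question of whether parsing must start from the left or the right: one way to check a grammar like the quoted one is to enumerate every segmentation it allows instead of committing to one. The Python sketch below does that with a memoized recursive search over the shapes exactly as quoted. It assumes lowercase input with no apostrophes, and `fits` and `all_parses` are hypothetical names. If a word yields more than one parse, the grammar as written is ambiguous for that word.

```python
from functools import lru_cache

# Assumed letter classes: lowercase Lojban letters, no apostrophes.
VOWELS = set("aeiou")
CONSONANTS = set("bcdfgjklmnprstvxz")

# Shapes transcribed from the quoted grammar.
NON_FINAL = ("CVCCy", "CCVCy", "CVV", "CCV", "CVVr", "CVVn", "CVC", "CVCy")  # 4Rafsi | 3Rafsi
FINAL = ("CCV", "CVV", "CVCCV", "CCVCV")  # TerminalRafsi


def fits(chunk: str, pattern: str) -> bool:
    """True if chunk has exactly the shape of pattern ('C', 'V', or a literal letter)."""
    if len(chunk) != len(pattern):
        return False
    for ch, p in zip(chunk, pattern):
        if p == "C":
            if ch not in CONSONANTS:
                return False
        elif p == "V":
            if ch not in VOWELS:
                return False
        elif ch != p:
            return False
    return True


def all_parses(word: str) -> list[list[str]]:
    """Every way to write word as InitialRafsi followed by TerminalRafsi."""

    @lru_cache(maxsize=None)
    def parses_from(i: int) -> tuple[tuple[str, ...], ...]:
        rest = word[i:]
        found = []
        # The whole remainder may be the TerminalRafsi ...
        for pat in FINAL:
            if fits(rest, pat):
                found.append((rest,))
        # ... or a non-final Rafsi followed by a parse of what is left.
        for pat in NON_FINAL:
            n = len(pat)
            if n < len(rest) and fits(rest[:n], pat):
                for tail in parses_from(i + n):
                    found.append((rest[:n],) + tail)
        return tuple(found)

    return [list(p) for p in parses_from(0)]


print(all_parses("jbobau"))     # [['jbo', 'bau']]
print(all_parses("baurgismu"))  # [['bau', 'rgi', 'smu'], ['baur', 'gismu']]
```

The string `baurgismu` is artificial. It gets two parses because the quoted grammar places no restriction on which consonant pairs may begin a CCV. Real Lojban morphology does restrict those pairs, so the example shows only that the grammar as written admits both readings.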