From egroups@solipsys.co.uk Sat Mar 10 03:43:27 2001
From: egroups@solipsys.co.uk
Reply-to: c.d.wright@solipsys.co.uk
To: lojban@yahoogroups.com
Date: Sat, 10 Mar 2001 11:47:22 +0000
Subject: Re: [lojban] Parsing lujvo
Message-Id: <200103101143.f2ABhM828049@sulphur.cix.co.uk>
X-Yahoo-Message-Num: 5751

> Date: Fri, 9 Mar 2001 21:12:33 -0800
> From: "seidensticker"
> Subject: How do you parse lujvo into the component rafsi?
>
> I'm working on an algorithm for breaking a lujvo into its
> component parts. (My goal: given an unknown lujvo, break
> it up into parts and display the definitions of each of
> those parts.)

I have a set of programs which, given a (grammatically correct) Lojban utterance, generate a gloss of it. The gloss shows the bracketing of the original, so you can see the grammatical structure, and gives word-for-word "translations" of the words, including breaking up lujvo and finding the definitions of each component.

It was heavily criticised by everyone who tried it, apparently because it's text based, doesn't have pretty colours, and runs under DOS.
However, I now have versions that run under Linux, NetBSD, RiscOS and Solaris, although it's still entirely text based. I use it all the time, the main problem being that it doesn't cope at all gracefully with grammatically incorrect material, and much of what's written on this list is.

Anyway, the code is yours if you want it. It's mostly C, but with some script or batch files to glue together the separate components.

cdw
--
\\// ze'uku ko jmive gi'e snada
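
For anyone who wants to experiment with the lujvo-splitting step before looking at the C code, here is a minimal sketch in Python. It is not the program described above, and everything in it (the name split_lujvo, the shape tables, the hyphen handling) is an assumption made for illustration. It works purely on consonant/vowel shapes: it does not consult a rafsi list, does not check which consonant pairs are permissible, and accepts a "y" hyphen more liberally than the real morphology does, so it will accept some strings that are not valid lujvo. Looking up the definition of each piece would need a rafsi table, which is left out here.

```python
from typing import List, Optional

VOWELS = set("aeiou")
CONSONANTS = set("bcdfgjklmnprstvxz")

SHORT = {"CVC", "CCV", "CVV", "CV'V"}  # short rafsi shapes
LONG = {"CVCC", "CCVC"}                # gismu minus final vowel; must be followed by "y"
GISMU = {"CVCCV", "CCVCV"}             # full gismu; final position only
FINAL_SHORT = {"CCV", "CVV", "CV'V"}   # a lujvo has to end in a vowel


def shape(s: str) -> str:
    """Map a string to its consonant/vowel shape, e.g. "bri" -> "CCV"."""
    out = []
    for ch in s:
        if ch in VOWELS:
            out.append("V")
        elif ch in CONSONANTS:
            out.append("C")
        else:
            out.append(ch)  # apostrophe, "y", or anything unexpected
    return "".join(out)


def split_lujvo(word: str) -> Optional[List[str]]:
    """Return the rafsi of `word` (hyphens dropped), or None if no split fits.

    Hypothetical helper: shape-based backtracking only, no dictionary.
    """

    def walk(pos: int, count: int) -> Optional[List[str]]:
        rest = word[pos:]

        # Final component: allowed only once at least one rafsi precedes it.
        if count >= 1 and (shape(rest) in GISMU or shape(rest) in FINAL_SHORT):
            return [rest]

        # Four-letter rafsi, which must be followed by a "y" hyphen.
        head = rest[:4]
        if shape(head) in LONG and rest[4:5] == "y":
            tail = walk(pos + 5, count + 1)
            if tail is not None:
                return [head] + tail

        # Short rafsi: three letters, or four characters for the CV'V shape.
        for n in (3, 4):
            head = rest[:n]
            if shape(head) not in SHORT:
                continue
            hyphens = ["", "y"]
            # An "r" or "n" hyphen is tried only after an initial CVV rafsi.
            if count == 0 and shape(head) in {"CVV", "CV'V"}:
                hyphens += ["r", "n"]
            for h in hyphens:
                if rest[n:n + len(h)] != h:
                    continue
                tail = walk(pos + n + len(h), count + 1)
                if tail is not None:
                    return [head] + tail
        return None

    return walk(0, 0)


print(split_lujvo("brivla"))      # ['bri', 'vla']
print(split_lujvo("mamtypatfu"))  # ['mamt', 'patfu']
print(split_lujvo("soirsai"))     # ['soi', 'sai']
print(split_lujvo("klama"))       # None (a gismu, not a lujvo)
```

The function returns the first split that fits the shapes. A more careful version would also validate consonant clusters and the hyphenation rules, then map each rafsi back to its gismu for the gloss.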