Message-Id:
From: cowan@snark.thyrsus.com (John Cowan)
Subject: Re: AI Project
To: nsn@mullian.ee.mu.oz.au (Nick Nicholas)
Date: Tue, 9 Mar 1993 11:53:00 -0500 (EST)
Cc: lojbab@grebyn.com

> The problem I have now is: how do I shoehorn this project, which could go
> on forever (especially with tanru) into something I can spend at most 80
> hours on (and I'd prefer 60)? We will need to decide what domains of the
> language we'll have to leave out: this will need to work on a subset of
> the language. And I'm sorry to get pushy at this dictionary-producing time,
> but I'd like some ideas from you soon: I want to put a proposal on Prof.
> Caelli's desk by the end of the week. (It's already second week of the
> thirteen-week semester). Of course, I could continue work on the project
> after this semester, though the net.love-of-my-life coming over for a visit
> in July may slow things down :)

I think that you should simply not worry about the internal semantics
of tanru, or indeed anything about selbri internals except possibly a
place-structure-affecting SE (essentially, one that converts the last
component of the tanru at whatever level of nesting, the
ter(ter(ter...tertanru).

Here's a very sketchy draft of something I wrote once; it actually does
stop in the middle of a sentence -- I got dragged away to do something
else and never went back -- but it should give some idea of what can be
done.


		Preliminary Notes for A Lojban Canonicalizer
				Draft 1.0

1.  Introductory

Lojban is a predicate language; that is, Lojban utterances are for the
most part predications.  Tools exist in the computer world to process
rules and facts expressed in the form of predications, and to answer
queries based on those rules and facts.  A well-known example is Prolog.
Prolog is isomorphic to a small subset of Lojban, but relatively simple
processing techniques would suffice to render a much larger set of
Lojban utterances Prolog-compatible.

A Lojban Canonicalizer (LC) program would manipulate Lojban utterances,
previously parsed by the standard Lojban parser, to produce other Lojban
utterances belonging to the Prolog-isomorphic subset.  The basic
techniques employed include:

	stripping of metalinguistics
	argument order standardization
	semantic transformations
	expansion of logical connectives

and others to be defined (or thought of) later.  The rest of this
document details the techniques above.

2.  Stripping of Metalinguistics

This is the easiest topic.  Lojban allows for a variety of methods for
adding metalinguistic comments to mainstream text.  There are UI
indicators, SEI comments, and TO/TOI parenthetical remarks.  All of
these can simply be removed from the parsed text.  It is forbidden for
text at a lower metalinguistic level to refer to text at a higher level,
so removal cannot lead to loss of information (although it may lead to
loss of context).

3.  Argument Order Standardization

The Lojban predication, or bridi, is delivered by the parser as a
predicate, or selbri, preceded and/or followed by "terms".  There are
four kinds of terms: arguments, or sumti; tagged sumti, where the tag
either specifies which (numerical) argument of the selbri is involved or
indicates a "modal" sumti outside the regular argument structure; bare
tags with unspecified sumti; and negation boundaries.  In addition,
there can be a "prenex" which specifies the quantification of bound
variable sumti.

Argument order standardization will rearrange every bridi to get the
sumti into a fixed order, either

	x1, x2, x3, ... selbri
or
	x1, selbri, x2, x3 ...

A lookup will be done against the dictionary database to determine how
many sumti this selbri should have; any missing sumti will be replaced
with the Lojban place-filler sumti, "zo'e".  Modal sumti will be moved
to the end of the bridi and placed into a canonical order (perhaps
alphabetical by tag; the set of tags is potentially unbounded).
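
The place-filling and modal-sorting steps just described can be sketched
roughly as follows, in Python.  This is only an illustration under
stated assumptions, not part of the original draft: the names
standardize, PLACE_COUNTS and FA_TAGS are hypothetical, PLACE_COUNTS
stands in for the dictionary database, terms are assumed to arrive as
(tag, sumti) pairs in source order, and the skipping of x1 when the
selbri comes first, bare tags, and negation boundaries are not modeled.

```python
ZOHE = "zo'e"  # the Lojban place-filler sumti

# Place tags (selma'o FA) and the numbered place each one selects.
FA_TAGS = {"fa": 1, "fe": 2, "fi": 3, "fo": 4, "fu": 5}

# Hypothetical stand-in for the dictionary database lookup.
PLACE_COUNTS = {"klama": 5, "prami": 2}


def standardize(selbri, terms):
    """Return (places, modals) for one bridi.

    terms is a list of (tag, sumti) pairs in source order.  tag is None
    for an untagged sumti, an FA cmavo for a place-tagged sumti, and any
    other string is treated as a modal tag (an assumption of this sketch).
    """
    places = [ZOHE] * PLACE_COUNTS[selbri]
    modals = []
    next_place = 1
    for tag, sumti in terms:
        if tag is None or tag in FA_TAGS:
            if tag in FA_TAGS:
                next_place = FA_TAGS[tag]  # an FA tag resets the counter
            if next_place > len(places):
                raise ValueError("too many sumti for " + selbri)
            places[next_place - 1] = sumti
            next_place += 1
        else:
            modals.append((tag, sumti))
    modals.sort(key=lambda pair: pair[0])  # alphabetical by tag
    return places, modals


places, modals = standardize(
    "klama",
    [("fe", "le zarci"), ("bau", "la lojban."),
     (None, "le zdani"), ("fa", "mi")],
)
print(places, modals)
```

Here the unfilled x4 and x5 of klama come back as "zo'e", and the one
modal sumti is returned separately so that it can be appended after the
numbered places.
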
A prenex will be created with appropriate default quantifications, and
all negations will be moved to it.

4.  Semantic Transformations

Like other natural languages, Lojban possesses a "deep structure", in
the sense (without prejudice to any particular linguistic theories) that
some utterances with very different grammar "mean the same thing", with
differences of emphasis and the like.  The argument-order
standardization discussed above involves applying certain
transformations which affect sumti.  The type discussed here, however,
involves the "redundant structures" of Lojban.

In pursuit of linguistic neutrality, Lojban features certain pervasive
schemas of grammatical alternatives.  The most pervasive by far is the
afterthought vs. forethought opposition.  In such structures as
possessives, logical and non-logical connectives,

-- 
John Cowan	cowan@snark.thyrsus.com		...!uunet!lock60!snark!cowan
			e'osai ko sarji la lojban.