Message-Id: <m0nW7Y7-00019xC@snark.thyrsus.com>
From: cowan@snark.thyrsus.com (John Cowan)
Subject: Re: AI Project
To: nsn@mullian.ee.mu.oz.au (Nick Nicholas)
Date: Tue, 9 Mar 1993 11:53:00 -0500 (EST)
Cc: lojbab@grebyn.com

> The problem I have now is: how do I shoehorn this project, which could go
> on forever (especially with tanru) into something I can spend at most 80
> hours on (and I'd prefer 60)? We will need to decide what domains of the
> language we'll have to leave out: this will need to work on a subset of
> the language. And I'm sorry to get pushy at this dictionary-producing time,
> but I'd like some ideas from you soon: I want to put a proposal on Prof.
> Caelli's desk by the end of the week. (It's already second week of the
> thirteen-week semester). Of course, I could continue work on the project
> after this semester, though the net.love-of-my-life coming over for a visit
> in July may slow things down :)

I think that you should simply not worry about the internal semantics of
tanru, or indeed anything about selbri internals except possibly a
place-structure-affecting SE (essentially, one that converts the last
component of the tanru at whatever level of nesting, the ter(ter(ter...tertanru).
Here's a very sketchy draft of something I wrote once; it actually does
stop in the middle of a sentence -- I got dragged away to do something else
and never went back -- that should give some idea of what can be done.


Preliminary Notes for A Lojban Canonicalizer
Draft 1.0

1.  Introductory

Lojban is a predicate language; that is, Lojban utterances are for the
most part predications.  Tools exist in the computer world to process
rules and facts expressed in the form of predications, and to answer
queries based on those rules and facts.  A well-known example is Prolog.
Prolog is isomorphic to a small subset of Lojban, but relatively simple
processing techniques would suffice to render a much larger set of Lojban
utterances Prolog-compatible.

A Lojban Canonicalizer (LC) program would manipulate Lojban utterances,
previously parsed by the standard Lojban parser, to produce other Lojban
utterances belonging to the Prolog-isomorphic subset.  The basic techniques
employed include:

	stripping of metalinguistics
	argument order standardization
	semantic transformations
	expansion of logical connectives

and others to be defined (or thought of) later.  The rest of this
document details the techniques above.


2.  Stripping of Metalinguistics

This is the easiest topic.  Lojban allows for a variety of methods for
adding metalinguistic comments to mainstream text.  There are UI indicators,
SEI comments, and TO/TOI parenthetical remarks.  All of these can simply
be removed from the parsed text.  It is forbidden for text at a lower
metalinguistic level to refer to text at a higher level, so removal cannot
lead to loss of information (although it may lead to loss of context).
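
As a rough illustration only, the removal can be sketched in Python.  The
token format below (word, selma'o pairs) and the class labels "UI", "SEI",
"SEhU", "TO" and "TOI" are assumptions made for the example, not the actual
output of the parser; the sketch also assumes that every SEI comment and
TO parenthesis arrives with an explicit closing token.

```python
# Hypothetical parser output: a list of (word, selma'o) pairs.
# Assumed labels: "UI" for indicators, "SEI"/"SEhU" around comments,
# "TO"/"TOI" around parenthetical remarks.
CLOSERS = {"SEI": "SEhU", "TO": "TOI"}


def strip_metalinguistics(tokens):
    """Return tokens with UI words, SEI comments and TO/TOI spans removed."""
    kept = []
    pending = []  # closers we are still waiting for (handles nesting)
    for word, cls in tokens:
        if cls in CLOSERS:
            # Opening word of a comment or parenthesis: drop it and
            # remember which closer ends the span.
            pending.append(CLOSERS[cls])
        elif pending:
            # Inside a span: drop everything, and leave the span when
            # the expected closer turns up.
            if cls == pending[-1]:
                pending.pop()
        elif cls != "UI":
            kept.append((word, cls))
    return kept


example = [
    ("mi", "KOhA"), ("klama", "BRIVLA"), (".ui", "UI"),
    ("to", "TO"), ("do", "KOhA"), ("cusku", "BRIVLA"), ("toi", "TOI"),
    ("le", "LE"), ("zarci", "BRIVLA"),
]
print(" ".join(word for word, _ in strip_metalinguistics(example)))
```

Run on the example, this prints "mi klama le zarci": the indicator and the
parenthetical remark are gone and the mainstream text is untouched.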


3.  Argument Order Standardization

The Lojban predication, or bridi, is delivered by the parser as a predicate,
or selbri, preceded and/or followed by "terms".  There are four kinds of
terms:  arguments, or sumti; tagged sumti, where the tag either specifies
which (numerical) argument of the selbri is involved or indicates a "modal"
sumti outside the regular argument structure; bare tags with unspecified
sumti; and negation boundaries.  In addition, there can be a "prenex" which
specifies the quantification of bound variable sumti.

Argument order standardization will rearrange every bridi to get the sumti
into a fixed order, either x1, x2, x3, ... selbri or x1, selbri, x2, x3 ...
A lookup will be done against the dictionary database to determine how many
sumti this selbri should have; any missing sumti will be replaced with
the Lojban place-filler sumti, "zo'e".  Modal sumti will be moved to the
end of the bridi and placed into a canonical order (perhaps alphabetical
by tag; the set of tags is potentially unbounded).  A prenex will be
created with appropriate default quantifications, and all negations will
be moved to it.
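
A minimal Python sketch of the sumti-ordering part follows.  It is not a
worked-out design: the representation of terms as (tag, sumti) pairs, the
PLACE_COUNTS table standing in for the dictionary database, and the
function name are all assumptions made for illustration, and the prenex
and negation handling described above is left out entirely.

```python
# Place tags (fa .. fu) name the numbered arguments x1 .. x5.
FA = {"fa": 1, "fe": 2, "fi": 3, "fo": 4, "fu": 5}

# Hypothetical stand-in for the dictionary lookup: places per selbri.
PLACE_COUNTS = {"klama": 5, "tavla": 4}


def standardize(selbri, before, after):
    """Return the bridi as [x1, selbri, x2, ..., modal sumti...].

    `before` and `after` are the terms on each side of the selbri, each a
    (tag, sumti) pair: tag is None for a plain sumti, a FA word for a
    place-tagged sumti, and anything else is treated as a modal tag.
    """
    places = {}
    modals = []
    next_place = 1

    def assign(terms):
        nonlocal next_place
        for tag, sumti in terms:
            if tag is None:
                places[next_place] = sumti
                next_place += 1
            elif tag in FA:
                places[FA[tag]] = sumti
                next_place = FA[tag] + 1
            else:
                modals.append((tag, sumti))

    assign(before)
    if next_place == 1:
        next_place = 2  # nothing numbered before the selbri: x1 is skipped
    assign(after)

    count = max(PLACE_COUNTS[selbri], max(places, default=0))
    args = [places.get(i, "zo'e") for i in range(1, count + 1)]
    tail = [tag + " " + sumti for tag, sumti in sorted(modals)]
    return [args[0], selbri] + args[1:] + tail


print(standardize(
    "klama",
    before=[("fi", "le zdani")],
    after=[("bai", "do"), ("fa", "mi"), (None, "le zarci")],
))
```

The example input is "fi le zdani cu klama bai do fa mi le zarci"; the
sketch reorders it to mi / klama / le zarci / le zdani / zo'e / zo'e /
bai do, using the x1, selbri, x2, x3 ... layout and sorting the modal
sumti alphabetically by tag.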


4.  Semantic Transformations

Like other natural languages, Lojban possesses a "deep structure", in the
sense (without prejudice to any particular linguistic theories) that some
utterances with very different grammar "mean the same thing", with differences
of emphasis and the like.  The argument-order standardization discussed
above involves applying certain transformations which affect sumti.  The
type discussed here, however, involves the "redundant structures" of Lojban.

In pursuit of linguistic neutrality, Lojban features certain pervasive
schemas of grammatical alternatives.  The most pervasive by far is the
afterthought vs. forethought opposition.  In such structures as possessives,
logical and non-logical connectives, 

-- 
John Cowan	cowan@snark.thyrsus.com		...!uunet!lock60!snark!cowan
			e'osai ko sarji la lojban.