Return-Path: <jimc@math.ucla.edu>
	(Sendmail 5.61/1.07) id AA05237; Fri, 12 Mar 93 09:48:13 -0800
Message-Id: <9303121748.AA05237@hilgard.math.ucla.edu>
Subject: Re: TECH: AI Project Proposal 
             <9303121406.AA25269@julia.math.ucla.edu> 
Date: Fri, 12 Mar 93 09:48:12 -0800
From: jimc@math.ucla.edu

Your lojban->prolog translator sounds like a really neat project.  

Comment #1:  I understand that the goal is for a program to read Lojban
text and store an interpreted meaning in a Prolog database.  To me, a
"translator" takes input in (e.g.) Lojban and spits out (e.g.) English.
It's true that the saved representation of a Prolog database has the
form of a Prolog program that would re-create the database, so the
proposed program really is a "translator".  But as I see it, your main
goal is to prove that the meaning represented by the Lojban text can be
extracted and gotten into the database; spitting out Prolog is
incidental.  Therefore I suggest you call the project something else,
such as a "semantic analyser" or something like that.

Comment 1a: State the project's goal early in the text, with no more
than 2% of total length ahead of it for introduction.  After that,
justify why it's interesting, and describe Lojban.  You put the goal
about 30% of the way through the proposal.  The Authority will have to
store all the intro material as prenex-oid stuff until the thesis is
reached, then dump it into his database in an organized fashion.  Few
will have either the patience or the short term memory to accommodate
you.  

Comment 1b: It's obvious to you and me why the project is interesting,
but not to the Authority, and that section of the proposal is a little
weak.  The justification should motivate the guy to pay attention,
before you lay on the details of Lojban language.

Comment 1c: Lose the Whorf-Sapir hypothesis.  You're not writing an AI
to take the Turing test; the W-S test isn't a project goal; and you
don't even believe much in W-S.  So don't scatter the Authority's
attention by mentioning it.

Comment #2:  You're not very clear about what results are to be achieved.
In exploratory work, of course, the results can't be predicted, but
here I think both your proposal and your implementation will be
strengthened if the results are prespecified.  Here's one possibility:
to take running Lojban text, get it into the database, and to generate
answers (in Lojban) to any questions in the text.  By the way, your
prof doesn't know Lojban, and a pidgin English output option could save
your grade.  Also, a tool to view tree structures is valuable both for
debugging and for a demo, even if only implemented as text and not as a
snazzy GUI.
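
As a rough illustration of the text-only tree viewer, here is a minimal
sketch in Python.  The tree format (a label plus a list of children, with
plain strings for words) and the name render_tree are assumptions made
for the example, not anything taken from the proposal.

```python
# Hypothetical parse-tree format: (label, [children]) for a node,
# and a plain str for a word at the leaves.
def render_tree(node, depth=0):
    """Return the tree as a list of indented text lines."""
    pad = "  " * depth
    if isinstance(node, str):
        return [pad + node]
    label, children = node
    lines = [pad + label]
    for child in children:
        lines.extend(render_tree(child, depth + 1))
    return lines

# Hand-built tree for "mi prami da"; a real parser would produce this.
tree = ("bridi", [("sumti", ["mi"]), ("selbri", ["prami"]), ("sumti", ["da"])])
print("\n".join(render_tree(tree)))
```

Returning a list of lines rather than printing directly makes the same
routine usable in a debugger, a log file, or a demo.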

Comment #3:  Speaking of the Turing test, you refer to "a Prolog
database storing the information denoted in the text".  The word
"denoted" is inherently anthropocentric, and so success in this goal is
equivalent to passing the Turing test restricted to information
retrieval.  Or stated inversely, if a vicious judge decides that the
Lojban input "denotes" gobbledygook...  The obvious logical structure
of Lojban makes it fairly easy to convince a reviewer that there's
something in the input to denote, and that the output actually does
match the input, but you have to remember that adequacy of denotation
will be judged by humans on human terms -- no matter how much I rave
about algorithmic interpretation of dikyjvo.  

Comment #4: At one time I tried doing a parser for gua!spi in Prolog. I
found that parsing required too much procedural action (or maybe I just
wasn't good enough in Prolog).  Anyway, consider parsing the stuff in a
procedural language and feeding the result to a Prolog back end.
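
To make the split concrete, here is a minimal sketch of a procedural
front end that emits a Prolog term for a back end to consume.  Python
stands in for "a procedural language"; the two small lexicons and the
function name are hypothetical, and only the bare "sumti selbri sumti"
pattern is handled.

```python
# Toy lexicons (assumed for this sketch): Lojban word -> Prolog atom.
SELBRI = {"prami": "loves", "nelci": "likes"}
SUMTI = {"mi": "i", "do": "you", "da": "X"}  # "da" becomes a Prolog variable

def bridi_to_prolog(text):
    """Turn a bare 'sumti selbri sumti' sentence into a Prolog term string."""
    words = text.split()
    if len(words) != 3 or words[1] not in SELBRI:
        raise ValueError("only 'sumti selbri sumti' is handled: " + text)
    x1, selbri, x2 = words
    return "%s(%s, %s)" % (SELBRI[selbri], SUMTI[x1], SUMTI[x2])

print(bridi_to_prolog("mi prami da"))  # loves(i, X)
```

The string produced here could be written to a file or piped to the
Prolog side; how the back end stores or queries it is left open.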

You asked about which semantic features should be included.  Never
doubt the power of recursion!  If you can get something to work on "mi
prami da" it should work trivially on termsets, for example, and on
second order predicates (abstractions) equally easily, provided you can
glork the proper binding of propositional function arguments in the
absence of official rules.  

(See John Cowan's recent posting about property abstractions for the
meaning of "glork" and of "propositional function".  Lojbab and Cowan
strongly resist dikyjvo because it exposes the hair of Hydra.  To
believe in dikyjvo they also have to deal with diktanru and with
propositional functions, and that's too much for them to bite off at
once.  It seems to me that glorking may be the key to resolving their
resistance, and that Cowan may (or may not) now be motivated to
determine authoritatively how to glork.  In your project if you want to
deal with second order predicates you are going to have to do the same
thing.  Once a rule for glorking has been deduced, the second order
predicates fall into place.  Then you can *easily* say that the same
rules apply to diktanru (belenu style) as an abbreviation for
abstractions written in full, and you then *easily* extend the same
interpretation to dikyjvo derived from those diktanru.)


> 1. Simple predications with a known predicate, and with arguments without
> internal structure (Proper names, logical variables). No quantification
> other than existential.
> eg. mi prami da --- There exists an X such that LOVES(i,X).

No problem.  By the way, in a related but less ambitious project I put
a lot of emphasis on identifying pronoun antecedents and on always
having the pronoun linked up with its antecedent.  For me, a proper
name is a pronoun, and its antecedent is established in some kind of
naming ceremony.  Of course, you must also deal with the case of a
pronoun whose antecedent is not manifest to the program.
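
One possible shape for the antecedent bookkeeping, sketched in Python.
The class, the referent ids, and the name "la djan." are all invented
for the example.  Giving an unbound pronoun a fresh placeholder referent
is just one assumed way to keep it "linked up"; other policies (asking
the user, refusing the sentence) are equally plausible.

```python
class Antecedents:
    """Maps pronouns and names to referent ids."""

    def __init__(self):
        self.bound = {}
        self.counter = 0

    def bind(self, word, referent):
        # The "naming ceremony": attach a word to a known referent.
        self.bound[word] = referent

    def resolve(self, word):
        # An unbound pronoun gets a fresh placeholder, so it is still
        # linked to something and can be identified with a real
        # referent later.
        if word not in self.bound:
            self.counter += 1
            self.bound[word] = "unknown_%d" % self.counter
        return self.bound[word]

table = Antecedents()
table.bind("la djan.", "person_1")
print(table.resolve("la djan."))  # person_1
print(table.resolve("ko'a"))      # unknown_1
print(table.resolve("ko'a"))      # unknown_1 again: same placeholder
```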

> 2. Non-Veridical arguments (cf. English "the") based on predicates, with
> internal arguments.
> eg. mi catra le prami be le pulji --- KILLS(i,x) & LOVE(x,y) & POLICE(y):
> I kill the lover of the policeman.

In-mind arguments are very hard to handle if you don't have a mind. 
I would suggest that you either forget entirely about "le", or cheat and
say "le means that the referent is 'suggested' by the bridi-tail, and
for this AI, an exact match is the best we can do for suggestions."

> 3. Veridical arguments (cf. English "an") based on predicates, with
> internal arguments.
> eg. mi catra lo prami be lo pulji --- There exist X and Y such that:
> KILLS(i,X) & LOVE(X,Y) & POLICE(Y): I kill a lover of a policeman.

No problem (I think).

> 4. Resolution of logical connectives.
> eg. mi nelci do .e ko'a ---> mi nelci do .ije mi nelci ko'a ---
> LIKES(i,you) & LIKES(i,x1): I like you and him.

No problem.
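
A minimal sketch of that expansion in Python, working on a list of
words.  It assumes a single ".e" joining two one-word sumti, which is
all the quoted example needs; the function name is hypothetical, and
real sumti with internal structure would need the parse tree instead.

```python
def expand_e(words):
    """Split a bridi on one sumti connective '.e' into two bridi."""
    if ".e" not in words:
        return [words]
    i = words.index(".e")
    head, tail = words[:i - 1], words[i + 2:]
    # One copy of the bridi per connected sumti.
    return [head + [words[i - 1]] + tail, head + [words[i + 1]] + tail]

bridi = expand_e("mi nelci do .e ko'a".split())
print(" .ije ".join(" ".join(b) for b in bridi))
# mi nelci do .ije mi nelci ko'a
```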

> 5. Restrictive and non-restrictive relative clauses.
> eg. mi nelci le prenu poi do xebni ke'a --- (There exists x such that
> HATES(you,x)) & LIKES(i,x) & person(x): I like the person you hate.

Nonrestrictive clauses are harder than restrictive ones, because you
have to jerk a buried clause onto the chain of clauses that are actually
asserted, rather than being portions of the asserted clause.  

I have always assumed that the semantics of a restrictive subordinate
clause matches its syntax, i.e. it is a sub-unit of the restricted
sumti (or bridi, with fi'o).  Recently I have come to look at it
differently, as an output filter.  Each bridi (including S-bridi) has
an export path.  For sumti it's the virtual pronoun (ke'a?) in x1 after
conversion; for bridi it is not well represented in Lojban syntax.  The
subordinate clause is a propositional function whose argument binds to
that export variable, and only referents for which the function is true
are retained for export.
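
The output-filter view can be sketched in a few lines of Python.  The
fact base, the names alice and bob, and the helper functions are
invented for illustration; the only point is that the restrictive
clause acts as a predicate applied to each candidate referent.

```python
# Toy fact base (assumed): one set of argument tuples per predicate.
FACTS = {
    "prenu": {("alice",), ("bob",)},
    "xebni": {("you", "bob")},  # you hate bob
}

def holds(pred, *args):
    return args in FACTS.get(pred, set())

def export(candidates, restriction):
    """Keep only referents for which the restrictive clause is true."""
    return [x for x in candidates if restriction(x)]

# "le prenu poi do xebni ke'a": ke'a plays the export variable x.
people = sorted(x for (x,) in FACTS["prenu"])
hated = export(people, lambda x: holds("xebni", "you", x))
print(hated)  # ['bob']
```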

> 6. Higher order predicates.
> eg. lenu mi cadzu cu nandu --- DIFFICULT(event:WALKS(i)): My walking is
> difficult.

Not difficult, if you know how to glork.

> 7. Prepositional phrases (other than tense and location).
> eg. mi naumau do nelci ko'a ---> mi zmadu do leni da nelci ko'a ---
> EXCEEDS(i,you,quantity:LIKES(X,x1)): I like him more than you.

You picked the most complicated example.  Most modal phrases -- not
this one, but including tenses -- can be mapped 1-1 into restrictive
subordinate clauses with the single governed sumti in x2 after
conversion.  E.g. 

	lo catra besepi'o lo mrudakfu
	lo catra poi pilno lo mrudakfu
	An axe-murderer
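
The rewrite in this example can be mimicked at the word level, as in
the Python sketch below.  The one-entry table comes straight from the
pair of phrases above; the function name is hypothetical, and a real
implementation would work on parsed structure with a table covering
the other modals rather than on raw strings.

```python
# One-entry table taken from the example above.
MODAL_TO_SELBRI = {"besepi'o": "pilno"}

def modal_to_clause(words):
    """Rewrite each known modal tag as 'poi <selbri>'; keep other words."""
    out = []
    for w in words:
        if w in MODAL_TO_SELBRI:
            out.extend(["poi", MODAL_TO_SELBRI[w]])
        else:
            out.append(w)
    return out

print(" ".join(modal_to_clause("lo catra besepi'o lo mrudakfu".split())))
# lo catra poi pilno lo mrudakfu
```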

> 8. Attitudinals.
> eg. mi .ui sidju do ---> mi sidju do .ije mi gleki mi va'o lenu mi sidju
> do: HELP(i,you) & HAPPY(i,i) & CONTEXT(state:HAPPY(i,i),event:HELP(i,you)):
> I *smile* will help you, I am happy to help you.

I'm happier to render attitudinals and discursives as supplementary
subordinate clauses (fi'o style, actually emulating <BAI>) on the main
bridi, rather than with the logical connective.  Lojbab and Cowan
explode, insisting that attitudinal indicators are not assertions, are not
veridical, are not anything but metalinguistic noise.  I think they
should stuff it.  Put indicators in your program.  You do, however,
have to be forceful in selecting the default x2 occupant in the clause.
I find for attitudinals, not discursives, that x1 of the modified
bridi is more often the right choice than "mi".  Note that in many
sentences with indicators, like your example, "mi" is naturally
occupying x1 already.  

> 9. Tense (including location), and prepositions of tense (including
> location).
> Also includes modality and event contours.
> eg. mi ba'o tavla ---> lenu mi tavla cu ba'o zei balvi zo'e:
> AFTERMATH(event:talk(i,_,_,_),_): I have spoken.

It's hard for the program to handle the metric semantics of tense predicates,
but the syntax is identical to general <BAI>.

> 10. Non-logical connectors. 
> eg. la gilbrt. joi la salivn. cu finti la mikadon. --- INVENT(X,mikado) & 
> JOINT_MASS(X,gilbert,sullivan): G & S (as a joint unit) wrote The Mikado. 

Syntax: easy.  Semantics: hard.  Maybe leave for later -- after mass/set
arguments.

> 11. Masses and sets as arguments. 
> eg. loi remna cu sipna: the mass of humans sleep (Even though it is not 
> true at any given moment that For all X: HUMAN(X) => SLEEPS(X))

I know that Paradox can do sets; programming sets may not be trivial but
it probably is doable.  Masses are another matter.  You certainly will
have to learn what a mass really is.  If you do, please tell me.

> 12. Quantification (including numerical): eg. mu le ze
> mensi cu cucycau: five of the seven sisters are barefoot.

I assume the difficulty here is handling the metric aspect.  Interpreting
"most" or "some" would be the hardest.

> 13. Negation. Contradictory, scalar. Use of prenexes.
> mi naku ro prenu cu prami: NOT(For all X:PERSON(X), LOVES(i,X))
> mi ro prenu na prami: For all X:PERSON(X), NOT(LOVES(i,X))

I would think that prenexes would be trivial; the challenge is to do the
quantified variable, whether in a prenex or not.  This challenge belongs
with quantification.  
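
To show why the scope of the quantified variable is the real work, here
is a small Python check of the two readings given in the quoted
examples, evaluated over a toy domain.  The people and the single
LOVES fact are invented data.

```python
PEOPLE = ["alice", "bob"]
LOVES = {("i", "alice")}  # assumed toy data: I love alice only

def loves(x, y):
    return (x, y) in LOVES

# mi naku ro prenu cu prami: NOT(For all X: LOVES(i,X))
not_all = not all(loves("i", p) for p in PEOPLE)
# mi ro prenu na prami: For all X: NOT(LOVES(i,X))
all_not = all(not loves("i", p) for p in PEOPLE)

print(not_all, all_not)  # True False
```

The two formulas disagree on the same facts, so the program has to
track where the negation sits relative to the quantifier.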

As for scalar and polar negation, two approaches suggest themselves: to
derive the meaning of the negated predicate from the basic predicate,
or to punt, assuming that the negated predicate is separate and
unrelated. If you can figure out what "ni" (quantity) means, you can do
this right.

> Sections of Lojban Grammar not anticipated to be included in the model:
>  
> 1. The mathematical subgrammar of Lojban. 

Right.  

> 2. Any analysis of word compounds. 

Awww!  With the present rules tanru are by definition inaccessible to
your program, and you have to handle lujvo as junior gismu.

> 3. Metalinguistic comments.

Right, they should be thrown out as being irrelevant.  However, you could
use the protocol words (coi, co'o, be'e, fe'o, etc.) to control online
interaction with your program.  

Anyway, it looks like a very interesting proposal.  The stepwise
addition of goals, ending when the due date approaches, is a very good
strategy.  Be sure to keep a completely working copy of the program and
test data sets (not "almost working", complete!), when each step is
finished, in case the next step is intractable and you have to turn in
the previous version.  Good luck, and keep me posted!

		-- jimc