Message-Id: <9303121748.AA05237@hilgard.math.ucla.edu>
Subject: Re: TECH: AI Project Proposal
    <9303121406.AA25269@julia.math.ucla.edu>
Date: Fri, 12 Mar 93 09:48:12 -0800
From: jimc@math.ucla.edu

Your Lojban->Prolog translator sounds like a really neat project.

Comment #1: I understand that the goal is for a program to read Lojban text and store an interpreted meaning in a Prolog database. To me, a "translator" takes input in (e.g.) Lojban and spits out (e.g.) English. It's true that the saved form of a Prolog database is a Prolog program that would re-create the database, so the proposed program really is a "translator". But as I see it, your main goal is to prove that the meaning represented by the Lojban text can be extracted and gotten into the database; spitting out Prolog is incidental. Therefore I suggest you call the project something else, such as a "semantic analyser".

Comment #1a: State the project's goal early in the text, with no more than 2% of the total length ahead of it for introduction. After that, justify why it's interesting, and then describe Lojban. You put the goal about 30% of the way through the proposal. The Authority will have to store all the introductory material as prenex-oid stuff until the thesis is reached, then dump it into his database in an organized fashion. Few readers will have either the patience or the short-term memory to accommodate you.

Comment #1b: It's obvious to you and me why the project is interesting, but not to the Authority, and that section of the proposal is a little weak. The justification should motivate the guy to pay attention before you lay on the details of the Lojban language.

Comment #1c: Lose the Whorf-Sapir hypothesis. You're not writing an AI to take the Turing test; the W-S test isn't a project goal; and you don't even believe much in W-S.
So don't scatter the Authority's attention by mentioning it.

Comment #2: You're not very clear about what Results are to be achieved. In exploratory work, of course, the results can't be predicted, but here I think both your proposal and your implementation will be strengthened if the results are specified in advance. Here's one possibility: take running Lojban text, get it into the database, and generate answers (in Lojban) to any questions in the text. By the way, your prof doesn't know Lojban, and a pidgin-English output option could save your grade. Also, a tool to view tree structures is valuable both for debugging and for a demo, even if it is implemented only as text and not as a snazzy GUI.

Comment #3: Speaking of the Turing test, you refer to "a Prolog database storing the information denoted in the text". The word "denoted" is inherently anthropocentric, so success in this goal is equivalent to passing the Turing test restricted to information retrieval. Or, stated inversely, if a vicious judge decides that the Lojban input "denotes" gobbledygook... The obvious logical structure of Lojban makes it fairly easy to convince a reviewer that there's something in the input to denote, and that the output actually does match the input, but you have to remember that adequacy of denotation will be judged by humans on human terms -- no matter how much I rave about algorithmic interpretation of dikyjvo.

Comment #4: At one time I tried writing a parser for gua!spi in Prolog. I found that parsing required too much procedural action (or maybe I just wasn't good enough at Prolog). Anyway, consider parsing the stuff in a procedural language and feeding the result to a Prolog back end.

You asked which semantic features should be included. Never doubt the power of recursion!
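A minimal sketch of how Comment #4 and the recursion point could fit together, under loudly stated assumptions: a procedural front end (Python here) hands a parse tree to one recursive function that emits Prolog facts. The node classes (Name, Var, Desc, And, Bridi), the Translator class and the atom helper are hypothetical, not part of any real Lojban parser; gismu are used directly as Prolog predicate names; and existentials (da and lo descriptions) are Skolemized into fresh constants sk1, sk2, ... so they can be asserted as ground facts, which is an assumption of this sketch rather than anything the proposal specifies. Only the easy cases quoted below are covered: items 1, 3 and 4.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Union

# Hypothetical parse-tree nodes, standing in for a procedural parser's output.
@dataclass
class Name:      # pro-sumti or proper name: mi, do, ko'a
    word: str

@dataclass
class Var:       # logical variable: da, de, di
    word: str

@dataclass
class Desc:      # "lo <selbri> [be <sumti>...]": veridical description
    selbri: str
    inner: List["Sumti"] = field(default_factory=list)

@dataclass
class And:       # "<sumti> .e <sumti>": logical AND between sumti
    left: "Sumti"
    right: "Sumti"

Sumti = Union[Name, Var, Desc, And]

@dataclass
class Bridi:
    selbri: str
    args: List[Sumti]

def atom(word: str) -> str:
    """Render a Lojban word as a Prolog atom, quoting it when necessary."""
    if word and word[0].isalpha() and word.islower() and word.isalnum():
        return word
    return "'" + word.replace("'", "''") + "'"  # ISO Prolog doubles the quote

def alternatives(s: Sumti) -> List[Sumti]:
    """Flatten a '.e' tree into the list of sumti it joins."""
    if isinstance(s, And):
        return alternatives(s.left) + alternatives(s.right)
    return [s]

class Translator:
    """Recursively turn toy bridi trees into Prolog facts.

    Existentials (da, lo descriptions) are Skolemized: each becomes a fresh
    constant sk1, sk2, ... so the output can be asserted as ground facts.
    """

    def __init__(self) -> None:
        self.counter = 0
        self.facts: List[str] = []
        self.vars: Dict[str, str] = {}

    def fresh(self) -> str:
        self.counter += 1
        return f"sk{self.counter}"

    def emit(self, selbri: str, terms: List[str]) -> None:
        self.facts.append(f"{atom(selbri)}({', '.join(terms)}).")

    def term(self, s: Sumti) -> str:
        if isinstance(s, Name):
            return atom(s.word)
        if isinstance(s, Var):   # every 'da' in one sentence is the same X
            if s.word not in self.vars:
                self.vars[s.word] = self.fresh()
            return self.vars[s.word]
        if isinstance(s, Desc):  # the recursive case: lo X be lo Y ...
            c = self.fresh()
            self.emit(s.selbri, [c] + [self.term(i) for i in s.inner])
            return c
        raise ValueError("'.e' nested inside a description is not handled")

    def sentence(self, b: Bridi) -> List[str]:
        """Translate one bridi; return only the facts it added."""
        start = len(self.facts)
        self.vars = {}  # da is scoped to a single sentence in this sketch
        # Item 4: mi nelci do .e ko'a -> mi nelci do .ije mi nelci ko'a.
        # Each alternative is translated once, so a description repeated
        # by the expansion keeps a single constant.
        choices = [[self.term(s) for s in alternatives(a)] for a in b.args]
        for combo in product(*choices):
            self.emit(b.selbri, list(combo))
        return self.facts[start:]

if __name__ == "__main__":
    t = Translator()
    examples = [
        Bridi("prami", [Name("mi"), Var("da")]),                       # item 1
        Bridi("catra", [Name("mi"), Desc("prami", [Desc("pulji")])]),  # item 3
        Bridi("nelci", [Name("mi"), And(Name("do"), Name("ko'a"))]),   # item 4
    ]
    for b in examples:
        for fact in t.sentence(b):
            print(fact)
    # prami(mi, sk1).
    # pulji(sk3).
    # prami(sk2, sk3).
    # catra(mi, sk2).
    # nelci(mi, do).
    # nelci(mi, 'ko''a').
```

Deeper nesting such as lo prami be lo prami be lo pulji needs no new code, because the Desc case simply calls term on its inner sumti; le, relative clauses and abstractions are deliberately left out.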
If you can get something to work on "mi prami da", it should work trivially on termsets, for example, and equally easily on second-order predicates (abstractions), provided you can glork the proper binding of propositional-function arguments in the absence of official rules. (See John Cowan's recent posting about property abstractions for the meaning of "glork" and of "propositional function". Lojbab and Cowan strongly resist dikyjvo because it exposes the hair of Hydra: to believe in dikyjvo they also have to deal with diktanru and with propositional functions, and that's too much for them to bite off at once. It seems to me that glorking may be the key to resolving their resistance, and that Cowan may (or may not) now be motivated to determine authoritatively how to glork. If you want to deal with second-order predicates in your project, you are going to have to do the same thing. Once a rule for glorking has been deduced, the second-order predicates fall into place. Then you can *easily* say that the same rules apply to diktanru (belenu style) as an abbreviation for abstractions written in full, and you can then *easily* extend the same interpretation to dikyjvo derived from those diktanru.)

> 1. Simple predications with a known predicate, and with arguments without
> internal structure (proper names, logical variables). No quantification
> other than existential.
> eg. mi prami da --- There exists an X such that LOVES(i,X).

No problem.

By the way, in a related but less ambitious project I put a lot of emphasis on identifying pronoun antecedents and on always keeping each pronoun linked to its antecedent. For me, a proper name is a pronoun, and its antecedent is established in some kind of naming ceremony. Of course, the program must also deal with a pronoun whose antecedent is not manifest to it.

> 2. Non-Veridical arguments (cf. English "the") based on predicates, with
> internal arguments.
> eg.
> mi catra le prami be le pulji --- KILLS(i,x) & LOVE(x,y) & POLICE(y):
> I kill the lover of the policeman.

In-mind arguments are very hard to handle if you don't have a mind. I would suggest that you either forget entirely about "le", or cheat and say: "le means that the referent is 'suggested' by the bridi-tail, and for this AI, an exact match is the best we can do for suggestions."

> 3. Veridical arguments (cf. English "an") based on predicates, with
> internal arguments.
> eg. mi catra lo prami be lo pulji --- There exist X and Y such that:
> KILLS(i,X) & LOVE(X,Y) & POLICE(Y): I kill a lover of a policeman.

No problem (I think).

> 4. Resolution of logical connectives.
> eg. mi nelci do .e ko'a ---> mi nelci do .ije mi nelci ko'a ---
> LIKES(i,you) & LIKES(i,x1): I like you and him.

No problem.

> 5. Restrictive and non-restrictive relative clauses.
> eg. mi nelci le prenu poi do xebni ke'a --- (There exists x such that
> HATES(you,x)) & LIKES(i,x) & person(x): I like the person you hate.

Nonrestrictive clauses are harder than restrictive ones, because you have to jerk a buried clause onto the chain of clauses that are actually asserted, rather than leaving it as a portion of an asserted clause.

I have always assumed that the semantics of a restrictive subordinate clause matches its syntax, i.e. it is a sub-unit of the restricted sumti (or bridi, with fi'o). Recently I have come to look at it differently, as an output filter. Each bridi (including S-bridi) has an export path. For sumti it's the virtual pronoun (ke'a?) in x1 after conversion; for bridi it is not well represented in Lojban syntax. The subordinate clause is a propositional function whose argument binds to that export variable, and only referents for which the function is true are retained for export.

> 6. Higher order predicates.
> eg. lenu mi cadzu cu nandu --- DIFFICULT(event:WALKS(i)): My walking is
> difficult.

Not difficult, if you know how to glork.

> 7.
> Prepositional phrases (other than tense and location).
> eg. mi naumau do nelci ko'a ---> mi zmadu do leni da nelci ko'a ---
> EXCEEDS(i,you,quantity:LIKES(X,x1)): I like him more than you.

You picked the most complicated example. Most modal phrases -- not this one, but including tenses -- can be mapped 1-1 into restrictive subordinate clauses with the single governed sumti in x2 after conversion. E.g.

    lo catra besepi'o lo mrudakfu
    lo catra poi pilno lo mrudakfu
    An axe-murderer

> 8. Attitudinals.
> eg. mi .ui sidju do ---> mi sidju do .ije mi gleki mi va'o lenu mi sidju
> do: HELP(i,you) & HAPPY(i,i) & CONTEXT((state:HAPPY(i,i),event:HELP(i,you)):
> I *smile* will help you, I am happy to help you.

I prefer to render attitudinals and discursives as supplementary subordinate clauses (fi'o style, actually emulating ) on the main bridi, rather than with the logical connective. Lojbab and Cowan explode, insisting that attitudinal indicators are not assertions, are not veridical, are nothing but metalinguistic noise. I think they should stuff it: put indicators in your program. You do, however, have to be forceful in selecting the default x2 occupant of the clause. I find that for attitudinals (not discursives), x1 of the modified bridi is more often the right choice than "mi". Note that in many sentences with indicators, like your example, "mi" already occupies x1.

> 9. Tense (including location), and prepositions of tense (including location).
> Also includes modality and event contours.
> eg. mi ba'o tavla ---> lenu mi tavla cu ba'o zei balvi zo'e:
> AFTERMATH(event:talk(i,_,_,_),_): I have spoken.

It's hard for the program to handle the metric semantics of tense predicates, but the syntax is identical to general .

> 10. Non-logical connectors.
> eg. la gilbrt. joi la salivn. cu finti la mikadon. --- INVENT(X,mikado) &
> JOINT_MASS(X,gilbert,sullivan): G & S (as a joint unit) wrote The Mikado.

Syntax: easy. Semantics: hard.
Maybe leave it for later -- after mass/set arguments.

> 11. Masses and sets as arguments.
> eg. loi remna cu sipna: the mass of humans sleep (Even though it is not
> true at any given moment that For all X: HUMAN(X) => SLEEPS(X))

I know that Paradox can do sets; programming sets may not be trivial, but it is probably doable. Masses are another matter. You certainly will have to learn what a mass really is. If you do, please tell me.

> 12. Quantification (including numerical):
> eg. mu le ze mensi cu cucycau: five of the seven sisters are barefoot.

I assume the difficulty here is handling the metric aspect. Interpreting "most" or "some" would be the hardest.

> 13. Negation. Contradictory, scalar. Use of prenexes.
> mi naku ro prenu cu prami: NOT(For all X:PERSON(X), LOVES(i,X))
> mi ro prenu na prami: For all X:PERSON(X), NOT(LOVES(i,X))

I would think that prenexes would be trivial; the challenge is to handle the quantified variable, whether in a prenex or not, and that challenge belongs with quantification. As for scalar and polar negation, two approaches suggest themselves: derive the meaning of the negated predicate from the basic predicate, or punt and assume that the negated predicate is separate and unrelated. If you can figure out what "ni" (quantity) means, you can do this right.

> Sections of Lojban Grammar not anticipated to be included in the model:
>
> 1. The mathematical subgrammar of Lojban.

Right.

> 2. Any analysis of word compounds.

Awww! Under the present rules tanru are by definition inaccessible to your program, and you have to handle lujvo as junior gismu.

> 3. Metalinguistic comments.

Right, they should be thrown out as irrelevant. However, you could use the protocol words (coi, co'o, be'e, fe'o, etc.) to control online interaction with your program.

Anyway, it looks like a very interesting proposal. The stepwise addition of goals, ending when the due date approaches, is a very good strategy.
Be sure to keep a completely working copy of the program and test data sets (not "almost working", complete!) when each step is finished, in case the next step is intractable and you have to turn in the previous version.

Good luck, and keep me posted!

--
jimc