From cbmvax!uunet!mcnc.org!aurs01!aurw31!waugh Tue Mar 5 13:17:40 1991
Return-Path: for lojban-list@snark.thyrsus.com
Date: Tue, 5 Mar 91 10:47:08 -0500
From: cbmvax!uunet!mcnc.org!aurs01!aurw31!waugh (Jack Waugh)
Message-Id: <9103051547.AA01181@aurw31.local>
To: aurs01!lojban-list@snark.thyrsus.com@mcnc.org
Subject: Grammar Patterns to Aid Parsing
Status: RO

What I meant by "hard" little words was ones that the listener would have to know in order to parse a sentence. Let me refine that proto-proposal a little (forget about "hard"):

An adequate model to use for the first stages of listening would be to say that the listener parses first and then starts semantic processing. Parsing is casting the utterance into a tree, so as to know the grouping relationships. Once this is done, the listener turns to the semantics of the words. For example, in Lisp, you might have an expression:

    (CONS A B)

The parentheses say that CONS is to operate on A and B; this can be determined without any semantic information on CONS. Once the parse is done, the interpreter can start to ask what B means, what A means, and what CONS does with these two operands.

Of course, the parsing rules for Lisp would not be appropriate for a spoken language for use between humans. However, maybe we could design a set of conventions which would require the parsing operation to act on less information than the whole cmavo vocabulary, just as a Lisp parser doesn't have to know the behavior of the functions.

One approach would involve words in phonological series. The predicate words and names already conform to this. It seems to me quite a strength of Loglan, and an argument for it, that the parser can easily recognize a predicate word, and the thinking about what it means can be delayed so as not to stand in the way of rapid parsing. Perhaps we could buy the same facility for the rest of the language. Some of the meanings that today reside in cmavo (little words) could be put in words in such series.
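To make the parse-first model from the Lisp example concrete, here is a minimal sketch in Python. It builds the grouping tree from the parentheses alone and never looks up what any symbol means. The function names `tokenize` and `parse` are chosen for illustration and are not taken from any Lisp implementation.

```python
def tokenize(text):
    """Split a Lisp-like expression into parentheses and bare symbols."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Build a nested list from the tokens using only the parentheses.

    No table of symbol meanings is consulted: CONS, A and B are opaque strings.
    """
    def read(pos):
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if tok == "(":
            node = []
            pos += 1
            # Collect children until the matching close parenthesis.
            while pos < len(tokens) and tokens[pos] != ")":
                child, pos = read(pos)
                node.append(child)
            if pos >= len(tokens):
                raise SyntaxError("missing close parenthesis")
            return node, pos + 1
        if tok == ")":
            raise SyntaxError("unexpected close parenthesis")
        return tok, pos + 1

    tree, end = read(0)
    if end != len(tokens):
        raise SyntaxError("trailing tokens after expression")
    return tree


print(parse(tokenize("(CONS A B)")))  # ['CONS', 'A', 'B']
```

Only after this tree exists would a separate step ask what CONS does with its two operands, which is the ordering the model above proposes for a listener.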
There would be one series per parsing category. When a category is not frequently used and has a lot of semantic choices, the way to make the series would be to always start with a marker syllable (or string of syllables) and follow the marker with some more syllables to specify the semantic choice. I would write the marker as a separate "word", although it would have no semantics without what follows it. The semantic part would of course have to follow some rules that would let you know where it ended.
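The marker-plus-semantic-part scheme can be sketched as follows. Everything specific here is an assumption made up for illustration: the marker syllables "za" and "ko", the category names, and the termination rule (the semantic part ends at the first syllable closing in "n"). The proposal itself only says that some such rule would have to exist. The point of the sketch is that the grouping is found from form alone, with no dictionary of the semantic parts.

```python
# Hypothetical marker syllables, one per parsing category (invented for illustration).
MARKERS = {"za": "CATEGORY_1", "ko": "CATEGORY_2"}


def ends_semantic_part(syllable):
    # Invented termination rule: the semantic part ends at the first
    # syllable that closes with the consonant "n".
    return syllable.endswith("n")


def group_series(syllables):
    """Split a syllable stream into (category, semantic_part) pairs by form alone."""
    words = []
    pos = 0
    while pos < len(syllables):
        marker = syllables[pos]
        if marker not in MARKERS:
            raise ValueError(f"expected a marker syllable at position {pos}, got {marker!r}")
        pos += 1
        part = []
        # Consume syllables up to and including the terminating one.
        while True:
            if pos >= len(syllables):
                raise ValueError("semantic part not terminated")
            syl = syllables[pos]
            part.append(syl)
            pos += 1
            if ends_semantic_part(syl):
                break
        words.append((MARKERS[marker], part))
    return words


stream = "za mi ran ko tun za pa li son".split()
for category, part in group_series(stream):
    print(category, "-".join(part))
```

The listener in this sketch learns the category of each word from the marker and the word's extent from the termination rule, and can defer working out what "mi-ran" or "pa-li-son" selects until after the grouping is known.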