Return-Path:
Message-Id:
Date: Mon, 18 Mar 91 01:12 EST
From: lojbab (Bob LeChevalier)
To: lojban-list
Subject: are artificial languages scientifically interesting
Status: RO
X-From-Space-Date: Mon Mar 18 01:12:56 1991
X-From-Space-Address: lojbab

Following is a post I'm making to sci.lang, but for which I'd like comments from inside the community on how to improve (and suggestions for additional examples along the same lines). Post to the list if they are of general interest.

Subject: The Scientific Value of Artificial Languages
Newsgroups: sci.lang
Organization: The Logical Language Group, Inc.
Summary: Several ideas, and more if you want them
Keywords: science, experiment, AL, model, Lojban

pautler@ils.nwu.edu (david pautler) says (12 Mar 91 15:59:02 GMT):
> I did not say that ALs have no good use.  I said there's nothing
>particularly interesting about them (from a scientific viewpoint - this is
>a `sci' group) *because* they're artificial.  Some interesting sociological
>behaviors may appear if these languages come into widespread use, perhaps
>even some interesting linguistic phenomena if enough spontaneous innovation
>occurs (although AL enthusiasts seem to want to prevent this).  But there
>certainly doesn't appear to be anything interesting about them now, because
>AL enthusiasts in this group prefer to argue over which of several (truly
>arbitrary) conventions are "better".
> I am willing to admit I am wrong about all this if some of you AL
>enthusiasts can give the rest of us some good reasons why ALs *are*
>scientifically interesting.

David later adds (15 Mar 91 04:46:32 GMT):
> I still believe that knowing the design principles of any system
>beforehand makes a scientific study of those principles silly, but I'm
>going to get off my high horse and go back to being a level-headed
>contributor.

This addition definitely clarifies the goal, and the problem, especially since it removes the loaded topic 'AL' from the question.
I will answer primarily from the standpoint of Lojban, though some of my points are applicable to Esperanto and other ALs.

David is taking a very limited view of science, to presume that the design principles of a system are the only interesting thing about that system to a scientist. I can see a few other possibilities:

a) In a highly complex system (which even an AL is), the interaction of the design features displays properties that are 'more than the sum of the parts'. Thus it is possible that all language is merely a system comprised of a bunch of neurons releasing neurotransmitters. Biochemistry may eventually devise a complete explanation for the neuronic process (including genetic components), and we may then say we "know the design principles of the system". But this won't be the case, because the complexity of those neuronic interactions is so great that knowing the pieces does not give a total understanding of the >system<. This indeed may be what defines the concept 'system'.

Knowing all the prescribed rules of an AL does not tell you how that AL will be used communicatively, and I don't mean in the sociological sense. A sample question: Given multiple ways of communicating the same idea, do users of the language choose particular forms over others, and why? This is similar to a question that presumably is commonly asked about natural languages.

I can come up with many other sample questions of science that can be applied to the system of an AL that are not compromised by 'knowing the design', but let's move on. (Feel free to ask, though.)

b) A simpler system, which can be more fully understood, may serve as an excellent model for a less understood, more complex system. Thus the simpler system could be examined for parallels to hypotheses about the more complex system. Examination of the simpler system may suggest properties to look for in the more complex system, or it may even suggest hypotheses that can be tested in the more complex system.

A 'hot' topic in parts of the Lojban community is whether the language has, or should have, an underlying semantic theory. If there is one, it is certainly not as developed or prescribed as the syntactic design and theory. As a result, filtering out syntactic ambiguity allows a more direct examination of semantic ambiguities, including the properties of modification and restriction, resolution of anaphora, and identification of ellipses.

Any semantic theories proposed for natural language can be looked at in terms of semantic usage in the simpler Lojban system. It seems likely that any theory NOT true of Lojban is at least suspicious with regard to natural language, thus allowing partial verification of theories (not complete - I would never say that ALs should be studied to the exclusion of natural languages, but rather in relation to them); if, however, it IS true of natural language, then you have found evidence that Lojban is in some way unnatural. Then you get to try to explain which of the known design features of Lojban causes this unnaturalness. By counterexample, that design feature is NOT a feature of natural languages.

Pragmatic effects can be more easily recognized in the simpler Lojban system, and can clearly be identified as pragmatic. Thus insights about pragmatic effects may be more visible in Lojban, insights that would then be tested in the natural languages.

Again, moving on.

c) Another aspect of a simple system is that it is easier to perform experiments on than a more complex system. There are fewer variables, and if the system is 'designed', some things that are variables are in effect TUNABLE constants, so that you can rerun the experiment with minor changes to explore the effects of those variables. Experimental linguistics is a virtually unthinkable possibility with the natural languages.

The Sapir-Whorf Hypothesis (no I'm not trying to ge susceptible to the same analysis as natural language in terms of TG, GB, UG (or whatever initials suit you %^). Take even a few children during the critical period and teach them this artificial language (at the same time as they learn their traditional language). Do they become truly bilingual? If they are as fluently flexible; you can evolve slightly different versions of the language very easily by simply changing some features. Forbid a given construct in the prescription, and do not teach it to a child. Does the child develop that construct anyway by analogy to other languages known, or does the child successfully adapt to whatever other processes you've designed into the language instead of the construct? It seems that all manner of linguistic universals could be investigated in this way.

d) I've mentioned only child learning, because this is what many linguists concentrate on, as revealing the essential nature of language. But there are also the applied linguistics problems of teaching foreign languages. It is much easier to test a method or theory of vocabulary teaching/learning with an artificial language than with a natural language; I don't think the statement that ALs are more quickly (I didn't say easily!) learned than NLs is particularly controversial. The pragmatic problems of language learning are alone justification for research using ALs.

But ALs may provide the solution as well as the means of testing. It seems to be well accepted that in learning a second language and then learning a third, you learn the third MUCH more quickly than the second. The example I've heard is that it might take 4 years to learn French and then 2 to learn German thereafter; and vice versa. If this is true, then, if you can learn an AL as well in 1 year as French in 4, then you can learn the AL and German in 3 years instead of 4, a gain of a year EVEN IF YOU NEVER AGAIN HAVE A USE FOR THE AL.
But I don't claim this as a fact - it should be easily testable in a controlled experiment, and this seems much more scientific than arguments about which ALs and NLs are 'easier to learn'.

e) Lojban has one feature designed to explore a less-understood aspect of language - the expression of emotion. Lojban allows expressive communication of emotions in words without suprasegmentals (in this it is presumably unlike all natural languages, though not entirely, as many languages have a limited set of indicators of attitude in the form of interjections and some discursive function words, e.g. 'but'). Can human beings manipulate the symbols of emotion in the same way they manipulate the comparable symbols of non-emotional expression? There is a whole range of experimental questions raised by this design element, probably the most 'unnatural' element of Lojban's design.

f) The latter points to the one other aspect of a well-designed artificial language of scientific interest and value to linguistics - as a tool of analysis. An example is best. Consider the new Scientific American Library book _The Science of Words_, by George A. Miller of Princeton (just out and I'm finding it quite interesting). A picture caption notes that Nootka (a Pacific Northwest language) has the single word 'inikwihl'minik'isit' meaning the equivalent of the entire English sentence "Several small fires were burning in the house." I won't presume to know any more about Nootka than I've just told you, but in Lojban, I can express that sentence parallelling the English:

    so'i cmalu fagri puca      jelca   vine'i    le  prezda
    Many small fires were-then burning at-within the person-nest.

and analytically as a single word (though not with the same structure as Nootka)

    prezdane'ikemcmafagyso'ikempruje'a
    person-house-inside-type_of-small-fire-many_some-type_of-previous-burner

(Yes, I can say it!
:^)

Actually, according to Miller, the Nootka breaks down as:

    inikw      -ihl          -'minih  -'is        -it
    fire/burn  in-the-house  plural   diminutive  past-tense

This order is also expressible in Lojban:

    fagykemprezdanerso'icmapru
    fire-type_of-person-nest-inside-many_some-small-past_thing/event

I don't know which of the two orders more accurately conveys how the Nootka speaker thinks of the concept expressed by the word, or whether others are better still. The Lojban in either case more accurately tracks the semantics of the Nootka, demonstrating the inadequacy of the English - the actual word as broken out did not require two separate particles for fire and burn as did the English equivalent, and the English translation used the more complicated tense 'were-burning' instead of the simpler, and presumably more accurate, 'burnt'. (I'll plainly admit that I'm relying on the given explanations by Miller, which are in English, but it seems clear that in translating the word-sentence into English there is a considerable ambiguity introduced.)

I won't claim that Lojban can express EVERYTHING in the natural form of any language (Lojban has a less-marked syntactic word-order, and expressing other orders requires marking particles that would not be found in the source language. Thus there is a tradeoff between semantic representation and syntactic representation.) Still, I think a convincing case can be made that, as a predicate language, Lojban is a much more effective tool for studying both the forms and semantics of other languages than is English, which has its own cultural, syntactic and semantic complexities to gum up the analysis (especially if the analysis is being done by a non-native English speaker - if there is any place where there is a justification for an international, minimal-culture language, it is when linguists from different native language backgrounds try to perform and communicate their linguistic analyses).

g) There is also the 'other' tool aspect of an artificial language, in computer and AI applications. A predicate language like Lojban should be especially amenable to AI processes - the programmers are familiar with predicate language expression and manipulation, and often store the data in predicate form internally for manipulation. With Lojban, such storage becomes a fairly trivial process. If Lojban is proven by experiment (per above) to have the systemic properties of a natural language, and is easier to implement in computational linguistics research problems, it serves as a tool to bridge those two disciplines, leading to more rapid and effective NLP. But only if it is tried. Even if it proves less than ideal, I have little doubt that study of natural language using computational linguistic techniques and a Lojban-based tool will be productive in ways not possible with any natural language. (In effect, this argument is the same as the last one, except that instead of two different-natural-language speakers trying to communicate about language, you have a human and a computer, who obviously speak different native languages, trying to communicate.)

h) This has been raised before, but perhaps not as clearly: A highly prescribed language is an ideal test bed for examining the processes of language evolution. In the case of an AL like Lojban, as the speaking community in each culture grows, you can observe how the language creolizes in contact with those other languages. Because of the speed of learning, artificial languages should tend to show effects more quickly (by being mastered to a communicative level more quickly), and anecdotal evidence about Esperanto tends to support this idea.

Does this mean that the conclusions are absolutely valid for natural language evolutionary processes? I don't claim so. But again, we are performing experiments with a model, somewhat idealized, of a natural language.
Unlike a paper-theoretic model (as all linguistic theories must inherently be), this is a model that can be experimented with using live speakers. Provided that we understand the model as it evolves, that understanding comes much closer to an understanding of natural language as time goes on.

i) The large majority of languages have some degree, more or less, of prescription. In addition, some 'natural' languages, like modern Hebrew, formal Swahili, and some standardized dialects (e.g. Mandarin, which has been noted as being related but not identical to the Beijing dialect), are not all that far from being true artificial languages, but are much more interesting to linguists. A predominantly prescribed language would seem an especially effective tool for studying the effects of prescription on language development and use (again, linguistic and not sociological effects).

None of these scientific applications of Lojban inherently requires a large fluent body of speakers, or any solely-native speaker of that tongue. If any of the less scientific applications of Lojban serve to justify it developing such a speaker base, the nature of its usefulness as a model will change. New applications, as yet not really predictable, will turn up, aided by our no doubt increased understanding of language. But the model, even if well understood, no longer is as simple, and new Loglans and other experimental linguistic tools, all artificial languages, will be developed to take the next step.

I have hopefully given a bit of food for thought, yet with only a few hours' preparation. I also only thought about this as somewhat an outsider to the profession of linguistics. With a different point-of-view others should be able to find many more questions of scientific interest using an AL like Lojban either as a model, an experimental test bed, or a tool. And if even a small fraction of these ideas are useful, then ALs have a valid scientific role in linguistics.

--
lojbab = Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA     703-385-0273
lojbab@snark.thyrsus.com