Date: Mon, 6 May 91 05:06 EDT
From: lojbab (Bob LeChevalier)
To: lojban-list
Subject: culture words cultural bias

Several people have commented on the apparent biases of the cultural gismu (as well as element words, etc.). The set selected for Lojban has been discussed in JL a couple of times, as well as here on the list. I fear that it is a subject that will return to haunt us with most every new speaker.

Several people have raised valid points indicating that the set of culture/element words is biased, incomplete, or what have you. Arthur Hyun takes either the most reasonable or most extreme position, depending on how you look at it - if Lojban claims to be culturally unbiased, then all cultures must be treated identically; otherwise Lojban inherently is slanted towards a particular culture, the design is arbitrary, and there is no point in debate. He also notes that there is no objective judge qualified to pass on the importance of gismu.

My response: The set of gismu is certainly in one sense arbitrary - I can't state any external standard justifying the selection, and indeed we do not claim perfectly objective judgement. But I still claim that for all practical purposes the set is culturally neutral. Such a claim is always relative - there could be 'more perfect' neutrality in theory; I think we did a good job, and I do not think the list is 'slanted towards a particular culture', unless that culture is the non-existent Lojban culture.

The set of gismu has been derived over 35 years. Jim Brown selected the first set based on 3 or 4 sources, including BASIC English, some studies of words that are 'biologically primitive' in that they appear to be primitive in most every language, etc. He then used the Helen Eaton study of the most frequently used concepts in 4 languages (English/German/French/Spanish).
This list is of course European biased, but it is the only such comparative study across several languages for word/concept frequencies, and Helen Eaton was doing so for AL research and was presumably aware of the neutrality issue. In any case, there is reason to believe that the list is more biased towards the obsolescence of being 60 years old than it is toward a specific culture - key concepts in science and medicine are unknown in the list, while certain concepts no longer important rate highly. It is still a standard, and the only one.

Brown assumed that Zipf's law holds. Zipf noted that word length tends to be inversely related to word frequency. Since gismu were the shortest content words, they should be used for the most frequent concepts. He made gismu for most of the first 1000 concepts, unless there was an OBVIOUS 2-term lujvo based on higher frequency words. He then continued to the 2000 and 3000 concept levels, and ended up with about 750 gismu.

From 1962-82 this list went from 750 to about 950. Because there WAS no le'avla in the language design at that point, ALL of the elements were added as gismu, along with many other rather idiosyncratic words like 'billiards'; if someone wrote something in Loglan and needed a word, a gismu was often the result. After GMR in 1982, there was the capability for le'avla, and some of these were backed out of the language, but JCB's Loglan still has a lot of historically idiosyncratic gismu which are gismu only because they had no obvious 2-4 term tanru/lujvo.

This is the list we inherited when we redid the list into Lojban. Among the words were culture words for the 8 source languages for Institute Loglan (as well as SEPARATE gismu for the people and the culture), plus some idiosyncratic cultures that had been added, including Italian, Scottish, Roman, and Amerind.
We decided to regularize the set based on some external standard - the culture words we used were those for JCB's 8 languages, and the other 4 we considered for Lojban (we once planned to use 12 languages instead of 6, then cut back to 6 for several reasons). We added the religions that were primary in the source cultures, and separate words for the several countries that used the source languages. Because we had le'avla, if we could not assign a good rafsi to any recognizable form of the culture word, we left it out - the assignment of a short rafsi was the main justification for these words.

The point of all this is that the culture words were added according to a standard that is inherent in the history of the language and its design - thus no one really had to be an 'objective judge'. If it is accepted that our chicken mcnuggets word formation algorithm is culturally neutral, being based on 6 languages, then the culture words meet the same criterion of neutrality. In addition, the words are NOT slanted towards one culture - if they were, we would not have used the Egyptian word for Egypt, the German word for Germany, etc. Yes, we had to leave some cultures out, and some countries that have speakers of the languages we do have. But the decision was not wholly arbitrary.

The rest of the gismu were selected to complete various incomplete sets recognized by a Roget-like study of the gismu by Paul Doudna. Later, when Athelstan joined the project, we conducted two further reviews against Roget's Thesaurus, looking to achieve 'completeness' in that the gismu could be used to form lujvo covering every concept in Roget. Roget is of course English-biased, but it also purports to be a comprehensive survey of the semantic word space, and it is in that mode that we used the list.

In the course of doing so we recognized that the rationale for gismu had changed since JCB first started Loglan (and in his versions this is also true, though he has never so stated).
At one point Brown thought his words were in some absolute sense 'primitive', partly based on his biological primitive research. This is not the current belief or practice. gismu are in no way assumed to be the 'most basic', 'most important', or 'most' anything for one or several cultures.

We now claim ONLY that the gismu we have are sufficient, using the lujvo-making rules, to make reasonable length lujvo to cover any concept that is important across cultures ('reasonable' I set at about 4 terms, the length of the longest lujvo ever made and published as a 'real word' for Loglan). Words that are specific to one culture, or are part of the international vocabulary of science, are relegated to le'avla.

BUT, in going to this definition of our gismu coverage, we did not claim the need to eliminate every gismu that had no obvious intercultural use. Indeed, if it was already made as a gismu, we kept it UNLESS someone explicitly proposed its deletion accompanied by (usually) a 2-term lujvo for the concept. About 20-odd words were so deleted before the baseline. There is NO intent to delete any gismu prior to the 5-year usage baseline, because the only meaningful criterion that would justify a deletion in the baseline period would be something like the word being impossibly vague (not likely, since we have place structures for each). Arguments of usage - either potential or actual (including boycotts :-) - are irrelevant; that is the point of the usage baseline, to see whether they are used.

As a result of this long evolutionary process, it is clear that the list is not an arbitrary representation of one or two persons' biases. Being based on concepts of semantic space, with some slight verification of usefulness in a few cultures, the list is close to comprehensive (with occasional new words proposed when we find a gap).
The list is not angled towards a specific culture or even an identifiable set of cultures, except that if some culture has a truly important concept that is not shared by any of the Eaton languages, it may be omitted. In that case, it will likely become a gismu later when recognized.

Beyond this, I do not see the claims that the Lojban list is biased in some recognizable way towards any language. It can only be claimed that it is possibly biased AWAY from less common languages/cultures in the most trivial sense, since we are talking about exactly one word per such culture. No doubt if any of these less common cultures develops a significant Lojban speaker base during the formative years of the language, the culture will get a gismu.

The argument that the remaining element words may be biased towards English or at least European cultures is plausible and probably valid. These were justified by their use in metaphors BEFORE we had the now clear policy against heavily figurative metaphors. Even so, I believe there are ways to define these words based on the metaphorical properties attributed to the substance, leaving the 'chemical word' either for a lujvo (using pure or chemical) or a le'avla. Thus nickel is fine as is, chrome is highly reflective non-tarnishing metal, neon is fluorescent (or possibly plasma? or possibly inert), chlorine can be used for all the halogens (people put kliru lights on their cars?), etc. This eliminates the most obvious part of the bias, but more importantly allows the words to be useful.

Comments welcome, but recognize that changes to the baselined gismu list are most difficult to get approved, and we want it that way.

lojbab