Return-Path:
Message-Id: <9110240859.AA11133@relay1.UU.NET>
Date: Thu Oct 24 15:58:17 1991
Reply-To: Logical Language Group
Sender: Lojban list
From: Logical Language Group
Subject: JCB on word recognition scores - response to Bruce and Doug on conlang
X-To: conlang@buphy.bu.edu, lojban@cuvmb.cc.columbia.edu
To: John Cowan, Eric Raymond, Eric Tiedemann
Status: RO
X-From-Space-Date: Thu Oct 24 15:58:17 1991
X-From-Space-Address: cbmvax!uunet!CUVMA.BITNET!LOJBAN

Yes Doug, >I< saw your posting. So I will excerpt from "Loglan 2", by JCB:

" We need next to know how to score words in such a way that will maximize their "recognition scores". The recognition score of a word is the sum of the separate probabilities for each of the eight language groups that that word will be recognized by its members. For the purpose of computing these probabilities we define the target population as composed of persons who spoke at least one of these eight languages in 1950 ...[data given and explained: formula weight = native speakers + 1/2 of secondary speakers]

Table 3 converts these probabilities into a form useful for scoring words. It gives the joint probability of word recognition by a person of a stated language when some stated proportion of a clue word is to be found in that word. The table is based on the linear assumption that the probability of recognition of an old word in a new one is identical to the proportion of the sounds of the old word which occur (in the same order) in the new one. Obviously such an assumption cannot represent the facts of learning; and no doubt the real influence of the phonetic similarity of clue-words on learning will be found to vary from language to language when the matter has been properly studied. But in the absence of detailed knowledge of these matters, the linear assumption is both a plausible assumption and a usefully simple one mathematically.*

*It is grossly false for at least one language, namely Chinese.
This is because it does not take the effect of the Chinese tone phoneme into account. "

Loglan 2: Methods of Construction (1970) - pg 52-56

[JCB then goes on to note that an English speaker will recognize the clue word "Negro" in "nigro" 100% of the time, adding the footnote: "We also assume in constructing this table that once a clue-word has been recognized, the probability of recall of the meaning previously associated with it is unity."]

This deals with situations where the clue-word is NOT an exact meaning translation of the Loglan. The unspoken assumption, which JCB clarified in print within the last year in 'Lognet', is that in teaching a Loglan word to a person, you will not merely leave it to that person's imagination to find the clue word's contribution, but that you will specifically point out to the learner what the clue word is and how it is presumed to contribute to recognition. Given this, it is MUCH more likely that the clue words will be effective.

For example, I would ask Bruce whether he actually looked up "kandi" and "dim" from my earlier posting when he used the pair as an example of where he 'would not recognize' the clue word. I would contend that if he did not, the example is prima facie evidence that Bruce has formed a link in his mind between "dim" and "kandi". Given some meaningful reinforcement of this link over a period of time (as we seek in the LogFlash teaching program), there is (presumably) a 2/3 chance that Bruce will remember that link well enough to recognize the clue word at some arbitrary time in the future >knowing the rules whereby such recognition is measured<.

JCB claimed to have tested this algorithm against others in classes at the U. of Florida, pointing out clue-word recognition factors in a corpus of Loglan words, and testing people on recognition a week later, verifying that the algorithm he used gave the closest correlation to a recognition 'score'. I use the word 'score' in its statistical sense, as a relative weight of goodness of fit.
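
To make the linear assumption quoted above concrete, here is a minimal sketch in Python. It is one reading of JCB's description, not his actual scoring tables: "the proportion of the sounds of the old word which occur (in the same order) in the new one" is approximated by a longest-common-subsequence count, the inputs are assumed to be phonemic spellings with one character per sound, and the function names and any weights passed in are hypothetical placeholders rather than the 1950 or 1987 population data.

```python
from fractions import Fraction


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b
    (sounds shared by both words, in the same order)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def clue_proportion(clue: str, candidate: str) -> Fraction:
    """Linear assumption: recognition probability equals the share of
    the clue word's sounds found, in order, in the candidate word."""
    return Fraction(lcs_length(clue, candidate), len(clue))


def language_weight(native: int, secondary: int) -> Fraction:
    """Weight formula from the excerpt: native + 1/2 of secondary speakers."""
    return native + Fraction(secondary, 2)


def recognition_score(candidate: str, clues: dict, weights: dict) -> Fraction:
    """Sum over languages of (share of target population) x (clue proportion).
    `clues` maps language -> clue word; `weights` maps language -> weight.
    Both dictionaries are illustrative inputs, not data from Loglan 2."""
    total = sum(weights.values())
    return sum(
        Fraction(weights[lang]) / total * clue_proportion(clue, candidate)
        for lang, clue in clues.items()
    )


# The "dim" / "kandi" example: d and i occur in order, m does not.
print(clue_proportion("dim", "kandi"))  # 2/3
```

Under this reading the "dim"/"kandi" pair gives the 2/3 figure mentioned above, and a clue word whose sounds all occur in order in the candidate scores 1.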
The concept of a recognition score as an absolute percentage is bogus, and JCB realized it and stated it - it is like a probability in that it ranges from 0 to 1 and that the sum of the individual scores for the languages measures a total 'score' for all 8 languages, but little more.

As I have stated many times, we are actually about to start a test of the validity of the recognition score algorithm using the new version of LogFlash, which we hope is sufficiently instrumented. One conlanger has volunteered as a guinea pig; we would like more. A meaningful commitment over a few months is expected, and if you want to learn Lojban as well, we will supply necessary materials, at our expense if necessary (but we are extremely cash-tight at this time, so we hope even volunteers will contribute funds if they can), to ensure this commitment.

I believe that any criticism of the Loglan/Lojban algorithm that does not recognize its admitted goals and shortcomings is invalid.

(For those who care, Lojban used the same algorithm, but used 6 languages based on the same criteria, with 1987 language population data. The choice of 6 vs. 8 had several reasons, including a close tie between 7th, 8th, and 9th places, and the instability of the recognition algorithm when there are more than 3 major language families involved, because there are only 3 consonants in each Loglan/Lojban root. (By instability, I mean that trial words that had no apparent relation to each other and minimum recognition scores tended to end up in low-scoring ties for the 'top score'.))

There have been many proposals for high-recognition algorithms along the way - indeed too many to mention.
They range from taking words whole from other languages, to weighting by language family speakers instead of individual languages, to using a single language (English of course) as the source for easy-to-recognize roots that would maximize the learning of the language in our true audience: those who read our English-language teaching materials and those who are likely to develop meaningful Lojban applications. This last suggestion has recurred often enough that one proposed name, "Anglan", has stuck as a generic for all such proposals. But such a proposal would chase off most people actually interested in ALs, as well as run into the Procrustean bed problem that Bruce mentions afflicts Volapük and vidpuni.

----
lojbab = Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA
703-385-0273
lojbab@grebyn.com

NOTE THAT THIS IS A NEW NET ADDRESS AND SUPERSEDES OTHERS IN MY POSTINGS OR LOGICAL LANGUAGE GROUP, INC. PUBLICATIONS