Message-Id: <9110240859.AA11133@relay1.UU.NET>
Date: Thu Oct 24 15:58:17 1991
Reply-To: Logical Language Group <cbmvax!uunet!GREBYN.COM!pucc.PRINCETON.EDU!lojbab>
Sender: Lojban list <cbmvax!uunet!CUVMA.BITNET!pucc.PRINCETON.EDU!LOJBAN>
From: Logical Language Group <cbmvax!uunet!GREBYN.COM!pucc.PRINCETON.EDU!lojbab>
Subject:      JCB on word recognition scores - response to Bruce and Doug on
              conlang
To: John Cowan <cowan@SNARK.THYRSUS.COM>,
        Eric Raymond <eric@SNARK.THYRSUS.COM>,
        Eric Tiedemann <est@SNARK.THYRSUS.COM>

Yes Doug, >I< saw your posting.  So I will excerpt from "Loglan 2",
by JCB:

"
We need next to know how to score words in such a way that will maximize
their "recognition scores".  The recognition score of a word is the sum
of the separate probabilities for each of the eight language groups that
that word will be recognized by its members.  For the purpose of
computing these probabilities we define the target population as
composed of persons who spoke at least one of these eight languages in
1950  ...[data given and explained: formula weight = native speakers +
1/2 of secondary speakers]

Table 3 converts these probabilities into a form useful for scoring
words.  It gives the joint probability of word recognition by a person
of a stated language when some stated proportion of a clue word is to be
found in that word.  The table is based on the linear assumption that
the probability of recognition of an old word in a new one is identical
to the proportion of the sounds of the old word which occur (in the same
order) in the new one.  Obviously such an assumption cannot represent
the facts of learning; and no doubt the real influence of the phonetic
similarity of clue-words on learning will be found to vary from language
to language when the matter has been properly studied.  But in the
absence of detailed knowledge of these matters, the linear assumption is
both a plausible assumption and a usefully simple one mathematically.*

*It is grossly false for at least one language, namely Chinese.  This is
because it does not take the effect of the Chinese tone phoneme into
account.
"
    Loglan 2: Methods of Construction  (1970) - pg 52-56
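
As a rough illustration of the arithmetic in the excerpt, here is a
minimal Python sketch.  It assumes that each language's weight (native
speakers + 1/2 of secondary speakers) is divided by the total so that
the weights sum to 1; the excerpt does not spell out that step.  The
language names, speaker counts, and proportions are hypothetical
placeholders, NOT the 1950 figures from Loglan 2, and the function names
are made up for this example.

```python
def language_weight(native, secondary):
    """Weight from the quoted formula: native + 1/2 of secondary speakers."""
    return native + 0.5 * secondary


def recognition_score(populations, proportions):
    """Sum over languages of (population share) * (clue-word proportion).

    populations: {language: (native, secondary)} speaker counts.
    proportions: {language: fraction of the clue word's sounds found,
                  in order, in the candidate word}.
    Normalizing the weights to shares of the total is an assumption.
    """
    weights = {lang: language_weight(n, s)
               for lang, (n, s) in populations.items()}
    total = sum(weights.values())
    return sum(weights[lang] / total * proportions.get(lang, 0.0)
               for lang in weights)


# Hypothetical speaker counts in millions (placeholders only).
populations = {"A": (300, 100), "B": (200, 0), "C": (100, 100)}
# Hypothetical clue-word proportions for one candidate word.
proportions = {"A": 1.0, "B": 0.5, "C": 0.0}

score = recognition_score(populations, proportions)
print(round(score, 4))
```

Under the linear assumption the per-language term is just the weight
times the proportion, so the total stays between 0 and 1.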

[JCB then goes on to note that an English speaker will recognize the
clue word "Negro" in "nigro" 100% of the time, adding the footnote:]

"We also assume in constructing this table that once a clue-word has
been recognized, the probability of recall of the meaning previously
associated with it is unity."

This deals with situations where the clue-word is NOT an exact
meaning translation of the Loglan.

The unspoken assumption, which JCB clarified in print within the last
year in 'Lognet', is that in teaching a Loglan word to a person, you
will not merely leave it to that person's imagination to find the clue
word's contribution, but that you will specifically point out to the
learner what the clue word is and how it is presumed to contribute to
recognition.  Given this, it is MUCH more likely that the clue words
will be effective.  For example, I would ask Bruce whether he actually
looked up "kandi" and "dim" from my earlier posting when he used it as
an example of where he 'would not recognize' it.  I would contend that
if he did not, the example is prima facie evidence that Bruce has formed
a link in his mind between "dim" and "kandi".  Given some meaningful
reinforcement of this link over a period of time (as we seek in the
LogFlash teaching program), there is (presumably) a 2/3 chance that
Bruce will remember that link enough to recognize the clue word at some
arbitrary time in the future >knowing the rules whereby such
recognition is measured<.
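
The 2/3 figure appears consistent with the linear assumption: two of the
three sounds of "dim" ("d" and "i") occur, in the same order, in
"kandi".  The sketch below computes that proportion in Python.  It
treats each letter as one sound and uses a longest-common-subsequence
count for "occur in the same order"; both are simplifying assumptions
for illustration, not necessarily the exact matching rules JCB applied,
and the function names are hypothetical.

```python
def matched_sounds(clue, word):
    """Count clue-word sounds that appear in `word` in the same order.

    Computed as the longest common subsequence length, with each letter
    standing in for one sound (an assumption of this sketch).
    """
    prev = [0] * (len(word) + 1)
    for c in clue:
        cur = [0]
        for j, w in enumerate(word, start=1):
            if c == w:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def recognition_proportion(clue, word):
    """Linear assumption: matched sounds divided by clue-word length."""
    if not clue:
        raise ValueError("clue word must not be empty")
    return matched_sounds(clue, word) / len(clue)


print(recognition_proportion("dim", "kandi"))  # 2 of 3 sounds match
```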

JCB claimed to test this algorithm vs. others in classes at the U. of
Florida, pointing out clue-word recognition factors in a corpus of
Loglan words, and testing people on recognition a week later, verifying
that the algorithm he used gave the closest correlation to a recognition
'score'.  I use the word 'score' in its statistical sense, as a relative
weight of goodness of fit.  The concept of a recognition score as an
absolute percentage is bogus, and JCB realized it and stated it - it is
like a probability in that it ranges from 0 to 1 and that the sum of
the individual scores for the languages measures a total 'score' for all
8 languages, but little more.

As I have stated many times, we are actually about to start a test of
the validity of the recognition score algorithm using the new version of
LogFlash which we hope is sufficiently instrumented.  One conlanger has
volunteered as a guinea pig; we would like more.  A meaningful
commitment over a few months is expected, and if you want to learn
Lojban as well, we will supply necessary materials, at our expense if
necessary (but we are extremely cash-tight at this time, so we hope even
volunteers will contribute funds if they can), to ensure this
commitment.

I believe that any criticism of the Loglan/Lojban algorithm that does
not recognize its admitted goals and shortcomings is invalid.  (For
those who care, Lojban used the same algorithm, but used 6 languages
based on the same criteria, with 1987 language population data.  The
choice of 6 vs. 8 had several reasons, including a close tie between
7th, 8th, and 9th places, and the instability of the recognition
algorithm when there are more than 3 major language families involved
because there are only 3 consonants in each Loglan/Lojban root.  By
instability, I mean that trial words that had no apparent relation to
each other and minimum recognition scores tended to end up in
low-scoring ties for the 'top score'.)

There have been many proposals for high-recognition algorithms along the
way - indeed too many to mention.  They include taking words whole
from other languages, weighting by language family speakers instead of
individual languages, and using a single language (English of course) as
the source for easy to recognize roots that would maximize the learning
of the language in our true audience:  those who read our
English-language teaching materials and those who are likely to develop
meaningful Lojban applications.  This last suggestion has recurred often
enough that one proposed name, "Anglan", has stuck as a generic for all
such proposals.  But such a proposal would chase off most people
actually interested in ALs, as well as run into the Procrustean bed
problem that Bruce mentions afflicts Volapu:k and vidpuni.


----
lojbab = Bob LeChevalier, President, The Logical Language Group, Inc.
         2904 Beau Lane, Fairfax VA 22031-1303 USA
         703-385-0273
         lojbab@grebyn.com

NOTE THAT THIS IS A NEW NET ADDRESS AND SUPERSEDES OTHERS IN MY POSTINGS
            OR LOGICAL LANGUAGE GROUP, INC. PUBLICATIONS