[lojban-beginners] Re: anti-Zipfian gismu rant
Bob, first off, thanks for replying. I'm honored that you would. I don't
think I've ever seen you reply to a mail on this list before. (Too bad as of
last month I no longer work 9 miles from your home.)
>===== Original Message From Robert LeChevalier <lojbab@lojban.org> =====
>turnip wrote:
>> Compare this pair of sentences:
>> Old Italian squirrels are stupid, but zebras are smart.
>> loi tolci'o natmritaliano bo ritcyratcu cu tolmencre .iki'u xirmrxipotigre
>> cu mencre.
>
>I don't know the significance of the English sentence. Is an "old
>Italian squirrel" a particular species?
No, it's a squirrel from Italy that's not young...
>
>The point is that only someone seeking a literal translation of the
>English would say "tolci'o natmritaliano bo ritcyratcu" (and I wouldn't
>be using "mencre/tolmencre" for the presumed intelligence contrast,
>either - probably instead a pair of lujvo based on menli-kakne-zmadu/mleca).
>
>Zipf's law is intended to deal with use of a language to express
>concepts ***in that language***. Much of the time, sentences in a
>language, when translated literally into another language, come out
>longer, with amounts that vary unpredictably. When speaking in a
>language, speakers tend to use forms that are short. All of our
>references to Zipf's Law are based on assumptions and predictions as to
>what the usage frequencies of words would be among fluent Lojban
>speakers speaking the language communicatively among themselves without
>reference to any external language (i.e., ignoring translation issues).
Of course. But as you point out below, there are plenty of concepts that
are cross-culturally common (intelligence being one example).
>
>Looking at other-language word frequencies in isolation is grossly
>misleading. Some words are high frequency because of multiple meanings
>that would require several different Lojban words to convey.
That's true, although the only word here likely to be contaminated in that
regard is "smart", which is used especially in British English to mean
"handsome, neat".
> Some
>words, specifically culture-related words, are high frequency in some
>cultures and low frequency in others. English speakers may refer to
>Italian- a lot. I doubt that Hindi, Chinese or Arabic speakers do.
>
Maybe, maybe not. I know I personally refer to Arabic more than I do
Italian. But by no objective measure (population, influence on world affairs
in the past several hundred years, economy, influence on international food,
clothing, etc.) can Italy be considered "lesser" than Algeria. The "do they
speak a language that lojban bases its gismu list on?" criterion seems
extremely arbitrary. Mind you, I haven't really had the need to speak of
Italian things (except when ranting about lojban ;-) ), but I think you see my
point.
>Loglan/Lojban did attempt to consider usage frequency, but our basis was
>Helen Eaton's 1930s "Semantic Frequency List" which makes an effort to
>account for concepts as opposed to mere word forms. It only covered 4
>European languages, but even that removed some of the English biases.
>"squirrel" for example, was among the sixth thousand in English
>frequency, but in French, Spanish, and German it was too low a frequency
>to be rated (which means that it wasn't in the top 8000).
>
>James Cooke Brown set a priority on having a gismu or a *short* lujvo
>for the top 2000 or 3000 concepts.
>
>He also made sure he had covered some lists of what were considered
>"fundamental primitive concepts" that had root words in pretty much all
>languages (I believe Swadesh had such a list, but I can't recall whether
>JCB specifically used that one - to some extent, he made his own list of
>this sort using his own research).
>
>After the initial gismu making, he and others freely added gismu rather
>haphazardly and without any sort of frequency justification. This led
>to such oddities as gismu for "billiards". This was in the era when
>fu'ivla had to be in the form of a gismu or lujvo because the other
>forms simply weren't allowed.
>
Yes, I learned Loglan way back in '76 or so.
>When we remade the gismu list, we pared off most of the accretions as
>belonging in fu'ivla space. "gymnast" barely survived because it was a
>category of Olympic sports and hence arguably an international concept -
>and we couldn't think of a good short lujvo (to which terms could be
>added to indicate particular kinds of gymnastics). At that point there
>was no concept of making lujvo using fu'ivla.
>
>We put a lower priority on Eaton's frequencies (mostly because I didn't
>want to spend the time going through the list to determine JCB's
>justifications for his choices) and put a much higher priority on a
>word/concept's usefulness in making lujvo as opposed to its difficulty
>of being expressed as a lujvo.
>
>Late in the process, we went through Roget's thesaurus specifically
>looking for concepts that could not be easily expressed as lujvo, and
>then deciding for each whether it belonged as a gismu or a borrowing,
>and again whether it could be useful in making lujvo for other Roget words.
>
>Finally, in an attempt to be culturally neutral and systematic, we saw
>several sets of words that were incomplete, many because they included
>only Western biased concepts. For animals, plants, food staples, we
>sought out the most used concepts in non-Western cultures (hence the
>gismu for cassava/taro/starch roots and lotus).
>
>> The Algerian gymnast's cassava is 10^-18 cubits long.
>> le le jerxo zajba ku samcu cu xatsi gutci.
>
>JCB had many of the metric prefixes, so we made the set complete. We
>added all of the SI (metric) fundamental units as well. Because of the
>need to translate non-metric words, we had a parallel set of words for
>non-metric units. But not wanting to be biased towards any one culture,
>"foot" and "cubit" were combined. The keyword is "cubit" because
>"foot" is used as the keyword for the body part, and LogFlash requires
>unique keywords; it also stresses both the fact that the word is a
>measurement and that it is not limited to the English system of
>measurements - the English unit would be a lujvo based on glico-gutci.
>
>> Note the difference in length between the two sentences and their English
>> counterpart.
>
>Totally arbitrary, especially since your sentences are obviously
>arbitrarily designed to maximize the effect.
Of course, and I stated so forthrightly.
> It is easy to come up with
>short grammatically-correct nonsense in any language that may not
>translate briefly into another language. Chomsky's
>"Colorless green ideas sleep furiously" is probably an English example.
>
>
>> In the first, all the non-cmavo words are non-gismu, whereas in
>> the second, all the non-cmavo words are gismu. The first sentence is almost
>> twice as long as the English, whereas the second is about 20% shorter.
>> The English sentences are roughly the same length. The relative frequencies
>> of the "important" English words in the British National Corpus (appearances
>> per million words)* are:
>>
>> old 524.86 Italian 47.91 squirrel 2.28 stupid 30.89 zebra 2.22 smart 17.32,
>> average =105.913
>>
>> Algerian 2.18 gymnast 0.19 cassava 0.41 atto- 0.25 cubit 0.09
>> Average=0.624
>>
>> So how come we have short words (gismu) for the latter set, but very long
>> words for the former set?
>
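For anyone who wants to check the arithmetic, here is a quick Python sketch that recomputes both averages from the per-million BNC figures listed above. It assumes a plain arithmetic mean was intended, and adds the median for comparison; the variable names are just for illustration.

```python
from statistics import mean, median

# BNC frequencies (occurrences per million words), copied from the lists above
first_set = {"old": 524.86, "Italian": 47.91, "squirrel": 2.28,
             "stupid": 30.89, "zebra": 2.22, "smart": 17.32}
second_set = {"Algerian": 2.18, "gymnast": 0.19, "cassava": 0.41,
              "atto-": 0.25, "cubit": 0.09}

for name, freqs in (("first", first_set), ("second", second_set)):
    values = list(freqs.values())
    # The mean is pulled up a lot by "old"; the median is less sensitive to it
    print(f"{name} set: mean = {mean(values):.3f}, median = {median(values):.3f}")
```

This prints a mean of 104.247 (a little below the 105.913 given above) and a median of 24.105 for the first set, against 0.624 and 0.250 for the second. Either way, the first set of words is used roughly a hundred times as often or more.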
>An Algerian speaks one of our core languages. And Italian doesn't
>(unless you want to call Italian an eastern dialect of Spanish, or
>neo-Latin, in which case it would be a short lujvo).
>
>> (On a side note, my kids have a raccoon puppet, which my brother likes to
>> put on and say, "I'm Italian. Yay!" in a silly voice, knowing that it makes
>> me laugh uncontrollably because of lojban's lack of gismu for Italian or
>> raccoon :-)
>
>And if we had gismu for both of those, then someone else would have a
>llama puppet, and they might be Vietnamese (and puppet isn't a gismu
>either).
But you _do_ have a gismu for llama -- kumte.... see?? There are more than
500,000,000 raccoons in the world (and yes, mostly concentrated in North
America, but ~1 million in Germany's forest, and also in Asia and South
America), and about 25 million camels, alpacas, llamas, and vicunas combined.
So again, who makes the decision on where to split the hairs (hares? ;-) ) ?
(Personally, I don't think there should be a gismu for raccoon, but I do think
there should be for rodent).
(And I wouldn't mind a Vietnamese gismu, either, for that matter)
>
>For that matter, if one of the people from the island south of Australia
>was speaking in his language, he might be upset to find the English
>translation for his cubit-long animal puppet is "sesquipedalian
>Tasmanian devil puppet", which might be expressed rather briefly in the
>native language.
>
Could be, but English doesn't claim cultural neutrality, or a vocabulary
built on international basic concepts. Lojban does.
For example, I asked some Israelis how to say "blueberry" in Hebrew. None
of them could think of it, because it only exists there as a foreign import.
--Mike "gejyspa" Turniansky
(And remember, don't take any of this personally. I'm not trying to change
things around lojbanistan, just expressing some frustrations...)