From: turnip
To: lojban-beginners@lojban.org
Date: Wed, 8 Aug 2007 14:10:29 -0400
Subject: [lojban-beginners] anti-Zipfian gismu rant

Compare these two pairs of sentences:

Old Italian squirrels are stupid, but zebras are smart.
loi tolci'o natmritaliano bo ritcyratcu cu tolmencre .iki'u xirmrxipotigre cu mencre.

The Algerian gymnast's cassava is 10^-18 cubits long.
le le jerxo zajba ku samcu cu xatsi gutci.

Note the difference in length between the two Lojban sentences and their English counterparts. In the first, all the non-cmavo words are non-gismu, whereas in the second, all the non-cmavo words are gismu. The first Lojban sentence is almost twice as long as its English counterpart, whereas the second is about 20% shorter than its English counterpart. The two English sentences are roughly the same length.
The relative frequencies of the "important" English words in the British National Corpus (appearances per million words)* are:

old        524.86
Italian     47.91
squirrel     2.28
stupid      30.89
zebra        2.22
smart       17.32
Average = 105.913

Algerian     2.18
gymnast      0.19
cassava      0.41
atto-        0.25
cubit        0.09
Average = 0.624

So how come we have short words (gismu) for the latter set, but very long words for the former set?

(* No attempt has been made to correct for singular vs. plural, etc. This is for single POV purposes only :-)

(On a side note, my kids have a raccoon puppet, which my brother likes to put on and say, "I'm Italian. Yay!" in a silly voice, knowing that it makes me laugh uncontrollably because of lojban's lack of gismu for Italian or raccoon :-)

--gejyspa
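
For anyone who wants to re-check the two averages, here is a minimal Python sketch. The variable names are only illustrative, and the figures are copied from the message exactly as given. Run as written, the first set comes out near 104.25 rather than the 105.913 quoted, so one of the listed figures may have been mistyped; the gap of more than two orders of magnitude between the sets is unaffected either way.

```python
from statistics import mean

# Frequencies per million words, copied from the message as given.
# First sentence: none of these concepts has a gismu.
non_gismu_words = {
    "old": 524.86,
    "Italian": 47.91,
    "squirrel": 2.28,
    "stupid": 30.89,
    "zebra": 2.22,
    "smart": 17.32,
}

# Second sentence: every one of these concepts has a gismu.
gismu_words = {
    "Algerian": 2.18,
    "gymnast": 0.19,
    "cassava": 0.41,
    "atto-": 0.25,
    "cubit": 0.09,
}

avg_non_gismu = mean(non_gismu_words.values())
avg_gismu = mean(gismu_words.values())

print(f"average, words without gismu: {avg_non_gismu:.3f}")
print(f"average, words with gismu:    {avg_gismu:.3f}")
print(f"ratio: {avg_non_gismu / avg_gismu:.0f}x")
```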