From rlpowell@digitalkingdom.org Sat Apr 13 21:34:02 2002
Date: Sat, 13 Apr 2002 21:34:46 -0700
From: Robin Lee Powell
To: lojban
Subject: Re: [lojban] brevity metrics
Message-ID: <20020414043446.GD19164@digitalkingdom.org>
User-Agent: Mutt/1.3.28i

On Thu, Apr 11, 2002 at 02:31:45PM +0100, And Rosta wrote:
> Robin Turner:
> >A lot of extra words are itty-bitty cmavo which don't add much to
> >the real length (conversely, translating English into Turkish results
> >in fewer words, but some of them can be very long!). Another point
> >is that Lojban _can_ make distinctions explicit, and we tend to make
> >it do so because we can, but it doesn't need to do so - sometimes
> >Lojban can be amazingly terse.
>
> A good way of measuring brevity is to compare translations, e.g. by
> comparing the Lojban translation of _Alice_ with translations into
> other languages, measuring by bytes or pages. If anyone can be
> bothered to do this, I'm sure lots of us would be interested in the
> results.

Heh.
Behold, the power of unix:

rlpowell@chain> grep '^ *[.a-z]' alice-??.texinfo | wc -w
31064
rlpowell@chain> grep '@c .*[a-z]' alice-??.texinfo | wc -w
29227

That took me about 2 minutes. Woot.

Except it's slightly wrong. Again:

rlpowell@chain> grep '^ *[.a-z]' alice-??.texinfo | sed 's/^[^;]*://' | wc -w
30880
rlpowell@chain> grep '@c .*[a-z]' alice-??.texinfo | sed 's/.*:@c//' | wc -w
26505

The first one is Lojban; the second is English.

Is Alice actually *finished*!?

-Robin

-- 
http://www.digitalkingdom.org/~rlpowell/ BTW, I'm male, honest.
le datni cu djica le nu zifre .iku'i .oi le so'e datni cu to'e te pilno je xlali -- RLP
http://www.lojban.org/
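
The second pair of commands above strips the `file:` prefix that grep adds when it searches several files, along with the `@c` marker, before counting. A rough sketch of the same measurement that avoids the prefix at the source is below. It assumes, as the commands above do, that Lojban text sits on lines starting with a lowercase letter or a period and that the English text sits on `@c` comment lines. The function names `count_lojban` and `count_english` are made up for illustration, and `-h` (suppress filenames) is the GNU/BSD grep option.

```shell
# Hypothetical helpers; the names are illustrative, not from the post above.

# Count words on lines that begin (after optional spaces) with a lowercase
# letter or a period, which the commands above treat as Lojban text.
# grep -h omits the "file:" prefix, so no sed cleanup is needed here.
count_lojban() {
    grep -h '^ *[.a-z]' "$@" | wc -w | tr -d ' '
}

# Count words on "@c" comment lines, which the commands above treat as the
# English text, after deleting everything up to and including "@c".
count_english() {
    grep -h '@c .*[a-z]' "$@" | sed 's/.*@c//' | wc -w | tr -d ' '
}
```

After sourcing these, something like `count_lojban alice-??.texinfo` and `count_english alice-??.texinfo` should give the two totals; the `tr -d ' '` only trims the padding some `wc` implementations print.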