From rlpowell@digitalkingdom.org Sat Apr 13 21:34:02 2002
Date: Sat, 13 Apr 2002 21:34:46 -0700
To: lojban <lojban@yahoogroups.com>
Subject: Re: [lojban] brevity metrics
Message-ID: <20020414043446.GD19164@digitalkingdom.org>
Mail-Followup-To: lojban <lojban@yahoogroups.com>
References: <scb59e75.034@gwise-gw1.uclan.ac.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <scb59e75.034@gwise-gw1.uclan.ac.uk>
User-Agent: Mutt/1.3.28i
From: Robin Lee Powell <rlpowell@digitalkingdom.org>

On Thu, Apr 11, 2002 at 02:31:45PM +0100, And Rosta wrote:
> Robin Turner:
> >A lot of extra words are itty-bitty cmavo which don't add much to
> >the real length (conversely, translating English into Turkish results
> >in fewer words, but some of them can be very long!). Another point
> >is that Lojban _can_ make distinctions explicit, and we tend to make
> >it do so because we can, but it doesn't need to do so - sometimes
> >Lojban can be amazingly terse. 
> 
> A good way of measuring brevity is to compare translations, e.g. by
> comparing the Lojban translation of _Alice_ with translations into
> other languages, measuring by bytes or pages. If anyone can be
> bothered to do this, I'm sure lots of us would be interested in the
> results.

Heh.

Behold, the power of unix:

rlpowell@chain> grep '^ *[.a-z]' alice-??.texinfo | wc -w
31064
rlpowell@chain> grep '@c .*[a-z]' alice-??.texinfo | wc -w
29227

That took me about 2 minutes. Woot.

Except it's slightly wrong. Again:

rlpowell@chain> grep '^ *[.a-z]' alice-??.texinfo | sed 's/^[^;]*://' | wc -w
30880
rlpowell@chain> grep '@c .*[a-z]' alice-??.texinfo | sed 's/.*:@c//' | wc -w
26505

The first one is Lojban; the second is English. Is Alice actually
*finished*!?
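
For anyone who wants to rerun this, here is a rough sketch of the same
count wrapped up as shell functions. It assumes what the commands above
imply: Lojban text on plain lines and the English original on "@c"
comment lines in alice-??.texinfo. The function names are made up for
illustration, and it uses grep -h (available in GNU and BSD grep) to
suppress the "filename:" prefix instead of stripping it with sed, so
the totals may not match the figures above exactly.

```shell
#!/bin/sh
# Sketch only: the file layout is inferred from the commands in this
# message, and the function names are hypothetical.

# Words on Lojban lines: lines starting (after optional spaces) with a
# period or a lowercase letter. -h keeps "filename:" out of the count.
lojban_words() {
    grep -h '^ *[.a-z]' "$@" | wc -w | tr -d ' '
}

# Words on English lines: "@c" comment lines containing a lowercase
# letter, with everything up to and including the "@c" marker removed.
english_words() {
    grep -h '@c .*[a-z]' "$@" | sed 's/.*@c//' | wc -w | tr -d ' '
}

# Ratio of two word counts, to three decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", a / b }'
}

# Usage (run in the directory holding the texinfo files):
#   l=$(lojban_words alice-??.texinfo)
#   e=$(english_words alice-??.texinfo)
#   ratio "$l" "$e"
```

Plugging in the numbers above, ratio 30880 26505 gives 1.165, i.e. the
Lojban text has roughly 16.5% more words than the English.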

-Robin

-- 
http://www.digitalkingdom.org/~rlpowell/ BTW, I'm male, honest.
le datni cu djica le nu zifre .iku'i .oi le so'e datni cu to'e te pilno
je xlali -- RLP http://www.lojban.org/

