Message-ID: <737b61f30806201107o2e18fb57q5add63b8c3bc1ade@mail.gmail.com>
Date: Fri, 20 Jun 2008 13:07:15 -0500
From: "Chris Capel" <pdf23ds@gmail.com>
To: lojban-list@lojban.org
Subject: [lojban] Re: lojgloss extraneous characters
In-Reply-To: <12d58c160806200756y3b5d3ecdi45cd638bd803c321@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <737b61f30806180402g18ece640m43e0d18642c2280e@mail.gmail.com>	 <alpine.LRH.1.00.0806200920540.28134@grid.cec.wustl.edu>	 <12d58c160806200756y3b5d3ecdi45cd638bd803c321@mail.gmail.com>

On Fri, Jun 20, 2008 at 9:56 AM, komfo,amonan <komfoamonan@gmail.com> wrote:
> On Wed, 18 Jun 2008, Chris Capel wrote:
>
>> How should I handle non-lojban characters? I can strip out some that
>> should be superfluous, and others that I can't make any sense out of,
>> but should I translate some characters into cmavo? I'm thinking of
>> parentheses, braces, and brackets here. I could translate parentheses
>> into to-toi, and square brackets into sei-se'u. Any other ideas?
>>
>> Should I just ignore any random non-lojban characters?
>
> I think it's probably best to leach this shorthand out of Lojban usage. It
> feels like a natlang security blanket to me, and seems to run counter to the
> principle of audiovisual isomorphism.

Perhaps so, but if so, a parser/glosser is probably not the place to do
it. I really want Lojgloss to be beginner-friendly, so that beginners
can paste any Lojban text into the box and see what it means. So I want
to make it as permissive as possible, at least by default. For instance,
I plan to convert "\n\>*" (i.e., e-mail quote markers) in the input to
spaces so you can get glosses for quoted Lojban text. On the other hand,
perhaps there's something I can do as a step afterwards to encourage
proper Lojban?
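
A minimal sketch of that quote-stripping step, in Python. The function
name strip_email_quotes is made up for illustration and is not part of
Lojgloss; only the pattern itself comes from the description above.

```python
import re

# Pattern described in the message: a newline followed by any run of ">".
QUOTE_RE = re.compile(r"\n>*")


def strip_email_quotes(text):
    """Replace each newline, plus any '>' markers right after it, with a space.

    Hypothetical helper for illustration only. A '>' at the very start of
    the input has no preceding newline, so this pattern leaves it alone.
    """
    return QUOTE_RE.sub(" ", text)
```

Because every newline becomes a space, the words of a quoted passage end
up on one line with the ">" markers gone. Spaces that followed the
markers are kept, so the result may contain runs of spaces.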

> A problem with numerals that hasn't
> been brought up (at least in this thread) is: don't most non-anglophone
> countries use "," where anglophones use "." and a space or "." where we use
> "," (i.e. 186,282.397 == 186 282,397 == 186.282,397)? Of course, the
> downside is that use of English-style numerals in Lojban text is
> semi-standard & well represented in Lojban text to date, including the
> instructional materials.

Hmm. Currently the morphology parses digits as PA cmavo (except in
cmene), but basically ignores "," and ".". I'm fine with leaving it
that way, actually. It's close enough for non-standard input. I don't
care about getting non-standard things *exactly right* every time; I
just want them not to break the rest of the parse, and where it's
possible to do something more useful than ignoring them, I'd like to
do that.
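
Roughly, the digit handling described above behaves like this Python
sketch. It is not the actual morphology code: the names PA_DIGITS and
digits_to_pa are invented here, and raising an error on other
characters is my assumption, since a real parser would hand those to
other rules.

```python
# The ten Lojban digit cmavo (selma'o PA), keyed by ASCII digit.
PA_DIGITS = {
    "0": "no", "1": "pa", "2": "re", "3": "ci", "4": "vo",
    "5": "mu", "6": "xa", "7": "ze", "8": "bi", "9": "so",
}


def digits_to_pa(token):
    """Map each digit in token to its PA cmavo, ignoring ',' and '.'.

    Hypothetical helper: any other character raises ValueError.
    """
    cmavo = []
    for ch in token:
        if ch in PA_DIGITS:
            cmavo.append(PA_DIGITS[ch])
        elif ch in ",.":
            continue  # separators are dropped, whichever convention is used
        else:
            raise ValueError("not a digit or separator: %r" % ch)
    return cmavo
```

With the separators dropped, "186,282.397" and "186.282,397" gloss
identically, so the two punctuation conventions can't conflict. The
cost is that the decimal point is lost.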

Chris Capel
-- 
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennett)

