From nobody@digitalkingdom.org Fri Jun 20 11:07:32 2008
Message-ID: <737b61f30806201107o2e18fb57q5add63b8c3bc1ade@mail.gmail.com>
Date: Fri, 20 Jun 2008 13:07:15 -0500
From: "Chris Capel"
To: lojban-list@lojban.org
Subject: [lojban] Re: lojgloss extraneous characters
In-Reply-To: <12d58c160806200756y3b5d3ecdi45cd638bd803c321@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <737b61f30806180402g18ece640m43e0d18642c2280e@mail.gmail.com> <12d58c160806200756y3b5d3ecdi45cd638bd803c321@mail.gmail.com>
X-archive-position: 14532
X-original-sender: pdf23ds@gmail.com
Reply-to: lojban-list@lojban.org
X-list: lojban-list

On Fri, Jun 20, 2008 at 9:56 AM, komfo,amonan wrote:
> On Wed, 18 Jun 2008, Chris Capel wrote:
>
>> How should I handle non-lojban characters? I can strip out some that
>> should be superfluous, and others that I can't make any sense out of,
>> but should I translate some characters into cmavo? I'm thinking of
>> parentheses, braces, and brackets here. I could translate parentheses
>> into to-toi, and square brackets into sei-se'u. Any other ideas?
>>
>> Should I just ignore any random non-lojban characters?
>
> I think it's probably best to leach this shorthand out of Lojban usage. It
> feels like a natlang security blanket to me, and seems to run counter to the
> principle of audiovisual isomorphism.

Perhaps so, but if so, a parser/glosser is probably not the place to do
it. I really want Lojgloss to be beginner-friendly, so that beginners can
paste any Lojban text into the box and see what it means. So I want to
make it as permissive as possible, at least by default. For instance, I
plan to convert "\n\>*" (i.e., e-mail quote markers) in the input to
spaces so you can get glosses for quoted Lojban text.

On the other hand, perhaps there's something I can do as a step
afterwards to encourage proper Lojban?

> A problem with numerals that hasn't
> been brought up (at least in this thread) is: don't most non-anglophone
> countries use "," where anglophones use "." and a space or "." where we use
> "," (i.e. 186,282.397 == 186 282,397 == 186.282,397)? Of course, the
> downside is that use of English-style numerals in Lojban text is
> semi-standard & well represented in Lojban text to date, including the
> instructional materials.

Hmm. Currently the morphology parses digits as PA cmavo (except in
cmene), but basically ignores "," and ".". I'm fine with leaving it that
way, actually. It's close enough for non-standard input. I don't care
about getting non-standard things *exactly right* every time; I just
want them not to break the rest of the parse, and where it's possible to
do something more useful with them than ignoring them, I'd like to do
it.

Chris Capel
-- 
"What is it like to be a bat? What is it like to bat a bee? What is it
like to be a bee being batted? What is it like to be a batted bee?"
-- The Mind's I (Hofstadter, Dennett)
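
The permissive cleanup described in the message (e-mail quote markers turned into spaces, digits read as PA cmavo with "," and "." ignored, and the optional bracket translation floated earlier in the thread) might look roughly like the sketch below. This is only an illustration in Python, not Lojgloss's actual code: the function name `normalize`, the `translate_brackets` flag, and the choice to drop "," and "." only when they sit between two digits are all assumptions made for the example. The quote-marker pattern is the one quoted in the message, so markers separated by spaces ("> >") or at the very start of the input are not fully handled.

```python
import re

# Lojban digit cmavo (selma'o PA) for the digits 0-9.
DIGIT_CMAVO = ["no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so"]

# Bracket translation suggested in the thread; off by default in this sketch.
BRACKET_CMAVO = {"(": " to ", ")": " toi ", "[": " sei ", "]": " se'u "}


def normalize(text, translate_brackets=False):
    """Hypothetical pre-parse cleanup; an illustration, not Lojgloss itself."""
    # Pattern from the message: a newline plus any run of ">" becomes a space.
    text = re.sub(r"\n>*", " ", text)
    # Assumption: "," and "." are dropped only between two digits.
    text = re.sub(r"(?<=[0-9])[.,](?=[0-9])", "", text)
    # Spell each digit as its PA cmavo.
    text = re.sub(r"[0-9]", lambda m: DIGIT_CMAVO[int(m.group())], text)
    if translate_brackets:
        for char, cmavo in BRACKET_CMAVO.items():
            text = text.replace(char, cmavo)
    # Collapse the extra whitespace introduced by the steps above.
    return " ".join(text.split())
```

For example, `normalize("mi klama\n>> le zarci")` gives `"mi klama le zarci"`, and `normalize("li 186,282.397")` gives `"li pabixarebirecisoze"`. Keeping the bracket translation behind a flag reflects the open question in the thread about whether such shorthand should be accepted at all.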