From zefram@fysh.org Thu May 13 15:26:51 2004
Date: Thu, 13 May 2004 23:26:37 +0100
To: lojban-list@lojban.org
Subject: [lojban] Re: erasure words
Message-ID: <20040513222637.GI16333@fysh.org>
References: <20040513213804.GG16333@fysh.org> <20040513214744.GA4461@digitalkingdom.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20040513214744.GA4461@digitalkingdom.org>
User-Agent: Mutt/1.3.28i
From: Zefram <zefram@fysh.org>

Robin Lee Powell wrote:
>If you would like to produce a list of selma'o that can be considered
>equivalent for this purpose, I'd be willing to consider implementing
>that.  I don't *think* there are any cases where LE and LA are not
>interchangeable.

This one is a low priority for me among the various competing projects.
I encourage anyone else to look into it.

>> Btw, this earmarking is a protocol engineering technique, and I highly
>> recommend it.  
>
>Really?  So you think CIDR is bad, then?

I don't see the connection.  Are you referring to the definition of
classful address space?  I think, given that there are to be classes of
network address and that those handling the addresses need to know the
class, defining in advance which addresses have which class is useful.
However, getting rid of the classes altogether, as CIDR does, is a better way.
Most entities handling an IP address *don't* need to know the class.
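
To illustrate the difference, here is a minimal Python sketch.  The function
names are mine, invented for this example.  The first function recovers the
historical class from the leading bits of the address, which is what classful
handling required; the second simply tests membership in an explicit prefix,
which is all CIDR needs.

```python
import ipaddress


def classful_class(addr: str) -> str:
    """Return the historical class (A-E) implied by the leading bits."""
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24
    if first_octet < 128:
        return "A"  # leading bit 0
    if first_octet < 192:
        return "B"  # leading bits 10
    if first_octet < 224:
        return "C"  # leading bits 110
    if first_octet < 240:
        return "D"  # leading bits 1110 (multicast)
    return "E"      # leading bits 1111 (reserved)


def in_prefix(addr: str, prefix: str) -> bool:
    """CIDR-style test: the prefix length is stated, never inferred."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network(prefix)
```

With CIDR the caller supplies the prefix length, so nothing needs to be
deduced from the address itself.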

A good analogy is the DNS RRtype space.  Some RRtypes (A, MX,
...) represent actual data, but others (ANY, TSIG) don't behave that way.
A DNS server that receives data of an unrecognised RRtype *but knows
that it is a normal data RRtype* can correctly process the data and
pass it on to other parties.  An unrecognised non-data RRtype can't be
processed at all, and the server must reject the transaction.

Until recently there was no official categorisation of unassigned RRtypes
into data and non-data types, but non-data RRtypes were segregated in
practice: they were assigned counting down from 255, whereas data RRtypes
were assigned counting up from 1.  Then someone
put a non-data RRtype (OPT) in data RRtype space, and people started to
notice that it wasn't safe to assume that an unrecognised RRtype was
a data type.  Now data and non-data RRtype spaces have been allocated
(OPT stands as a well-known exception to the zoning).  The current advice
is to treat unrecognised RRtypes in a data zone as data, and to reject
unrecognised RRtypes in the non-data zones.
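
The advice above can be sketched roughly as follows.  This is a simplified
illustration, not a real server: the function names and the small table of
known types are invented for the example, and the zone boundary (128 to 255
as the non-data zone) is my recollection of the allocation and should be
checked against the IANA registry.  The sketch also ignores the reserved and
private-use ranges at the top of the 16-bit space.

```python
OPT = 41  # non-data RRtype that sits inside the low data zone

# Types this hypothetical server understands.
KNOWN = {1: "A", 15: "MX", 41: "OPT", 250: "TSIG", 255: "ANY"}


def rrtype_zone(rrtype: int) -> str:
    """Classify an RRtype number by zone alone, without recognising it."""
    if not 0 <= rrtype <= 0xFFFF:
        raise ValueError("RRtype must fit in 16 bits")
    if rrtype == 0:
        return "reserved"
    if rrtype == OPT:
        return "non-data"  # the well-known exception to the zoning
    if 128 <= rrtype <= 255:
        return "non-data"  # assumed boundary, see the note above
    return "data"


def handle(rrtype: int) -> str:
    """Decide what to do with a record of the given RRtype."""
    if rrtype in KNOWN:
        return "process"
    if rrtype_zone(rrtype) == "data":
        return "pass through as opaque data"
    return "reject"
```

The point is that `handle` can do something sensible with a type it has never
heard of, purely because the number space was zoned in advance.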

>> If a Lojban parser sees a cmavo that it doesn't know, being able to
>> tell at least whether it is an erase operator would be *very* helpful.
>
>No, it wouldn't.  Not in the least.  The erase operators are all
>different selma'o, and are all handled completely independently.

We're talking at cross-purposes here.  The issue is how an *unrecognised*
cmavo is handled.  What do you do in your parser with, say, "cei'au"?
Do you accept "le broda cei'au si brode"?
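
Here is a toy sketch of what I mean, in Python.  Everything in it is
hypothetical: the word list, the function names, and above all the earmarking
rule, which is passed in as a parameter because no such reservation exists in
Lojban today.  It also models only the simplest behaviour of "si" (delete the
previous word) and ignores quoting and the other subtleties.

```python
KNOWN_WORDS = {"le", "broda", "brode", "ku", "da"}  # toy vocabulary


class UnknownErasureWord(Exception):
    """An unrecognised word falls in the space earmarked for erasure."""


def apply_si(words, is_earmarked_erasure):
    """Apply "si" to a word list, refusing unknown earmarked words.

    is_earmarked_erasure is a hypothetical predicate saying whether a
    word form has been reserved in advance for erase operators.
    """
    out = []
    for word in words:
        if word == "si":
            if out:
                out.pop()  # erase the single preceding word
            continue
        if word not in KNOWN_WORDS and is_earmarked_erasure(word):
            # We cannot know what it erases, so we must not guess.
            raise UnknownErasureWord(word)
        out.append(word)  # unknown but ordinary: treat as a plain word
    return out
```

If nothing is earmarked, "le broda cei'au si brode" reduces to "le broda
brode", because "cei'au" can safely be treated as an ordinary word.  If the
form of "cei'au" had been earmarked for erasure, the same input is rejected
rather than silently misparsed.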

>How is "lu broda SA_LIKE li'u da" == da better than "lu broda sa lu si
>da" == da?

That's not the kind of case I had in mind, but it raises some good
points itself.  Consider the thought process behind using "lu": I am
in a "lu" quotation, and it ends with "li'u".  During the quotation, when
thinking about ending the quotation I should be thinking about "li'u",
not "lu".  Also, this new operator would encourage thinking about the
erasure as "end the quotation and ignore it", rather than "delete back
to the beginning of the quotation".  I prefer to think forwards, and in
terms of high-level constructs.

What I really had in mind was things like "le le nanmu ku stizu
ERASE_CONSTRUCT ku", where I want to skip over a nested construct.
This example should erase everything, back to and including the first
"le", rather than only going back to the second "le".  High-level
constructs again.  I don't want to be forced to remember the exact
sequence of words I've spoken in order to modify the sentence; I want
to be able to remember just the semantic value and the stack of open
grammatical constructs.  With this erase construct, in this example I
wouldn't have to care whether I said "le nanmu ku" or "ta".  Obviously the
value increases in longer sentences.
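
A rough Python sketch of the bookkeeping involved, under heavy assumptions:
the grammar is reduced to two opener/terminator pairs, ERASE_CONSTRUCT is a
placeholder token rather than a real cmavo, and I assume it is followed by the
terminator of the construct to be erased, as in the examples here.  The
function name is invented.

```python
TERMINATOR_OF = {"le": "ku", "lu": "li'u"}  # toy grammar: opener -> closer
ERASE = "ERASE_CONSTRUCT"                   # placeholder operator token


def apply_erase_construct(words):
    """Erase whole open constructs, tracked on a stack, not word by word."""
    out = []
    open_constructs = []  # (terminator, index in out of the opener)
    stream = iter(words)
    for word in stream:
        if word == ERASE:
            terminator = next(stream, None)
            # Unwind to the innermost open construct with that terminator.
            while open_constructs:
                term, start = open_constructs.pop()
                if term == terminator:
                    del out[start:]  # drop the opener and all after it
                    break
            else:
                raise ValueError("no open construct ends with %r" % terminator)
            continue
        if word in TERMINATOR_OF:
            open_constructs.append((TERMINATOR_OF[word], len(out)))
        elif open_constructs and word == open_constructs[-1][0]:
            open_constructs.pop()  # construct closed normally
        out.append(word)
    return out
```

In this sketch "mi viska le le nanmu ku stizu ERASE_CONSTRUCT ku ta" and
"mi viska le ta stizu ERASE_CONSTRUCT ku ta" both reduce to "mi viska ta".
The inner sumti was already closed, so only the outer "le" is still on the
stack when the operator arrives, and the speaker never has to recall how the
inner sumti was worded.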

This was intended as a rather fanciful suggestion; I was more a fan of the
"erase current sumti" type operators that I suggested and that share all
of the traits I discussed above.  ("le le nanmu ku stizu ERASE_SUMTI"
*can't* be done with "sa".)  But I find the generalisation quite neat.
I think it's at least a useful thought experiment in the realm of
grammar-aware erase operators.

You seem to be hostile to new erase operators because of the complexity
of implementation.  Is that the case?  Perhaps further discussion
should occur when, and if, I produce a parser that implements erasure
more modularly.

-zefram