From nobody@digitalkingdom.org Mon May 28 00:32:53 2007
Received: with ECARTIS (v1.0.0; list lojban-beginners); Mon, 28 May 2007 00:32:54 -0700 (PDT)
Received: from nobody by chain.digitalkingdom.org with local (Exim 4.63)	(envelope-from <nobody@digitalkingdom.org>)	id 1HsZim-0005lQ-J4	for lojban-beginners-real@lojban.org; Mon, 28 May 2007 00:32:52 -0700
Received: from squirtle.drak.net ([72.52.144.201])	by chain.digitalkingdom.org with esmtp (Exim 4.63)	(envelope-from <pliny@ptelder.net>)	id 1HsZii-0005lB-Uq	for lojban-beginners@lojban.org; Mon, 28 May 2007 00:32:52 -0700
Received: from ptelder by squirtle.drak.net with local (Exim 4.63)	(envelope-from <pliny@ptelder.net>)	id 1HsZiN-0005pY-Qz	for lojban-beginners@lojban.org; Mon, 28 May 2007 02:32:27 -0500
Received: from 127.0.0.1 ([127.0.0.1])        (SquirrelMail authenticated user pliny@ptelder.net)        by squirtle.drak.net with HTTP;        Mon, 28 May 2007 02:32:27 -0500 (CDT)
Message-ID: <56746.127.0.0.1.1180337547.squirrel@squirtle.drak.net>
Date: Mon, 28 May 2007 02:32:27 -0500 (CDT)
Subject: [lojban-beginners] Re: lojban-beginners Digest V6 #83
From: pliny@ptelder.net
To: lojban-beginners@lojban.org
User-Agent: SquirrelMail/1.4.9a
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Priority: 3 (Normal)
Importance: Normal
X-DrakNet-MailScanner-Information: Please contact the ISP for more information
X-DrakNet-MailScanner: Not scanned: please contact your Internet E-Mail Service Provider for details
X-DrakNet-MailScanner-SpamCheck: 
X-DrakNet-MailScanner-From: pliny@ptelder.net
X-Spam-Status: No
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - squirtle.drak.net
X-AntiAbuse: Original Domain - lojban.org
X-AntiAbuse: Originator/Caller UID/GID - [32054 32056] / [47 12]
X-AntiAbuse: Sender Address Domain - ptelder.net
X-Source: 
X-Source-Args: 
X-Source-Dir: 
X-Spam-Score: 0.6
X-Spam-Score-Int: 6
X-Spam-Bar: /
X-archive-position: 4747
X-ecartis-version: Ecartis v1.0.0
Sender: lojban-beginners-bounce@lojban.org
Errors-to: lojban-beginners-bounce@lojban.org
X-original-sender: pliny@ptelder.net
Precedence: bulk
Reply-to: lojban-beginners@lojban.org
X-list: lojban-beginners

> I don't know what's right, unless perhaps just using something like
> UTF-8 is suitable.  I don't know enough to know how to correctly label
> something that uses (say) 8859-1 for most content but also includes
> &#...; escapes for characters not from 8859-1 - maybe there is a
> correct way to label it, or maybe such a thing is inherently a spec
> violation.  (Perhaps labeling it 8859-1 is right, even, in which case
> the only thing wrong was your wording above, implying that the
> non-8859-1 characters were part of 8859-1.)

> Surely there's a Web maven here who can say?

Ask and ye shall receive.

I was looking at the archives to see if I wanted to join the list, and I
couldn't resist.

You're on the right track in thinking that UTF-8 is the only quasi-sane way
to mix Latin and CJK characters.
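
To make the two options from the quoted question concrete, here is a rough
Python sketch (the sample string is made up for illustration). It stores the
same text once as plain UTF-8 and once as ISO-8859-1 with &#...; references
for whatever doesn't fit. As far as I know, numeric character references in
HTML always refer to Unicode code points regardless of the charset label, so
both forms should round-trip; treat this as a demonstration, not a ruling on
what the spec requires.

```python
import html

# Sample text (an assumption for illustration): one Latin-1 character
# (e-acute) and one CJK character (U+4E2D).
text = "caf\u00e9 \u4e2d"

# Option A: encode the whole document as UTF-8; every character is stored directly.
utf8_bytes = text.encode("utf-8")

# Option B: encode as ISO-8859-1 and replace anything outside that charset
# with a numeric character reference (&#...;).
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")
print(latin1_bytes)  # b'caf\xe9 &#20013;'

# A reader decodes with the labelled charset, then resolves the references.
assert html.unescape(latin1_bytes.decode("iso-8859-1")) == text
assert utf8_bytes.decode("utf-8") == text
```

The second form works, but every CJK character balloons into an eight-byte
escape that is unreadable in an editor, which is why UTF-8 is the saner choice.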

The actual problems you are seeing with this document could be coming from
a number of directions, however. Here's the short list:

1. Improper formatting in the text editor making the document. You might
be able to track this down by throwing the document at a validator. The most
popular is the venerable W3C validator at: http://validator.w3.org/

If memory serves, the WDG validator checks for encoding issues. It's at:
http://htmlhelp.com/tools/validator/

You can also use Tidy to check the file locally: http://tidy.sourceforge.net/

2. The document could be getting subtly hosed by the mail client Martin
uses, or by our receiving clients. The archive says Martin's using IMP,
which is part of the web-based Horde platform.

There's a decent chance it's escaping the input in a funny way. It's
probably also mucking with the encoding. I'd recommend placing the file
on a server somewhere and passing out the link for people to look at.

3. Certain browsers are known to choke on proper encoding and content-type
definitions *cough*IE*cough*. Wikipedia has an article about how Unicode
and HTML work together at: http://en.wikipedia.org/wiki/Unicode_and_HTML

It might be instructive to point problem browsers at it to see how well
they render the browser support section. It's also a good page to grab the
source from and throw into an editor to see whether the editor really likes
UTF-8 or not.
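
For a quick local sanity check on points 1 and 2, something like the Python
sketch below might help. It is not what the validators or Tidy do internally;
the helper names (declared_charset, decodes_as, report) and the sample page
are made up for illustration. It looks for a charset label near the top of
the file, checks whether the bytes actually decode under that label, and
shows what text looks like when a client decodes UTF-8 bytes as ISO-8859-1.

```python
import re

# Hypothetical helpers for illustration; a crude pattern, not a real HTML parser.
META_CHARSET = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)

def declared_charset(raw: bytes):
    """Return the first charset=... label in the first 2048 bytes, or None."""
    match = META_CHARSET.search(raw[:2048])
    return match.group(1).decode("ascii").lower() if match else None

def decodes_as(raw: bytes, encoding: str) -> bool:
    """True if the bytes decode cleanly under the given encoding."""
    try:
        raw.decode(encoding, errors="strict")
        return True
    except (UnicodeDecodeError, LookupError):
        return False

def report(raw: bytes) -> dict:
    """Compare the declared charset with what the bytes can actually be."""
    label = declared_charset(raw)
    return {
        "declared": label,
        "valid_utf8": decodes_as(raw, "utf-8"),
        # Caveat: ISO-8859-1 accepts any byte sequence, so this check only
        # catches mismatches for stricter encodings such as UTF-8.
        "matches_label": decodes_as(raw, label) if label else None,
    }

# Made-up sample page labelled as UTF-8.
page = ('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        '<p>caf\u00e9</p>')
good = page.encode("utf-8")
bad = page.encode("iso-8859-1")  # labelled utf-8 but saved as Latin-1

print(report(good))
print(report(bad))

# What a client produces when it reads UTF-8 bytes as ISO-8859-1.
mangled = "caf\u00e9".encode("utf-8").decode("iso-8859-1")
print(mangled)
```

If a file's report shows a label that the bytes don't satisfy, the editor or
the mail client is the likely suspect rather than the browser.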

At this point it should be painfully obvious why the motto when dealing
with multi-lingual web development is sometimes "Down, not across".

-Ken (Who is *really* glad he doesn't earn a living as a codemonkey any more)