From: pliny@ptelder.net
To: lojban-beginners@lojban.org
Subject: [lojban-beginners] Re: lojban-beginners Digest V6 #83
Date: Mon, 28 May 2007 02:32:27 -0500 (CDT)
Message-ID: <56746.127.0.0.1.1180337547.squirrel@squirtle.drak.net>

> I don't know what's right, unless perhaps just using something like
> UTF-8 is suitable. I don't know enough to know how to correctly label
> something that uses (say) 8859-1 for most content but also includes
> &#...; escapes for characters not from 8859-1 - maybe there is a
> correct way to label it, or maybe such a thing is inherently a spec
> violation. (Perhaps labeling it 8859-1 is right, even, in which case
> the only thing wrong was your wording above, implying that the
> non-8859-1 characters were part of 8859-1.)

> Surely there's a Web maven here who can say?

Ask and ye shall receive. I was looking at the archives to see if I
wanted to join the list, and I couldn't resist.

You're on the right track in thinking that UTF-8 is the only quasi-sane
way to mix Latin and CJK-type characters. The actual problems you are
seeing with this document could be coming from a number of directions,
however. Here's the short list:

1. Improper formatting by the text editor used to make the document.
   You might be able to track this down by throwing the document at a
   validator. The most popular is the venerable W3C validator at:
   http://validator.w3.org/
   If memory serves, the WDG validator checks for encoding issues.
   It's at:
   http://htmlhelp.com/tools/validator/
   You can also use Tidy to check the file locally:
   http://tidy.sourceforge.net/

2. The document could be getting subtly hosed by the mail client Martin
   uses, or by our receiving clients. The archive says Martin's using
   IMP, which is part of the web-based Horde platform. There's a decent
   chance it's escaping the input in a funny way. It's probably also
   mucking with the encoding. I'd recommend placing the file on a
   server somewhere and passing out the link for people to look at.

3. Certain browsers are known to choke on proper encoding and
   content-type definitions *cough*IE*cough*. Wikipedia has an article
   about how Unicode and HTML work together at:
   http://en.wikipedia.org/wiki/Unicode_and_HTML
   It might be instructive to point problem browsers at it to see how
   well they render the browser support section. It's also a good page
   to grab the source from and throw into an editor, to see whether the
   editor really likes UTF-8 or not.

At this point it should be painfully obvious why the motto when dealing
with multi-lingual web development is sometimes "Down, not across".

-Ken
(Who is *really* glad he doesn't earn a living as a codemonkey any more)
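
On the quoted question about 8859-1 content with &#...; escapes: in
HTML 4 a numeric character reference names a Unicode code point no
matter which encoding the file is labeled with, so such a document can
legitimately be labeled 8859-1. The following is only a minimal Python
sketch of that idea, not anything taken from the thread. The function
name to_latin1_html and the sample string are made up for illustration.

```python
import html


def to_latin1_html(text: str) -> bytes:
    """Encode text as ISO-8859-1 bytes.

    Characters that 8859-1 cannot represent are written as decimal
    numeric character references (&#NNNN;) instead of raising an error.
    Note: this does not escape markup characters such as & or <.
    """
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# Hypothetical sample: Latin text with an accent plus three CJK characters.
sample = "caf\u00e9 \u65e5\u672c\u8a9e"

encoded = to_latin1_html(sample)
print(encoded)  # b'caf\xe9 &#26085;&#26412;&#35486;'

# Reading it back the way a browser conceptually would: decode the bytes
# using the labeled encoding, then resolve the character references.
round_trip = html.unescape(encoded.decode("iso-8859-1"))
assert round_trip == sample
```

The accented letter stays a single 8859-1 byte while the CJK characters
become plain ASCII references, which is why the file survives tools
that only understand 8859-1. It says nothing about whether a given mail
client or browser will then treat those references correctly.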