
[lojban-beginners] Re: lojban-beginners Digest V6 #83



> I don't know what's right, unless perhaps just using something like
> UTF-8 is suitable.  I don't know enough to know how to correctly label
> something that uses (say) 8859-1 for most content but also includes
> &#...; escapes for characters not from 8859-1 - maybe there is a
> correct way to label it, or maybe such a thing is inherently a spec
> violation.  (Perhaps labeling it 8859-1 is right, even, in which case
> the only thing wrong was your wording above, implying that the
> non-8859-1 characters were part of 8859-1.)

> Surely there's a Web maven here who can say?

Ask and ye shall receive.

I was looking at the archives to see if I wanted to join the list, and I
couldn't resist.

You're on the right track in thinking that UTF-8 is the only quasi-sane way
to mix Latin and CJK-type characters.
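
To make the quoted question concrete, here is a minimal Python sketch of the
two approaches: 8859-1 bytes with &#...; escapes for everything outside
8859-1, versus plain UTF-8. The sample string is made up for illustration and
is not taken from the document under discussion.

```python
import html

# Sample text mixing Latin and CJK characters (made up for illustration).
text = "caf\u00e9 \u65e5\u672c\u8a9e"

# ISO-8859-1 cannot represent the CJK characters, so a strict encode fails.
try:
    text.encode("iso-8859-1")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Falling back to numeric character references (&#NNNN;) keeps every byte
# within 8859-1; the CJK characters survive only as escapes.
latin1_with_refs = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# An HTML-aware reader resolves the references back into characters.
round_tripped = html.unescape(latin1_with_refs.decode("iso-8859-1"))

# UTF-8 encodes every character directly, with no escapes needed.
utf8_bytes = text.encode("utf-8")

print(strict_ok)
print(latin1_with_refs)
print(round_tripped == text)
print(utf8_bytes.decode("utf-8") == text)
```

Either form can round-trip, but only if the label matches the bytes actually
sent, and the escaped form is only readable by something that parses HTML.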

The actual problems you are seeing with this document could be coming from
a number of directions, however. Here's the short list:

1. Improper formatting in the text editor making the document. You might
be able to mitigate this by throwing the document at a validator. The most
popular is the venerable W3C validator at: http://validator.w3.org/

If memory serves, the WDG validator checks for encoding issues. It's at:
http://htmlhelp.com/tools/validator/

You can also use Tidy to check the file locally: http://tidy.sourceforge.net/

2. The document could be getting subtly hosed by the mail client Martin
uses, or by our receiving clients. The archive says Martin's using IMP,
which is part of the web-based Horde platform.

There's a decent chance it's escaping the input in a funny way. It's
probably also mucking with the encoding. I'd recommend placing the file
on a server somewhere and passing out the link for people to look at.

3. Certain browsers are known to choke on proper encoding and content-type
definitions *cough*IE*cough*. Wikipedia has an article about how Unicode
and HTML work together at: http://en.wikipedia.org/wiki/Unicode_and_HTML

It might be instructive to point problem browsers at it to see how well
they render the browser support section. It's also a good page to grab the
source from and throw it into an editor to see if it really likes UTF-8 or
not.
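
For points 1 and 2, a quick local check can complement the validators. The
following is a rough Python sketch, not a replacement for any of the tools
above; the function names and sample bytes are hypothetical. The first
function reports where a byte string stops being valid UTF-8, and the second
simulates a client that receives UTF-8 but decodes it as 8859-1.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def misread_as_latin1(text: str) -> str:
    """Simulate a client that decodes UTF-8 bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


# Hypothetical sample data: the same word saved in two different encodings.
good = "caf\u00e9".encode("utf-8")
bad = "caf\u00e9".encode("iso-8859-1")  # 8859-1 bytes mislabeled as UTF-8

print(first_invalid_utf8_offset(good))  # None: valid UTF-8
print(first_invalid_utf8_offset(bad))   # offset of the stray 0xE9 byte
print(misread_as_latin1("caf\u00e9"))   # the accented letter becomes two characters
```

If the file fails the first check, the editor is the likely culprit; if it
passes but readers see doubled-up accented characters, suspect something in
the mail path or a wrong label.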

At this point it should be painfully obvious why the motto when dealing
with multi-lingual web development is sometimes "Down, not across".

-Ken (Who is *really* glad he doesn't earn a living as a codemonkey any more)