[lojban-beginners] Re: lojban-beginners Digest V6 #83
> I don't know what's right, unless perhaps just using something like
> UTF-8 is suitable. I don't know enough to know how to correctly label
> something that uses (say) 8859-1 for most content but also includes
> &#...; escapes for characters not from 8859-1 - maybe there is a
> correct way to label it, or maybe such a thing is inherently a spec
> violation. (Perhaps labeling it 8859-1 is right, even, in which case
> the only thing wrong was your wording above, implying that the
> non-8859-1 characters were part of 8859-1.)
> Surely there's a Web maven here who can say?
Ask and ye shall receive.
I was looking at the archives to see if I wanted to join the list, and I
couldn't resist.
You're on the right track in thinking that UTF-8 is the only quasi-sane way
to mix Latin and CJK characters in one document.
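To make that concrete, here's a minimal Python sketch. The sample string is
made up for illustration (it isn't from the document under discussion). It
shows that UTF-8 holds both kinds of characters directly, while ISO-8859-1
only copes if everything outside its repertoire is written as &#...; escapes:

```python
# Sample text mixing a Latin-1 letter with CJK characters (illustrative only).
text = "café 日本語"

# UTF-8 can represent every character directly.
utf8_bytes = text.encode("utf-8")

# ISO-8859-1 has no CJK characters, so a strict encode fails.
try:
    text.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# The workaround described in the quoted message: keep the file in 8859-1
# and write everything else as &#...; numeric character references.
escaped = text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(latin1_ok)  # False
print(escaped)    # b'caf\xe9 &#26085;&#26412;&#35486;'
```

The escaped form contains only 8859-1 bytes; the numbers in the escapes are
Unicode code points, not 8859-1 characters.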
The actual problems you are seeing with this document could be coming from
a number of directions, however. Here's the short list:
1. Improper formatting in the text editor making the document. You might
be able to mitigate this by throwing the document at a validator. The most
popular is the venerable W3C validator at: http://validator.w3.org/
If memory serves, the WDG validator checks for encoding issues. It's at:
http://htmlhelp.com/tools/validator/
You can also use Tidy to check the file locally: http://tidy.sourceforge.net/
2. The document could be getting subtly hosed by the mail client Martin
uses, or by our receiving clients. The archive says Martin's using IMP,
which is part of the web-based Horde platform.
There's a decent chance it's escaping the input in a funny way. It's
probably also mucking with the encoding. I'd recommend placing the file
on a server somewhere and passing out the link for people to look at.
3. Certain browsers are known to choke on proper encoding and content-type
definitions *cough*IE*cough*. Wikipedia has an article about how Unicode
and HTML work together at: http://en.wikipedia.org/wiki/Unicode_and_HTML
It might be instructive to point problem browsers at it to see how well
they render the browser support section. It's also a good page to grab the
source from and throw it into an editor to see if it really likes UTF-8 or
not.
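If you'd rather do a quick local sanity check before (not instead of) running
a validator, something like the following rough Python sketch compares the
charset a file claims with the bytes it actually contains. The function names
are my own invention, and the regex is a crude stand-in for real HTML parsing:

```python
import re

def declared_charset(raw: bytes):
    """Return the charset named in the document's markup, or None if absent."""
    # Matches both <meta charset="..."> and the http-equiv content="...; charset=..." form.
    match = re.search(rb'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', raw[:2048])
    return match.group(1).decode("ascii").lower() if match else None

def decodes_as(raw: bytes, encoding: str) -> bool:
    """True if the bytes are valid in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except (UnicodeDecodeError, LookupError):
        # LookupError covers charset names Python does not recognise.
        return False

def check(raw: bytes) -> str:
    """Summarise whether the declared charset matches the actual bytes."""
    label = declared_charset(raw)
    if label is None:
        return "no charset declared"
    if decodes_as(raw, label):
        return f"declared {label}: bytes are valid"
    return f"declared {label}: bytes are NOT valid"

# A page labeled UTF-8 but saved as ISO-8859-1 gets flagged.
mislabeled = '<meta charset="utf-8"><p>café</p>'.encode("iso-8859-1")
print(check(mislabeled))  # declared utf-8: bytes are NOT valid
```

This only catches one direction of the problem: every byte sequence is
"valid" ISO-8859-1, so a UTF-8 file mislabeled as 8859-1 passes the check and
simply shows up as garbage in the browser.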
At this point it should be painfully obvious why the motto when dealing
with multi-lingual web development is sometimes "Down, not across".
-Ken (Who is *really* glad he doesn't earn a living as a codemonkey any more)