History: CharacterEncodingTrouble

Preview of version: 2

All data in Tiki is supposed to be encoded in UTF-8.

If you are looking at this site and see strange characters when you choose a non-English language, your browser is probably interpreting the UTF-8 encoded output as Latin-1, which is the default in Western Europe. (The characters look like 'Ã¦' for 'æ', 'Ã¸' for 'ø', and 'Ã¥' for 'å'.)

The UTF-8 encoding is backwards compatible with ASCII, byte for byte, but not with anything else. The Universal Character Set (UCS) defined by Unicode is a superset of Latin-1, which is itself a superset of ASCII, but when UCS characters are encoded in UTF-8, everything except plain 7-bit ASCII ends up as between two and four bytes (the original UTF-8 design allowed sequences of up to six bytes). The Latin-1 characters beyond ASCII each take exactly two bytes. That's why 'æ', 'ø', and 'å' turn into two-letter combinations when a UTF-8 encoded text is viewed as Latin-1.
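The effect is easy to reproduce outside the browser. The following is a minimal Python sketch, not anything taken from Tiki itself: the helper name misread_as_latin1 is hypothetical, and it simply encodes text as UTF-8 and then decodes the same bytes as Latin-1, which is what a misconfigured browser effectively does.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were Latin-1.

    Hypothetical helper for illustration; it mimics a browser that has been
    told the page is ISO-8859-1 while the server actually sent UTF-8.
    """
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    for ch in "Aæøå":
        raw = ch.encode("utf-8")
        # ASCII characters stay one byte; the others become two bytes each.
        print(f"{ch!r}: {len(raw)} byte(s) in UTF-8, shown as {misread_as_latin1(ch)!r}")
```

Plain ASCII text passes through unchanged, which is why an English-only page can look correct even when the charset is wrong.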

If Tiki on your server doesn't look right:
Some servers (for example, with the default Debian configuration) add charset=iso-8859-1 to the Content-Type header. The browser (e.g. Mozilla) looks for a charset value in the Content-Type header first and only then in META tags, so the header overrides the META tag inserted by TikiWiki.
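The precedence described above can be sketched in a few lines of Python. This is only an illustration of the lookup order, not how any particular browser is implemented: the function name effective_charset and both regular expressions are assumptions made for this example, and real browsers apply more elaborate detection rules.

```python
import re
from typing import Optional

# Simplified patterns, assumed for this sketch only.
_CHARSET_IN_HEADER = re.compile(r"charset\s*=\s*\"?([\w.:-]+)", re.IGNORECASE)
_CHARSET_IN_META = re.compile(
    r"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE
)


def effective_charset(content_type: str, html: str) -> Optional[str]:
    """Return the charset a browser would use under the header-first rule.

    The Content-Type header is checked first; the META tag in the page is
    consulted only when the header carries no charset parameter.
    """
    match = _CHARSET_IN_HEADER.search(content_type)
    if match:
        return match.group(1).lower()
    match = _CHARSET_IN_META.search(html)
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    # Header with a charset: the META tag is ignored.
    print(effective_charset("text/html; charset=iso-8859-1", page))
    # Header without a charset: the META tag decides.
    print(effective_charset("text/html", page))
```

To see what your own server sends, inspect the Content-Type response header for a Tiki page; if it carries charset=iso-8859-1, the META tag in the page never gets a chance to take effect.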

History

Information Version
Marc Laporte 12
mlpvolt 11
Marc Laporte removing this page from the categories, only useful for historical reasons 10
Florian Gleixner 9
boud 8
boud 7
Marc Laporte adding to Install category 6
Soron_12F 5
Martin Geisler Added link to the Unicode FAQ and expanded a bit. 4
sylvie greverend 3
sylvie greverend 2