2014-11-08

PHP 5.6 Mojibake


The three little piggies have shown their ugly faces and bit me again. The three little piggies are the swedish national characters ÅÄÖ. The three letters with the ring and the dots are more than cute IKEA-dots as I once had them described to me, they are vowels in their own rights. Å is pronounced as the vowel sound in the english word ‘your’, Ä  as the vowel sound in ‘bear’ and Ö as in ‘sir’. These three letters has haunted me in my entire professional career. from EBCDIC in the IBM mainframes and now lastly in PHP 5.6. PHP now has a default encoding setting and this is UTF-8 by default. This is in itself is a good thing, prior to 5.6 character encoding was a mishmash of ISO-8859-1 and UTF-8. I knew of the change, but I didn’t pay attention to it and didn’t test it. I did not even ask someone else to test it. I should have known better. All the Ås, the Äs and the Ös was distorted when we made the switch to 5.6. In our Nantes plant they had managed to put in Äs in some part numbers (material number in SAP english), this caused some  ‘disturbance’ in BOM structures and some other places. How the french guys had managed to use the swedish Ä in part/material numbers is a mystery to me, it might just have been some double encoding error, I do not know.
The remedy for this mistranslation  is simple, just set default_charset=’ISO-8859-1’, but I had not tested it, so I sat the old ICONV( ISO-8859-1, ISO-8859-1, ISO-8859-1) instead, although deprecated in PHP 5.6. I had tested ICONV before and was pretty sure it would work. The right solution is of course to figure out what the problems with UTF-8 are and fix them. Since encoding now is addressed in a clear way in PHP,  it should be simpler to attack the problem, simple it will never be but simpler than before version 5.6.

No comments:

Post a Comment