Tuesday, January 02, 2007

Are you confused by character encodings?

I am. Being a native English speaker leads to a number of pathologies, the most obvious being our inability to speak foreign languages. What's the point when everyone speaks English? I only learnt to speak Japanese because I worked in a Japanese school on the JET programme for two years; before that I'd been a language dunce. Three years of secondary school French lessons had left me able only to order a coffee, tell you my name and complain about the weather. There's a similar tendency among English-speaking programmers: a total lack of knowledge about character encodings. After learning about ASCII as a boy I really haven't progressed beyond thinking that each character is a byte, with all the important ones between 32 and 127. I have a vague awareness of Unicode and of things like UTF-8, but I don't really know what they mean in technical terms. If you're like me, it's well worth reading Joel Spolsky's excellent post on his Joel On Software blog, where he gives a brief potted history of character encodings and what every programmer should know about them. Joel On Software should be required reading for anyone working in the IT world, not just programmers, so long as you pass his stuff through the Coding Horror filter :)
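
For anyone else stuck in the one-character-one-byte mindset, here is a minimal Python sketch of why it breaks down. The helper name byte_lengths is my own invention for illustration, not something from Joel's post. It just reports how many bytes each character takes once encoded as UTF-8.

```python
def byte_lengths(text, encoding="utf-8"):
    """Return (character, encoded byte count) pairs for each character.

    Hypothetical helper for illustration only.
    """
    return [(ch, len(ch.encode(encoding))) for ch in text]


# "A" is plain ASCII, "\u00e9" is e-acute, "\u65e5" is the kanji for "sun/day".
for ch, size in byte_lengths("A\u00e9\u65e5"):
    print(repr(ch), "takes", size, "byte(s) in UTF-8")
```

The ASCII letter still fits in one byte, but the accented letter needs two and the kanji needs three, so counting bytes and counting characters are not the same thing.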
