peteg's blog - cs - 2007 09 15 UnicodeNumerals

Unicode Numerals.


OK, take a deep breath. Look at this page and tell me the world isn't crazy.

Say you want to talk to the world in Unicode, but you want to do it quickly. Well, obviously you're going to draft C's atoi and friends to convert numerals to your internal integer type, right? That's great in theory, but when your code is running on someone else's webserver that you know little about, things might get a little tricky.

Haskell's FFI specifies that the functions in the CString module are subject to the current locale, which renders them unpredictable on the aforementioned webserver. I can imagine a numeral encoding that e.g. strtol_l understands under today's locale setting and fails to understand under tomorrow's. I don't think there are enough manpages in all the world to clarify this problem.

Solution? Use integers only for internal purposes, like user identifiers, render them in ASCII, and use Unicode strings for everything else. Don't use the CString module, carefully unpack UTF-8 ByteStrings into Haskell Strings, and don't expect warp speed. If you're (cough) putting this stuff in a library, hope like hell your users don't try anything too weird.
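For the "render them in ASCII" half of that, here is a minimal sketch of what I mean by a locale-free numeral parser. The name parseAsciiNat is made up for illustration and is not from any library. It leans on the fact that Data.Char's isDigit accepts only '0' to '9' (unlike isNumber, which accepts other Unicode numerals too), so no locale setting can change what it recognises.

```haskell
import Data.Char (isDigit, ord)
import Data.List (foldl')

-- | Hypothetical helper: parse a non-empty run of ASCII digits into an
-- Integer. Anything else (signs, spaces, non-ASCII numerals) is rejected
-- rather than guessed at.
parseAsciiNat :: String -> Maybe Integer
parseAsciiNat s
  | null s        = Nothing
  | all isDigit s = Just (foldl' step 0 s)
  | otherwise     = Nothing
  where
    -- Accumulate base-10 digits from left to right.
    step acc c = acc * 10 + toInteger (ord c - ord '0')
```

It is nowhere near as quick as atoi, but it gives the same answer on every webserver, and Arabic-Indic or full-width digits get a Nothing instead of whatever the locale of the day decides.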

One day someone will resolve all the issues of implementing a proper Unicode I/O layer, and I will thank them for it.