Python and Locales

If you are working with UTF-8 documents in Python, you may occasionally run into this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 124106: ordinal not in range(128)
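For reference, here is a minimal sketch of how this error comes about. The byte string is a made-up example (the pound sign encoded as UTF-8, which starts with the byte 0xc2), not the document that produced the traceback above.

```python
# Hypothetical sample data: the UTF-8 encoding of the pound sign is the
# two bytes 0xc2 0xa3, so the first byte is outside the ASCII range.
data = b"\xc2\xa3100"

try:
    # Decoding as ASCII fails on the very first byte.
    data.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print(exc)

# Decoding with the correct codec succeeds.
text = data.decode("utf-8")
print(text)
```

The same thing happens implicitly whenever Python decides that ASCII is the encoding to use for a file or stream, which is where the locale comes in.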

The fix is trivial!

First, you need to find out which UTF-8 locales you have installed. In this example, I grep for en because I want the English locales.

locale -a | grep utf8 | grep en
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN.utf8
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZW.utf8
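
If you would rather do the filtering from Python, something like the sketch below should work. The function name utf8_locales and the sample text are my own inventions for illustration. Note that it is slightly stricter than the grep above: it matches the language prefix rather than any name that happens to contain "en".

```python
def utf8_locales(locale_output, language="en"):
    """Return the UTF-8 locale names for one language.

    `locale_output` is expected to be the text printed by `locale -a`,
    one locale name per line.
    """
    matches = []
    for line in locale_output.splitlines():
        name = line.strip()
        lowered = name.lower()
        # Accept both spellings of the codeset: utf8 and UTF-8.
        is_utf8 = ".utf8" in lowered or ".utf-8" in lowered
        if is_utf8 and lowered.startswith(language.lower() + "_"):
            matches.append(name)
    return matches


# Hypothetical sample of `locale -a` output, for illustration only.
sample = "C\nC.utf8\nen_GB.utf8\nen_US.iso88591\nen_US.utf8\nfr_FR.utf8\nPOSIX\n"
print(utf8_locales(sample))
```

To run it against the real list, you could pass in the output of subprocess.run(["locale", "-a"], capture_output=True, text=True).stdout, assuming the locale command is available on your system.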

Out of these, I choose en_US.utf8. (On glibc systems the spelling en_US.UTF-8 refers to the same locale.)

Now all I need to do is export the appropriate environment variable:

export LC_CTYPE=en_US.UTF-8

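And the code will just work.*

* With Python 3, which takes its default text encoding for files and standard streams from the locale. It may not work with Python 2, whose implicit conversions between str and unicode use ASCII regardless of the locale. Yet another reason to upgrade!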
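
To check what Python actually picked up after the export, a quick sketch along these lines can help. The helper name describe_encoding is made up for this post, and the values you see will depend on your system and Python version, so I am not showing any expected output.

```python
import locale
import os
import sys


def describe_encoding():
    """Collect the settings that decide Python 3's default text encoding."""
    return {
        # The environment variable exported above (None if it is not set).
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        # The encoding open() uses when no encoding argument is given.
        "preferred": locale.getpreferredencoding(False),
        # The encoding used when printing to standard output.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


info = describe_encoding()
for key, value in info.items():
    print(f"{key}: {value}")
```

If "preferred" still reports an ASCII-style name such as ANSI_X3.4-1968, the export probably did not reach the process running your script.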