Google Maps UTF-8 problem

Aug 24, 2006

A while ago I came across a problem with the Google geocoder apparently returning Latin1-encoded characters rather than UTF-8. I posted an enquiry to the Google Maps API group but didn't get any responses.

Now I've had time to look at this in more detail and have found how to fix it. My investigations showed that:

  1. wget, curl and requests made with Python urllib2 all returned responses encoded in Latin1. Requests made with Firefox returned responses encoded in UTF-8.

  2. Regardless of the actual encoding returned, the XML declaration always stated encoding="UTF-8".

  3. The Content-Type header in the HTTP response correctly gave the returned encoding (either UTF-8 or ISO-8859-1).
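
Given points 2 and 3, one workaround is to trust the Content-Type header rather than the XML declaration when decoding the response. The following is only a minimal sketch in Python 3, not something from my original tests: the function names are made up for illustration, and falling back to UTF-8 when the header carries no charset is an assumption.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value.

    The default used when no charset is present is an assumption.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset lowercased, or None if absent.
    return msg.get_content_charset() or default


def decode_body(body, content_type):
    """Decode raw response bytes using the charset named in the HTTP header,
    ignoring whatever the XML declaration inside the body claims."""
    return body.decode(charset_from_content_type(content_type))
```

With this, the bytes for Köln decode to the same string whether the server sent them as ISO-8859-1 or as UTF-8, as long as the header is honest about which one it used.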

So it looked like the encoding depended on the headers sent in the HTTP request. I used curl to play around with these and see if I could get a UTF-8 response. The obvious ones (e.g. Accept-Charset: utf-8) didn't work. But what did work was changing the User-Agent header. So, if you want to ensure you get a UTF-8 response, pretend to be Firefox:
curl -H'User-Agent: Mozilla/5.0' ''
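
The same trick should carry over to Python. This is a rough sketch using urllib.request, the Python 3 successor to urllib2; the helper names are hypothetical, the geocoder URL is left for the caller to supply, and I haven't verified it against the live service.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a request that sends a browser-like User-Agent header,
    mirroring the curl command above."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_text(url):
    """Fetch the URL and decode the body using the charset from the
    Content-Type response header (UTF-8 fallback is an assumption)."""
    with urllib.request.urlopen(build_request(url)) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Calling fetch_text with the geocoder URL would then return the response as text, decoded according to whatever encoding the server says it used.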

All this means that you can now search for cologne on and it will display Köln rather than K�ln.