Email Message Encoding

  • If the text of an email message I am composing contains diacritical marks or uses a non-Latin alphabet, the recipient often gets something that is at least partly garbage, with odd characters (squares, dollar signs, etc.) appearing in place of the accented characters. The same unwanted transformations can happen when an unfinished message is saved as a draft and later recovered through my ISP's webmail interface. I would like to know why this happens and how it can be avoided.

    The messages come through garbled because the receiving email client does not know which character encoding the message was written in, so it renders the message using whatever encoding it assumes is correct. Every email message is composed and sent using a particular character encoding. For example, in Thunderbird you can go to the “Options” menu > “Character Encoding” while composing a new message to set the encoding of that message. Every character (letter, number, special character, etc.) has an associated code, and the encoding defines how that code is stored as bytes. For example, the character A is stored as the byte 0x41 in the Unicode (UTF-8) encoding. Therefore, if you send someone a message containing the character A and their email client supports UTF-8, which is very likely since it is one of the most common encodings, the client knows that 0x41 should be mapped to a capital A and displays it. However, if the receiver does not support UTF-8, or is not told that the message uses it, the client will interpret the bytes using its default encoding, which may map them to completely different characters, and those are what appear on the screen (typically question marks, square boxes, or stray accented letters). This rarely affects standard characters such as A-Z and a-z, because almost every encoding in common use stores them the same way. Characters with diacritical marks are where the problems start: é, for instance, is stored as the two bytes 0xC3 0xA9 in UTF-8, and a client that assumes ISO-8859-1 will display those bytes as the two characters Ã©.
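
    The mismatch described above can be reproduced in a few lines. This is a minimal sketch, assuming Python 3; the sample word and the variable names are illustrative and do not come from any particular email client.

```python
# The same text stored under two different encodings.
text = "café"

utf8_bytes = text.encode("utf-8")         # b'caf\xc3\xa9' (é takes two bytes)
latin1_bytes = text.encode("iso-8859-1")  # b'caf\xe9'     (é takes one byte)

# Plain ASCII letters are stored identically in both encodings,
# which is why unaccented English text rarely gets garbled.
assert "A".encode("utf-8") == "A".encode("iso-8859-1") == b"\x41"

# Sender used UTF-8, receiver assumes ISO-8859-1:
# each of the two bytes of é is shown as a separate character.
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)  # cafÃ©

# Sender used ISO-8859-1, receiver assumes UTF-8:
# the lone byte 0xE9 is not valid UTF-8, so it becomes the
# replacement character U+FFFD (often drawn as a box or "?").
replaced = latin1_bytes.decode("utf-8", errors="replace")
print(replaced == "caf\ufffd")  # True
```

    In both failure cases the bytes arrive intact; only the receiver's assumption about the encoding is wrong.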

    Therefore, make sure that you compose the message in an encoding appropriate to the language you are writing. Many encodings are specific to particular languages, but you should aim for one that contains the characters you need and is also widely supported, so that there is a good chance your recipients' email clients can read it without any extra setup. For example, one of the default character encodings in the US English version of Thunderbird is Western (ISO-8859-1). The Wikipedia page en.wikipedia.org/wiki/ISO/IEC_8859-1 indicates that this encoding covers a number of different languages. It is worth checking both that you are using an appropriate encoding for the message and that the recipients' clients support it. If they do not, try to find a more general encoding that they do support. A good place to start is the Unicode family of encodings, which has several variants, including UTF-8 and UTF-16; of these, UTF-8 is the one most widely used for email.
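
    For readers who send mail from a script rather than a desktop client, the encoding is declared in the message's Content-Type header. The following is a rough sketch using the `email` package from Python's standard library; the addresses, subject, and body text are placeholders invented for the example.

```python
import email
from email import policy
from email.message import EmailMessage

body = "Crème brûlée, naïve, façade"

msg = EmailMessage()
msg["From"] = "sender@example.com"      # placeholder address
msg["To"] = "recipient@example.com"     # placeholder address
msg["Subject"] = "Dessert menu"

# Declare the encoding explicitly so the receiving client does not
# have to guess how to interpret the bytes of the body.
msg.set_content(body, charset="utf-8")

print(msg.get_content_charset())  # utf-8

# Simulate the receiving side: serialize to bytes, parse them back,
# and decode the body using the charset named in the headers.
wire_bytes = msg.as_bytes()
parsed = email.message_from_bytes(wire_bytes, policy=policy.default)
print(parsed.get_content().strip() == body)  # True
```

    Because the charset travels with the message, a client that honours the header decodes the accented characters correctly whatever its own default encoding is.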

    You also mentioned that the problem occurs when you save a draft through webmail and retrieve it later. This is probably because the database behind the webmail system does not support the character encoding used to compose the message, so when the draft is retrieved, certain character codes come back as different characters. Unfortunately, there is not much you can do in this situation except contact your ISP to ask whether they know of a fix.
