MIME encoding change in Python 2.4.3 (or 2.4.2? 2.4.1?) - problem and solution

I have an application that processes MIME messages. It reads a message from a file,
looks for text/html and text/plain parts in it, performs some processing on these
parts, and outputs the new message.

Since I upgraded my Python to 2.4.3, the output messages have been coming out
garbled, as a block of junk characters.

I traced the problem back to a few lines that were removed from the email package:
The new Python no longer encodes the payload when converting the MIME message to a
string.

Since my program must work on several computers, each having a different version of
Python, I had to find a way to make it work correctly whether or not msg.as_string()
encodes the payload.
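
To make concrete what "encoding the payload" means here, the following is a minimal sketch written for a current Python 3, where the modules are named email.charset and email.mime.text rather than the 2.4 names used in this post. The helper name payload_encoded_on_construction is made up for illustration. It shows the base64 body encoding that the UTF-8 charset asks for, and a probe for whether the email package already encodes a payload when the part is created.

```python
import base64
from email.charset import Charset
from email.mime.text import MIMEText

utf8 = Charset('utf-8')

# The transfer encoding the email package associates with UTF-8 bodies.
body_encoding = utf8.get_body_encoding()

# What an "encoded payload" looks like: the text in that transfer encoding.
encoded = utf8.body_encode('foo')

def payload_encoded_on_construction():
    """Hypothetical helper: True if MIMEText stores the payload already encoded."""
    return MIMEText('x', 'html', 'utf-8').get_payload() != 'x'

if __name__ == '__main__':
    print(body_encoding)
    print(repr(encoded))
    print(base64.b64decode(encoded))
    print(payload_encoded_on_construction())
```

The probe avoids comparing version numbers, which matters when the exact release that changed the behavior is not known.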

Here is a piece of code that demonstrates how to work around this problem:

................... code start ................
import email
import email.MIMEText
import email.Charset

def do_some_processing(s):
    """Return the input text or HTML string after processing it in some way."""
    # For the sake of this example, we only do some trivial processing.
    return s.replace('foo','bar')

msg = email.message_from_string(file('input_mime_msg','r').read())
utf8 = email.Charset.Charset('UTF-8')
for part in msg.walk():
    if part.is_multipart():
        # Multipart containers have no text payload of their own; skip them.
        continue
    if part.get_content_type() in ('text/plain','text/html'):
        s = part.get_payload(None, True) # True means decode the payload, which is normally base64-encoded.
        # s is now a string containing just the text or html of the part, not encoded in any way.

        s = do_some_processing(s)

        # Starting with Python 2.4.3 or so, msg.as_string() no longer encodes the payload
        # according to the charset, so we have to do it ourselves here.
        # The trick is to create a message-part with 'x' as payload and see if it got
        # encoded or not.
        should_encode = (email.MIMEText.MIMEText('x', 'html', 'UTF-8').get_payload() != 'x')
        if should_encode:
            s = utf8.body_encode(s)

        part.set_payload(s, utf8)
        # The next two lines may be necessary if the original input message uses a different
        # transfer encoding than the one used by the email package. In that case we have to
        # replace the Content-Transfer-Encoding header to indicate the new encoding.
        del part['Content-Transfer-Encoding']
        part['Content-Transfer-Encoding'] = utf8.get_body_encoding()

................... code end ................
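
One way to check a fix like this on any given installation is a round trip: rewrite the parts, turn the message into a string, parse it again and decode the payload. The sketch below does that on a current Python 3 (not the 2.4 API used above), so it is an illustration rather than a drop-in replacement. The function names rewrite_text_parts and decoded_text are made up for this example, and it assumes every text part is UTF-8.

```python
import email
from email.mime.text import MIMEText

def rewrite_text_parts(raw, transform):
    """Parse raw, apply transform to each text part, return the new message text."""
    msg = email.message_from_string(raw)
    for part in msg.walk():
        if part.is_multipart():
            continue
        if part.get_content_type() in ('text/plain', 'text/html'):
            # Assumption: the part is UTF-8 text; decode=True undoes base64/QP.
            text = part.get_payload(decode=True).decode('utf-8')
            # Drop the old header so set_payload() chooses and applies
            # the transfer encoding for the new payload itself.
            del part['Content-Transfer-Encoding']
            part.set_payload(transform(text), 'utf-8')
    return msg.as_string()

def decoded_text(raw):
    """Return the decoded text of a single-part message."""
    msg = email.message_from_string(raw)
    return msg.get_payload(decode=True).decode('utf-8')

# Build a small base64-encoded UTF-8 message to test with.
sample = MIMEText('foo and more foo', 'plain', 'utf-8').as_string()
result = rewrite_text_parts(sample, lambda s: s.replace('foo', 'bar'))
round_trip = decoded_text(result)

if __name__ == '__main__':
    print(round_trip)
```

If the decoded text after the round trip matches the processed text, the payload and its Content-Transfer-Encoding header agree, which is exactly what went wrong in the garbled output described above.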

Hope this helps someone out there.
(Permission is hereby granted for anybody to use this piece of code for any purpose whatsoever)
nobody36 (141)
11/28/2006 1:10:57 AM
comp.lang.python