PHP - using mail() and unicode text - text gets disturbed

I have the following problem. On a website there's a (simple) feedback
form. It is also used by Polish visitors, who (of course) type Polish
text with special characters.

However, when I receive the text in my mailbox, all the special characters
have been turned into a mess.

For example, the word "współprace" arrives with its accented letters garbled.

It seems PHP handles the UTF-8 strings quite well (when I echo the
strings on the site, the text displays correctly) until the point where
they are sent with mail().

Is this a server configuration issue? Or something else?

How can I get my text to remain in Unicode?

I have this problem both on my test server (Apache 1.3.28, PHP 4.3.2 on
Windows XP) and on my provider's server (Apache under Linux).


Hope somebody can help.

Many thanks,


Edo.
ezouwen (5)
2/1/2004 2:44:52 PM
comp.lang.php


> For example, the word "współprace" arrives with its accented letters garbled.
>
> It seems PHP is handling the Unicode-8 strings quite well

Are you setting the headers of the email to state something such as:

Content-Type: text/html;charset=iso-8859-15


2/1/2004 3:33:30 PM
It's an encoding issue. One way to deal with it is to encode the UTF-8
text as quoted-printable using imap_8bit() and set the charset in the email
header to UTF-8. Many email clients don't handle this correctly, though. I
would recommend sending multipart mails. In the plain-text part, remove the
accent marks (solidarność -> solidarnosc). In the HTML part, encode the
special characters as HTML entities (dokąd => dok&#261;d). This ensures that
everyone sees something readable. Outlook Express uses the same strategy.
It helps to send yourself a test email and look at its source.
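A minimal sketch of the multipart approach, assuming PHP 7 or later and that the plain-text and HTML versions have already been reduced to pure ASCII (for instance with the remove_polish_marks() and escape_polish_marks() functions in this post). The helper name build_multipart_alternative() and the boundary format are hypothetical; this is one possible layout of a multipart/alternative message, not necessarily what Outlook Express produces.

```php
<?php
// Hypothetical helper: wraps an ASCII-only plain-text version and an
// ASCII-only HTML version of the same message in a multipart/alternative
// body suitable for PHP's mail(). Assumes PHP 7+ and short lines.

function build_multipart_alternative(string $plain, string $html): array
{
    // Random boundary that should not occur inside either part.
    $boundary = '=_alt_' . md5(uniqid('', true));

    $headers = "MIME-Version: 1.0\r\n"
             . "Content-Type: multipart/alternative; boundary=\"$boundary\"";

    // Parts are ordered from least to most preferred (RFC 2046), so the
    // plain-text part comes first and the HTML part last.
    $part = function (string $type, string $content) use ($boundary): string {
        return "--$boundary\r\n"
             . "Content-Type: $type; charset=us-ascii\r\n"
             . "Content-Transfer-Encoding: 7bit\r\n\r\n"
             . $content . "\r\n";
    };

    $body = $part('text/plain', $plain)
          . $part('text/html', $html)
          . "--$boundary--\r\n";

    return [$headers, $body, $boundary];
}

// Example call ($to, $subject and $text are placeholders):
// list($headers, $body) = build_multipart_alternative(
//     remove_polish_marks($text),
//     '<p>' . escape_polish_marks(htmlspecialchars($text)) . '</p>');
// mail($to, $subject, $body, $headers);
```

Because both parts are plain ASCII, the charset and transfer encoding are trivial; the Polish letters survive only as entities in the HTML part.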

Here are a couple of functions that do what I suggested:

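// Lowercase Polish letters (as UTF-8 byte sequences) mapped to
// their unaccented ASCII equivalents, for the plain-text part.
// Capital letters are not covered.
$pl_markless_tr = array(
"\xC4\x85" => "a",
"\xC4\x87" => "c",
"\xC4\x99" => "e",
"\xC5\x82" => "l",
"\xC5\x84" => "n",
"\xC3\xB3" => "o",
"\xC5\x9b" => "s",
"\xC5\xba" => "z",
"\xC5\xbc" => "z");

// The same letters mapped to numeric HTML entities, for the HTML part.
$pl_uni_entities_tr = array(
"\xC4\x85" => "&#261;",
"\xC4\x87" => "&#263;",
"\xC4\x99" => "&#281;",
"\xC5\x82" => "&#322;",
"\xC5\x84" => "&#324;",
"\xC3\xB3" => "&#243;",
"\xC5\x9b" => "&#347;",
"\xC5\xba" => "&#378;",
"\xC5\xbc" => "&#380;");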

function remove_polish_marks($s) {
 global $pl_markless_tr;
 return strtr($s, $pl_markless_tr);
}

function escape_polish_marks($s) {
 global $pl_uni_entities_tr;
 return strtr($s, $pl_uni_entities_tr);
}
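escape_polish_marks() only knows the letters listed in its table. A more general variant, sketched here under the assumption of PHP 7 or later and valid UTF-8 input, turns every non-ASCII character into a decimal numeric entity by decoding the UTF-8 bytes by hand, so it also handles capitals such as Ł and Ż. The function names are hypothetical; if the mbstring extension is installed, mb_encode_numericentity() can do the same job.

```php
<?php
// Hypothetical helpers: convert every non-ASCII character of a UTF-8
// string into a decimal HTML entity (e.g. "ą" -> "&#261;").

// Decode one UTF-8 encoded character (1 to 4 bytes) into its code point.
function utf8_char_code_point(string $c): int
{
    $b = array_values(unpack('C*', $c));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                 | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// The /u modifier makes PCRE treat the subject as UTF-8, so each match is
// one whole character; on invalid UTF-8 the function returns null.
function utf8_to_numeric_entities(string $s): ?string
{
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            return '&#' . utf8_char_code_point($m[0]) . ';';
        },
        $s
    );
}
```

In the HTML part it would be applied after htmlspecialchars(), so that <, > and & are escaped as well.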

chernyshevsky (2297)
2/1/2004 5:06:26 PM
On Sun, 1 Feb 2004 15:33:30 -0000, "Filth" <p.macdonald@blueyonder.co.uk>
wrote:

>> For example, the word "współprace" arrives with its accented letters garbled.
>>
>> It seems PHP is handling the Unicode-8 strings quite well
>
>are you setting up the headers of the email to state something such as
>
>Content-Type: text/html;charset=iso-8859-15

Content-Type: text/plain;charset=utf-8

 ... sounds like the more appropriate header to send in this case.
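A sketch of what the mail() arguments could look like with that header, assuming PHP 7 or later. quoted_printable_encode() (core PHP since 5.3) does a similar job to the imap_8bit() function mentioned elsewhere in this thread, and the Subject is wrapped as an RFC 2047 encoded-word because header fields should stay plain ASCII. The helper name prepare_utf8_mail(), the recipient address and the form field name are hypothetical.

```php
<?php
// Hypothetical helper: returns the subject, body and extra headers for a
// UTF-8 plain-text message, to be passed to PHP's mail(). Assumes PHP 7+.

function prepare_utf8_mail(string $subject, string $text): array
{
    // Header fields should be plain ASCII, so the subject is sent as an
    // RFC 2047 encoded-word (base64 variant). Long subjects would need to
    // be split into several encoded-words; that is not handled here.
    $encodedSubject = '=?UTF-8?B?' . base64_encode($subject) . '?=';

    $headers = "MIME-Version: 1.0\r\n"
             . "Content-Type: text/plain; charset=UTF-8\r\n"
             . "Content-Transfer-Encoding: quoted-printable";

    // Quoted-printable keeps the body 7-bit clean, e.g. "ł" becomes "=C5=82".
    $body = quoted_printable_encode($text);

    return [$encodedSubject, $body, $headers];
}

// Example call ($_POST['message'] and the address are placeholders):
// list($subject, $body, $headers) = prepare_utf8_mail('Feedback', $_POST['message']);
// mail('webmaster@example.com', $subject, $body, $headers);
```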
 
-- 
Andy Hassall <andy@andyh.co.uk> / Space: disk usage analysis tool
<http://www.andyh.co.uk> / <http://www.andyhsoftware.co.uk/space>
andy171 (2271)
2/1/2004 6:20:19 PM
On Sun, 1 Feb 2004 15:33:30 -0000, "Filth"
<p.macdonald@blueyonder.co.uk> wrote:

>
>> For example, the word "współprace" arrives with its accented letters garbled.
>>
>> It seems PHP is handling the Unicode-8 strings quite well
>
>are you setting up the headers of the email to state something such as
>
>Content-Type: text/html;charset=iso-8859-15
>


Thanks, this did the trick, except the header should contain:

"Content-Type: text/html; charset=UNICODE-1-1-UTF-8"

Cheers,


Edo.
ezouwen (5)
2/1/2004 7:28:54 PM
On Sun, 1 Feb 2004 12:06:26 -0500, "Chung Leong"
<chernyshevsky@hotmail.com> wrote:

>It's an encoding issue. One way to deal with this is to escape the UTF-8
>text using imap_8bit() and set the charset in the email header to UTF-8.

Thanks, a very interesting method. For the time being, the email client
used by the person who receives the web form submissions can handle
Unicode text, so I'll stick to just sending a header that declares a
Unicode charset.

However, I'll definitely save your method and check it out; it might be
very useful in the future.

Dziękuję i do widzenia (thank you and goodbye) :-)


Edo.
ezouwen (5)
2/1/2004 7:55:43 PM
On Sun, 01 Feb 2004 18:20:19 +0000, Andy Hassall <andy@andyh.co.uk>
wrote:

>
>Content-Type: text/plain;charset=utf-8
>
> ... sounds like the more appropriate header to send in this case.
> 

Thanks, I found that out myself, but I appreciate your input.

Edo.
ezouwen (5)
2/1/2004 8:08:27 PM