Hi,
I am trying to aggregate content from a website into a database, and I
am running into severe encoding trouble with the em dash (—, U+2014)
and the bullet (•, U+2022), and probably other special characters
too.
The remote website declares its charset as ISO-8859-1, and when viewing
it as such in the browser, I can see the — and • characters
just fine. When I look at the aggregated content (fetched over HTTP with
fsockopen()) on my own website, which declares UTF-8, the characters of
course do not display correctly.
Leaving the database aside for now, I first wanted to convert the
string so that it displays properly on my UTF-8 website. I assumed
this would be done with
$data = iconv('ISO-8859-1', 'UTF-8', $data);
However, the converted content does not display properly either, so I
clearly need some more advice.
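As a sanity check that the call itself behaves the way I assume, here is a minimal sketch using a character that definitely exists in ISO-8859-1 (é, byte 0xE9). The sample string and variable names are only for illustration and are not taken from the remote site.

```php
<?php
// "café" with the é stored as the single ISO-8859-1 byte 0xE9.
$latin1 = "caf\xE9";

// Same call as above, applied to the sample string.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

// bin2hex() shows the raw bytes without any charset interpretation.
echo bin2hex($latin1), "\n"; // 636166e9
echo bin2hex($utf8), "\n";   // 636166c3a9 (é is now the two bytes C3 A9)
```

This part works as expected, so the conversion step itself does not seem to be broken for ordinary Latin-1 characters.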
To avoid ambiguity or further encoding trouble, I am showing all the
characters base64-encoded.
The character that the remote website sends is "lw==" in base64.
Converted with the above iconv() call, it becomes "wpc=".
When I copy and paste the rendered character into a PHP script and
encode that, it becomes "4oCU". I am not sure which encoding that is.
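The byte values behind these strings can be inspected with a short sketch like the one below. The second half tests a guess, which is only an assumption and not something the remote site confirms: that the page is really encoded as Windows-1252 (CP1252) even though it declares ISO-8859-1. The labels and variable names are made up for illustration, and the 'CP1252' charset name is assumed to be available in the local iconv build.

```php
<?php
// The three base64 strings quoted above (labels are illustrative only).
$samples = [
    'sent by remote site'       => 'lw==',
    'after iconv from 8859-1'   => 'wpc=',
    'pasted rendered character' => '4oCU',
];

foreach ($samples as $label => $b64) {
    // bin2hex() shows the raw bytes without any charset interpretation.
    echo $label, ': ', bin2hex(base64_decode($b64)), "\n";
}
// The hex values printed are 97, c297 and e28094 respectively.

// Assumption under test: the source bytes are CP1252, not ISO-8859-1.
$raw      = base64_decode('lw==');            // the single byte 0x97
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $raw);
$asCp1252 = iconv('CP1252', 'UTF-8', $raw);

echo base64_encode($asLatin1), "\n"; // wpc=
echo base64_encode($asCp1252), "\n"; // 4oCU
```

For this one byte, the CP1252 conversion matches the pasted character, while the ISO-8859-1 conversion reproduces the broken result. Whether that holds for the rest of the page is something I have not verified.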
How should I approach this problem? Thanks.
--
Christoph Burschka

