Okay, I can see the issue. The text/character encoding is not right.
Do you know anything about text/character encoding?
If not, you should read the power tip Working with Unicode in UltraEdit/UEStudio and this post, as well as all other pages referenced in the power tip and the post. It is inexcusable for anyone editing HTML, XHTML and XML files in a text editor not to know what character encoding is, how it works and what the meta tag with charset=utf-8 in the header of an HTML/XHTML file really means. At least read very carefully the text below, which explains what happened because of not knowing anything about text/character encoding.
The text with mother´s was originally not Unicode encoded. It was most likely encoded with code page Windows-1252. The right single quotation mark is encoded in Windows-1252 with just a single byte (8 bits) with hexadecimal value 92 (decimal 146 or binary 1001 0010).
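The byte value can be checked outside of UltraEdit as well. The following is only a small sketch in Python, assuming Python's cp1252 codec as a stand-in for code page Windows-1252; the variable names are made up for illustration.

```python
# Encode the right single quotation mark (U+2019) with code page Windows-1252.
quote = "\u2019"
encoded = quote.encode("cp1252")
byte_value = encoded[0]

# Show the single byte in hexadecimal, decimal and binary notation.
print(encoded.hex(), byte_value, format(byte_value, "08b"))
```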
I think the HTML header was added next with the line:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
This line declares the file as Unicode encoded with UTF-8. But the encoding of the original text was not converted from Windows-1252 to UTF-8, so the text inside the XHTML file was in reality still Windows-1252 encoded although declared as UTF-8 encoded. That was the main mistake made on creation of the file.
Then the file, declared as UTF-8 encoded but in reality Windows-1252 encoded, was opened in UltraEdit, which recognized the UTF-8 encoding declaration and interpreted the file according to UTF-8. On the next save, the byte with hexadecimal value 92 was written as valid UTF-8 with the two bytes with hexadecimal values C2 92 and so became the character private use two (code point U+0092, a C1 control code). This character is not supported by most fonts, so it is displayed with the default glyph of the used font for unsupported characters. This can be no glyph, a rectangle or a thin line like -. It depends on the font how an unsupported character is displayed on screen. This is correct, as private use characters are code points whose interpretation is not specified by a character encoding standard.
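The mistake can be reproduced with a few lines of Python. This is only a sketch: how UltraEdit handles the invalid byte internally is not known to me, so decoding with Latin-1 is an assumption used here as a stand-in that maps byte 92 to code point U+0092 and produces exactly the bytes described above.

```python
import unicodedata

# The original text as stored on disk: Windows-1252, quote is the single byte 0x92.
original = "mother\u2019s".encode("cp1252")

# Assumption: the lone byte 0x92 is taken over as code point U+0092.
# Latin-1 decoding does exactly that and stands in for the editor's behavior.
misread = original.decode("latin-1")

# Saving as UTF-8 now writes U+0092 as the two bytes C2 92.
saved = misread.encode("utf-8")
print(saved.hex())

# U+0092 is a control character (category Cc), which fonts have no glyph for.
print(unicodedata.category(misread[6]))
```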
The correct procedure to get text encoded in Windows-1252 into a UTF-8 encoded XHTML file is as follows:
- The text file with Windows-1252 encoded text is opened as ASCII/ANSI file with code page 1252 in UltraEdit.
- Then the file is converted from ASCII/ANSI to UTF-8.
This step changes character ´, encoded with a single byte (8 bits) with hexadecimal value 92, to Unicode with hexadecimal code point value 2019 (two bytes, i.e. 16 bits) in memory of UltraEdit. This can be seen in UltraEdit by positioning the caret left of the character and executing command Character Properties before and once more after the conversion to UTF-8.
- Next the HTML header and footer can be inserted into the now really Unicode encoded file with the character set declaration as posted above.
- On saving the Unicode file, UltraEdit runs the 8-bit Unicode transformation format procedure, which results in character ´ being stored in the file with the three bytes with hexadecimal values E2 80 99.
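The effect of the steps above on the bytes can be sketched in Python as follows. This is not what UltraEdit does internally, only an illustration of the same conversion; the function name cp1252_to_utf8 is hypothetical.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Interpret bytes as Windows-1252 text and re-encode the text as UTF-8 (no BOM)."""
    text = data.decode("cp1252")   # byte 0x92 becomes code point U+2019
    return text.encode("utf-8")    # U+2019 becomes the three bytes E2 80 99


source = b"mother\x92s"
text = source.decode("cp1252")
print(f"U+{ord(text[6]):04X}")     # code point of the quotation mark
print(cp1252_to_utf8(source).hex())
```

The important point is the explicit decode with code page 1252 before anything is written as UTF-8.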
I really recommend running this procedure on each file containing a character in Windows-1252 encoding with a value greater than 127 decimal, i.e. with a hexadecimal value in range 80 to FF. Run a Find in Files with Perl regular expression [\x80-\xFF] on the original Windows-1252 encoded files using option Open matching files. Convert each opened file to UTF-8 encoding, add the XHTML header and footer and other HTML tags to each file and save the files (without UTF-8 BOM), now with text being really UTF-8 encoded as declared in the header.
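For many files the same search and conversion could also be scripted outside of UltraEdit. The following Python sketch assumes that all files are original Windows-1252 encoded text files matching *.txt; the names has_high_bytes and convert_tree and the file pattern are made up for this example. It does not add the XHTML header and footer.

```python
import re
from pathlib import Path

# Same character class as in the Find in Files search: any byte from 80 to FF.
HIGH_BYTE = re.compile(rb"[\x80-\xFF]")


def has_high_bytes(data: bytes) -> bool:
    """Return True if the data contains at least one byte above 7F."""
    return HIGH_BYTE.search(data) is not None


def convert_tree(root: Path, pattern: str = "*.txt") -> list:
    """Convert every matching file with high bytes from Windows-1252 to UTF-8 in place."""
    converted = []
    for path in sorted(root.rglob(pattern)):
        data = path.read_bytes()
        if has_high_bytes(data):
            # Raises UnicodeDecodeError on bytes not assigned in code page 1252.
            path.write_bytes(data.decode("cp1252").encode("utf-8"))
            converted.append(path)
    return converted
```

Such a script must be run only once and only on the original files, because a file already converted to UTF-8 also contains bytes in range 80 to FF and would be encoded a second time.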
See also the script ConvertFilesToUtf8, which is most likely a big help for converting all the Windows-1252 encoded files to UTF-8.
It would be possible to run a Perl regular expression Replace in Files with search string \xC2\x92 and replace string \xE2\x80\x99 to replace all occurrences of UTF-8 encoded private use two with UTF-8 encoded right single quotation mark. But that quick fix of this one character does not help with the other non ASCII characters in the wrongly encoded XHTML files, like the character being displayed as question mark after mother´s. I am quite sure that in the original Windows-1252 encoded text this character was not a question mark. A question mark is the result of a character which could not be converted correctly from one text encoding to another text encoding.
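The limit of such a quick fix can be shown with a short Python sketch. It generalizes the single replace to all code points in range U+0080 to U+009F, which is my own extension and not something UltraEdit offers in this form; the function name repair_c1_controls is hypothetical.

```python
def repair_c1_controls(text: str) -> str:
    """Map U+0080..U+009F back to the characters Windows-1252 assigns to those byte values."""
    repaired = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # bytes 81, 8D, 8F, 90 and 9D are not assigned in code page 1252
        repaired.append(ch)
    return "".join(repaired)


# Damaged text: C2 92 for the quote, and a real question mark for a lost character.
damaged = b"mother\xc2\x92s?".decode("utf-8")
fixed = repair_c1_controls(damaged)
print(fixed.encode("utf-8").hex())
```

The quotation mark is restored, but the question mark stays a question mark. The information about which character it once was is no longer in the file, so it can be recovered only from the original Windows-1252 encoded text.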