Problems with special characters like ü, ö

    Aug 06, 2005#1

    Hi everyone.

    I am trying to do a search/replace on special characters.
    I have some HTML files in which I want to replace "ü" with "&uuml;", but UltraEdit doesn't find any "ü" in the files.

    As a test I searched for "ü" in JPEG files, and it found some without any problem. But not in the HTML files.

    What could be the problem?

      Aug 07, 2005#2

      I guess the German umlauts are already encoded as HTML entities in the HTML source text. Look for &ouml; (= ö), &uuml; (= ü), &Ouml; (= Ö), ...

      The umlauts are encoded differently if they are part of a URL (percent-encoded UTF-8). For example Ü = %C3%9C and ü = %C3%BC. You should never use German umlauts or ß in file names - NEVER!
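For anyone who wants to check the URL form of an umlaut, here is a minimal Python sketch. It assumes the URL uses UTF-8 percent-encoding, which is what the values %C3%9C and %C3%BC above correspond to. The file name `zur%C3%BCck.html` is a made-up example, not a file from this thread.

```python
from urllib.parse import quote, unquote

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character by default.
encoded_upper = quote("Ü")
encoded_lower = quote("ü")
print(encoded_upper, encoded_lower)

# unquote() reverses the encoding, again assuming UTF-8.
# "zur%C3%BCck.html" is a hypothetical file name used only for illustration.
decoded = unquote("zur%C3%BCck.html")
print(decoded)
```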

      Also check whether the HTML files are in Unicode (see the status bar at the bottom of the UltraEdit window) or whether the umlauts are written as decimal or hexadecimal HTML character references (for example &#252; or &#xFC; for ü), which is also possible.
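The different HTML spellings of the same umlaut can be compared with a short Python sketch. It only uses the standard `html` module; the list of forms is just the three variants for "ü" and is not taken from the files discussed here.

```python
from html import unescape
from html.entities import codepoint2name

# Three ways an "ü" can appear in HTML source: named, decimal, hexadecimal.
forms = ["&uuml;", "&#252;", "&#xFC;"]
decoded = [unescape(form) for form in forms]
print(decoded)

# The reverse direction: look up the entity name for a character.
named = "&" + codepoint2name[ord("ü")] + ";"
print(named)
```

A search for the literal "ü" finds none of these forms, so each spelling has to be searched for separately.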
      Best regards from an UC/UE/UES for Windows user from Austria

        Aug 07, 2005#3

        Hi to Austria!

        Thanks for the reply. The picture pages are created with Arles image webpage creator. The other pages are created with Frontpage. When I type e.g. a "ü" in Frontpage, it is correctly encoded as "&uuml;". Arles, however, writes the literal "ü" into the code. That is the problem. I didn't find any option in Arles to make it encode the character differently.

        So I have to change all the umlauts, and I am now trying to do that with UltraEdit, but without success.

        I never had such problems in the past. I now have a vserver @ server4you.de, and the problem appears with the pages hosted there. With all the other web hosting services I never had such problems.

        A code snippet example:

        <div align="center"> 
        <a href="../index2.html"><img src="../images/cimg0319.jpg" alt="<- zurück" title="<- zurück" width="800" height="600" border="0"></a> 
        </div>

        Here Arles webpage creator has written "zurück", but UltraEdit doesn't find this "ü".

        Greetings from Germany.

          Aug 07, 2005#4

          What hexadecimal code does the "ü" character in "zurück" have?

          Place the cursor on the "ü" and switch to hex mode to see its hex code.

          If it is an ANSI "ü", it should have the hex code FC, and 00 FC if it is Unicode (UTF-16; the two bytes can appear in either order depending on the byte order). UltraEdit should find this without any problem. An OEM "ü" (= old DOS) has the hexadecimal code 81.
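To match a hex value seen in hex mode against these encodings, a small lookup like the following may help. It is only a sketch: the codec names are my assumption for what "ANSI" (Windows-1252) and "OEM" (code page 850) mean on a German Windows system, UTF-8 is added for comparison, and `guess_umlaut_encoding` is a hypothetical helper name.

```python
from typing import List

# Byte sequences that encode "ü" in the encodings discussed here.
# Assumption: "ANSI" means cp1252 and "OEM" means cp850.
CANDIDATES = {
    "ANSI (Windows-1252)": "ü".encode("cp1252"),
    "OEM (code page 850)": "ü".encode("cp850"),
    "UTF-8": "ü".encode("utf-8"),
    "UTF-16 big endian": "ü".encode("utf-16-be"),
    "UTF-16 little endian": "ü".encode("utf-16-le"),
}


def guess_umlaut_encoding(raw: bytes) -> List[str]:
    """Return the names of all encodings in which `raw` is exactly 'ü'."""
    return [name for name, encoded in CANDIDATES.items() if encoded == raw]


# Print the hex bytes for each encoding, as they would appear in a hex editor.
for name, encoded in CANDIDATES.items():
    print(f"{name}: {encoded.hex(' ').upper()}")
```

For example, `guess_umlaut_encoding(bytes.fromhex("C3BC"))` reports UTF-8, while `bytes.fromhex("FC")` reports ANSI.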

          However, to replace all these umlauts from Arles webpage creator, do the following:

          Select one "ü" which is not found.

          Open Replace (if all HTML files are open in UE) or Replace In Files.

          The selected string (ü) is automatically entered in the find field, independent of its hexadecimal code value.

          Now specify the replace string, set the other options correctly and run the replace.

          Repeat this procedure for all other umlauts.

          This should work, hopefully.

            Aug 07, 2005#5

            And here is my problem:

            Search/replace for "ü" doesn't give me any results.

            I have about 8000 HTML files. So I'm not able to open them all in UltraEdit.

              Aug 08, 2005#6

              Is one of these files available anywhere on the web, or can you zip one and upload it somewhere? I have replaced umlauts in HTML files very often, most recently two weeks ago, and it always worked (standard DOS or UNIX HTML files, no Unicode, no UTF-8). So it must be a special file format problem, and I can only help further if I can look at an unmodified original source file.

                Aug 08, 2005#7

                They're normal HTML files, but none of them is online at the moment. I will upload them later when I am back home.

                I have now replaced the umlauts with Dreamweaver. I didn't know that it has a good search/replace tool built in. Now all the pages are all right.

                  Aug 08, 2005#8

                  Hi, it's me again.

                  I uploaded a part of the page, you can find it under: http://www.danijel-brncic.de/test/

                  There's an index.html, and in the folder http://www.danijel-brncic.de/test/pages/ you can find the files image1.html to image30.html.

                  In the pages you'll find e.g. the word "zurück".

                  Thanks, danijel.

                    Aug 09, 2005#9

                    OK. After looking into the code I could see the problem, and it was as expected. Your files are encoded in UTF-8 with Unix line endings, which is also correctly declared in your files with <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.

                    So when you open your HTML files with auto detect UTF-8 enabled in the general configuration dialog, you will see the "ü" in "zurück" displayed correctly in UltraEdit, although it is stored in the file as a two-byte character with the hexadecimal code values C3 BC.

                    To see that the umlaut is encoded with these hexadecimal code values, you first have to disable the auto detect UTF-8 feature, then open the HTML file, go to "zurück" and switch to hex edit mode.
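The same check can be done outside the editor. This short Python sketch prints the bytes of "zurück" as they would appear in a hex view, once for UTF-8 and once for ANSI (assumed here to be Windows-1252):

```python
word = "zurück"

# Hex bytes of the word in UTF-8: the umlaut takes two bytes.
utf8_hex = word.encode("utf-8").hex(" ").upper()

# Hex bytes of the word in ANSI (assumed cp1252): the umlaut takes one byte.
ansi_hex = word.encode("cp1252").hex(" ").upper()

print("UTF-8:", utf8_hex)
print("ANSI: ", ansi_hex)
```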

                    Because Replace In Files does not detect the file format, it will never find a "ü" entered in the find field (which has the hexadecimal code value FC). To do the replace with UltraEdit, you would have to search for "Ã¼" (the bytes C3 BC displayed as ANSI) instead of "ü".
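For users who would rather script the replacement across thousands of files, here is a minimal Python sketch. It is not what UltraEdit or Dreamweaver do internally. It assumes every `*.html` file below a folder is UTF-8 and that the umlauts should become named HTML entities; `replace_umlauts`, `convert_file` and `convert_tree` are hypothetical helper names. The last lines show why a byte search for FC fails on UTF-8 data.

```python
from pathlib import Path

# Mapping from the special German characters to named HTML entities.
UMLAUT_ENTITIES = {
    "ä": "&auml;", "Ä": "&Auml;",
    "ö": "&ouml;", "Ö": "&Ouml;",
    "ü": "&uuml;", "Ü": "&Uuml;",
    "ß": "&szlig;",
}


def replace_umlauts(text: str) -> str:
    """Replace every umlaut and ß in already decoded text."""
    for char, entity in UMLAUT_ENTITIES.items():
        text = text.replace(char, entity)
    return text


def convert_file(path: Path, encoding: str = "utf-8") -> bool:
    """Rewrite one file in place; return True if it was changed."""
    # Reading and writing bytes keeps the Unix line endings untouched.
    original = path.read_bytes().decode(encoding)
    converted = replace_umlauts(original)
    if converted == original:
        return False
    path.write_bytes(converted.encode(encoding))
    return True


def convert_tree(root: Path) -> int:
    """Convert all *.html files below root; return how many were changed."""
    return sum(convert_file(p) for p in sorted(root.rglob("*.html")))


# Why the byte-level search fails on UTF-8 data:
utf8_bytes = 'title="<- zurück"'.encode("utf-8")
print(b"\xfc" in utf8_bytes)      # search for the ANSI byte of ü
print(b"\xc3\xbc" in utf8_bytes)  # search for the UTF-8 bytes of ü
print("ü".encode("utf-8").decode("cp1252"))  # how those bytes look as ANSI
```

Calling `convert_tree(Path("pages"))` would process a folder named `pages`; try it on a copy of the files first, since the files are rewritten in place.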

                    However, you have now done the replaces with Dreamweaver, so this is just for info for you and maybe for other users in future with the same problem.

                    For all users, here is a small UTF-8 to ANSI to OEM conversion table for the special German characters. The OEM and UTF8 rows show how those bytes look when displayed as ANSI; byte 81 has no printable ANSI character.

                    ANSI: ä | Ä | ö | Ö | ü | Ü | ß --- hex: E4 | C4 | F6 | D6 | FC | DC | DF
                    OEM:  „ | Ž | ” | ™ |   | š | á --- hex: 84 | 8E | 94 | 99 | 81 | 9A | E1
                    UTF8: Ã¤ | Ã„ | Ã¶ | Ã– | Ã¼ | Ãœ | ÃŸ --- hex: C3 A4 | C3 84 | C3 B6 | C3 96 | C3 BC | C3 9C | C3 9F
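The hex values in the table can be reproduced with Python's built-in codecs. This is a sketch that assumes "ANSI" corresponds to cp1252 and "OEM" to cp850; `hex_row` is a hypothetical helper name.

```python
# The special German characters, in the same order as the table.
CHARS = "äÄöÖüÜß"


def hex_row(encoding: str) -> str:
    """Return the hex bytes of each character in the given encoding."""
    return " | ".join(c.encode(encoding).hex(" ").upper() for c in CHARS)


# Assumed codec names for the three rows of the table.
for label, encoding in [("ANSI", "cp1252"), ("OEM", "cp850"), ("UTF8", "utf-8")]:
    print(f"{label}: {hex_row(encoding)}")
```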