Replace UTF-8 char sequences with ANSI chars

Newbie

    Sep 21, 2007 #1

    I want to do the following as part of a macro, so I would rather not have to switch to hex mode for it.

    I want to search for "Ã„" (=C3 84 hex) and replace it with "Ä". But Find MatchCase "Ã„" can't find these characters, so maybe it's better to search for the exact hex code, but how?

    Master

      Sep 21, 2007 #2

      It looks like you have pasted UTF-8 text into a non-UTF-8 document.

      How about just doing a conversion:

      File - Conversions - UTF8-to-ASCII

      Newbie

        Sep 21, 2007 #3

        No, it's not that simple. The text is in ASCII format; only this one character was converted incorrectly by whatever produced my file. The other non-ASCII characters in my file are correct (ö, ü, etc.).

        And the UTF8-to-ASCII conversion didn't solve the problem.

        Master

          Sep 21, 2007 #4

          Strange: when I entered "Ã„" (=C3 84 hex) in an ASCII file and used the conversion, it worked.

          OK, then:
          - switch to hex mode: Ctrl+H
          - do a replace with Ctrl+R: search for C384 and replace it with C4
          - switch back with Ctrl+H

          Record it as a macro if you have to do this often.

          InsertMode
          ColumnModeOff
          HexOff
          UnixReOff
          HexOn
          Find "C384"
          Replace All "C4"
          HexOff
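For comparison, the same byte-level fix can be scripted outside UltraEdit. This is a minimal Python sketch, not something from the thread; it assumes the file is otherwise ANSI (Windows-1252) and that only the byte pair C3 84 needs replacing. The names replace_mojibake_bytes and fix_file are hypothetical.

```python
from pathlib import Path


def replace_mojibake_bytes(data: bytes) -> bytes:
    """Replace every UTF-8 pair C3 84 ("Ä" in UTF-8) with the ANSI byte C4.

    Same effect as the hex-mode macro: Find "C384", Replace All "C4".
    """
    return data.replace(b"\xc3\x84", b"\xc4")


def fix_file(path: str) -> None:
    """Hypothetical helper: rewrite a file in place with the pair replaced."""
    p = Path(path)
    p.write_bytes(replace_mojibake_bytes(p.read_bytes()))


if __name__ == "__main__":
    sample = b"\xc3\x84rger \xf6l"  # shows as "Ã„rger öl" in an ANSI editor
    print(replace_mojibake_bytes(sample).decode("cp1252"))  # Ärger öl
```

Working on raw bytes sidesteps the question of how the editor interprets the search string, which is also why the hex-mode approach works.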


          (And just for the record: I'm puzzled why I can't find "Ã„" with a normal find. I can't even find "Ä". Strange, what am I missing?)

          Grand Master

            Sep 21, 2007 #5

            I had the same problem in one of my macros, which cleans up the site statistics pages created by Advanced Web Statistics (AWStats) v6.4 so that they can be used offline. The two HTML pages with the keywords and keyphrases created by AWStats contain the UTF-8 characters from the search engines as-is, not correctly decoded. The macro is designed to fix this too.

            With UE v12.00, simply searching for the UTF-8 characters and replacing them with HTML entities no longer worked, because since v12.00 UltraEdit can also search in Unicode. So when a macro contains a search for "Ã„" (=C3 84 hex), UE v12+ interprets it as a search for 'Ä'.
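A short Python illustration of the two readings of the same two bytes. It demonstrates only the encodings involved, not how UltraEdit processes macro search strings internally.

```python
raw = b"\xc3\x84"  # the two bytes stored in the file

# Interpreted as ANSI (Windows-1252), every byte is one character.
as_ansi = raw.decode("cp1252")
# Interpreted as UTF-8, the two bytes form a single character.
as_utf8 = raw.decode("utf-8")

print(as_ansi)  # Ã„
print(as_utf8)  # Ä
print(as_utf8.encode("cp1252").hex())  # c4, the single ANSI byte for Ä
```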

            The solution is to encode the UTF-8 characters to be replaced in Unicode as well, that is, to always search for the Unicode characters in Unicode. To stay backward compatible, I did this with a regular expression replace in UltraEdit syntax using a simple OR expression (^{A^}^{B^}). Here is that part of my macro. The characters in the left braces are for UE v12+ and the characters in the right braces are for UE up to v11.20b.

            UnixReOff
            Find MatchCase RegExp "^{ÃƒÅ¸^}^{ÃŸ^}"
            Replace All "ß"
            Find MatchCase RegExp "^{ÃƒÂ¼^}^{Ã¼^}"
            Replace All "ü"
            Find MatchCase RegExp "^{ÃƒÅ“^}^{Ãœ^}"
            Replace All "Ü"
            Find MatchCase RegExp "^{ÃƒÂ¶^}^{Ã¶^}"
            Replace All "ö"
            Find MatchCase RegExp "^{Ãƒâ€“^}^{Ã–^}"
            Replace All "Ö"
            Find MatchCase RegExp "^{ÃƒÂ¤^}^{Ã¤^}"
            Replace All "ä"
            Find MatchCase RegExp "^{Ãƒâ€ž^}^{Ã„^}"
            Replace All "Ä"
            Find MatchCase RegExp "^{ÃƒÂ¡^}^{Ã¡^}"
            Replace All "á"
            Find MatchCase RegExp "^{ÃƒÂ ^}^{Ã ^}"
            Replace All "à"
            Find MatchCase RegExp "^{Ã‚Â´^}^{Â´^}"
            Replace All "´"

            Add UnixReOn or PerlReOn (UE v12+) at the end of the macro if you do not use UltraEdit-style regular expressions by default (see the search configuration). The macro command UnixReOff switches the regular expression engine to UltraEdit style.
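The idea behind this macro, replacing each mis-decoded UTF-8 sequence with its ANSI character, can also be sketched in Python. This assumes the file decodes cleanly as Windows-1252; it uses plain string replacement instead of UltraEdit's regex OR, since the v11/v12 search difference does not arise here. TARGETS, fix_mojibake and fix_ansi_file are illustrative names.

```python
from pathlib import Path

# The characters handled by the macro above.
TARGETS = "ßüÜöÖäÄáà´"


def fix_mojibake(text: str) -> str:
    """Replace UTF-8 sequences displayed as ANSI (e.g. 'Ã„') with the ANSI character ('Ä')."""
    for ch in TARGETS:
        # UTF-8 bytes of ch read back as Windows-1252: 'Ä' -> 'Ã„', '´' -> 'Â´'
        broken = ch.encode("utf-8").decode("cp1252")
        text = text.replace(broken, ch)
    return text


def fix_ansi_file(path: str) -> None:
    """Hypothetical helper: rewrite an ANSI file in place, keeping its line endings.

    Decoding raises UnicodeDecodeError if the file contains one of the five
    bytes undefined in Windows-1252 (81, 8D, 8F, 90, 9D).
    """
    p = Path(path)
    fixed = fix_mojibake(p.read_bytes().decode("cp1252"))
    p.write_bytes(fixed.encode("cp1252"))


if __name__ == "__main__":
    sample = "Grüße".encode("utf-8").decode("cp1252")
    print(sample, "->", fix_mojibake(sample))  # GrÃ¼ÃŸe -> Grüße
```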
            Best regards from an UC/UE/UES for Windows user from Austria

            Newbie

              Sep 21, 2007 #6

              Thanks Mofi. Is there anything that can't be done with UE? ;-)

              About the characters in the braces: is it possible to specify these "strange" characters by their UTF-8 codes instead, for example {#C384#C39C}? The numbers are easier to read and type than the "strange" characters.

              Grand Master

                Sep 21, 2007 #7

                reinim19 wrote: Is it possible to define these "strange" characters by their UTF-8 code instead?
                No, that is not possible. "#C384#C39C" inside the macro would result in searching for exactly that string.

                I encoded the UTF-8 characters as follows:
                1. At Configuration - File Handling - Unicode/UTF-8 Detection, I unchecked the setting Auto detect UTF-8 files, and also the two UTF-8 settings at File Handling - Save.
                2. Then I opened a new ASCII/ANSI file, entered the ANSI characters I later wanted to handle with the macro, one per line, and saved that ANSI file.
                3. Next I used File - Conversions - ASCII to UTF-8 and saved this file under a new name.
                4. I closed that UTF-8 file and reopened it via the recent file list. Because UTF-8 detection was disabled, UltraEdit loaded it as an ASCII file.
                5. I used File - Conversions - ASCII to UTF-8 again to convert the UTF-8 codes to Unicode, then saved and closed that Unicode-encoded UTF-8 file.
                6. Because UTF-8 detection was still disabled, reopening the file via the recent file list again loaded and displayed it as ASCII.
                7. Finally, I changed the three configuration settings back to what I normally use.
                Now I had the characters for the edit macro dialog. Using the first saved file with the ANSI characters as the reference between each ANSI character and its Unicode-encoded UTF-8 characters, I could write the macro.
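The conversion steps above can be reproduced in a few lines of Python, which may help when generating the strings for other characters. This is a sketch under the assumption that "ASCII to UTF-8" followed by reopening as ASCII amounts to encoding as UTF-8 and decoding those bytes as Windows-1252; convert_and_reopen and macro_search_strings are hypothetical names, not UltraEdit features.

```python
from __future__ import annotations


def convert_and_reopen(text: str) -> str:
    """One 'ASCII to UTF-8' conversion, then reopening the file as ASCII/ANSI.

    Encodes the text as UTF-8 and decodes those bytes as Windows-1252.
    Raises UnicodeDecodeError if a UTF-8 byte is one of the five bytes
    undefined in Windows-1252 (e.g. 'Á' is C3 81).
    """
    return text.encode("utf-8").decode("cp1252")


def macro_search_strings(ch: str) -> tuple[str, str]:
    """Return (left, right) brace contents for one ANSI character.

    right: result after steps 3-4, used for UE up to v11.20b.
    left:  result after steps 5-6, used for UE v12+.
    """
    right = convert_and_reopen(ch)
    left = convert_and_reopen(right)
    return left, right


if __name__ == "__main__":
    for ch in "ÄÜ":
        left, right = macro_search_strings(ch)
        print(f'Find MatchCase RegExp "^{{{left}^}}^{{{right}^}}"')
        print(f'Replace All "{ch}"')
```

For "Ä" this prints Find MatchCase RegExp "^{Ãƒâ€ž^}^{Ã„^}" followed by Replace All "Ä".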