Hexadecimal search with Perl regex partly not working anymore for ANSI encoded file in UE v24.xx and v25.00 (fixed)

Basic User

    Apr 02, 2018#1

    I've been using Perl hex searches in several of my macros for many years to find and fix funky characters (like left and right quotes) that don't always play nicely in HTML. But these searches are now failing in UE (version 25.00.0.58). Even doing a manual search is failing. Anybody know why? I have supplied 2 screen snapshots to show the problem.

    hexfind_error01.png (70.61KiB)
    hexfind_error02.png (74.72KiB)

    Grand Master

      Apr 02, 2018#2

      I could reproduce this issue and found lots of other issues on analyzing it.

      UltraEdit for Windows has been a fully Unicode-aware application since v24.00. It looks like it has also converted ANSI encoded files with Windows-1252 encoding to Unicode in memory since v24.00.

      The following happens on positioning the caret to the left of the character “ and pressing Alt+RETURN to display the Character Properties for this character:
      • UE v25.00.0.58 displays the hexadecimal value 0x201c instead of 0x93.
        0x201c is the correct Unicode code value while 0x93 is the correct Windows-1252 code value of this character.
      • UltraEdit v24.00.0.43 to v24.20.0.62 wrongly display 0x1c in Character Properties for this character.
        This is just the low byte of the Unicode code value of this character.
      • Older UltraEdit for Windows versions up to v23.20.0.43 correctly display 0x93 in Character Properties for “ stored in a Windows-1252 encoded file.
      In my opinion it is wrong that Character Properties displays the decimal and hexadecimal Unicode code values of the character as held in memory instead of the code values of this character as stored in the Windows-1252 encoded file. I expect to see the code values of the Windows-1252 encoded character even if the character is currently held in memory as a Unicode character.
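The three values above can be reproduced outside UltraEdit. This is only a small Python sketch of the encoding arithmetic, not of anything UltraEdit does internally; the variable names are made up for illustration.

```python
# The left double quotation mark discussed in this thread.
ch = "\u201c"

unicode_value = ord(ch)            # code point of the character as Unicode
ansi_bytes = ch.encode("cp1252")   # bytes as stored in a Windows-1252 file
low_byte = unicode_value & 0xFF    # low byte of the Unicode code value

print(hex(unicode_value))  # 0x201c
print(ansi_bytes.hex())    # 93
print(hex(low_byte))       # 0x1c
```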

      Running a Perl regular expression Find or Replace on the current file with \x93 as search string works for UE < 24.00, but does not find the Windows-1252 encoded character “ with UE v24.00.0.43 to UE v25.00.0.58. With those fully Unicode-aware versions of UltraEdit it is necessary to search with \x{201c} to find/replace this character in a Windows-1252 encoded file using Find or Replace. That is in my opinion not correct, as I would expect to search for a single-byte encoded character with the code value as stored in the file and not as held in memory.
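The same mismatch between "value as stored" and "value as held in memory" can be shown with Python's `re` module, which is used here only as a stand-in for the Perl regex engine. Note that Python writes the Unicode escape as `\u201c` rather than `\x{201c}`, and the sample sentence is invented.

```python
import re

raw = b"He said \x93hello\x94."   # bytes as stored in a Windows-1252 file
text = raw.decode("cp1252")       # the same content as Unicode text in memory

byte_hit = re.search(rb"\x93", raw)         # ANSI value against the stored bytes
text_hit_ansi = re.search(r"\x93", text)    # ANSI value against the Unicode text
text_hit_uni = re.search(r"\u201c", text)   # Unicode value against the Unicode text

print(byte_hit is not None)       # True
print(text_hit_ansi is not None)  # False
print(text_hit_uni is not None)   # True
```

Searching the decoded text with the ANSI value fails because the text now contains U+201C, not U+0093.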

      There is one more issue. A Perl regular expression Find in Files or Replace in Files with search string \x93 and option In set to Open files finds the character in an opened Windows-1252 encoded file when using UltraEdit for Windows < v25.00. Yes, in UE v24.xx a Perl regular expression Find/Replace in Files with a hexadecimal search in open files works differently from Find/Replace with the same options. But with the publicly released builds 25.00.0.53 and 25.00.0.58 of UE version 25.00 it is necessary to use the Perl regular expression search string \x{201c} in Find in Files or Replace in Files with option In set to Open files. (I have not run Find/Replace in Files on Windows-1252 encoded files not opened in UltraEdit.)

      All these issues are extremely confusing for users, myself included. I expect Character Properties to display the code value of a non-Unicode encoded character as stored in the file and not as held in memory. And I expect to run hexadecimal Perl regular expression finds and replaces with the code values of the characters as stored in the file and not as held in memory.

      My suggestion is to use \x{201c} to find “, \x{201d} to find ” and \x{201e} to find „ in your macros for the moment, or simply to use the characters themselves in the macros, as macros have also been fully Unicode-aware since UE v24.00.
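If many macros contain such searches, the rewrite from `\xNN` to `\x{NNNN}` can be done mechanically. Below is a rough Python sketch under some assumptions: the function name `ansi_to_unicode_escapes` is hypothetical, only two-digit escapes in the range 0x80 to 0x9F are touched (all other single-byte values have the same code value in Windows-1252 and Unicode), and a literal backslash in front of `x` is not handled.

```python
import re

# Matches a two-digit \xNN escape; an escaped backslash before "x" is not handled.
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def ansi_to_unicode_escapes(pattern: str) -> str:
    """Rewrite \\xNN escapes in the range 0x80-0x9F to the \\x{NNNN} form."""

    def convert(match):
        value = int(match.group(1), 16)
        if not 0x80 <= value <= 0x9F:
            return match.group(0)  # same code value in Windows-1252 and Unicode
        try:
            ch = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            return match.group(0)  # byte has no character in Python's cp1252 codec
        return "\\x{%04x}" % ord(ch)

    return _HEX_ESCAPE.sub(convert, pattern)


print(ansi_to_unicode_escapes(r"\x93|\x94|\x84"))  # \x{201c}|\x{201d}|\x{201e}
```

The converted string would then be pasted into the macro in place of the old search string.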

      I have just reported all these issues by email to IDM support.
      Best regards from an UC/UE/UES for Windows user from Austria

      Basic User

        Apr 02, 2018#3

        Thanks very much for your detailed answer and very useful workarounds. I thought it might have something to do with the new character handling that UE does, but I don't have the expertise to analyze it further. I will make your suggested changes to my macros and then keep an eye on what UE does going forward. :)

        Grand Master

          Jul 04, 2018#4

          This issue is fixed with UltraEdit for Windows v25.10.0.50 and with UEStudio v18.10.0.08.

          In an opened ANSI encoded file it is possible now to search for either ANSI or Unicode hexadecimal code value using a Perl regular expression search, i.e. the left double quotation mark stored in a Windows-1252 encoded file opened in UltraEdit or UEStudio can be found with either Perl regular expression search string \x93 or \x{201c}.

          The same character in a Unicode file (UTF-8, UTF-16 LE or UTF-16 BE) can be found only by searching for its Unicode hexadecimal code value. It is not possible to search in an opened Unicode encoded file for the byte stream of the character.

          It is still possible with Find/Replace in Files to search for the byte stream of a character in UTF-8, UTF-16 LE or UTF-16 BE encoded files, i.e. searching for \xe2\x80\x9c (UTF-8), \x1c\x20 (UTF-16 LE) or \x20\x1c (UTF-16 BE).
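The byte streams quoted above can be double-checked with a few lines of Python. This is just a sketch of the encodings themselves and says nothing about how UltraEdit performs the search; the dictionary name `streams` is made up.

```python
ch = "\u201c"  # left double quotation mark

# Encode the character with each encoding mentioned in this thread.
streams = {
    name: ch.encode(name)
    for name in ("cp1252", "utf-8", "utf-16-le", "utf-16-be")
}

for name, data in streams.items():
    print(name, data.hex())
# cp1252 93
# utf-8 e2809c
# utf-16-le 1c20
# utf-16-be 201c
```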