How to find non-ASCII characters in a CSV file?

14
Basic UserBasic User
14

    Dec 16, 2020#1

    Suppose the file contains:

    3085.34 3987.19 2.06 4354.89 -369.76
    ●15.35 19.84 0.01 21.67 -1.84
    5010.51 194252.54 8966.93 0.00▪ 185285.61
    24.93 966.60 44.62 0.00 921.98
    33867.97 15405.43 1156.39 0.00 14249.04
    ▪168.53 ▪76.66 5.75 0.00 70.91
    4714.18 6092.15 3.14 6653.96 -564.95
    23.46 30.31 0.02 33.11 -2.82
    9548.94 370202.92 17089.01 ●0.00 353113.91
    I want to find all the non-ASCII characters like ● and ▪.

    What should be the Perl regular expression search string to find these characters?

    11327
    MasterMaster
    11327

      Dec 16, 2020#2

      Find what:

      [[:^ascii:]]
      It's impossible to lead us astray for we don't care even to choose the way.
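Outside UltraEdit, the same search can be sketched in Python. Python's `re` module does not support POSIX classes such as `[:ascii:]`, so this sketch assumes the explicit range `[^\x00-\x7F]` as an equivalent. The function name `find_non_ascii` and the shortened sample text are illustrative only.

```python
import re

# Assumption: "non-ASCII" means any code point above U+007F.
NON_ASCII = re.compile(r"[^\x00-\x7F]")

def find_non_ascii(text):
    """Return (line number, column, character, code point) per non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in NON_ASCII.finditer(line):
            ch = match.group()
            hits.append((line_no, match.start() + 1, ch, f"U+{ord(ch):04X}"))
    return hits

# First three lines of the data from the question.
sample = (
    "3085.34 3987.19 2.06 4354.89 -369.76\n"
    "●15.35 19.84 0.01 21.67 -1.84\n"
    "5010.51 194252.54 8966.93 0.00▪ 185285.61\n"
)

for hit in find_non_ascii(sample):
    print(hit)
```

Reporting the code point next to each character helps when the font renders the character like an ordinary hyphen.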

      14
      Basic UserBasic User
      14

        Dec 16, 2020#3

        Yes sir, it works, but it does not highlight all the matches at once. It highlights them one by one.

        11327
        MasterMaster
        11327

          Dec 16, 2020#4

          Yes, I have noticed this too. Could it be a bug in UE 27.10.0.164?

          You can use List lines containing string or Bookmark matching lines instead.
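
The effect of List lines containing string can be approximated outside the editor. Here is a minimal Python sketch, assuming the data is available as a string and using `[^\x00-\x7F]` in place of the POSIX class, which Python's `re` does not support. `lines_with_non_ascii` is a hypothetical helper name.

```python
import re

# Assumption: "non-ASCII" means any code point above U+007F.
NON_ASCII = re.compile(r"[^\x00-\x7F]")

def lines_with_non_ascii(text):
    """Return (line number, line) pairs for lines with at least one non-ASCII character."""
    return [
        (line_no, line)
        for line_no, line in enumerate(text.splitlines(), start=1)
        if NON_ASCII.search(line)
    ]

# Four lines taken from the data in the question.
sample = (
    "24.93 966.60 44.62 0.00 921.98\n"
    "▪168.53 ▪76.66 5.75 0.00 70.91\n"
    "4714.18 6092.15 3.14 6653.96 -564.95\n"
    "9548.94 370202.92 17089.01 ●0.00 353113.91\n"
)

for line_no, line in lines_with_non_ascii(sample):
    print(f"{line_no}: {line}")
```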

          6,684586
          Grand MasterGrand Master
          6,684586

            Dec 16, 2020#5

            Fonts without support for the Unicode characters BLACK CIRCLE (●, U+25CF) and BLACK SMALL SQUARE (▪, U+25AA) often display these two characters like a HYPHEN-MINUS. Looking at the data, that strongly suggests to me that every ● and ▪ should really be a -.

            So in this case, rather than searching for non-ASCII characters, it would be better to run a non-regular expression replace searching for ● and replacing all occurrences with -, followed by a second non-regular expression replace searching for ▪ and replacing all occurrences with - as well. Alternatively, a single Perl regular expression replace searching for [\x{25cf}\x{25aa}] and replacing all occurrences with - turns both characters into minus signs.

            After that, it would be a good idea to run a Perl regular expression Replace All searching for -(?=0\.00\>)|(?<=\<0\.00)- and replacing all matches with an empty string. This removes a minus sign directly to the left or right of a zero value (0.00).
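
The two replaces described in this post can be tried on the sample data in Python. This is only a sketch under stated assumptions: Python's `re` has no `\<` and `\>` word anchors, so `\b` is used as a stand-in, and `\u25cf` and `\u25aa` replace the `\x{...}` escapes. The function name `normalize_minus` is hypothetical.

```python
import re

# Step 1: BLACK CIRCLE and BLACK SMALL SQUARE become HYPHEN-MINUS.
MARKERS = re.compile(r"[\u25cf\u25aa]")

# Step 2: drop a minus sign directly before or after a standalone 0.00.
# Assumption: \b stands in for the \< and \> anchors of the original expression.
MINUS_AT_ZERO = re.compile(r"-(?=0\.00\b)|(?<=\b0\.00)-")

def normalize_minus(text):
    """Apply both replaces in order and return the cleaned text."""
    text = MARKERS.sub("-", text)
    return MINUS_AT_ZERO.sub("", text)

# Four lines taken from the data in the question.
sample = (
    "●15.35 19.84 0.01 21.67 -1.84\n"
    "5010.51 194252.54 8966.93 0.00▪ 185285.61\n"
    "▪168.53 ▪76.66 5.75 0.00 70.91\n"
    "9548.94 370202.92 17089.01 ●0.00 353113.91\n"
)

print(normalize_minus(sample))
```

The order matters: the second expression only sees the minus signs next to 0.00 after the first replace has created them.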

              Dec 16, 2020#6

              Ovg, the search expression [[:^ascii:]] works to find non-ASCII characters, although this expression is not really correct. The correct syntax would be [^[:ascii:]], as can be seen, for example, on the Boost documentation page for Perl Regular Expression Syntax in the table of chapter "Single character" character classes. Boost is the library used by UltraEdit for Perl regular expression finds and replaces.

              I think there is no general issue with highlighting all items found in UE v27.10.0.164. A Perl regular expression find with the option Highlight all items found checked works fine on a Unicode file for the search strings [0-9]+, [äöüÄÖÜß]+ and [Ω⅓⅔√①②]+. Only a Perl regular expression find with [\x{25cf}\x{25aa}] or [^[:ascii:]] finds no matching string at all when Highlight all items found is checked. I cannot explain that.
              Best regards from an UC/UE/UES for Windows user from Austria

              11327
              MasterMaster
              11327

                Dec 16, 2020#7

                Thank you so much again, Mofi! :thumbsup:

                14
                Basic UserBasic User
                14

                  Dec 17, 2020#8

                  Thank you, Mofi sir, for clarifying.

                  6,684586
                  Grand MasterGrand Master
                  6,684586

                    Dec 17, 2020#9

                    I reported to IDM support by email, as quoted below, that no strings are found on a Unicode file when the option Highlight all items found is checked. The reply stated that the issue could be reproduced and was added to the issue database for investigation by an UltraEdit developer.

                    Forum member Ovg and I detected an issue with UE v27.10.0.164 while answering a forum question. In some cases, depending on the search string or search expression, no strings are found when the find option Highlight all items found is checked and the file is UTF-8 encoded, and in one case when the file is UTF-16 LE encoded.

                    Please open the small attached UTF-8 text file to reproduce this issue.

                    It is absolutely no problem to run a Perl regular expression Find with Alt+F3 from the top of the file with any of the following search strings, with none of the advanced Find options checked except Regular expressions with Perl selected.

                    [ÄÖÜäöü]+
                    [\x{2153}-\x{215E}]+
                    [\x{2460}-\x{2473}]+
                    [\x{25aa}\x{25cf}]+
                    [^[:ascii:]]+

                    All these Perl regular expression finds also work with either the option List lines containing string or the option Bookmark matching lines checked.

                    But the find behavior differs when the option Highlight all items found is checked. Only the first Perl search expression works in this case. The others do not find any string when Highlight all items found is enabled and the active file is UTF-8 encoded.

                    With the file UTF-16 LE encoded and Highlight all items found enabled, only the last Perl search expression does not work.

                    The UTF-8 encoded file without BOM attached to the report contains an empty line at the top and the following five lines:

                    ASCII characters: 0123 abcd 456 EFG 789
                    German umlauts: ÄÖÜ äöü
                    Fractions: ⅓⅔⅕ ⅖⅗⅘
                    Circled digits: ①②③ ④⑤⑥
                    Black square / circle: ▪●
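
To cross-check what each of the five expressions should match in this test file, here is a Python sketch. It assumes `\uXXXX` escapes in place of `\x{...}` and `[^\x00-\x7F]` in place of `[^[:ascii:]]`, since Python's `re` supports neither form. It only shows the expected matches and says nothing about UltraEdit's highlighting behavior.

```python
import re

# Content of the test file: an empty line followed by the five lines above.
test_file = (
    "\n"
    "ASCII characters: 0123 abcd 456 EFG 789\n"
    "German umlauts: ÄÖÜ äöü\n"
    "Fractions: ⅓⅔⅕ ⅖⅗⅘\n"
    "Circled digits: ①②③ ④⑤⑥\n"
    "Black square / circle: ▪●\n"
)

# Key: expression as typed in UltraEdit. Value: assumed Python equivalent.
patterns = {
    "[ÄÖÜäöü]+": r"[ÄÖÜäöü]+",
    r"[\x{2153}-\x{215E}]+": r"[\u2153-\u215E]+",
    r"[\x{2460}-\x{2473}]+": r"[\u2460-\u2473]+",
    r"[\x{25aa}\x{25cf}]+": r"[\u25aa\u25cf]+",
    "[^[:ascii:]]+": r"[^\x00-\x7F]+",
}

# Collect every match of every expression over the whole file content.
matches = {
    name: re.findall(python_pattern, test_file)
    for name, python_pattern in patterns.items()
}

for name, found in matches.items():
    print(f"{name}: {found}")
```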

                      Mar 24, 2022#10

                      The issue I reported to support, as described above, is fixed in UltraEdit for Windows v2022.0.0.70 and UEStudio v2022.0.0.70.