Is there a simple way to find/show non-ASCII characters in a generic text-based file without using regex?

    Oct 19, 2022#1

    Ref: A pre-existing topic that asks the question in the context of a search-and-replace using regex: How to find non-ASCII characters in a CSV file?

    My problem is a bit different. I periodically edit files that are plain-vanilla ASCII, like configuration files and such. Every so often, bit-rot, corruption, an unexpected exit, (or whatever), will cause the occasional non-ASCII character to appear and the results are often "application dependent" (😉) as they are usually not easy to find because they are, (usually), invisible.

    Question: (Or maybe this should be a "wouldn't it be nice if. . . ." type question?)
    Is there some pre-existing thing that can be checked, selected, or whatever, (like word-wrap, tabs-to-spaces, etc.) that will display some mark or something for characters that are not part of the "normal" ASCII character set that would be expected to be seen in a file?

    Or, maybe there's a "show control characters" mode? [BEL] [ESC] [NAK] and such like?

    I hate to sound like a whiny user that's never satisfied - that I'm not. UE is so very capable and powerful, that it's startling sometimes to NOT find something you expect.

    So, how do I play this? Is there a pre-existing feature that I forgot about or can't find in these later versions? Should I toss this up to IDM's tech-support people? Is there an existing forum, (that I didn't find - and yes, I looked), for enhancement requests?

      Oct 19, 2022#2

      Sorry if I look like an idiot for - sort-of - answering my own question, but. . . .

      I did find, (re-discover), the show spaces/tabs and line-endings settings. They are good because if something is NOT a space, tab, line-ending, or other useful character, it's automatically suspect.

      However, there are still issues with this:
      1. The marks, when they appear, are extremely difficult to see - they are something like a 10% grey - and for my older eyes, they might as well not be there. It would be quite useful if these markings could be made easier to see.
      2. Line endings can be [CR], [LF], or [CR][LF] and if they get mixed, that's also a problem.
      Please accept my apologies in advance and thanks for any help you can provide!
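
The mixed line-ending problem in point 2 can also be checked outside the editor. The following is a minimal sketch, not an UltraEdit feature: it counts [CR][LF], lone [LF], and lone [CR] terminators in raw bytes. The function names `count_line_endings` and `has_mixed_line_endings` are made up for this illustration.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone LF, and lone CR line terminators in raw bytes."""
    counts = {"CRLF": 0, "LF": 0, "CR": 0}
    i = 0
    while i < len(data):
        if data[i] == 0x0D:  # CR
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                counts["CRLF"] += 1
                i += 2  # skip the LF that belongs to this CRLF pair
                continue
            counts["CR"] += 1
        elif data[i] == 0x0A:  # LF not preceded by CR
            counts["LF"] += 1
        i += 1
    return counts


def has_mixed_line_endings(data: bytes) -> bool:
    """True if more than one kind of line terminator occurs in the data."""
    return sum(1 for n in count_line_endings(data).values() if n) > 1
```

To use it on a real file, read the file in binary mode (for example `open(path, "rb").read()`) so that Python does not translate the line endings before they are counted.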

        Oct 19, 2022#3

        Hi,

        do I understand you correctly that you need to highlight all unusual ASCII characters in the range 0x00-0x1F? Ideally like Notepad++?

        BR, Fleggy

          Oct 20, 2022#4

          Well. . . . I wasn't going to name them, (it might not be considered polite), but something like that would be useful.

          This also assumes that any (single byte) character code greater than 0x7F is a printable character - is that a safe assumption?
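
Whether a byte above 0x7F is printable depends on the file's encoding: in ISO-8859-1 the range 0x80-0x9F holds further control codes, and in UTF-8 such bytes are only fragments of multi-byte sequences. For a file that is supposed to be plain ASCII, it may be simpler to flag every such byte. The sketch below does that without regex, under the assumption that tab, LF, and CR are the only control characters expected in the file. The name `find_suspect_bytes` is hypothetical.

```python
import sys


def find_suspect_bytes(data: bytes):
    """Return (line, column, byte_value) for each byte outside plain ASCII text.

    Assumption: tab, LF and CR are the only control characters considered normal.
    Lines are counted on LF only, so files using lone CR count as a single line.
    """
    allowed_controls = {0x09, 0x0A, 0x0D}  # tab, LF, CR
    hits = []
    line, col = 1, 1
    for b in data:
        is_control = (b < 0x20 and b not in allowed_controls) or b == 0x7F
        if is_control or b > 0x7F:
            hits.append((line, col, b))
        if b == 0x0A:
            line += 1
            col = 1
        else:
            col += 1
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read in binary mode so no decoding hides or alters the suspect bytes.
    with open(sys.argv[1], "rb") as f:
        for line, col, value in find_suspect_bytes(f.read()):
            print(f"line {line}, column {col}: 0x{value:02X}")
```

Columns are counted in bytes, not in displayed characters, so a position after a tab or a multi-byte sequence may not match the column shown in an editor.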

          Thanks!

          Update:
          jharris1993 wrote: ↑
          Oct 19, 2022
          1. The marks, when they appear, are extremely difficult to see - they are something like a 10% grey - and for my older eyes, they might as well not be there.  It would be quite useful if these markings could be made easier to see.
          Found it.
          Capture1.JPG (51.07KiB)

          Becomes this. . . .
          Capture2.JPG (53.71KiB)

          Makes those pesky marks easier to see. . .

            Oct 20, 2022#5

            There is no built-in feature to display control characters usually not used in text files with a different text/background color or with a replacement character/string. But it is possible to use a syntax highlighting wordfile in which the line beginning with /Delimiters = specifies a list of control characters and non-ASCII characters, and which has a color group containing all these characters, each on a separate line. In the Manage Themes dialog window, a text color and, very important here, a background color can then be configured for the color group containing the control characters and non-ASCII characters.

            I used this technique in HTML files to highlight characters like the no-break space, the en-dash, or certain types of single and double quotes after pasting text from other files into the HTML file. I then usually run a macro to replace these characters with HTML entities or other characters. The syntax highlighting wordfile html.uew installed with UltraEdit for Windows v2022.1.0.112 contains, in color group 4 named Dashes and NBSP, an en-dash, an em-dash and a no-break space, which are also specified as word delimiters. A good text and background color for the HTML color group Dashes and NBSP makes these characters clearly visible.

            Enhancement requests must be sent by email to UltraEdit support or using the technical support form.
            Best regards from an UC/UE/UES for Windows user from Austria

              Oct 20, 2022#6

              Mofi wrote:There is no built-in feature to display control characters usually not used in text files with a different text/background color or with a replacement character/string. But it is possible to use a syntax highlighting wordfile [. . .]
              Thanks! That's a good idea and I may end up trying something like that.
              Mofi wrote:Enhancement requests must be sent by email to UltraEdit support or using the technical support form.
              I thought so, but I like to ask as things have changed much since UltraEdit v8 came out 😉

              Thanks!