Which byte or character is found? The file has only 3 characters!

4
NewbieNewbie
4

    Nov 03, 2014#1



    The file contains only the string 'abc': 3 characters, no whitespace.
    But when I press Next, the caret jumps to the first line, first column, and no character is selected.
    I'm using UltraEdit 21.30.0.1016. Is that a bug?

    11327
    MasterMaster
    11327

      Nov 03, 2014#2

      I can't reproduce or confirm this with UE 21.30.0.1016.
      Capture.PNG (36.79KiB)
      It's impossible to lead us astray for we don't care even to choose the way.

      6,686585
      Grand MasterGrand Master
      6,686585

        Nov 03, 2014#3

        I suppose that your example file has a line termination at the end of the first line. As carriage return and line-feed are both NOT abc, UltraEdit finds the line termination character and selects it, which is displayed as a thin blue vertical line at the end of the first line and a blinking caret at the beginning of the second line.

        Turn on View - Show Line Endings to see the line endings visually, then run the find again and you will see more clearly that the line ending is selected.

        Well, according to the line number there is obviously no line termination. But perhaps there is a trailing space or tab character which is not displayed, although you wrote that there is no whitespace character. Turn on View - Show Spaces/Tabs to verify that.

        Or your file is UTF-8 encoded and contains a UTF-8 BOM which is not displayed, and your version of UltraEdit finds the UTF-8 BOM. UE v21.30.0.1016 ignores the UTF-8 BOM on this UltraEdit regular expression find.
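The three suspects above (line ending, trailing whitespace, hidden BOM) can be sketched outside UltraEdit. The exact search string used in the thread is not shown, so the Python pattern `[^abc]` below is only an assumed stand-in for "any character that is not a, b or c", and `first_non_abc` is a hypothetical helper name.

```python
import re

# Assumed stand-in for the search in the thread: any character other than a, b or c.
NOT_ABC = re.compile(r"[^abc]")


def first_non_abc(text: str):
    """Return (offset, character) of the first character that is not a, b or c, or None."""
    match = NOT_ABC.search(text)
    return (match.start(), match.group()) if match else None


samples = {
    "plain": "abc",
    "with CRLF": "abc\r\n",
    "trailing space": "abc ",
    "with BOM": "\ufeffabc",  # U+FEFF is the byte order mark once decoded
}

for label, text in samples.items():
    # ascii() keeps invisible characters readable in the output
    print(label, ascii(first_non_abc(text)))
```

Only the BOM case produces a hit at offset 0, which would fit a caret that jumps to the first line and first column with nothing visibly selected.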
        Best regards from an UC/UE/UES for Windows user from Austria

        4
        NewbieNewbie
        4

          Nov 03, 2014#4

          Ovg wrote: Can't reproduce and confirm. UE 21.30.0.1016
          Try enabling

          Editor > New File Creation > Create new files as UTF-8

          and see if you can reproduce it.

            Nov 03, 2014#5

            Mofi, when I save the file as UTF-8 NO BOM, the result remains the same.

            6,686585
            Grand MasterGrand Master
            6,686585

              Nov 03, 2014#6

              Yes, if a new file is created as a UTF-8 encoded file and not yet saved, UltraEdit v21.30.0.1016 finds the BOM, which is not displayed. After saving the new file and running the same find again, nothing is found anymore. The same issue exists with a new file created as a UTF-16 encoded file and not yet saved.

              I can also confirm that this issue remains if the UTF-8 encoded file containing just abc is saved without BOM. But reloading the file using File - Revert to Saved results in this case in the file being loaded as an ASCII/ANSI file, and therefore no character other than abc is found anymore.

              You can report this minor issue to IDM support if you want to get it fixed in a future version.

              4
              NewbieNewbie
              4

                Nov 03, 2014#7

                'Revert to Saved' just seems like a workaround, because I want UTF-8 files.

                And I also found a bug:
                1. Create a new file as UTF-8
                2. Just type the character 'a' (maybe more accurately: an ANSI character)
                3. Save as UTF-8 NO BOM, DOS
                4. Close this file and open it again: the encoding is back to 'ANSI'
                See if you can reproduce it.

                It seems like they don't handle multiple languages well.
                So how should I report this to them? By email? Or by posting in the forum?

                6,686585
                Grand MasterGrand Master
                6,686585

                  Nov 03, 2014#8

                  That a file containing just abc saved as UTF-8 without BOM is loaded as an ASCII/ANSI file is not a bug.

                  To be recognized as UTF-8, a file must have either a UTF-8 BOM or a UTF-8 encoded character with a code value greater than decimal 127 (within the first 64 KB on using UltraEdit for Windows < v24.10 or UEStudio < v17.10). A text file encoded in UTF-8 that contains no BOM and no character with a code value greater than decimal 127 is byte-for-byte identical to an ASCII file containing the same text. So no editor can determine that such a file should be interpreted as UTF-8 encoded.
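The detection rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not UltraEdit's actual implementation; `guess_encoding`, the returned labels, and the default `SCAN_LIMIT` are assumptions made for the example.

```python
import codecs

SCAN_LIMIT = 64 * 1024  # the 64 KB window mentioned above (assumed default)


def guess_encoding(data: bytes, scan_limit: int = SCAN_LIMIT) -> str:
    """Guess 'utf-8-bom', 'utf-8', 'ascii' or 'ansi' from the raw bytes of a file."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    window = data[:scan_limit]
    if all(byte < 0x80 for byte in window):
        # Pure 7-bit content: nothing distinguishes UTF-8 from ASCII here.
        return "ascii"
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte sequence cut off at the window end.
        decoder.decode(window, final=False)
    except UnicodeDecodeError:
        return "ansi"  # bytes above 127 that do not form valid UTF-8
    return "utf-8"


print(guess_encoding("abc".encode("utf-8")))           # same bytes as ASCII
print(guess_encoding(codecs.BOM_UTF8 + b"abc"))        # BOM present
print(guess_encoding("\u00e4bc".encode("utf-8")))      # valid multi-byte sequence
print(guess_encoding("\u00e4bc".encode("cp1252")))     # single byte 0xE4
```

A file whose first non-ASCII character lies beyond the scan window is classified as ASCII by this sketch, which mirrors the 64 KB limitation mentioned for the older versions.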

                  This is one reason why HTML and XHTML files encoded in UTF-8 must contain a character set declaration in the header to tell applications loading the file which encoding is used. XML files encoded in UTF-8 should likewise contain an appropriate encoding declaration in the header (first line).
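As a rough sketch of how a loader could use such a declaration, the following Python reads the declared encoding from the first bytes of a file. The helper name `declared_encoding`, the 1024-byte look-ahead, and the simplified regular expressions are assumptions for illustration; real HTML and XML parsers follow more detailed rules.

```python
import re

# Simplified patterns: an XML declaration at the very start, or an HTML meta charset.
XML_DECL = re.compile(rb'^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def declared_encoding(data: bytes):
    """Return the encoding named in the file header in lower case, or None."""
    head = data[:1024]  # assumed look-ahead; declarations sit near the top
    match = XML_DECL.search(head) or META_CHARSET.search(head)
    return match.group(1).decode("ascii").lower() if match else None


print(declared_encoding(b'<?xml version="1.0" encoding="UTF-8"?><root/>'))
print(declared_encoding(b'<html><head><meta charset="utf-8"></head></html>'))
print(declared_encoding(b"abc"))
```

The declaration itself consists of ASCII characters only, so it can be read before the encoding of the rest of the file is known.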

                  It is very important for all applications loading text files that they can determine the used encoding as soon as possible. This is especially true for UTF-8 encoded files, where often just a few characters differ from an ASCII/ANSI file, because it makes a huge difference whether the text is single byte encoded or Unicode encoded. Unicode encoded text requires internally a completely different character management than single byte encoded text. If the application internally uses only Unicode text, it must know how to convert the bytes in the text file correctly to Unicode characters on load. See power tip Unicode text and Unicode files in UltraEdit/UEStudio.

                  There is a special setting to load files containing only ASCII characters as UTF-8 encoded files even with NO BOM, no UTF-8 encoded character (within the first 64 KB on using UltraEdit for Windows < v24.10 or UEStudio < v17.10), and no charset or encoding declaration; see UTF-8 not recognized, largish file or Using UTF-8 with UltraEdit. But be aware when using this setting: opening an ANSI file that contains single byte encoded characters with a code value greater than decimal 127, without using File - Open with the option Open as ASCII, results in those ANSI characters being corrupted when the file is saved.
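Why an ANSI file gets damaged when it is treated as UTF-8 can be illustrated in Python. What exactly UltraEdit writes in this situation is not stated in the thread; the sketch below assumes that invalid bytes are replaced on load, which is Python's `errors="replace"` behaviour, and `roundtrip_as_utf8` is a hypothetical helper.

```python
def roundtrip_as_utf8(ansi_bytes: bytes) -> bytes:
    """Load bytes as if they were UTF-8 (replacing invalid bytes), then save as UTF-8."""
    text = ansi_bytes.decode("utf-8", errors="replace")
    return text.encode("utf-8")


# "Oesterreich" spelled with O-umlaut, stored in Windows-1252: the umlaut is the single byte 0xD6.
original = "\u00d6sterreich".encode("cp1252")
saved = roundtrip_as_utf8(original)

print(original)  # first byte is 0xD6
print(saved)     # 0xD6 has become the replacement character U+FFFD
print(roundtrip_as_utf8(b"abc") == b"abc")  # pure ASCII survives unchanged
```

The byte 0xD6 on its own is not a valid UTF-8 sequence, so the original character cannot be recovered from the saved file.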

                  In case you don't know the difference between ASCII and ANSI: ASCII defines only the characters with code value 0 to 127. The various ANSI standards define the characters with code value 0 to 255.
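A small Python sketch of that difference: the same byte above 127 is undefined in ASCII and means different characters in different ANSI code pages. Windows-1252 and Windows-1251 are chosen here only as two examples; `is_ascii` and `decode_with` are hypothetical helper names.

```python
RAW = bytes([0x61, 0xE4])  # 'a' followed by one byte above 127


def is_ascii(data: bytes) -> bool:
    """True if every byte is within the ASCII range 0 to 127."""
    return all(byte <= 127 for byte in data)


def decode_with(code_page: str) -> str:
    """Interpret RAW using the given single byte code page."""
    return RAW.decode(code_page)


print(is_ascii(RAW))                 # False: 0xE4 is outside ASCII
print(ascii(decode_with("cp1252")))  # Western European: a + a-umlaut (U+00E4)
print(ascii(decode_with("cp1251")))  # Cyrillic: a + small letter de (U+0434)
```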