Find String List - line number incorrect on large Unicode file (fixed)

Newbie

    Feb 20, 2017 #1

    Does anyone else have issues with the "Find String List" and its Goto being off by a few or even many lines on large files? The file has more than 10K lines. The Find works as it should, but Goto in the list doesn't go to the correct line, and the line numbers are off after ~12K lines. It gets close, but does not land exactly on the line as it does with smaller files or files with fewer lines.

    Grand Master

      Feb 21, 2017 #2

      I opened an ISO-8859-1 encoded (ANSI) HTML file with about 22,930 DOS-terminated lines and extended it to 206,362 lines with a file size of 8,807,940 bytes. Then I executed a non-regular-expression Find from the top of the file with the option List lines containing string checked, and it found 4,383 lines, which were listed in the Find String List. The last found line was line 206,297. I double-clicked on the last found line and the caret was set to the correct position in the file. Positioning the caret on other lines listed in the Find String List also worked fine.

      I remember that other forum members reported an issue with wrong line counting when running a Find with the List lines containing string option on a mixed DOS/UNIX file, i.e. a file that contains carriage return + line-feed (DOS/Windows) as well as just line-feed (UNIX) as line terminators. In this case the file should be modified (add and remove a space) and saved, so that all lines get the same line termination when using the default configuration settings for DOS/Unix/Mac Handling.
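To check whether a file actually mixes line terminators before resaving it, a small script can count each kind. The following is a minimal sketch in Python, not an UltraEdit feature; the script name `check_eol.py` and the helpers `count_line_terminators` and `is_mixed` are hypothetical. It assumes you know the file's encoding (for example `cp1252` for an ANSI file, `utf-8` or `utf-16`) and reads the text with newline translation disabled so that CR+LF, LF and lone CR stay distinguishable.

```python
import sys


def count_line_terminators(text: str) -> dict:
    """Count DOS (CR+LF), UNIX (lone LF) and old Mac (lone CR) line terminators."""
    # Every CR+LF pair contains exactly one CR and one LF, so the lone
    # terminators are the totals minus the number of CR+LF pairs.
    dos = text.count("\r\n")
    return {
        "dos": dos,
        "unix": text.count("\n") - dos,
        "mac": text.count("\r") - dos,
    }


def is_mixed(counts: dict) -> bool:
    """True if more than one kind of line terminator occurs in the file."""
    return sum(1 for value in counts.values() if value > 0) > 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_eol.py FILE [ENCODING]
    path = sys.argv[1]
    encoding = sys.argv[2] if len(sys.argv) > 2 else "utf-8"
    # newline="" disables newline translation, so the raw terminators are preserved
    with open(path, encoding=encoding, newline="") as f:
        counts = count_line_terminators(f.read())
    print(counts, "mixed" if is_mixed(counts) else "consistent")
```

If the script reports "mixed", resaving the file as described above should give all lines the same terminator.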

      But if you are already using the currently latest UE v24.00.0.45 and can reproduce this issue with your file, it would be good to report it by email to IDM support (see top of this page). Attach your file compressed into a ZIP or RAR archive together with your configuration settings stored in %APPDATA%\IDMComp\UltraEdit\uedit*.ini. Don't forget to describe step by step how to reproduce the issue with your configuration and your file, i.e. where the caret is when running Find, which search string you use, which other find options are set, etc.
      Best regards from an UC/UE/UES for Windows user from Austria

      Master

        Feb 21, 2017 #3

        Hi,

        I had similar problems in files containing 1M lines (and 112K matches). This bug has been fixed as of UE 24.00.0.46.

        BR, Fleggy

        Newbie

          Feb 21, 2017 #4

          Thanks for the info. The first few lines work, as does the last line, but I did notice that this issue occurs with a UTF-8 file; if the file is converted to ANSI, the Find String List seems to work as it should.

          Update: This issue was escalated to support, and support was able to reproduce it with the UTF-8 files. Development is investigating. The issue exists in version 24.00.0.47.

          Master

            Feb 22, 2017 #5

            Interesting. I've just checked 24.00.0.48, and the FSL (Find String List) line numbers for UTF-8 files are not correct. On the other hand, FSL line numbers are OK for ANSI and UTF-16 files.

            EDIT: UE 24.00.0.49 is available for download. FSL is correct in ANSI, UTF-8 and UTF-16 files (at least for me).

            Grand Master

              Feb 23, 2017 #6

              I converted my ANSI encoded HTML file to UTF-8 and increased its size to 25,412,943 bytes with 595,354 lines, and it really contains UTF-8 encoded characters, not just ASCII characters.

              UE v24.00.0.49 still fails to count the lines correctly in this large UTF-8 encoded HTML file. UE v22.20.0.49 and v23.20.0.43 also record wrong line numbers when using the List lines containing string option. So this issue is not new in UE v24.00; it also exists in former versions.

              The Find String List contains the correct line numbers for all lines if I use the File Open command and select ASCII (ansi code page auto detection) for Encoding to open the UTF-8 encoded file as an ANSI file.

              What surprised me is that the line counting of UE v24.00.0.49 is also incorrect for a UTF-16 encoded file. I converted the large UTF-8 encoded HTML file to UTF-16 with BOM (50,821,588 bytes), closed the file, re-opened it with correct auto-detection of UTF-16 LE encoding, executed Find with the List lines containing string option, scrolled to the end of the list and double-clicked on found lines at the end of the list. The line numbers were not correct; only the line numbers at the beginning of the list were determined correctly by UltraEdit. UE v22.20.0.49 and v23.20.0.43 also failed to count the lines correctly in the large UTF-16 Little Endian encoded file.

              So there is still a problem with Find String List on large Unicode files. I reported this issue to IDM support by email.

              Master

                Feb 23, 2017 #7

                Hi Mofi,

                are you testing FSL with a Perl regular expression or with a normal search?
                I ask because I usually test FSL in large SQL files with this Perl regex (which can match across multiple lines):

                ((?<!\w)(DO|LOOP|BEGIN|END|AS)|;)((\s*(((?:--|//).*[\r\n])|(?>/\*(.|[\r\n])*?\*/)))*\s*)IF(?!\w)

                It populates the FSL correctly even in UTF-8 files with accented characters (line numbers are correct for UTF-8 since 24.00.0.49; ANSI and UTF-16 have been fixed since 24.00.0.46).

                Thanks, Fleggy

                Grand Master

                  Feb 24, 2017 #8

                  I created three large HTML files with identical contents, 21,705,438 bytes (ANSI), 21,707,196 bytes (UTF-8) and 43,410,870 bytes (UTF-16), each with 508,033 lines (650,138 bytes compressed with RAR5), and sent them to IDM support. In each of the 3 files I ran a non-regular-expression Find searching for the word summary. The line numbers of the found lines at the end of the Find String List are correct only in the ANSI encoded file. IDM support has already replied and confirmed the issue, since it is reproducible with the supplied files and my step-by-step instructions with UE v24.00.0.49 as well as with v23.20.0.43.
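To cross-check the Find String List against an independent count, the line numbers of all lines containing a search string can be listed outside the editor and compared with the FSL for the ANSI, UTF-8 and UTF-16 copies of the same file. A minimal Python sketch follows; the script name `find_lines.py` and the helpers `find_line_numbers` and `find_in_file` are hypothetical. It assumes that CR+LF, LF and a lone CR each end one line and that the match is a plain, case-sensitive substring search (UltraEdit's Match case setting may differ, hence the `ignore_case` flag). The encoding must be given explicitly, e.g. `cp1252` or `iso-8859-1` for ANSI, `utf-8-sig` for UTF-8 with or without BOM, `utf-16` for UTF-16 with BOM.

```python
import re
import sys

# CR+LF must be tried first so that it counts as one line break, not two
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def find_line_numbers(text: str, needle: str, ignore_case: bool = False) -> list:
    """Return the 1-based numbers of lines that contain needle (plain substring match)."""
    if ignore_case:
        needle = needle.casefold()
    hits = []
    for number, line in enumerate(_LINE_BREAK.split(text), start=1):
        haystack = line.casefold() if ignore_case else line
        if needle in haystack:
            hits.append(number)
    return hits


def find_in_file(path: str, needle: str, encoding: str, ignore_case: bool = False) -> list:
    # newline="" keeps the raw terminators so _LINE_BREAK sees CR+LF, LF and CR
    with open(path, encoding=encoding, newline="") as f:
        return find_line_numbers(f.read(), needle, ignore_case)


if __name__ == "__main__" and len(sys.argv) >= 4:
    # Hypothetical usage: python find_lines.py FILE WORD ENCODING
    for number in find_in_file(sys.argv[1], sys.argv[2], sys.argv[3]):
        print(number)
```

Because the three test files have identical contents, this should print the same numbers for the ANSI, UTF-8 and UTF-16 copies; where the FSL shows different numbers, the editor's line counting is off.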

                  Master

                    Feb 24, 2017 #9

                    Thanks for the detailed information.
                    Fleggy

                    Newbie

                      Feb 24, 2017 #10

                      All,

                      It appears that the most recent version has corrected this issue. The non-public user verification build 24.00.0.50 appears to show the correct line numbers in the FSL.

                      Grand Master

                        Feb 26, 2017 #11

                        Yes, that's right. The issue with incorrect line counting on large Unicode files is fixed in public hotfix build 24.00.0.53, as I was able to verify, too.