UltraEdit is getting very slow on loading HTML file with binary files embedded base64 encoded and word wrap enabled

912
Advanced User
912

    Mar 06, 2018#1

    Hi.

    I know there is already another thread, "UltraEdit is getting very slow on startup", about something similar. I browsed it but found nothing that fixes my problem. I use an Intel i7-4790 CPU at 3.60 GHz with 8 GB RAM, running Windows 7 SP1.

    When I load certain HTML files, even with an instance of UltraEdit 24.10.0.24 already open, it takes 20 to 30 seconds before the program responds or shows anything on screen.

    I have attached a sample file that shows the problem. It is 556 KB, not a big one.
    That raises another question: the file is 569,821 bytes, but UltraEdit shows 1,138,682 bytes in the status bar.

    Why does it show almost double?

    I noticed that my sample file has one very long line, line # 44.
    Every single move inside that file is extremely slow, even more so around that line.
    And that line contains something that is new compared to the previous version of Firefox, from which I saved the file.
    I mean, previous Firefox versions did not save HTML files with the tags and data I now have on line # 44.

    This raises more questions:
    1. Are those tags really necessary?
    2. What do tags like this mean?
      src:url(data:application/font-woff;charset=utf-8;base64, (and a very, very long string here...)
    I'm considering stripping out that entire line from the files I save in future.

    And why does UltraEdit take so long to display and edit this kind of file?

    P.S.:
    Trying to work around this, I found some partial fixes:
    1. If I rename the sample file to sample.txt, the time to open it decreases to 2 seconds.
    2. If I uncheck Default word wrap on for each file at Configuration - Editor - Word wrap, the problem is also gone and the file opens within 2 seconds (but I always use and prefer to keep word wrap ON for each file).
    3. But if I select No highlighting in the status bar after loading the file, the problem remains, because syntax highlighting switches back to HTML with every move I make.
    Thanks.
    Sample.zip (344.5 KiB)

    6,602548
    Grand Master
    6,602548

      Mar 06, 2018#2

      I am also using 64-bit Windows 7 Enterprise with SP1 and all Windows updates installed, on a machine less powerful than yours. But I am already using 32-bit UltraEdit for Windows v24.20.0.62.

      None of the reported issues can be reproduced with your sample file on my machine with the currently latest public version 24.20.0.62 of UltraEdit. Starting UltraEdit, loading this file and getting it displayed syntax highlighted with default configuration takes about two seconds. Scrolling works, but is not smooth with syntax highlighting enabled. Syntax highlighting can be disabled in the status bar by selecting No highlighting. It then stays off for this file in this editing session, also while scrolling, which makes scrolling much faster.

      The large lines, which are external files (background image file, font files, ...) embedded as ASCII text using Base64 encoding, are no problem with the default configuration of UE v24.20.0.62.
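For readers wondering what a string like src:url(data:application/font-woff;charset=utf-8;base64,...) contains: it is a data: URI, which carries a media type, optional parameters, and the file content itself. The following is only a minimal Python sketch of how such a string can be taken apart. The function name `parse_data_uri` and the tiny payload are made up for illustration and are not taken from the sample file.

```python
import base64
from urllib.parse import unquote_to_bytes


def parse_data_uri(uri):
    """Split a data: URI into (media_type, params, payload_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, comma, payload = uri[len("data:"):].partition(",")
    if not comma:
        raise ValueError("data: URI has no comma separating header and payload")
    parts = header.split(";")
    media_type = parts[0] or "text/plain"
    params = parts[1:]
    if "base64" in params:
        # The payload is Base64 text that stands for raw binary bytes.
        params.remove("base64")
        data = base64.b64decode(payload)
    else:
        # Without the base64 flag the payload is percent-encoded text.
        data = unquote_to_bytes(payload)
    return media_type, params, data


# Hypothetical, very short payload; a real embedded font is far longer.
uri = "data:application/font-woff;charset=utf-8;base64,d09GRgAB"
media_type, params, data = parse_data_uri(uri)
print(media_type, params, len(data))
```

In the sample file the part after the comma is what makes the line so long, because the whole font file is stored there as text.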

      Next I restored 32-bit UE v24.10.0.24 from my personal archives. Starting this UltraEdit version with its default configuration, loading the file and getting it displayed again took about two seconds. Scrolling speed was like with v24.20.0.62.

      But with UE v24.10.0.24 I could reproduce the issue that selecting No highlighting has no effect. The file is still syntax highlighted, but with some syntax highlighting based features like code folding disabled. According to my tests with your sample file, this bug was fixed in the next public release, version 24.10.0.32, which I also restored from my archives. Contact IDM support by email and ask for the latest UE v24.10 hotfix version 24.10.0.35 in language X if your license does not permit an upgrade to UE v24.20.0.62.

      In all versions of UltraEdit, scrolling is much faster with the document map closed, even with syntax highlighting enabled.

      The file size indicated in the status bar is the size of the temporary file used, which is UTF-16 LE encoded with a fixed two bytes per character. UTF-8 encoding stores characters with one to four (six) bytes per character depending on the character, which is bad for holding them in memory. However, UltraEdit v25.00, most likely being released this month, will support UTF-8 encoded files without converting them to UTF-16 LE. That makes it possible to view and edit large and huge UTF-8 encoded files of several hundred MiB or even some GiB without a temporary file and without the conversion of the original file to UTF-16 LE that is done now in this special use case.
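The size relation between the two encodings can be checked outside UltraEdit. The following minimal Python sketch (the function name `encoded_sizes` is made up for illustration) compares the byte count of the same text in UTF-8 and in UTF-16 LE. For pure ASCII text the UTF-16 LE size is exactly double. A character that needs two or three bytes in UTF-8 still takes two bytes in UTF-16 LE, which is one possible explanation for a status bar value slightly below double the size on disk.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16le_bytes) for the given text."""
    # "utf-16-le" writes no byte order mark, so only the characters are counted.
    # Characters outside the Basic Multilingual Plane would take four bytes here.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


ascii_only = "A" * 1000
print(encoded_sizes(ascii_only))  # (1000, 2000)

# Ten accented letters (two bytes each in UTF-8) plus ten ASCII letters.
mixed = "\u00e9" * 10 + "A" * 10
print(encoded_sizes(mixed))  # (30, 40)
```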

      Conclusion: The cause of the long loading time must be searched for in your configuration.

      What do you have configured at Advanced - Settings or Configuration - Editor display - Miscellaneous for setting Maximum columns before line wraps?

      The default is 4096. This value determines the maximum width of the image of the document window created in the background, of which just a part is displayed depending on document window width/height and horizontal/vertical position. The lines in the file are not really wrapped after the specified number of columns (= characters without tabs in file). Perhaps decreasing this value to 1024 or 512 helps in your case.

      Which font do you have set? Is it a proportional font or a non-proportional (fixed width) font?

      The default font is Consolas with font size 10.
      Best regards from an UC/UE/UES for Windows user from Austria

      912
      Advanced User
      912

        Mar 06, 2018#3

        Mofi wrote: None of the reported issues can be reproduced with your sample file on my machine with the currently latest public version 24.20.0.62 of UltraEdit. Starting UltraEdit, loading this file and getting it displayed syntax highlighted with default configuration takes about two seconds. Scrolling works, but is not smooth with syntax highlighting enabled. Syntax highlighting can be disabled in the status bar by selecting No highlighting. It then stays off for this file in this editing session, also while scrolling, which makes scrolling much faster.
        I forgot to say that I usually open files in UltraEdit via the right-click context menu. But I don't think that's the cause of the long delay in opening the file.
        Mofi wrote: The large lines, which are external files (background image file, font files, ...) embedded as ASCII text using Base64 encoding, are no problem with the default configuration of UE v24.20.0.62.
        Understood. I mentioned it because I could not see any sense in such a long line and hoped for a clue from you.
        Mofi wrote: But with UE v24.10.0.24 I could reproduce the issue that selecting No highlighting has no effect. The file is still syntax highlighted, but with some syntax highlighting based features like code folding disabled. According to my tests with your sample file, this bug was fixed in the next public release, version 24.10.0.32, which I also restored from my archives. Contact IDM support by email and ask for the latest UE v24.10 hotfix version 24.10.0.35 in language X if your license does not permit an upgrade to UE v24.20.0.62.
        Glad to know that it's already detected and fixed.
        Mofi wrote: The file size indicated in the status bar is the size of the temporary file used, which is UTF-16 LE encoded with a fixed two bytes per character. UTF-8 encoding stores characters with one to four (six) bytes per character depending on the character, which is bad for holding them in memory. However, UltraEdit v25.00, most likely being released this month, will support UTF-8 encoded files without converting them to UTF-16 LE. That makes it possible to view and edit large and huge UTF-8 encoded files of several hundred MiB or even some GiB without a temporary file and without the conversion of the original file to UTF-16 LE that is done now in this special use case.
        Fully explained.
        Mofi wrote: Conclusion: The cause of the long loading time must be searched for in your configuration.
        I have some ideas about the cause of the long delay:
        1. I always keep the document map open.
        2. I always keep word wrap ON.
        Mofi wrote: What do you have configured at Advanced - Settings or Configuration - Editor display - Miscellaneous for setting Maximum columns before line wraps?
        4096, the default value.
        Mofi wrote: Which font do you have set? Is it a proportional font or a non-proportional (fixed width) font?
        I use the fixed-width font Courier New.

        So, when I close the document map, set word wrap to OFF and reduce Maximum columns before line wraps to 1024, the delay no longer occurs.

        But interestingly, if I keep the document map OPEN, word wrap ON and Maximum columns before line wraps at 4096, and rename the file to sample.txt, the delay does not occur either.

        When the extension is HTM or HTML, UltraEdit applies syntax highlighting, and I guess its combination with the document map, word wrap and other variables causes the delay.

        6,602548
        Grand Master
        6,602548

          Mar 08, 2018#4

          Yes, I can reproduce the slow display of the file with Wrap at window edge or any other word wrap mode enabled on opening the HTML file with a very long "line", caused by a binary file being embedded in Base64 encoding without any line breaks (which would be possible, but Firefox doesn't insert them), with any version of UltraEdit including the currently latest. I have a speculation about the reason in this very special case.
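To check in advance whether a saved HTML file contains such a very long "line", the line lengths can be measured before the file is opened in the editor. This is only a minimal Python sketch; the function name `longest_lines` is made up for illustration.

```python
def longest_lines(path, top=3):
    """Return (length_in_bytes, line_number) pairs, longest line first."""
    lengths = []
    # Binary mode avoids any assumption about the text encoding of the file.
    with open(path, "rb") as handle:
        for number, line in enumerate(handle, start=1):
            lengths.append((len(line.rstrip(b"\r\n")), number))
    return sorted(lengths, reverse=True)[:top]
```

Calling `longest_lines("sample.htm")` on a file like the attached sample should list the line with the embedded Base64 data first, provided the file is in the current directory.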

          912
          Advanced User
          912

            Mar 08, 2018#5

            Glad to know that the problem was identified.

            To help you better reproduce and understand the delay, here are the settings I use:
            1. Check box "Default word wrap on for each file" is checked
            2. Word wrap method is "Wrap after column # 160"
            3. Document map is open
            4. Maximum columns before line wraps is 4096
            If I uncheck "Default word wrap on for each file", I get almost no delay.
            Even if I then turn word wrap ON and OFF after loading the file, I can scroll the text without delay, which is not the case if "Default word wrap on for each file" is checked.

            Thanks.

              Mar 13, 2018#6

              I'm looking for a way to remove all long strings of embedded Base64-encoded binary files from my HTML files.
              I suspect that they come from a Greasemonkey script called "TopAndDownButtonsEverywhere" and that the "background" entries are its Up and Down arrows.
              But I'm not sure, because I still don't know how to view the PNG background embedded in the HTML code.

              Since I have hundreds of HTML files to remove such strings from, I'm asking for a way to automate that.
              I know that I could use the Search - Find/Replace In Files command, but because the string is very long, I guess it won't fit in the Find field.
              The strings are identical in each file, which should make the task easier.

              Trying to better understand what those strings mean, I found that they contain two small PNG pictures, arrows UP and DOWN:
              UpArrow.png (233 Bytes)
              DownArrow.png (250 Bytes)
              and some WOFF font family data.

              There are some online converters.
              If it's not forbidden, I can post links here.

              My goal is still to remove all those big strings from my hundreds of HTML files,
              preferably with some automated action.

              6,602548
              Grand Master
              6,602548

                Mar 13, 2018#7

                Perhaps you should read about Base64 encoding to learn how this encoding works and where it is used. It is widely used for emails containing a file attachment because the email system was designed for transmitting text and not for transmitting binary data.

                UltraEdit has built-in functions to Base64 encode text and to decode Base64 encoded text. But it is also possible to decode Base64 encoded binary data with UltraEdit by following the instructions posted by me at Decode Base64 data into a binary file. So you can decode the data in your HTML files if you want the PNGs and fonts as binary files.

                It is possible to insert a line ending into Base64 encoded data after every 76 characters by running three case-sensitive Perl regular expression replaces:

                Find what: ;base64,(?![\r\n])\K
                Replace with: \r\n

                Find what: [0-9A-Za-z+/]{76}(?![\r\n])\K
                Replace with: \r\n

                Find what: [0-9A-Za-z+/]{76}[\r\n]+[0-9A-Za-z+/]*=*\)\K
                Replace with: \r\n

                The Perl regular expression search strings do not depend on the line ending type. The three replace strings insert DOS/Windows line endings, as used in your attached sample.htm.

                The first replace inserts a line ending after the string ;base64, if there is not already a line ending after this string.

                The second replace inserts a line ending after exactly 76 Base64 characters.

                The third replace inserts a line ending after the closing parenthesis of Base64 encoded data with at least 76 Base64 characters.
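For comparison outside UltraEdit, the same idea can be sketched in Python. This is not a one-to-one port of the three replaces above. Python's built-in re module does not support \K, so the sketch uses capture groups and a replacement function instead, and it leaves the closing parenthesis attached to the last chunk rather than adding a line ending after it. The names `BASE64_RUN` and `wrap_base64` are made up for illustration.

```python
import re

# ";base64," followed by one unbroken run of Base64 characters and padding.
BASE64_RUN = re.compile(r"(;base64,)([0-9A-Za-z+/]+=*)")


def wrap_base64(text, width=76, eol="\r\n"):
    """Break every unwrapped Base64 run after ';base64,' into short lines."""
    def rewrap(match):
        payload = match.group(2)
        chunks = [payload[i:i + width] for i in range(0, len(payload), width)]
        # Line ending after ";base64," and between the chunks.
        return match.group(1) + eol + eol.join(chunks)

    return BASE64_RUN.sub(rewrap, text)
```

The width of 76 matches the value used in the replaces above. Removing the inserted line endings again restores the original text.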

                It is possible to remove all Base64 encoded data by running a Perl regular expression replace with the search string ;base64,\K[0-9A-Za-z+/\r\n]+=* and an empty replace string multiple times until nothing is replaced anymore.
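The removal can also be sketched in Python for readers who want to try it outside UltraEdit. Since Python's built-in re module has no \K, the sketch uses a fixed-width lookbehind for ;base64, instead. The names `BASE64_PAYLOAD` and `strip_base64` are made up for illustration. In this sketch a single re.sub call replaces every match, which says nothing about how many passes UltraEdit needs.

```python
import re

# Base64 characters, line breaks and trailing padding directly after ";base64,".
BASE64_PAYLOAD = re.compile(r"(?<=;base64,)[0-9A-Za-z+/\r\n]+=*")


def strip_base64(text):
    """Remove the Base64 payloads but keep the surrounding data: URI skeleton."""
    return BASE64_PAYLOAD.sub("", text)


# Hypothetical short example; real payloads are far longer.
css = 'a{src:url(data:application/font-woff;charset=utf-8;base64,d09G\r\nRgAB==) format("woff")}'
print(strip_base64(css))
```

Only the payload is removed here, so the line with the now empty url(...) stays in the file.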

                912
                Advanced User
                912

                  Mar 13, 2018#8

                  Mofi wrote: Perhaps you should read about Base64 encoding to learn how this encoding works and where it is used. It is widely used for emails containing a file attachment because the email system was designed for transmitting text and not for transmitting binary data.
                  You are right.
                  I'll do that.
                  Mofi wrote: UltraEdit has built-in functions to Base64 encode text and to decode Base64 encoded text. But it is also possible to decode Base64 encoded binary data with UltraEdit by following the instructions posted by me at Decode Base64 data into a binary file. So you can decode the data in your HTML files if you want the PNGs and fonts as binary files.
                  I could find Base64 decode at Advanced - Base64 in ribbon mode, but didn't find it when searching the traditional menus.

                  Also, my version 24.10.0.24 has no Advanced - Settings/Configuration - Editor - Advanced -> Allow editing of text files with hex 00's without converting them to spaces. It has Allow low ASCII values to be entered (usually control codes) instead. I'm not sure whether that means something similar or the same.
                  Mofi wrote: It is possible to insert a line ending into Base64 encoded data after every 76 characters by running three case-sensitive Perl regular expression replaces:

                  ...

                  Mofi wrote: It is possible to remove all Base64 encoded data by running a Perl regular expression replace with the search string ;base64,\K[0-9A-Za-z+/\r\n]+=* and an empty replace string multiple times until nothing is replaced anymore.
                  Great!
                  It works!

                  Many thanks.

                  🙂

                  6,602548
                  Grand Master
                  6,602548

                    Mar 13, 2018#9

                    Sorry, the referenced post was not up to date for UltraEdit for Windows v24.00 and later versions. I have updated that post.

                    912
                    Advanced User
                    912

                      Mar 13, 2018#10

                      I think I may be missing something, because when I decode from Base64 with UltraEdit I get invalid WOFF font files.
                      Sending the same string to an online converter, I get almost the same file, but without the space characters, and it is a valid WOFF font file.

                      Here is a comparison:
                      Base64Decoded.png (33.81 KiB)

                      And the files involved:
                      Decoded.zip (43.64 KiB)

                      Also, I'd like to know where to find the Decode base64 menu command (or button) in the traditional menus.
                      I can find its button only in ribbon mode, on the Advanced tab.

                      6,602548
                      Grand Master
                      6,602548

                        Mar 14, 2018#11

                        I have absolutely no problem decoding the data of a font file correctly, neither with UE v25.00.0.53 nor with v24.10.0.24. I copy the (not wrapped) data into a new ASCII file (code page Windows-1252), select the pasted encoded data with Ctrl+A, execute the command Decode base64 from menu Edit in traditional menus mode, and save the file using key F12 with Default selected for Encoding in the Save As dialog. The created font file contains the null bytes and is valid. It is binary identical to the file Base64Decoded.woff in your ZIP file Decoded.zip. I used both UltraEdit versions with default settings for this test.
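To cross-check a decoded font outside the editor, the Base64 text can be decoded with a small script and the result compared byte by byte with another decoded copy. This is only a minimal Python sketch. All function names are made up for illustration, and the signature check only looks at the first four bytes ("wOFF" for a WOFF 1.0 file), so it is a plausibility test and not a full validation of the font.

```python
import base64

WOFF_SIGNATURE = b"wOFF"  # first four bytes of a WOFF 1.0 file


def decode_base64_to_file(encoded, out_path):
    """Decode Base64 text and write the raw bytes unchanged to out_path."""
    # b64decode ignores line breaks by default, so wrapped data works as well.
    data = base64.b64decode(encoded)
    with open(out_path, "wb") as handle:  # binary mode keeps null bytes intact
        handle.write(data)
    return data


def looks_like_woff(data):
    """Check only the file signature, nothing else."""
    return data[:4] == WOFF_SIGNATURE


def compare_decodes(reference, suspect):
    """Return (nul_became_space, other_differences) for two equally long files."""
    if len(reference) != len(suspect):
        raise ValueError("files differ in length")
    nul_became_space = 0
    other = 0
    for ref_byte, sus_byte in zip(reference, suspect):
        if ref_byte == sus_byte:
            continue
        if ref_byte == 0x00 and sus_byte == 0x20:
            nul_became_space += 1
        else:
            other += 1
    return nul_became_space, other
```

If `compare_decodes` reports only differences of the first kind, the two files differ solely in null bytes that were turned into spaces.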

                        912
                        Advanced User
                        912

                          Mar 14, 2018#12

                          Oh, my!
                          Now I can see Decode base64 in the Edit menu.

                          As I said, I'm really missing something.
                          I opened this file:
                          Open Sans-SemiBold_1.zip (22.1 KiB)

                          (using your words)
                          "copying (not wrapped) data into a new ASCII file (code page Windows-1252), selecting the pasted encoded data with Ctrl+A, executing Decode base64, and saving the file using key F12 with Default being selected for Encoding in Save As dialog." The created font file contains space bytes instead of null bytes and is invalid.
                          I got the binary file in Decoded.zip from a conversion site.

                          Anyway, thank you, Mofi, for your efforts to understand and give me some guidance along this topic.

                            Mar 14, 2018#13

                            Still working through many HTML files to strip the Base64 lines out, I wrote a simple macro to make my task easier.

                            I don't want to use Mofi's regular expressions because they need to be applied more than once per file, and I just want to delete the entire line containing those long strings.

                            Searching the forum, I found a macro from Mofi that saves a file under a new name (backup file 'bak') and merged it with my purposes:


                            InsertMode
                            ColumnModeOff
                            HexOff
                            Find "data:application/font-woff;charset=utf-8;base64"
                            IfFound
                            DeleteLine
                            Clipboard 8
                            CopyFilePath
                            Clipboard 9
                            SelectAll
                            Copy
                            Top
                            CloseFile NoSave
                            NewFile
                            Paste
                            ClearClipboard
                            Clipboard 8
                            SaveAs "^c.bak"
                            CloseFile NoSave
                            ClearClipboard
                            Clipboard 0
                            EndIf
                            IfNotFound
                            GotoLine 10 1
                            EndIf
                            
                            It's working very well, but if someone wants to improve it, feel free to do so.
                            I added the 'GotoLine 10' command because I had no idea what to do when the string is not found.

                              Mar 15, 2018#14

                              After some tests, I realized that my previous macro needs an extra line.
                              Because the new file created is ASCII, I must convert it to UTF-8 before saving it.
                              So I put ASCIIToUTF8 before the SaveAs command, and things are going fine now.
                              To run the macro, I assigned a hotkey to it.
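For hundreds of files, the same idea can also be run as a batch job outside the editor. The following is only a minimal Python sketch and not a port of the macro. It removes every line containing the marker (the macro deletes the first one found), writes the result next to the original with ".bak" appended to the name as the macro's SaveAs does, and leaves the original untouched. It works on raw bytes, so the encoding of the file is not changed. The function names are made up for illustration.

```python
from pathlib import Path

# Marker taken from the macro's Find command; matched against raw bytes.
MARKER = b"data:application/font-woff;charset=utf-8;base64"


def write_stripped_copy(path, marker=MARKER):
    """Write '<name>.bak' without the marker lines; return its path or None."""
    source = Path(path)
    lines = source.read_bytes().splitlines(keepends=True)
    kept = [line for line in lines if marker not in line]
    if len(kept) == len(lines):
        return None  # marker not found, nothing is written
    target = source.with_name(source.name + ".bak")
    target.write_bytes(b"".join(kept))
    return target


def process_folder(folder):
    """Run write_stripped_copy on every .htm/.html file directly in folder."""
    candidates = sorted(
        entry for entry in Path(folder).iterdir()
        if entry.is_file() and entry.suffix.lower() in {".htm", ".html"}
    )
    results = []
    for html_file in candidates:
        copy = write_stripped_copy(html_file)
        if copy is not None:
            results.append(copy)
    return results
```

Calling `process_folder` with the folder that holds the saved pages returns the list of copies it created. Files without the marker are skipped.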