How to run a Perl regular expression Replace All quickly on a huge file?

Newbie

    Feb 15, 2016 #1

    In the interest of not starting an unrelated thread: I work with huge files from time to time as well. Today I ran into a problem I did not know how to get around, and it forced me to use another editor to get the job done. That's a pity, because I otherwise love using UltraEdit.

    First, I am running version 22.20.0.49 on Windows 7, Home Edition.

    Second, I was doing a search and replace with a Perl regex. After proving that it actually worked on a couple of lines, I let it loose on the rest of the file. Over an hour later, I had to kill the process because it was taking too long. I tuned the regex a little, but then I discovered that the replace seemed to be saving the huge file after each and every line. I stepped through the replacements one at a time, and the "green bulb" changed after each one.

    I went into Configuration and deactivated everything I could find: backups, temporary files, auto-saving, and syntax highlighting. Still no luck; the same behavior occurred.

    Accordingly, after spending 2 hours trying to do this, I gave up and copied the file to another computer where I run an old copy of EditPad Pro.

    While it's too late for this job, I would really like to know what I can do to keep this from happening in the future.

    Thanks.

    Grand Master

      Feb 16, 2016 #2

      The power tip "Large file text editor" explains how to configure UltraEdit for efficiently editing very large files.

      But for running a Replace All on a file of hundreds of MB or even several GB, it is best not to open the file in UltraEdit at all.

      It is much better to use Search - Replace in Files for a replace task that reformats a very large file.

      In the Replace in Files window, select the Directory containing the large / huge file and specify the name of the file (with or without wildcards) for In files/types. Alternatively, leave the Directory field empty and enter or copy & paste the file name with full path into the In files/types field.

      The advantages of using Replace in Files on a file that is not open, compared to a Replace All on an open file:
      • No changes need to be recorded for the undo feature.
      • No line number parsing needs to be done.
      • No changed lines need to be recorded for the line change indication feature.
      • No file contents need to be loaded for displaying and updating the display.
      • No syntax highlighting needs to be applied.
      • No update of the internal database for the code folding feature needs to be done.
      • No update of the internal database for the function list feature needs to be done.
      • Some more tasks that are required when running a Replace All on an open and displayed file are not needed at all.
      In other words, with Replace in Files only the replace itself is processed on the file, by a thread which just counts how many replacements are made and does nothing else.
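
For readers who want to see the general idea outside of UltraEdit, here is a minimal sketch in Python of a replace that never loads the file into an editor: it streams the file line by line, applies the regex, counts the replacements, and does nothing else. This is not how UltraEdit implements Replace in Files; the function name `replace_in_file` and its parameters are made up for illustration. It assumes the pattern never needs to match across line boundaries, and Python's `re` syntax is similar to, but not identical with, Perl regular expressions.

```python
import os
import re
import tempfile


def replace_in_file(path, pattern, replacement, encoding="utf-8"):
    """Stream `path` line by line, apply the regex, return the replacement count.

    Hypothetical helper for illustration only. Assumes the pattern does not
    need to match across line boundaries.
    """
    regex = re.compile(pattern)
    total = 0
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final swap
    # is a simple rename on the same volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        # newline="" keeps the original line terminators untouched.
        with os.fdopen(fd, "w", encoding=encoding, newline="") as dst, \
             open(path, "r", encoding=encoding, newline="") as src:
            for line in src:
                new_line, count = regex.subn(replacement, line)
                total += count
                dst.write(new_line)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return total
```

Only one line is held in memory at a time, and no undo, display, or syntax highlighting data is kept, which is the same reason the list above gives for Replace in Files being fast.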
      Best regards from an UC/UE/UES for Windows user from Austria

      Power User

        Feb 16, 2016 #3

        Very useful information, Mofi. I knew find-in-files would be faster on large files but I didn't really stop to consider why that is so. Perhaps your find-in-files explanation should be added to the forum read-me.

        Grand Master

          Feb 16, 2016 #4

          For Find in Files, it does not make much difference whether it runs on open files or on files in a directory, because line terminators must be found and counted either way to get the line number information for the output window or results file, along with the entire line containing the search string (or the entire first line of a found block).

          Replace in Files, compared to a Replace All, makes a big difference because of the modifications made to the file. Replace in Files can even search for the strings to replace faster than Find in Files, because line terminators do not need to be found as well, just the string to replace.
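
A rough Python illustration of that difference, as a sketch only and not UltraEdit's actual code: the find report has to split the text into lines and number them, while the replace only has to locate the matches and count them. Both function names are invented for this example.

```python
import re


def find_with_line_numbers(text, pattern):
    """Return (line_number, full_line) for every line containing a match.

    Every line terminator has to be located so that lines can be numbered.
    """
    regex = re.compile(pattern)
    results = []
    for number, line in enumerate(text.splitlines(), start=1):
        if regex.search(line):
            results.append((number, line))
    return results


def replace_count_only(text, pattern, replacement):
    """Replace every match and return (new_text, count); no line tracking."""
    return re.subn(pattern, replacement, text)
```

Called on the same text, the first function returns a list of numbered lines for a report, while the second returns just the modified text and a count.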
          Best regards from an UC/UE/UES for Windows user from Austria