Find strings with a regular expression and output them to new file

    Sep 29, 2012 #1

    The Macros & Scripts page offers the script collection Find strings to new file, which I created. On execution, a script asks the user for a regular expression search string, finds all matching strings in the active file, and outputs every found string to a new file, one per line.

    The scripts can be used on small files as well as on files of several MB or even GB, and they are very fast because they do everything in memory.

    Note: The regular expression string must be entered in Perl syntax. It is applied to a JavaScript RegExp object, which is not as powerful as the Perl regular expression engine within UltraEdit.

    The search is not case-sensitive because the RegExp object is created with the option gi, which means global and case-insensitive.

    For example, to find all integer numbers within a file and output them to a new file, execute the script and enter \d+
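The search behavior described above can be sketched in plain JavaScript. This is only an illustration, not the code of the scripts in the ZIP archive: the helper name findStrings and the sample text are assumptions, and all UltraEdit-specific steps (reading the active file, creating the new file) are left out.

```javascript
// Minimal sketch in plain JavaScript; findStrings is a hypothetical helper name.
function findStrings(text, searchString) {
  // "g" finds all matches, "i" ignores case (the option gi).
  var regex = new RegExp(searchString, "gi");
  var found = text.match(regex); // null if nothing matches
  return found === null ? [] : found;
}

// Hypothetical sample text standing in for the content of the active file.
var sample = "Order 66 shipped 12 items in 3 boxes.";

// One found string per line, as the scripts write them to the new file.
var output = findStrings(sample, "\\d+").join("\n");
```

Note that the backslash is doubled here only because the expression is written as a JavaScript string literal. In the prompt of the scripts, the expression is entered as \d+.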

    Also read FindStringsReadMe.txt within the ZIP archive and the comments at the top of the script file you want to use.

    The ZIP file contains the following scripts:
    • FindStringsToNewFile.js
      searches the entire active file for strings with a regular expression and outputs all found strings, or only parts of the found strings, line by line to a new file. It can be used for small files with some KiB and larger files up to a few MiB.
    • FindStringsToNewFileExtended.js
      is like FindStringsToNewFile.js with the difference that it can be used also for really large files with many MiB or even GiB. This script can't be used with UE for Windows v13.00 and UES v6.20.
    • FindStringsWithLineNumbers.js
      is like FindStringsToNewFile.js with the difference that line number information is also output on every line with a found string in the new file. This script can be used for small files with some KiB and larger files up to a few MiB.
    • FindStringsWithLineNumbersExtended.js
      is like FindStringsWithLineNumbers.js with the difference that it can be used also for really large files with many MiB or even GiB. This script can't be used with UE for Windows v13.00 and UES v6.20.
    Post a reply here if you have questions regarding the scripts or suggestions for improving the scripts.

    The line and block comments can be removed from each script file by running a Replace All from the top of the file, searching with the Perl regular expression ^ *//.+[\r\n]+|^ */\*[\s\S]+?\*/[\r\n]+| +//.+$ and using an empty replace string. The first of the three alternatives in this OR expression matches entire lines containing only a line comment, the second matches block comments, and the third matches line comments to the right of code. Removing the comments makes a frequently used script from this collection more efficient because the JavaScript interpreter has to interpret fewer characters and lines.
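To see what the three alternatives of this expression match, the same pattern can be tried in plain JavaScript. This is only a sketch for experimenting outside UltraEdit: the function name stripComments and the sample source are assumptions, and the m flag is added so that ^ and $ match at line boundaries.

```javascript
// Minimal sketch; stripComments is a hypothetical helper name.
function stripComments(source) {
  // 1st alternative: entire lines containing only a line comment
  // 2nd alternative: block comments including the line break(s) after them
  // 3rd alternative: line comments to the right of code
  var comments = /^ *\/\/.+[\r\n]+|^ *\/\*[\s\S]+?\*\/[\r\n]+| +\/\/.+$/gm;
  return source.replace(comments, "");
}

// Hypothetical sample script with all three kinds of comments.
var source = [
  "// header comment",
  "/* block",
  "   comment */",
  "var a = 1; // trailing comment",
  "var b = 2;"
].join("\n");

var stripped = stripComments(source);
// stripped now contains only the two code lines.
```

One limitation of such a pattern: a sequence of space, two slashes and further text inside a string literal is removed as well, so a script should be checked after the replace.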

      Apr 15, 2013 #2

      All four scripts of the scripts collection as described above were enhanced on 2013-04-14. The new features are:
      • Support for copying to new file just the string(s) in capturing group(s) instead of entire found string.

        With this new feature it is possible to search for strings in a specified context and get only the strings of interest. The surrounding content, which is matched only to limit the results to a specific context, is not output. This enhancement makes it possible to get just certain values from HTML, XML, LOG or CSV files depending on some conditions and output them into a new file in CSV format.

        This feature also makes it possible to change the order of found strings in the output. For example, data with date and time can be found in a LOG file and written to a CSV file with only the data of interest and the date format changed from yyyy-mm-dd to dd.mm.yyyy.

        To add something to a found string in the output file, simply mark the strings of interest with 1 to 9 pairs of parentheses (round brackets) in the regular expression search string. The script then asks for the output format string, in which additional text to output with every found string can be entered too.

        This enhancement also makes it possible, for example, to transform an XML file to a CSV file in one step without modifying the input file, provided the XML file does not contain more than 9 data fields within a block (= not more than 9 values within a row in the CSV output file).

        For this feature, a replace is executed internally on each found string copied to memory to reformat it to the wanted output format.
         
      • Real support for very large files with several hundred MB or even some GB.

        The first versions of the two extended scripts failed to copy / extract / grab data from very large files to a new file. The reason was poor memory management of selected text in all versions of UltraEdit for Windows / UEStudio up to the current releases v19.00.0.1028 (UE) / v12.20.0.1006 (UES). Selected text accessed by a script is copied to RAM and held there until the script terminates, even when the selection is discarded during script execution and the script can therefore no longer access the data in RAM. This memory usage resulted in an out-of-memory situation for the 32-bit applications UE / UES on very large files because the extended scripts select large text blocks in a loop.

        To work around this problem, the two extended scripts terminate themselves when the end of the file is not reached but 500,000 lines (without line numbers) or 400,000 lines (with line numbers) have already been processed. In this case a message prompt informs the script user to simply execute the script again to continue processing the very large file. Once the end of the file is finally reached after one or more executions of the same script on the same file, the extended scripts display a message with the total number of found strings and make the output file with the produced data active.

        With UE v14.20 or UES v9.00 and any later version, the data (regular expression search string, expression for the output string in case of capturing groups, line number format) does not need to be entered again when one of the two extended scripts is run a second, third, ... time on the same file to continue processing after a termination to free memory. On first use on a very large file, the extended scripts store all values entered by the user or defined internally in user clipboard 8 and reload them from this clipboard on continued runs.

        I hope that the IDM developers improve the memory management for selected text soon so that the extended scripts can process very large files in a single run.
         
      • Search and replace strings can be defined easily now at top of the scripts.

        It happens quite often that a script to copy / extract data from large LOG, XML or CSV files to a new file needs to be run at regular intervals (daily, monthly), always searching for the same data. In such cases it is annoying to enter the regular expression search string, and now perhaps also the replace string for special output, on every script usage.

        Therefore the search and replace strings can be predefined very easily at the top of the scripts, below the introductory comments, when a script is always used for the same task. That makes it more or less easy even for script newbies to customize a script from this scripts collection for a specific task.
         
      • ReadMe file rewritten

        The file FindStringsReadMe.txt was rewritten. It hopefully explains better now what this script collection can be used for, giving some examples in a question and answer style, which known limitations exist, and how to customize the scripts for a specific task.
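The capturing group feature and the predefined strings from the list above can be sketched together in plain JavaScript. Everything here is an assumption for illustration: the variable names searchString and outputFormat, the helper extractFormatted, the sample log lines, and the use of JavaScript's $1 to $9 placeholders in the output format. The real variable names and the output format syntax are documented in the comments of the scripts and in FindStringsReadMe.txt.

```javascript
// Hypothetical predefined strings, defined once at the top of the script.
// Groups 1 to 3 capture year, month and day, group 4 captures the message.
var searchString = "(\\d{4})-(\\d{2})-(\\d{2}) \\d{2}:\\d{2}:\\d{2} ERROR (.+)";
// Assumption: JavaScript placeholders $1..$9 reference the capturing groups.
var outputFormat = "$3.$2.$1;$4";

// Hypothetical helper: find all strings, then reformat each found string.
function extractFormatted(text, search, format) {
  var findAll = new RegExp(search, "gi");
  var reformat = new RegExp(search, "i");
  var found = text.match(findAll) || [];
  var lines = [];
  for (var i = 0; i < found.length; i++) {
    // A replace on the found string itself produces the output string.
    lines.push(found[i].replace(reformat, format));
  }
  return lines;
}

// Hypothetical LOG content; only the two error lines are of interest.
var log = "2013-04-14 10:15:00 ERROR Disk full\n" +
          "2013-04-14 10:16:00 INFO Retry\n" +
          "2013-04-15 08:00:00 error Timeout";

var csv = extractFormatted(log, searchString, outputFormat).join("\n");
```

In this sketch the date is matched only as context and reordered for the output, while the time and the word ERROR are dropped because they are outside the referenced groups.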

        Sep 30, 2013 #3

        All four scripts of the scripts collection as described in the initial post of this topic were enhanced on 2013-09-30.
        1. An error caused by entering an invalid regular expression is now caught by the scripts. An error message informs the script user about this error and also shows the error reason string returned by the JavaScript core, i.e. why the entered search string is an invalid regular expression string.
        2. A wrong "nothing found check" in the script FindStringsToNewFileExtended.js resulted in a script execution error instead of a message informing the user that nothing was found with the entered expression.
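Both fixes can be sketched in plain JavaScript. This is not the code of the scripts: the function name runSearch, the result object and the message texts are assumptions. It only shows the two checks, catching the exception thrown by the RegExp constructor and testing for a null result before using the found strings.

```javascript
// Minimal sketch; runSearch and its result object are hypothetical.
function runSearch(text, searchString) {
  var regex;
  try {
    // The RegExp constructor throws on an invalid expression.
    regex = new RegExp(searchString, "gi");
  } catch (error) {
    // error.message holds the reason string from the JavaScript engine.
    return { ok: false, message: "Invalid regular expression: " + error.message };
  }
  var found = text.match(regex);
  // match returns null, not an empty array, when nothing is found.
  if (found === null) {
    return { ok: false, message: "Nothing found with the entered expression." };
  }
  return { ok: true, found: found };
}

var invalid = runSearch("abc 123", "(");     // unbalanced parenthesis
var nothing = runSearch("abc", "\\d+");      // valid, but no match
var success = runSearch("a1 b22", "\\d+");   // two found strings
```

Accessing found.length without the null check would raise an error when nothing matches, which is the kind of execution error the second fix describes.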