Copying lines in 1st file based on line numbers listed in 2nd file to a new file


    Aug 13, 2009#1

    I have a first file that I want to read one whole record (= line) at a time, ignoring tab delimiters. Let's say it has 500 records (= lines).

    The second file is tab delimited with a record (= line) number in the 3rd field. Let's say the second file has 3 records (= lines) with the record numbers 15 in the 1st record, 250 in the 2nd record and 421 in the 3rd record.

    When I finish I want to have a new file with 3 records: the 15th, the 250th and the 421st record of the first file.

    Can anyone tell me, step by step, how to accomplish this using UltraEdit?
    I have tried with Excel and Access but ran into record size, file size and other limits that my files exceed.

    Help is sincerely appreciated.

      Aug 13, 2009#2

      It is impossible to develop scripts for tasks like this without real data, and by real data I mean actual excerpts from your files, not simplified examples.

      Copy some lines of your first file into a new file which uses the same encoding and line termination as your real file. See the third field in the status bar at the bottom of the UltraEdit window, which normally shows only the word DOS (= ASCII/ANSI file with DOS line endings).
      Next we also need the second file containing the record numbers of interest. It is important to know whether there is one record number per line or what other format this file uses.
      Lastly, we need a file you create manually that shows what the script should produce when executed with the 2 other files as input.

      Pack those 3 files with ZIP or RAR and upload the archive as an attachment to your next post.

      It may also be important how large the files are in general and whether there are limitations like write protection. For example, when working on very large input files it is often better to avoid permanent or temporary modifications to the file. Sometimes it is even better not to open large files at all and to extract data from them using the command Find In Files, which is much faster than a Find on an opened file because it avoids window updates. When your input file is really very large, it is possibly even better to use a script or program that does not work with a GUI.
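
For the last case, a non-GUI script, here is a minimal sketch of how it could look. It assumes Node.js is available, that the files are ASCII/ANSI (or UTF-8) with DOS or UNIX line endings, and that the list file is small enough to read at once. The function name `extractListedLines` and the file names are made up for illustration. Unlike a script that jumps to each listed line in turn, this sketch writes the lines in the order they occur in the data file and writes a line only once even if its number is listed twice.

```javascript
// Assumes Node.js; the function name and file paths are hypothetical.
const fs = require("fs");

// Copy the lines of dataPath whose 1-based numbers appear in the third
// tab-delimited field of listPath to outPath. The data file is read in
// chunks, so it is never held in memory as a whole.
function extractListedLines(dataPath, listPath, outPath) {
  // Collect the wanted line numbers from the (assumed small) list file.
  const wanted = new Set();
  for (const line of fs.readFileSync(listPath, "latin1").split(/\r?\n/)) {
    const field = line.split("\t")[2];
    if (field !== undefined && /^\d+/.test(field)) wanted.add(parseInt(field, 10));
  }

  const input = fs.openSync(dataPath, "r");
  const output = fs.openSync(outPath, "w");
  const chunk = Buffer.alloc(1 << 20);   // 1 MiB read buffer
  let pending = "";                      // incomplete line from the previous chunk
  let lineNumber = 0;
  let copied = 0;

  // Count the line and write it when its number is wanted. A "\r" from a
  // DOS line ending stays part of the line, so "\r\n" is preserved.
  const emit = (line) => {
    lineNumber++;
    if (wanted.has(lineNumber)) {
      fs.writeSync(output, line + "\n", null, "latin1");
      copied++;
    }
  };

  try {
    let bytesRead;
    while ((bytesRead = fs.readSync(input, chunk, 0, chunk.length, null)) > 0) {
      // latin1 maps every byte to one character and back, so bytes pass through unchanged.
      pending += chunk.toString("latin1", 0, bytesRead);
      const lines = pending.split("\n");
      pending = lines.pop();             // the last piece may be incomplete
      for (const line of lines) emit(line);
    }
    if (pending !== "") emit(pending);   // last line without a line ending
  } finally {
    fs.closeSync(input);
    fs.closeSync(output);
  }
  return copied;                         // number of lines written
}
```

It could be called as `extractListedLines("data.txt", "list.txt", "output.txt")`. A last line without line termination gets a plain line feed appended, which would have to be adapted for DOS files if that matters.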
      Best regards from an UC/UE/UES for Windows user from Austria


        Aug 16, 2009#3

        Attached are 3 zipped files: the 2 input files and an 'expected' output file. (ZIP file deleted later.)
        Thanks for your help.

          Aug 18, 2009#4

          Got some time to write this little script and comment it. It worked perfectly on your example files with UE v15.10.0.1028.

          /* The leftmost file (= document with index 0 in the document array) must
             be the file with the data. The second file (= document 1) must be the
             list file with the line numbers. Other open files are ignored. */
          if (UltraEdit.document.length >= 2) {
             var sListEntry  = "";
             var nLineNumber = 1;
             var DataFile    = UltraEdit.document[0];
             var ListFile    = UltraEdit.document[1];
             /* Define the working environment and open already the new file for the
                output which will become immediately the active file to avoid screen
                updates during script execution which would make the script slower. */
             UltraEdit.insertMode();
             if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
             else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();
             UltraEdit.activeDocument.hexOff();
             UltraEdit.ueReOn();
             UltraEdit.newFile();
             var OutputFile  = UltraEdit.activeDocument;
             /* Verify if the last line of the data file also has a line ending.
                If this is not the case insert one and make sure that the auto
                indent feature has not added additional preceding whitespaces. */
             DataFile.bottom();
             if (DataFile.isColNumGt(1)) {
                DataFile.insertLine();
                if (DataFile.isColNumGt(1)) DataFile.deleteToStartOfLine();
             }
             /* Define once the parameters for the regular expression search with the
                UltraEdit engine used to find the line numbers in the list file. */
             ListFile.top();
             ListFile.findReplace.searchDown = true;
             ListFile.findReplace.matchCase = false;
             ListFile.findReplace.matchWord = false;
             ListFile.findReplace.regExp = true;
             /* The lines of interest could be immediately copied from the data file
                to the output file. But the script is much faster when collecting
                the lines in a user clipboard (= memory). That avoids screen updates
                and flushing the data to the temporary file. */
             UltraEdit.selectClipboard(9);
             UltraEdit.clearClipboard();
             // Find in a tab delimited list file (CSV) a number in the third column.
             while (ListFile.findReplace.find("%*^t*^t[0-9]+")) {
                /* String found with the line number of interest at end of the string.
                   Get the selected search result into a string variable, extract the
                   line number using a Perl regular expression and convert the line
                   number from a string into an integer. */
                sListEntry = ListFile.selection;
                var aMatchResult = sListEntry.match(/\d+$/);
                nLineNumber = parseInt(aMatchResult[0],10);
                /* Go to this line in the data file, select it and append it to
                   the content of user clipboard 9 when the line really exists. */
                DataFile.gotoLine(nLineNumber,1);
                /* Has the data file enough lines or is the caret now set to last
                   line of the file because the line number is greater than the
                   number of lines in the data file? */
                if (nLineNumber == DataFile.currentLineNum) {
                   DataFile.selectLine();
                   DataFile.copyAppend();
                }
             }
             OutputFile.paste();           // Write the lines into the output file.
             UltraEdit.clearClipboard();   // Clear user clipboard 9 to free memory.
             UltraEdit.selectClipboard(0); // Select again the Windows clipboard.
             /* In case the data file is read-only and the last line of the data
                file has been copied as last line into the output file and this
                line has no line termination, add the line ending now. If the
                last line of a read-only data file has no line ending and this
                line is appended to the clipboard anywhere else than at the end,
                there are 2 lines joined together in the output file. Additional
                code could be used to break up the lines joined by mistake in the
                output file, but such code is not present here at the moment. */
             if (OutputFile.isColNumGt(1)) {
                OutputFile.insertLine();
                if (OutputFile.isColNumGt(1)) OutputFile.deleteToStartOfLine();
             }
             DataFile.top();  // Move the cursor in all 3 files to top of the file.
             ListFile.top();
             OutputFile.top();
          }
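
The core logic of the script (take the number from the third tab-delimited field of each list line, then pick that line from the data) can also be sketched in plain JavaScript on strings, without any UltraEdit commands. This is only an illustration for small inputs. The function names `parseLineNumbers` and `pickLines` and the sample data are made up and are not part of the UltraEdit scripting API.

```javascript
// Hypothetical helper names; not part of the UltraEdit scripting API.

// Return the line numbers found in the third tab-delimited field of each
// line of the list text. Lines with fewer than 3 fields are skipped.
function parseLineNumbers(listText) {
  const numbers = [];
  for (const line of listText.split(/\r?\n/)) {
    const fields = line.split("\t");
    if (fields.length < 3) continue;        // short or empty line
    const match = /^\d+/.exec(fields[2]);   // the field must start with digits
    if (match) numbers.push(parseInt(match[0], 10));
  }
  return numbers;
}

// Return the requested 1-based lines of the data text in list order.
// Numbers beyond the last line of the data are skipped.
function pickLines(dataText, lineNumbers) {
  const lines = dataText.split(/\r?\n/);
  // A final line ending produces an empty last element, which is no line.
  if (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
  const picked = [];
  for (const n of lineNumbers) {
    if (n >= 1 && n <= lines.length) picked.push(lines[n - 1]);
  }
  return picked;
}

const data = "alpha\nbeta\ngamma\ndelta\n";
const list = "x\ty\t2\tmore\nx\ty\t4\nx\ty\t9\n";
console.log(pickLines(data, parseLineNumbers(list)));   // [ 'beta', 'delta' ]
```

Line number 9 is ignored here because the sample data has only 4 lines, which corresponds to the check against `currentLineNum` in the UltraEdit script.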