Conditional output of filename, title and date to new file


    Nov 09, 2012 #1

    I'm trying to figure out how to go through an entire site and search for a string in the <body> of each page. If the string is found, I would like to output the page <title>, the date modified, which is stored in the following format

    Code: Select all

    modified" scheme="W3CDTF" content="[\d]{4}-[\d]{2}-[\d]{2}"
    , and the file name with full path and extension.

    For example, a case-insensitive search for "fast food" would output the following results to a file or to the results window:
    MacDiddles made record profits again;2012-11-08;\news\food\restaurants\somefile.htm
    Government of lala land goes on battle against obesity;2010-10-25;\government\policies\obesityxyz.html
    ...
    I don't need to display the lines in which the string is found, just the title, date and file name. File extensions could be .htm, .html or .asp, and the fields could be separated by ; for importing into a spreadsheet.

    Code: Select all

    <html>
    <head>
    <title>MacDiddles made record profits again</title>
    .....
    <meta ...modified" scheme="W3CDTF" content="2012-11-08">
    ......
    </head>
    <body>
    ............... The fast food giant reports.....
    </body>
    </html>
    I'm pretty sure it involves using FileNameFunctions.js and Find in Files. But we don't really want to look for the title and date unless the string is found. So basically it's: IF "fast food" is found after <body> THEN get the title and date and insert them as a new line in a file or in the "results window".
    Maybe the script would first use GetListOfFiles to get the files containing the string, and then process that list to extract title, date and file name?
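The IF/THEN logic described above can be sketched in plain JavaScript, independent of UltraEdit. This is only an illustration under assumptions: `describePage` is a hypothetical helper, the page content is passed in as a string, and the meta tag text is copied from the shortened example above.

```javascript
// Hypothetical helper: returns "title;date;path" if needle occurs
// (case-insensitively) at or after the <body> tag, otherwise null.
function describePage(html, needle, path) {
  var bodyStart = html.search(/<body[\s>]/i);
  if (bodyStart < 0) return null;                       // no <body> at all
  var body = html.slice(bodyStart).toLowerCase();
  if (body.indexOf(needle.toLowerCase()) < 0) return null;
  // Only now look for the title and the modification date.
  var title = /<title>([\s\S]*?)<\/title>/i.exec(html);
  var date = /modified" scheme="W3CDTF" content="(\d{4}-\d{2}-\d{2})"/i.exec(html);
  return [title ? title[1].trim() : "", date ? date[1] : "", path].join(";");
}

// Sample page built from the example in this post.
var page = [
  "<html>", "<head>",
  "<title>MacDiddles made record profits again</title>",
  '<meta ...modified" scheme="W3CDTF" content="2012-11-08">',
  "</head>", "<body>",
  "............... The fast food giant reports.....",
  "</body>", "</html>"
].join("\r\n");

var record = describePage(page, "Fast Food", "\\news\\food\\restaurants\\somefile.htm");
var noRecord = describePage(page, "MacDiddles", "\\news\\food\\restaurants\\somefile.htm");
```

The second call returns null because "MacDiddles" occurs only in the title, not after <body>.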

    Any help would be much appreciated once again.

    I'm using UE 14.20.


      Nov 11, 2012 #2

      When I read your post last Friday, I thought that you had already described the best method. But today, as I started writing the script, I had an idea how this task could be done faster.

      Here is the script using this idea. It produced the correct output with UE v18.20.0.1020 on an example collection of files in a directory tree. It should also work with UE v14.20, but I have not tested it with this version of UltraEdit.

      You have to adapt the string in the first line to the path of the directory that should be used as root directory for the Find in Files.

      Please note that at Advanced - Search - Set Find Output Format the option Found Line must be checked and the format string must be $P($L): $S (the default format string) for this script to work. All other output format settings for Find in Files results do not matter for this script.
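To show why the output format matters, here is a minimal sketch in plain JavaScript (not UltraEdit-specific) of how a result line in the $P($L): $S format splits into file path, line number and found line. `parseResultLine` is a hypothetical helper, and the sample line is made up from the example in the question.

```javascript
// Parse one Find in Files result line written as "$P($L): $S".
// Assumption: the file path itself contains no "(digits): " sequence.
function parseResultLine(line) {
  var m = /^(.+?)\((\d+)\): (.*)$/.exec(line);
  if (!m) return null;                  // not a result line (e.g. a summary line)
  return { path: m[1], lineNumber: parseInt(m[2], 10), text: m[3] };
}

var sampleLine = "C:\\Temp\\Test\\news\\food\\restaurants\\somefile.htm(3): " +
                 "<title>MacDiddles made record profits again</title>";
var parsed = parseResultLine(sampleLine);
// parsed.path       is the full file name before the opening parenthesis
// parsed.lineNumber is the number between the parentheses
// parsed.text       is the found line after ": "
```

The script relies on the same layout: its regular expression captures everything before the first opening parenthesis as the file name and uses it as a back-reference on the following lines.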

      Attention: The Find in Files results window must not already be open when this script is run. The script as written relies on this document window not existing in the list of document windows on script start.

      Code: Select all

      var sRootDir = "C:\\Temp\\Test\\";
      // Get document index for the output file, which is equal to the number of
      // currently opened files before opening a new file for the output.
      var nOutputIndex = UltraEdit.document.length;
      UltraEdit.newFile();
      
      // Ask user which string to find in the files.
      var sUserText = UltraEdit.getString("Enter string to find:",1);
      
      // Define all parameters for the regular expression Find in Files.
      UltraEdit.perlReOn();
      UltraEdit.frInFiles.regExp=true;
      UltraEdit.frInFiles.filesToSearch=0;  // Run a regular expression search
      UltraEdit.frInFiles.searchSubs=true;  // in an entire directory tree.
      UltraEdit.frInFiles.directoryStart=sRootDir;
      UltraEdit.frInFiles.searchInFilesTypes="*.asp;*.htm*";
      UltraEdit.frInFiles.ignoreHiddenSubs=false;
      UltraEdit.frInFiles.displayLinesDoNotMatch=false;
      UltraEdit.frInFiles.matchCase=false;
      UltraEdit.frInFiles.reverseSearch=false;
      UltraEdit.frInFiles.matchWord=false;
      UltraEdit.frInFiles.useOutputWindow=false;
      UltraEdit.frInFiles.unicodeSearch=false;
      
      // The lines to find are the lines with the entered string, the line with
      // the title and the line with the meta tag of the last modification date.
      // The results are written to the results file.
      var sSearch = '<title>|modified" scheme="W3CDTF"';
      if (sUserText.length) sSearch += '|' + sUserText;
      UltraEdit.frInFiles.find(sSearch);
      
      // Only the files containing the entered string are of interest. And from
      // these files only file name, title and modification date are needed.
      // Use a Perl regular expression search string to find 3 consecutive lines
      // containing <title>, date tag and the string entered by the script user.
      UltraEdit.activeDocument.top();
      var sVarText = (sUserText.length) ? "\\r\\n\\1\\(.+" + sUserText + ".*" : "";
      sSearch = '^(.+?)\\(.+?<title>(.*?)</title>.*\\r\\n\\1\\(.+modified" scheme="W3CDTF" content="(\\d{4}-\\d{2}-\\d{2})".*'+sVarText+'$';
      
      UltraEdit.activeDocument.findReplace.mode=0;
      UltraEdit.activeDocument.findReplace.matchCase=false;
      UltraEdit.activeDocument.findReplace.matchWord=false;
      UltraEdit.activeDocument.findReplace.regExp=true;
      UltraEdit.activeDocument.findReplace.searchDown=true;
      if (typeof(UltraEdit.activeDocument.findReplace.searchInColumn) == "boolean") {
         UltraEdit.activeDocument.findReplace.searchInColumn=false;
      }
      UltraEdit.activeDocument.findReplace.preserveCase=false;
      UltraEdit.activeDocument.findReplace.replaceAll=true;
      UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;
      UltraEdit.activeDocument.findReplace.replace(sSearch, "#\\2;\\3;\\1");
      
      // All lines not starting with character # are of no interest and must be deleted.
      UltraEdit.activeDocument.findReplace.replace("^[^#].+\\r\\n", "");
      
      // Then the marker character and the root directory can be removed on remaining lines.
      var sRootPath = sRootDir.replace(/\\/g,"\\\\");
      UltraEdit.activeDocument.findReplace.replace("^#(.+?;.+?;)"+sRootPath+"(.+)$", "\\1\\\\\\2");
      
      // Select now all remaining lines if there are such lines and copy
      // them to the new file created at beginning via user clipboard 9.
      UltraEdit.activeDocument.selectAll();
      if (UltraEdit.activeDocument.isSel()) {
         UltraEdit.selectClipboard(9);
         UltraEdit.activeDocument.copy();
         UltraEdit.document[nOutputIndex].paste();
         UltraEdit.clearClipboard();
         UltraEdit.selectClipboard(0);
         UltraEdit.document[nOutputIndex].top();
      } else {
         UltraEdit.document[nOutputIndex].write("No file found!");
      }
      
      // Close the Find in Files results file, which is not needed anymore.
      UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
      PS: The script can also be run without entering a string to find. In that case it produces a list of all files with their titles and last modification dates.
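For readers who want to follow the three replaces without UltraEdit, here is a rough sketch of the same collapsing steps in plain JavaScript on a made-up results text. It is not the script itself: `collapseResults` and `escapeRegex` are hypothetical names, line endings are normalized to \n first because JavaScript treats \r and \n as separate line terminators, and the entered string is escaped and treated as literal text, whereas the script above passes it to the search as a regular expression.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Collapse Find in Files results ("$P($L): $S" lines) into "title;date;\relative\path" lines.
function collapseResults(resultsText, rootDir, userText) {
  var text = resultsText.replace(/\r\n/g, "\n");
  var needle = userText.length ? "\\n\\1\\(.+" + escapeRegex(userText) + ".*" : "";
  // Step 1: title line, date line (and optionally a line with the string)
  // of the same file become one line marked with a leading #.
  var reGroup = new RegExp(
    '^(.+?)\\(.+?<title>(.*?)</title>.*\\n\\1\\(.+modified" scheme="W3CDTF" ' +
    'content="(\\d{4}-\\d{2}-\\d{2})".*' + needle + '$', "gim");
  text = text.replace(reGroup, "#$2;$3;$1");
  // Step 2: delete every line that does not start with the marker.
  text = text.replace(/^[^#\n].*\n?/gm, "");
  // Step 3: remove the marker and the root directory, keeping a leading backslash.
  var reRoot = new RegExp("^#(.+?;.+?;)" + escapeRegex(rootDir) + "(.+)$", "gim");
  return text.replace(reRoot, "$1\\$2");
}

var root = "C:\\Temp\\Test\\";
var sampleResults = [
  root + "news\\food\\restaurants\\somefile.htm(3): <title>MacDiddles made record profits again</title>",
  root + 'news\\food\\restaurants\\somefile.htm(5): <meta ...modified" scheme="W3CDTF" content="2012-11-08">',
  root + "news\\food\\restaurants\\somefile.htm(9): ............... The fast food giant reports.....",
  root + "other\\page.html(2): <title>No match here</title>",
  root + 'other\\page.html(4): <meta ...modified" scheme="W3CDTF" content="2011-01-02">'
].join("\r\n") + "\r\n";

var collapsed = collapseResults(sampleResults, root, "fast food");
```

With this sample, only the first file survives, because the second file has no result line containing the entered string.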


        Nov 12, 2012 #3

        Thank you very much once again, Mofi, for coming to my rescue. It works perfectly!