Workaround sought for missing variable negative lookbehind

fvgfvg (Basic User)

    Mar 27, 2013 #1

    I'm trying to write a script to identify footnotes in a Ventura text file. Such footnotes have the format:

    text text <$Ffootnote footnote footnote > text text

    That looks straightforward enough - except that Ventura has various other control characters of the type <I>, <I*>, <->, <N>, <> and so on that get in the way:

    text <$F footnote <I>title<I*> foot<->note <N> footnote<> footnote > text

    After some effort, using RegexBuddy, I constructed the following regex, which matches the above footnote just fine:

    <\$F([\s\S]+?)(?<!(<|< |<  |<I|<I\*|<CR|<N|<-))>

    Trouble is, UE's 'Perl-style' regex doesn't seem to support the negative lookbehind I need to skip those gratuitous control characters, and I can't figure out an alternative. I'm flummoxed. Would anyone know of a workaround? (RegexBuddy declares the above regex valid for Java, but for Perl it warns: "Perl does not support variable repetition inside lookbehind")

    I'd be most grateful for a bit of help here...
    best,
    fvgfvg

    Mofi (Grand Master)

      Mar 27, 2013 #2

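      Yes, in a Perl lookbehind expression the matched string must have a fixed length. An OR (alternation) whose alternatives have different lengths, such as <|< |<  |<I|<I\*, makes the lookbehind variable-length and is therefore not supported.

      But it is possible to simply specify multiple negative lookbehinds one after the other, each with a fixed length. Placed directly after the >, each of them rejects a > that closes one of the Ventura codes, and therefore you can use: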

      <\$F(?:[\s\S]+?>)(?<!<>)(?<!< >)(?<!<  >)(?<!<I>)(?<!<I\*>)(?<!<CR>)(?<!<N>)(?<!<->)
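A quick way to check that the stacked lookbehinds pick the right closing > is to run the same pattern through another engine with lookbehind support. This minimal sketch assumes an ES2018+ JavaScript engine such as a current Node.js; it uses JavaScript's own RegExp, not UltraEdit's Perl engine (JavaScript would even accept the original variable-length alternation), and the names naive, stacked and sample are illustrative only.

```javascript
// Sketch run in Node.js (ES2018+ RegExp), not inside UltraEdit.

// Naive pattern: the lazy [\s\S]+? stops at the first ">", even one that
// merely closes a Ventura code such as <I>.
const naive = /<\$F[\s\S]+?>/g;

// Stacked pattern: after each candidate ">", fixed-length negative lookbehinds
// reject it if it ends <>, < >, <  >, <I>, <I*>, <CR>, <N> or <->, which forces
// the lazy quantifier to extend to the next ">".
const stacked =
  /<\$F[\s\S]+?>(?<!<>)(?<!< >)(?<!<  >)(?<!<I>)(?<!<I\*>)(?<!<CR>)(?<!<N>)(?<!<->)/g;

const sample =
  "text <$F footnote <I>title<I*> foot<->note <N> footnote<> footnote > text";

console.log(sample.match(naive)[0]);   // prints: <$F footnote <I>
console.log(sample.match(stacked)[0]); // prints: <$F footnote <I>title<I*> foot<->note <N> footnote<> footnote >
```

Each rejected > costs only a backtrack: the lookbehinds never consume characters, they just veto a candidate end position.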

      fvgfvg (Basic User)

        Mar 28, 2013 #3

        Mofi, thank you. I've been trying this out. In RB (RegexBuddy) it works perfectly well. The trouble is that when I insert it into my UE script it doesn't work. RB provides different options: a copy 'as is', or as a 'perl-style' string:
        'as is':

        <\$F((?:[\s\S]+?))>(?<!<>)(?<!< >)(?<!<  >)(?<!<I>)(?<!<I\*>)(?<!<CR>)(?<!<N>)(?<!<->)
        
        'perl-style':

        '<\$F((?:[\s\S]+?))>(?<!<>)(?<!< >)(?<!<  >)(?<!<I>)(?<!<I\*>)(?<!<CR>)(?<!<N>)(?<!<->)'
        
        The same for the 'replace' string:
        'as is':

        <<${1}>>
        
        perl-style:

        '<<${1}>>'
        
        The trouble is that whatever I do, I can't get my script to work. Have I got the find/replace line wrong?

        if (UltraEdit.document.length > 0) {
           UltraEdit.insertMode();
           UltraEdit.columnModeOff();
           UltraEdit.activeDocument.hexOff();
             //UltraEdit.ueReOn();     // UltraEdit
             //UltraEdit.unixReOn();   // Unix 
           UltraEdit.perlReOn();       // Perl
        
           UltraEdit.activeDocument.top();
           UltraEdit.activeDocument.findReplace.mode=0;
           UltraEdit.activeDocument.findReplace.matchCase=true;
           UltraEdit.activeDocument.findReplace.matchWord=false;
           UltraEdit.activeDocument.findReplace.regExp=true;
           UltraEdit.activeDocument.findReplace.searchDown=true;
           UltraEdit.activeDocument.findReplace.searchInColumn=false;
           UltraEdit.activeDocument.findReplace.preserveCase=false;
           UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;
           UltraEdit.activeDocument.findReplace.replaceAll=true;
          //////////  formatting commands
           UltraEdit.activeDocument.top();        
           UltraEdit.activeDocument.findReplace.replace(" << ","<<");
           UltraEdit.activeDocument.top();
           UltraEdit.activeDocument.findReplace.replace(" >>",">>"); 
           UltraEdit.activeDocument.top();
           UltraEdit.activeDocument.findReplace.replace("<<","<$F");
           UltraEdit.activeDocument.top();
           UltraEdit.activeDocument.findReplace.replace(">>","[]>");
           UltraEdit.activeDocument.top();      
           UltraEdit.activeDocument.findReplace.replace("@N = ","@n=");
           UltraEdit.activeDocument.top();
           UltraEdit.activeDocument.findReplace.replace("@EN = ","@en=");
           UltraEdit.activeDocument.top();
           UltraEdit.activeDocument.findReplace.replace("@n=","@NOTE = ");
           UltraEdit.activeDocument.top();
           UltraEdit.activeDocument.findReplace.replace("@en=","@ENDNOTE = ");
           UltraEdit.activeDocument.top();
           UltraEdit.activeDocument.findReplace.replace("@h3=","@HEAD3 = ");
           UltraEdit.activeDocument.top();
           UltraEdit.activeDocument.findReplace.replace("@h4=","@HEAD4 = ");
           UltraEdit.activeDocument.top();
           UltraEdit.activeDocument.findReplace.replace("@h5=","@HEAD5 = ");
           UltraEdit.activeDocument.top();   
           UltraEdit.activeDocument.findReplace.replace("  "," ");   
           UltraEdit.activeDocument.top(); 
           UltraEdit.activeDocument.findReplace.replace("open italics","<I>");   
           UltraEdit.activeDocument.top();    
           UltraEdit.activeDocument.findReplace.replace("close italics","<I\*>");   
           UltraEdit.activeDocument.top();        
           UltraEdit.activeDocument.findReplace.replace("<I> ","<I>");   
           UltraEdit.activeDocument.top(); 
           UltraEdit.activeDocument.findReplace.replace("open bracket","(");   
           UltraEdit.activeDocument.top();     
           UltraEdit.activeDocument.findReplace.replace("close bracket",")");   
           UltraEdit.activeDocument.top(); 
           UltraEdit.activeDocument.findReplace.replace("open quote","\"");   
           UltraEdit.activeDocument.top(); 
           UltraEdit.activeDocument.findReplace.replace("close quote","\"");   
           UltraEdit.activeDocument.top(); 
        // Replace <$F....> with <<....>>  
           UltraEdit.activeDocument.findReplace.replace("<\$F((?:[\s\S]+?))>(?<!<>)(?<!< >)(?<!<  >)(?<!<I>)(?<!<I\*>)(?<!<CR>)(?<!<N>)(?<!<->)","<<${1}>>");   
           UltraEdit.activeDocument.top();   
           
        }    // <---END OF "IF" IN LINE 1! (DON'T DELETE)
        
        The 'sand-box' text that I use is this:

        
        Back-reference 1 contains the full FN, without the '<$F' and the '>'.
        The negative lookbehind (creating back-reference 2) is there to discard the 
         <>, < >, <  >, <->, <I>, <I*>, <N>, <CR>
        
        
        this is text this is text this is text<$Fendnote endnote endnote> 
        this is text this is text this is text this is text this is text 
        this is<$Fendnote must span new line plus blanks lines
        
        
          
        
         endnote> text this is text this is text 
        this is text<$Fendnote <>This is a <I>title<I*> and this is another <I>Title
        
        
        <I*>
        
        
         endnote<N> end<->note> this is text this is text 
        this is text this is text this is text this is text 
        
        
        this is text this is text this is 
        text this is text this is text this is text this is text this is text<$Fendnote endnote <CR>
        
        endnote> this is text this is text this is text this is text t
        his is text this is<$Fendnote <N> <I>title <I*>end<->note <>endnote> text this is text this is 
        text this is text<$Fendnote <CR>
        endnote endnote> this is text this is text this is text this is
         text this is text this is text this is text this is text this is text this is text 
        this is text this is text this is text this 
        
        
        best,
        fvg

        Mofi (Grand Master)

          Mar 28, 2013 #4

          fvgfvg wrote: "Have I got the find/replace line wrong?"
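          Yes, the find string in your script is wrong. See point 2 in List of UltraEdit / UEStudio script commands and most common mistakes: backslashes in strings are not escaped. The JavaScript interpreter treats a backslash in a string literal as an escape character, so "\$F" and "[\s\S]" arrive at UltraEdit's Perl regular expression engine as "$F" and "[sS]". Every backslash intended for the regular expression engine must therefore be written as \\ in the script.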

          Adapted to a tagged regular expression (one with a capturing group for the replace), the search string I provided must be written in an UltraEdit script for the command UltraEdit.activeDocument.findReplace.replace as:

          "<\\$F([\\s\\S]+?)>(?<!<>)(?<!< >)(?<!<  >)(?<!<I>)(?<!<I\\*>)(?<!<CR>)(?<!<N>)(?<!<->)"

          The usual replace string in Perl syntax is <<\1>>, which must be written in the script as <<\\1>>.

          So the entire line in the script is:

          UltraEdit.activeDocument.findReplace.replace("<\\$F([\\s\\S]+?)>(?<!<>)(?<!< >)(?<!<  >)(?<!<I>)(?<!<I\\*>)(?<!<CR>)(?<!<N>)(?<!<->)","<<\\1>>");
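The effect of the unescaped backslashes can be reproduced outside UltraEdit. This sketch assumes Node.js: it only inspects the strings the JavaScript parser produces from the two script literals, then feeds them to JavaScript's own RegExp as a stand-in for UltraEdit's Perl engine. Note that JavaScript's replace uses $1 where UltraEdit's Perl syntax uses \1; the names wrong, right and sample are illustrative.

```javascript
// Node.js sketch, not UltraEdit: what a JS string literal hands to a regex engine.

// As in the original script line: single backslashes. \$ \s \S \* have no
// special meaning in a JS string literal, so the parser drops the backslash.
const wrong =
  "<\$F((?:[\s\S]+?))>(?<!<>)(?<!< >)(?<!<  >)(?<!<I>)(?<!<I\*>)(?<!<CR>)(?<!<N>)(?<!<->)";

// As in the corrected line: every backslash meant for the regex is written as \\.
const right =
  "<\\$F([\\s\\S]+?)>(?<!<>)(?<!< >)(?<!<  >)(?<!<I>)(?<!<I\\*>)(?<!<CR>)(?<!<N>)(?<!<->)";

console.log(wrong.slice(0, 16)); // prints: <$F((?:[sS]+?))>
console.log(right.slice(0, 15)); // prints: <\$F([\s\S]+?)>

const sample = "text<$Fendnote <CR>\nendnote endnote> this is text";

// In the wrong pattern "$" is an end-of-input anchor, so "<$F" can never match.
console.log(new RegExp(wrong).test(sample)); // prints: false

// The right pattern captures the footnote body, spanning the line break.
console.log(sample.replace(new RegExp(right, "g"), "<<$1>>"));
```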

          The following line is also not 100% correct:

          UltraEdit.activeDocument.findReplace.replace("close italics","<I\*>");

          The JavaScript interpreter, which like the Perl regular expression engine treats a backslash as an escape character, passes the replace string "<I*>" and not "<I\*>" to UltraEdit's Perl regular expression engine. That does not matter here, because the asterisk has no special meaning in a replace string and therefore does not need to be escaped with a backslash at all.

          Therefore this line should be:

          UltraEdit.activeDocument.findReplace.replace("close italics","<I*>");
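The point about the replace string can be checked the same way. This Node.js sketch uses JavaScript's String.prototype.replace as a stand-in for UltraEdit's findReplace.replace (an assumption, not the UltraEdit API) to show that both literals are the same string and that the asterisk is inserted unchanged.

```javascript
// Node.js sketch, not UltraEdit: the parser drops the backslash in "\*",
// so both literals below are the same four-character string.
const escaped = "<I\*>";
const plain = "<I*>";
console.log(escaped === plain); // prints: true

// In a JavaScript replacement string only "$" sequences are special,
// so "*" is copied through as an ordinary character.
const line = "some close italics text";
console.log(line.replace(/close italics/g, plain)); // prints: some <I*> text
```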

          fvgfvg (Basic User)

            Mar 28, 2013 #5

            Thank you, Mofi... It's working like a charm. I don't know what I'd do without you. (Looks like the RegexBuddy 'copy' command doesn't know about the particular Perl 'flavour' of UE scripts, where the regex sits inside a JavaScript string and every backslash must be doubled.)
            Best,
            fvgfvg