Regex to replace footnotes in text works fine in RegexBuddy - but not when placed in an UE script

Newbie

    Sep 22, 2022 #1

    UltraEdit 2022.1.0.100 64-bit

    I'm rewriting some code that Mofi helped me with many years ago. (Really delighted that he's still helping us novices along. Great!)

    I've got used to testing all my regex code in RegexBuddy nowadays. It's a real help, I use it regularly, and I can thoroughly recommend it. (Except: it has no converter for the UE flavor.)
    The following regex works just fine in RegexBuddy:

    (RegexBuddy, latest version, set for "Perl 5.30-532")

    FIND STRING:

    Code:

    (?P<FNplaceholder><<--1)-->>(?P<mainText>\p{Any}*?)(?P<footnote>\[1*\N.*)
    REPLACE STRING:

    Code:

    $+{FNplaceholder}$+{footnote}>>$+{mainText}
    The "Convert to Java string" converter inside RegexBuddy then produces:

    Code:

    "(?P<FNplaceholder><<--1)-->>(?P<mainText>\\p{Any}*?)(?P<footnote>\\[1*\\N.*)","$+{FNplaceholder}$+{footnote}>>$+{mainText}"
    In RegexBuddy this regex works just fine. Here's what it does: it takes a text like the one below and replaces the footnote placeholders in the main text (marked by <<--x-->>) with the footnotes from the footnote block at the end of the text, each of which starts with "[x] footnote footnote<CR>". But if I put this regex inside this UE script (which I think Mofi once wrote for me), it doesn't work.

    What am I doing wrong?
    (Once this regex is up and running, I will want to put it inside that "WHILE" loop, so that it iterates through all of the placeholders "<<--x-->>" in the text. Which means my next problem is: how to turn that "1" into a variable.)

    (If it's not too expensive I'd be quite happy to join the VIP+ club.)
    Very best, and most grateful for help ...

    Code:

    if (UltraEdit.document.length > 0)
    {
        UltraEdit.insertMode();
        UltraEdit.columnModeOff();
        UltraEdit.activeDocument.hexOff();
        UltraEdit.perlReOn();       // Regex Perl
        UltraEdit.activeDocument.findReplace.mode=0;
        UltraEdit.activeDocument.findReplace.matchCase=true;
        UltraEdit.activeDocument.findReplace.matchWord=false;
        UltraEdit.activeDocument.findReplace.regExp=true;
        UltraEdit.activeDocument.findReplace.searchDown=true;
        UltraEdit.activeDocument.findReplace.searchInColumn=false;
        UltraEdit.activeDocument.findReplace.preserveCase=false;
        UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;
        UltraEdit.activeDocument.findReplace.replaceAll=false;
        UltraEdit.activeDocument.top();

        UltraEdit.activeDocument.findReplace.replace("(?P<FNplaceholder><<--1)-->>(?P<mainText>\\p{Any}*?)(?P<footnote>\\[1*\\N.*)","$+{FNplaceholder}$+{footnote}>>$+{mainText}");
        while(UltraEdit.activeDocument.findReplace.find("TheresSomethingToFind"))  //<------semicolon yes or no?
        { // start loop -----------------
            UltraEdit.activeDocument.top();
            UltraEdit.activeDocument.findReplace.replace("a","1");
        } // <-----------end loop --------
    /////////////////////////////////////////////////////////////////////////////////////////////
    }    // <---END
    // EOF

    Code:

    text text text texttext  text text<<--1-->> text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text texttext text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext texttext text text text text<<--2-->> text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttextexttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text tex<<--3-->> t text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text text
    text text text text text<<--4-->> text text text text text text text text text text text text text texttext text text text text text texttext ext text text text text texttext text text text text text texttext tex
    text textt text text text texttext text  text text text<<--5-->> text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text textext text text text texttext text text textext text text text text text texttext text text text text text text
    text text text text text<<--6-->> text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text text text text text text text<<--7-->> text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext text text texxt text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text text
    text text text text text<<--8-->> text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text textext text text text text text texttext text text text text text text
    text text text text text<<--9-->> text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text texxttext text text text text text texttext text text text text<<--10-->> text text text text xt texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text<<--11-->> text text text text text text text text text text text text text texttext text text text text text texttext text text t text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text<<--12-->> text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext text text text text text text
    
    [1] footnote footnote footnote footnote  footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote foote footnote footnote footnote footnote footnote footnote footnote footnote foofootnote footnote footnote
    [2] footnote footnote footnote footnote  footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote
    [3] footnote footnote footnote footnote  footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote
    [4] footnote footnote footnote footnote  footnote footnote footnote footnote footnote footnote
    [5] footnote footnote footnote footnote  footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote
    [6] footnote footnote footnote footnote  footnote tnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote
    [7] footnote footnote footnote footnote  footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote
    [8] footnote footnote footnote footnote  footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote fofootnote footnote
    [9] footnote footnote footnote footnote  footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote
    [10] footnote footnote footnote footnote  footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote
    [11] footnote footnote foootnote footnote footnote footnote footnote footnote footnote
    [12] footnote footnote footnote footnote  footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote footnote
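
On the follow-up question of turning the "1" into a variable: in a script the search string is an ordinary JavaScript string, so the number can simply be concatenated into it. The sketch below is plain JavaScript that runs in Node rather than in UltraEdit, and it rests on several assumptions: buildFootnoteRegExp, moveFootnote and moveAllFootnotes are hypothetical helper names, [\s\S]*? stands in for \p{Any}*?, and the footnote is matched as a whole line starting with [n] instead of the original \[1*\N.* tail. It says nothing about which construct of the original pattern UltraEdit's Perl engine objects to.

```javascript
// Hypothetical helper: builds the search expression for footnote number n.
// Inside a string literal every regex backslash is doubled,
// so "\\[" in the source code is the pattern \[ at run time.
function buildFootnoteRegExp(n) {
  const pattern =
    "(?<placeholder><<--" + n + ")-->>" +   // placeholder in the main text
    "(?<mainText>[\\s\\S]*?)" +             // everything up to the footnote line
    "(?<footnote>^\\[" + n + "\\].*)";      // the line that starts with [n]
  return new RegExp(pattern, "m");          // "m": ^ matches at every line start
}

// Moves footnote n from the footnote block into its placeholder.
function moveFootnote(text, n) {
  return text.replace(
    buildFootnoteRegExp(n),
    "$<placeholder>$<footnote>>>$<mainText>"
  );
}

// Repeats the move for 1, 2, 3, ... as long as a placeholder with that number exists.
function moveAllFootnotes(text) {
  for (let n = 1; text.includes("<<--" + n + "-->>"); n++) {
    text = moveFootnote(text, n);
  }
  return text;
}
```

Each pass consumes one footnote line from the block at the end, so after the loop only empty lines remain where the footnotes used to be.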

    Master

      Sep 22, 2022 #2

      Hi,

      This Perl Replace All replaces all footnote placeholders at once. It preserves the list of footnotes at the end of the file, though, so you have to delete that list separately.
      F: <<--(?<FNnum>\d+)-->>(?=(?s:.*?)(?<footnote>^\[\k<FNnum>\].*))
      R: <<--$+{footnote}>>

      Or, if you really wish to keep the placeholder number, use the following replace string instead (or tailor it as you need):
      R: <<--$+{FNnum}$+{footnote}>>

      Tested on your sample in UE 2022.1.0.108 x64

      BR, Fleggy
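
To see why one Replace All is enough here, the same idea can be tried outside UltraEdit. This is a minimal sketch in plain JavaScript (Node), not the UltraEdit expression itself: JavaScript has no (?s:...) group in older engines, so [\s\S]*? is used as a stand-in, and the replacement syntax is $<name> instead of $+{name}. The names inlineFootnotes and stripFootnoteList are hypothetical, and stripFootnoteList assumes that no line of the main text starts with "[number]".

```javascript
// The lookahead peeks forward to the footnote line without consuming it,
// and \k<FNnum> forces the footnote number to equal the placeholder number.
const footnoteLookup =
  /<<--(?<FNnum>\d+)-->>(?=[\s\S]*?(?<footnote>^\[\k<FNnum>\].*))/gm;

// Replaces every placeholder with the text of its footnote in one pass.
function inlineFootnotes(text) {
  return text.replace(footnoteLookup, "<<--$<footnote>>>");
}

// Separate clean-up step: drops the footnote lines left at the end of the file.
function stripFootnoteList(text) {
  return text.replace(/^\[\d+\].*\n?/gm, "");
}
```

Because the lookahead does not consume anything, the footnote lines survive the first step, which is why the list has to be removed in a second step.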

      Newbie

        Sep 23, 2022 #3

        Thank you very much indeed. I'll need some time to try all this out. Much appreciated!

          Sep 24, 2022 #4

          @fleggy:
          This is working like a charm. What I've learnt from you:

          a) How to use the "positive lookahead" (?=...)
          b) How to use the "dot matches line breaks" modifier (?s:...) instead of \p{Any}
          c) That the \k backreference really works like a variable, enabling a loop.

          I'm also discovering that simply running this in RegexBuddy has the advantage that it gives you a full log of all find/replace operations, so you can check that everything is working. (You also get a full debug and documentation of all the commands used. This novice really needs that, and learns from it.)

          As a mark of appreciation, I've become a VIP+ user, putting something into PayPal. Thanks again!

            Sep 25, 2022 #5

            I'm still having problems in getting a regex to work in a UE script. Take the following.
            A common task in DTP (Desktop Publishing) is the identification of footnote placeholders. (Think of a docx file, containing footnotes, that's just been converted to TXT.) So, the task is to identify the sequence of footnotes starting with 1, render them unique (by turning them into <<--x-->>), while ignoring all the other digits in the text. So, with RegexBuddy, I first of all did some preliminary sorting, with:

            Code:

            (?<!Chapter )(?<![ |`|.])\d+(?=[ |\)])
            Footnote placeholders:
            must precede either a blank or a ")"
            must NOT follow a blank, a backtick ` or a period (the backtick can then be used as an escape character)
            must NOT follow "Chapter "

            But this script of mine isn't working at all, so I'm doing something wrong. (Nothing happens, no error messages.)
            I'd be really grateful for a helping hand.
            (The last line is an acknowledgement that this is a script that Mofi helped me with many years ago!)

            Areas where I'm unsure:
            var counter == counter++ ;
            Those double backslashes which UE scripts need in the regex.

            Code:

            //////20220925/////// find footnote placeholders in text //////////////////////////////
            if (UltraEdit.document.length > 0) {
               UltraEdit.insertMode();
               UltraEdit.columnModeOff();
               UltraEdit.activeDocument.hexOff();
                 //UltraEdit.ueReOn();     // UltraEdit
                 //UltraEdit.unixReOn();   // Unix 
               UltraEdit.perlReOn();       // Perl
               UltraEdit.activeDocument.top();
               UltraEdit.activeDocument.findReplace.mode=0;
               UltraEdit.activeDocument.findReplace.matchCase=true;
               UltraEdit.activeDocument.findReplace.matchWord=false;
               UltraEdit.activeDocument.findReplace.regExp=true;
               UltraEdit.activeDocument.findReplace.searchDown=true;
               UltraEdit.activeDocument.findReplace.searchInColumn=false;
               UltraEdit.activeDocument.findReplace.preserveCase=false;
               UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;
               UltraEdit.activeDocument.findReplace.replaceAll=true;
               var counter = 1;
               if (UltraEdit.outputWindow.visible == false) UltraEdit.outputWindow.showWindow(true);
               while(UltraEdit.activeDocument.findReplace.find("(?<!Chapter )(?<![ |`|.])\\d+(?=[ |\\)])")) 
                             { // start while -----------------
                            var currentDigit = UltraEdit.activeDocument.selection;
                             if (currentDigit == counter)
                                 { 
                                     var counter == counter++ ;
                                    UltraEdit.activeDocument.top();
                                    UltraEdit.activeDocument.findReplace.replace("(?<!Chapter )(?<![ |`|.])\\d+(?=[ |\\)])","<<--" + CurrentDigit + "-->>");
                                    UltraEdit.outputWindow.write("placeholder inserted:" + "<<--" + CurrentDigit + "-->>"); 
                                    UltraEdit.outputWindow.write("counter set to: --" + counter + "--"); 
                                } // -----end if ------
                         } // <-----------end while -------- 
            }    // <---END OF initial "IF" IN LINE 1! (DON'T DELETE)
            // EOF  (With thanks to Mofi!)
            
            
            
            Test text here:

            Code:

            ---------------------------------------------------------------------------------
            text 10% text 19th text texttext  text text1 text 1) text text text text --1-- text text text text text text text text texttext text text text text text 17th texttext text text text text text texttext text text text text text texttext 1.95 text text text text text texttext text text text text text 2022 texttext text texttext 1920 text text text text texttext text text text text text texttext text text text text text texttext text
            Chapter `2 text text text text texttext text text text text text texttext text text 30th text text text texttext texttext text text text text2) text text text text text texttext text text texttext text text texttext 
            text text text text text text text text text text text text text texttext text text text text text texttext text text text 
            Chapter 3 text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text 
            Chapter 4 text text text text texttext text text text text text texttextexttext text text text text text texttext text text text text 
            Chapter 5 text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text3 text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text text
            text text text text text4 text text text text text text text text text text text text text texttext text text text text text texttext ext text text text text texttext text text text text text texttext tex
            text textt text text text texttext text  text text text5 text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text textext text text text texttext text text textext text text text text text texttext text text text text text text text text text text text6 text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text text text text text text text7 text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext text text texxt text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text text text text text text text8 text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text textext text text text text text texttext text text text text text text
            text text text text text9) text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text texxttext text 1023 text text text text texttext text text text text10 text text text text xt texttext 19th Century text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text11 text text text text text text text text text text text text text texttext text text text text text texttext text text t text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text12 text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext text text text text text text
            

            Master

              Sep 26, 2022 #6

              Hi,

              just two things:
              1) Do not use "|" as an OR in a character class. Simply put all required characters/ranges inside []. For example: [ `\.]
              2) I do not understand why you use a script when Replace All should be enough.
              and 3) :) I have zero DTP skills and knowledge, yet it seems to me that your pattern (?<!Chapter )(?<![ `\.])\d+(?=[ \)]) finds many false positives.

              BR, Fleggy
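
Point 1 is easy to check in any engine with Perl-like character classes: inside [...] the "|" is just another literal character. A small sketch in plain JavaScript (Node), assuming JavaScript's lookbehind support as a stand-in for UltraEdit's Perl engine; findCandidates is a hypothetical helper name.

```javascript
// Pattern as first posted: the "|" characters inside the classes are literals.
const withPipes = /(?<!Chapter )(?<![ |`|.])\d+(?=[ |\)])/g;
// Same pattern with the pipes removed from both character classes.
const withoutPipes = /(?<!Chapter )(?<![ `.])\d+(?=[ )])/g;

// Returns all matches of the pattern as an array (empty if there are none).
function findCandidates(text, pattern) {
  return text.match(pattern) || [];
}

// In "x7| y8 " the 7 is followed by a literal pipe.
// withPipes accepts it as a candidate, withoutPipes does not.
console.log(findCandidates("x7| y8 ", withPipes));
console.log(findCandidates("x7| y8 ", withoutPipes));
```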

              Newbie

                Sep 26, 2022 #7

                Thank you for the help.
                I'm trying to write a regex (or a script/regex combination) that will replace the sequence 1 2 3 4 ... 999 with <<--1-->> <<--2-->> <<--3-->> <<--4-->> ... <<--999-->> (in the test text above).
                In the good old Emacs days, you could load the digit at the cursor position into a variable and let loose some if/then conditionals. If there's a workaround for this in a regex flavour out there, I'd be most interested. (I saw somewhere that JavaScript regex is starting to implement variables.)
                So for the moment, until I get more clued-up, the script has to do it: "if the current match is 1, then the next match needs to be 2".
                (Thanks for the tip: the alternative "|" inside a [...] is superfluous.)
                For the moment, this regex is just a first filter:

                Code:

                (?<![ `.])\d+(?=[ )])
                - and then the script figures out if the next match fits the sequence 1 2 3 4 ... 999  (I'd be interested to learn how to get rid of more of the false positives using the regex though...)
                Regards!
                fvg
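
The "current match must equal the counter" rule can be sketched in plain JavaScript (Node) with a replace callback. This is an illustration of the logic only, not a working UltraEdit script, and markSequentialFootnotes is a hypothetical name. Two details of the script posted on Sep 25 are worth noting alongside it: "var counter == counter++ ;" is not valid JavaScript (a plain "counter++;" is what is meant), and "CurrentDigit" is a different identifier from "currentDigit" because JavaScript is case-sensitive. With replaceAll set to true, the replace call in that script would presumably also rewrite every candidate rather than only the one just found.

```javascript
// First filter from the post above: candidate numbers in the text.
const candidate = /(?<![ `.])\d+(?=[ )])/g;

// Wraps a candidate in <<--n-->> only if it is the next number in the
// sequence 1, 2, 3, ...; every other candidate is returned unchanged.
function markSequentialFootnotes(text) {
  let counter = 1;
  return text.replace(candidate, (match) => {
    if (match !== String(counter)) {
      return match;              // not the expected footnote number
    }
    counter++;                   // the next footnote must be one higher
    return "<<--" + match + "-->>";
  });
}
```

Comparing strings rather than numbers means a partial match such as "022" (from 2022) is never mistaken for footnote 22. In an UltraEdit script the same test would sit inside the while loop and act on UltraEdit.activeDocument.selection.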

                Master

                  Sep 27, 2022 #8

                  Well, considering all the rules mentioned, this Perl regexp matches whole numbers only:

                  (?<!Chapter )(?<![ `.\d])\d+(?=[ )])

                  I added \d to the negative lookbehind to eliminate false positives like 59 in 1.359.
                  I don't use scripts so I can't help you with it, sorry...

                  BR, Fleggy
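
The effect of the added \d can be reproduced with the 1.359 example. A minimal sketch in plain JavaScript (Node), assuming its lookbehind behaves like the Perl engine's for this pattern; the sample string is made up for illustration.

```javascript
// Without \d in the lookbehind, a match may start in the middle of a number.
const withoutDigitGuard = /(?<!Chapter )(?<![ `.])\d+(?=[ )])/g;
// With \d, a match can only start at the first digit of a number.
const withDigitGuard = /(?<!Chapter )(?<![ `.\d])\d+(?=[ )])/g;

const sample = "price 1.359 note1 Chapter 3 end";

// "59" is found inside 1.359 because the 5 is preceded by a digit,
// which the first lookbehind set does not exclude.
console.log(sample.match(withoutDigitGuard));
// Only the real footnote candidate "1" (in "note1") remains.
console.log(sample.match(withDigitGuard));
```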

                  Newbie

                    Sep 27, 2022 #9

                    Thank you very much. I'm most obliged! I've learnt a lot. Here, for completeness, are the two regexes that you've helped me develop. They work like a charm and are now fully functional in RegexBuddy (i.e. I'm actually using them), though I haven't got them to work in a UE script yet.
                    Footnote identifier:

                    Code:

                    FOOTNOTE IDENTIFIER FIND:
                    (?<![ `.-])(?<!\d)(?<FNnum>\d+)(?=[ )\s])
                    REPLACE:
                    <<--$+{FNnum}-->>
                    
                    Footnote mover:

                    Code:

                    FOOTNOTE MOVER FIND:
                    <<--(?<FNnum>\d+)-->>(?=(?s:.*?)(?<footnote>\[\k<FNnum>\].*))
                    REPLACE:
                    <<$+{FNnum}$+{footnote}>>
                    
                    Test text:

                    Code:

                    text 10% text 19th text texttext  text text1 text 1) text text text text --1-- text text text text text text text text texttext text text text text text 17th texttext text text text text text texttext text text text text text texttext 1.95 text text text text text texttext text text text text text 2022 texttext text texttext 1920 text text text text texttext text text text text text texttext text text text text text texttext text
                    Chapter `2 text text text text texttext text text text text 30th text text text texttext texttext text text text text2) text text text text text texttext text text texttext text text texttext 
                    text text text text text text text text text text text text text texttext text text text text text texttext text text text 
                    Chapter 3 text text texttext text text text text text texttex
                    Chapter 4 text tetextexttext text text text text text texttext text text text text 
                    Chapter 5 text texttext text text text text text texttext text text text text text texttext text text text text text3 text text texttext text text text text text texttext t text texttext text text text text text texttext text text text text text texttext text text text text text text
                    text text text text text4 text text text text text text text text text text text text text texttext text text text text text texttext ext text text text text texttext text text text text text texttext tex
                    text textt text text text texttext text  text text text5 text text text text text text text text  t text text text text text text text text text6 text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text text text text text text text7 text text text text text text text ext text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text text text text text text text8 text text text ext text texttext text textext text text text text text texttext text text text text text text
                    text text text text text9) text text text text text text text text text text text text text t text text text text texttext text text text text text texttext text text text text texxttext text 1023 text text text text texttext text text text text10 text text text text xt texttext 19th Century text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text text texttext text text text text11 text text text text text text text text text text text text text texttext text text text text text texttext text xt text text text text texttext text text text text text texttext text text text text text texttext text text text text12 text text text text text text text text text text text text text texttext text text text text text texttext text text text text text texttext555 text text text text text text999
                    
                    FOOTNOTES
                    [1] Vgl. Peters 1993; 2008; aus dieser Perspektive auch Wessler 2018.
                    [2] Zum Verhältnis von politischer und literarischer Öffentlichkeit vgl. den Seitenblick in Habermas 2020.
                    [3] Das Kapitel zur Rolle von Zivilgesellschaft und politischer Öffentlichkeit in Faktizität und Geltung (Habermas 1992, S. 399–`467) knüpft an die Überlegungen im letzten Ka-pitel des Strukturwandel der Öffentlichkeit und vor allem an die Einleitung zur Neuaus-gabe von 1991 (Habermas 1991 [1962]) an. Zuletzt hierzu: Habermas 2008.
                    [4] Üblicherweise wählen soziologische Theorien allerdings einen grundbegrifflichen Ansatz, der den kognitiven Sinn dieser Geltungsdimension ausblendet und den Bindungseffekt der Sollgeltung auf die Androhung von Sanktionen zurückführt.
                    [5] Der Text der französischen Verfassung vom September 1791 beginnt mit einem Katalog, der zwischen droits naturels und droits civils unterscheidet. Damit hat er der zeitlichen Diskrepanz Rechnung getragen, die zwischen dem aktuellen Geltungsbereich der allgemeinen Staatsbürgerrechte und dem noch nicht realisierten, weit über die terri-torialen Grenzen des französischen Staates hinausweisenden Geltungsanspruch der »na-türlichen«, allen Personen dank ihres Menschseins gleichermaßen zustehenden Rechte besteht. Paradoxerweise behalten aber die als Grundrechte positivierten Menschen- und Bürgerrechte auch innerhalb der nationalen Grenzen den Sinn universaler Rechte und erinnern auf diese Weise die lebenden und die künftigen Generationen, wenn schon nicht an eine Selbstverpflichtung zur aktiven Verbreitung dieser Rechte, so doch an die Eigentümlichkeit des normativ überschießenden Gehalts von universalen Menschen-rechten über den provisorischen Charakter ihrer einstweilen territorial eingeschränkten Inkraftsetzung hinaus. Der moralische Überschuss hinterlässt auch in den geltenden Grundrechten Spuren eines noch nicht abgegoltenen normativen Gehalts; diese verraten etwas vom beunruhigenden Charakter einer ungesättigten Norm. Die fehlende »Sättigung« betrifft die zeitliche Dimension einer im politischen Gemeinwesen noch ausstehenden und in sachlicher Hinsicht noch zu spezifizierenden Ausschöpfung des unbestimmt überschießenden Gehalts etablierter Grundrechte ebenso wie die räumliche Dimension einer noch ausstehenden weltweiten Implementierung von Menschenrechten. 
                    [6] Vgl. Gaus 2013.
                    [7] Vgl. Habermas 2018.
                    [8] Vgl. Habermas 2009a; 2009b.
                    [9] Vgl. Seeliger, Martin; Sevignani, Sebastian »Zum Verhältnis von Öffentlichkeit und Demokratie«, in diesem Band, S. 11, spezifiziert diese Rolle unter Gesichtspunkten der Transparenz öffentlicher Angelegenheiten, der allgemeinen Orientierung der Bürger und der reziproken Rechtfertigung von Themen und Beiträgen.
                    [10] Normativ betrachtet, erfüllt die sogenannte Output-Legitimität eines Regierungshandelns, das die Bürger bei Laune hält, nicht die Bedingungen demokratisch legitimen Handelns; denn solche staatlichen Leistungen decken sich zwar mit Interessen der Bürger, ohne diese aber in Ausführung eines demokratisch gebildeten Willens der Bürger selbst zu befriedigen. 
                    [11] Vgl. meine Rezension von Cristina Lafont, Democracy without Shortcuts (in: Journal of Deliberative Democracy 16, 2 (`2020), S. 10-14).
                    [12] Vgl. Forst 2003.
                    [555] TEST
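
For reference, the two steps can be chained as follows. This is a sketch in plain JavaScript (Node), not an UltraEdit script: [\s\S]*? replaces (?s:.*?), the replacement syntax is $<name> instead of $+{name}, and identifyFootnotes, moveFootnotes and the short sample text are hypothetical.

```javascript
// Step 1, "footnote identifier": wrap candidate numbers as <<--n-->>.
function identifyFootnotes(text) {
  const identifier = /(?<![ `.-])(?<!\d)(?<FNnum>\d+)(?=[ )\s])/g;
  return text.replace(identifier, "<<--$<FNnum>-->>");
}

// Step 2, "footnote mover": copy the footnote with the same number
// into the placeholder, producing <<n[n] footnote text>>.
function moveFootnotes(text) {
  const mover = /<<--(?<FNnum>\d+)-->>(?=[\s\S]*?(?<footnote>\[\k<FNnum>\].*))/g;
  return text.replace(mover, "<<$<FNnum>$<footnote>>>");
}

const testText =
  "text 10% text1 text 1) text --1-- in 2022 text2) end\n" +
  "FOOTNOTES\n" +
  "[1] Vgl. Peters 1993.\n" +
  "[2] Vgl. Gaus 2013.";

// The order matters: the mover only sees placeholders the identifier created.
console.log(moveFootnotes(identifyFootnotes(testText)));
```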