Replacing parenthesis from inside a tag to outside with a replace?

81
Advanced User

    Jul 07, 2017 #1

    I'm trying to move parentheses from inside a certain tag to just outside it, i.e. when there is an opening parenthesis immediately after the opening tag or a closing parenthesis immediately before the closing tag. Example:

    Code:

    <italic>(When a parenthetical sentence stands on its own)</italic>
    <italic>(When a parenthetical sentence stands on its own</italic>
    <italic>When a parenthetical sentence stands on its own)</italic>
    After the replace, those lines should be:

    Code:

    (<italic>When a parenthetical sentence stands on its own</italic>)
    (<italic>When a parenthetical sentence stands on its own</italic>
    <italic>When a parenthetical sentence stands on its own</italic>)
    However, strings like the next three should stay untouched:

    Code:

    <italic>(When) a parenthetical sentence stands on its own</italic>
    <italic>When a parenthetical sentence stands on its (own)</italic>
    <italic>When a parenthetical sentence stands (on) its own</italic>
    But the following strings:

    Code:

    <italic>((When) a parenthetical sentence stands on its own</italic>
    <italic>((When) a parenthetical sentence stands on its own)</italic>
    <italic>(When) a parenthetical sentence stands on its own)</italic>
    <italic>When a parenthetical sentence stands on its (own))</italic>
    <italic>(When a parenthetical sentence stands on its (own)</italic>
    should become, after the replace(s):

    Code:

    (<italic>(When) a parenthetical sentence stands on its own</italic>
    (<italic>(When) a parenthetical sentence stands on its own</italic>)
    <italic>(When) a parenthetical sentence stands on its own</italic>)
    <italic>When a parenthetical sentence stands on its (own)</italic>)
    (<italic>When a parenthetical sentence stands on its (own)</italic>
    There could be nested tags inside the <italic>...</italic> tags and a line can contain multiple <italic>...</italic> strings.

    Can this be done using Perl regex replace?
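For reference, the rules in this post can also be written down procedurally. The following is a minimal Python sketch, not an UltraEdit regex. The names `move_outer_parens` and `fix_line` are made up for illustration, and the sketch assumes that `<italic>` elements are never nested inside each other and that parentheses inside nested tags count like any other parenthesis.

```python
import re

def move_outer_parens(content):
    """Return (prefix, inner, suffix) for the text of one <italic> element."""
    stack = []      # indexes of "(" still waiting for a partner
    partner = {}    # index of a parenthesis -> index of its partner
    for i, ch in enumerate(content):
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:
            j = stack.pop()
            partner[i], partner[j] = j, i
    last = len(content) - 1
    # A leading "(" moves out if it is unmatched or closed by the last character.
    lead = content.startswith("(") and partner.get(0, last) == last
    # A trailing ")" moves out if it is unmatched or opened by the first character.
    trail = content.endswith(")") and partner.get(last, 0) == 0
    start = 1 if lead else 0
    end = last if trail else len(content)
    return ("(" if lead else "", content[start:end], ")" if trail else "")

ITALIC = re.compile(r"<italic>(.*?)</italic>")

def fix_line(line):
    """Apply the rule to every <italic>...</italic> on one line."""
    def repl(m):
        before, inner, after = move_outer_parens(m.group(1))
        return f"{before}<italic>{inner}</italic>{after}"
    return ITALIC.sub(repl, line)

print(fix_line("<italic>((When) a sentence)</italic>"))
# (<italic>(When) a sentence</italic>)
```

Pairing the parentheses with a stack is what lets the sketch tell "(When) a sentence" (leave alone) from "((When) a sentence" (move the first parenthesis out).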

    19476
    Master

      Jul 08, 2017 #2

      Hi Don,

      I have a possible solution. Unfortunately it involves a variable-length lookbehind, which UE does not support.
      My approach is this (I am still not sure if it covers all possible cases):

      1st step: <italic>( ---> (<italic>
      Find <italic>( unless the tag is followed by a balanced pair of parentheses that is not immediately followed by the closing tag.
      The match is allowed only within a single line.

      This is doable in UE:
      Find what: (<(italic)>)(?!(\((?>(?:(?![()\r\n]).)++|(?3))*+\))(?!</\2\b))(\()
      Replace with: \4\1

      2nd step: )</italic> ---> </italic>)
      Find )</italic> unless the tag is preceded by a balanced pair of parentheses that is not immediately preceded by the opening tag.
      The match is allowed only within a single line.

      This is not doable in UE because a lookbehind must have a fixed length. This Perl regex uses a variable-length lookbehind:
      (\))(?<!(?<!<(italic)>)(\((?>(?:(?![()\r\n]).)++|(?3))*+\)))(</\2\b>)

      I am trying to find a workaround to avoid this lookbehind but I am afraid I won't succeed :(

      BR, Fleggy

        Jul 08, 2017 #3

        Hi Don,

        I think there is no solution, especially if I consider nested tags, which can contain their own sets of parentheses. I suggest using a script instead of a Perl regex replace. Maybe I am wrong and the logic is actually simple, but I don't see such a pattern. I am sorry I couldn't help you.

        BR, Fleggy

        81
        Advanced User

          Jul 08, 2017 #4

          Dear Fleggy,

          Thank you for your valuable time and effort.
          Is it possible to use two or three different regex replaces in succession to get the job done?

          19476
          Master

            Jul 08, 2017 #5

            Hi Don,

            ATM only the 1st replace works in UE and it doesn't address nested tags. The search regex

            (<(italic)>)(?!(\((?>[^()\r\n]++|(?3))*+\))(?!</\2\b))(\()

            uses recursion (in the lookahead) to find a matching pair of parentheses regardless of any nested tags, which is not what you need, I'm afraid. You can test it for yourself on a larger piece of real text to see whether you get the expected results (I don't think you will).

            Find what: (<(italic)>)(?!(\((?>[^()\r\n]++|(?3))*+\))(?!</\2\b))(\()
            Replace with: \4\1

            I don't know enough details of your request to work out general rules for processing the text with a script. I suppose that, in addition to any search pattern, you must check the count of parentheses between <tag> and </tag>, excluding parentheses in nested tags. Perhaps Mofi will help you, because I've never used any script in UE (shame on me).
            Or, if you are able to describe the replacing algorithm exactly, I will try to "translate" it into Perl regex replace(s). Your sample lines are too simple to deduce a correct solution from (even with your comments), at least for me.
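The check mentioned above (parentheses between the opening and closing tag, not counting those inside nested tags) is easy to express in a script. This is a minimal Python sketch rather than UE script code; `top_level_parens` is a hypothetical name, and the sketch assumes nested tags are well-formed.

```python
import re

# Either a tag (opening, closing or self-closing) or a single parenthesis.
TOKEN = re.compile(r"<(/?)\w[^>]*>|[()]")

def top_level_parens(content):
    """List (index, char) for each parenthesis that is not inside a nested tag."""
    depth = 0       # number of nested elements currently open
    found = []
    for m in TOKEN.finditer(content):
        tok = m.group(0)
        if tok in ("(", ")"):
            if depth == 0:
                found.append((m.start(), tok))
        elif tok.endswith("/>"):    # self-closing tag: depth unchanged
            continue
        elif m.group(1):            # closing tag such as </bold>
            depth -= 1
        else:                       # opening tag such as <bold>
            depth += 1
    return found

print(top_level_parens("(VB .<bold>NET)</bold> rocks"))
# [(0, '(')]
```

Here the ")" inside `<bold>` is ignored, so the leading "(" has no partner on its own level.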

            Thanks, Fleggy

              Jul 08, 2017 #6

              Hi Don,

              I still must think about it :)
              Well, the following replaces will not work in nested tags but your sample lines should be processed correctly:

              Input lines on which I put x as markers at desired positions for an easier verification:

              Code:

              x<italic>((When) a parenthetical sentence stands on its own</italic>
              x<italic>((When) a parenthetical sentence stands on its own)</italic>x
              <italic>(When) a parenthetical sentence stands on its own)</italic>x
              <italic>When a parenthetical sentence stands on its (own))</italic>x
              x<italic>(When a parenthetical sentence stands on its (own)</italic>
              x<italic>(When a parenthetical sentence stands on its own)</italic>x
              <italic>(When) a parenthetical sentence stands on its (own)</italic>
              1) Put placeholders in place of a matching pair of parentheses that is surrounded by the tag on both sides, i.e. opens right after <italic> and closes right before </italic>. I've chosen ~~ but you can use any other suitable placeholder.

              Find what: (?<=<(italic)>)(\(((?>[^()\r\n]++|(?2))*+)\))(?=</\1>)
              Replace with: ~~\3~~

              Intermediate result looks good:

              Code:

              x<italic>((When) a parenthetical sentence stands on its own</italic>
              x<italic>~~(When) a parenthetical sentence stands on its own~~</italic>x
              <italic>(When) a parenthetical sentence stands on its own)</italic>x
              <italic>When a parenthetical sentence stands on its (own))</italic>x
              x<italic>(When a parenthetical sentence stands on its (own)</italic>
              x<italic>~~When a parenthetical sentence stands on its own~~</italic>x
              <italic>(When) a parenthetical sentence stands on its (own)</italic>
              2) Put placeholders in place of the remaining matching pairs of parentheses, wherever they occur. I've chosen ^[^ and ^]^ but you can use any other suitable placeholders.

              Find what: \(((?>[^()\r\n]++|(?R))*+)\)
              Replace with: ^[^\1^]^

              Intermediate result still looks good:

              Code:

              x<italic>(^[^When^]^ a parenthetical sentence stands on its own</italic>
              x<italic>~~^[^When^]^ a parenthetical sentence stands on its own~~</italic>x
              <italic>^[^When^]^ a parenthetical sentence stands on its own)</italic>x
              <italic>When a parenthetical sentence stands on its ^[^own^]^)</italic>x
              x<italic>(When a parenthetical sentence stands on its ^[^own^]^</italic>
              x<italic>~~When a parenthetical sentence stands on its own~~</italic>x
              <italic>^[^When^]^ a parenthetical sentence stands on its ^[^own^]^</italic>
              3) Now we can simply replace <italic>( or <italic>~~ with (<italic>, and )</italic> or ~~</italic> with </italic>)

              Find what: (<italic>)(?:\(|~~)
              Replace with: (\1

              Find what: (?:\)|~~)(</italic>)
              Replace with: \1)

              Intermediate result still looks good:

              Code:

              x(<italic>^[^When^]^ a parenthetical sentence stands on its own</italic>
              x(<italic>^[^When^]^ a parenthetical sentence stands on its own</italic>)x
              <italic>^[^When^]^ a parenthetical sentence stands on its own</italic>)x
              <italic>When a parenthetical sentence stands on its ^[^own^]^</italic>)x
              x(<italic>When a parenthetical sentence stands on its ^[^own^]^</italic>
              x(<italic>When a parenthetical sentence stands on its own</italic>)x
              <italic>^[^When^]^ a parenthetical sentence stands on its ^[^own^]^</italic>
              4) And finally replace ^[^ and ^]^ back with parentheses.

              Find what: \^\[\^
              Replace with: (

              Find what: \^\]\^
              Replace with: )

              Here we are :)

              Code:

              x(<italic>(When) a parenthetical sentence stands on its own</italic>
              x(<italic>(When) a parenthetical sentence stands on its own</italic>)x
              <italic>(When) a parenthetical sentence stands on its own</italic>)x
              <italic>When a parenthetical sentence stands on its (own)</italic>)x
              x(<italic>When a parenthetical sentence stands on its (own)</italic>
              x(<italic>When a parenthetical sentence stands on its own</italic>)x
              <italic>(When) a parenthetical sentence stands on its (own)</italic>
              Don, is it correct?
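The four steps can also be tried outside UE. Below is a rough Python sketch of the same placeholder pipeline. Python's `re` module has no recursion, so two things are swapped in and are not Fleggy's regexes: step 1 uses a small helper (`wraps_whole`, a made-up name) instead of the recursive pattern, and step 2 replaces innermost pairs in a loop until nothing changes. Steps 3 and 4 use the patterns from the post as they are.

```python
import re

def wraps_whole(content):
    """True if content is one balanced (...) group spanning all of it."""
    if not (content.startswith("(") and content.endswith(")")):
        return False
    depth = 0
    for i, ch in enumerate(content):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0 or (depth == 0 and i != len(content) - 1):
                return False
    return depth == 0

def step1(line):
    # <italic>( ... )</italic> with a matching outer pair -> ~~ ... ~~
    def repl(m):
        inner = m.group(1)
        if wraps_whole(inner):
            return "<italic>~~" + inner[1:-1] + "~~</italic>"
        return m.group(0)
    return re.sub(r"<italic>(.*?)</italic>", repl, line)

def step2(line):
    # Remaining matching pairs -> ^[^ ... ^]^, innermost first, until stable.
    while True:
        line, count = re.subn(r"\(([^()\r\n]*)\)", r"^[^\1^]^", line)
        if count == 0:
            return line

def step3(line):
    # Whatever still hugs the tag (unmatched parenthesis or ~~) moves outside.
    line = re.sub(r"(<italic>)(?:\(|~~)", r"(\1", line)
    return re.sub(r"(?:\)|~~)(</italic>)", r"\1)", line)

def step4(line):
    # Placeholders back to real parentheses.
    return line.replace("^[^", "(").replace("^]^", ")")

def fix_line(line):
    return step4(step3(step2(step1(line))))

print(fix_line("<italic>(When) a sentence stands on its own)</italic>"))
# <italic>(When) a sentence stands on its own</italic>)
```

Like the replaces in this post, the sketch works line by line and does not look inside nested tags.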

              I'll try to incorporate nested tags, but I would appreciate a real sample with nested tags and non-trivial parentheses.

              Thanks, Fleggy

              81
              Advanced User

                Jul 09, 2017 #7

                Dear Fleggy,
                Only other tags can be nested within <italic>...</italic>; an italic element is never nested within another italic element. Sorry, I forgot to mention that.

                Below is another sample text for you to work with:

                Code:

                <sec id="sec1">
                <para>In addition, many of you will be glad to hear that <xref ref-type="disp-formula" rid="deqn1">(1)</xref> Visual Basic is now a fully object-oriented programming language <xref ref-type="disp-formula" rid="deqn3">(3)</xref>-<xref ref-type="disp-formula" rid="deqn5">(5)</xref>, with the inclusion of the long sought-after class inheritance, as well as other OOP features.</para>
                </sec>
                <para>In this chapter, you'll see how Visual Basic has evolved eq. <xref ref-type="disp-formula" rid="deqn1">1</xref>  into the VB .NET language of today and get some sense of how and why VB .NET is different from previous versions of Visual Basic.</para>
                <sec id="sec1a">
                <para>How had I ever managed living without him?
                <disp-formula id="deqn1-2">$$\phi=a+b-c^2$$</disp-formula></para>
                <para>(<italic>Gideon Cross. <xref ref-type="figure" rid="fig2">Figure 2</xref>)</italic>, table 3.</para>
                <para>This chapter surveys some of the new features of the <italic>.NET Framework <xref ref-type="disp-formula" rid="deqn2">(2)</xref>, <xref ref-type="disp-formula" rid="deqn5">(5)</xref>)</italic> that most impact the  VB developer. These include namespaces, the Common Language Runtime (CLR), and assemblies.</para>
                <para>The third and final section, <italic>(Part (III))</italic>, consists of the following appendixes</para>
                </sec>
                </sec>
                <sec id="sec2">
                <label>2.</label>
                <para>The switch <italic>(case</italic>) Statement.</para>
                <para>A discussion of language <italic>(changes <xref ref-type="disp-formula" rid="deqn6">(6)</xref> from VB 6</italic> to VB .NET).</para>
                <para>A list of <italic>VB .NET (intrinsic) constants</italic>, as well as <italic>(VB .<bold>NET</bold>)</italic> enumerations and their members.</para>
                </sec>
                Thank you very much for your time :D

                19476
                Master

                  Jul 09, 2017 #8

                  Hi Don,

                  It was really challenging but I enjoyed it. Here is a solution for text containing nested tags. First I had to construct a regex that matches parentheses on a single level, excluding parentheses in nested tags. Here it is, in case somebody is interested:

                  (?s)\(((?>(?:(?:(?![()])(?!</?(?!WRK)\b).)++(?:<((?!WRK)[^>]++)>(((?>(?:(?!<\2\b)(?!</\2\b).)++|<\2\b[^>]*+>(?3))*+)</\2>))?)|(?R))*+)\)

                  Group #1 captures the text between parenthesis.
                  Don't be confused by the word WRK. I had an idea to use WRK as a placeholder for already processed tags but it was not necessary in the end.

                  And here are replacements for Don.

                  1) Repeat until nothing is replaced.

                  Find what: (?s)(?<=<(italic)>)(\(((?>(?:(?:(?![()])(?!</?(?!WRK)\b).)++(?:<((?!WRK)[^>]++)>(((?>(?:(?!<\4\b)(?!</\4\b).)++|<\4\b[^>]*+>(?5))*+)</\4>))?)|(?2))*+)\))(?=</\1>)
                  Replace with: ~~\3~~

                  2) Repeat until nothing is replaced.

                  Find what: (?s)\(((?>(?:(?:(?![()])(?!</?(?!WRK)\b).)++(?:<((?!WRK)[^>]++)>(((?>(?:(?!<\2\b)(?!</\2\b).)++|<\2\b[^>]*+>(?3))*+)</\2>))?)|(?R))*+)\)
                  Replace with: ^[^\1^]^

                  3) Final replacements:

                  Find what: (<italic>)(?:\(|~~)
                  Replace with: (\1

                  Find what: (?:\)|~~)(</italic>)
                  Replace with: \1)

                  Find what: \^\[\^
                  Replace with: (

                  Find what: \^\]\^
                  Replace with: )

                  BR, Fleggy

                    Jul 09, 2017 #9

                    Hi Don,

                    Thanks for the "real life" sample. I used the replacements from my previous post and this is the result:

                    Code:

                    <sec id="sec1">
                    <para>In addition, many of you will be glad to hear that <xref ref-type="disp-formula" rid="deqn1">(1)</xref> Visual Basic is now a fully object-oriented programming language <xref ref-type="disp-formula" rid="deqn3">(3)</xref>-<xref ref-type="disp-formula" rid="deqn5">(5)</xref>, with the inclusion of the long sought-after class inheritance, as well as other OOP features.</para>
                    </sec>
                    <para>In this chapter, you'll see how Visual Basic has evolved eq. <xref ref-type="disp-formula" rid="deqn1">1</xref>  into the VB .NET language of today and get some sense of how and why VB .NET is different from previous versions of Visual Basic.</para>
                    <sec id="sec1a">
                    <para>How had I ever managed living without him?
                    <disp-formula id="deqn1-2">$$\phi=a+b-c^2$$</disp-formula></para>
                    <para>(<italic>Gideon Cross. <xref ref-type="figure" rid="fig2">Figure 2</xref></italic>), table 3.</para>
                    <para>This chapter surveys some of the new features of the <italic>.NET Framework <xref ref-type="disp-formula" rid="deqn2">(2)</xref>, <xref ref-type="disp-formula" rid="deqn5">(5)</xref></italic>) that most impact the  VB developer. These include namespaces, the Common Language Runtime (CLR), and assemblies.</para>
                    <para>The third and final section, (<italic>Part (III)</italic>), consists of the following appendixes</para>
                    </sec>
                    </sec>
                    <sec id="sec2">
                    <label>2.</label>
                    <para>The switch (<italic>case</italic>) Statement.</para>
                    <para>A discussion of language (<italic>changes <xref ref-type="disp-formula" rid="deqn6">(6)</xref> from VB 6</italic> to VB .NET).</para>
                    <para>A list of <italic>VB .NET (intrinsic) constants</italic>, as well as (<italic>VB .<bold>NET</bold></italic>) enumerations and their members.</para>
                    </sec>
                    
                    BR, Fleggy
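A result like the one above can be sanity-checked mechanically. Since the replaces only move parentheses across <italic> tags, the text before and after should be identical once those tags are stripped. A minimal Python sketch of that check follows; the function name `only_tags_moved` is made up for illustration.

```python
import re

ITALIC_TAG = re.compile(r"</?italic>")

def only_tags_moved(before, after):
    """True if both strings are identical once <italic> tags are removed."""
    return ITALIC_TAG.sub("", before) == ITALIC_TAG.sub("", after)

before = "<para>The switch <italic>(case</italic>) Statement.</para>"
after = "<para>The switch (<italic>case</italic>) Statement.</para>"
print(only_tags_moved(before, after))
# True
```

The check cannot tell whether a parenthesis was moved correctly, only that nothing was lost or added along the way.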

                      Jul 09, 2017 #10

                      Well, my previous solution is not perfect. Here is a better one (no warranty):
                      • Fixed tag name parsing.
                      • Fixed the missing match for (<tag></tag>).
                      • Used named groups.

                      Code:

                      (?s)(?x)
                      (?'main'
                       \(
                       (?'innertext'
                        (?>
                         (?|
                          (?:
                           (?:
                            (?![()])
                            (?!<(?!WRK)\b)(?!</(?!WRK)\b).
                           )++
                           (?:
                            <
                            (?'tagname'
                             \w++
                            )
                            [^>]*+>
                            (?'tagbody'
                             (?>
                              (?:(?!<\k'tagname'\b)(?!</\k'tagname'\b).)++
                              |
                              <\k'tagname'\b[^>]*+>
                              (?&tagbody)
                             )*+
                             </\k'tagname'>
                            )
                           )?
                          )
                          |
                          (?:
                           <
                           (?'tagname'
                            \w++
                           )
                           [^>]*+>
                           (?'tagbody'
                            (?>
                             (?:(?!<\k'tagname'\b)(?!</\k'tagname'\b).)++
                             |
                             <\k'tagname'\b[^>]*+>
                             (?&tagbody)
                            )*+
                            </\k'tagname'>
                           )
                          )
                          |
                          (?&main)
                         )
                        )*+
                       )
                       \)
                      )
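For readers who find the pattern above hard to follow, its structure can be mirrored in ordinary code. The following Python sketch illustrates the same idea (a balanced group that skips over whole nested elements) and is not a line-by-line translation. `skip_element` and `match_parens` are hypothetical names, the WRK exclusion is left out, and self-closing tags are not handled.

```python
import re

OPEN_TAG = re.compile(r"<(\w+)[^>]*>")

def skip_element(s, i):
    """If an element <name ...>...</name> starts at s[i], return the index just
    past its closing tag (same-name elements may nest); otherwise None."""
    m = OPEN_TAG.match(s, i)
    if not m:
        return None
    tag = re.compile(r"<(/?)" + re.escape(m.group(1)) + r"\b[^>]*>")
    depth, pos = 1, m.end()
    while depth:
        t = tag.search(s, pos)
        if not t:
            return None                 # element is never closed
        depth += -1 if t.group(1) else 1
        pos = t.end()
    return pos

def match_parens(s, i):
    """If a balanced (...) group starts at s[i], return the index just past its
    ")"; parentheses inside nested elements are skipped. Otherwise None."""
    if i >= len(s) or s[i] != "(":
        return None
    pos = i + 1
    while pos < len(s):
        ch = s[pos]
        if ch == ")":
            return pos + 1
        if ch == "(":
            pos = match_parens(s, pos)      # nested group: recurse
        elif ch == "<" and re.match(r"</?\w", s[pos:pos + 3]):
            pos = skip_element(s, pos)      # None for a stray closing tag
        else:
            pos += 1
        if pos is None:
            return None
    return None                             # ran out of text before ")"

s = "(VB .<bold>NET)</bold>) x"
print(s[:match_parens(s, 0)])
# (VB .<bold>NET)</bold>)
```

The recursion in `match_parens` plays the role of (?&main), and `skip_element` plays the role of the tagname/tagbody groups.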

                      81
                      Advanced User

                        Jul 09, 2017 #11

                        Thanks a lot Fleggy :)
                        Is it the first regex that you modified?

                        19476
                        Master

                          Jul 09, 2017 #12

                          Hi Don,

                          No, this was just a generic pattern.

                          The final solution for this week :)

                          1) Repeat until nothing is replaced.

                          F:

                          (?s)(?x)
                          (?<=<(?'maintag'italic)>)
                          (?'main'
                           \(
                           (?'innertext'
                            (?>
                             (?|
                              (?:
                               (?:
                                (?![()])
                                (?!<(?!WRK)\b)(?!</(?!WRK)\b).
                               )++
                               (?:
                                <
                                (?'tagname'
                                 \w++
                                )
                                [^>]*+>
                                (?'tagbody'
                                 (?>
                                  (?:(?!<\k'tagname'\b)(?!</\k'tagname'\b).)++
                                  |
                                  <\k'tagname'\b[^>]*+>
                                  (?&tagbody)
                                 )*+
                                 </\k'tagname'>
                                )
                               )?
                              )
                              |
                              (?:
                               <
                               (?'tagname'
                                \w++
                               )
                               [^>]*+>
                               (?'tagbody'
                                (?>
                                 (?:(?!<\k'tagname'\b)(?!</\k'tagname'\b).)++
                                 |
                                 <\k'tagname'\b[^>]*+>
                                 (?&tagbody)
                                )*+
                                </\k'tagname'>
                               )
                              )
                              |
                              (?&main)
                             )
                            )*+
                           )
                           \)
                          )
                          (?=</\k'maintag'>)
                          
                          R: ~~$+{innertext}~~

                          2) Repeat until nothing is replaced.

                          F:

                          (?s)(?x)
                          (?'main'
                           \(
                           (?'innertext'
                            (?>
                             (?|
                              (?:
                               (?:
                                (?![()])
                                (?!<(?!WRK)\b)(?!</(?!WRK)\b).
                               )++
                               (?:
                                <
                                (?'tagname'
                                 \w++
                                )
                                [^>]*+>
                                (?'tagbody'
                                 (?>
                                  (?:(?!<\k'tagname'\b)(?!</\k'tagname'\b).)++
                                  |
                                  <\k'tagname'\b[^>]*+>
                                  (?&tagbody)
                                 )*+
                                 </\k'tagname'>
                                )
                               )?
                              )
                              |
                              (?:
                               <
                               (?'tagname'
                                \w++
                               )
                               [^>]*+>
                               (?'tagbody'
                                (?>
                                 (?:(?!<\k'tagname'\b)(?!</\k'tagname'\b).)++
                                 |
                                 <\k'tagname'\b[^>]*+>
                                 (?&tagbody)
                                )*+
                                </\k'tagname'>
                               )
                              )
                              |
                              (?&main)
                             )
                            )*+
                           )
                           \)
                          )
                          
                          R: ^[^$+{innertext}^]^

                          3) Final replacements:

                          F: (<italic>)(?:\(|~~)
                          R: (\1

                          F: (?:\)|~~)(</italic>)
                          R: \1)

                          F: \^\[\^
                          R: (

                          F: \^\]\^
                          R: )

                          BR, Fleggy

                          81
                          Advanced User

                            Jul 09, 2017 #13

                            Holy cow :o the first two regexes are huge. I'm not sure whether I can use those in a script to replace in multiple files :(

                            Anyways,
                            Thanks Fleggy. Really appreciate your time and effort :D

                            19476
                            Master

                              Jul 09, 2017 #14

                              Yes, they are a little bit more complex :)
                              At first glance your task looks simple. However, the devil is in the detail.

                              81
                              Advanced User

                                Jul 09, 2017 #15

                                Yes. I was hoping that I could use this replace technique for a similar replace involving two character references (&#x201C; and &#x201D;) immediately after the <italic> tag and immediately before the </italic> tag respectively, i.e. the same thing, just with &#x201C; instead of the opening parenthesis and &#x201D; instead of the closing parenthesis.
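Outside of regex, the same pairing rule carries over to any opening/closing pair, including the two character references mentioned here. The following is a minimal Python sketch under the same assumptions as for the parentheses (no <italic> inside <italic>, line-by-line processing). `split_outer` and `fix_line` are hypothetical names, and matching the references case-insensitively is an assumption.

```python
import re

def split_outer(content, open_tok, close_tok):
    """Return (prefix, inner, suffix) for the text of one <italic> element."""
    tok_re = re.compile(re.escape(open_tok) + "|" + re.escape(close_tok),
                        re.IGNORECASE)
    stack, partner = [], {}     # pending openers; start index -> partner's start
    for m in tok_re.finditer(content):
        if m.group(0).lower() == open_tok.lower():
            stack.append(m.start())
        elif stack:
            j = stack.pop()
            partner[j], partner[m.start()] = m.start(), j
    low = content.lower()
    tail = len(content) - len(close_tok)    # where a trailing closer would start
    # Leading opener moves out if unmatched or closed by the trailing closer.
    lead = low.startswith(open_tok.lower()) and partner.get(0, tail) == tail
    # Trailing closer moves out if unmatched or opened by the leading opener.
    trail = low.endswith(close_tok.lower()) and partner.get(tail, 0) == 0
    start = len(open_tok) if lead else 0
    end = tail if trail else len(content)
    return content[:start], content[start:end], content[end:]

ITALIC = re.compile(r"<italic>(.*?)</italic>")

def fix_line(line, open_tok="&#x201C;", close_tok="&#x201D;"):
    def repl(m):
        before, inner, after = split_outer(m.group(1), open_tok, close_tok)
        return before + "<italic>" + inner + "</italic>" + after
    return ITALIC.sub(repl, line)

print(fix_line("<italic>&#x201C;Quoted&#x201D;</italic>"))
# &#x201C;<italic>Quoted</italic>&#x201D;
```

Calling `fix_line(line, "(", ")")` applies the same rule to parentheses.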

                                But now I'm clueless.

                                Anyways, thanks for helping me the way you did. :mrgreen:
