Remove certain tags with specific tag?

    May 03, 2018#1

    Hi, I want to remove every opening and closing <italic> and <bold> tag (<italic>, </italic>, <bold> and </bold>) from inside every <caption><p>...</p></caption> and <caption><title>...</title></caption> in the file, i.e. the contents of these blocks must not contain any <italic> or <bold> tags. The same tags outside of these blocks must stay as they are.
    Sample file:

    <p>Note, the ability for <italic>a plane wave</italic> to transfer angular momentum to matter is sometimes considered as 'paradoxi-cal'</p>
    <p>The paper is organized as follows.
    <fig id="d1">
    <caption><p>What does <italic>in vito</italic> mean?</p></caption>
    <graphic xlink:href="00057_fig_page_1_3.jpg"/>
    </fig>
    </p>
    <p>The presence of the momentum of a wave results in radiation pressure on a particle.</p>
    <table id="t1">
    <label>Table 1.</label>
    <caption><title>Notations <bold>for the</bold> statement <italic>of</italic> problem.</title></caption>
    <graphic xlink:href="00057_table_page_2_2.jpg"/>
    </table>
    <p>Proceeding from the conservation law for the system &#x2018;<bold>radiation-particle</bold>&#x2019;, one obtains:</p>
    <p>Let us designate the result of integration as
    <fig id="f2">
    <caption><p>Momentum And Angular Momentum Of Plane Electromagnetic Wave</p></caption>
    <graphic xlink:href="00057_fig_page_7_1.jpg"/>
    </fig>
    </p>
    Desired output:

    <p>Note, the ability for <italic>a plane wave</italic> to transfer angular momentum to matter is sometimes considered as 'paradoxi-cal'</p>
    <p>The paper is organized as follows.
    <fig id="d1">
    <caption><p>What does in vito mean?</p></caption>
    <graphic xlink:href="00057_fig_page_1_3.jpg"/>
    </fig>
    </p>
    <p>The presence of the momentum of a wave results in radiation pressure on a particle.</p>
    <table id="t1">
    <label>Table 1.</label>
    <caption><title>Notations for the statement of problem.</title></caption>
    <graphic xlink:href="00057_table_page_2_2.jpg"/>
    </table>
    <p>Proceeding from the conservation law for the system &#x2018;<bold>radiation-particle</bold>&#x2019;, one obtains:</p>
    <p>Let us designate the result of integration as
    <fig id="f2">
    <caption><p>Momentum And Angular Momentum Of Plane Electromagnetic Wave</p></caption>
    <graphic xlink:href="00057_fig_page_7_1.jpg"/>
    </fig>
    </p>
    Can this be done using a regex find and replace all? How?

      May 03, 2018#2

      Hi Don,

      this time no recursions :)
      I presume that:
      - every <caption> block is on a single line (no CR/LF inside)
      - blocks are correctly paired (the code is valid)
      - blocks <caption><p> and <caption><title> are not nested

      Find: (?:<italic>|</italic>|<bold>|</bold>)(?=(?>.(?<!<caption>))*?</(?:p|title)></caption>)
      Replace with: (empty)

      BR, Fleggy
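
For readers who want to try the idea outside UltraEdit, here is a minimal Python sketch of the same lookahead approach. It is not the exact pattern above: the atomic group with the lookbehind is swapped for a tempered dot, (?:(?!<caption>).)*?, which should behave the same on the sample and also runs on Python versions before 3.11 (where atomic groups are not available in re). The function name strip_caption_formatting is made up for illustration, and the same assumptions apply (every caption block on a single line, blocks correctly paired).

```python
import re

# Tempered-dot variant of the lookahead idea (an assumption of this sketch,
# not the exact UltraEdit pattern): match an <italic>/<bold> tag only if
# </p></caption> or </title></caption> follows on the same line and no
# new <caption> starts in between.
CAPTION_TAG = re.compile(
    r"</?(?:italic|bold)>"
    r"(?=(?:(?!<caption>).)*?</(?:p|title)></caption>)"
)


def strip_caption_formatting(text: str) -> str:
    """Remove <italic>/<bold> tags that sit inside a caption block."""
    return CAPTION_TAG.sub("", text)


sample = (
    "<p>for <italic>a plane wave</italic> to transfer</p>\n"
    "<caption><p>What does <italic>in vito</italic> mean?</p></caption>\n"
    "<caption><title>Notations <bold>for the</bold> statement "
    "<italic>of</italic> problem.</title></caption>\n"
)
print(strip_caption_formatting(sample))
```

Because the dot does not match a line break by default, a tag in an ordinary <p> paragraph never "sees" a caption on a later line, which is why the single-line assumption matters.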

        May 03, 2018#3

        Hi fleggy,
        What if the blocks are not correctly paired? Will the pattern fail then?

          May 03, 2018#4

          Hi Don,

          I am afraid that correctly paired blocks (no crossing blocks, no missing tags) are a "must have" for any reliable solution.
          You have two options:
          - test whether there is an opening tag before the searched tag (needs a variable-length lookbehind, which is not supported in UE)
          - test whether there is a closing tag after the searched tag (needs a variable-length lookahead, which is supported in UE)
          The pattern above uses the second option: it checks that the closing tags </p></caption> or </title></caption> follow the searched tag (<bold> etc.) with no opening <caption> tag in between.

          BR, Fleggy
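
As a side note, Python's standard re module happens to share the restriction described for the first option, so the difference between the two options can be checked there. This is only an illustration in Python and says nothing about UltraEdit's own engine; the helper name compiles and both test patterns are made up for this sketch.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Option 1: look back over a variable amount of text for the opening tag.
lookbehind = r"(?<=<caption>.*)</?(?:italic|bold)>"
# Option 2: look ahead over a variable amount of text for the closing tag.
lookahead = r"</?(?:italic|bold)>(?=.*</caption>)"

print(compiles(lookbehind))  # False: look-behind requires a fixed-width pattern
print(compiles(lookahead))   # True
```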

            May 03, 2018#5

            Hi fleggy,
            Every <caption><p> and <caption><title> will have its closing pair, but the <italic> and <bold> tags inside them may be unbalanced, for example:

            <caption><title>Notations for the</bold> statement <italic>of</italic> problem.</title></caption>
            <caption><p></italic>What does <italic>in vito</italic> mean?</p></caption>
            
            Will the pattern make any wrong replacements in text like this?

              May 03, 2018#6

              Hi Don,

              it will work. Only <caption><p> and <caption><title> must be correctly paired, with no other text between the two closing tags. For example, </p>end of text</caption> will "confuse" the regex pattern.

              BR, Fleggy
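
Both cases can be tried in a small Python sketch. It uses a tempered-dot variant of the lookahead idea rather than the exact UltraEdit pattern (an assumption of this sketch), and the variable names are invented for illustration.

```python
import re

# Tempered-dot variant of the lookahead pattern (assumed equivalent for
# these inputs; not the exact UltraEdit expression).
CAPTION_TAG = re.compile(
    r"</?(?:italic|bold)>"
    r"(?=(?:(?!<caption>).)*?</(?:p|title)></caption>)"
)

# Unbalanced <italic> tags inside a correctly closed caption block.
unbalanced = (
    "<caption><p></italic>What does <italic>in vito</italic> mean?</p></caption>"
)
# Stray text between </p> and </caption>: the lookahead never succeeds.
stray_text = (
    "<caption><p>What does <italic>in vito</italic> mean?</p>end of text</caption>"
)

cleaned = CAPTION_TAG.sub("", unbalanced)
untouched = CAPTION_TAG.sub("", stray_text)

print(cleaned)                  # all three tags are removed
print(untouched == stray_text)  # True: nothing was replaced
```

In the second case the failure is silent: the tags simply stay in place, so such lines are easy to find afterwards with a plain search for <italic> or <bold> near </caption>.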

                May 03, 2018#7

                If you want more control over the whole process, use the pattern but replace the matched text with a safe placeholder, for example <TO_BE_DELETED>. Then you can check that everything has gone right and simply remove the placeholder afterwards.
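
The two-pass placeholder idea could look roughly like this in Python. This is a sketch under assumptions: the pattern is a tempered-dot variant of the lookahead approach rather than the exact UltraEdit expression, and the function names mark and purge are hypothetical.

```python
import re

# Tempered-dot variant of the lookahead pattern (an assumption of this sketch).
CAPTION_TAG = re.compile(
    r"</?(?:italic|bold)>"
    r"(?=(?:(?!<caption>).)*?</(?:p|title)></caption>)"
)
PLACEHOLDER = "<TO_BE_DELETED>"


def mark(text: str) -> str:
    """Pass 1: swap every matched tag for a visible placeholder."""
    return CAPTION_TAG.sub(PLACEHOLDER, text)


def purge(text: str) -> str:
    """Pass 2: after reviewing the marked text, drop the placeholders."""
    return text.replace(PLACEHOLDER, "")


line = (
    "<caption><title>Notations <bold>for the</bold> statement.</title></caption>"
)
marked = mark(line)
print(marked)         # placeholders show exactly what would be removed
print(purge(marked))  # final text without the tags
```

The placeholder must be a string that cannot occur in the file already, otherwise the second pass would delete more than the first pass marked.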