Replacing xml codes within a block of text

Newbie

    Oct 23, 2012 #1

    How do I select a block of text and run replaces on the XML only within that block?
    Within the <contrib-group>, I need to delete <name> and replace <surname> with <SN>, </surname> with </SN>, <given-names> with <FN>, </given-names> with </FN>, <degrees> with <DEG>, and </degrees> with </DEG>.
    These tags also appear in another place in my file, but there I need to name them something else. I'm really new to this stuff and any help would be much appreciated.

    Example of my data:

    <title-group>
    <article-title>Neoadjuvant Accelerated Concomitant Boost Radiotherapy and Multidrug Chemotherapy in Locally Advanced Rectal Cancer</article-title>
    <subtitle>A Dose-Escalation Study</subtitle>
    </title-group>
    <contrib-group>
    <contrib contrib-type="author">
    <name><surname>Caravatta</surname><given-names>Luciana</given-names>
    </name><degrees>MD</degrees>
    <xref ref-type="aff" rid="aff1">&#x002A;</xref>
    </contrib>
    <contrib contrib-type="author">
    <name><surname>Picardi</surname><given-names>Vincenzo</given-names>
    </name><degrees>MD</degrees>
    <xref ref-type="aff" rid="aff1">&#x002A;</xref>
    </contrib>
    </contrib-group>
    <aff id="aff1">Departments of <label>&#x002A;</label>Radiation Oncology</aff>
    <aff id="aff2"><label>&#x2020;</label>Palliative Therapies</aff>

Grand Master

      Oct 23, 2012 #2

      There are several ways to make the replacements you need.

      The first method does not make any selection to restrict the replaces to selected text. Instead, it uses a tagged regular expression with the UltraEdit engine to simply search for blocks like

      <contrib contrib-type="author">
      <name><surname>Caravatta</surname><given-names>Luciana</given-names>
      </name><degrees>MD</degrees>

      and reformats such blocks into

      <contrib contrib-type="author">
      <SN>Caravatta</SN><FN>Luciana</FN>
      <DEG>MD</DEG>

      The macro is:

      InsertMode
      ColumnModeOff
      HexOff
      Top
      UltraEditReOn
      Find MatchCase RegExp "^(<contrib contrib-type=*^p^)<name><surname>^(*^)</surname><given-names>^(*^)</given-names>*^p</name><degrees>^(*^)</degrees>"
      Replace All "^1<SN>^2</SN><FN>^3</FN>^p<DEG>^4</DEG>"

      This is the fastest method, but it works only if all blocks in your XML file look exactly like your example, including the whitespace characters: no spaces/tabs at the beginning of lines, no trailing spaces/tabs, and DOS line terminators as posted here. If small variations exist, the regular expression could be modified to match all of them.
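      For anyone who wants to see what this tagged expression does outside UltraEdit, here is a rough Python translation of the first method (an illustration only, not part of the macro). It uses Python's re module, with (...) groups and \1..\4 standing in for UltraEdit's ^(...^) tags and ^1..^4. It assumes the file was read in text mode, so DOS line terminators are already normalized to \n. The sample is the <contrib-group> part of the example from the question.

```python
import re

# Sample taken from the question (the <contrib-group> part only).
SAMPLE = """<contrib-group>
<contrib contrib-type="author">
<name><surname>Caravatta</surname><given-names>Luciana</given-names>
</name><degrees>MD</degrees>
<xref ref-type="aff" rid="aff1">&#x002A;</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Picardi</surname><given-names>Vincenzo</given-names>
</name><degrees>MD</degrees>
<xref ref-type="aff" rid="aff1">&#x002A;</xref>
</contrib>
</contrib-group>
"""

# Four capture groups, mirroring ^1..^4 in the UltraEdit expression.
# '.' does not match '\n' here (no DOTALL), like UltraEdit's '*'.
# Assumption: line breaks are plain '\n' (text-mode reading normalizes CRLF).
PATTERN = re.compile(
    r"^(<contrib contrib-type=.*\n)"
    r"<name><surname>(.*)</surname><given-names>(.*)</given-names>.*\n"
    r"</name><degrees>(.*)</degrees>",
    re.MULTILINE,
)


def reformat_contribs(text: str) -> str:
    """Rewrite every matching author block in one pass (like Replace All)."""
    return PATTERN.sub(r"\1<SN>\2</SN><FN>\3</FN>\n<DEG>\4</DEG>", text)


result = reformat_contribs(SAMPLE)
print(result)
```

      Like the macro, this only matches blocks laid out exactly as in the sample; an extra space or tab at the start of a line makes the pattern skip that block.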

      The second method runs a loop that always selects everything from <contrib-group> to </contrib-group> using the Find Select feature (holding the Shift key while clicking the Find Next button when doing it manually). A shorter UltraEdit tagged regular expression Replace All than the one above is then used to reformat the tags within the selection. The result for your example is the same, but UltraEdit needs longer to finish.

      The macro is:

      InsertMode
      ColumnModeOff
      HexOff
      Top
      UltraEditReOn
      Loop 0
      Find MatchCase "<contrib-group>"
      IfNotFound
      ExitLoop
      EndIf
      StartSelect
      Find MatchCase Select "</contrib-group>"
      Find MatchCase RegExp SelectText "<name><surname>^(*^)</surname><given-names>^(*^)</given-names>*^p</name><degrees>^(*^)</degrees>"
      Replace All "<SN>^1</SN><FN>^2</FN>^p<DEG>^3</DEG>"
      EndSelect
      Key HOME
      EndLoop
      Top
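      The same select-then-replace idea can be sketched in Python (again only an illustration, not UltraEdit code). Instead of a selection, re.sub with a callback hands each <contrib-group>...</contrib-group> block to a function, and the shorter tagged expression is applied only to that block. The <ref-list> part of the sample is hypothetical, added just to show that the same tags outside the group are left unchanged, which was the requirement in the question.

```python
import re

# Sample: one <contrib-group> from the question plus a hypothetical <ref-list>
# that uses the same tags and must stay unchanged.
SAMPLE = """<contrib-group>
<contrib contrib-type="author">
<name><surname>Caravatta</surname><given-names>Luciana</given-names>
</name><degrees>MD</degrees>
</contrib>
</contrib-group>
<ref-list>
<name><surname>Smith</surname><given-names>John</given-names>
</name><degrees>PhD</degrees>
</ref-list>
"""

# Plays the role of Find "<contrib-group>" + Find Select "</contrib-group>".
GROUP = re.compile(r"<contrib-group>.*?</contrib-group>", re.DOTALL)

# The shorter tagged expression used inside the selection (no DOTALL here).
INNER = re.compile(
    r"<name><surname>(.*)</surname><given-names>(.*)</given-names>.*\n"
    r"</name><degrees>(.*)</degrees>"
)


def reformat_group(match: re.Match) -> str:
    """Run the inner Replace All on one selected block only."""
    return INNER.sub(r"<SN>\1</SN><FN>\2</FN>\n<DEG>\3</DEG>", match.group(0))


result = GROUP.sub(reformat_group, SAMPLE)
print(result)
```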

      The third method also runs a loop that always selects everything from <contrib-group> to </contrib-group>, but it uses the Perl regular expression engine with the (?s) flag in the expression so that the dot also matches newline characters. Four Perl regular expression Replace All commands are then executed on every selection to reformat the two lines into the wanted output.

      This method does not depend on the exact formatting of the XML. However, because the selection is made just once while every replace modifies it (deleting characters), it could fail to make all replaces correctly. I have observed in the past that UltraEdit did not always correctly re-apply the selection after a Replace All in a selection, and in such cases it was necessary to reselect the block with the appropriate commands after every replace. For your example, though, it worked with just one Find for selecting and four replaces within the selection for reformatting.

      The macro is:

      InsertMode
      ColumnModeOff
      HexOff
      Top
      PerlReOn
      Loop 0
      Find MatchCase RegExp "(?s)<contrib-group>.*?</contrib-group>"
      IfNotFound
      ExitLoop
      EndIf
      Find MatchCase RegExp SelectText "</*name>"
      Replace All ""
      Find MatchCase RegExp SelectText "(</*)surname>"
      Replace All "\1SN>"
      Find MatchCase RegExp SelectText "(</*)given-names>"
      Replace All "\1FN>"
      Find MatchCase RegExp SelectText "(</*)degrees>"
      Replace All "\1DEG>"
      CancelSelect
      EndLoop
      Top
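      For comparison, a Python sketch of the third method (not UltraEdit code). The (?s) flag and the non-greedy .*? behave the same way in Python's re module: each match stops at the first </contrib-group>, so text between two groups is not touched. Here the four replaces run on each block as a plain string, so there is no selection that has to be re-applied after every replace. The sample is trimmed from the question (the <contrib> wrappers are omitted); the <ref-list> and the second group are hypothetical.

```python
import re

# Trimmed sample: two groups with a hypothetical <ref-list> between them.
SAMPLE = """<contrib-group>
<name><surname>Caravatta</surname><given-names>Luciana</given-names>
</name><degrees>MD</degrees>
</contrib-group>
<ref-list>
<name><surname>Smith</surname><given-names>John</given-names>
</name>
</ref-list>
<contrib-group>
<name><surname>Picardi</surname><given-names>Vincenzo</given-names>
</name><degrees>MD</degrees>
</contrib-group>
"""

# (?s): '.' also matches newlines; '.*?' stops at the first </contrib-group>.
GROUP = re.compile(r"(?s)<contrib-group>.*?</contrib-group>")

# The four replaces from the macro, in the same order.
REPLACES = [
    (r"</*name>", ""),
    (r"(</*)surname>", r"\1SN>"),
    (r"(</*)given-names>", r"\1FN>"),
    (r"(</*)degrees>", r"\1DEG>"),
]


def reformat_group(match: re.Match) -> str:
    """Apply all four replaces to one block, working on a string copy."""
    block = match.group(0)
    for pattern, replacement in REPLACES:
        block = re.sub(pattern, replacement, block)
    return block


result = GROUP.sub(reformat_group, SAMPLE)
print(result)
```

      With a greedy .* instead of .*?, the first match would run to the last </contrib-group> and the <ref-list> in between would be reformatted too.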

Newbie

        Oct 24, 2012 #3

        Mofi,
        Thank you so much :P
        I used the second method and it worked.