Script/macro to detect incorrect tagging in XML file


    Sep 10, 2014 #1

    I am reviewing some XML files in which I need to check whether all sections, figures, tables and bibliographical references have been tagged properly.
    I have consolidated all the links in a single file for review:

    http://pastebin.com/yDAyvhHp

    ALL SECTION LINKS:

    Code: Select all

    <xref ref-type="sec" rid="ss2">Section 2</xref>
    <xref ref-type="sec" rid="ss2">Section 2</xref>
    <xref ref-type="sec" rid="ss2">Section 2</xref>
    <xref ref-type="sec" rid="ss2">Section 2</xref>
    <xref ref-type="sec" rid="ss3">Section 3</xref>
    <xref ref-type="sec" rid="ss2-7">Section 2.7</xref>
    <xref ref-type="sec" rid="ss2">Section 2</xref>
    <xref ref-type="sec" rid="ss2">Section 2</xref>
    ALL FIGURE LINKS:

    Code: Select all

    <xref ref-type="fig" rid="f1">Fig. 1</xref>
    <xref ref-type="fig" rid="f2">Fig. 2A</xref>
    <xref ref-type="fig" rid="f2">Fig. 2B</xref>
    <xref ref-type="fig" rid="f3">Fig. 3A</xref>
    <xref ref-type="fig" rid="f3">B</xref>
    <xref ref-type="fig" rid="f4">Fig. 4</xref>
    <xref ref-type="fig" rid="f1">Fig. 1</xref>
    <xref ref-type="fig" rid="f2">Fig. 2</xref>
    <xref ref-type="fig" rid="f2">Fig. 2</xref>
    <xref ref-type="fig" rid="f2">Fig. 2</xref>
    ALL TABLE LINKS:

    Code: Select all

    <xref ref-type="table" rid="t1">Table 1</xref>
    <xref ref-type="table" rid="t1">Table 1</xref>
    <xref ref-type="table" rid="t1">Table 1</xref>
    <xref ref-type="table" rid="t1">Table 1</xref>
    <xref ref-type="table" rid="t1">Table 1</xref>
    <xref ref-type="table" rid="t1">Table 1</xref>
    <xref ref-type="table" rid="t1">Table 1</xref>
    <xref ref-type="table" rid="t2">Table 2</xref>
    <xref ref-type="table" rid="t1">Table 1</xref>
    <xref ref-type="table" rid="t1">Table 1</xref>
    BIBLIOGRAPHICAL REFERENCE LINKS:

    Code: Select all

    <xref ref-type="bibr" rid="b1">[1</xref>
    <xref ref-type="bibr" rid="b3">3]</xref>
    <xref ref-type="bibr" rid="b4">[4</xref>
    <xref ref-type="bibr" rid="b7">7]</xref>
    <xref ref-type="bibr" rid="b5">[5</xref>
    <xref ref-type="bibr" rid="b6">6]</xref>
    <xref ref-type="bibr" rid="b8">[8</xref>
    <xref ref-type="bibr" rid="b9">9]</xref>
    <xref ref-type="bibr" rid="b10">[10</xref>
    <xref ref-type="bibr" rid="b13">13]</xref>
    I am looking for a pattern that detects whether a figure, table, section or reference has been tagged incorrectly and, if so, replaces the tag with the correct one.
    For example:

    Code: Select all

    <xref ref-type="fig" rid="f3">Fig. 1</xref>
    is incorrect and needs to be

    Code: Select all

    <xref ref-type="fig" rid="f1">Fig. 1</xref>
    Same for tables, sections and bibliographical references.
    Regards,
    Sandeep
    It is easy to be born, it is difficult to be a human being.:)


      Sep 10, 2014 #2

      A quickly recorded macro solution:

      Code: Select all

      InsertMode
      ColumnModeOff
      HexOff
      Top
      PerlReOn
      Find MatchCase RegExp "(ref-type="sec" rid=").*?(">Section )(\d+)<"
      Replace All "\1ss\3\2\3<"
      Find MatchCase RegExp "(ref-type="sec" rid=").*?(">Section )(\d+)\.(\d+)<"
      Replace All "\1ss\3-\4\2\3.\4<"
      Find MatchCase RegExp "(ref-type="fig" rid=").*?(">Fig. )(\d+)"
      Replace All "\1f\3\2\3"
      Find MatchCase RegExp "(ref-type="table" rid=").*?(">Table )(\d+)"
      Replace All "\1t\3\2\3"
      Find MatchCase RegExp "(ref-type="bibr" rid=").*?(">\[)(\d+)"
      Replace All "\1b\3\2\3"
      Find MatchCase RegExp "(ref-type="bibr" rid=").*?(">)(\d+)\]"
      Replace All "\1b\3\2\3]"
      The Perl regular expression replaces rewrite every reference, even one that is already correct, in which case the line is written back unchanged. That does no harm.
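The rule behind the macro's replaces can also be sketched in plain JavaScript. This is only an illustration, not part of the macro: the names `expectedRid` and `fixXrefs` are made up here, the code does not use the UltraEdit scripting API, and it assumes the rid naming seen in the sample data (ss2, ss2-7, f1, t1, b1). Unlike the macro, it leaves a reference untouched when its rid already agrees with the label, and it counts how many were really changed.

```javascript
// Derive the rid that a label implies, following the naming in the sample
// data. Returns null when the label carries no recognizable number.
function expectedRid(refType, label) {
  var m;
  switch (refType) {
    case "sec":
      // "Section 2" -> ss2, "Section 2.7" -> ss2-7 (deeper levels are not handled)
      m = /^Section (\d+)(?:\.(\d+))?$/.exec(label);
      return m ? "ss" + m[1] + (m[2] ? "-" + m[2] : "") : null;
    case "fig":
      m = /^Fig\. (\d+)/.exec(label);
      return m ? "f" + m[1] : null;
    case "table":
      m = /^Table (\d+)/.exec(label);
      return m ? "t" + m[1] : null;
    case "bibr":
      // Labels look like "[1", "3]" or a bare number.
      m = /^\[?(\d+)\]?$/.exec(label);
      return m ? "b" + m[1] : null;
    default:
      return null;
  }
}

// Rewrite only those xref elements whose rid disagrees with the label.
function fixXrefs(text) {
  var changed = 0;
  var fixed = text.replace(
    /<xref ref-type="([^"]+)" rid="([^"]*)">([^<]*)<\/xref>/g,
    function (whole, refType, rid, label) {
      var want = expectedRid(refType, label);
      if (want === null || want === rid) return whole; // nothing to fix
      changed++;
      return '<xref ref-type="' + refType + '" rid="' + want + '">' + label + "</xref>";
    }
  );
  return { text: fixed, changed: changed };
}

var result = fixXrefs('<xref ref-type="fig" rid="f3">Fig. 1</xref>');
console.log(result.changed, result.text);
```

Labels without a number, such as a lone panel letter, make `expectedRid` return null, so those references are passed through unchanged.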

      The macro does not handle <xref ref-type="fig" rid="f3">B</xref>. The label contains no figure number, so none of the search expressions matches this reference.
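One possible way to cover such panel-only labels is to let them inherit the number of the closest preceding numbered figure label. The sketch below is a heuristic under that assumption, which holds for the sample data (Fig. 3A followed by B) but is not guaranteed in general. The function name `figureNumbers` is hypothetical, and the code is plain JavaScript without the UltraEdit API.

```javascript
// For each figure label, return the figure number it refers to.
// A label made only of capital letters (e.g. "B") has no number of its own,
// so it is ASSUMED to continue the most recent numbered label before it.
function figureNumbers(labels) {
  var last = null;
  return labels.map(function (label) {
    var m = /^Fig\. (\d+)/.exec(label);
    if (m) {
      last = m[1];
      return m[1];
    }
    // Panel-only label: inherit the previous number, if there is one.
    return /^[A-Z]+$/.test(label) ? last : null;
  });
}

// Usage: compare the implied rid with the actual one, row by row.
var labels = ["Fig. 3A", "B", "Fig. 4"];
var rids = ["f3", "f3", "f4"];
var numbers = figureNumbers(labels);
for (var i = 0; i < labels.length; i++) {
  var status = rids[i] === "f" + numbers[i] ? "ok" : "check";
  console.log(status + ": rid=" + rids[i] + " label=" + labels[i]);
}
```

A label that is neither numbered nor a panel letter yields null, which signals that it needs a manual look.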
      Best regards from an UC/UE/UES for Windows user from Austria


        Sep 10, 2014 #3

        Instead of trying out the code suggested by you, I figured out an alternative solution.

        Let's say I am checking whether all bibliographical references have been properly tagged or not and I have the following content to check.
        [Please observe that the second and the fourth entries are incorrect.]

        File name: A.xml

        Code: Select all

        <xref ref-type="bibr" rid="b1">[1</xref>
        <xref ref-type="bibr" rid="b4">3]</xref>
        <xref ref-type="bibr" rid="b4">[4</xref>
        <xref ref-type="bibr" rid="b9">7]</xref>
        <xref ref-type="bibr" rid="b5">[5</xref>
        <xref ref-type="bibr" rid="b6">6]</xref>
        <xref ref-type="bibr" rid="b8">[8</xref>
        <xref ref-type="bibr" rid="b9">9]</xref>
        <xref ref-type="bibr" rid="b10">[10</xref>
        <xref ref-type="bibr" rid="b13">13]</xref>
        Steps:

        Step 1: Replace

        Code: Select all

        ">
        with

        Code: Select all

        ">^p
        Step 2: Find by

        Code: Select all

        <xref ref-type="bibr" rid="b[0-9]+
        and copy the matches to a new file (content below):

        Code: Select all

        <xref ref-type="bibr" rid="b1
        <xref ref-type="bibr" rid="b4
        <xref ref-type="bibr" rid="b4
        <xref ref-type="bibr" rid="b9
        <xref ref-type="bibr" rid="b5
        <xref ref-type="bibr" rid="b6
        <xref ref-type="bibr" rid="b8
        <xref ref-type="bibr" rid="b9
        <xref ref-type="bibr" rid="b10
        <xref ref-type="bibr" rid="b13
        In the new file, replace

        Code: Select all

        <xref ref-type="bibr" rid="b
        with nothing. We are left with the following content:

        Code: Select all

        1
        4
        4
        9
        5
        6
        8
        9
        10
        13
        Copy and paste the contents into a fresh Excel worksheet.

        Step 3: Go back to A.xml where we are left with the following content:

        Code: Select all

        ">
        [1</xref>
        ">
        3]</xref>
        ">
        [4</xref>
        ">
        7]</xref>
        ">
        [5</xref>
        ">
        6]</xref>
        ">
        [8</xref>
        ">
        9]</xref>
        ">
        [10</xref>
        ">
        13]</xref>

        Replace

        Code: Select all

        ">
        with nothing.
        Replace

        Code: Select all

        </xref>
        with nothing.
        Replace

        Code: Select all

        [
        and

        Code: Select all

        ]
        with nothing. *The Regular Expressions: UltraEdit option must be unchecked during the last two replaces; otherwise [ and ] are treated as regular expression syntax.*
        We are left with the following content:

        Code: Select all

        1
        3
        4
        7
        5
        6
        8
        9
        10
        13
        Copy and paste the contents into the worksheet as we did earlier. Refer to the screenshot below:


        We just need to compare the values in the two columns. Screenshots below :)
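The manual steps above (extract the rid numbers, extract the label numbers, compare the two columns) can be sketched as one small function. This is only an illustration in plain JavaScript with a hypothetical name, `compareBibrColumns`. It assumes each bibliography reference has the exact shape shown in A.xml.

```javascript
// Collect the number in rid="bN" and the number in the visible label for
// every bibliography xref, and pair them up like two worksheet columns.
function compareBibrColumns(text) {
  var re = /<xref ref-type="bibr" rid="b(\d+)">\[?(\d+)\]?<\/xref>/g;
  var rows = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    rows.push({ rid: m[1], label: m[2], match: m[1] === m[2] });
  }
  return rows;
}

// Usage: list only the rows where the two columns differ.
var sample = '<xref ref-type="bibr" rid="b1">[1</xref>\n' +
             '<xref ref-type="bibr" rid="b4">3]</xref>';
compareBibrColumns(sample).forEach(function (row, index) {
  if (!row.match) {
    console.log("entry " + (index + 1) + ": rid b" + row.rid + " but label " + row.label);
  }
});
```

The numbers are compared as strings, which is sufficient here because neither column carries leading zeros in the sample data.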


        Regards,
        Sandeep
        It is easy to be born, it is difficult to be a human being.:)


          Sep 12, 2014 #4

          Just a quick one.

          Perform all operations on an XML file in UE.

          Code: Select all

          if (UltraEdit.document.length > 0) {
            UltraEdit.insertMode();
            // Turn column mode off; where this function lives depends on the UltraEdit version.
            if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
            else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();

            var size1 = '<xref ref-type="bibr" rid="b'.length;
            var size2 = '</xref>'.length;

            // Select the whole file so its content can be read line by line.
            UltraEdit.activeDocument.selectAll();
            if (UltraEdit.activeDocument.isSel()) {
              // Default to "\r\n"; lineTerminator 1 selects "\n" and 2 selects "\r".
              var sLineTerm = "\r\n";
              if (typeof(UltraEdit.activeDocument.lineTerminator) == "number") {
                if (UltraEdit.activeDocument.lineTerminator == 1) sLineTerm = "\n";
                else if (UltraEdit.activeDocument.lineTerminator == 2) sLineTerm = "\r";
              }
              var asLines = UltraEdit.activeDocument.selection.split(sLineTerm);

              // Compare the number in rid="b..." with the number in the link text.
              // The file is still selected, so the first write() replaces its content with the report.
              for (var nLine = 0; nLine < asLines.length; nLine++) {
                if (asLines[nLine].match('<xref ref-type="bibr" rid="b[0-9]+')) {
                  // Assumes the xref element is the only content of the line.
                  // Strip the fixed prefix and the closing tag, leaving e.g. 4">3]
                  var _t = asLines[nLine].substring(size1, asLines[nLine].length - size2);
                  _t = _t.replace(/\[/g, '');
                  _t = _t.replace(/\]/g, '');
                  var _arrT = _t.split('">');
                  if (_arrT.length != 2) {
                    UltraEdit.activeDocument.write("Error" + sLineTerm);
                  } else if (_arrT[0] == _arrT[1]) {
                    UltraEdit.activeDocument.write("correct " + _arrT[0] + " = " + _arrT[1] + sLineTerm);
                  } else {
                    UltraEdit.activeDocument.write("wrong " + _arrT[0] + " = " + _arrT[1] + sLineTerm);
                  }
                }
              }
            }
          }