Extract data from different xml tags to make a new one


    May 15, 2012#1

    Hello,
    I'm a beginner UltraEdit user and I have a question about writing a script. I'm using UltraEdit version 16.30.0.1003.
    The script makes bulk changes in a folder that contains many XML files. The changes I've made so far are simple, like:

    Code: Select all

    UltraEdit.frInFiles.replace("<bookin>", "<booking>");
    or

    Code: Select all

    UltraEdit.frInFiles.replace("<booking>We assume", "<booking>We have");
    I also make other changes using the prompt function, like:

    Code: Select all

    var cust = UltraEdit.getString("Please Enter a customer id:",1);
    I'd like to know if I can fill an XML tag using content found in other tags. For example, I would like to set the content of a new tag <owner></owner> using the content of the tags <firstname> and <lastname>.

    For example, if I have the following tags:

    Code: Select all

    	<firstname>John</firstname>
    	<lastname>Spencer</lastname>
    I want to append a new tag <owner> with the concatenation of these tags:

    Code: Select all

    	<firstname>John</firstname>
    	<lastname>Spencer</lastname>
    	<owner>John Spencer</owner>
    But I don't know how to extract this data to concatenate it.

    My script starts as follows:

    Code: Select all

    	UltraEdit.ueReOn();
    	UltraEdit.outputWindow.showWindow(true);
    	UltraEdit.frInFiles.matchCase = true;
    	UltraEdit.frInFiles.matchWord = false;
    	UltraEdit.frInFiles.regExp = true;
    	UltraEdit.frInFiles.useOutputWindow = true;
    	UltraEdit.frInFiles.logChanges = true;
    	UltraEdit.frInFiles.unicodeSearch = false;
    	UltraEdit.frInFiles.directoryStart = "C:\\amd\\";
    	UltraEdit.frInFiles.searchInFilesTypes = "*.XML";
    	//multiple replace statements in the way I've displayed above...
    Thank you very much for your comments, and sorry for the beginner question.
    Greetings


      May 15, 2012#2

      You can use a tagged regular expression, as explained in the power tip on tagged expressions.

      With the UltraEdit regular expression engine, on files with DOS line terminators, you can for example use as the search string

      ^(<firstname>^)^(?++^)^(</firstname>?++^p^)^([ ^t]++^)^(<lastname>^)^(?++^)^(</lastname>?++^)$

      and as replace string

      ^1^2^3^4^5^6^7^p^4<owner>^2 ^6</owner>

      The tags ^1, ^3, ^5 and ^7 are only used to keep the two lines with the first and last name unmodified.

      The tag ^4 matches the whitespace at the start of the line with the last name. It is used to indent the new line with the owner tag by the same whitespace as the line above it.

      The tags ^2 and ^6 match the first and last name, which are inserted dynamically into the replace string for every replacement made in the files.

      For files with UNIX line terminators you need to use ^n instead of ^p in the search and replace strings.
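For trying the idea outside UltraEdit, here is a minimal sketch in plain JavaScript (for example in Node.js). It uses JavaScript's own RegExp, not the UltraEdit engine, so the syntax differs from the search string above. The function name appendOwner is made up for this illustration, and the sketch assumes each name tag sits on its own line as in the example.

```javascript
// Sketch only: plain JavaScript RegExp, not the UltraEdit regexp engine.
// appendOwner is a hypothetical helper name used for illustration.
function appendOwner(xml) {
  // Groups: 1 = first name, 2 = rest of the first line plus line terminator,
  // 3 = indentation of the lastname line, 4 = last name.
  const pattern =
    /<firstname>(.+)<\/firstname>(.*\r?\n)([ \t]*)<lastname>(.+)<\/lastname>.*$/gm;
  return xml.replace(pattern, (match, first, restOfLine, indent, last) => {
    // Reuse the line terminator found in the file (DOS or UNIX).
    const eol = restOfLine.endsWith("\r\n") ? "\r\n" : "\n";
    // Keep both matched lines unchanged and add the owner line below them.
    return match + eol + indent + "<owner>" + first + " " + last + "</owner>";
  });
}

const sample =
  "\t<firstname>John</firstname>\r\n\t<lastname>Spencer</lastname>\r\n";
console.log(JSON.stringify(appendOwner(sample)));
```

The whole match is written back unchanged, which plays the role of ^1 to ^7 in the replace string above, and the captured indentation plays the role of ^4.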


        May 16, 2012#3

        Hello Mofi,
        thank you very much for your fast and efficient reply. It works great using the Replace command in the Search menu, but I'd like to know how to integrate it into my replace script.

        For example if I have the initial XML file:

        Code: Select all

        <?xml version="1.0" encoding="UTF-8"?>
        ...
        <owner>ENTER FIRSTNAME - LASTNAME</owner>
        <address>...
        ...
        <firstname>John</firstname><lastname>Spencer</lastname>
        ...
        I'd like to transform it in the following XML file:

        Code: Select all

        <?xml version="1.0" encoding="UTF-8"?>
        ...
        <owner>John - Spencer</owner>
        <address>...
        ...
        <firstname>John</firstname><lastname>Spencer</lastname>
        ...
        
        I didn't explain well that the XML tag <owner> doesn't follow the tags <firstname> and <lastname>. I also didn't explain that <owner> already exists in the initial XML file with the default content "ENTER FIRSTNAME - LASTNAME", which is to be replaced.

        Another thing you should know is that I'm modifying a lot of XML files at the same time using the command "UltraEdit.frInFiles.replace" as follows:

        Code: Select all

        	UltraEdit.ueReOn();
        	UltraEdit.outputWindow.showWindow(true);
        	UltraEdit.frInFiles.matchCase = true;
        	UltraEdit.frInFiles.matchWord = false;
        	UltraEdit.frInFiles.regExp = true;
        	UltraEdit.frInFiles.useOutputWindow = true;
        	UltraEdit.frInFiles.logChanges = true;
        	UltraEdit.frInFiles.unicodeSearch = false;
        	UltraEdit.frInFiles.directoryStart = "C:\\amd\\";
        	UltraEdit.frInFiles.searchInFilesTypes = "*.XML";
        
        	var change009a = "<!--^)*-->";
        	var change009b = "";
        	...
        	 UltraEdit.frInFiles.replace (change009a, change009b);
        	...
        I hope I have explained my problem adequately. Thank you in advance for your help.

          May 17, 2012#4

          Good morning!

          I've tried some expressions to make this replacement, for example the following one using 9 tagged expressions:

          Code: Select all

          var change060a = '^(<owner>^)*^(</owner>^)^(?++^)^(<firstname>^)^(?++^)^(</firstname>^)^(<lastname>^)^(?++^)^(</lastname>^)';
          var change060b = "^1^5 - ^8^2^3^4^5^6^7^8^9";
          UltraEdit.frInFiles.replace (change060a, change060b);
          But I don't know why this replacement creates XML files that are not well-formed. For example, it produces "<owner>>", or it cuts off the text inside one of these tags if the text is somewhat long, leaving wrong content.

          Thank you in advance for your help.


            May 17, 2012#5

            In UltraEdit syntax * means any number of occurrences of any character except newline characters. This special character is often the best choice, but in some situations it is better to use ?++, which means the same but behaves differently in some situations, like here.

            ^(*^) or ^(...^)*^(...^) often results in unexpected behavior, while ^(?++^) or ^(...^)?++^(...^) produces the expected results. * is (most often) a non-greedy expression and therefore matches nothing if possible. When tags are used in the search string, the part of a found string matched by * is often not added to any tag, although one would expect it to be. After years of experience with UltraEdit regular expressions this no longer surprises me, and I simply use ?++ or ?+ whenever * produces unexpected results in a tagged expression.

            On the other hand, sometimes it is better to use * instead of ?++ or ?+, for example when only the computer name should be found in a UNC file name.

            For example: \\computer name\share\directory\file name

            \\?++\ or \\?+\ can't be used to match just \\computer name\ because these expressions are greedy and therefore match everything from the first \\ to the last \ on the line. So for this example they match \\computer name\share\directory\. That's good if the full path of a file name should be matched, but to match just the computer name, \\*\ is required, as this expression is non-greedy and matches only \\computer name\.
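The same greedy versus non-greedy difference can be reproduced in plain JavaScript, as a rough sketch. It uses JavaScript's RegExp, where .+ stands in for the greedy ?+ and .*? for the non-greedy *; the UltraEdit engine itself is not involved, and the variable names are only illustrative.

```javascript
// Sketch only: JavaScript RegExp as a stand-in for the UltraEdit expressions.
// String.raw keeps the backslashes of the UNC name literal.
const uncName = String.raw`\\computer name\share\directory\file name`;

// Greedy: runs to the last backslash on the line (full path).
const greedyMatch = uncName.match(/\\\\.+\\/)[0];

// Non-greedy: stops at the first backslash after the computer name.
const lazyMatch = uncName.match(/\\\\.*?\\/)[0];

console.log(greedyMatch); // \\computer name\share\directory\
console.log(lazyMatch);   // \\computer name\
```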


            Okay, back to your problem. Because an owner tag already exists in lots of files above the tags with the first and last name, it would be good if a tagged regular expression Replace in Files could be used for that task. And yes, it is possible, but only under 2 conditions:
            1. The more powerful Perl regexp engine is used, as the capabilities of the UltraEdit regexp engine for multi-line replaces are limited.
            2. There are not too many bytes (or lines) between the owner tag and the tags with the first and last name.
            I don't know how many bytes a Perl regular expression can really match at once. I once tried to find that out, but got varying results when running the same regular expression find on various files. However, up to 32 KB is no problem and most likely 64 KB works too. But if your files are several hundred KB or even MB in size, and the number of bytes between the owner tag and the tags with the first and last name can be much more than 64 KB, I strongly recommend against using a Perl regular expression Replace in Files, as the result can be wrong.
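To check the second condition before running the replace, one could measure the distance between the tags. The following is a rough sketch in plain JavaScript, not an UltraEdit feature. The function name ownerToLastnameBytes and the constant SAFE_SPAN_BYTES are invented for this illustration, the 32 KB threshold is simply the figure mentioned above, and the byte count assumes UTF-8 content.

```javascript
// Sketch only: hypothetical helper, not part of the UltraEdit scripting API.
// Threshold taken from the "up to 32 KB" figure above.
const SAFE_SPAN_BYTES = 32 * 1024;

// Returns the number of UTF-8 bytes from <owner> up to and including the
// first </lastname> after it, or -1 if either tag is missing.
function ownerToLastnameBytes(xml) {
  const closeTag = "</lastname>";
  const start = xml.indexOf("<owner>");
  if (start < 0) return -1;
  const end = xml.indexOf(closeTag, start);
  if (end < 0) return -1;
  const span = xml.slice(start, end + closeTag.length);
  return new TextEncoder().encode(span).length;
}

const doc =
  "<owner>x</owner>" + "a".repeat(100) +
  "<firstname>J</firstname><lastname>S</lastname>";
const bytes = ownerToLastnameBytes(doc);
console.log(bytes, bytes >= 0 && bytes <= SAFE_SPAN_BYTES);
```

Files whose span exceeds the threshold would be candidates for the file-by-file approach instead.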

            But before I give you the solution for achieving the task by opening one file after the other, getting the first and last name into string variables or a clipboard (better for Unicode characters) and pasting them as the value of the owner tag, here is the solution using a Perl regular expression Replace in Files.

            Code: Select all

            UltraEdit.perlReOn();
            UltraEdit.frInFiles.replace("(?s)(<owner>).*?(</owner>.+?<firstname>)(.+?)(</firstname>.*?<lastname>)(.+?)(</lastname>)", "\\1\\3 \\5\\2\\3\\4\\5\\6");
            UltraEdit.ueReOn();
            The explanation for (?s) can be found in the topic "." in Perl regular expressions doesn't include CRLFs?

            A plain (...) in Unix/Perl syntax is the same as ^(...^) in UltraEdit syntax. The tagged parts of a found string are referenced with \1, \2, ... in Unix/Perl syntax, compared to ^1, ^2, ... in UltraEdit syntax. Because the backslash is an escape character in JavaScript strings, every backslash in a Unix/Perl regular expression search/replace string must be escaped with an additional backslash in scripts.
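The escaping rule can be checked with a few lines of plain JavaScript, as a small sketch. It only inspects the string and does not call UltraEdit; the variable names are illustrative.

```javascript
// The replace string exactly as it is written in the script source.
const replaceInScript = "\\1\\3 \\5\\2\\3\\4\\5\\6";

// Count the backslashes that remain after JavaScript has parsed the literal.
const backslashCount = replaceInScript.split("\\").length - 1;

// What the regexp engine receives: single backslashes before each digit.
console.log(replaceInScript);        // \1\3 \5\2\3\4\5\6
console.log(replaceInScript.length); // 17
console.log(backslashCount);         // 8
```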

            .* in Unix/Perl syntax is like ?++ in UE syntax: a greedy expression that matches any characters except newline characters (or including newline characters if (?s) is also used) any number of times. A question mark must be appended to make the expression non-greedy. So .*? in Perl syntax is like * in UE syntax. The legacy Unix engine does not support ? for making an expression non-greedy.

            The Unix/Perl equivalent of the UE regexp ?+, which matches 1 or more characters except newline characters, is .+ and only in Perl can it also be made non-greedy by appending a question mark. So the Perl regexp .+? matches 1 or more characters, but as few as possible to fulfill the expression. In UE syntax it is not possible to make ?+ work non-greedy.
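To see how the groups of the Perl expression line up, here is a minimal sketch that emulates the replace in plain JavaScript. It is an approximation under stated assumptions: JavaScript's RegExp is used instead of UltraEdit's Perl engine, the s flag takes the place of (?s), and $1 to $6 take the place of \1 to \6. The function name fillOwner is hypothetical.

```javascript
// Sketch only: JavaScript emulation, not UltraEdit's Perl engine.
// The "s" flag lets "." match newline characters, like (?s) in Perl.
const ownerPattern =
  /(<owner>).*?(<\/owner>.+?<firstname>)(.+?)(<\/firstname>.*?<lastname>)(.+?)(<\/lastname>)/s;

// fillOwner is a hypothetical helper name used for illustration.
function fillOwner(xml) {
  // $3 = first name and $5 = last name are copied into the owner tag;
  // $2 to $6 then restore the rest of the matched text unchanged.
  return xml.replace(ownerPattern, "$1$3 $5$2$3$4$5$6");
}

const before =
  "<owner>ENTER FIRSTNAME - LASTNAME</owner>\n" +
  "<address>...</address>\n" +
  "<firstname>John</firstname><lastname>Spencer</lastname>\n";
console.log(fillOwner(before));
```

With this replacement the owner tag becomes John Spencer. Writing "$1$3 - $5$2$3$4$5$6" instead would produce the "John - Spencer" form from the question.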


              May 18, 2012#6

              Thank you very much for your great explanation :)

              I'm going to study it and I'll follow your advice to make these replacements.
              Thank you very much!