Hello!
Please help me solve this problem:
How can I clean an XML file of all the data outside the tags?
I am sure there are thousands of ways to do it, but so far I have failed.
I would like to see how it can be done with regular expressions in UltraEdit.
Here is an example of the XML file structure I have:
BEFORE THE CLEANING:
text to clean
<wordA>
<wordB>
text to clean
<wordX>useful text X</wordX> no need to clean this
text to clean
text to clean
text to clean
<wordY>useful text Y</wordY>
no need to clean this <wordZ>useful text Z</wordZ>
text to clean
</wordC>
DESIRED RESULT (after cleaning):
<wordA>
<wordB>
<wordX>useful text X</wordX> no need to clean this
<wordY>useful text Y</wordY>
no need to clean this <wordZ>useful text Z</wordZ>
</wordC>
In the example above:
* wordA, wordB, wordC, wordX, wordY, wordZ are any words.
* "useful text X" is any text
* "no need to clean this" is any text
I am not sure that every useful line begins with "<". There may be leading spaces or even junk text before the tag, which I do not need to clean out.
Here is one idea:
Remove every line that does not contain: <*>
The following search removes all the lines that contain tags. I want to do exactly the opposite:
Find: "%*<*>*^p"
Replace with: ""
Is it possible with regular expressions?
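As a point of comparison outside UltraEdit, the "remove every line without <*>" idea can be sketched in Python. This is only a minimal sketch under assumptions: the helper name keep_tagged_lines is made up for illustration, and a "tag" is taken to be any "<...>" run on a single line, which is how I read the <*> wildcard.

```python
import re

# Assumption: a "tag" is any "<...>" run that stays on one line,
# mirroring the <*> wildcard from the post.
TAG = re.compile(r"<[^<>\n]+>")


def keep_tagged_lines(text: str) -> str:
    """Drop every line that contains no <...> tag; keep the others unchanged."""
    kept = [line for line in text.splitlines(keepends=True) if TAG.search(line)]
    return "".join(kept)


# Sample taken from the "BEFORE THE CLEANING" example in the post.
sample = """text to clean
<wordA>
<wordB>
text to clean
<wordX>useful text X</wordX> no need to clean this
text to clean
<wordY>useful text Y</wordY>
no need to clean this <wordZ>useful text Z</wordZ>
text to clean
</wordC>
"""

print(keep_tagged_lines(sample), end="")
```

Lines that contain a tag are kept whole, including any text before or after the tag, which matches the desired result shown above.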
P.S. I use UltraEdit 11.10+1, but if you have suggestions for other ways to solve this, I would be interested to see them. Maybe this kind of cleaning is a common feature in some other software? (Suggestions for macros are also welcome and appreciated.)
P.S. 2: This is not a priority, but I'm curious whether regular expressions can also produce this result:
<wordA>
<wordB>
<wordX>useful text X</wordX>
<wordY>useful text Y</wordY>
<wordZ>useful text Z</wordZ>
</wordC>
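The stricter result in P.S. 2 can be approximated the same way, again only as a sketch in Python rather than UltraEdit. It assumes that everything from the first "<" to the last ">" on a line should be kept; trim_to_tags is a hypothetical name.

```python
import re

# Greedy match from the first "<" to the last ">" on a line.
# "." does not match a newline, so the match never spans lines.
SPAN = re.compile(r"<.*>")


def trim_to_tags(text: str) -> str:
    """Keep only the '<' ... '>' span of each line; drop lines without one."""
    out = []
    for line in text.splitlines():
        match = SPAN.search(line)
        if match:
            out.append(match.group(0))
    return "\n".join(out)


# Sample taken from the "BEFORE THE CLEANING" example in the post.
sample = """text to clean
<wordA>
<wordB>
text to clean
<wordX>useful text X</wordX> no need to clean this
<wordY>useful text Y</wordY>
no need to clean this <wordZ>useful text Z</wordZ>
text to clean
</wordC>
"""

print(trim_to_tags(sample))
```

One limitation of this approach: if a line holds two separate elements with junk text between them, the junk in the middle survives, because the match runs from the first "<" to the last ">".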
Thank you!