Let us assume that
- the element pet always exists only within the element vetPatient,
- the element pet always exists just once within the element vetPatient,
- the element breed also always exists only within the element vetPatient,
- the element breed also always exists just once within the element vetPatient, and
- the element breed is always below the element pet within the element vetPatient.
In this case, first create a copy of the directory containing all the XML files to search for the data.
Next, run a Perl regular expression Replace in Files with the option Match case checked on the copy of the directory, searching in all *.xml files for a string matching the regular expression
(<pet>Dog</pet>)(?:[\s\S](?!</vetPatient>))*?(<breed>.+?</breed>)
and using the expression
\1\2
as the replace expression.
The case-sensitive Perl regular expression Replace in Files changes XML content like
Code:
<vetPatient>
<pet>Dog</pet>
<breed>AnyBreed</breed>
</vetPatient>
<vetPatient>
<pet>Other</pet>
<breed>AnyBreed</breed>
</vetPatient>
<vetPatient>
<pet>Dog</pet><breed>AnyBreed 2</breed>
</vetPatient>
<vetPatient>
<pet>Dog</pet>
<breed></breed>
</vetPatient>
<vetPatient>
<pet>Dog</pet>
</vetPatient>
<vetPatient>
<pet>Dog</pet>
<other>whatever</other>
<breed>AnyBreed 3</breed>
</vetPatient>
to the following XML content:
Code:
<vetPatient>
<pet>Dog</pet><breed>AnyBreed</breed>
</vetPatient>
<vetPatient>
<pet>Other</pet>
<breed>AnyBreed</breed>
</vetPatient>
<vetPatient>
<pet>Dog</pet><breed>AnyBreed 2</breed>
</vetPatient>
<vetPatient>
<pet>Dog</pet>
<breed></breed>
</vetPatient>
<vetPatient>
<pet>Dog</pet>
</vetPatient>
<vetPatient>
<pet>Dog</pet><breed>AnyBreed 3</breed>
</vetPatient>
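The effect of this replace can be checked outside UltraEdit before touching any files. The following is a minimal Python sketch, assuming that Python's re module handles this particular pattern the same way as UltraEdit's Perl regular expression engine; the function name join_pet_and_breed and the sample string are made up for illustration.

```python
import re

# Same search pattern as in the Replace in Files step.
# Python's re module is case-sensitive by default, like "Match case".
PATTERN = re.compile(
    r"(<pet>Dog</pet>)(?:[\s\S](?!</vetPatient>))*?(<breed>.+?</breed>)"
)


def join_pet_and_breed(xml_text: str) -> str:
    """Move a non-empty breed element directly behind <pet>Dog</pet>."""
    return PATTERN.sub(r"\1\2", xml_text)


# Three records taken from the example: one with an element between
# pet and breed, one with an empty breed, one for a different pet.
sample = (
    "<vetPatient>\n<pet>Dog</pet>\n<other>whatever</other>\n"
    "<breed>AnyBreed 3</breed>\n</vetPatient>\n"
    "<vetPatient>\n<pet>Dog</pet>\n<breed></breed>\n</vetPatient>\n"
    "<vetPatient>\n<pet>Other</pet>\n<breed>AnyBreed</breed>\n</vetPatient>\n"
)

result = join_pet_and_breed(sample)
print(result)
```

Only the first record is rewritten. The record with the empty breed element stays as it is because .+? needs at least one character, and the lookahead stops the search from running into the next vetPatient element.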
Now open Advanced - Settings or Configuration - Search - Find output format and uncheck the options Header, File summary and Find summary, as only the lines with the two elements are of interest for the final output file.
Next, execute a case-sensitive Perl regular expression Find in Files with the option Results to edit window checked and with the search expression
<pet>Dog</pet><breed>.+?</breed>
UltraEdit then creates a new UTF-16 encoded text file with the found lines from all XML files in the copy of the directory.
The created file could look like this:
Code:
C:\Temp\Test\Test1.xml(2): <pet>Dog</pet><breed>AnyBreed</breed>
C:\Temp\Test\Test1.xml(11): <pet>Dog</pet><breed>AnyBreed 2</breed>
C:\Temp\Test\Test2.xml(2): <pet>Dog</pet><breed>AnyBreed</breed>
C:\Temp\Test\Test2.xml(9): <pet>Dog</pet><breed>AnyBreed 2</breed>
C:\Temp\Test\Test2.xml(19): <pet>Dog</pet><breed>AnyBreed 3</breed>
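For comparison, the same search can be approximated with a short script. This is only a sketch under assumptions: the XML files are taken to be UTF-8 encoded, the helper names find_lines and find_in_files are hypothetical, and the output merely imitates the "path(line): content" layout shown above rather than reproducing UltraEdit's implementation.

```python
import re
from pathlib import Path

# Same search expression as in the Find in Files step (case-sensitive).
FIND = re.compile(r"<pet>Dog</pet><breed>.+?</breed>")


def find_lines(file_name: str, text: str) -> list[str]:
    """Return one 'file(line): content' entry per matching line."""
    return [
        f"{file_name}({number}): {line}"
        for number, line in enumerate(text.splitlines(), start=1)
        if FIND.search(line)
    ]


def find_in_files(directory: str) -> list[str]:
    """Search all *.xml files below directory (assumed UTF-8 encoded)."""
    results = []
    for path in sorted(Path(directory).rglob("*.xml")):
        results.extend(find_lines(str(path), path.read_text(encoding="utf-8")))
    return results


# Small in-memory example using a record from the modified XML content.
example = "<vetPatient>\n<pet>Dog</pet><breed>AnyBreed</breed>\n</vetPatient>\n"
print("\n".join(find_lines(r"C:\Temp\Test\Test1.xml", example)))
```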
A case-sensitive Perl regular expression Replace All executed from the top of the file ** Find Results ** with the search expression
^(?:.+?\\)+(.+\.xml)\([0-9]+\):.+?<breed>(.+?)</breed>.*$
and the replace expression
"\1","\2"
would reformat the results output to a valid CSV file, as long as no breed value contains the character ", with the following lines for the example above:
Code:
"Test1.xml","AnyBreed"
"Test1.xml","AnyBreed 2"
"Test2.xml","AnyBreed"
"Test2.xml","AnyBreed 2"
"Test2.xml","AnyBreed 3"
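The reformatting step can be tried on a few result lines with the following Python sketch. It assumes the replacement "\1","\2", which yields the quotes and the comma seen in the CSV lines above; the function name results_to_csv is hypothetical, and re.MULTILINE is needed in Python so that ^ and $ match at every line.

```python
import re

# Group 1: file name without path, group 2: value of the breed element.
RESULT_LINE = re.compile(
    r"^(?:.+?\\)+(.+\.xml)\([0-9]+\):.+?<breed>(.+?)</breed>.*$",
    re.MULTILINE,
)


def results_to_csv(results: str) -> str:
    """Turn 'path(line): ...<breed>value</breed>' lines into CSV rows."""
    return RESULT_LINE.sub(r'"\1","\2"', results)


find_results = (
    "C:\\Temp\\Test\\Test1.xml(2): <pet>Dog</pet><breed>AnyBreed</breed>\n"
    "C:\\Temp\\Test\\Test2.xml(19): <pet>Dog</pet><breed>AnyBreed 3</breed>\n"
)

csv_text = results_to_csv(find_results)
print(csv_text)
```

As with the UltraEdit replace, a breed value containing " would produce an invalid row. In a script, writing the rows with Python's csv module instead of a plain substitution would take care of that quoting.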
The ** Find Results ** file should now be saved as a *.csv file, with or without conversion of the file from UTF-16 to UTF-8 or to ANSI. Finally, the copy of the directory should be deleted.
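If the conversion from UTF-16 to UTF-8 is not done within UltraEdit, it could also be done afterwards with a few lines of Python. This is a minimal sketch, assuming the saved file starts with a byte order mark so that the utf-16 codec can detect the byte order; the function name convert_utf16_to_utf8 and any file names passed to it are placeholders.

```python
def convert_utf16_to_utf8(source: str, target: str) -> None:
    """Re-encode a UTF-16 text file (with BOM) as UTF-8 without BOM."""
    # newline="" keeps the line endings exactly as they are in the source.
    with open(source, "r", encoding="utf-16", newline="") as src:
        text = src.read()
    with open(target, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

A call such as convert_utf16_to_utf8("results.txt", "results.csv") would then write the UTF-8 version next to the original file.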