I have some files that contain one or more <aff> tags. Each <aff> can contain one or more other tags such as <institution type="institution">, <institution type="department">, <city>, <country>, <sup>, etc. The rule these files must follow is that each group of <institution type="..."> tags in an <aff> must be inside another tag, namely <institution-wrap>. Some files have this tag in place and some don't. How do I find the <aff> tags that are missing it?
The search should find <aff id="aff1"> and <aff id="aff4"> (or something similar) in the sample file.
NOTE: Some <aff> tags might not contain any <institution type="..."> at all; those should be ignored. Other tags might also contain <institution type="...">, but we only want to look inside the <aff> tags.
For example, here is a sample text:
<aff id="aff1">
<institution type="institution">NSF</institution>
<institution type="department">Dept. of History</institution>
<city>New York</city>
<country>USA</country>
</aff>
<aff id="aff2"><sup>†</sup>
<institution-wrap>
<institution type="institution">NSF</institution>
<institution type="department">Dept. of History</institution>
</institution-wrap>
<city>New York</city>
<country>USA</country>
</aff>
<aff id="aff3">
<institution-wrap>
<institution type="division">NASA</institution>
</institution-wrap>
<city>New York</city>
<country>USA</country>
</aff>
<aff id="aff4">
<sup>1</sup>
<institution type="division">Caltech</institution>
</aff>
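One possible way to run this check is sketched below in Python. It uses regular expressions rather than an XML parser, on the assumption that the files may be fragments or not strictly well-formed. The function name `find_unwrapped_affs` and the shortened `SAMPLE` string are made up for illustration. The idea is to take each <aff> block, strip out every <institution-wrap>...</institution-wrap> section, and flag the block if any <institution type="..."> is still left. An <aff> with no <institution> is never flagged, and an <institution> outside any <aff> is never looked at.

```python
import re

# One <aff ...>...</aff> block: group 1 = attributes, group 2 = body.
AFF_RE = re.compile(r"<aff(\s[^>]*)?>(.*?)</aff>", re.DOTALL)
# A complete <institution-wrap>...</institution-wrap> section.
WRAP_RE = re.compile(
    r"<institution-wrap(\s[^>]*)?>.*?</institution-wrap>", re.DOTALL
)
# An opening <institution ... type=...> tag (does not match <institution-wrap>).
INST_RE = re.compile(r"<institution\s[^>]*\btype=")
ID_RE = re.compile(r'\bid="([^"]*)"')


def find_unwrapped_affs(text):
    """Return the ids of <aff> blocks with an <institution> outside a wrap."""
    flagged = []
    for match in AFF_RE.finditer(text):
        attrs = match.group(1) or ""
        body = match.group(2)
        # Drop everything that is already wrapped, then look for leftovers.
        outside_wrap = WRAP_RE.sub("", body)
        if INST_RE.search(outside_wrap):
            id_match = ID_RE.search(attrs)
            flagged.append(id_match.group(1) if id_match else "(no id)")
    return flagged


# Shortened, hypothetical version of the sample for a quick check.
SAMPLE = """
<aff id="aff1">
<institution type="institution">NSF</institution>
<city>New York</city>
</aff>
<aff id="aff2">
<institution-wrap>
<institution type="institution">NSF</institution>
</institution-wrap>
</aff>
<aff id="aff4">
<sup>1</sup>
<institution type="division">Caltech</institution>
</aff>
"""

print(find_unwrapped_affs(SAMPLE))  # prints ['aff1', 'aff4']
```

To scan real files, the same function could be called on the text of each file in turn. Note that this sketch assumes <aff> and <institution-wrap> tags are not nested inside tags of the same name.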