I have some files which contain lots of URLs that I need to put inside a tag, say <uri>...</uri>.
Some URLs already have the tag, so I want to find only those which are not inside <uri>...</uri>
or <email>...</email>, as emails also contain the keywords .com, .gov, .in, .org, .ftp, .net.
The regex I'm currently using is: (\.com|\.gov|\.in|\.org|\.ftp|\.net)(?!(</email>|</uri>))
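This is not perfect, as it only skips a match when one of the strings .com, .gov, .in, .org, .ftp, .net is immediately followed by
</email> or </uri>. A tagged URL with a path after the domain, such as <uri>ieeexplore.ieee.org/themes</uri>, is still matched.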
Can anyone help?
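To make the limitation concrete, here is a minimal Python sketch that runs the regex above against two tagged URLs. Python is only my choice for illustration, and the two test strings are made up in the style of the sample text.

```python
import re

# The pattern from the question, unchanged.
CURRENT = re.compile(r"(\.com|\.gov|\.in|\.org|\.ftp|\.net)(?!(</email>|</uri>))")

# Two already-tagged URLs (illustrative strings, not taken from real files).
tagged_bare = "<uri>ieee.org</uri>"                    # TLD directly before </uri>
tagged_path = "<uri>ieeexplore.ieee.org/themes</uri>"  # TLD followed by a path

hits_bare = [m.group(1) for m in CURRENT.finditer(tagged_bare)]
hits_path = [m.group(1) for m in CURRENT.finditer(tagged_path)]

print(hits_bare)  # [] : the lookahead rejects ".org" right before </uri>
print(hits_path)  # ['.org'] : the tagged URL is reported anyway
```

The lookahead only inspects the characters directly after the keyword, so it cannot tell whether the keyword sits inside a tag that closes later.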
sample text:
<p>IEEE Aerospace and Electronic Systems Magazine is a monthly magazine that publishes articles concerned with the various aspects of systems for space, air, ocean, or ground environments <email>[email protected]</email> as well as news and information of interest to IEEE Aerospace and <email>[email protected]</email> Electronic Systems Society members (ieee.org).</p>
<p>The boundaries of acceptable subject matter has been intentionally left flexible so that the Magazine amiac.lio.in/se can follow the research activities, technology applications and future trends (http://gogl.net/oli?nom=14) to better meet the needs of the members of the IEEE ieee.com.op Aerospace and Electronic Systems Society. IEEE <uri>ieeexplore.ieee.org/themes</uri> Aerospace and Electronic Systems Magazine articles apprise readers of new developments, new applications of cornerstone technology, and news of society members, meetings, and related items.</p>
<p>A description for this result is not available because of this site's robots.txt</p>
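One possible direction, sketched in Python under assumptions of my own: let a single pattern match a whole tagged span first and a bare URL candidate second, then keep only the matches that came from the second branch. The names TAGGED, CANDIDATE, find_untagged and wrap_untagged are hypothetical, the text is a shortened version of the sample above, and the address someone@example.com is a placeholder.

```python
import re

# Branch 1: a complete <uri>...</uri> or <email>...</email> span (consumed, then ignored).
TAGGED = r"<(uri|email)>.*?</\1>"
# Branch 2: a run of non-space characters containing one of the keywords.
# Assumption: URLs never contain whitespace, angle brackets or parentheses.
CANDIDATE = r"[^\s<>()]*\.(?:com|gov|in|org|ftp|net)\b[^\s<>()]*"
PATTERN = re.compile(rf"{TAGGED}|(?P<url>{CANDIDATE})", re.DOTALL)


def find_untagged(text):
    """Return URL-like strings that are not inside <uri> or <email>."""
    return [m.group("url") for m in PATTERN.finditer(text) if m.group("url")]


def wrap_untagged(text):
    """Wrap untagged URL-like strings in <uri>...</uri>; leave tagged spans alone."""
    def repl(m):
        url = m.group("url")
        return m.group(0) if url is None else f"<uri>{url}</uri>"
    return PATTERN.sub(repl, text)


text = (
    "members (ieee.org). Magazine amiac.lio.in/se can follow "
    "trends (http://gogl.net/oli?nom=14) IEEE ieee.com.op Aerospace "
    "IEEE <uri>ieeexplore.ieee.org/themes</uri> Aerospace "
    "<email>someone@example.com</email> site's robots.txt"
)

print(find_untagged(text))
# ['ieee.org', 'amiac.lio.in/se', 'http://gogl.net/oli?nom=14', 'ieee.com.op']
```

Because the tagged branch is tried first at each position, everything between an opening and closing tag is consumed before the candidate branch can look at it. Known gaps in this sketch: an untagged email address would be reported as a URL, and a full stop directly after a URL at the end of a sentence would be included in the match.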