Here is my problem: I cannot seem to get this right. I have loads of HTML files in the format shown below, followed by the HTML I would like to get out. Since they all follow a standard format, I am hoping I can translate them into a version that is hyperlinked by chapter and displays a table of contents. Any help is appreciated. Replacing all the tags is not that important to me, as there are a lot of macros that do that and it is fairly simple to do. The anchor tags are the most important part; I could even live without the TOC in the macro.
Oh, I almost forgot: the tricky thing is the A NAME= tag, because somehow you would have to auto-generate the name, like chapter_##, as you go. That seems to be where I am getting hung up the most, along with the fact that the TOC is at the end, so you don't know the HREF="# target ahead of time. Although you could just drop each HREF right next to its NAME (above it), then do a regex to strip all of those out and paste them in at the beginning. But how do you auto-number those names?
I was considering doing a regex to grab everything between the <H></H> tags, copying that text to a clipboard, removing all the spaces, converting it to lowercase, and then using that as the anchor name. But it would be SOOOO much easier if I could just use an incrementer and loop while I found the "# expression, sticking the incremented number in there. I hope that makes it clearer. If not, I can explain it further.
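For what it's worth, the incrementer idea can be sketched outside a macro in a few lines of Python. This is only a rough illustration: it assumes every anchor was first written with a placeholder name of `#` (as in `<A NAME="#">`), and the function name `number_anchors` and the `chapter_` prefix are made up for the example.

```python
import itertools
import re

def number_anchors(html, prefix="chapter_"):
    """Replace each placeholder <A NAME="#"> with chapter_1, chapter_2, ..."""
    counter = itertools.count(1)  # the "incrementer"

    def fill_in(match):
        # match.group(1) is the '<A NAME="' part, kept exactly as written.
        return f'{match.group(1)}{prefix}{next(counter)}"'

    # Case-insensitive so <a name="#"> and <A NAME="#"> are both handled.
    return re.sub(r'(<a\s+name=")#"', fill_in, html, flags=re.I)
```

The replacement function is called once per match, in document order, so the counter goes up by one for every placeholder found.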
Here is what the files look like now:
<h2>Title of Training Manual</h2>
<p>
Yada Yada Yada, this is what this training manual is about
</p>
<table border>
<tr>
<td>Chapter 1</td>
<td>
Yada (...)
</td>
</tr>
</table>
then becomes....
<h2>Title of Training Manual</h2>
<p>
Yada Yada Yada, this is what this training manual is about
</p>
<A NAME="toc">Table Of Contents</A>
<A HREF="#chapter_1">Chapter 1</A>
<HR />
<A NAME="chapter_1">Chapter 1</A>
Yada (...)
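
A minimal sketch of the whole conversion in Python, in case a script is an option instead of an editor macro. It assumes each file has a single table whose rows are exactly two cells (chapter title, chapter text), as in the sample above. The name `convert` is hypothetical, and regexes like these will not cope with nested tables or extra attributes on the `<tr>`/`<td>` tags.

```python
import re

# One row = two cells: chapter title, then chapter text.
ROW = re.compile(r"<tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*</tr>", re.I | re.S)
TABLE = re.compile(r"<table[^>]*>.*?</table>", re.I | re.S)

def convert(html):
    """Turn the chapter table into a TOC followed by named chapter anchors."""
    def table_to_chapters(match):
        toc = ['<A NAME="toc">Table Of Contents</A>']
        body = []
        # enumerate() supplies the auto-incrementing chapter number.
        for n, (title, text) in enumerate(ROW.findall(match.group(0)), start=1):
            name = f"chapter_{n}"
            title = title.strip()
            toc.append(f'<A HREF="#{name}">{title}</A>')
            body.append(f'<HR />\n<A NAME="{name}">{title}</A>\n{text.strip()}')
        # TOC links first, then the chapters they point to.
        return "\n".join(toc + body)

    return TABLE.sub(table_to_chapters, html)
```

Because the TOC entries and the chapter bodies are collected in the same loop, the HREF targets are known before anything is written out, which sidesteps the problem of not knowing the names ahead of time. Numbering restarts for each table, so a file with several tables would need a shared counter instead.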