Hi, I'm new to UltraEdit. I have been looking through the macros and search and replace forums (to no avail) to see if I could figure out how to do the following:
I have an OCR-scanned book of 40 chapters, each of which is a separate HTML page. While marking up the text, I bookmarked each of the original pages with a simple format, p001 - p600.
I have now reached the index, which scanned fine. The entries (alphabetical, of course) have one or more page references, mostly in the form [space]7. [space]27. [space]127.
What I want to do is find each number in this form, check which chapter it belongs to (i.e. from a manually created table within the macro) and then change the entry to
The second form used in the index is for page ranges such as 97-102. There are only a handful of these, so I could do them manually as long as the main macro skips them.
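For comparison, here is a rough Python sketch of the lookup-table idea described above. It is an illustration, not UltraEdit macro code. The chapter start pages and file names in CHAPTER_STARTS and CHAPTER_FILES are hypothetical placeholders. The link format is also an assumption, because the desired target format is not shown above.

```python
import bisect

# Hypothetical table: first page of each chapter and that chapter's file.
# Replace these placeholder values with the real chapter boundaries.
CHAPTER_STARTS = [1, 15, 32, 47]
CHAPTER_FILES = ["ch01.htm", "ch02.htm", "ch03.htm", "ch04.htm"]


def chapter_for_page(page: int) -> str:
    """Return the file of the chapter whose page range contains `page`."""
    # bisect_right counts the chapter starts <= page; the last of them
    # is the chapter that contains the page.
    i = bisect.bisect_right(CHAPTER_STARTS, page) - 1
    if i < 0:
        raise ValueError(f"page {page} is before the first chapter")
    return CHAPTER_FILES[i]


def page_link(page: int) -> str:
    """Link to the page bookmark (assumed anchor names p001 .. p600)."""
    return f'<a href="{chapter_for_page(page)}#p{page:03d}">{page}</a>'


print(page_link(27))  # <a href="ch02.htm#p027">27</a>
```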
I think I have successfully created the macro you want.
Before you run the macros on the index file, check it for bad page references like <p>Yellin. 463</p>, where the page number is followed by neither '-' nor '.'. You can search for such numbers with the following UltraEdit-style regex string: [0-9]++[0-9]++[0-9][~.^-]
There are a few such bad page references, and they must be fixed manually.
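Outside UltraEdit, the same check could be sketched with Python's re module. This only approximates the UE expression above. It assumes page references have 1 to 3 digits. It flags a number that starts a word and is followed by neither another digit, nor '.', nor '-'.

```python
import re

# Approximation of the UE regex [0-9]++[0-9]++[0-9][~.^-]:
# a 1-3 digit number that starts a word (\b) and is not followed by
# another digit, '.' or '-' (a number at the end of the line also counts).
BAD_REF = re.compile(r"\b(\d{1,3})(?![\d.\-])")


def find_bad_refs(line: str) -> list[str]:
    """Return the page numbers on `line` that lack a trailing '.' or '-'."""
    return BAD_REF.findall(line)


for text in ["<p>Yellin. 463</p>", "<p>Yellin, 46. 463-465.</p>"]:
    print(text, find_bad_refs(text))
# <p>Yellin. 463</p> ['463']
# <p>Yellin, 46. 463-465.</p> []
```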
The macro needs the macro property Continue if a Find with Replace not found checked. Run this macro with only the index file open; no other file should be open when the macro starts. Please also change the path F:\Temp\chapter\ in the FindInFiles command and in the regex Find command to the folder containing your htm files.
I hope you never have more than one <a name="P???"></a> on a single line in your htm files. The macro deletes everything on a line after <a name="P???"></a>, so with a second anchor on the same line one of the page numbers would not be converted into a link.
I have not used the contents file to get the information for the page references. Instead I used a Find in Files command, because it's better to work with the original source.
The loop after the FindInFiles command is a workaround for a bug in UltraEdit: the edit window with the FindInFiles results does not always have the focus after the command has executed.
UE v12.00 produces a Unicode result file, and regex replaces on Unicode files have some problems. Because you use v12 of UE, the macro temporarily switches to hex mode to check whether the result file is a Unicode file (it searches for a 00 byte). If it is, the macro converts the file to ASCII before it reformats the result into a list of URLs in the format you want, using several regular expression replaces. This automatic detection and conversion lets the macro work with v12 of UE and also with all previous versions, which produce an ASCII result file.
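The hex-mode test is a simple heuristic. A normal ASCII text file contains no 00 byte, while UTF-16 text made of ASCII characters has a 00 byte for every character. Below is a sketch of the same heuristic in Python. The function name is illustrative. Two points are assumptions: UTF-16 without a BOM is little-endian, and a file without 00 bytes uses the Windows-1252 code page.

```python
def decode_result(raw: bytes) -> str:
    """Decode a Find in Files result file that may be UTF-16 or single-byte.

    Same heuristic as the macro's hex-mode check: a 00 byte means Unicode.
    Assumptions: UTF-16 without a BOM is little-endian (as on Windows),
    and a file without 00 bytes is in the Windows-1252 code page.
    """
    if b"\x00" in raw:
        if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
            return raw.decode("utf-16")  # the BOM selects the byte order
        return raw.decode("utf-16-le")
    return raw.decode("cp1252")


print(decode_result("Page 7".encode("utf-16")))  # Page 7
print(decode_result(b"Page 7"))                  # Page 7
```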
After the result file has been converted into the list of URLs, the macro switches back to the index file and moves the cursor to the beginning of the index list (the <h1>INDEX heading). This is necessary because the head of the file also contains some "number." entries.
Then the first macro below searches in a loop for page numbers (a number followed by '-' or '.') and replaces each one with the appropriate link from the result file.
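The same two steps can be sketched in Python: first turn the Find in Files result into a page-to-link list, then replace each page reference. The shape of a result line is inferred from the macro's regular expressions and is an assumption, as are the function names.

```python
import re

# Assumed shape of one Find in Files result line, inferred from the
# macro's regexes (path, '/', line number, ': ', then the matching line):
#   F:\Temp\chapter\ch02.htm/40: <p><a name="P027"></a>...
RESULT_LINE = re.compile(r'^(?:[^/]*\\)?([^\\/]+)/\d+: .*?<a name="P(\d+)"></a>')

# A page reference: a number not preceded by a letter or digit and
# followed by '.' or '-' (the lookahead leaves that character in place).
PAGE_REF = re.compile(r"(?<!\w)(\d+)(?=[.\-])")


def build_links(result_lines):
    """Map page number -> hyperlink, like the macro's result-file replaces."""
    links = {}
    for line in result_lines:
        m = RESULT_LINE.match(line)
        if m:
            filename, anchor = m.groups()
            page = int(anchor)  # "027" -> 27, dropping the leading zeros
            links[page] = (f'<a href="{filename}#P{anchor}" title="Page {page}" '
                           f'target="_self">{page}</a>')
    return links


def link_index(text, links):
    """Replace every page reference that has a link; leave the others as-is."""
    return PAGE_REF.sub(lambda m: links.get(int(m.group(1)), m.group(1)), text)


links = build_links([r'F:\Temp\chapter\ch02.htm/40: <p><a name="P027"></a>Text</p>'])
print(link_index("<p>Yellin, 27. 99.</p>", links))
# <p>Yellin, <a href="ch02.htm#P027" title="Page 27" target="_self">27</a>. 99.</p>
```

Because re.sub scans the original text only once, an inserted link is never searched again. This is one reason the one-pass approach is simpler than switching windows for every reference.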
While writing this explanation, I realized that this is a slow approach and found a faster solution. So this first macro is just for learning how to write UE macros.
InsertMode
ColumnModeOff
HexOff
UnixReOff
FindInFiles MatchCase RegExp PreserveCase Log "F:\Temp\chapter\" "*.htm" "<a name="P[0-9]+">"
Loop
Find RegExp Up "%Search complete, found "
IfFound
DeleteLine
ExitLoop
Else
NextWindow
EndIf
EndLoop
Top
HexOn
Find "00"
IfFound
HexOff
Top
UnicodeToASCII
Else
HexOff
EndIf
Find "----------------------------------------^p"
Replace All ""
Find RegExp "%Find '*^p"
Replace All ""
Find RegExp "%Found '*^p"
Replace All ""
Find RegExp "%F:\Temp\chapter\^(*/[0-9]+: ^)*<a name="P"
Replace All "^1<a name="P"
Find RegExp "</a>*$"
Replace All "</a>"
Find RegExp "%^(*^)/[0-9]+: <a name="P^(0++^)^([0-9]+^)"></a>"
Replace All "<a href="^1#P^2^3" title="Page ^3" target="_self">^3</a>"
NextWindow
Top
Find "<h1>INDEX"
IfNotFound
ExitMacro
EndIf
Clipboard 9
Loop
Find RegExp "[0-9]+[^-.]"
IfNotFound
ExitLoop
EndIf
StartSelect
Key LEFT ARROW
Copy
EndSelect
PreviousWindow
Top
Find "Page ^c"
IfFound
Key HOME
StartSelect
Key END
Copy
EndSelect
EndIf
NextWindow
Paste
EndLoop
ClearClipboard
Clipboard 0
So here is the second version, which is much faster. It copies each hyperlink from the result file into the index file and replaces all matching page numbers in the index file with that hyperlink. This was a little tricky because the index does not use leading zeros for the page numbers, so a page number like 25 also occurs inside 125, 225, 325, ... The solution was to use the find option MatchWord. Because '-' is not a word delimiter, every number followed by '-' must be converted to something different before the replaces run (the macro turns N- into N----- followed by a space) and converted back to a single '-' when the loop exits. This macro is much faster because the number of window switches is reduced to the number of pages instead of the number of page references.
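To see why whole-word matching matters: a plain substring replace of 25 also hits 125. The per-page strategy can be sketched in Python with regex lookarounds instead of MatchWord and the temporary '-----' conversion. This is an alternative technique, not what the UE macro does, and the function names are hypothetical.

```python
import re


def replace_page(text: str, page: int, link: str) -> str:
    """Replace each stand-alone reference to `page` that ends in '.' or '-'.

    (?<!\w) rejects a letter or digit in front, so 25 does not match inside
    125 or ch25.htm; (?=[.\-]) requires a following '.' or '-' without
    consuming it, so the punctuation stays in the text.
    """
    pattern = re.compile(rf"(?<!\w){page}(?=[.\-])")
    # A function as replacement keeps any backslash in `link` literal.
    return pattern.sub(lambda _m: link, text)


def link_all(text: str, links: dict[int, str]) -> str:
    """One replace-all per page, like the faster macro (one pass per URL)."""
    for page, link in links.items():
        text = replace_page(text, page, link)
    return text


sample = "Smith, 125. 25-30."
print(sample.replace("25", "[25]"))                  # Smith, 1[25]. [25]-30.
print(link_all(sample, {25: "[25]", 125: "[125]"}))  # Smith, [125]. [25]-30.
```

The first print shows the problem the macro works around. The second shows that the result does not depend on the order in which the pages are processed.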
InsertMode
ColumnModeOff
HexOff
UnixReOff
FindInFiles MatchCase RegExp PreserveCase Log "F:\Temp\chapter\" "*.htm" "<a name="P[0-9]+">"
Loop
Find RegExp Up "%Search complete, found "
IfFound
DeleteLine
ExitLoop
Else
NextWindow
EndIf
EndLoop
Top
HexOn
Find "00"
IfFound
HexOff
Top
UnicodeToASCII
Else
HexOff
EndIf
Find "----------------------------------------^p"
Replace All ""
Find RegExp "%Find '*^p"
Replace All ""
Find RegExp "%Found '*^p"
Replace All ""
Find RegExp "%F:\Temp\chapter\^(*/[0-9]+: ^)*<a name="P"
Replace All "^1<a name="P"
Find RegExp "</a>*$"
Replace All "</a>"
Find RegExp "%^(*^)/[0-9]+: <a name="P^(0++^)^([0-9]+^)"></a>"
Replace All "<a href="^1#P^2^3" title="Page ^3" target="_self">^3</a>"
NextWindow
Top
Find "<h1>INDEX"
IfNotFound
ExitMacro
EndIf
EndSelect
Key END
"
"
Find RegExp "^([0-9]+-^)"
Replace All "^1---- "
Loop
PreviousWindow
IfEof
CloseFile NoSave
DeleteLine
Find "----- "
Replace All "-"
ExitLoop
EndIf
Clipboard 9
StartSelect
Key END
Copy
EndSelect
Key HOME
Find "Page"
EndSelect
Key RIGHT ARROW
SelectWord
Clipboard 8
Copy
EndSelect
Key HOME
Key DOWN ARROW
NextWindow
Clipboard 9
Paste
StartSelect
Key HOME
EndSelect
Clipboard 8
Find MatchWord "^c."
Replace All "^s."
Key RIGHT ARROW
Key LEFT ARROW
StartSelect
Key END
EndSelect
Find MatchWord "^c-----"
Replace All "^s-----"
Key LEFT ARROW
Key RIGHT ARROW
StartSelect
Key HOME
Delete
EndSelect
EndLoop
ClearClipboard
Clipboard 9
ClearClipboard
Clipboard 0
To add navigation links before the entries starting with A, B, C, ... you can use the following macro. It also needs the property Continue if a Find with Replace not found checked, even though it uses no replace (the property title is misleading). Run this macro after the page number to URL conversion macro.
InsertMode
ColumnModeOff
HexOff
Top
Find "<h1>INDEX"
IfNotFound
ExitMacro
EndIf
Find MatchCase "<p>A"
IfFound
EndSelect
Key HOME
"
Insert here your HTML code for the navigation links for letter A
"
EndIf
Find MatchCase "<p>B"
IfFound
EndSelect
Key HOME
"
Insert here your HTML code for the navigation links for letter B
"
EndIf
And so on till
Find MatchCase "<p>Z"
IfFound
EndSelect
Key HOME
"
Insert here your HTML code for the navigation links for letter Z
"
EndIf
Add UnixReOn or PerlReOn (UE v12+) at the end of the macros if you do not use UltraEdit-style regular expressions by default (see the search configuration). The macro command UnixReOff sets the regular expression option to UltraEdit style.
Best regards from an UC/UE/UES for Windows user from Austria
Thank you very much! I only 'idly' looked at the forum, and the most I expected was an acknowledgement of 'receipt', not complete macros, which I hope to use this evening. Thanks a lot also for the explanation, which will hopefully help me learn the UltraEdit macro language.
I have now run the main index macro.
On the first run I saw several 'unindexed' numbers. I realised straight away that these were missing or wrongly named bookmarks, and I was able to correct most of them easily. Running it a second time, I found a few more mistakes in the index file, where missing spaces caused a few misses.
However, there seems to be a bug, and I don't think it's the code.
With entries of the 123-125. type, it seems to add a partial anchor, but for the number immediately preceding the first one.
At first I thought it only happened where the entry wasn't the first on a line, but finding one that was disproved that.
Also, an entry that had been correct after the first run was now 'wrong', and a third run confirmed this, so there is no rhyme or reason to it. I have edited out the 'additions' using DW's search and replace.
You are right, there was a bug in the macro. I have edited the macro code to fix it; the inserted lines are the cursor moves after the two Replace All commands in the loop. If there is no page reference to replace for a specific URL, the selection is still present after the Replace All. So the macro must always unselect the URL, which a simple cursor move does.
Thanks again. I didn't have as much success with the A-Z macro; it seems to call for a loop, but I ended up doing it manually. In BASIC (20 years ago) I would have looped over the character codes and converted them to ASCII characters for the comparison. Would this have been possible here?
The A-Z macro is not a loop and cannot be implemented as one. It is just a recording of what you would do manually, or more exactly, what I think you wanted to do manually. The UE macro language does not support variables, math expressions or conditions such as a compare. So the A-Z macro cannot be written as the simple loop every programmer would use in a real programming language.
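For comparison, this is roughly the loop you describe, sketched in Python with ord() and chr(). A real programming language allows this kind of loop. The nav_for callback and the example anchor markup are hypothetical. Like the macro, the sketch searches forward from the <h1>INDEX heading for the first <p> entry of each letter. Unlike the macro, it inserts the navigation code directly before <p> rather than at the start of the line.

```python
def add_letter_navigation(html: str, nav_for) -> str:
    """Insert nav_for(letter) before the first "<p>X" entry for each X in A..Z.

    The search starts at the <h1>INDEX heading and always continues forward;
    letters without entries are skipped, as with the macro's IfFound blocks.
    """
    start = html.find("<h1>INDEX")
    if start == -1:
        return html  # no index heading: do nothing, like ExitMacro
    for code in range(ord("A"), ord("Z") + 1):  # loop over character codes
        letter = chr(code)                       # code -> character
        pos = html.find("<p>" + letter, start)   # case-sensitive, like MatchCase
        if pos == -1:
            continue
        nav = nav_for(letter) + "\n"
        html = html[:pos] + nav + html[pos:]
        start = pos + len(nav)  # resume at the entry just found
    return html


index = "<h1>INDEX</h1>\n<p>Aaron, 5.</p>\n<p>Abel, 7.</p>\n<p>Cole, 9.</p>\n"
# Hypothetical navigation markup: a named anchor per letter.
print(add_letter_navigation(index, lambda letter: f'<a name="{letter}"></a>'))
```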