Macro/script for index list creation from paragraphs indented by TAB character(s)

Rajesh · Aug 04, 2016 13:31 UTC · #1

Dear all,

I have a problem with index tagging.

Input (each group of 3 leading spaces represents a horizontal tab character in the file):


<div>
<h2>A</h2>
<p>&#x201C;A to Z: Aortic Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 228n</p>
<p>&#x201C;A to Z: Aortic Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 228n</p>
   <p>&#x201C;A to Z: Pulmonic Valvular Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 237n</p>
      <p>abdomen, defined 737</p>
         <p>abdominal aortic aneurysm, depicted <em>300</em></p>
<p>abnormal heart murmurs, described 240</p>
<p>abnormal heart murmurs, described 240</p>
   <p>About KidsHealth, contact 745</p>
   <p>ACAS <em>see</em> asymptomatic carotid atherosclerosis study</p>
   <p>ACE inhibitors <em>see</em> angiotensin-converting enzyme inhibitors</p>
<p>acetaminophen, vasculitis 347</p>
   <p>acquired heart valve disease</p>
   <p>heart murmurs 240</p>
</div>
<div>
<h2>B</h2>
<p>&#x201C;A to Z: Aortic Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 228n</p>
<p>&#x201C;A to Z: Aortic Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 228n</p>
<p>&#x201C;A to Z: Aortic Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 228n</p>
   <p>&#x201C;A to Z: Pulmonic Valvular Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 237n</p>
      <p>abdomen, defined 737</p>
<p>abdominal aortic aneurysm, depicted <em>300</em></p>
<p>abnormal heart murmurs, described 240</p>
   <p>About KidsHealth, contact 745</p>
<p>ACAS <em>see</em> asymptomatic carotid atherosclerosis study</p>
   <p>ACE inhibitors <em>see</em> angiotensin-converting enzyme inhibitors</p>
<p>acetaminophen, vasculitis 347</p>
   <p>acquired heart valve disease</p>
   <p>heart murmurs 240</p>
</div>

Output (each group of 3 leading spaces represents a horizontal tab character in the file):


<div>
<h2>A</h2>
<ul><li>&#x201C;A to Z: Aortic Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 228n</li></ul>
<ul><li>&#x201C;A to Z: Aortic Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 228n
   <ul><li>&#x201C;A to Z: Pulmonic Valvular Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 237n
      <ul><li>abdomen, defined 737
         <ul><li>abdominal aortic aneurysm, depicted <em>300</em></li></ul></li></ul></li></ul></li></ul>
<ul><li>abnormal heart murmurs, described 240</li></ul>
<ul><li>abnormal heart murmurs, described 240
   <ul><li>About KidsHealth, contact 745</li>
   <li>ACAS <em>see</em> asymptomatic carotid atherosclerosis study</li>
   <li>ACE inhibitors <em>see</em> angiotensin-converting enzyme inhibitors</li></ul></li></ul>
<ul><li>acetaminophen, vasculitis 347
   <ul><li>acquired heart valve disease</li>
   <li>heart murmurs 240</li></ul></li></ul>
</div>
<div>
<h2>B</h2>
<ul><li>&#x201C;A to Z: Aortic Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 228n</li></ul>
<ul><li>&#x201C;A to Z: Aortic Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 228n</li></ul>
<ul><li>&#x201C;A to Z: Aortic Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 228n
   <ul><li>&#x201C;A to Z: Pulmonic Valvular Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 237n
      <ul><li>abdomen, defined 737</li></ul></li></ul></li></ul>
<ul><li>abdominal aortic aneurysm, depicted <em>300</em></li></ul>
<ul><li>abnormal heart murmurs, described 240
   <ul><li>About KidsHealth, contact 745</li></ul></li></ul>
<ul><li>ACAS <em>see</em> asymptomatic carotid atherosclerosis study
   <ul><li>ACE inhibitors <em>see</em> angiotensin-converting enzyme inhibitors</li></ul></li></ul>
<ul><li>acetaminophen, vasculitis 347
   <ul><li>acquired heart valve disease</li>
   <li>heart murmurs 240</li></ul></li></ul>
</div>

I created a macro for this tagging, but sometimes the macro does not work properly.

Here is my macro for your reference:


InsertMode
ColumnModeOff
HexOff
Key Ctrl+HOME
PerlReOn
Find RegExp "^(?:(?:\r?\n|\r))+"
Replace All ""
PerlReOn
Find RegExp ",(\d+)"
Replace All ", \1"
PerlReOn
Find RegExp " </(em|i|b)>"
Replace All "</\1> "
PerlReOn
Find RegExp "  "
Replace All " "
PerlReOn
Find RegExp ",</(em|i|b)>"
Replace All "</\1>,"
PerlReOn
Find RegExp ",(\d+)"
Replace All ", \1"
PerlReOn
Find RegExp "(<span class="page" title="[^<>\r\n"]*?" data-seq="[^<>\r\n"]*?"/>)\r\n(\t+<p>)"
Replace All "\2\1"
PerlReOn
Find RegExp "^(\t\t\t\t)<p>(.+?)</p>"
Replace All "\1<li>\2</li>"
PerlReOn
Find RegExp "^(\t\t\t)<p>(.+?)</p>"
Replace All "\1<li>\2</li>"
PerlReOn
Find RegExp "^(\t\t)<p>(.+?)</p>"
Replace All "\1<li>\2</li>"
PerlReOn
Find RegExp "^(\t)<p>(.+?)</p>"
Replace All "\1<li>\2</li>"
PerlReOn
Find RegExp "^<p>(.+?)</p>"
Replace All "<ul><li>\1</li></ul>"
PerlReOn
Find RegExp "(</li>\r\n)^(\t\t\t\t)(.*\r\n(?:\2.*\r\n)*)"
Replace All "\r\n\2<ul>\3</ul>\1</ul>\r\n"
PerlReOn
Find RegExp "\r\n</ul></li>\r\n</ul>\r\n"
Replace All "</ul></li></ul>\r\n"
PerlReOn
Find RegExp "(</li>\r\n)^(\t\t\t)(.*\r\n(?:\2.*\r\n)*)"
Replace All "\r\n\2<ul>\3\1</ul>\r\n"
PerlReOn
Find RegExp "\r\n</li>\r\n</ul>\r\n"
Replace All "</li></ul>\r\n"
PerlReOn
Find RegExp "(</li>\r\n)^(\t\t)(.*\r\n(?:\2.*\r\n)*)"
Replace All "\r\n\2<ul>\3\1</ul>\r\n"
PerlReOn
Find RegExp "\r\n</li>\r\n</ul>\r\n"
Replace All "</li></ul>\r\n"
PerlReOn
Find RegExp "(</li></ul>\r\n)^(\t)(.*\r\n(?:\2.*\r\n)*)"
Replace All "\r\n\2<ul>\3\1"
PerlReOn
Find RegExp "\r\n</li></ul>\r\n"
Replace All "</li></ul>\r\n"
PerlReOn
Find RegExp "</li></li></ul>"
Replace All "</li></ul></li></ul>"
PerlReOn
Find RegExp "</ul>\r\n\t"
Replace All "\r\n\t"

I would also like to do this tagging with a script to make it more reliable and save time.
Can anyone help with that? It would be very helpful for me.

Rajesh

Mofi · Aug 09, 2016 13:12 UTC · #2

Coding the macro, i.e. choosing the right Perl regular expression Replace All commands with backreferences, is much easier when the output is structured as defined by the HTML/XHTML standard (here too, each group of 3 leading spaces represents a horizontal tab character in the file):


<div>
<h2>A</h2>
<ul>
   <li>&#x201C;A to Z: Aortic Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 228n</li>
   <li>&#x201C;A to Z: Aortic Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 228n
      <ul>
         <li>&#x201C;A to Z: Pulmonic Valvular Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 237n
            <ul>
               <li>abdomen, defined 737
                  <ul>
                     <li>abdominal aortic aneurysm, depicted <em>300</em></li>
                  </ul>
               </li>
            </ul>
         </li>
      </ul>
   </li>
   <li>abnormal heart murmurs, described 240</li>
   <li>abnormal heart murmurs, described 240
      <ul>
         <li>About KidsHealth, contact 745</li>
         <li>ACAS <em>see</em> asymptomatic carotid atherosclerosis study</li>
         <li>ACE inhibitors <em>see</em> angiotensin-converting enzyme inhibitors</li>
      </ul>
   </li>
   <li>acetaminophen, vasculitis 347
      <ul>
         <li>acquired heart valve disease</li>
         <li>heart murmurs 240</li>
      </ul>
   </li>
</ul>
</div>
<div>
<h2>B</h2>
<ul>
   <li>&#x201C;A to Z: Aortic Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 228n</li>
   <li>&#x201C;A to Z: Aortic Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 228n</li>
   <li>&#x201C;A to Z: Aortic Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 228n
      <ul>
         <li>&#x201C;A to Z: Pulmonic Valvular Stenosis&#x201D; (The Nemours Foundation/KidsHealth&#x00AE;) 237n
            <ul>
               <li>abdomen, defined 737</li>
            </ul>
         </li>
      </ul>
   </li>
   <li>abdominal aortic aneurysm, depicted <em>300</em></li>
   <li>abnormal heart murmurs, described 240
      <ul>
         <li>About KidsHealth, contact 745</li>
      </ul>
   </li>
   <li>ACAS <em>see</em> asymptomatic carotid atherosclerosis study
      <ul>
         <li>ACE inhibitors <em>see</em> angiotensin-converting enzyme inhibitors</li>
      </ul>
   </li>
   <li>acetaminophen, vasculitis 347
      <ul>
         <li>acquired heart valve disease</li>
         <li>heart murmurs 240</li>
      </ul>
   </li>
</ul>
</div>
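For comparison, here is a sketch of a different technique than the regular expression replaces used in this thread: parse the tab-indented <p> lines into a tree and write the nested lists out recursively in the layout shown above. It assumes one <p>...</p> per line, a nesting level equal to the number of leading tabs, and that any other line (<div>, <h2>, ...) ends the current list, so the outermost list is opened and closed even without a heading. The names nestIndexLists, renderList and pad are hypothetical. It is written for a modern JavaScript engine such as Node.js (const, arrow functions, String.prototype.repeat), so it would probably need rewriting in older ES3 style to run inside older UltraEdit/UEStudio versions.

```javascript
// Sketch: build nested <ul>/<li> lists from tab-indented <p> index lines.
// Assumptions: one <p>...</p> per line, nesting level = number of leading
// tabs, and any other line (<div>, <h2>, ...) ends the current list.
const pad = (n) => "\t".repeat(n);

// Write a list and, recursively, its sub-lists: <ul> at 2*depth tabs,
// <li> at 2*depth+1 tabs, the same layout as the expected output above.
function renderList(items, depth, out) {
  out.push(pad(2 * depth) + "<ul>");
  for (const item of items) {
    if (item.children.length === 0) {
      out.push(pad(2 * depth + 1) + "<li>" + item.text + "</li>");
    } else {
      out.push(pad(2 * depth + 1) + "<li>" + item.text);
      renderList(item.children, depth + 1, out);
      out.push(pad(2 * depth + 1) + "</li>");
    }
  }
  out.push(pad(2 * depth) + "</ul>");
}

function nestIndexLists(text) {
  const out = [];
  let topItems = [];       // items of the outermost list being collected
  let levels = [topItems]; // levels[t] receives items with t leading tabs

  // Emit the collected list (if any) and start a new, empty one.
  const flush = () => {
    if (topItems.length > 0) renderList(topItems, 0, out);
    topItems = [];
    levels = [topItems];
  };

  for (const line of text.split(/\r?\n/)) {
    const m = /^(\t*)<p(?:\s[^>]*)?>(.*)<\/p>\s*$/.exec(line);
    if (!m) {              // not an index entry: close the list, keep the line
      flush();
      out.push(line);
      continue;
    }
    // A jump of more than one level is clamped to "child of previous item".
    const tabs = Math.min(m[1].length, levels.length - 1);
    levels.length = tabs + 1;   // drop deeper levels
    const item = { text: m[2], children: [] };
    levels[tabs].push(item);
    levels.push(item.children); // next deeper level = this item's children
  }
  flush();
  return out.join("\r\n");     // output always uses CRLF line endings
}
```

Because the whole selection is parsed in memory in one pass, the result does not depend on how a file is split into blocks; the trade-off is that every list is rebuilt from scratch instead of being edited in place.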

Here is the macro code which produces the output above from the input data:


InsertMode
ColumnModeOff
HexOff
Top
TrimTrailingSpaces
PerlReOn
Find MatchCase RegExp "^(?:(?:\r?\n|\r))+"
Replace All ""
Top
Find MatchCase RegExp ",(?=\d)"
Replace All ", "
Find RegExp " </(em|i|b)>"
Replace All "</\1> "
Top
Find MatchCase RegExp "  "
Replace All " "
Find RegExp ",</(em|i|b)>"
Replace All "</\1>,"
Find MatchCase RegExp ",(?=\d)"
Replace All ", "
Find RegExp "(<span class="page" title="[^<>\r\n"]*?" data-seq="[^<>\r\n"]*?"/>)\r\n(\t+<p>)"
Replace All "\2\1"
Find RegExp "^<p"
Replace "<ul>\r\n<p"
Find RegExp "(</\w+>(?<!</p>)(?:\r\n)+)<p"
Replace All "\1<ul>\r\n<p"
Bottom
IfColNumGt 1
InsertLine
DeleteToStartofLine
EndIf
"<"
Top
Find RegExp "(</?)p>"
Replace All "\1li>"
Find MatchCase RegExp "^(?=\t{3}<)"
Replace All "\t\t\t\t"
Find MatchCase RegExp "^(?=\t\t<)"
Replace All "\t\t\t"
Find MatchCase RegExp "^(?=\t<)"
Replace All "\t\t"
Find MatchCase RegExp "^(?=<li)"
Replace All "\t"
Find MatchCase RegExp "^(\t<li>.+)</li>(?=\r\n\t{3}<)"
Replace All "\1\r\n\t\t<ul>"
Top
Find MatchCase RegExp "^(\t{3}<li>.+)</li>(?=\r\n\t{5}<)"
Replace All "\1\r\n\t\t\t\t<ul>"
Top
Find MatchCase RegExp "^(\t{5}<li>.+)</li>(?=\r\n\t{7}<)"
Replace All "\1\r\n\t\t\t\t\t\t<ul>"
Top
Find MatchCase RegExp "^(\t{7}<li>.+</li>\r\n)(?=\t{0,5}<)"
Replace All "\1\t\t\t\t\t\t</ul>\r\n\t\t\t\t\t</li>\r\n"
Top
Find MatchCase RegExp "^(\t{5}</?li>(?:.+</li>)?\r\n)(?=\t{0,3}<)"
Replace All "\1\t\t\t\t</ul>\r\n\t\t\t</li>\r\n"
Top
Find MatchCase RegExp "^(\t{3}</?li>(?:.+</li>)?\r\n)(?=\t?<)"
Replace All "\1\t\t</ul>\r\n\t</li>\r\n"
Top
Find MatchCase RegExp "^(\t</?li>(?:.+</li>)?\r\n)(?=<)"
Replace All "\1</ul>\r\n"
Bottom
Key BACKSPACE
Top

Depending on the version of UltraEdit/UEStudio, a Perl regular expression Replace All which changes the number of lines requires an explicit move of the caret to the top of the file, because of a bug in the affected versions of UE/UES. A Replace All started at the top of the file should keep the caret at the top of the file, and this is the case for every non-regular-expression, UltraEdit regular expression and Unix regular expression Replace All. But with a Perl regular expression Replace All, the caret ends up somewhere in the file if the replace modifies the number of lines.

Rajesh · Aug 10, 2016 07:22 UTC · #3

Thanks Mofi.

Samir · Aug 23, 2016 13:30 UTC · #4

Hi Mofi,

Your macro for nested lists works fine.

But when there is no heading tag like h2, the opening and closing tags of the outermost list are not generated properly.

So I am trying to do it with a script as follows:


function getActiveDocumentIndex()
{
   if (typeof(UltraEdit.activeDocumentIdx) == "number")
   {
      return UltraEdit.activeDocumentIdx;
   }
   for (var i = 0; i < UltraEdit.document.length; i++)
   {
      if (UltraEdit.activeDocument.path == UltraEdit.document[i].path)
      {
         return i;
      }
   }
   return (-1);
}

if (UltraEdit.document.length > 0)
{
   UltraEdit.insertMode();
   UltraEdit.columnModeOff();
   var nActiveDocIndex = getActiveDocumentIndex();
   if (UltraEdit.activeDocument.isSel())
   {
      UltraEdit.selectClipboard(9);
      UltraEdit.activeDocument.copy();
      UltraEdit.newFile();
      UltraEdit.activeDocument.paste();
      UltraEdit.clearClipboard();
      UltraEdit.selectClipboard(0);
      UltraEdit.activeDocument.key("CTRL+END");
      UltraEdit.activeDocument.write("\r\n");

      UltraEdit.activeDocument.top();
      UltraEdit.perlReOn();
      UltraEdit.activeDocument.findReplace.mode=0;
      UltraEdit.activeDocument.findReplace.matchCase=true;
      UltraEdit.activeDocument.findReplace.matchWord=false;
      UltraEdit.activeDocument.findReplace.regExp=true;
      UltraEdit.activeDocument.findReplace.searchDown=true;
      UltraEdit.activeDocument.findReplace.preserveCase=false;
      UltraEdit.activeDocument.findReplace.replaceAll=true;
      UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;
      UltraEdit.activeDocument.findReplace.replace("(\\r?\\n){2,}","$1");
      UltraEdit.activeDocument.findReplace.replace(",(?=\\d)",", ");
      UltraEdit.activeDocument.findReplace.replace(" </(em|i|b)>","</\\1> ");
      UltraEdit.activeDocument.findReplace.replace("  +"," ");
      UltraEdit.activeDocument.findReplace.replace(",</(em|i|b)>","</\\1>,");
      UltraEdit.activeDocument.findReplace.replace(",(?=\\d)",", ");
      UltraEdit.activeDocument.findReplace.replace("(<p class=\"font-serif\"><span id=\"page\\w+\" class=\"page-number\">Page \\w+</span></p>|<span id=\"page\\w+\" class=\"page-number\">Page \\w+</span>|<p class=\"pagenumber\" id=\"page\\w+\">\\[page \\w+\\]</p>|<span[^<>\\r\\n/]*?></span>|<span[^<>\\r\\n/]*?/>|<a[^<>\\r\\n/]*?></a>|<a[^<>\\r\\n/]*?/>)[\\r\\n]+(\\t+<p[^<>\\r\\n/]*?>|<p[^<>\\r\\n/]*?>)","\\2\\1");
      UltraEdit.activeDocument.findReplace.replace("<p[^<>\\r\\n/]*?>","<li>");
      UltraEdit.activeDocument.findReplace.replace("(</?)p>","\\1li>");
      UltraEdit.activeDocument.findReplace.replace("^((?:<li.+?</li>[\\r\\n]*|\\t{1,}<li.+?</li>[\\r\\n]*)+)","<ol>\\r\\n$1</ol>\\r\\n");
      UltraEdit.activeDocument.findReplace.replace("(</li>[\\r\\n]+)^((?:\\t{1,}<li.+?</li>[\\r\\n]*)+)","\\r\\n\\t<ol>\\r\\n$2\\t</ol>\\r\\n\\1");
      UltraEdit.activeDocument.findReplace.replace("(</li>[\\r\\n]+)^((?:\\t{2,}<li.+?</li>[\\r\\n]*)+)","\\r\\n\\t\\t<ol>\\r\\n$2\\t\\t</ol>\\r\\n\\t\\1");
      UltraEdit.activeDocument.findReplace.replace("(</li>[\\r\\n]+)^((?:\\t{3,}<li.+?</li>[\\r\\n]*)+)","\\r\\n\\t\\t\\t<ol>\\r\\n$2\\t\\t\\t</ol>\\r\\n\\t\\t\\1");
      UltraEdit.activeDocument.findReplace.replace("(</li>[\\r\\n]+)^((?:\\t{4,}<li.+?</li>[\\r\\n]*)+)","\\r\\n\\t\\t\\t\\t<ol>\\r\\n$2\\t\\t\\t\\t</ol>\\r\\n\\t\\t\\t\\1");
      UltraEdit.activeDocument.findReplace.replace("(</li>[\\r\\n]+)^((?:\\t{5,}<li.+?</li>[\\r\\n]*)+)","\\r\\n\\t\\t\\t\\t\\t<ol>\\r\\n$2\\t\\t\\t\\t\\t</ol>\\r\\n\\t\\t\\t\\t\\1");
      UltraEdit.activeDocument.key("CTRL+END");
      UltraEdit.activeDocument.findReplace.searchDown=false;
      UltraEdit.activeDocument.findReplace.find("\\r\\n");
      UltraEdit.activeDocument.key("DEL");
      UltraEdit.activeDocument.top();
      UltraEdit.activeDocument.selectAll();
      UltraEdit.selectClipboard(8);
      UltraEdit.activeDocument.copy();
      UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
      UltraEdit.document[nActiveDocIndex].paste();
      UltraEdit.clearClipboard();
      UltraEdit.selectClipboard(0);
   }
}

I am running it with UEStudio v11.

When I run my script on the complete attached input file (later deleted), some lists end unexpectedly.

Please help me solve the problem.

Mofi · Aug 23, 2016 21:00 UTC · #5

I modified the macro in my previous post to:

insert a line with <ul> above the first line beginning with <p>, regardless of whether there is something above it or <p> is at the top of the file;
insert a line with <ul> between each non-empty line ending with an end tag other than </p> (like </h2> or </div>) and a following line beginning with <p>;
make sure that the last list items/lists also get end tags, even if the file ends with a </p> line with or without a line termination.
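The first two changes correspond to two replaces in the macro in post #2: a single Replace of ^<p (not Replace All) and a Replace All with a negative lookbehind which excludes </p>. A JavaScript sketch of just these two steps could look like this; the function name is hypothetical, CRLF line endings are assumed, and no end tags are inserted. The lookbehind (?<!...) requires an ES2018 engine such as current Node.js, which older UltraEdit script engines probably lack.

```javascript
// Sketch of the two replaces that insert the opening <ul> lines (CRLF assumed).
function insertListStartTags(text) {
  // Like 'Find RegExp "^<p"' with a single Replace: only the first
  // unindented <p> line gets <ul> above it, wherever it is in the file.
  text = text.replace(/^<p/m, "<ul>\r\n<p");
  // Like the Replace All: a <p> line directly after a line ending with an
  // end tag other than </p> (e.g. </h2>, </div>) starts a new outermost list.
  return text.replace(/(<\/\w+>(?<!<\/p>)(?:\r\n)+)<p/g, "$1<ul>\r\n<p");
}
```

A <p> line following another </p> line gets nothing, because the lookbehind rejects </p>, so consecutive top-level items stay in the same list.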

The output produced by your script, after selecting all paragraphs in the body of your input file, is completely valid when using the latest UltraEdit/UEStudio. So I don't see any reason for wrong lists. Removing all paragraphs above <p><b>15 THE LYMPHATIC, selecting the remaining paragraphs and running the script also produces correct output, which is displayed as expected in a browser.

Edit: The macro code in my previous post has been optimized further to insert </ul> with fewer regular expression replaces than before.

Samir · Aug 24, 2016 09:09 UTC · #6

Hi Mofi,

I am using UEStudio v11.00.0.1011. When I run my script on a small file, it works fine.

But when I run the script on a larger file, incorrect reformatting occurs at lines near 500, 600-700, ...

I modified my script and replaced <ol with <ul.

1. After running the script on the larger file, there are wrong list tags in this block:


<li><b>15 THE LYMPHATIC (LYMPHOID) SYSTEM AND IMMUNITY 520</b>
	<ul>
	<li><b>15.1 The Concept of Immunity 521</b></li>
	</ul>
</li>
</ul>
	<li><b>15.2 Lymphatic System Structure and Functions 521</b></li>
<ul>
		<li><b>Structure 521</b>
	<ul>
		<li><b>Functions 521</b></li>
	<li><b>15.3 Lymphatic Vessels and Lymph Circulation 523</b>
		<ul>
		<li><b>Lymphatic Capillaries 523</b></li>
		<li><b>Lymph Trunks and Ducts 524</b></li>
		<li><b>Formation and Flow of Lymph 524</b></li>
		</ul>
	</li>
	<li><b>15.4 Lymphatic Organs and Tissues 526</b>
		<ul>
		<li><b>Thymus 526</b></li>
		<li><b>Lymph Nodes 527</b></li>
		<li><b>Spleen 529</b></li>
		<li><b>Lymphatic Nodules 531</b></li>
		</ul>
	</li>
	<li><b>15.5 Principal Groups of Lymph Nodes 531</b></li>
	<li><b>15.6 Development of Lymphatic Tissues 542</b></li>
	<li><b>15.7 Aging and the Lymphatic System 542</b></li>

1a. This block is correct after running the script on a smaller file from which the lines above it were removed:


<li><b>15 THE LYMPHATIC (LYMPHOID) SYSTEM AND IMMUNITY 520</b>
	<ul>
	<li><b>15.1 The Concept of Immunity 521</b></li>
	<li><b>15.2 Lymphatic System Structure and Functions 521</b>
		<ul>
		<li><b>Structure 521</b></li>
		<li><b>Functions 521</b></li>
		</ul>
	</li>
	<li><b>15.3 Lymphatic Vessels and Lymph Circulation 523</b>
		<ul>
		<li><b>Lymphatic Capillaries 523</b></li>
		<li><b>Lymph Trunks and Ducts 524</b></li>
		<li><b>Formation and Flow of Lymph 524</b></li>
		</ul>
	</li>
	<li><b>15.4 Lymphatic Organs and Tissues 526</b>
		<ul>
		<li><b>Thymus 526</b></li>
		<li><b>Lymph Nodes 527</b></li>
		<li><b>Spleen 529</b></li>
		<li><b>Lymphatic Nodules 531</b></li>
		</ul>
	</li>
	<li><b>15.5 Principal Groups of Lymph Nodes 531</b></li>
	<li><b>15.6 Development of Lymphatic Tissues 542</b></li>
	<li><b>15.7 Aging and the Lymphatic System 542</b></li>
	<li><b><em>Key Medical Terms Associated with the Lymphatic System and Immunity 544</em></b></li>
	<li><b><em>Chapter Review and Resource Summary 544</em></b></li>
	<li><b><em>Critical Thinking Questions 545</em></b></li>
	<li><b><em>Answers to Figure Questions 545</em></b></li>
	</ul>
</li>

2. There are also wrong list tags in this block after running the script on the larger file:


	<li><b>26.4 Birth Control Methods and Abortion 870</b>
		<ul>
		<li><b>Birth Control Methods 870</b>
			<ul>
			<li><b>Surgical Sterilization 870</b></li>
			<li><b>Non-incisional Sterilization 870</b></li>
			<li><b>Hormonal Methods 871</b></li>
			</ul>
		</li>
		</ul>
	</li>
			<li><b>Intrauterine Devices 871</b>
		<ul>
			<li><b>Spermicides 871</b>
			<ul>
			<li><b>Barrier Methods 871</b></li>
			<li><b>Periodic Abstinence 872</b></li>
			</ul>
		</li>
		<li><b>Abortion 872</b></li>
		</ul>
	</li>
	</ul>
</li>
</ul>
	<li><b>26.5 Development of the Reproductive Systems 872</b></li>
<ul>
	<li><b>26.6 Aging and the Reproductive Systems 874</b>
	<ul>
	<li><b><em>Key Medical Terms Associated with the Reproductive Systems 875</em></b></li>
	<li><b><em>Chapter Review and Resource Summary 876</em></b></li>
	<li><b><em>Critical Thinking Questions 878</em></b></li>
	<li><b><em>Answers to Figure Questions 878</em></b></li>
	</ul>
</li>
<li><b>27 SURFACE ANATOMY 880</b>

2a. But this block is reformatted correctly after running the script on the reduced file:


	<li><b>26.4 Birth Control Methods and Abortion 870</b>
		<ul>
		<li><b>Birth Control Methods 870</b>
			<ul>
			<li><b>Surgical Sterilization 870</b></li>
			<li><b>Non-incisional Sterilization 870</b></li>
			<li><b>Hormonal Methods 871</b></li>
			<li><b>Intrauterine Devices 871</b></li>
			<li><b>Spermicides 871</b></li>
			<li><b>Barrier Methods 871</b></li>
			<li><b>Periodic Abstinence 872</b></li>
			</ul>
		</li>
		<li><b>Abortion 872</b></li>
		</ul>
	</li>
	<li><b>26.5 Development of the Reproductive Systems 872</b></li>
	<li><b>26.6 Aging and the Reproductive Systems 874</b></li>
	<li><b><em>Key Medical Terms Associated with the Reproductive Systems 875</em></b></li>
	<li><b><em>Chapter Review and Resource Summary 876</em></b></li>
	<li><b><em>Critical Thinking Questions 878</em></b></li>
	<li><b><em>Answers to Figure Questions 878</em></b></li>

I do not understand why the output differs depending on the file size in UEStudio v11.00.0.1011.

I see the same problem when I run the replaces for nested lists manually on the larger file, but in the small file everything is reformatted correctly.

Please help me solve this problem.

Mofi · Aug 24, 2016 19:20 UTC · #7

This is your script rewritten to do everything in memory for the selected block in the active file:


if (UltraEdit.document.length > 0)
{
   if (UltraEdit.activeDocument.isSel())
   {
      UltraEdit.insertMode();
      UltraEdit.columnModeOff();
      var sIndexBlock = UltraEdit.activeDocument.selection;
      var nNewlinesAppended = 0;
      if (sIndexBlock.substr(sIndexBlock.length - 2) != "\r\n")
      {
         sIndexBlock += "\r\n";
         nNewlinesAppended = 2;
      }
      sIndexBlock = sIndexBlock.replace(/(\r?\n){2,}/g,"$1");
      sIndexBlock = sIndexBlock.replace(/,(?=\d)/g,", ");
      sIndexBlock = sIndexBlock.replace(/ <\/(em|i|b)>/gi,"</$1> ");
      sIndexBlock = sIndexBlock.replace(/  +/g," ");
      sIndexBlock = sIndexBlock.replace(/,<\/(em|i|b)>/gi,"</$1>,");
      sIndexBlock = sIndexBlock.replace(/,(?=\d)/g,", ");
      sIndexBlock = sIndexBlock.replace(/(<p class="font-serif"><span id="page\w+" class="page-number">Page \w+<\/span><\/p>|<span id="page\w+" class="page-number">Page \w+<\/span>|<p class="pagenumber" id="page\w+">\[page \w+]<\/p>|<span[^<>\r\n\/]*?><\/span>|<span[^<>\r\n\/]*?\/>|<a[^<>\r\n\/]*?><\/a>|<a[^<>\r\n\/]*?\/>)[\r\n]+(\t+<p[^<>\r\n/]*?>|<p[^<>\r\n\/]*?>)/gi,"$2$1");
      sIndexBlock = sIndexBlock.replace(/<p[^<>\r\n/]*?>/gi,"<li>");
      sIndexBlock = "<ol>\r\n" + sIndexBlock.replace(/<\/?p>/gi,"</li>") + "</ol>\r\n";
      sIndexBlock = sIndexBlock.replace(/(<\/li>\r\n)((?:\t{1,}<li.+?<\/li>(?:\r\n)*)+)/g,"\r\n\t<ol>\r\n$2\t</ol>\r\n$1");
      sIndexBlock = sIndexBlock.replace(/(<\/li>\r\n)((?:\t{2,}<li.+?<\/li>(?:\r\n)*)+)/g,"\r\n\t\t<ol>\r\n$2\t\t</ol>\r\n\t$1");
      sIndexBlock = sIndexBlock.replace(/(<\/li>\r\n)((?:\t{3,}<li.+?<\/li>(?:\r\n)*)+)/g,"\r\n\t\t\t<ol>\r\n$2\t\t\t</ol>\r\n\t\t$1");
      sIndexBlock = sIndexBlock.replace(/(<\/li>\r\n)((?:\t{4,}<li.+?<\/li>(?:\r\n)*)+)/g,"\r\n\t\t\t\t<ol>\r\n$2\t\t\t\t</ol>\r\n\t\t\t$1");
      sIndexBlock = sIndexBlock.replace(/(<\/li>\r\n)((?:\t{5,}<li.+?<\/li>(?:\r\n)*)+)/g,"\r\n\t\t\t\t\t<ol>\r\n$2\t\t\t\t\t</ol>\r\n\t\t\t\t$1");
      UltraEdit.activeDocument.write(sIndexBlock.substr(0,sIndexBlock.length - nNewlinesAppended));
   }
}

There are 2 possible reasons why your script fails on larger files:

The string matched by a regular expression search string cannot be of unlimited size. The limit is just a few MB (I have never read about the exact limit or tested it), even with several GB of free RAM.

The JavaScript core engine inside UltraEdit, like in other applications (browsers), has access only to a memory block of limited size; it cannot access all of the RAM via heap memory management. I think, but I'm not sure, that this is a security restriction of the JavaScript core. This memory management makes sure that a badly coded or specially crafted script does not consume more and more memory until no free RAM is left at all, which could easily cause lots of other processes to run out of memory and crash.

But this is most likely not the problem here, as even your large file is probably just a few hundred KB and not hundreds of MB.

Your several-years-old version of UEStudio always loads files in small blocks, like most applications. The block size might be 64 KB, but I don't know for sure. UES v16.20 and UE v23.20, especially the 64-bit versions on Windows x64, use much more RAM to hold much larger blocks of file contents in memory.

A Perl regular expression find/replace is always executed only on a block loaded from the file by UE/UES and not on the entire file, which would be impossible for huge files several GB in size. If the Perl regular expression find/replace function already has a complete positive match when it reaches the end of the file block passed to it by UE/UES, the find/replace is executed on that match without UE/UES loading the next file block and passing it to the Perl find/replace function as well.

Your expressions for inserting <ol> and </ol> match 1 or more list item lines of a list, not necessarily the entire list. That's the problem here. For example, suppose the currently loaded 64 KB file block ends with 5 lines of a level 3 list which has 8 list item lines in total. On reaching the end of the buffer, the Perl find/replace function reports to UE/UES that the search was positive (1 or more lines matched!). Therefore UE/UES has no reason to load the next 64 KB file block and let the Perl find/replace function run on a merged block containing the last 5 lines of the previous block and the newly loaded lines, which would really match all 8 list item lines of the level 3 list.

As a result of this block-based find/replace with search expressions that match 1 or more list item lines, <ol> and </ol> can be inserted unexpectedly anywhere within a list on larger files which are not loaded completely into memory as a single block, depending on the file block boundaries.
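The block boundary effect can be reproduced outside UltraEdit with a deliberately simplified model, sketched here in JavaScript: the text is cut into blocks of a fixed number of lines, a "1 or more list item lines" replace similar to the ones in the script runs on each block on its own, and the result is compared with one replace over the whole text. The function names and the block size are hypothetical, real UE/UES blocks are measured in bytes, and this only models the observed behavior; it is not UltraEdit's actual implementation.

```javascript
// Simplified model (NOT UltraEdit's real implementation) of a "1 or more
// list item lines" replace applied to independent file blocks.
function wrapNested(s) {
  // A parent "</li>" line followed by 1 or more tab-indented <li> lines.
  return s.replace(/(<\/li>\r\n)((?:\t<li.+?<\/li>\r\n)+)/g,
    "\r\n\t<ul>\r\n$2\t</ul>\r\n$1");
}

// Run wrapNested on blocks of linesPerBlock lines, each block on its own,
// as if every match reaching a block end were accepted without more data.
function wrapInBlocks(text, linesPerBlock) {
  const lines = text.split("\r\n").slice(0, -1); // text must end with CRLF
  let result = "";
  for (let i = 0; i < lines.length; i += linesPerBlock) {
    const block = lines.slice(i, i + linesPerBlock).map((l) => l + "\r\n").join("");
    result += wrapNested(block);
  }
  return result;
}
```

For the lines <li>A</li>, three tab-indented items a1 to a3 and <li>B</li>, wrapNested() on the whole text puts all three items into one nested list, while wrapInBlocks(text, 3) closes the nested list after a2 and leaves a3 outside of it, similar to the broken blocks in the previous post.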

One solution would be a search expression which ensures that the entire list of level X must always be matched for a positive match, not simply 1 or more list item lines of the same level. Then, on reaching the end of a block, the Perl find/replace function must report to UE/UES that it needs more data to evaluate whether the entire search expression results in a positive match.

Another solution (for files smaller than about 20 MB?) is loading the entire selection into JavaScript memory and reformatting it there, as my script does. That is also faster for several reasons, but it does not work for Unicode files containing characters with a code point value greater than U+00FF.
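One way to express "the entire list must be matched" is a lookahead after the run of nested lines which requires that the next line begins at a lower level. Mofi's macro earlier in this thread uses this idea with lookaheads like (?=\t{0,5}<) and temporarily appends a "<" at the end of the file so that the last list can match as well. The sketch below shows the same idea in JavaScript, assuming CRLF line endings and a single nesting level, with hypothetical function names. A JavaScript replace always runs on a complete string, so it only demonstrates that a truncated list no longer matches; how UE/UES then requests the next block is the behavior described in the post, not something this code shows.

```javascript
// Sketch: accept a run of nested item lines only when the line after it
// starts at column 0 ("<"), i.e. the whole sub-list lies in the searched text.
function wrapCompleteOnly(s) {
  return s.replace(/(<\/li>\r\n)((?:\t<li.+?<\/li>\r\n)+)(?=<)/g,
    "\r\n\t<ul>\r\n$2\t</ul>\r\n$1");
}

// A temporary "<" at the end lets the last sub-list match too; it is removed
// afterwards, like the "<" typed and later deleted by Mofi's macro.
function wrapCompleteLists(text) {
  const sentinel = "<";
  return wrapCompleteOnly(text + sentinel).slice(0, -sentinel.length);
}
```

Applied to a text which ends after a2 (a truncated list), wrapCompleteOnly() changes nothing, whereas wrapCompleteLists() on the complete text wraps a1 to a3 together.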

Samir · Aug 25, 2016 07:28 UTC · #8

Thanks Mofi, again.