Character count with space

isomenath · Sep 15, 2017, 07:04 UTC · #1

I have a folder containing many XML files. All of them are encoded in UTF-8 (Unicode).

The XML files are structured as follows:
<?xml version="1.0" encoding="utf-8"?>
<DDD>
<DD>
<SO_TXT>Sample</SO_TXT>
<SO>Sample</SO>
<HT>Isabelle Moret</HT>
<PG>a3</PG>
<LA>DT</LA>
<DA>2017-09-14</DA>
<ME>01.pdf</ME>
<NZ>###</NZ>
<TX>
Bern – Isabelle Moret ist die einzige Frau im Bundesrats-rennen. Die 46-jährige Waadtländerin ist An-wältin. Seit 2006 sitzt sie für die FDP im Nationalrat. Ihre politischen Schwer-punkte sind die Familien-sowie die Gesundheits- und Sozial politik. Derzeit präsi-diert sie beispielsweise den Spitalverband H+.
Moret ist Mutter zweier Kinder und lebt getrennt von ihrem Mann in Yens VD, einer Gemeinde im Distrikt Morges. Als Kind wollte sie Bäuerin werden – wegen der vielen Tiere. Was sie mag: Spaghetti mit hausgemachter Tomaten-sauce. Und den Schweizer Tennis-Stars Roger Federer und Stan Wawrinka beim Punkten zuschauen.
</TX>
</DD>
</DDD>

Now I want to put the character count of the body part <TX>...</TX> (including spaces) into the tag <NZ>###</NZ>.

Any other tags inside the body part <TX>...</TX> must not be included in the count. Only the text should be counted.

Kindly help me.

Mofi · Sep 15, 2017, 16:14 UTC · #2

UltraEdit opened the attached XML file as a Unicode file on my machine. For that reason I saw the German umlauts as they should look.

Here is the script code. It counts Unicode characters, not the number of bytes of the file as stored on the storage media. Please read the comments for details on how this little script works.

Code:

if (UltraEdit.document.length > 0)  // Is any file opened?
{
   var nCharCount;
   var nFindCount = 0;

   // Define environment for this script.
   UltraEdit.insertMode();
   if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
   else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();

   // Move caret to top of the active file.
   UltraEdit.activeDocument.top();

   // Define regular expression engine and the find parameters for the
   // Perl regular expression finds used to modify the active file.
   UltraEdit.perlReOn();
   UltraEdit.activeDocument.findReplace.mode=0;
   UltraEdit.activeDocument.findReplace.matchCase=true;
   UltraEdit.activeDocument.findReplace.matchWord=false;
   UltraEdit.activeDocument.findReplace.regExp=true;
   UltraEdit.activeDocument.findReplace.searchDown=true;
   if (typeof(UltraEdit.activeDocument.findReplace.searchInColumn) == "boolean")
   {
      UltraEdit.activeDocument.findReplace.searchInColumn=false;
   }
   UltraEdit.activeDocument.findReplace.preserveCase=false;
   UltraEdit.activeDocument.findReplace.replaceAll=false;
   UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;

   // In a loop, find the value of each TX element, excluding the DOS line
   // ending after the start tag, from the current caret position to the
   // end of the active file until nothing is found anymore.

   // For each found value determine the number of characters including all
   // whitespace characters (spaces, tabs, carriage returns and line-feeds).

   // Then search upwards for the value of the NZ element and select it.

   // Replace the selection with the character count converted from an
   // integer to a decimal string or, if nothing is selected, insert the
   // character count between the start and end tag of the NZ element.

   // Then search downwards for the end tag of the just processed TX element
   // to avoid processing the first found TX element in an endless loop.

   while(UltraEdit.activeDocument.findReplace.find("(?s)(?<=<TX>\r\n).*?(?=</TX>)"))
   {
      nCharCount = UltraEdit.activeDocument.selection.length;

      UltraEdit.activeDocument.findReplace.searchDown=false;
      UltraEdit.activeDocument.findReplace.find("(?<=<NZ>).*?(?=</NZ>)");

      UltraEdit.activeDocument.write(nCharCount.toString(10));

      UltraEdit.activeDocument.findReplace.searchDown=true;
      UltraEdit.activeDocument.findReplace.find("</TX>");

      nFindCount++;
   }

   // Move caret to top of file and show a message box informing
   // the script user about the number of processed TX values.
   UltraEdit.activeDocument.top();
   UltraEdit.messageBox("Processed " + nFindCount + " TX+NZ element" + ((nFindCount != 1) ? "s." : "."));
}
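
The question also asks that tags nested inside <TX>...</TX> be left out of the count, whereas the script above counts everything in the selection. As a rough sketch of that extra step in plain JavaScript, outside UltraEdit and without any UltraEdit API, the hypothetical helper functions countTextChars and fillCounts below strip anything that looks like a tag before counting. Treating every <...> run as a tag, leaving entities such as &amp; undecoded, and expecting exactly one TX element after each NZ element are simplifying assumptions.

```javascript
// Hypothetical helpers, not part of the UltraEdit scripting API.

// Count the characters of a TX body without nested tags.
// Assumption: anything of the form <...> is a tag; entities are not decoded.
function countTextChars(body) {
   return body.replace(/<[^>]*>/g, "").length;
}

// Put the character count of each TX body into the NZ element before it.
// Assumption: every NZ element is followed by exactly one TX element whose
// start tag is followed by a DOS line ending, as in the sample file.
function fillCounts(xml) {
   return xml.replace(
      /(<NZ>)[^<]*(<\/NZ>[\s\S]*?<TX>\r\n)([\s\S]*?)(<\/TX>)/g,
      function (match, nzStart, between, body, txEnd) {
         return nzStart + countTextChars(body) + between + body + txEnd;
      }
   );
}

// Made-up sample with one nested tag pair inside the TX body.
var sample = "<NZ>###</NZ>\r\n<TX>\r\nEin <b>Test</b>.\r\n</TX>";
console.log(fillCounts(sample));
```

As in the UltraEdit script, the line ending before </TX> is part of the body and is therefore counted.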

isomenath · Sep 16, 2017, 10:57 UTC · #3

First of all, thanks for the support, but the script does not run. I have attached a screenshot of the error report (deleted). Kindly take a look.

Regards
Somenath

Mofi · Sep 16, 2017, 12:47 UTC · #4

It would have been good to mention that you are still using UEStudio version 09.xx, because a lot has changed in the last 8 years. For example, the Perl regular expression engine has become more powerful.

For a script that also works in UEStudio '09, replace in the posted script

Code:

   while(UltraEdit.activeDocument.findReplace.find("(?s)(?<=<TX>\r\n).*?(?=</TX>)"))
   {
      nCharCount = UltraEdit.activeDocument.selection.length;

with

Code:

   while(UltraEdit.activeDocument.findReplace.find("(?s)<TX>\r\n.*?(?=</TX>)"))
   {
      nCharCount = UltraEdit.activeDocument.selection.length - 6;

The look-behind containing carriage return and line-feed does not work in UEStudio v9.xx. For that reason <TX>\r\n must also be matched, and those 6 characters (4 for <TX> plus 2 for carriage return and line-feed) must be subtracted from the length of the selected block.
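
As a quick sanity check of that subtraction, the following plain JavaScript sketch compares both patterns on a made-up sample string. It is meant to be run outside UltraEdit in a JavaScript engine that supports look-behind, which is an assumption about the environment, and the variable names are purely illustrative. [\s\S] stands in for the (?s) flag of the Perl expressions.

```javascript
// Made-up sample; the TX start tag is followed by a DOS line ending.
var text = "<NZ>###</NZ>\r\n<TX>\r\nSample text.\r\n</TX>";

// Pattern of the original script: the look-behind keeps <TX>\r\n
// out of the match.
var withLookBehind = text.match(/(?<=<TX>\r\n)[\s\S]*?(?=<\/TX>)/)[0];

// Pattern for UEStudio v9.xx: <TX>\r\n is part of the match.
var withoutLookBehind = text.match(/<TX>\r\n[\s\S]*?(?=<\/TX>)/)[0];

// "<TX>\r\n" consists of 4 + 2 = 6 characters.
var prefixLength = "<TX>\r\n".length;

// Both lines print the same character count.
console.log(withLookBehind.length);
console.log(withoutLookBehind.length - prefixLength);
```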