Character count with space

Basic User

    Sep 15, 2017 #1

    I have a folder that contains many XML files. All XML files are encoded in UTF-8 (Unicode editing).

    The XML file structure is as shown below:
    <?xml version="1.0" encoding="utf-8"?>
    <DDD>
    <DD>
    <SO_TXT>Sample</SO_TXT>
    <SO>Sample</SO>
    <HT>Isabelle Moret</HT>
    <PG>a3</PG>
    <LA>DT</LA>
    <DA>2017-09-14</DA>
    <ME>01.pdf</ME>
    <NZ>###</NZ>
    <TX>
    <P>Bern – Isabelle Moret ist die einzige Frau im Bundesrats-rennen. Die 46-jährige Waadtländerin ist An-wältin. Seit 2006 sitzt sie für die FDP im Nationalrat. Ihre politischen Schwer-punkte sind die Familien-sowie die Gesundheits- und Sozial politik. Derzeit präsi-diert sie beispielsweise den Spitalverband H+.</P>
    <P>Moret ist Mutter zweier Kinder und lebt getrennt von ihrem Mann in Yens VD, einer Gemeinde im Distrikt Morges. Als Kind wollte sie Bäuerin werden – wegen der vielen Tiere. Was sie mag: Spaghetti mit hausgemachter Tomaten-sauce. Und den Schweizer Tennis-Stars Roger Federer und Stan Wawrinka beim Punkten zuschauen.</P>
    </TX>
    </DD>
    </DDD>


    Now I want to put the character count (including spaces) of the body part <TX>...</TX> into the <NZ>###</NZ> element.

    Within the body part <TX>...</TX>, tags such as <P> or any other tags must not be included in the count. Only the text should be counted.
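As a rough illustration of the counting rule described above (text including spaces counts, <P> and any other tags do not), here is a minimal plain JavaScript sketch. It is not UltraEdit script code; the function name `countTextChars` is hypothetical, and it assumes simple markup (no `>` inside attribute values, no CDATA sections). Line breaks between the <P> elements are counted as text here, which may or may not be wanted.

```javascript
// Count the characters of the text content of a TX body, ignoring markup.
// Assumption: tags are simple (no ">" inside attribute values, no CDATA).
function countTextChars(txBody) {
  // Remove every start tag, end tag and empty-element tag; keep everything else,
  // including spaces, tabs and line breaks.
  const textOnly = txBody.replace(/<[^>]*>/g, "");
  // Spread into code points so a character outside the BMP counts as one.
  return [...textOnly].length;
}

const sample = "<P>Als Kind wollte sie Bäuerin werden.</P>";
console.log(countTextChars(sample)); // 35
```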

    Kindly help me...
    Sample.zip (688 Bytes)
    Please find the sample XML file attached.

    Grand Master

      Sep 15, 2017 #2

      UltraEdit opened the attached XML file as a Unicode file on my machine, so the German umlauts were displayed correctly.

      Here is a script that counts Unicode characters, not the number of bytes the file occupies on the storage media. Please read the comments for details on how this little script works.
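To illustrate the difference between characters and bytes, here is a small plain JavaScript (Node.js) sketch, not UltraEdit script code. The word "Waadtländerin" from the sample has 13 characters, but the umlaut ä needs 2 bytes in UTF-8, so the word occupies 14 bytes in the file. `Buffer.byteLength` is specific to Node.js.

```javascript
// Compare the character count with the UTF-8 byte count of a string (Node.js).
const word = "Waadtländerin";

// Number of Unicode code points, i.e. characters as a reader would count them
// (for text written with precomposed characters such as "ä").
const charCount = [...word].length;

// Number of bytes the same text occupies when stored as UTF-8.
const byteCount = Buffer.byteLength(word, "utf8");

console.log(charCount, byteCount); // 13 14
```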

      if (UltraEdit.document.length > 0)  // Is any file opened?
      {
         var nCharCount;
         var nFindCount = 0;
      
         // Define environment for this script.
         UltraEdit.insertMode();
         if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
         else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();
      
         // Move caret to top of the active file.
         UltraEdit.activeDocument.top();
      
         // Define regular expression engine and the find parameters for the
         // Perl regular expression finds used to modify the active file.
         UltraEdit.perlReOn();
         UltraEdit.activeDocument.findReplace.mode=0;
         UltraEdit.activeDocument.findReplace.matchCase=true;
         UltraEdit.activeDocument.findReplace.matchWord=false;
         UltraEdit.activeDocument.findReplace.regExp=true;
         UltraEdit.activeDocument.findReplace.searchDown=true;
         if (typeof(UltraEdit.activeDocument.findReplace.searchInColumn) == "boolean")
         {
            UltraEdit.activeDocument.findReplace.searchInColumn=false;
         }
         UltraEdit.activeDocument.findReplace.preserveCase=false;
         UltraEdit.activeDocument.findReplace.replaceAll=false;
         UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;
      
         // In a loop, find the value of each TX element, excluding the DOS
         // line ending after the start tag, until nothing more is found
         // between the caret position and the end of the active file.

         // For each found value, determine the number of characters including
         // all whitespace characters (spaces, tabs, carriage returns and
         // line-feeds) and any markup inside the value, such as <P> tags.

         // Then search upwards for the value of the NZ element and select it.

         // Replace the selection with the character count converted from an
         // integer to a decimal string or, if nothing is selected, insert the
         // character count between the start and end tags of the NZ element.

         // Then search downwards for the end tag of the just processed TX
         // element to avoid processing the first found TX element in an
         // endless loop.
      
         while(UltraEdit.activeDocument.findReplace.find("(?s)(?<=<TX>\r\n).*?(?=</TX>)"))
         {
            nCharCount = UltraEdit.activeDocument.selection.length;
      
            UltraEdit.activeDocument.findReplace.searchDown=false;
            UltraEdit.activeDocument.findReplace.find("(?<=<NZ>).*?(?=</NZ>)");
      
            UltraEdit.activeDocument.write(nCharCount.toString(10));
      
            UltraEdit.activeDocument.findReplace.searchDown=true;
            UltraEdit.activeDocument.findReplace.find("</TX>");
      
            nFindCount++;
         }
      
         // Move caret to top of file and show a message box informing
         // the script user about the number of processed TX values.
         UltraEdit.activeDocument.top();
         UltraEdit.messageBox("Processed " + nFindCount + " TX+NZ element" + ((nFindCount != 1) ? "s." : "."));
      }
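For testing the logic outside UltraEdit, the loop described in the script comments can be sketched in plain JavaScript (Node.js) on a string. This is a hedged illustration, not the author's script: the function name `fillNzCounts` is hypothetical, it assumes every DD element contains an NZ element followed by a TX element as in the sample, and it uses JavaScript's regular expression engine instead of UltraEdit's Perl engine. Like the script, it counts the whole TX value after the DOS line break following <TX>, including <P> tags and line endings; to count only text, the tags would have to be stripped from the value before measuring its length.

```javascript
// Write the length of each TX value into the preceding <NZ>...</NZ> element.
// Hypothetical helper; assumes NZ comes before TX in each DD element and
// DOS line endings (CR+LF) after <TX>, like the UltraEdit script expects.
function fillNzCounts(xml) {
  // Groups: 1 = NZ start tag, 2 = text from NZ end tag up to the TX value,
  // 3 = TX value up to (not including) the TX end tag.
  const pattern = /(<NZ>)[^<]*(<\/NZ>[\s\S]*?<TX>\r\n)([\s\S]*?)(?=<\/TX>)/g;
  let processed = 0;
  const result = xml.replace(pattern, (match, nzOpen, between, txValue) => {
    processed++;
    // String length counts UTF-16 code units, which equals the character
    // count for the German sample text.
    return nzOpen + txValue.length.toString(10) + between + txValue;
  });
  return { result, processed };
}

const sample = "<DDD>\r\n<DD>\r\n<NZ>###</NZ>\r\n<TX>\r\n<P>Bern</P>\r\n</TX>\r\n</DD>\r\n</DDD>";
const { result, processed } = fillNzCounts(sample);
console.log(result.includes("<NZ>13</NZ>"), processed); // true 1
```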
      Best regards from an UC/UE/UES for Windows user from Austria

      Basic User

        Sep 16, 2017 #3

        First of all, thanks for the support, but the script does not run. I attached a screenshot of the error report (deleted). Kindly take a look.

        Regards
        Somenath

        Grand Master

          Sep 16, 2017 #4

          It would have been good to mention that you are still using UEStudio version 09.xx, because there have been many changes in the last 8 years; for example, Perl regular expression support has become more powerful.

          To make the script also work in UEStudio '09, replace these lines in the posted script

             while(UltraEdit.activeDocument.findReplace.find("(?s)(?<=<TX>\r\n).*?(?=</TX>)"))
             {
                nCharCount = UltraEdit.activeDocument.selection.length;
          with

             while(UltraEdit.activeDocument.findReplace.find("(?s)<TX>\r\n.*?(?=</TX>)"))
             {
                nCharCount = UltraEdit.activeDocument.selection.length - 6;
          A look-behind containing a carriage return and line-feed does not work in UEStudio v9.xx. For that reason, <TX>\r\n must also be matched, and those 6 characters (4 for <TX> plus 1 each for the carriage return and the line-feed) must be subtracted from the length of the selected block.
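The subtraction can be checked with a few lines of plain JavaScript (Node.js), not UltraEdit script code: matching <TX>\r\n together with the value and subtracting its 6 characters gives the same count as the look-behind variant. The helper names are hypothetical, and JavaScript's own regular expression engine (which supports look-behind since ES2018) stands in for UltraEdit's Perl engine; `[\s\S]` replaces the `(?s)` flag, which JavaScript does not accept inline.

```javascript
// The prefix consumed by the UEStudio '09 variant instead of a look-behind.
const PREFIX = "<TX>\r\n";
console.log(PREFIX.length); // 6: 4 characters for "<TX>" plus CR and LF

const xml = "<TX>\r\n<P>Bern</P>\r\n</TX>";

// Variant 1: look-behind, so only the TX value itself is matched.
function countWithLookBehind(text) {
  const m = /(?<=<TX>\r\n)[\s\S]*?(?=<\/TX>)/.exec(text);
  return m ? m[0].length : -1;
}

// Variant 2: the prefix is matched too, so its length is subtracted afterwards.
function countWithPrefix(text) {
  const m = /<TX>\r\n[\s\S]*?(?=<\/TX>)/.exec(text);
  return m ? m[0].length - PREFIX.length : -1;
}

console.log(countWithLookBehind(xml), countWithPrefix(xml)); // 13 13
```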