Take strings from a file and paste them into the matching tags of another file

hizard77 · Oct 09, 2007 · #1 · 2007-10-09T22:48+00:00

Hello,

I want to write a script in UltraEdit that copies strings from a text file and pastes them into an xml file in the correct tags.

More precisely:

My idea is to put into the .txt file the same tags that are in the xml file, and to put a string after each of those tags. Example in the .txt file:

<ref_language>give water to plants
<creator>tell me something
<release_date>i like to play chess
<fragment>dont drink and drive
etc...

The next step is to copy each string after the same tag in the xml file. The txt file and the xml file have the same name.

===========================
So let me explain my concept:

1) Get the file name of the txt file into a variable, for example "index_chocolat.txt", which is in a folder.

2) Tell the script to find the first tag in the txt file, "<ref_language>", and store it in a variable. Then find the string after that tag, "give water to plants", and copy it into another variable.

3) Look for an xml file with the name "index_chocolat.xml" in a folder.

4) Find the same tag "<ref_language>" (the one stored in the variable) in the xml file, and copy the string "give water to plants" after it.

5) Find the next tag in the txt file and repeat the procedure.

Is it possible? Can you help me?

Thank you very much for your help, I am new to JavaScript...

- I use the latest version of UltraEdit, currently 13.10a+2
- I prefer the UltraEdit regex
- I use Windows XP Pro SP2

Let's get to the main point.

Two files are processed: a TXT file from which the data is extracted, and an XML file into which the data is inserted.

TXT FILE CONTENT (Path: C:\website\higher-tech\meta\index_en.txt, encoding UTF-8)

<language_reference>EN
<hd_short_title>Tidligere nyheder
<hd_long_title>Det europæiske informationssystem for skovbrande
<content_priority>2
<clc_creation_date>07/03/2007
<str_reference>higher-tech/news/archives
<str_title>Skift til Eurotarif
<str_language>DA
<str_document_type>57-Web home page (text, logo, image of a department)
<str_classification>16-000- Institutions and Legislation
<str_keywords>technology
<str_keywords>computer
<str_keywords>hardware
<str_keywords>cellular
<str_keywords>processor
<str_description>technology and information

XML FILE CONTENT (Path: C:\website\higher-tech\xml\index_en.xml, encoding UTF-8)
The content is placed in the open XML tags. I think it will be hard to deal with the HTML content in the <xhtml_fragment> tags because it is not just a single line, so I think I will handle that in another way.

<?xml version="1.0" encoding="UTF-8"?>
<short_content>
  <language_descriptor>
    <language_reference></language_reference>
  </language_descriptor>
  <heading>
    <hd_short_title></hd_short_title>
    <hd_long_title></hd_long_title>
    <hd_abstract>
      <xhtml_r>
        <xhtml_fragment/>
      </xhtml_r>
    </hd_abstract>
    <hd_media>
      <media_reference/>
      <media_alternative_text/>
    </hd_media>
  </heading>
  <content_qual>
    <content_priority></content_priority>
    <content_category_r>
      <content_category/>
    </content_category_r>
    <content_life_cycle>
      <clc_creation_date></clc_creation_date>
      <clc_modification_date/>
    </content_life_cycle>
    <content_str>
      <str_reference></str_reference>
      <str_title></str_title>
      <str_creator/>
      <str_language></str_language>
      <str_document_type></str_document_type>
      <str_classification></str_classification>
      <str_keywords></str_keywords>
      <str_description></str_description>
    </content_str>
  </content_qual>
  <content_ref_short_content>
    <ref_date/>
    <ref_contact_mail/>
    <ref_links>
      <ref_link_r>
        <short_title/>
        <url/>
        <abstract>
          <xhtml_r>
            <xhtml_fragment/>
          </xhtml_r>
        </abstract>
      </ref_link_r>
    </ref_links>
    <ref_target_audience/>
    <ref_author/>
  </content_ref_short_content>
  <text>
    <content_r>
      <xhtml_fragment>
        <h1>title of some content</h1>
        <h2>subtitle of some content</h2>
 <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean placerat. Integer feugiat. In hac habitasse platea dictumst. 
 Quisque ornare, felis tempus pulvinar fringilla, enim diam tincidunt sem, eu vulputate lacus nisi euismod libero. Sed metus arcu, 
 posuere hendrerit, pretium ultricies, iaculis eget, odio. In at nunc. Etiam a tellus. In vehicula, felis sit amet hendrerit venenatis, 
 lectus massa eleifend odio, eget consectetuer augue enim ultrices libero. In sagittis vehicula sem. Nam massa turpis, laoreet vitae, 
 condimentum id, imperdiet eget, risus. Pellentesque nec urna sit amet ipsum facilisis bibendum. Quisque scelerisque, elit nec nonummy 
 semper, est lectus accumsan libero, at pellentesque eros nisl ut nulla. Vivamus faucibus, velit eu consectetuer adipiscing, metus elit 
 eleifend purus, quis porttitor eros felis ut quam. Cras vitae leo.</p>
 
 <p>Nullam sed leo. Integer pede pede, placerat ac, accumsan ut, sollicitudin eget, augue. Donec consectetuer sem eget tortor. Quisque 
 arcu sapien, egestas a, commodo eget, vehicula in, odio. Quisque eu magna et pede laoreet cursus. Curabitur non ante. Aenean commodo
  eros eu quam. Pellentesque magna. Nunc porta orci et sapien. Quisque id risus in magna vulputate gravida.</p>

      </xhtml_fragment>
    </content_r>
  </text>
  <medias>
    <media_r>
      <short_title/>
      <media>
        <media_reference/>
        <media_alternative_text/>
      </media>
      <abstract>
        <xhtml_r>
          <xhtml_fragment/>
        </xhtml_r>
      </abstract>
      <alternative_text/>
      <transcript/>
    </media_r>
  </medias>
  <links>
    <link_r>
      <short_title/>
      <url/>
      <abstract>
        <xhtml_r>
          <xhtml_fragment/>
        </xhtml_r>
      </abstract>
    </link_r>
  </links>
</short_content>

Mofi · Oct 15, 2007 · #2 · 2007-10-15T11:15+00:00

I have looked into your request and already have some ideas on how to code the script. But before I start, I need some more information.

1. Are the tags in the output XML file always empty, or could it be necessary to first delete existing content between, for example, <str_language>...</str_language> before inserting the text from the text file?
2. About the multiple <str_keywords> in the text file: will they always be listed in that way? If so, the script must combine them into a single tag in the XML file, because the XML file contains only one <str_keywords></str_keywords>. If the keywords were defined in the text file as <str_keywords>technology, computer, hardware, cellular, processor, handling them would be no different from handling the other tags.
3. Multi-line tags as needed for the <xhtml_fragment> would be no problem for the script as long as there is a rule that always detects the end of the multi-line tag correctly. For example, the xhtml fragment is always the last tag in the text file. Or the next tag in the text file is always the same tag with a known tag name. Or the end of the multi-line tag is at the end of the line above a line starting with < (the start of the next tag name). It would also be possible to handle all tags in general as multi-line tags when a rule like the last one defines the end of the tag content. But the last rule would also mean that care must be taken not to start a line of the tag content with <, which could be a problem because of the HTML tags inside this multi-line text.
4. Do you want an error message in the output window when a tag specified in the text file is not found in the XML file (for example because of a typo)?
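
To make the third rule from the question about multi-line tags concrete (content ends on the line above the next line starting with <), here is a minimal sketch in plain JavaScript that works on a string instead of an UltraEdit document. The function name parseEntries and the assumption that tag names consist only of letters, digits and underscores are illustrative and not part of the thread's script.

```javascript
// Sketch only: split text file content into { tag, value } entries.
// An entry starts at a line beginning with "<name>" and ends on the
// line above the next such line.
// Assumption: tag names contain only letters, digits and underscores.
function parseEntries(text) {
  var entries = [];
  var lines = text.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var match = /^<([A-Za-z_][A-Za-z0-9_]*)>(.*)$/.exec(lines[i]);
    if (match) {
      // A new tag starts here.
      entries.push({ tag: match[1], value: match[2] });
    } else if (entries.length > 0) {
      // Continuation line: append it to the content of the current tag.
      entries[entries.length - 1].value += "\n" + lines[i];
    }
  }
  // Drop trailing white-space (for example the final line ending).
  entries.forEach(function (entry) {
    entry.value = entry.value.replace(/\s+$/, "");
  });
  return entries;
}
```

The sketch also shows the pitfall mentioned in the same question: a content line such as <p>some text</p> starts with < and would be treated as the start of a new tag.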

After answering my questions I will try to code the script as soon as possible. I'm low on free time at the moment.

hizard77 · Oct 15, 2007 · #3 · 2007-10-15T12:26+00:00

1. If you could erase the existing content, that would be great.

2. The XML content is managed via a CMS, and each keyword must be in its own "keywords" tag to be recognized.

3. I receive the HTML content as a Word file that I convert to RTF.
Then I transform the RTF into XSL-FO and transform that into an htm file with an XSL stylesheet.
That way I have the text content, links and tags exactly as I want.
So ideally this part would be handled by putting the HTML content, which is in another location, into the <xhtml_fragment> tags.
(Path: C:\website\higher-tech\content\index_en.htm, encoding UTF-8)

4. An error message is OK as long as it does not stop the process.

=========

You are giving great support, Mofi. Thank you.

Mofi · Oct 15, 2007 · #4 · 2007-10-15T15:39+00:00

1. No problem. The script can erase the content of the tags which are specified in the text file before inserting new content. Other tags in the XML file are not modified.
2. Can you post the XML file content as it should look after script execution, based on the sample text file content? I still don't know exactly how the multiple keywords from the text file should be copied into the XML file.
3. The script will open the HTML file and copy everything between <body ...>...</body> into <xhtml_fragment></xhtml_fragment> (automatically removing unnecessary blank lines at the top and bottom of the HTML block before inserting it into the XML file).
4. That will not stop the script. You may later add some code yourself to create a statistics report in the output window or in a separate file.

hizard77 · Oct 15, 2007 · #5 · 2007-10-15T18:23+00:00

Here is a simple XML example for the txt file content. It contains all the content filled into the XML file. Is that enough?

<?xml version="1.0" encoding="UTF-8"?>
<event>
  <language_descriptor>
    <language_reference>es</language_reference>
  </language_descriptor>
  <heading>
    <hd_short_title>Cambio climático</hd_short_title>
    <hd_long_title>Adaptarse o morir</hd_long_title>
    <hd_abstract>
      <xhtml_r>
        <xhtml_fragment>
          <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Sed vel sapien. Nam iaculis odio nec nulla sollicitudin adipiscing. Mauris non nisi ut ligula lobortis rhoncus. Etiam et quam. Aenean hendrerit sem eget risus. Sed enim. Curabitur pulvinar, urna nec viverra feugiat, mi lectus semper sapien, vitae tristique ipsum lectus in ipsum. Vestibulum nec augue ut magna malesuada viverra. Nam magna lacus, adipiscing id, bibendum sit amet, laoreet at, tortor. Nullam a dui non metus vestibulum accumsan. Pellentesque tortor.</p>
        </xhtml_fragment>
      </xhtml_r>
    </hd_abstract>
    <hd_media>
      <media_reference/>
      <media_alternative_text/>
    </hd_media>
  </heading>
  <content_qual>
    <event_start_date>25/09/2007</event_start_date>
    <event_end_date/>
    <content_priority>2</content_priority>
    <content_category_r>
      <content_category>080126248057fa44</content_category>
      <content_category>080126248057faef</content_category>
    </content_category_r>
    <content_life_cycle>
      <clc_creation_date>24/09/2007</clc_creation_date>
      <clc_modification_date/>
    </content_life_cycle>
    <content_str>
      <str_reference>ENVIRONMENT/NEWS/ES</str_reference>
      <str_title>Medio Ambiente para los Europeos – Adaptarse o morir</str_title>
      <str_creator>ENV/A1</str_creator>
      <str_language>es</str_language>
      <str_document_type>Newsletter</str_document_type>
      <str_classification>Environment</str_classification>
			<str_keywords>cambio </str_keywords>
			<str_keywords>climático</str_keywords>
			<str_keywords>adaptación</str_keywords>
			<str_keywords>mitigar</str_keywords>
			<str_keywords>Libro Verde</str_keywords>
			<str_keywords>medidas políticas</str_keywords>
	       <str_description>Libro Verde sobre la adaptación al cambio climático.</str_description>
    </content_str>
  </content_qual>
  <content_ref_event_content>
    <ref_date/>
    <ref_contact_mail/>
    <ref_links>
      <ref_link_r>
        <short_title/>
        <url/>
        <abstract>
          <xhtml_r>
            <xhtml_fragment/>
          </xhtml_r>
        </abstract>
      </ref_link_r>
    </ref_links>
    <ref_target_audience>Adaptarse o morir</ref_target_audience>
    <ref_organisation/>
    <ref_geographic_region/>
    <ref_full_address/>
    <ref_author/>
  </content_ref_event_content>
  <text>
    <content_r>
      <xhtml_fragment>
        <p><img title="Un agricultor " alt="Un agricultor " class="picleft" src="/environment/report_06.jpg"/> Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Sed vel sapien. Nam iaculis odio nec nulla sollicitudin adipiscing. Mauris non nisi ut ligula lobortis rhoncus. <a href="http://www.greenfacts.org/es/cambio-climatico-ie4/index.htm">Etiam et quam</a>. Aenean hendrerit sem eget risus. Sed enim. Curabitur pulvinar, urna nec viverra feugiat, mi lectus semper sapien, vitae tristique ipsum lectus in ipsum. Vestibulum nec augue ut magna malesuada viverra. Nam magna lacus, adipiscing id, bibendum sit amet, laoreet at, tortor. Nullam a dui non metus vestibulum accumsan. Pellentesque tortor.</p>
  		
  		<h5>Mauris non nisi ut ligula lobortis</h5>
        
        <p><img title="Un agricultor " alt="Un agricultor " class="picleft" src="/environment/report_06.jpg"/> Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Sed vel sapien. Nam iaculis odio nec nulla sollicitudin adipiscing. Mauris non nisi ut ligula lobortis rhoncus. Etiam et quam. Aenean hendrerit sem eget risus. Sed enim. Curabitur pulvinar, urna nec viverra feugiat, mi lectus semper sapien, vitae tristique ipsum lectus in ipsum. Vestibulum nec augue ut magna malesuada viverra. Nam magna lacus, adipiscing id, bibendum sit amet, laoreet at, tortor. Nullam a dui non metus vestibulum accumsan. Pellentesque tortor.</p>
        
        <p>Vestibulum nec augue ut magna malesuada viverra. Nam magna lacus, adipiscing id, bibendum sit amet, laoreet at, tortor. Nullam a dui non metus vestibulum accumsan. Pellentesque tortor.</p>
    	
    	<div id="Info">
 		<h5>Mauris non nisi ut ligula lobortis</h5>
			<ul>
				<li>ut magna malesuada viverra</li>
				<li>ut magna malesuada viverra</li>
				<li>ut magna malesuada viverra</li>
			</ul>
		</div>
      </xhtml_fragment>
    </content_r>
  </text>
  <medias>
    <media_r>
      <short_title/>
      <media>
        <media_reference/>
        <media_alternative_text/>
      </media>
      <abstract>
        <xhtml_r>
          <xhtml_fragment/>
        </xhtml_r>
      </abstract>
      <transcript/>
    </media_r>
  </medias>
  <links>
    <link_r>
      <short_title/>
      <url>
<p>Vestibulum nec augue ut magna malesuada viverra (<a href="http://www.greenfacts.org/es/cambio-climatico-ie4/index.htm"></p>
      </url>
      <abstract>
        <xhtml_r>
          <xhtml_fragment/>
        </xhtml_r>
      </abstract>
    </link_r>
  </links>
</event>

Mofi · Dec 27, 2007 · #6 · 2007-12-27T21:04+00:00

Hi,

Today I finished the script, and it works with UE v13.20a on the test files I created according to your examples. I hope the script does what you want.

A brief functional description: It first creates a list of the TXT files found in the directory you must specify in the variable TxtDir. For every file in this list, the following is done:

The text file is opened and prepared with some regular expression replaces so that it can be inserted completely at the top of the XML file with the same file name as the text file. The text file is always closed without saving the changes.
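
As an illustration of this preparation step outside UltraEdit, the following sketch applies roughly equivalent replaces to a string using JavaScript regular expressions instead of UltraEdit regex syntax. The function name prepareTxt is hypothetical, and normalizing the line endings to LF first is an assumption made to keep the expressions simple; the marker |KeYwOrDs| is the one used in the script below.

```javascript
// Sketch only: mimic the TXT preparation on a plain string.
// Assumption: line endings are normalized to LF before the replaces.
function prepareTxt(text) {
  return text
    .replace(/\r\n/g, "\n")                          // DOS -> LF line endings
    .replace(/[ \t]+$/gm, "")                        // trim trailing spaces/tabs
    .replace(/^[ \t]+</gm, "<")                      // remove indent before a tag
    .replace(/^<([^>\n]*)>(.*)$/gm, "<$1>$2</$1>")   // append the end tag
    // Merge consecutive keyword lines temporarily into a single line.
    .replace(/<\/str_keywords>\n+<str_keywords>/g, "|KeYwOrDs|");
}
```

Each line of the form <tag>content becomes <tag>content</tag>, and consecutive <str_keywords> lines end up as one line whose values are separated by the marker.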

Then the script opens the XML file, which must be in the directory you must specify in the variable XmlDir. The content of the tags specified in the text file will be replaced. It does not matter whether the XML file tags are empty or already contain data. The script only modifies those tags specified in the text file.
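
The replacement described here can be sketched on a plain string as follows. This is not how the UltraEdit script does it (that works with clipboards and find commands); replaceTagContent and escapeRegExp are hypothetical helper names. Like the script, the sketch only looks for a <tag>...</tag> pair and therefore does not touch self-closing tags such as <str_creator/>.

```javascript
// Escape characters with a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Sketch only: replace the content of the first <tag>...</tag> pair.
// Returns the modified XML string, or null if the pair is not found,
// so that the caller can report an error and continue.
function replaceTagContent(xml, tag, value) {
  var name = escapeRegExp(tag);
  var pattern = new RegExp("<" + name + ">[\\s\\S]*?</" + name + ">");
  if (!pattern.test(xml)) return null;
  // A replacer function is used so that "$" in the value is inserted literally.
  return xml.replace(pattern, function () {
    return "<" + tag + ">" + value + "</" + tag + ">";
  });
}
```

Returning null instead of throwing matches the requirement that a missing tag produces a message but does not stop the processing.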

Next it opens an HTML file with the extension HTM from the directory you must specify in the variable HtmDir and copies everything between <body...>...</body> into the XML file between <xhtml_fragment>...</xhtml_fragment>, replacing the existing text. The temporarily modified HTML file is always closed without saving the changes.
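
A string-based sketch of this step, under the assumption that the target is the first <xhtml_fragment>...</xhtml_fragment> pair in the XML file (self-closing <xhtml_fragment/> tags are skipped). The names extractBody and insertFragment are illustrative and do not appear in the script.

```javascript
// Sketch only: return the text between <body ...> and </body> without
// blank lines at the top and bottom, or null if there is no body.
function extractBody(html) {
  var match = /<body[^>]*>([\s\S]*?)<\/body>/i.exec(html);
  if (!match) return null;
  return match[1]
    .replace(/^\s*\n/, "")   // leading blank lines (indent of first line is kept)
    .replace(/\s+$/, "");    // trailing white-space
}

// Sketch only: put the body text into the first
// <xhtml_fragment>...</xhtml_fragment> pair, replacing existing text.
// Returns null if no such pair exists.
function insertFragment(xml, body) {
  var pattern = /<xhtml_fragment>[\s\S]*?<\/xhtml_fragment>/;
  if (!pattern.test(xml)) return null;
  return xml.replace(pattern, function () {
    return "<xhtml_fragment>\n" + body + "\n</xhtml_fragment>";
  });
}
```

Both functions return null on failure, so a caller can log a warning or error for that file and go on with the next one.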

After all these modifications the XML file is saved and closed and the script continues with next text file from the list until the list is empty.

The script should work with ASCII, UTF-16 LE and UTF-8 files with DOS or UNIX line endings. The script continues on errors and warnings as long as possible. If an error or warning occurs, the output window is opened automatically and the message is printed there. Finally, a summary line is printed into the output window after script execution, but the output window is not opened automatically if no error and no warning occurred.

For more details read the comments in the script code.

Note: It is important that none of the files to be processed by the script is currently open in UltraEdit. Any TXT, XML or HTM file from the 3 specified directories that is already open would result in incorrect script execution. Other files not handled by this script can be open while the script is executing.

The script code below is not the full code. You additionally need the functions getActiveDocumentIndex, GetListOfFiles, FindSelectInner and FindSelectOuter copied into the script.

// Define the working environment.
UltraEdit.insertMode();
UltraEdit.columnModeOff();
UltraEdit.ueReOn();
UltraEdit.outputWindow.clear();

// Define the working directories with the TEXT, XML and HTML files.
var TxtDir = "C:\\website\\higher-tech\\meta\\";
var XmlDir = TxtDir.replace(/meta\\$/,"xml\\");
var HtmDir = TxtDir.replace(/meta\\$/,"content\\");

var CurrentFilename;  // This variable holds always the current file
                      // name without the fixed part of the path.
var ActiveFile;       // Active document.
var OpenFiles;        // Number of open files.
var Warnings = 0;     // Number of warnings during script execution.
var Errors = 0;       // Number of errors during script execution.
var Files = 0;        // Number of files processed by the script.

var DebugMessage = 0; /* Controls the output of the debug messages:
                         0 ... no debug messages
                         1 ... debug to the output window
                         2 ... debug to message boxes. */


/*** printError ***********************************************************/
function printError (ErrorText) {
   if ((typeof(ErrorText) != "string") || (ErrorText == "")) return;
   if (UltraEdit.outputWindow.visible == false) UltraEdit.outputWindow.showWindow(true);
   UltraEdit.outputWindow.write(ErrorText);
}


/*** main script **********************************************************/

// Remember the current active file before real start of the script.
var CurrentFileIndex = getActiveDocumentIndex();

/* Create a temporary new ASCII DOS file to convert a tag string from
   Unicode to ASCII for printing it in the output window in an error
   message. Hopefully this temporary file never needs to be used. */
UltraEdit.newFile();
var ConvFileIndex = UltraEdit.document.length-1;
var ConvertFile = UltraEdit.document[ConvFileIndex];
ConvertFile.unixMacToDos();
ConvertFile.unicodeToASCII();

// First run a Find In Files to get a list of files in an edit window.
if (GetListOfFiles(0,TxtDir,"*.txt") == false) {
   // If that fails abort the script with an error message in the output window.
   printError("Script aborted: No file with extension TXT found in "+TxtDir);
} else {

   // Get the document index number for the active file with the file names.
   var FileListIndex = getActiveDocumentIndex();
   var FileListFile = UltraEdit.document[FileListIndex];

   // Delete the fixed part of the path from the file names.
   FileListFile.findReplace.matchCase=false;
   FileListFile.findReplace.matchWord=false;
   FileListFile.findReplace.regExp=false;
   FileListFile.findReplace.replaceAll=true;
   FileListFile.findReplace.replaceInAllOpen=false;
   FileListFile.findReplace.searchDown=true;
   FileListFile.findReplace.selectText=false;
   FileListFile.findReplace.replace(TxtDir,"");
   FileListFile.sortAsc(0,false,false,1,-1);

   // Run the following loop until no file to process anymore.
   while (!FileListFile.isEof()) {

      // Use clipboard 9 as main working clipboard.
      UltraEdit.selectClipboard(9);
      // Get next file list entry and then delete the line.
      FileListFile.startSelect();
      FileListFile.key("END");
      CurrentFilename = FileListFile.selection;
      FileListFile.endSelect();
      FileListFile.key("HOME");
      FileListFile.deleteLine();
      Files++;

      /*** Start of TXT file handling routine ***/

      // Open and prepare the text file for inserting it into the XML file.
      UltraEdit.open(TxtDir+CurrentFilename);
      ActiveFile = UltraEdit.document[UltraEdit.document.length-1];

      // Make sure the last line of the file has a line ending.
      ActiveFile.bottom();
      if (ActiveFile.isColNumGt(1)) ActiveFile.insertLine();

      // Add a line with a marker character at end of the text file.
      ActiveFile.write("#");
      ActiveFile.insertLine();  // Don't know format of the line endings!
      ActiveFile.top();
      ActiveFile.trimTrailingSpaces();

      // Delete all leading white-space characters in front of a tag.
      ActiveFile.findReplace.matchCase=false;
      ActiveFile.findReplace.matchWord=false;
      ActiveFile.findReplace.regExp=true;
      ActiveFile.findReplace.replaceAll=true;
      ActiveFile.findReplace.replaceInAllOpen=false;
      ActiveFile.findReplace.searchDown=true;
      ActiveFile.findReplace.selectText=false;
      ActiveFile.findReplace.replace("%[ ^t]++<", "<");

      /* The text file contains 1 tag with its content per line without
         end tag. Add the end tag with a regular expression replace. */
      ActiveFile.findReplace.replace("%<^(*>^)^(*^)$", "<^1^2</^1");

      // Merge multiple keyword tags temporarily to a single line.
      ActiveFile.findReplace.replace("</str_keywords>^r++^n<str_keywords>","|KeYwOrDs|");

      // Copy whole prepared file content now to user clipboard 9.
      ActiveFile.selectAll();
      ActiveFile.copy();
      // Close the text file without saving the modifications.
      UltraEdit.closeFile(ActiveFile.path,2);

      /*** End of TXT file handling routine ***/

      OpenFiles = UltraEdit.document.length;
      CurrentFilename = CurrentFilename.replace(/txt$/,"xml");

      // Open the XML file which hopefully does always exist.
      UltraEdit.open(XmlDir+CurrentFilename);
      if(UltraEdit.document.length == OpenFiles) {
         printError("Error:   Failed to open "+XmlDir+CurrentFilename);
         Errors++;
      } else {
         ActiveFile = UltraEdit.document[UltraEdit.document.length-1];

         /*** Start of XML file handling routine ***/

         /* In the XML file first delete content of first "str_keywords"
            tag and then all other lines with a "str_keywords" tag. */
         ActiveFile.findReplace.matchCase=false;
         ActiveFile.findReplace.matchWord=false;
         ActiveFile.findReplace.regExp=true;
         ActiveFile.findReplace.replaceAll=false;
         ActiveFile.findReplace.replaceInAllOpen=false;
         ActiveFile.findReplace.searchDown=true;
         ActiveFile.findReplace.selectText=false;
         ActiveFile.findReplace.replace("<str_keywords>*</str_keywords>","<str_keywords></str_keywords>");

         if (ActiveFile.isFound()) {
            ActiveFile.key("LEFT ARROW");
            ActiveFile.findReplace.replaceAll=true;
            ActiveFile.findReplace.replace("%*<str_keywords>*</str_keywords>^r++^n","");
         }

         // Next insert the content from the TXT file at top of the XML file.
         ActiveFile.top();
         ActiveFile.paste();
         ActiveFile.top();


         // Run the following loop until line starting with # is found.
         while (!UltraEdit.activeDocument.isChar("#")) {

            // Copy current line without line ending into clipboard 9.
            ActiveFile.startSelect();
            ActiveFile.key("END");
            ActiveFile.copy();
            ActiveFile.endSelect();
            ActiveFile.key("HOME");
            ActiveFile.key("RIGHT ARROW");

            // Copy tag name without starting '<' into clipboard 8.
            ActiveFile.findReplace.regExp=true;
            ActiveFile.findReplace.find("[~>]+>");
            UltraEdit.selectClipboard(8);
            ActiveFile.copy();

            /* Search for <tag name>...</tag name> in the XML file to
               replace it with new content. Try first a fast regular
               expression search which finds only single line tags. If
               this fails use FindSelectOuter to find the multi-line tag.
               If this also fails report that the XML file does not contain
               the tag which should be updated. */
            ActiveFile.findReplace.find("<^c*</^c");
            if (ActiveFile.isNotFound()) {
               ActiveFile.findReplace.regExp=false;
               if (!FindSelectOuter("<^c","</^c",UltraEdit.document.length-1,true,8)) {
                  /* Convert the Unicode string in clipboard 8 to ASCII and
                     print it with an error message to the output window. */
                  ConvertFile.paste();
                  ConvertFile.selectAll();
                  var ErrorString = ConvertFile.selection;
                  ConvertFile.deleteText();
                  printError("Error:   Failed to find \"<"+ErrorString+"...</"+ErrorString+"\" in "+XmlDir+CurrentFilename);
                  Errors++;
                  UltraEdit.selectClipboard(9);
               } else {  // Multi-line block is replaced now with new text.
                  UltraEdit.selectClipboard(9);
                  ActiveFile.paste();
               }
            } else {     // Single-line block is replaced now with new text.
               UltraEdit.selectClipboard(9);
               ActiveFile.paste();
            }
            /* Back at top of the XML file delete the processed
               line from the TXT file and continue with next line. */
            ActiveFile.top();
            ActiveFile.deleteLine();
         }
         ActiveFile.deleteLine();  // Delete the separating line with '#'.

         /* All tags from the TXT file have been processed. What still
            is needed is to break up the line with the multiple keywords.
            To do this always with correct indenting, first select and
            copy the line ending and the preceding white-spaces before
            the  <str_keywords> tag with the tag name itself. */
         ActiveFile.findReplace.regExp=true;
         ActiveFile.findReplace.find("^r++^n[ ^t]++<str_keywords>");
         if (ActiveFile.isFound()) {
            ActiveFile.copy();
            ActiveFile.findReplace.matchCase=true;
            ActiveFile.findReplace.replaceAll=true;
            ActiveFile.findReplace.replace("|KeYwOrDs|","</str_keywords>^c");
         }


         /* Check if there is also a HTML file with extension HTM with
            content to insert into the XML file replacing existing HTML
            content. */
         OpenFiles = UltraEdit.document.length;
         CurrentFilename = CurrentFilename.replace(/xml$/,"htm");

         // Open the HTML file while the unsaved XML file is still open.
         UltraEdit.open(HtmDir+CurrentFilename);
         if(UltraEdit.document.length == OpenFiles) {
            printError("Warning: Failed to open "+HtmDir+CurrentFilename);
            CurrentFilename = CurrentFilename.replace(/htm$/,"xml");
            Warnings++;
         } else {
            ActiveFile = UltraEdit.document[UltraEdit.document.length-1];

            /*** Start of HTML file handling routine ***/

            ActiveFile.findReplace.matchCase=false;
            ActiveFile.findReplace.matchWord=false;
            ActiveFile.findReplace.regExp=true;
            ActiveFile.findReplace.replaceAll=false;
            ActiveFile.findReplace.replaceInAllOpen=false;
            ActiveFile.findReplace.searchDown=true;
            ActiveFile.findReplace.selectText=false;

            if (!FindSelectInner("<body[~>]++>","</body>",UltraEdit.document.length-1,true,9)) {
               printError("Error:   Failed to find HTML body text in "+HtmDir+CurrentFilename);
               Errors++;
               UltraEdit.closeFile(ActiveFile.path,2);
               CurrentFilename = CurrentFilename.replace(/htm$/,"xml");
               ActiveFile = UltraEdit.document[UltraEdit.document.length-1];
            } else {
               // Remove everything except the HTML body.
               ActiveFile.copy();
               ActiveFile.selectAll();
               ActiveFile.paste();
               // Delete white-space characters at the end of the body text.
               ActiveFile.findReplace.searchDown=false;
               ActiveFile.findReplace.find("[~^t^r^n ]");
               ActiveFile.endSelect();
               ActiveFile.cut();
               ActiveFile.paste();
               ActiveFile.selectToBottom();
               if (ActiveFile.isSel()) ActiveFile.deleteText();
               // But keep the line ending of the last line.
               ActiveFile.insertLine();
               // Delete white-space characters at top of the body text.
               ActiveFile.top();
               ActiveFile.findReplace.searchDown=true;
               ActiveFile.findReplace.find("[~^t^r^n ]");
               ActiveFile.endSelect();
               ActiveFile.key("LEFT ARROW");
               ActiveFile.selectToTop();
               if (ActiveFile.isSel()) ActiveFile.deleteText();
               /* Inserting a line ending at top of the file without
                  knowing the line ending format is not a simple job. */
               ActiveFile.insertLine();
               ActiveFile.selectLine();
               ActiveFile.cut();
               ActiveFile.endSelect();
               ActiveFile.top();
               ActiveFile.paste();
               // Copy the body text into clipboard 9 and close the HTML file.
               ActiveFile.selectAll();
               ActiveFile.copy();
               UltraEdit.closeFile(ActiveFile.path,2);

               /*** End of HTML file handling routine ***/

               /*** Continue with XML file handling routine ***/

               CurrentFilename = CurrentFilename.replace(/htm$/,"xml");
               ActiveFile = UltraEdit.document[UltraEdit.document.length-1];
               ActiveFile.top();
               ActiveFile.findReplace.regExp=false;
               if (!FindSelectInner("<xhtml_fragment>","</xhtml_fragment>",UltraEdit.document.length-1,true,9)) {
                  printError("Error:   Failed to find \"<xhtml_fragment>...</xhtml_fragment>\" in "+XmlDir+CurrentFilename);
                  Errors++;
               } else ActiveFile.paste();
            }
         }
         // Save and close the XML file after all the modifications.
         UltraEdit.closeFile(ActiveFile.path,1);

         /*** End of XML file handling routine ***/
      }
   }
   /* Now all TXT files from the list have been processed. Close
      the already empty file list file and print the final message. */
   UltraEdit.closeFile(FileListFile.path,2);
   ErrorString = (Errors != 1) ? "s" : "";
   var WarnString  = (Warnings != 1) ? "s" : "";
   var FileString  = (Files != 1) ? "s" : "";
   UltraEdit.outputWindow.write("Summary: "+String(Warnings)+" warning"+WarnString+" and "+String(Errors)+" error"+ErrorString+" occurred while processing "+String(Files)+" file"+FileString+".");
}

// Restore normal working environment.
UltraEdit.closeFile(ConvertFile.path,2);
UltraEdit.selectClipboard(8);
UltraEdit.clearClipboard();
UltraEdit.selectClipboard(9);
UltraEdit.clearClipboard();
UltraEdit.selectClipboard(0);
if (CurrentFileIndex >= 0) UltraEdit.document[CurrentFileIndex].setActive();
UltraEdit.outputWindow.showStatus=false;

hizard77 · Jan 17, 2008 · #7 · 2008-01-17T16:15+00:00

It's working. You are a great help, Mofi.

Straight away I noticed 2 things.

First, the keywords tags are not completely filled, only the first one.

Note that in the XML there will be only one <ipg_keywords> tag. I can't create the correct number of them in each XML file; it depends on the page and even on the language.

And in the txt file I would like to put the keywords like this (separated by commas):

<ipg_keywords> Belgium, institutions, news, calendar

So ideally the script should look for commas after the <ipg_keywords> tag in the txt file and create all the filled <ipg_keywords> tags in the XML.
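
The comma handling requested here could look roughly like the following plain JavaScript sketch, which turns one text file line into one tag per keyword. The function name expandKeywords and the indent parameter are assumptions for illustration; this is not the code of the corrected script.

```javascript
// Sketch only: expand "<tag> a, b, c" into one "<tag>x</tag>" line per
// keyword. "indent" is the white-space to put in front of every line.
// Returns null if the line does not start with a tag.
function expandKeywords(line, indent) {
  var match = /^<([^>]+)>(.*)$/.exec(line);
  if (!match) return null;
  var tag = match[1];
  return match[2]
    .split(",")
    .map(function (keyword) { return keyword.trim(); })
    .filter(function (keyword) { return keyword !== ""; })  // skip empty items
    .map(function (keyword) {
      return indent + "<" + tag + ">" + keyword + "</" + tag + ">";
    })
    .join("\n");
}
```

The result can then replace the single <ipg_keywords> line in the XML file, so the XML template only needs one keywords tag regardless of how many keywords a page has.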

Second, the </xhtml_fragment> tag, the one closing the HTML content, is deleted by the script.

Regards from Belgium.

Mofi · Jan 17, 2008 · #8 · 2008-01-17T17:08+00:00

The deletion of </xhtml_fragment> was my fault. I copied FindSelectInner and FindSelectOuter from the offline copy on my computer at work, which was not up to date and not identical to the online version and the script file on my computer at home. The attached script file now contains the correct versions of FindSelectInner and FindSelectOuter.

I have also made the correction for the keywords, although I can't help thinking that you want to drive me crazy. I asked you twice how to handle the keywords, and now you tell me that they are <ipg_keywords> and not <str_keywords> as in your posted example, and that they are specified in the text file differently from what you posted first. I really don't like writing macros and scripts more than once because the questioner did not post what the input data really looks like. I have already had enough practice writing macros and scripts. I don't need extra lessons.

I hope the script is now correct for your files. If it is not, read my comments and modify the code yourself so that it fits your data.

Mofi

hizard77 · Jan 17, 2008 · #9 · 2008-01-17T17:55+00:00

Thank you Mofi.

I apologize about the keywords. I knew that you would not appreciate it.

I am surprised that you corrected the code so fast.

I agree, it is now up to me to understand and modify the code myself.

But I may ask you for some help to understand the code later on.

Regards from Belgium.