The ultimate syntax highlighting tools

Mofi · Dec 09, 2004

Hello wordfile creator!

Here is what you have been waiting for a long time: a macro set which is able to sort all words of all color groups of a syntax highlighting language definition.

It correctly handles case sensitivity according to the keyword Nocase, words beginning with /, substrings defined with **, and also special language settings like HTML_LANG, XML_LANG and LATEX_LANG. FORTRAN_LANG and the other language markers have no effect on the sort order of the words in the color groups.

It does not matter whether the language definition with the words to sort is stored in a file together with other language definitions, for example wordfile.txt or wordfile.uew, or whether the file contains only one language definition. Blank lines within the language definition are also allowed; they are removed by the macro during execution. Set the caret anywhere within the language definition you want to sort and start the macro SortLanguage. That's all; lean back and watch what happens.
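How the language definition around the caret can be located is shown in the following minimal Python sketch. It is not the macro code (the macros are written in the UltraEdit macro language); the function names are hypothetical, and it assumes that a language definition begins at a line starting with /L plus a digit and ends right before the next such line or at the end of the file.

```python
import re

# Assumption: a language definition starts at a line beginning with "/L"
# followed by a digit and ends right before the next such line (or at EOF).
DEF_LINE = re.compile(r"^/L\d")


def find_language_block(lines, caret):
    """Return (start, end) line indexes (end exclusive) of the language
    definition to process for the caret line, or None if there is none."""
    # A blank line contains no characters or only whitespace; move down to
    # the next non-blank line first, so the language below is chosen.
    while caret < len(lines) and not lines[caret].strip():
        caret += 1
    if caret == len(lines):
        return None
    # Search upwards for the language definition line.
    start = caret
    while start >= 0 and not DEF_LINE.match(lines[start]):
        start -= 1
    if start < 0:
        return None
    # The definition ends right before the next language definition line.
    end = start + 1
    while end < len(lines) and not DEF_LINE.match(lines[end]):
        end += 1
    return start, end


def without_blank_lines(lines, start, end):
    # Blank lines inside a definition are allowed but removed before sorting.
    return [line for line in lines[start:end] if line.strip()]


wordfile = ['/L1"One" File Extensions = A', "/C1", "alpha", "",
            '/L2"Two" File Extensions = B', "", "/C1", "beta"]
print(find_language_block(wordfile, 3))  # (4, 8): the language below
```

With the caret on the blank line between the two definitions, the sketch picks the definition below it, as the macro does.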

ATTENTION!

Do not use the macro SortLanguage with UltraEdit for Windows v15.00.0.1033 to v15.00.0.1047.
With these versions of UltraEdit for Windows this macro does not work because of two bugs in UltraEdit.

The macro does not work correctly if the setting Automatically copy to clipboard when selection is made is enabled at Configuration - Editor - Miscellaneous. Uncheck this setting before running the macro. Also disable word-wrap mode if word wrap is enabled for the wordfile or by default for new (temporary) files.

GENERAL SORTING REQUIREMENTS

Here is some general information about the sorting requirements for words in a syntax highlighting wordfile for UltraEdit and UEStudio.

The first line of a syntax highlighting language block in a wordfile is the language definition line. It starts with uppercase /Lx, where x is a number in the range 1 to 20. Normally the name of the syntax highlighting language in double quotes immediately follows the language number. The language definition line must end with either File Extensions = or File Names = followed by the list of file extensions or file names of the files whose contents should be highlighted with this syntax highlighting language.
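The required layout of this line can be expressed as a regular expression. The following Python sketch is only an illustration under the rules stated above; the pattern and the function name are hypothetical and are neither the regular expression used by the macros nor the parser of UltraEdit.

```python
import re

# Illustrative pattern: "/L" + number, an optional quoted name, anything,
# then "File Extensions = " or "File Names = " followed by the list.
DEFINITION_LINE = re.compile(
    r'^/L(?P<num>\d+)(?:"(?P<name>[^"]*)")?'
    r".*?(?P<kind>File Extensions|File Names) = (?P<items>.*)$"
)


def parse_definition_line(line):
    """Return the parts of a language definition line, or None if the
    line does not have the required layout."""
    m = DEFINITION_LINE.match(line)
    if m is None:
        return None
    number = int(m.group("num"))
    if not 1 <= number <= 20:  # only /L1 to /L20 are valid
        return None
    return {
        "number": number,
        "name": m.group("name"),  # None if the line has no language name
        "kind": m.group("kind"),
        "items": m.group("items").split(),
    }


print(parse_definition_line('/L20"Example" File Extensions = TXT'))
```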

All keywords and key strings supported by UltraEdit and UEStudio to define how a syntax highlighting language highlights the contents of a file are case-sensitive, and key strings with an equal sign require exactly 1 space before and 1 space after the equal sign. The following example shows a language definition line with incorrect usage of keywords and key strings, followed by the corrected line:

/L20"Example" NoCase String Chars =" Line Comment  = // File Extensions= TXT

/L20"Example" Nocase String Chars = " Line Comment = // File Extensions = TXT

Compare the first line, which contains the errors, with the correct second line. What is wrong in the first line?

In the keyword "Nocase" the character c is written in the wrong case.
In the key string "String Chars = " the space after the equal sign is missing.
In the key string "Line Comment = " there are 2 spaces before the equal sign.
In the key string "File Extensions = " the space before the equal sign is missing.

The keyword Nocase in the language definition line is important for the sort order of the words in the color groups because, among other things, it controls their case sensitivity. Therefore the macro SortLanguage searches the language definition line for the word nocase in any case and always replaces it with the correctly written keyword Nocase before it starts sorting the words.
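The checks for the errors listed above and the Nocase correction can be sketched in Python as follows. This is an illustration, not the macro code; the function names are hypothetical and the list of key strings covers only those used in the example, while a real wordfile supports more.

```python
import re

# Key strings taken from the example above; a real wordfile supports more.
KEY_STRINGS = ["String Chars", "Line Comment", "File Extensions", "File Names"]


def check_definition_line(line):
    """Report the kinds of errors shown in the example above."""
    problems = []
    # The keyword is case-sensitive: only "Nocase" is recognized.
    for word in re.findall(r"\bnocase\b", line, flags=re.IGNORECASE):
        if word != "Nocase":
            problems.append(f'keyword "{word}" must be written "Nocase"')
    # Key strings need exactly 1 space before and after the equal sign.
    for key in KEY_STRINGS:
        m = re.search(re.escape(key) + r"( *)=( *)", line)
        if m is None:
            continue
        before, after = m.group(1), m.group(2)
        if before != " " or after != " ":
            problems.append(
                f'"{key}": {len(before)} space(s) before and '
                f'{len(after)} after "=" (exactly 1 required)')
    return problems


def normalize_nocase(line):
    # Like SortLanguage: any case variant of "nocase" becomes "Nocase".
    return re.sub(r"\bnocase\b", "Nocase", line, flags=re.IGNORECASE)


bad = '/L20"Example" NoCase String Chars =" Line Comment  = // File Extensions= TXT'
for problem in check_definition_line(bad):
    print(problem)
```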

All words in a color group starting with the same character may be on the same line or spread across multiple lines. However, if they are spread across multiple lines, these lines must follow one another with no empty lines or other lines between them.

If the language is case-sensitive, the letter A is different from a and so words starting with A must be on a different line from words starting with a.
Words starting with the letter A must be on the same line as words starting with the letter a if the language is not case-sensitive.

First, an example for a case-sensitive language with several sorting errors:

Collection
Checkbox case
Anchor Applet Dictionary Area Arguments Array abstract
Boolean
Button
Crypto
Date Document Drive Drives
break
byte
default delete do double
class const catch char continue

What is wrong in the example above and why?

The word case starts with a lowercase c and the language is case-sensitive. So this word must be on a different line from the word Checkbox, which starts with an uppercase C. The same mistake was made here for the word abstract.

The word Dictionary starts with D and therefore must be on a different line than the words starting with A.

The word Crypto starts with C and therefore must be on the same line as Collection or Checkbox or on a separate line, but with no other lines between the lines with words starting with C. In the example there are lines with words starting with A and B between the line with Checkbox and the line with Crypto, and therefore this word is ignored.
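The rules explained above can be checked automatically. The following Python sketch is a hypothetical validator, not part of the macro set; it simplifies by letting the first word of a line decide which start character the line belongs to, skips ** substring lines, and ignores HTML_LANG, XML_LANG and LATEX_LANG.

```python
def ascii_lower(ch):
    # Only A-Z are folded by Nocase; letters above 127 keep their case.
    return ch.lower() if "A" <= ch <= "Z" else ch


def group_key(word, nocase):
    return ascii_lower(word[0]) if nocase else word[0]


def find_misplaced_words(lines, nocase=False):
    """Return words of one color group that break the grouping rules."""
    misplaced = []
    closed = set()   # start characters whose block has already ended
    current = None   # start character of the block being read
    for line in lines:
        words = line.split()
        if words and words[0] == "//":   # "// /word1 /word2" style line
            words = words[1:]
        if not words or words[0] == "**":
            continue
        key = group_key(words[0], nocase)
        if key in closed:
            # Other lines came between this line and its block.
            misplaced.extend(words)
            if current is not None:
                closed.add(current)  # this line also interrupts the block
                current = None
            continue
        if key != current:
            if current is not None:
                closed.add(current)
            current = key
        # Words on the line starting with a different character.
        misplaced.extend(w for w in words[1:] if group_key(w, nocase) != key)
    return misplaced


example = ["Collection", "Checkbox case",
           "Anchor Applet Dictionary Area Arguments Array abstract",
           "Boolean", "Button", "Crypto", "Date Document Drive Drives",
           "break", "byte", "default delete do double",
           "class const catch char continue"]
print(find_misplaced_words(example))  # ['case', 'Dictionary', 'abstract', 'Crypto']
```

For the example above, it reports exactly the 4 words discussed.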

The fact that the words class const catch char continue are not sorted alphabetically within the line is no problem for UltraEdit/UEStudio. It is also no problem that, for example, the line with the words default delete do double is above the line with the words starting with c. And it does not matter either if some lines contain multiple words starting with the same character while other words starting with the same character are spread over multiple lines, as long as the lines with words starting with the same character form one contiguous block within a color group. But with such a weird grouping and ordering of the words, mistakes can happen very easily when inserting additional words. Therefore the SortLanguage macro also sorts the words within a line and the lines themselves alphabetically. Here is the corrected word list as produced by the macro:

Anchor Applet Area Arguments Array
Boolean Button
Checkbox Collection Crypto
Date Dictionary Document Drive Drives
abstract
break byte
case catch char class const continue
default delete do double

Now let us assume the keyword Nocase exists on the language definition line, so the case of the letters of the words in the color groups is not important. In this case all the words starting with a lowercase character in the list above would not be highlighted correctly. The correct word order for a language ignoring the case of the letters A to Z would be:

abstract Anchor Applet Area Arguments Array
Boolean break Button byte
case catch char Checkbox class Collection const continue Crypto
Date default delete Dictionary do Document double Drive Drives
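The two sorted word lists above can be reproduced with a few lines of Python. This is a simplified sketch, not the algorithm of the macros: it sorts by character code (which puts uppercase lines before lowercase lines, as in the case-sensitive result), removes only exact duplicates, folds only the letters A to Z for Nocase, and does neither line wrapping nor HTML/LaTeX handling. The function names are hypothetical.

```python
def ascii_lower(text):
    # Fold only A-Z; letters above 127 are compared case-sensitively.
    return "".join(c.lower() if "A" <= c <= "Z" else c for c in text)


def sort_color_group(words, nocase=False):
    """Return the sorted lines of one color group (simplified sketch)."""
    fold = ascii_lower if nocase else (lambda s: s)
    # Remove duplicates, then sort by the folded word; the original word
    # breaks ties so that the result does not depend on the input order.
    unique = sorted(set(words), key=lambda w: (fold(w), w))
    lines, current_key = [], None
    for word in unique:
        key = fold(word[0])
        if key != current_key:   # a new start character begins a new line
            lines.append([])
            current_key = key
        lines[-1].append(word)
    return [" ".join(line) for line in lines]


words = ("Collection Checkbox case Anchor Applet Dictionary Area Arguments "
         "Array abstract Boolean Button Crypto Date Document Drive Drives "
         "break byte default delete do double class const catch char "
         "continue").split()
print("\n".join(sort_color_group(words)))               # case-sensitive
print("\n".join(sort_color_group(words, nocase=True)))  # with Nocase
```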

Language-specific letters with a character value greater than 127 are always interpreted case-sensitively by the syntax highlighting engine, independent of the presence of the keyword Nocase in the wordfile. But wordfiles usually do not contain such letters, and therefore the macro set for sorting the keywords does not process words with such letters differently, although the syntax highlighting engine would require it.

Lines starting with / are interpreted by UltraEdit/UEStudio as lines with a special syntax highlighting keyword. Therefore all lines in the color groups containing one or more "words" starting with / must start with // to be interpreted correctly. An example with a wrong and a correct line:

/word1 /word2

// /word1 /word2
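The correction from the wrong to the right line is mechanical. A minimal Python sketch of it follows; the helper name is hypothetical, and it only adds the // prefix without checking the grouping rules.

```python
def fix_slash_line(line):
    """Return a color group line that starts with "//" if any of its
    words starts with "/" (hypothetical helper)."""
    words = line.split()
    if not words or words[0] == "//":
        return line  # empty or already correct
    if any(word.startswith("/") for word in words):
        return "// " + " ".join(words)
    return line


print(fix_slash_line("/word1 /word2"))  # // /word1 /word2
```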

A line starting with ** defines 1 or more substrings. The strings on this line can start with different characters. The lines with substrings only have to form a contiguous block within a color group, preferably at the top of the color group. Normally only 1 line is required for the definition of substrings in a color group. All words starting with those substrings are completely highlighted with the color of the color group. For more details on substrings, see the documentation of TestForDuplicate below.

Languages marked with HTML_LANG or XML_LANG in the language definition line enable the HTML/XML specific interpretation of the words in the color groups. If one of these keywords is present, < or </ may be placed in front of any word (tag) to highlight it as desired, without the need for all keywords starting with < to be on the same line. Instead, tags starting with the same letter must be on the same or contiguous lines, exactly as required for words which do not begin with < or </.

A language marked with LATEX_LANG in the language definition line enables the LaTeX/TeX specific interpretation of the words in the color groups. If a word begins with \ then the second character is used to determine which line the word should be on. All words beginning with \a should be on the same line as other words beginning with \a or just a. In the same way, all words beginning with \b should be on the same line as other words beginning with \b or just b, but on a different line from those starting with \a, and so on.
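The character that decides on which line a word belongs, under the rules for HTML_LANG/XML_LANG and LATEX_LANG described above, can be sketched in Python like this. The function and parameter names are hypothetical; this is an illustration of the rules, not code from the macro set.

```python
def grouping_char(word, nocase=False, html_or_xml=False, latex=False):
    """Return the character deciding on which line `word` belongs."""
    if html_or_xml:
        # With HTML_LANG or XML_LANG a leading "<" or "</" is skipped.
        if word.startswith("</") and len(word) > 2:
            word = word[2:]
        elif word.startswith("<") and len(word) > 1:
            word = word[1:]
    if latex and word.startswith("\\") and len(word) > 1:
        # With LATEX_LANG the character after the backslash is used.
        word = word[1:]
    ch = word[0]
    # Nocase folds only the letters A to Z.
    return ch.lower() if nocase and "A" <= ch <= "Z" else ch


print(grouping_char("</table>", html_or_xml=True))  # t
print(grouping_char("\\alpha", latex=True))         # a
```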

For more details and help about syntax highlighting wordfiles, see the page Syntax Highlighting in the help of UltraEdit or UEStudio and the forum topic Readme for the Syntax Highlighting forum.

GENERAL MACRO INFORMATION

Some general information about the macros used to sort the words in all color groups of a language.

The macros are ready for use in the macro file SyntaxTools.mac. They were developed with compatibility with many versions of UltraEdit and UEStudio in mind and were tested with many versions of UltraEdit. But always take a quick look at the result of the sorting operation. It is always possible that a version of UE/UES released after the last update of the macro set has a bug in its program code resulting in wrong macro execution.

To use this macro set you need at least v8.20 of UltraEdit or any version of UEStudio. The macros were developed and tested with UE v10.10c and later versions of UltraEdit.

If you find any bugs or have other related questions, post them here.

You can see the source code of the macros, with lots of comments, in the file SyntaxTools.uem if you are interested in how the macros work. If you want to make changes to fit your requirements better, feel free to do so, but take the following into consideration:

All macros should have the following properties:

Show Cancel Dialog for this macro ............ disabled
Continue if a Find with Replace not found ... enabled ( < UE v13.10a+2)
Continue if search string not found ........... enabled (>= UE v13.10a+2)
Hotkey = none

You can assign a hotkey to macro SortLanguage if it is used frequently. Never run the submacros manually!

Remove the green comment lines together with the blank lines before copying the instructions to the macro edit window. The comments are only for experts who want to know how the macros work.

The submacro WrapLines sets the maximum number of characters per line to 106, which is the best value for printing with Courier New at font size 8 with 1.5 cm left and right margins on a European A4 sheet. This line length is also good for lower resolutions (1024x768) with at least one additional view open on the left or right side and a normal font size used for displaying the text. A wordfile for the UE/UES community should not use longer lines, so that most users can read it without scrolling horizontally. But if you don't want this line length limit, remove the following command from macro SortLanguage:


PlayMacro 1 "WrapLines"
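The effect of the line length limit can be illustrated with Python's standard textwrap module. This is only a sketch of the idea, not the submacro WrapLines: a line of words starting with the same character is split between words into consecutive lines of at most 106 characters, so the resulting lines still form one contiguous block.

```python
import textwrap

MAX_LINE_LENGTH = 106  # the line length used by the submacro WrapLines


def wrap_word_line(line, width=MAX_LINE_LENGTH):
    """Split one line of words into consecutive lines of at most `width`
    characters, breaking only between words."""
    return textwrap.wrap(line, width=width,
                         break_long_words=False, break_on_hyphens=False)


line = " ".join("word%02d" % i for i in range(40))
for wrapped in wrap_word_line(line):
    print(len(wrapped), wrapped[:20])
```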

DISCLAIMER

THIS MACRO SET IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, STATUTORY OR OTHERWISE, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO USE, RESULTS AND PERFORMANCE OF THE MACRO SET IS ASSUMED BY YOU AND IF THE MACRO SET SHOULD PROVE TO BE DEFECTIVE, YOU ASSUME THE ENTIRE COST OF ALL NECESSARY SERVICING, REPAIR OR OTHER REMEDIATION. UNDER NO CIRCUMSTANCES, CAN THE AUTHOR BE HELD RESPONSIBLE FOR ANY DAMAGE CAUSED IN ANY USUAL, SPECIAL, OR ACCIDENTAL WAY OR BY THE MACRO SET.

DOWNLOAD

You can download the macros as a ready-to-use macro file with the description written here. File to download is SyntaxTools.zip.

HISTORY

2005-04-27:
Fixed a problem where more than one space between words resulted in an additional empty line for each additional space in the sorted language.

2006-02-19:
A workaround for a bug of UltraEdit v10.xx was not well done: if the caret was already on the language definition line of the language to sort and this language definition was the last one in a wordfile with more than 1 language definition, the previous language definition was sorted. I found this problem myself and have now fixed it with a rewritten code block for finding the start of the current language definition. This new solution is even simpler than the previous one.

Second, I rewrote the code block for selecting the whole language definition. The command Find RegExp "%*^p^p" was used in previous versions to select all lines of the language definition. This worked only because of a bug of UltraEdit. The new code block for selecting the language is much more complicated, but with the additional code it now also allows blank lines within the language definition. Note: Such blank lines are removed during macro execution.

2006-04-02:

Because of a bug in UltraEdit/UEStudio, the Cancel dialog can cause crashes when a submacro is called. To avoid those crashes, the macro property Show Cancel Dialog for this macro is no longer set in any of the macros.

Many users creating a syntax highlighting language definition for the first time write the keyword Nocase incorrectly. The keywords are case-sensitive, so nocase and NoCase are ignored by UE/UES. The macro SortLanguage now also detects an incorrectly written Nocase keyword and corrects it automatically before sorting.

UltraEdit v11 and UEStudio introduced the keyword XML_LANG, which has the same special meaning for words starting with < or </ as HTML_LANG. The macro SortLanguage now recognizes this special keyword too.

Before macro SortLanguage is executed, the caret must be set anywhere within a language definition. If the caret was set on a blank line above a language definition and the file contained more than one language definition, previous versions of this macro sorted the language definition above the caret. If no other language definition was in the file and the caret was set on a blank line above the only language definition, previous versions of the macro SortLanguage did nothing. The macro SortLanguage was modified to first move the caret to a line which contains at least one non-whitespace character before selecting the whole language definition. Now the language below the current caret position is always sorted if the caret is set on a blank line (= a line which contains no characters or only whitespace characters).

Finally, some spelling mistakes were corrected in the documentation, and the style of the documentation was also changed a little.

2006-04-17:

The macros were designed to be executed on files with DOS line terminations because the syntax highlighting wordfile must also be a DOS file. The SortLanguage macro creates a new file twice. If the user had specified in the configuration dialog that the Default file type for new files is UNIX or MAC instead of DOS and additionally had not selected the option Automatically convert to DOS format, new files were not created with CR/LF as line terminations and the macros failed. To solve this problem, the command UnixMacToDos was inserted immediately after the 2 NewFile commands to make sure that the new file is always a DOS file.

Added information to this documentation about where to insert the macro command UnixReOn or PerlReOn if the user prefers the UNIX or Perl compatible regular expression engine to the UltraEdit regular expression engine used by these macros. Search for UnixReOn to find the 2 exit positions.

The order of the macros within the macro file has changed. The main macro SortLanguage is now the first macro in the file. This allows the user to run the macro SortLanguage with Play Again from the Macro menu immediately after loading the macro file, so it is no longer necessary to select the macro SortLanguage from the macro list before execution.

The macro file SyntaxTools.mac now also contains 3 additional macros to test a language definition for duplicate words. See the ultimate test for duplicate words macro for details about this additional macro set.

Finally, some small mistakes were corrected in the documentation.

2006-11-19:

There were 2 small errors in the macro code of SortLanguage and ExpandSubstring. In both macros there was 1 Else which should have been EndIf. These errors did not affect the function of the macros.

UltraEdit versions 12.10+3 to v12.10b and UEStudio versions 5.50 and 5.50a always move the focus to the nearest tab on the left in the file tab order, instead of to the last used file according to the window history, when a file is closed. Since UltraEdit v12.20 and UEStudio v6.00, the focus handling after closing a file tab can be customized with the option Move to nearest left tab after current tab is closed at Configuration - Application Layout - File Tabs. If this option was set, or if one of the UE/UES versions was used which always set the focus to the nearest left file tab after a file was closed, the wordfile with the language definition to sort had to be the rightmost tab; otherwise the macro pasted the sorted language definition into the wrong file after closing the temporary files at the end of the macro SortLanguage.

Now the macro SortLanguage has been improved and works independently of which file gets the focus after the 2 temporary file tabs are closed. The macro now searches all open windows for the still existing selection of the whole language definition before it pastes the sorted language definition over the unsorted definition.

2006-12-05:

In the source file, the selected part of the language definition line does not start with a slash. In the temporary file, right before the sorted language is copied back, the language definition line also has no slash at the start. But for the loop which finds the correct file after closing the last temporary file, a regular expression search was inserted which should find the start of the language definition line and copy the language name with its language number to clipboard 8. This search never succeeded, so the "find correct file" loop was executed with the last contents of clipboard 8, which could find the correct file, but could also lead to an endless window switching loop. I fixed this bug by deleting the slash character in the regular expression search for the language definition number and name.

Finally, some small spelling mistakes were corrected in the documentation.

2007-05-01:

The macro file was renamed from SyntaxSort.mac to SyntaxTools.mac. The zip file was also renamed to SyntaxTools.zip, and the file SyntaxSort.htm was renamed to SortLanguage.htm. The macro file now also contains an additional macro to test a language definition for invalid words. See the ultimate test for invalid words macro for details about this additional macro. The macro source code is also available as UEM file; see the top of this text file. The macros for sorting the words were not modified.

2007-08-02:

Near the end of macro SortLanguage, under certain conditions (depending on PC hardware, version and source file), UltraEdit does not find the language number and language name at the top of the first temporary file and therefore does not copy this data to user clipboard 8. This could result in an endless loop because the correct window is never found due to the wrong contents of clipboard 8. A workaround was added for this very special UltraEdit problem.

2009-04-30:

Changed the regular expression for finding the language definition line so that it also finds lines which have no language name, only the language number. Also made small modifications in comments and code, but without any effect on execution or result of the macro, so they are not worth documenting in detail. Most changes were made to this description, with lots of new information for interested readers.

2009-06-11:

In macro ExpandSubstring changed the method used to delete the remaining space at start of lines with substrings because of a bug detected with UltraEdit for Windows v15.00.0.1048.

2009-11-13:

Added the ATTENTION note at the top of the post.

2011-06-13:

Updated the macro to also support languages with up to 20 color groups.

2012-05-29:

Modified the macros SortLanguage and ReconvertWords so that in HTML and XML wordfiles the strings <? and ?> are listed as <? ?> instead of ?> <? on a line after sorting. This modification has no effect on syntax highlighting. <? ?> is just the better order for these 2 strings.

2013-01-06:

Rewrote the submacros CollectCase and CollectNocase and added the submacro WrapLines for faster and better collection of words starting with the same character on lines wrapped after column 106.

Modified the main macro SortLanguage once more to sort HTML/XHTML and XML tags better. The sort order is now

<tag <tag> </tag>

instead of

<tag> <tag </tag>

These 2 changes result in better output from macro SortLanguage, but have no effect on syntax highlighting with wordfiles already sorted with this macro before.

Space characters at the beginning of lines with words are now removed too. In previous versions, such a space character at the beginning of a line with words resulted in a blank line within a color group.

Apr 17, 2006

Hello wordfile creator!

Here is an add-on macro set for the syntax highlighting sort macro set. During execution of the macro SortLanguage duplicate words within a color group are automatically removed. But words existing more than once in different color groups are not detected by the macro SortLanguage.

This macro set with the main macro TestForDuplicate is designed to test a language definition for duplicate words in different color groups and to create a report. The macro TestForDuplicate never modifies the wordfile with the language definition. It only creates a report of any duplicate words found in a new temporary file which is never saved. So this macro set does not change anything on your hard disk.

TestForDuplicate recognizes the language definition keyword Nocase for the test. But unlike the macro SortLanguage, it does not detect and correct an incorrectly written Nocase keyword. If Nocase is written incorrectly, the words of the language definition are interpreted case-sensitively by the macro TestForDuplicate, just as UE/UES do.

The macro TestForDuplicate can be run before or after the macro SortLanguage. But I suggest using TestForDuplicate after macro SortLanguage because SortLanguage corrects the keyword Nocase and also automatically removes duplicate words within a color group.

Duplicate words in different color groups are not really harmful because they have no negative effect on syntax highlighting in general. The only problem with duplicate words is that a word may not be highlighted with the expected color. Duplicate words may also slightly decrease the speed of the syntax highlighting engine of UE/UES because of the larger number of words.

How are duplicate words handled by UE/UES?

While developing this macro set, I checked how UE/UES handle duplicate words and conflicts with substring definitions. For better understanding, here is an example of a language definition and how the words are highlighted in a test file.

Language definition:

/L20"DuplicateWords" Noquote File Extensions = TXT
/Delimiters = # ,
/C1"Red"
** # p_x w_
redword
/C2"Green"
** p_ w_x
/C3"Maroon"
#
p_keyword p_xkeyword
redword
w_keyword w_xkeyword

Content of a test file with resulting syntax highlighting:

redword p_xredword w_redword #anyword # p_keyword p_xkeyword w_keyword w_xkeyword p_greenword w_xnotgreen

Explanation:

The word redword is defined in color groups 1 and 3. It is ignored in color group 3. So if a word is defined more than once, it is always highlighted with the color of the group with the lowest color number.

All words starting with #, p_x and w_ are highlighted with color 1, except the 5 words defined in color group 3. This shows that UE/UES give complete word matches a higher priority than substring definitions, even if the words are defined in a color group with a higher number than the color group of the substrings.

Finally, color group 2 contains 2 additional substring definitions. The first one, p_, starts like p_x in color group 1. So it is not surprising that words starting with p_x are highlighted with color 1 if they are not defined as normal words in any of the 20 color groups. But all words starting with p_ but not with p_x are highlighted with color 2. So if multiple substring definitions match, the one with the lower color number has the higher priority. This is the reason why the second substring definition w_x in color group 2 is useless: the substring definition w_ in color group 1 already highlights all words starting with w_ if the words are not defined as normal words in any of the 20 color groups.

The macro TestForDuplicate handles rules 1 and 2 correctly. Duplicate words are reported. If a substring definition also matches a normal word, the macro accepts it without any message because this is correct.

A substring match (two or more substrings starting with the same characters) is always reported by the macro. TestForDuplicate is not able to determine whether the matching substring definitions are correct or not; the user has to evaluate that. But I think such substring matches are very rare.
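The two priority rules observed above can be summarized in a short Python sketch. It models only the lookup of a single word in a case-sensitive language; how UE/UES split the test line into words (delimiters) is not modeled, and the function name and data layout are hypothetical.

```python
def resolve_color(word, groups):
    """Return the color group number used for `word`, or None.

    groups maps color number -> (set of words, list of substrings)."""
    # Rule 1: a complete word match wins, the lowest color number first.
    exact = [num for num, (words, _) in groups.items() if word in words]
    if exact:
        return min(exact)
    # Rule 2: otherwise the lowest color number with a matching substring.
    partial = [num for num, (_, substrings) in groups.items()
               if any(word.startswith(s) for s in substrings)]
    return min(partial) if partial else None


# The language definition "DuplicateWords" from above as data.
GROUPS = {
    1: ({"redword"}, ["#", "p_x", "w_"]),
    2: (set(), ["p_", "w_x"]),
    3: ({"#", "p_keyword", "p_xkeyword", "redword", "w_keyword",
         "w_xkeyword"}, []),
}

for w in ["redword", "p_xkeyword", "p_greenword", "w_xnotgreen"]:
    print(w, resolve_color(w, GROUPS))
```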

Usage of macro TestForDuplicate

The usage of the macro TestForDuplicate is as simple as for macro SortLanguage. Set the caret anywhere within the language definition you want to test for duplicate words and start the macro TestForDuplicate. That's all, lean back and look what's going on.

If the macro finds no duplicate words or matching substrings, the report contains only the following line:


Congratulations! No duplicate words found.

If the macro finds duplicate words or matching substrings, the report looks like this report for the language definition above:


Sorry! Found following duplicate words:

** p_ -> in /C2"Green"
** p_x -> in /C1"Red"

** w_ -> in /C1"Red"
** w_x -> in /C2"Green"

redword -> in /C1"Red"
redword -> in /C3"Maroon"

Now look at this report and remove the duplicates from the wordfile.

All macros should have the following properties:
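A report in this format can be produced from the color groups of a language with the following Python sketch. It is not the macro TestForDuplicate; the function names and data layout are hypothetical, the comparison is case-sensitive (no Nocase), and substrings are only compared with substrings of other color groups.

```python
from collections import defaultdict


def find_duplicates(groups):
    """groups: list of (group name, words, substrings) of one language.

    Returns clusters of (entry, group name) pairs. Words defined in more
    than one group are reported, and so are substrings of different groups
    where one starts with the other. A substring that matches a normal word
    is accepted silently."""
    found = []
    subs = [(s, name) for name, _, substrings in groups for s in substrings]
    pairs = []
    for i, (s1, g1) in enumerate(subs):
        for s2, g2 in subs[i + 1:]:
            if g1 != g2 and (s1.startswith(s2) or s2.startswith(s1)):
                pairs.append(sorted([("** " + s1, g1), ("** " + s2, g2)]))
    found.extend(sorted(pairs))
    where = defaultdict(list)
    for name, words, _ in groups:
        for word in dict.fromkeys(words):  # repeats inside one group ignored
            where[word].append(name)
    for word, names in sorted(where.items()):
        if len(names) > 1:
            found.append([(word, name) for name in names])
    return found


def format_report(found):
    if not found:
        return "Congratulations! No duplicate words found."
    lines = ["Sorry! Found following duplicate words:"]
    for cluster in found:
        lines.append("")
        lines.extend(f"{entry} -> in {group}" for entry, group in cluster)
    return "\n".join(lines)


groups = [
    ('/C1"Red"', ["redword"], ["#", "p_x", "w_"]),
    ('/C2"Green"', [], ["p_", "w_x"]),
    ('/C3"Maroon"', ["#", "p_keyword", "p_xkeyword", "redword",
                     "w_keyword", "w_xkeyword"], []),
]
print(format_report(find_duplicates(groups)))
```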

Show Cancel Dialog for this macro ............ disabled
Continue if a Find with Replace not found ... enabled ( < UE v13.10a+2)
Continue if search string not found ........... enabled (>= UE v13.10a+2)
Hotkey = none

You can assign a hotkey to macro TestForDuplicate if it is used frequently. Never run the submacros manually!

Remove the green comment lines together with the blank line above each of them before copying the instructions to the macro edit window. The comments here are only for experts who want to know how the macros work.

The macros use the UltraEdit style regular expression engine. If you prefer the UNIX or Perl compatible regular expression engine you have to insert the macro command UnixReOn or PerlReOn before every macro exit. Search for UnixReOn to find the 3 exit positions.

To use this macro set you need at least v8.20 of UltraEdit or UEStudio. The macros were developed and tested with UE v10.10c, v11.20a and v12.20a.

If you find any bugs or have other related questions, post them here.

DISCLAIMER

THIS MACRO SET IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, STATUTORY OR OTHERWISE, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO USE, RESULTS AND PERFORMANCE OF THE MACRO SET IS ASSUMED BY YOU AND IF THE MACRO SET SHOULD PROVE TO BE DEFECTIVE, YOU ASSUME THE ENTIRE COST OF ALL NECESSARY SERVICING, REPAIR OR OTHER REMEDIATION. UNDER NO CIRCUMSTANCES, CAN THE AUTHOR BE HELD RESPONSIBLE FOR ANY DAMAGE CAUSED IN ANY USUAL, SPECIAL, OR ACCIDENTAL WAY OR BY THE MACRO SET.

DOWNLOAD

You can download the macros as a ready-to-use macro file with the description written here. File to download is SyntaxTools.zip.

Acid · Apr 21, 2006

Mofi... you're a god! Thanks for making this macro =) It's quite useful =)

Mofi · May 01, 2007

Hello wordfile creator!

Here is an add-on macro for the syntax highlighting sort and test for duplicate words macro sets (see above). During execution of the macro SortLanguage, duplicate words within a color group are automatically removed. And the macro TestForDuplicate finds and reports duplicate words in different color groups, which can then be removed by the user. But neither of them finds and reports invalid word definitions.

The macro TestForInvalid is designed to test a language definition for invalid words in all color groups and to create a report. The macro TestForInvalid never modifies the wordfile with the language definition. It only creates a report of any invalid words found in a new temporary file which is never saved. So this macro does not change any of your files.

The macro TestForInvalid can be run before or after the macros SortLanguage and TestForDuplicate. But I suggest running TestForInvalid after the other macros, because this decreases the number of reported invalid words if some of those invalid words were also duplicates.

Invalid words are not really harmful because they have no negative effect on syntax highlighting in general. The only problem with invalid words is that a word may not be highlighted with the expected color. Invalid words may also slightly decrease the speed of the syntax highlighting engine of UE/UES because of the larger number of words.

What is an invalid word?

To answer that question, it must first be explained how the syntax highlighting engine of UltraEdit and UEStudio works.

A text file is nothing more than a more or less large sequence of characters. Some rules are necessary to interpret this sequence of characters and convert it into something which can be understood by you (your brain) or by a program like UE/UES or a compiler. For programming languages there are 4 main rules, which all work according to the same principle: lines, comments, strings and delimiters.

Lines

When a text file is read by a program, the first rule used is: scan for line termination character(s) to split up the big sequence of characters into smaller parts called "lines". But this is already more complicated than it might seem, because there are 3 standards:

For MS-DOS and Windows text files the character sequence carriage return with a following line-feed is used as line termination sequence.
For UNIX text files the line-feed character is enough to split up the content of a text file into lines.
And for MAC text files the carriage return character specifies the end of a line.

The character carriage return has the byte code 13 decimal or 0D hexadecimal and is often specified in strings as \r or ^r.
The character line-feed has the byte code 10 decimal or 0A hexadecimal and is often specified in strings as \n or ^n.

So a general text editing program like UltraEdit must already be capable of handling 3 different text file formats. But splitting a file into lines becomes even more complicated if a text file contains more than 1 of the 3 formats above. This is often caused by a programming error, for example on Windows opening a file for writing in text mode but writing "\r\n" to it, which results in a file containing 2 carriage returns before the line-feed instead of only 1. In text mode every \r\n is automatically converted into \n when reading the sequence of characters from a text file, and when writing to a text file opened in text mode, every \n is automatically written as \r\n. So if the programmer uses \r\n in the program code when writing to a text file in text mode, the program writes \r\r\n and the line ending detection problems start.

In PHP, Perl and other scripts for HTML files the line terminations are also often mixed because of wrong handling by the script developer. And using the wrong FTP transfer mode when transferring text files with FTP between UNIX servers and Windows computers is another source of text files with line terminations which follow none of the 3 standards above.
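The text mode error described above can be reproduced in Python on any platform, because opening a file with newline="\r\n" makes Python translate every \n written into \r\n, just like text mode on Windows. The sketch below (function names hypothetical) writes "\r\n" through such a file and then counts the 3 kinds of line terminations in the resulting bytes.

```python
import os
import tempfile


def count_line_endings(data: bytes):
    """Count CR+LF (DOS/Windows), LF only (UNIX) and CR only (MAC)."""
    counts = {"dos": 0, "unix": 0, "mac": 0}
    i = 0
    while i < len(data):
        if data[i] == 0x0D and data[i + 1:i + 2] == b"\n":
            counts["dos"] += 1
            i += 2
            continue
        if data[i] == 0x0D:
            counts["mac"] += 1
        elif data[i] == 0x0A:
            counts["unix"] += 1
        i += 1
    return counts


def write_in_text_mode(text):
    """Write `text` through a text mode file and return the bytes on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline="\r\n": every "\n" written is translated into "\r\n".
        with open(path, "w", newline="\r\n") as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# The programming error: writing "\r\n" through text mode.
data = write_in_text_mode("line 1\r\nline 2\r\n")
print(data)                      # b'line 1\r\r\nline 2\r\r\n'
print(count_line_endings(data))  # {'dos': 2, 'unix': 0, 'mac': 2}
```

Each intended line ending turns into a lone carriage return (MAC) followed by a CR+LF (DOS), so the file mixes 2 of the 3 formats.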

Comments

A comment is a sequence of characters which should be ignored by the program when interpreting the content of a file. But how can such a comment be identified within the long sequence of characters of a file, even after the file has been split into lines?

Most comments are specified by special character sequences like

/*   for Block Comment On and
*/   for Block Comment Off or
//   for Line Comment On

as used for example in C/C++. The rule is quite simple and is best explained with an example.

If /* is found in the sequence of characters of a file, a block comment starts, and it ends when the next */ is found. The same rule applies to line comments; the only difference is that the Comment Off sequence is predefined as the line termination character(s).
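A minimal sketch of this simple rule with the C/C++ comment markers. It deliberately ignores strings, which are handled in a later step, and treats an unclosed block comment as running to the end of the text (an assumption for illustration). find_comments is a hypothetical helper name.

```python
# Sketch only (assumption): simple, non-nested comment rule; strings ignored.


def find_comments(text: str, block_on: str = "/*", block_off: str = "*/",
                  line_on: str = "//") -> list[tuple[int, int]]:
    """Return (start, end) offsets of all comments in text."""
    comments, i = [], 0
    while i < len(text):
        if text.startswith(block_on, i):
            # Block comment: ends at the first following Block Comment Off.
            end = text.find(block_off, i + len(block_on))
            end = len(text) if end < 0 else end + len(block_off)
            comments.append((i, end))
            i = end
        elif text.startswith(line_on, i):
            # Line comment: Comment Off is the line termination.
            end = i
            while end < len(text) and text[end] not in "\r\n":
                end += 1
            comments.append((i, end))
            i = end
        else:
            i += 1
    return comments


src = "a = 1; /* block */ b = 2; // line\nc = 3;"
print([src[s:e] for s, e in find_comments(src)])  # ['/* block */', '// line']
```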

This simple rule for comments can be extended further. Some interpreters, for example, support nested block comments: Block Comment On/Off sequences inside a block comment are counted by the reading program, so that the block comment ends at the Block Comment Off sequence which belongs to the first Block Comment On sequence instead of at the first occurrence of a Block Comment Off sequence.
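For nested block comments the reading program has to count instead of stopping at the first Block Comment Off. The hypothetical helper nested_block_end below sketches that depth counting, using the C-style markers only as an example.

```python
# Sketch only (assumption): depth counting for nested block comments.


def nested_block_end(text: str, start: int, on: str = "/*", off: str = "*/") -> int:
    """Given the offset of a Block Comment On sequence, return the offset just
    after the matching Block Comment Off, or len(text) if never closed."""
    depth, i = 0, start
    while i < len(text):
        if text.startswith(on, i):
            depth += 1
            i += len(on)
        elif text.startswith(off, i):
            depth -= 1
            i += len(off)
            if depth == 0:
                return i  # this Off belongs to the first On
        else:
            i += 1
    return len(text)


src = "/* outer /* inner */ still comment */ code"
print(src[:nested_block_end(src, 0)])  # /* outer /* inner */ still comment */
```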

Line comments sometimes come with additional rules as well. Most of these extended rules exist when Line Comment On is a single character instead of a character sequence. The developer of such a language perhaps thought it easier to use special rules than to simply add a second character to the Line Comment On definition to avoid misinterpretation when reading the character sequence of the file. Personally, I often cannot follow this reasoning, and I'm a programmer.

Strings

After splitting the character sequence of a file into lines and excluding the parts which are comments and therefore ignored for most other evaluations, the next step is often to find strings. A string is a sequence of characters which has a special meaning for various reasons, so the characters of a string should always be kept together and care must be taken when modifying this sequence. But how can such strings be identified in the remaining character sequences of a file?

Wherever strings are possible, there is always at least 1 special character which marks the start and end of a string. The double quote character is often used. When this character is found in the sequence of characters of a file, a string starts, and it ends when the same character is found again. This simple rule can be extended by several other rules, such as a second string character like the single quote, or an escape character (for example the backslash): within a string, the character following the escape character never ends the string. Some languages also have the rule that a string must end before the line termination character(s). For those languages DisableMLS (disable multi-line strings) should be used. For other languages like HTML or C/C++ multi-line strings are possible (often with an extra rule), and for those languages EnableMLS can be used.
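The string rules above (the same quote character ends the string, the escape character protects the next character, and optionally the line termination ends the string like DisableMLS) can be sketched as follows. The function name, parameter names and the default quote and escape characters are assumptions chosen for C-like languages.

```python
# Sketch only (assumption): string detection with escape character and an
# optional DisableMLS-like rule; not UltraEdit's implementation.


def find_strings(text: str, quotes: str = "\"'", escape: str = "\\",
                 multi_line: bool = True) -> list[str]:
    """Return the strings found in text, including their quote characters."""
    found, i = [], 0
    while i < len(text):
        quote = text[i]
        if quote not in quotes:
            i += 1
            continue
        j = i + 1
        while j < len(text):
            ch = text[j]
            if ch == escape:
                j += 2          # the escaped character never ends the string
                continue
            if ch == quote:
                j += 1          # same quote character ends the string
                break
            if not multi_line and ch in "\r\n":
                break           # DisableMLS: string ends at the line end
            j += 1
        found.append(text[i:min(j, len(text))])
        i = j
    return found


for s in find_strings('echo "a \\"b\\" c" and \'d\''):
    print(s)
# "a \"b\" c"
# 'd'
```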

Delimiters

After applying the rules for lines, comments and strings, there are still plenty of character sequences which must be divided further into smaller parts which humans call "words". This is done with the same method as above: a set of characters is defined which delimits those character sequences into words. Everything between 2 delimiters is a word. Do you understand what the previous sentence means?

The delimiter characters define what a word is and not the characters of a word!

For example, look at highlighting. I'm sure you read this as 1 word. But why, when it also contains the words high and light, which you also know as words? You interpret it as 1 word because of the delimiter space on its left side and the delimiter point on its right side. So never forget: the delimiters define what a word is. Without delimiters, the character sequences of a text file cannot be read and interpreted, neither by you nor by a program. Look at the 'C' code example below:

printf("Found %u error%s!\n",errorcount,/* no 's' by exactly 1 error */errorcount==1?"":"s");

This is a valid code line for a 'C' compiler, and 'C' programmers can also read it with syntax highlighting. Without syntax highlighting it would be much more difficult to read even for 'C' programmers, because our brain is trained to use only the small set of delimiters needed for reading normal text. The code example above looks nothing like normal text.
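The rule that delimiters define the words can be sketched with the frequently used delimiter set shown further below (tab written as \t). Every visible delimiter becomes a single-character word, while spaces, tabs and line terminations only separate words. This is an illustration under these assumptions, not UltraEdit's tokenizer; it also shows why an operator like != can never be one word when ! and = are delimiters.

```python
# Sketch only (assumption): word splitting driven purely by delimiters.
DELIMITERS = set("~!@%^&*()-+=|\\/{}[]:;\"'<> ,\t.?")  # "\t" is the real tab
SEPARATORS_ONLY = " \t\r\n"  # delimiters that do not form words themselves


def split_words(text: str, delimiters: set = DELIMITERS) -> list[str]:
    """Everything between 2 delimiters is a word; CR and LF always delimit."""
    words, current = [], ""
    for ch in text:
        if ch in delimiters or ch in "\r\n":
            if current:
                words.append(current)
                current = ""
            if ch not in SEPARATORS_ONLY:
                words.append(ch)  # a visible delimiter is a word of its own
        else:
            current += ch
    if current:
        words.append(current)
    return words


print(split_words("if (count!=1) total+=count;"))
# ['if', '(', 'count', '!', '=', '1', ')', 'total', '+', '=', 'count', ';']
```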

Back to the question of this section: What is an invalid word?

Now the answer should be simple: every character sequence in the color groups which contains a delimiter character combined with other delimiter characters or normal characters. The delimiters specify what a word is, and of course every delimiter is itself also a single-character word. So a delimiter character simply cannot be at the start (see exception below), in the middle or at the end of a word, and 2 or more delimiters cannot be combined into a word.

Which characters should be specified as delimiters for syntax highlighting?

Which characters should be specified as delimiters for syntax highlighting depends on the rules of the program which reads the text file you write and edit. As a general rule, the space character must be specified as a delimiter. This is necessary because the space is the main delimiter character of the wordfile itself, which is also a text file. And that answers the following question, which is often asked by users who do not understand how the syntax highlighting engine works:

Is it possible to define a word with a space?

No, that is not possible, because the space is a delimiter in wordfiles and the delimiter characters specify what a word is. So a character sequence containing a space character cannot be defined as a word.

Another common delimiter, and one that is often forgotten, is the tab character, which is invisible like the space. Don't forget to include the tab character in the set of delimiters. How a tab character is displayed can vary: it depends on the tab stop value(s) for the current file, or, as in HTML, it is always displayed as a single space (except in a preformatted text area). So be careful when copying a wordfile definition from the browser window into a text file, and make sure there is a real tab character in the set of delimiter characters after pasting the text.

The line termination characters carriage return and line-feed are always delimiters in text files and cannot be specified explicitly as delimiter characters.

The characters which specify block comments, line comments and strings should also always be defined as delimiter characters. This is not absolutely necessary, because the text file is interpreted in the order described above, but it should be done.

In programming languages, operators and braces of any kind are also delimiter characters. A color group with operators therefore often contains invalid words consisting of a combination of delimiter characters. For example, == and != are invalid words if the equal sign and the exclamation mark are delimiters. Such mistakes in the operator word list often go unnoticed because = and ! are also specified as single-character words in the same color group, so nobody can see that != is highlighted with the color of ! and the color of = and that the combined character sequence != in the list of words is simply useless. Remember, the delimiter characters specify what a word is.

A special case is the use of marker characters with a definition like:

/Marker Characters = "[]%%"

Marker characters are a variant of strings. Every pair of marker characters specifies a sequence of characters to be highlighted with one color. But unlike strings, the start and end characters of such a marked character sequence need not be identical, as [ ] above shows. Since UE v9.20, marker characters can also have identical start and end characters, as %% above shows. A character sequence highlighted with a marker pair cannot span a line termination (like single-line strings).
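A small sketch of how such a definition could be interpreted: the quoted value is read as consecutive start/end pairs, and marked sequences are searched within a single line only, matching the rule that they cannot span a line termination. The helper names and the handling of an unclosed marker (simply not highlighted) are assumptions.

```python
# Sketch only (assumption): interpreting /Marker Characters = "[]%%".


def marker_pairs(definition: str) -> list[tuple[str, str]]:
    """Turn the quoted value, e.g. "[]%%", into (start, end) pairs."""
    value = definition.split("=", 1)[1].strip().strip('"')
    return [(value[k], value[k + 1]) for k in range(0, len(value) - 1, 2)]


def find_marked(line: str, pairs: list[tuple[str, str]]) -> list[str]:
    """Return the marked sequences of ONE line; an unclosed marker is ignored
    in this sketch (assumption)."""
    ends = dict(pairs)
    found, i = [], 0
    while i < len(line):
        end_char = ends.get(line[i])
        close = line.find(end_char, i + 1) if end_char else -1
        if close >= 0:
            found.append(line[i:close + 1])
            i = close + 1
        else:
            i += 1
    return found


pairs = marker_pairs('/Marker Characters = "[]%%"')
print(pairs)                                           # [('[', ']'), ('%', '%')]
print(find_marked("set [name] to %PATH% now", pairs))  # ['[name]', '%PATH%']
```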

The marker characters should also be specified as delimiter characters.

Most other special characters of the ASCII table are often used as delimiters too. You can also use ANSI characters as delimiters, but not Unicode characters, because the wordfile must be an ASCII/ANSI file. Here is a frequently used delimiters definition:

/Delimiters = ~!@%^&*()-+=|\/{}[]:;"'<> ,tab.?

Note: tab in the line above stands for the real tab character.
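Because a pasted tab easily turns into spaces, a quick check of the /Delimiters line can help. This sketch assumes that exactly one space follows the equal sign and that every character after it, including further spaces, belongs to the delimiter set; parse_delimiters is a hypothetical name.

```python
# Sketch only (assumption): "/Delimiters = " is followed directly by the set.


def parse_delimiters(line: str) -> set[str]:
    """Return the delimiter characters of a '/Delimiters = ...' line."""
    prefix = "/Delimiters = "
    if not line.startswith(prefix):
        raise ValueError("not a /Delimiters line")
    return set(line[len(prefix):].rstrip("\r\n"))


pasted = "/Delimiters = ~!@%^&*()-+=|\\/{}[]:;\"'<> ,    .?"  # tab became spaces
real = "/Delimiters = ~!@%^&*()-+=|\\/{}[]:;\"'<> ,\t.?"
print("\t" in parse_delimiters(pasted))  # False: insert a real tab again
print("\t" in parse_delimiters(real))    # True
```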

The delimiter characters are always case-sensitive, regardless of the keyword Nocase in the language definition line. But this matters only for letters, which are normally not used as delimiters.

Usage of macro TestForInvalid

Using the macro TestForInvalid is as simple as using the macros SortLanguage or TestForDuplicate. Set the caret anywhere within the language definition you want to test for invalid words and start the macro TestForInvalid. That's all, lean back and look what's going on.

If the macro finds no invalid words, the report contains only the following line:

Congratulations! No invalid words found.

If the macro finds invalid words, the report looks like the one below for the language PHP in the standard wordfile.txt of UltraEdit v13.00a:

Sorry! Found following invalid words:

!=                                      <- contains the delimiter:  =
&&                                      <- contains the delimiter:  &
*=                                      <- contains the delimiter:  =
++                                      <- contains the delimiter:  +
+=                                      <- contains the delimiter:  =
--                                      <- contains the delimiter:  -
-=                                      <- contains the delimiter:  =
.=                                      <- contains the delimiter:  =
/=                                      <- contains the delimiter:  =
<=                                      <- contains the delimiter:  =
==                                      <- contains the delimiter:  =
||                                      <- contains the delimiter:  |
class.com                               <- contains the delimiter:  .
class.dir                               <- contains the delimiter:  .
class.dotnet                            <- contains the delimiter:  .
class.variant                           <- contains the delimiter:  .

Now review this report and either remove the invalid words from the wordfile or modify the set of delimiter characters.

To identify invalid word definitions in the list of words correctly, the macro has to apply some special rules:

The (visible) delimiters can also be specified in a color group as single-character words. But a combination of delimiter characters is not valid, which is a frequent mistake in the color group for operators (see example report above).
Since UE v10.00 a word definition may start with a delimiter character, like the HTML entities in the standard wordfile.txt, which start with & although this character is also a delimiter. The & must be a delimiter, like the ;, or the entities would not be highlighted correctly. But delimiters are not allowed anywhere except as the first character. That's the reason why the semicolon of the HTML entities is specified separately, although this means that all semicolons in the text of HTML files are then highlighted with the color of HTML entities, not only those at the end of an HTML entity.
For language definitions which contain the case-sensitive keyword HTML_LANG or XML_LANG in the first line, words starting with < or </ and/or ending with > or /> or = are allowed, even when these 4 characters are also delimiters.
Every marker character pair must be specified like a word in a color group, although the marker characters can and should also be delimiter characters.

If you are interested in how the macro handles these four rules, read the comments for macro TestForInvalid in the macro code file SyntaxTools.uem.
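The four rules can be approximated in a short Python sketch; it is not the macro code from SyntaxTools.uem. Like the example report above, it reports the first delimiter found after the first character. The function names, parameters and the 40-column padding of the report lines are assumptions based on that report.

```python
from typing import Optional

# Sketch only (assumption): approximates the TestForInvalid rules.
DELIMITERS = set("~!@%^&*()-+=|\\/{}[]:;\"'<> ,\t.?")  # "\t" is the real tab


def invalid_delimiter(word: str, delimiters: set, html_or_xml: bool = False,
                      markers: frozenset = frozenset()) -> Optional[str]:
    """Return the delimiter that makes word invalid, or None if it is valid.

    Rules: a single delimiter is a valid word; a marker pair such as "[]" is
    a valid word; the first character may be a delimiter (UE v10.00+); with
    HTML_LANG/XML_LANG a leading "</" and a trailing ">", "/>" or "=" are
    also allowed."""
    if len(word) == 1 or word in markers:
        return None
    start, end = 1, len(word)  # index 0 may always be a delimiter
    if html_or_xml:
        if word.startswith("</"):
            start = 2
        if word.endswith("/>"):
            end -= 2
        elif word.endswith((">", "=")):
            end -= 1
    for ch in word[start:end]:
        if ch in delimiters:
            return ch
    return None


def report(words, delimiters, **rules) -> str:
    """Build a report in the style of the TestForInvalid output above."""
    bad = []
    for word in words:
        delimiter = invalid_delimiter(word, delimiters, **rules)
        if delimiter is not None:
            bad.append(f"{word:<40}<- contains the delimiter:  {delimiter}")
    if not bad:
        return "Congratulations! No invalid words found."
    return "\n".join(["Sorry! Found following invalid words:", ""] + bad)


print(report(["!=", "&&", "echo", "class.com", "="], DELIMITERS))
```

Note how &amp passes the check while &amp; would not, which matches the reason given above for specifying the semicolon of HTML entities separately.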

You can assign a hotkey to the macro TestForInvalid if you use it frequently.

The macro uses the UltraEdit-style regular expression engine. If you prefer the UNIX or Perl compatible regular expression engine, you have to insert the macro command UnixReOn or PerlReOn before every macro exit. Search for UnixReOn in the file SyntaxTools.uem to find the 4 exit positions. The macro source code is not included here.

To use this macro you need at least v8.20 of UltraEdit or UEStudio. The macro was developed and tested with UE v10.10c, v11.20a, v12.20b+1 and v13.00a+2.

If you find any bugs or have other related questions, post them here.

DISCLAIMER

THIS MACRO SET IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, STATUTORY OR OTHERWISE, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO USE, RESULTS AND PERFORMANCE OF THE MACRO SET IS ASSUMED BY YOU AND IF THE MACRO SET SHOULD PROVE TO BE DEFECTIVE, YOU ASSUME THE ENTIRE COST OF ALL NECESSARY SERVICING, REPAIR OR OTHER REMEDIATION. UNDER NO CIRCUMSTANCES, CAN THE AUTHOR BE HELD RESPONSIBLE FOR ANY DAMAGE CAUSED IN ANY USUAL, SPECIAL, OR ACCIDENTAL WAY OR BY THE MACRO SET.

DOWNLOAD

You can download the macro as a ready-to-use macro file together with the description written here. The file to download is SyntaxTools.zip.

maryv · May 09, 2007

As usual, Mofi, a very slick piece of work! Thanks for sharing your knowledge and expertise so freely with the community.

Mofi · Apr 27, 2009

Hello wordfile managers!

My SyntaxTools package now contains many additional macros for interested users for managing wordfiles and their color and font style settings. Also included is full documentation of the additional macros in HTML format, with further information about the color and font style settings, which since UE v15.00 and UES v9.10 are stored in the wordfiles instead of in the INI file or the registry. The file to download is SyntaxTools.zip.

The additional macros are:

SettingsCopyAll
Copies the color and font style settings from an opened UltraEdit Syntax Highlighting color scheme file (or a wordfile with all the settings) to every language in another opened wordfile with 1 or more languages. Existing color and font style settings in the wordfile are removed automatically before the settings are copied into the wordfile.
SettingsCopyCur
Copies the color and font style settings from an opened UltraEdit Syntax Highlighting color scheme file (or a wordfile with all the settings) to the language in which the caret is currently positioned, in another opened wordfile with 1 or more languages. Existing color and font style settings of this language in the wordfile are removed automatically before the settings are copied into the wordfile.
SettingsCopyIni
Copies the color and font style settings for a language in the wordfile from the corresponding [Language x Colors] section of an opened INI file to the opened wordfile. Use this macro if UltraEdit / UEStudio has not moved the language color settings correctly from the INI file to the wordfile after an upgrade and the colors are therefore lost.
SettingsDelAll
Deletes all color and font style settings of UE v15.00+ and UES v9.10+ in a wordfile.
SettingsDelCur
Deletes the color and font style settings of UE v15.00+ and UES v9.10+ for the language in which the caret is currently positioned in a wordfile.
SettingsExtract
Copies the color and font style settings of the current wordfile (current language only) into a new file.
SplitToWordfiles
Splits up a wordfile with several languages into several wordfiles with 1 language per wordfile.
WordfilesMerger
Merges several wordfiles with 1 or more languages into a new wordfile. It also renumbers and re-sorts the languages. This macro can also be very useful for users of UltraEdit < v15.00 for correctly adding a downloaded wordfile to the standard wordfile.
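The core idea of splitting a wordfile per language and renumbering merged languages can be sketched in Python. This is not the code of SplitToWordfiles or WordfilesMerger: it splits at lines starting with /L followed by a number (so /Line Comment On does not match), drops any text before the first language, and does not re-sort the languages. The function names and sample languages are hypothetical.

```python
import re

# Sketch only (assumption): /L followed by a number starts a language block;
# the digit requirement keeps "/Line Comment On" from matching.
LANG_LINE = re.compile(r"^/L(\d+)")


def split_languages(wordfile: str) -> list[str]:
    """Split wordfile text into one block per language definition."""
    blocks: list[str] = []
    for line in wordfile.splitlines(keepends=True):
        if LANG_LINE.match(line):
            blocks.append(line)
        elif blocks:
            blocks[-1] += line  # text before the first /Lx line is dropped
    return blocks


def renumber(blocks: list[str]) -> str:
    """Merge blocks into one wordfile numbered /L1, /L2, ..."""
    return "".join(LANG_LINE.sub(f"/L{n}", block, count=1)
                   for n, block in enumerate(blocks, start=1))


first = '/L3"First" File Extensions = ONE\n/Line Comment On = //\n/C1\nalpha\n'
second = '/L3"Second" File Extensions = TWO\n/C1\nbeta\n'
merged = renumber(split_languages(first) + split_languages(second))
print([line for line in merged.splitlines() if LANG_LINE.match(line)])
# ['/L1"First" File Extensions = ONE', '/L2"Second" File Extensions = TWO']
```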

None of these macros automatically saves the updated or newly created file(s). Saving is always left to the user, which gives the user the possibility to review the changes before saving them.

Contributions like suggestions for further enhancements, bug reports, etc. are highly welcome.

rhapdog · May 04, 2012

Mofi,

Recently, as I noted in another post, I discovered that some of the language definition keywords can be placed on separate lines instead of on the first language definition line (/L# line). While language markers (HTML_LANG, etc.) *must* remain on the language definition line, keywords like Nocase can be moved onto a line of their own by prefixing them with a slash, like so:

/L3"HTML" HTML_LANG 
/Nocase 
/Noquote 
/Block Comment On = <!-- 
/Block Comment Off = --> 
/File Extensions = HTM HTML SHTML HTT HTA HTX CFM JSP PHP PHTML ASP TMPL ASAX ASHX ASMX ASPX ASCX

While this should be strongly discouraged, as I have noticed some syntax highlighting bugs creep in when certain items are done this way, it is possible, and I have come across wordfiles from others written in this manner.

The reason I bring this up here is this:
Currently, the macro searches for Nocase only on the language definition line and, if it is not found there, sorts the words case-sensitively.
However, /Nocase may be on any line before /C1, which causes this macro to sort the file incorrectly.

I am not suggesting that you rewrite the macro to handle this, unless you just want to. I'm sure it is an extremely rare instance. However, I thought people should be aware, and, if they are having problems with a wordfile working correctly that has been sorted by this macro, perhaps it is something they can look for and correct by moving the Nocase to the first line.
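Taking this observation into account, a sort routine could look for Nocase in both places. This hypothetical sketch checks the /Lx line (with the quoted language name removed) and any /Nocase line before /C1. The simple case-insensitive sort key is an assumption and ignores the special ordering rules SortLanguage applies.

```python
import re

# Sketch only (assumption): Nocase detection as described in this post.


def is_nocase(language_block: str) -> bool:
    """True if Nocase is a keyword on the /Lx line or a /Nocase line
    appears anywhere before the first color group line /C1."""
    lines = language_block.splitlines()
    if not lines:
        return False
    # Drop the quoted language name so a name like "Nocase Demo" cannot match.
    definition = re.sub(r'"[^"]*"', " ", lines[0])
    if "Nocase" in definition.split():
        return True
    for line in lines[1:]:
        if line.startswith("/C1"):
            break
        if line.strip() == "/Nocase":
            return True
    return False


def sort_words(words: list[str], nocase: bool) -> list[str]:
    """Simplified sort: case-insensitive with Nocase, else by code point."""
    return sorted(words, key=str.lower) if nocase else sorted(words)


block = '/L3"HTML" HTML_LANG\n/Nocase\n/Noquote\n/File Extensions = HTM HTML\n/C1\n'
print(is_nocase(block))                                   # True
print(sort_words(["b", "A", "a", "B"], is_nocase(block)))  # ['A', 'a', 'b', 'B']
```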

As I create the sorting capability for my tool, I will take this into account. However, my tool is a long way off. I'm only just now able to get started on it, and won't have a great deal of time to devote to it. Creating a standalone tool that is capable of replacing your macros will take a great deal of care and a great deal of coding. It's not as simple as calling a sort routine, as you well know. Lots to consider and account for. Getting started on it has finally made me appreciate all the more how much work and dedication went into creating these macros in the first place, and I already appreciated it a lot before.

Mofi · Sep 20, 2013

With UltraEdit for Windows v20.00 the management of color and font style settings changed once again. Instead of being stored for the color groups in the wordfiles themselves, as in UE for Windows prior to v20.00, the settings are now stored in the UltraEdit theme files.

On the first start after installing the upgrade, the user is asked to choose the layout and the theme to use. Selecting Keep my existing color settings results in importing the color and font style settings from the existing wordfiles in directory "%APPDATA%\IDMComp\UltraEdit\wordfiles\" into the created UltraEdit theme file.

But there are some limitations for this import by UltraEdit:

The color and font style settings are not removed from the wordfiles, although they are no longer read from them.
The import works only for *.uew files in directory "%APPDATA%\IDMComp\UltraEdit\wordfiles\". It does not work for *.uew files in a different directory specified at Advanced - Settings or Configuration - Editor Display - Syntax Highlighting.
Color and font style settings from a project-specific wordfile are neither imported into the theme file on opening the project nor read from the wordfile any longer.

The script WordfileColorsToTheme.js is written for all those cases.

As explained in the first comment block at the top of the script, the script is designed to extract all color and font style settings from all currently opened wordfiles, each containing only one syntax highlighting language, and to create a new file with all settings converted to the XML format of the UltraEdit theme files. This block can then be copied into the currently used theme file. For details on using the script, see that comment block at the top of the script.

As the script does not support multiple languages in one wordfile, first use the macro SplitToWordfiles from the macro file in SyntaxTools.zip (which also contains the script file) to create one wordfile per language, close the wordfile with the multiple languages, and then run the script WordfileColorsToTheme.js to extract the color and font style settings and create the block for the UltraEdit theme file.

kmccabe · Oct 15, 2015

Upgraded UltraEdit from 19 to 22 (64-bit) and noticed my syntax highlighting was no longer being respected. Initially thought it was the 32-bit to 64-bit install that did it but after researching found this post and realized it was the move of colors from wordfiles to themes.

My problem was due to keeping wordfiles in a different directory, as Mofi astutely pointed out.

Ran the script WordfileColorsToTheme.js and followed the instructions and still my colors did not show up in the Theme Manager.

Long story short, the script did a brilliant job (thank you, Mofi!). The problem turned out to be one of my language names had an AMPERSAND & in it! I removed it and XML was happy. And so was I.
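The ampersand problem is easy to reproduce: an unescaped & is not allowed in an XML attribute value. The element and attribute names below (language, name) are hypothetical and not the actual UltraEdit theme file format; quoteattr from Python's standard library escapes the value correctly.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import quoteattr

# Sketch only (assumption): <language name="..."/> is a hypothetical element.
name = "Batch & Command"  # a language name containing an ampersand

broken = f'<language name="{name}"/>'
fixed = f"<language name={quoteattr(name)}/>"  # & becomes &amp;

try:
    ET.fromstring(broken)
except ET.ParseError as err:
    print("not well-formed:", err)
print(fixed)                             # <language name="Batch &amp; Command"/>
print(ET.fromstring(fixed).get("name"))  # Batch & Command
```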

Hope this helps somebody.

Kevin McCabe
