User to user discussion and support for UltraEdit, UEStudio, UltraCompare, and other IDM applications.

Syntax highlighting, code folding, brace matching, code indenting, and function list
6 posts Page 1 of 1
I've searched around on the UE site, here on the forum, and elsewhere around the Internet and I can't seem to find any reference to a wordfile for Sphinx or reStructuredText (reST). Does anyone know if one exists and perhaps I just haven't stumbled across it? I've tried checking for the names individually and even some of the ones they were based on (StructuredText and Setext), no results at all.

Looking to start using Sphinx for some documentation work and having the highlighting would speed things along. If one doesn't exist, I may end up trying to craft one eventually, at least for some of the basics.
I also could not find any *.uew file for Sphinx. I have all *.uew files on my hard disk and searched them, but found nothing. No wordfile contains Sphinx, structured, toctree, testsetup, or any other keyword from the index list of Sphinx. I also searched the web in case somebody had published an UltraEdit syntax highlighting wordfile for Sphinx somewhere else, but found nothing there either.

Well, Sphinx is of course very well documented. It should not take too much time to create a wordfile using its index. Let us know if you need help on writing the wordfile for Sphinx or need somebody to verify the wordfile created by you.
Best regards from Austria
Below is the start of a wordfile for reST/Sphinx, but I have a couple issues with it:

#1: I needed (wanted) to use : in some color definition words because they are quite common words and I didn't want them highlighted except when used in the proper context. To get that to work, I had to remove : from the delimiters list. Not sure if there may be fallout from that I haven't hit yet. Is there an alternate way to accomplish that same goal?

#2: reST specifies basic formatting of text like so:
Code: Select all
*Italic Text*   **Bold Text**   ``literal text``   `interpreted text`

I can't seem to find a good way to make it highlight everything that should be bold or italic when it is longer than one word. Neither literal nor interpreted text appears to work properly either. I tried using marker characters for interpreted text and that doesn't seem to have worked. Ideas? I know marker characters work in general because I also set <> as markers and they work fine. I thought about using ` and * as string characters, but that might break when `` or ** is used.

The Italic and Bold parts may be impossible because * is also used to denote bulleted lists, so it may be fine as-is. Part of this could be that I'm using uex on Linux and haven't tried Windows; perhaps using the same start/end marker character isn't working on uex.

#3: Any ideas on how code blocks and literal blocks might be handled? http://thomas-cokelaer.info/tutorials/s ... ral-blocks

Other than the above items, the basics are working. It's a little rough around the edges and needs some refinement (surely I'm missing keywords, etc) but it's coming along.

Code: Select all
/L20"Sphinx/reST" PYTHON_LANG Nocase Escape Char = \ EnableSpellasYouType EnableCFByIndent File Extensions = rst rsti Noquote

#
# Wordfile for Sphinx/reStructuredText
# Created by JimP - 2015-06-03
#
# Based on:
# Template for creating a new language files
# Created by Dalibor Jelinek - 9.1.2007
# Revised by Mofi last time on 05.03.2015
#
#
# The maximum length of Language Name and Color Group Name may be up to
# 24 characters long. Every name longer is truncated. In previous versions
# of UltraEdit the limit was 18 characters. A creator of a public wordfile
# should not exceed the first limit of 18 characters or at least make sure
# the first 18 characters produce a unique description for a color group
# within the language definition.
#
# All the information provided is valid to the date of writing
# and for UltraEdit v13.20a.
#
# Please post your comments here:
# http://www.ultraedit.com/forums/viewtopic.php?t=4124


# Language Definition Line
# The following declarations must be specified all on the first line
# of a language definition - the language definition line. It does
# not matter how long the line will become. Do not break the line.

# All the keywords above must be on the first language definition line.
# ---------------------------------------------------------------------
# From here below every keyword is on its own line.


# Delimiters
/Delimiters = ! "    '*`._<>[]|"

# Function Definition Strings
# Let UE pick up substitution names as functions. Close enough...
/Function String = "%.. |^([a-z]*^)| "
# And labels
/Function String 1 = "%.. _^([a-z]*^):"
# And links
/Function String 2 = "%.. `^([a-z]*^)`_"
/Regexp Type = Perl

# Indent Strings
/Indent Strings = ":"

/Open Brace Strings = "``" "`"

/Close Brace Strings = "``" "`"

# Marker Characters
/Marker Characters = "<>||_:``"

# Color Group definitions
/C1"Keywords" STYLE_KEYWORD
acks:: autoattribute:: autoclass:: autoexception:: autofunction:: automethod:: automodule:: autosummary::
centered:: c:function:: class:: clsdir:: c:macro:: cmdoption:: c:member:: code-block:: compound:: confval:: contents:: cpp:class:: cssclass:: c:type:: currentmodule:: c:var::
data:: deprecated:: describe::
envvar:: epigraph:: event:: exception::
figure:: funcdir:: function::
glossary::
highlight:: highlightlang:: highlights:: hlist::
image:: include:: index::
js:attribute:: js:data:: js:function::
literalinclude::
math:: meta:: method:: module:: moduleauthor::
note::
object:: only:: option::
parsed-literal:: productionlist:: program:: pull-quote:: py:function::
replace:: rst:directive:: rst:role:: rubric::
sectionauthor:: seealso:: sidebar:: sourcecode::
table:: tabularcolumns:: testcleanup:: title:: toctree:: todo:: todolist:: topic::
userdesc::
versionadded:: versionchanged::
/C2"Attributes" STYLE_ATTRIBUTE
:align: :alt:
:figclass: :figwidth:
:glob:
:header: :height: :hidden:
:maxdepth:
:name: :numbered:
:scale: :subtitle:
:target: :titlesonly:
:width: :widths:
/C3"Styled Text"
<>
||
_:
` ``
* **
/C4"Semantic Style"
:abbr: :any:
:code: :command:
:dfn: :doc: :download: :dudir:
:emphasis: :envvar:
:file:
:guilabel:
:kbd: :keyword:
:literal:
:mailheader: :makevar: :manpage: :math: :menuselection: :mimetype:
:newsgroup: :numref:
:option:
:pep: :pep-reference: :program:
:raw: :ref: :regexp: :rfc: :rfc-reference:
:samp: :strong: :subscript: :superscript:
:term: :title-reference: :token:
/C5"Admonitions"
:admonition: :attention:
:caution:
:danger:
:error:
:hint:
:important:
:note:
:tip:
:warning:
/C6"Operators" STYLE_OPERATOR
..
I have no reStructured Text file for testing, but I think the following syntax highlighting wordfile would be better.

Note: After copying and pasting this into a Sphinx.uew file, the spaces between " and ' must be replaced by a horizontal tab character. Web browsers display and copy a horizontal tab character according to HTML specification as a sequence of spaces.

Code: Select all
/L20"Sphinx" Nocase Block Comment On = ** Block Comment Off = ** Block Comment On Alt = `` Block Comment Off Alt = `` String Chars = *` EnableMLS EnableCFByIndent EnableSpellasYouType File Extensions = rst rsti
/Delimiters = ! "   '*`._<>[]|"
/Function String = "%.. |^([a-z]+^)| "
/Function String 1 = "%.. _^([a-z]+^):"
/Function String 2 = "%.. `^([a-z]+^)`_"
/Open Brace Strings = "<" "["
/Close Brace Strings = ">" "]"
/Indent Strings = ":"
/Unindent Strings = "no_unindent"
/Marker Characters = "<>||_:"
/C1"Keywords" STYLE_KEYWORD
acks:: autoattribute:: autoclass:: autoexception:: autofunction:: automethod:: automodule:: autosummary::
c:function:: c:macro:: c:member:: c:type:: c:var:: centered:: class:: clsdir:: cmdoption:: code-block::
compound:: confval:: contents:: cpp:class:: cssclass:: currentmodule::
data:: deprecated:: describe::
envvar:: epigraph:: event:: exception::
figure:: funcdir:: function::
glossary::
highlight:: highlightlang:: highlights:: hlist::
image:: include:: index::
js:attribute:: js:data:: js:function::
literalinclude::
math:: meta:: method:: module:: moduleauthor::
note::
object:: only:: option::
parsed-literal:: productionlist:: program:: pull-quote:: py:function::
replace:: rst:directive:: rst:role:: rubric::
sectionauthor:: seealso:: sidebar:: sourcecode::
table:: tabularcolumns:: testcleanup:: title:: toctree:: todo:: todolist:: topic::
userdesc::
versionadded:: versionchanged::
/C2"Attributes" STYLE_ATTRIBUTE
:align: :alt: :figclass: :figwidth: :glob: :header: :height: :hidden: :maxdepth: :name: :numbered: :scale:
:subtitle: :target: :titlesonly: :width: :widths:
/C3"Styled Text"
<>
_:
||
/C4"Semantic Style"
:abbr: :any: :code: :command: :dfn: :doc: :download: :dudir: :emphasis: :envvar: :file: :guilabel: :kbd:
:keyword: :literal: :mailheader: :makevar: :manpage: :math: :menuselection: :mimetype: :newsgroup:
:numref: :option: :pep-reference: :pep: :program: :raw: :ref: :regexp: :rfc-reference: :rfc: :samp:
:strong: :subscript: :superscript: :term: :title-reference: :token:
/C5"Admonitions"
:admonition: :attention: :caution: :danger: :error: :hint: :important: :note: :tip: :warning:
/C6"Operators" STYLE_OPERATOR
.
/C7"Interpreted Text"
`

The syntax for wordfiles does not specify any line comment. Therefore it is impossible to add comments into the wordfile itself. A wordfile like this one which needs additional information for users who want to use it should be compressed with ZIP together with a readme.txt into a ZIP file. The readme.txt file should contain the additional information, not the *.uew file itself.

Everything after File Extensions = is interpreted as the file extension string. Keywords for the first line such as Noquote must therefore always be specified to the left of File Extensions = (or File Names = if that is used instead).
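
A small Python sketch may make this rule easier to see. It is not UltraEdit's parser; `file_extensions` is a hypothetical helper that simply mimics the behaviour described above (everything to the right of `File Extensions =` is treated as an extension), and the two sample lines are shortened versions of the language lines from this thread.

```python
def file_extensions(language_line: str) -> list[str]:
    """Return everything after 'File Extensions =' split on whitespace.

    This only imitates the rule described in the post; the real
    UltraEdit parser is not public and may differ in details.
    """
    marker = "File Extensions ="
    _, found, tail = language_line.partition(marker)
    return tail.split() if found else []


# Noquote placed after the extensions: it is swallowed as an "extension".
first_try = '/L20"Sphinx/reST" Nocase EnableCFByIndent File Extensions = rst rsti Noquote'
# Noquote placed to the left: the extension list stays clean.
fixed = '/L20"Sphinx" Nocase Noquote EnableCFByIndent File Extensions = rst rsti'

if __name__ == "__main__":
    print(file_extensions(first_try))
    print(file_extensions(fixed))
```

In the first line `Noquote` ends up in the extension list and is never seen as a keyword, which is why it has to move to the left.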

The regular expressions are in UltraEdit syntax. Therefore I removed /Regexp Type = Perl. I also replaced * by + after each [a-z], as * means 0 or more and + means 1 or more characters as defined in the character class. You surely want to see a string on each line in the function list and not empty lines.

My ultimate syntax highlighting tools package contains the HTML file TestForInvalid.htm which explains in detail how the syntax highlighting engine of UE/UES works. The delimiters break up the character stream of a text file into strings which are then searched in the database created in memory from the strings in the used *.uew file. A human calls those strings "words". It depends on the language which characters are interpreted as delimiters and which ones as "word" characters. The Unicode standard defines for the human languages which character belongs to which character group. There are the groups word character, punctuation mark, ... For syntax highlighting in UltraEdit there are only 2 character groups: delimiter or non-delimiter. If it is suitable for Sphinx / reStructured Text to put the colon into the list of non-delimiters for better highlighting, it is of course okay to do so.
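
To illustrate the delimiter idea, here is a minimal Python sketch. It is a simplified model of the description above, not the real engine: `split_words` is a hypothetical helper, and treating each non-whitespace delimiter as its own one-character "word" is an assumption made for the illustration. The two delimiter sets are the ones from this thread, with and without the colon.

```python
def split_words(line: str, delimiters: str) -> list[str]:
    """Break a line into "words" at delimiter characters.

    Simplified model: every non-whitespace delimiter becomes a
    one-character word of its own, whitespace is dropped.
    """
    words, current = [], ""
    for ch in line:
        if ch in delimiters:
            if current:
                words.append(current)
                current = ""
            if not ch.isspace():
                words.append(ch)
        else:
            current += ch
    if current:
        words.append(current)
    return words


WITHOUT_COLON = " \t!\"'*`._<>[]|"   # colon is a word character
WITH_COLON = WITHOUT_COLON + ":"     # colon is a delimiter

line = ".. note:: see :ref:`label-stuff`"

if __name__ == "__main__":
    print(split_words(line, WITHOUT_COLON))
    print(split_words(line, WITH_COLON))
```

With the colon left out of the delimiters, `note::` and `:ref:` survive as whole words and can be looked up in a color group. With the colon as a delimiter they are cut into `note`, `:`, `ref` and so on, so the colon-decorated keywords can never match.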

I changed the language name to just Sphinx. I suggest a short language name as it is displayed in the status bar in a small field. Although it is possible to use a character like / in a language name, I prefer not to use special characters which are not allowed in a file name, in case a syntax language based template file is created where the language name becomes part of the file name. UltraEdit is forced to replace each character in a language name not allowed in a file name by an underscore. So usage of / is no problem for UE, but if it can easily be avoided, such a character should be avoided.
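
As a rough illustration of that replacement, the following Python sketch maps a language name to a file-name-safe form. The exact set of characters UltraEdit replaces is not stated in this thread; the set used here (the characters Windows forbids in file names) is an assumption, and `template_file_name` is a made-up helper name.

```python
import re

# Assumption: the characters Windows does not allow in file names.
FORBIDDEN = re.compile(r'[\\/:*?"<>|]')


def template_file_name(language_name: str) -> str:
    """Replace every character not allowed in a file name by an underscore."""
    return FORBIDDEN.sub("_", language_name)


if __name__ == "__main__":
    print(template_file_name("Sphinx/reST"))
    print(template_file_name("Sphinx"))
```

A name without special characters passes through unchanged, which is the argument for choosing one in the first place.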

Sphinx syntax is based on Python syntax. But as far as I understood from the referenced page, reStructured Text files are not Python scripts. Therefore I suggest not using the language marker PYTHON_LANG and using just EnableCFByIndent if that makes sense for code folding.

If an indent string is specified, an unindent string should also be specified. Even if the specified unindent string never occurs in a file, such a definition nevertheless often makes sense.

Syntax highlighting of UltraEdit is based on simple strings ("words"). Highlighting blocks is possible only with strings and block comments for multi-line blocks, and line comments and marker characters for single line blocks.

With the definitions above

  • bold text is highlighted with the settings for comments;
  • literal text is highlighted with the settings for alternate block comments;
  • italic text is highlighted with the settings for strings (EnableMLS used for multi-line string highlighting);
  • interpreted text is highlighted with the settings of color group 7.
    Yes, it is possible to specify 2 characters for begin/end of strings and define different settings for one of the two string types by putting one of the 2 string characters into a color group.
The single-line strings between <...>, |...| and _...: are highlighted using the marker characters definition and color group 3.

And last as . is a "word" delimiting character, defining .. as "word" is invalid resulting in .. never being highlighted with settings of color group 6. It is only possible to specify a single dot as "word".
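
A quick way to catch this kind of mistake before loading a wordfile is to check every multi-character word against the delimiter list. The Python sketch below does that under a stated assumption: marker character pairs such as `<>` or `||` are special and are not meant to be fed into it. `words_split_by_delimiters` is a hypothetical helper, not part of any UltraEdit tool.

```python
def words_split_by_delimiters(words: list[str], delimiters: str) -> list[str]:
    """Return the multi-character words that contain a delimiter.

    Such words are cut apart before lookup and therefore never match.
    """
    return [w for w in words if len(w) > 1 and any(ch in delimiters for ch in w)]


DELIMITERS = " \t!\"'*`._<>[]|"   # delimiter list used above, colon removed
candidates = ["..", ".", "note::", ":ref:", "c:function::", "rst:role::"]

if __name__ == "__main__":
    print(words_split_by_delimiters(candidates, DELIMITERS))
    print(words_split_by_delimiters(candidates, DELIMITERS + ":"))
```

With the colon removed from the delimiters only `..` is reported. Adding the colon back reports every directive and role as well, which matches the reason the colon was taken out.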
Best regards from Austria
Amazing, thanks! I knew some of the limitations there but it was working (for the most part) so I didn't remove all the comments. The / in the language name worked fine, at least on uex, but the '/' wasn't in the filename. The short name is probably better anyhow.

I had marked it Python because Sphinx formats Python in code blocks so I figured it would be a convenience, though the place I'm using it now wouldn't be using Python anyhow so it's no big loss.

I removed the indent/unindent as after working with that yesterday I discovered it was a bit of an inconvenience.

I saw the macro pack but was unable to use it with uex; loading it resulted in "macro error". The stand-alone tools were built for Windows, so those wouldn't work either of course :-) I did read the .htm files and they were quite informative. That's ultimately how I discovered the need to remove : from the delimiters field, as otherwise the color groups using : in the keywords didn't function.

It works fairly well with a couple exceptions:

  • Using * as a string character for *italic* breaks the ability to use bulleted lists with an odd number of entries (lines alternate as good/bad), but taking out * as a string char breaks in other ways unless I also take out ** as a block comment. Highlighting the formatted text may just be more trouble than it's worth.
  • The formatting for substitutions (|blah_blah|) doesn't seem to work properly -- but again this may be uex on Linux vs Windows, still haven't had an opportunity to try on my Windows system.
  • Using + for the function regexp broke it somehow, not sure why it works with * and not + when it clearly should match with +. Tried with both Perl and UE regex style.
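
The alternating good/bad lines from the first point can be reproduced with a toy model in Python. It assumes that, with multi-line strings enabled, every `*` simply toggles an "inside string" state that carries over line ends; that is a guess at the effect, not a description of UltraEdit's actual code. `lines_starting_inside_string` is a hypothetical helper.

```python
def lines_starting_inside_string(lines: list[str], string_char: str = "*"):
    """For each line, report whether it starts inside an open string.

    Returns the per-line flags and the state left over after the last line.
    """
    inside = False
    flags = []
    for line in lines:
        flags.append(inside)
        for ch in line:
            if ch == string_char:
                inside = not inside
    return flags, inside


three_bullets = ["* This is a bulleted list", "* And another list item", "* One more"]
four_bullets = three_bullets + ["* Last one I promise"]

if __name__ == "__main__":
    print(lines_starting_inside_string(three_bullets))
    print(lines_starting_inside_string(four_bullets))
```

In this model every second bullet line starts inside a string, and a list with an odd number of entries leaves the string open so that the text after the list is affected too.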
At this point I'd call it "good enough" at least for how I want to use it. Here is some sample text with a bit of basic formatting inside:

Code: Select all
.. include:: common.rsti

.. |some_text| replace:: Long bit of text to replace easily
.. |inline-image-stuff| image:: ../_static/stuff.png

Title text
----------

.. toctree::
   :maxdepth: 2

   some-other-file

Here is *some* text. :ref:`label-stuff` describes **more** stuff |inline-image-stuff|.

User input would look like ``tappity tappity``.

An inline link looks like `This <http://www.example.com>`_, while an explicit external link looks like `Some external link`_

.. _label-stuff:

More Stuff
~~~~~~~~~~

More stuff here, and see figure :ref:`figure-blah` to see stuff [#f1]_.

.. note:: I like stuff.

I am lazy, so I like to find ways to save typing, especially for a |some_text|.

.. _figure-blah:
.. figure:: ../_static/blah.png
    :figclass: align-center

    Stuff to see

* This is a bulleted list
* And another list item
* One more
* Last one I promise

#. This would be an ordered list
#. And another line for an ordered list
#. Third time is the charm

.. seealso:: In :doc:`some-other-file`, other files are discovered.

.. _Some external link: http://www.example.com

.. rubric:: Footnotes

.. [#f1] Footnote for stuff.

And here is what I've got in the wordfile now (yours plus some slight adjustments):

Code: Select all
/L20"Sphinx" Nocase Block Comment On = `` Block Comment Off = `` String Chars = "` EnableMLS EnableCFByIndent EnableSpellasYouType File Extensions = rst rsti
/Delimiters = ! "   '*`._<>[]|"
/Function String = "%.. |^([a-z]*^)| "
/Function String 1 = "%.. _^([a-z]*^):"
/Function String 2 = "%.. `^([a-z]*^)`_"
/Open Brace Strings = "<" "["
/Close Brace Strings = ">" "]"
/Marker Characters = "<>||_:"
/C1"Keywords" STYLE_KEYWORD
acks:: autoattribute:: autoclass:: autoexception:: autofunction:: automethod:: automodule:: autosummary::
c:function:: c:macro:: c:member:: c:type:: c:var:: centered:: class:: clsdir:: cmdoption:: code-block::
compound:: confval:: contents:: cpp:class:: cssclass:: currentmodule::
data:: deprecated:: describe::
envvar:: epigraph:: event:: exception::
figure:: funcdir:: function::
glossary::
highlight:: highlightlang:: highlights:: hlist::
image:: include:: index::
js:attribute:: js:data:: js:function::
literalinclude::
math:: meta:: method:: module:: moduleauthor::
note::
object:: only:: option::
parsed-literal:: productionlist:: program:: pull-quote:: py:function::
replace:: rst:directive:: rst:role:: rubric::
sectionauthor:: seealso:: sidebar:: sourcecode::
table:: tabularcolumns:: testcleanup:: title:: toctree:: todo:: todolist:: topic::
userdesc::
versionadded:: versionchanged::
/C2"Attributes" STYLE_ATTRIBUTE
:align: :alt: :figclass: :figwidth: :glob: :header: :height: :hidden: :maxdepth: :name: :numbered: :scale:
:subtitle: :target: :titlesonly: :width: :widths:
/C3"Styled Text"
<>
_:
||
/C4"Semantic Style"
:abbr: :any: :code: :command: :dfn: :doc: :download: :dudir: :emphasis: :envvar: :file: :guilabel: :kbd:
:keyword: :literal: :mailheader: :makevar: :manpage: :math: :menuselection: :mimetype: :newsgroup:
:numref: :option: :pep-reference: :pep: :program: :raw: :ref: :regexp: :rfc-reference: :rfc: :samp:
:strong: :subscript: :superscript: :term: :title-reference: :token:
/C5"Admonitions"
:admonition: :attention: :caution: :danger: :error: :hint: :important: :note: :tip: :warning:
/C6"Operators" STYLE_OPERATOR
.
/C7"Interpreted Text"
`

Thanks again for all the help and extremely informative explanations!
I tried first my version of the wordfile on your reStructured Text sample with UE for Windows 22.10.

I looked first on the function strings. I could see quickly that instead of

  • [a-z]* to match a string starting with letter A-Z or a-z and 0 or more characters not being carriage return or line-feed (greedy), or
  • [a-z]+ to match a string consisting of only letters A-Z and a-z with at least 1 character
it would be better to use [a-z_][ a-z^-_]++ (in the first two function strings) or [a-z][ a-z^-]++ (in the last one). This expression matches a string starting with a letter (or underscore) followed by 0 or more letters in any case, hyphens, spaces (or underscores) (non greedy).

I now get the following in the function list for the sample file:

Code: Select all
figure-blah
inline-image-stuff
label-stuff
Some external link
some_text
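
For anyone who wants to check the expressions outside of UltraEdit, here is a rough Python `re` translation. It rests on assumptions taken from the descriptions in this thread: `%` is read as start of line, `^(` and `^)` as a capturing group, `^-` as a literal hyphen, `++` as "zero or more", and Nocase as case-insensitive matching. `SAMPLE` is a shortened copy of the sample text, and sorting the names without regard to case is a guess at how the function list is ordered.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE

# Approximate Python equivalents of the three /Function String lines.
PATTERNS = [
    re.compile(r"^\.\. \|([a-z_][ a-z\-_]*)\| ", FLAGS),  # substitutions
    re.compile(r"^\.\. _([a-z_][ a-z\-_]*):", FLAGS),     # labels / link targets
    re.compile(r"^\.\. `([a-z][ a-z\-]*)`_", FLAGS),      # links
]

SAMPLE = """\
.. |some_text| replace:: Long bit of text to replace easily
.. |inline-image-stuff| image:: ../_static/stuff.png
.. _label-stuff:
.. note:: I like stuff.
.. _figure-blah:
.. _Some external link: http://www.example.com
.. [#f1] Footnote for stuff.
"""


def function_list(text: str) -> list[str]:
    """Collect all captured names, sorted without regard to case."""
    names = [m.group(1) for p in PATTERNS for m in p.finditer(text)]
    return sorted(names, key=str.lower)


if __name__ == "__main__":
    for name in function_list(SAMPLE):
        print(name)
```

Directives such as `.. note::` and the footnote line are not picked up, because none of the three patterns allows their first character after the two dots and the space.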

My version of the wordfile looked quite good after changing EnableMLS to DisableMLS. Each line of the bulleted list is now highlighted like italic text. With this definition, a bulleted list line with embedded italic or bold text would be a problem.

With the following first line, the highlighting does not work as expected:

Code: Select all
/L20"Sphinx" Nocase Block Comment On = ** Block Comment Off = ** Block Comment On Alt = `` Block Comment Off Alt = `` String Chars = ` EnableMLS EnableCFByIndent EnableSpellasYouType File Extensions = rst rsti

This is most likely a bug of UltraEdit, which I could also reproduce on Windows.

It looks like UltraEdit has a problem with block comment highlighting if the block comment on and off strings are identical. Highlighting for ``tappity tappity`` is also not correct when inserting a line break between the two words. With NestBlockComments in the first line of the wordfile I would understand why a block comment definition with identical on and off strings would not work. But without nested block comment support, it should be no problem if the block comment on/off strings are identical. I will definitely report this highlighting issue by email to IDM support after some more tests.
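
What "no problem without nesting" means can be shown with a short Python sketch. It pairs identical on/off markers strictly in order of appearance, across line breaks, which is the behaviour one would expect from a non-nested block comment. This is only an illustration of the expectation, not of how UltraEdit is implemented; `literal_spans` is a hypothetical helper.

```python
def literal_spans(text: str, marker: str = "``") -> list[str]:
    """Return the text between successive marker pairs (no nesting).

    The first marker opens a span, the next one closes it, and so on.
    An unpaired marker at the end is ignored.
    """
    spans, pos = [], 0
    while True:
        start = text.find(marker, pos)
        if start == -1:
            break
        end = text.find(marker, start + len(marker))
        if end == -1:
            break
        spans.append(text[start + len(marker):end])
        pos = end + len(marker)
    return spans


if __name__ == "__main__":
    print(literal_spans("User input would look like ``tappity\ntappity``."))
```

The line break inside the span makes no difference to the pairing, which is why the incorrect highlighting of the two-line case looks like a bug rather than an inherent limitation.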

Highlighting |some_text| has worked fine in UltraEdit for Windows since UE v9.20, which means for more than 10 years and since long before development of UEx started. Please report this issue of UEx to IDM support by email. Identical start and end characters should definitely work on UEx as well.

I have one more version based on your wordfile in previous post above:

Code: Select all
/L20"Sphinx" Nocase Block Comment On = `` Block Comment Off = `` String Chars = "` EnableMLS EnableCFByIndent EnableSpellasYouType File Extensions = rst rsti
/Delimiters = ! "   '#*,`<>[]|
/Function String = "%.. |^([a-z_][ a-z^-_]++^)| "
/Function String 1 = "%.. _^([a-z_][ a-z^-_]++^):"
/Function String 2 = "%.. `^([a-z][ a-z^-]++^)`_"
/Open Brace Strings = "<" "["
/Close Brace Strings = ">" "]"
/Marker Characters = "<>||_:"
/C1"Keywords" STYLE_KEYWORD
acks:: autoattribute:: autoclass:: autoexception:: autofunction:: automethod:: automodule:: autosummary::
c:function:: c:macro:: c:member:: c:type:: c:var:: centered:: class:: clsdir:: cmdoption:: code-block::
compound:: confval:: contents:: cpp:class:: cssclass:: currentmodule::
data:: deprecated:: describe::
envvar:: epigraph:: event:: exception::
figure:: funcdir:: function::
glossary::
highlight:: highlightlang:: highlights:: hlist::
image:: include:: index::
js:attribute:: js:data:: js:function::
literalinclude::
math:: meta:: method:: module:: moduleauthor::
note::
object:: only:: option::
parsed-literal:: productionlist:: program:: pull-quote:: py:function::
replace:: rst:directive:: rst:role:: rubric::
sectionauthor:: seealso:: sidebar:: sourcecode::
table:: tabularcolumns:: testcleanup:: title:: toctree:: todo:: todolist:: topic::
userdesc::
versionadded:: versionchanged::
/C2"Attributes" STYLE_ATTRIBUTE
:align: :alt: :figclass: :figwidth: :glob: :header: :height: :hidden: :maxdepth: :name: :numbered: :scale:
:subtitle: :target: :titlesonly: :width: :widths:
/C3"Styled Text"
<>
_:
||
/C4"Semantic Style"
:abbr: :any: :code: :command: :dfn: :doc: :download: :dudir: :emphasis: :envvar: :file: :guilabel: :kbd:
:keyword: :literal: :mailheader: :makevar: :manpage: :math: :menuselection: :mimetype: :newsgroup:
:numref: :option: :pep-reference: :pep: :program: :raw: :ref: :regexp: :rfc-reference: :rfc: :samp:
:strong: :subscript: :superscript: :term: :title-reference: :token:
/C5"Admonitions"
:admonition: :attention: :caution: :danger: :error: :hint: :important: :note: :tip: :warning:
/C6"Operators" STYLE_OPERATOR
#
*
..
[
]
_
/C7"Interpreted Text"
`
/C8"Links"
** ../ ./ ftp:// ftps:// http:// https://

Again the spaces between " and ' must be replaced after copy and paste into a Sphinx.uew file by a horizontal tab character.
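
If replacing the spaces by hand is a nuisance, a few lines of Python can do it on the pasted text. This is a sketch under one assumption: the tab in the /Delimiters line has arrived as a run of two or more spaces between " and '. `restore_tab` is a made-up helper name.

```python
import re

# A double quote, two or more spaces, then a single quote.
SPACES_BETWEEN_QUOTES = re.compile("\" {2,}'")


def restore_tab(line: str) -> str:
    """Put the horizontal tab back into a pasted /Delimiters line."""
    if not line.startswith("/Delimiters"):
        return line  # leave every other line untouched
    return SPACES_BETWEEN_QUOTES.sub("\"\t'", line, count=1)


pasted = "/Delimiters = ! \"   '#*,`<>[]|"

if __name__ == "__main__":
    print(repr(restore_tab(pasted)))
```

The single space between ! and " is a real delimiter and is left alone, because the pattern only matches runs of at least two spaces.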

I changed the list of delimiter characters, the function strings as described, added additional characters as operators, and added new color group 8 for "links" using a substring definition.
Best regards from Austria