Updated Ruby Wordfile

mlgaunnac · Nov 23, 2018 · #1 · 2018-11-23T04:05+00:00

I created the Ruby wordfile in the early 2000s for version 1.8. There have been a great many changes since then, so I have edited the file to reflect version 2.5.3.
The only problem I've seen is with code folding when a statement has an appended conditional, including if, case, loop, unless, until and while clauses (e.g. print "Hello World\n" if im_feeling_great).
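
For illustration, a minimal sketch of the two forms involved. The method names are made up for this example. It shows if, unless, while and until once as block openers and once as trailing modifiers; the modifier forms are complete on one line and have no matching end, which is presumably what trips up keyword-based folding.

```ruby
# Block form: `if` opens a region that is closed by a matching `end`,
# so folding from `if` to `end` is correct here.
def greet_block(im_feeling_great)
  out = []
  if im_feeling_great
    out << "Hello World"
  end
  out
end

# Modifier form: the same keywords trail a statement and have NO matching
# `end`, so a folder that pairs every `if`/`while` with an `end` goes wrong.
def greet_modifier(im_feeling_great)
  out = []
  out << "Hello World" if im_feeling_great
  out << "Goodbye" unless im_feeling_great
  n = 0
  n += 1 while n < 3      # `while` as a modifier
  n -= 1 until n.zero?    # `until` as a modifier
  out << n
  out
end
```

A file containing both methods should offer exactly one fold inside greet_block and none inside greet_modifier apart from the method body itself.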

The file includes the Standard Library classes, modules, methods and constants, any part of which can be omitted.

I have also included my Ruby theme override file and a text file pointing out its location in the folder structure.

Mofi · Nov 24, 2018 · #2 · 2018-11-24T13:06+00:00

Good work! But I found a few mistakes in the wordfile and some lines which could be slightly improved. I repacked the archive file with the improved files.

The improvements and corrections are:

Improvement: A dot . inside a character class definition [...] in a Perl regular expression search string is always interpreted as a literal character. It therefore need not be escaped with a backslash, as it must be outside a character class definition to be interpreted as a literal character. For that reason the backslash to the left of the dot inside the square brackets is removed in all three Perl regular expression search strings.
Improvement: [ \n]* at the end of the first and second Perl regular expression search strings means: find a space or line-feed 0 or more times. This part of the expression is outside a capturing group, so the whitespace found is never displayed in the function list, which of course makes sense. This part of the expression is therefore useless and only makes the search for modules and classes very slightly slower. The same is true for [ \n\(]*, which is also not really needed (and the backslash would not be necessary there either).
Improvement: Carriage return \r was missing in the character class definition [ \t\n]* used for finding function parameters.
Error correction: Some keywords in the color groups were not sorted correctly according to this rule:
Note that ALL words starting with the same character may be on the same line or spread across multiple lines, however if they are spread across multiple lines the lines must be one after the other with no empty lines or other lines between them.
The sorting of the keywords in the color groups is corrected in the wordfile in the attached ZIP file.
The resorting of the keywords makes comparing the wordfiles nearly impossible.
Error correction: Some combinations of word delimiters used as operators in Ruby files, like << or <=>, were defined in an invalid way in the wordfile. Only the first character of a keyword can be a word delimiter.
Improvement: The keyword URI was listed in color groups 3 and 4. URI is removed from color group 4.
Improvement: The keyword to_s was listed in color groups 3 and 5. to_s is removed from color group 5.
Error correction: C:\Users\<userid>\AppData\Roaming in file ThemeOverrideLocation.txt is very often a correct path since Windows Vista, but it does not reliably reference the roaming application data directory of the current user account. %APPDATA% is always correct, see Windows Environment Variables.
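
The three regular expression points in the list above can be checked quickly. The sketch below uses Ruby's own regex engine rather than the Perl engine in UltraEdit, on the assumption that both treat these character classes the same way. The patterns and sample strings are invented for the demonstration and are not the ones from the wordfile.

```ruby
# 1) Inside a character class a dot is literal, escaped or not.
unescaped = /[a.]/
escaped   = /[a\.]/
samples   = ["a", ".", "b", "x.y"]
same = samples.all? { |s| (s =~ unescaped) == (s =~ escaped) }

# 2) A trailing [ \n]* outside any capturing group does not change
#    what the group captures.
with_tail    = /^class\s+(\w+)[ \n]*/
without_tail = /^class\s+(\w+)/
line = "class Foo  \n"
captures_equal = line[with_tail, 1] == line[without_tail, 1]

# 3) Without \r in the class, a CRLF line ending stops the whitespace run.
crlf    = "(a,\r\n b)"
no_cr   = crlf[/\(a,[ \t\n]*b\)/]     # \r is not in the class: no match
with_cr = crlf[/\(a,[ \t\r\n]*b\)/]   # matches across the CRLF
```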

Most of these small mistakes also exist in ruby.uew installed with UE v25.20.0.88, as I could see when comparing the wordfiles for Ruby.
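
A sorting mistake of the kind described in the quoted rule could be found with a small script. This is only a sketch under simplifying assumptions: scattered_initials is a hypothetical helper, each keyword line is assumed to hold words with one common first character, the comparison is case-sensitive, and the real wordfile format (color group headers and so on) is not parsed.

```ruby
# Returns the first characters whose keyword lines are NOT consecutive.
# `lines` holds the keyword lines of one color group, in file order.
# An empty line counts as a break, as the quoted rule requires.
def scattered_initials(lines)
  initials = lines.map { |line| line.strip[0] }   # nil for an empty line
  # Collapse consecutive lines with the same initial into one run.
  runs = initials.chunk_while { |a, b| a == b }.map(&:first)
  # An initial that heads more than one run is spread over separated lines.
  runs.compact.select { |c| runs.count(c) > 1 }.uniq
end

good = ["alias and", "attr_reader", "begin break"]
bad  = ["alias and", "begin break", "attr_reader"]
gap  = ["alias and", "", "attr_reader"]

scattered_initials(good)   # no problems reported
scattered_initials(bad)    # reports "a"
scattered_initials(gap)    # reports "a" because of the empty line
```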

I don't have any knowledge of Ruby syntax and no Ruby files for testing the improved wordfile, so I can't help with the code folding issue. I would need syntactically correct example Ruby code with all variants that matter for code folding. Then I might find a solution for the wrong code folding offered on a line with an appended condition.

mlgaunnac · Nov 24, 2018 · #3 · 2018-11-24T18:21+00:00

Thanks for the corrections.

sma33 · Dec 06, 2019 · #4 · 2019-12-06T17:06+00:00

Thanks for providing this.

Does anybody know how to add support for heredocs?

Mofi · Dec 07, 2019 · #5 · 2019-12-07T10:48+00:00

UltraEdit for Windows v26.20 supports multi-language syntax highlighting only for HTML files, as far as I know. It does not support it for other file types, such as Ruby files with embedded HTML or SQL blocks. At the moment the only way to get strings highlighted in files containing multiple languages other than HTML is to create a wordfile which contains syntax highlighting definitions for all languages of such a file as well as possible. That is not easy for the combination of Ruby and HTML because the HTML syntax is very special due to its tags.

However, the language markers RUBY_LANG, HTML_LANG and SQL_LANG already exist. So a future version of UltraEdit could support multi-language syntax highlighting for Ruby files: the file would be highlighted with the wordfile containing the language marker RUBY_LANG, and UltraEdit would automatically detect where an HTML block inside the Ruby file begins and ends, highlighting it with the wordfile containing HTML_LANG, and where an SQL block begins and ends, highlighting it with the wordfile containing SQL_LANG.

Users of UltraEdit would have to request such an enhancement for multi-language syntax highlighting, and the requesting users would also have to describe the rules for detecting the beginning and the end of HTML and SQL blocks inside a Ruby file. The description can also be added to the request as a link to a web page which contains the specification for such embedded blocks. It looks like <<-SQL anywhere within a line in a Ruby syntax highlighted block marks the beginning of an SQL block and SQL at the beginning of a line marks the end of the SQL block. Likewise, <<-HTML anywhere within a line in a Ruby syntax highlighted block marks the beginning of an HTML block and HTML at the beginning of a line marks the end of the HTML block.

It is not clear to me after reading the referenced page whether SQL and HTML at the beginning and at the end must match case-sensitively or can be written in any case by a Ruby writer using a heredoc. Is it valid if there are normal spaces or horizontal tabs to the left of the SQL and HTML marking the end of an SQL or HTML block? Are <<-HTML and <<-SQL also recognized as the beginning of an HTML or SQL block when they are inside a Ruby line or block comment?
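
To make the open questions concrete, here is a rough line-based sketch of the detection rule as guessed above. embedded_blocks is a hypothetical helper, not anything UltraEdit provides. It assumes the identifier is matched case-sensitively and that the closing identifier may be indented, and it does not skip comments, which is exactly one of the unresolved points.

```ruby
# Hypothetical helper: returns [language, first_body_line, last_body_line]
# (0-based line indexes) for each <<-SQL / <<-HTML block in `lines`.
def embedded_blocks(lines)
  blocks = []
  open_tag = nil   # identifier of the block currently open, if any
  start = nil      # index of the first body line of that block
  lines.each_with_index do |line, i|
    if open_tag
      # Assumption: the terminator stands alone on its line, may be
      # indented, and must match the opening identifier exactly.
      if line.strip == open_tag
        blocks << [open_tag, start, i - 1]
        open_tag = nil
      end
    elsif (m = line.match(/<<-(SQL|HTML)\b/))
      open_tag = m[1]
      start = i + 1
    end
  end
  blocks
end

source = [
  "query = <<-SQL",
  "SELECT 1",
  "SQL",
  "page = <<-HTML",
  "<p>hi</p>",
  "  HTML",
]
blocks = embedded_blocks(source)
```

For the six sample lines this yields one SQL block covering line index 1 and one HTML block covering line index 4.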

PS: I don't know anything about Ruby language syntax or here document syntax, and most likely neither do the UltraEdit developers. So it is really necessary to explain the syntax in full detail when requesting the multi-language syntax highlighting enhancement, ideally with a link to an official, publicly available specification.

sma33 · Dec 09, 2019 · #6 · 2019-12-09T08:41+00:00

While multi-language support could be great, I think it's not necessary for heredocs, and I didn't mean that in my question.
From rubyguides:

A heredoc is a way to define a multiline string, while maintaining the original indentation & formatting.

It would be enough if that multiline string is highlighted as a string. It need not be highlighted as SQL or HTML or whatever.

From the official docs:

Code:

expected_result = <<HEREDOC
This would contain specially formatted text.

That might span many lines
HEREDOC

Heredocs start with <<, <<- or <<~. Then an identifier follows, e.g. HEREDOC.

The heredoc text starts on the line following <<HEREDOC and ends with the next line that starts with HEREDOC. The result includes the ending newline.

You could use any identifier instead of HEREDOC, e.g. SQL, HTML, STR, EOF etc. Uppercase identifiers are common, but not a must.
As far as I know, blanks or tabs are not allowed to the left of the closing identifier when the plain << form is used.
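
A short runnable sketch of how the three forms differ, based on the documented behaviour. The variable names and texts are invented. It also shows that the closing identifier is matched case-sensitively, and that only the plain << form requires it to start in column 1.

```ruby
# Plain form: the closing identifier must start at the beginning of the line.
plain = <<HEREDOC
  two leading spaces are kept
HEREDOC

# Dash form: the closing identifier may be indented, content stays as typed.
dash = <<-HEREDOC
  content indentation is still kept
  HEREDOC

# Squiggly form: the indentation of the least-indented line is removed.
squiggly = <<~HEREDOC
  common indentation is removed
    extra indentation stays
  HEREDOC

# The identifier is case-sensitive: the line "eos" is ordinary body text.
mixed_case = <<EOS
eos
EOS
```

In every case the resulting string ends with a newline, so a highlighter would have to color everything from the line after the opener up to the line before the closing identifier.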

Official docs (no direct link - search for here document)

Mofi · Dec 10, 2019 · #7 · 2019-12-10T06:47+00:00

Thank you for the links to the official documentation and your description. The syntax highlighting task is clearer to me now.

It looks like a Perl regular expression like this one could be used to find valid here document blocks in a Ruby file.

(?s)(?-i)<<(?:(\w+).*?[\r\n]\1\b|[\-~](\w+).*?[\r\n][\t ]*\2\b|[\-~](['"`])(\w+)\3.*?[\r\n][\t ]*\4\b)

The problem is that a Perl regular expression cannot be used for syntax highlighting. So if blocks found by this expression should be highlighted by UltraEdit as strings (or as HTML or SQL syntax highlighted blocks, with additional case-insensitive evaluation of the identifier), it would be necessary to request a Ruby syntax highlighting enhancement by email to IDM support, so that it can perhaps be implemented in a future version of UltraEdit. UltraEdit would additionally have to remove line and block comments in memory before running this Perl regular expression on the rest of the file to find valid here document blocks. Otherwise a match could be wrong, as can be seen by running a Perl regular expression find with this search expression on the example file content below: <<HEREDOC in the line comment below the first heredoc block produces a wrong match.
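
The wrong match caused by a comment can be reproduced outside UltraEdit. The sketch below is a port of the expression to Ruby's regex engine, which is an assumption about equivalence: (?s) becomes the /m flag and (?-i) is dropped because Ruby regexes are case-sensitive by default. The sample text is a shortened stand-in for the example file.

```ruby
# Ruby port of the search expression (see the assumptions in the text above).
HEREDOC_RE = /<<(?:(\w+).*?[\r\n]\1\b|[\-~](\w+).*?[\r\n][\t ]*\2\b|[\-~](['"`])(\w+)\3.*?[\r\n][\t ]*\4\b)/m

sample = [
  "a = <<HEREDOC",
  "first",
  "HEREDOC",
  "# The heredoc starts on the line following <<HEREDOC and ends there.",
  "b = <<HEREDOC",
  "second",
  "HEREDOC",
].join("\n")

# Collect every match in turn, continuing after the previous one.
matches = []
pos = 0
while (m = HEREDOC_RE.match(sample, pos))
  matches << m[0]
  pos = m.end(0)
end

# The second match begins inside the comment line, not at "b = <<HEREDOC".
second_starts_in_comment = matches[1].start_with?("<<HEREDOC and ends")
```

Two matches are found, but the second one starts at the <<HEREDOC inside the comment, so comments would indeed have to be removed or skipped first.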

UltraEdit supports multi-line highlighting for strings, block comments and alternate block comments at the moment.

A string is a sequence of characters outside a comment which starts with one of the two possible characters that can be defined in the wordfile and ends with the same character, unless that character is escaped with the character defined in the wordfile as escape character. So the string highlighting feature cannot be used for syntax highlighting heredoc blocks because a here document block does not start and end with one specific character.

There is already a block comment definition in Ruby wordfile.

So the only remaining option for multi-line highlighting with UltraEdit for Windows v26.20 would be the alternate block comment highlighting. But alternate block comment highlighting cannot be used here because the string which should be interpreted as the beginning of an alternate block comment cannot be a regular expression. It must be a fixed string, just like the string which should be interpreted as the end of an alternate block comment. There is the additional issue that the identifier exists twice in a here document block, once after << and once marking the end of the block. For that reason an alternate block comment definition like Block Comment On Alt = << Block Comment Off Alt = HEREDOC does not work: the HEREDOC at the beginning of the block is already interpreted by UltraEdit as the end of the block. Other variants of alternate block comment definitions do not work either. In practice an alternate block comment definition is not possible because the identifier string can be nearly anything, so the end of the alternate block comment can also be more or less anything, governed by rules which Ruby file parsing code must take into account. Simple parsing based on fixed strings is not possible here.
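
The failure of a fixed-string definition can be modelled in a few lines. fixed_block is a hypothetical stand-in for a matcher that only knows a fixed start string and a fixed end string. It is modelled on the description above and is not UltraEdit's actual implementation.

```ruby
# Hypothetical model of a fixed-string block matcher: the block starts at the
# first occurrence of `on` and ends at the first occurrence of `off` after it.
def fixed_block(text, on, off)
  start = text.index(on)
  return nil unless start
  stop = text.index(off, start + on.length)
  return nil unless stop
  text[start...(stop + off.length)]
end

source = "expected_result = <<HEREDOC\nThis would contain text.\nHEREDOC\n"

# The identifier right after << already satisfies the end string,
# so the "block" closes before the heredoc body has even begun.
block = fixed_block(source, "<<", "HEREDOC")
```

The result is just the opener itself, with none of the heredoc body included.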

Conclusion: There is currently only the possibility to write a Ruby heredoc syntax highlighting enhancement request and send it by email to IDM support.

PS: I don't know how to avoid wrong here document block matches by the Perl regular expression search if the Ruby file contains an invalid heredoc block. A block starting with <<HEREDOC and having HEREDOC not at the beginning of a line, but after indenting tabs/spaces, is invalid according to the official documentation. In that case the Perl regular expression find either selects everything up to the next HEREDOC at the beginning of a line, or selects no further block in the file because no matching HEREDOC at the beginning of a line is found up to the end of the file. A here document block with different identifier strings at the beginning and at the end is one more problem for the Perl regular expression.
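
The over-long selection after an invalid block can be shown with the same kind of Ruby port of the expression (again assuming Ruby's engine behaves like the Perl engine here, with /m standing in for (?s)). The sample lines are a shortened, invented variant of the invalid and valid blocks in the file below.

```ruby
# Local Ruby port of the search expression (an assumption, see text above).
heredoc_re = /<<(?:(\w+).*?[\r\n]\1\b|[\-~](\w+).*?[\r\n][\t ]*\2\b|[\-~](['"`])(\w+)\3.*?[\r\n][\t ]*\4\b)/m

sample = [
  "x = <<HEREDOC",
  "This is an invalid block.",
  "    HEREDOC",            # indented terminator: invalid with plain <<
  "y = <<HEREDOC",
  "This is a valid block.",
  "HEREDOC",
].join("\n")

# Collect every match in turn, continuing after the previous one.
found = []
pos = 0
while (m = heredoc_re.match(sample, pos))
  found << m[0]
  pos = m.end(0)
end

# The single match runs from the invalid opener through the valid block.
swallowed_valid_block = found.first.include?("y = <<HEREDOC")
```

Only one match is reported, and it swallows the valid block that follows the invalid one.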

File content used by me:

Code:

# Here Documents

# If you are writing a large block of text you may use a “here document”
# or “heredoc”:

expected_result1 = <<HEREDOC
This would contain specially formatted text.

That might span many lines
HEREDOC

# The heredoc starts on the line following <<HEREDOC and ends with the next
# line that starts with HEREDOC. The result includes the ending newline.

# You may use any identifier with a heredoc, but all-uppercase identifiers
# are typically used.

# You may indent the ending identifier if you place a “-” after <<:

  expected_result2 = <<-INDENTED_HEREDOC
This would contain specially formatted text.

That might span many lines
  INDENTED_HEREDOC

# Note that while the closing identifier may be indented, the content
# is always treated as if it is flush left. If you indent the content
# those spaces will appear in the output.

# To have indented content as well as an indented closing identifier, you
# can use a “squiggly” heredoc, which uses a “~” instead of a “-” after <<:

expected_result3 = <<~SQUIGGLY_HEREDOC
  This would contain specially formatted text.

  That might span many lines
SQUIGGLY_HEREDOC

# The indentation of the least-indented line will be removed from each
# line of the content. Note that empty lines and lines consisting solely of
# literal tabs and spaces will be ignored for the purposes of determining
# indentation, but escaped tabs and spaces are considered non-indentation
# characters.

# A heredoc allows interpolation and escaped characters. You may disable
# interpolation and escaping by surrounding the opening identifier with
# single quotes:

expected_result4 = <<-'EXPECTED'
One plus one is #{1 + 1}
EXPECTED

# p expected_result # prints: "One plus one is \#{1 + 1}\n"

# The identifier may also be surrounded with double quotes (which is the
# same as no quotes) or with backticks. When surrounded by backticks the
# HEREDOC behaves like Kernel#`:

puts <<-`HEREDOC`
cat #{__FILE__}
HEREDOC

# To call a method on a heredoc place it after the opening identifier:

expected_result5 = <<-EXPECTED.chomp
One plus one is #{1 + 1}
EXPECTED

# That is an invalid heredoc block because the HEREDOC line at the end is indented.

expected_result6 = <<HEREDOC
This is an invalid multi-line block.

The identifier HEREDOC must be at beginning of the line without indents. 
    HEREDOC

# A valid heredoc block after an invalid block.

expected_result7 = <<HEREDOC
This is a valid multi-line block.

The HEREDOC identifier is at beginning of the line.
HEREDOC

# One more invalid heredoc block definition because of different identifiers.

expected_result8 = <<~HEREdoc
This is an invalid multi-line block.

There is HEREdoc at beginning, but HEREDOC at end.
    HEREDOC

# A valid heredoc block after an invalid block.

expected_result9 = <<~HEREdoc
This is a valid multi-line block.

There is HEREdoc at beginning and also HEREdoc at end.
    HEREdoc

# That is once again an invalid heredoc block because the HEREDOC line
# at the end is indented.

invalid_block = <<HEREDOC
This is an invalid multi-line block.

The identifier HEREDOC must be at beginning of the line without indents. 
    HEREDOC

# A valid heredoc block after an invalid block.

sql_result = <<-SQL
SELECT * FROM #{table}
WHERE #{type} = true
SQL

# End of here document examples.

sma33 · Dec 10, 2019 · #8 · 2019-12-10T10:40+00:00

Thank you for your great explanation of how it works in UltraEdit. I will send a feature request to IDM support.