Thank you for the links to the official documentation and your description. The syntax highlighting task is clearer for me now.
It looks like the following Perl regular expression could be used to find
valid here document blocks in a Ruby file:
(?s)(?-i)<<(?:(\w+).*?[\r\n]\1\b|[\-~](\w+).*?[\r\n][\t ]*\2\b|[\-~](['"`])(\w+)\3.*?[\r\n][\t ]*\4\b)
The problem is that a Perl regular expression cannot be used in a syntax highlighting definition. So if the blocks found by this expression should be highlighted by UltraEdit as strings (or as HTML or SQL syntax highlighted blocks, with an additional case-insensitive evaluation of the identifier), it would be necessary to request a Ruby syntax highlighting enhancement by email to IDM support, so that it is perhaps implemented in a future version of UltraEdit.
UltraEdit would additionally have to remove line and block comments in memory before applying this Perl regular expression to the rest of the file. Otherwise a match could be wrong, as can be seen on running a Perl regular expression find with this search expression on the example file content below: <<HEREDOC in the line comment below the first heredoc block starts a false match.
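The false match caused by the line comment can be reproduced with a minimal sketch. It runs the expression in Ruby's own regular expression engine rather than in UltraEdit, which is an assumption of this example: the /m flag stands in for (?s), and (?-i) is dropped because Ruby regular expressions are case-sensitive by default. The names heredoc_re and heredoc_matches are made up for illustration.

```ruby
# Ruby translation of the Perl expression (assumption: /m replaces (?s),
# (?-i) is omitted because Ruby matches case-sensitively by default).
def heredoc_re
  /<<(?:(\w+).*?[\r\n]\1\b|[\-~](\w+).*?[\r\n][\t ]*\2\b|[\-~](['"`])(\w+)\3.*?[\r\n][\t ]*\4\b)/m
end

# Hypothetical helper: collects every full match, left to right.
def heredoc_matches(text)
  matches = []
  pos = 0
  while (m = heredoc_re.match(text, pos))
    matches << m[0]
    pos = m.end(0)
  end
  matches
end

# Shortened excerpt of the example file: a valid block, a line comment
# mentioning <<HEREDOC, and a second valid block.
sample = <<~'RUBY'
  expected_result1 = <<HEREDOC
  This would contain specially formatted text.
  HEREDOC
  # The heredoc starts on the line following <<HEREDOC and ends with the next
  # line that starts with HEREDOC.
  expected_result7 = <<HEREDOC
  This is a valid multi-line block.
  HEREDOC
RUBY

matches = heredoc_matches(sample)
matches.each_with_index do |match, i|
  puts "match #{i + 1} starts with: #{match.lines.first.chomp}"
end
```

In this sketch the second match begins at <<HEREDOC inside the comment line and swallows the real second block, which is why comments would have to be removed first.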
UltraEdit supports multi-line highlighting for strings, block comments and alternate block comments at the moment.
A string is a sequence of characters outside a comment that starts with one of the two possible characters which can be defined in the wordfile and ends with the same character, unless that character is escaped with the character defined in the wordfile as escape character. So the string highlighting feature cannot be used for syntax highlighting heredoc blocks because a here document block does not start and end with one specific character.
There is already a block comment definition in the Ruby wordfile.
So the only remaining option for multi-line highlighting with UltraEdit for Windows v26.20 would be the alternate block comment highlighting. But alternate block comment highlighting cannot be used here because the string that should be interpreted as the beginning of an alternate block comment cannot be a regular expression. It must be a fixed string, just like the string that should be interpreted as the end of an alternate block comment. There is additionally the issue that the identifier exists twice in a here document block, once after << and once marking the end of the block. For that reason an alternate block comment definition like Block Comment On Alt = << Block Comment Off Alt = HEREDOC does not work, because HEREDOC at the beginning of the block is already interpreted as the end of the block by UltraEdit. Other variants of alternate block comment definitions do not work either. In practice an alternate block comment definition is not possible because the identifier string can be nearly anything, and so the end of the alternate block comment can also be more or less anything, subject to rules that must be taken into account by code parsing the Ruby file. Simple parsing based on fixed strings is not possible here.
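Why the fixed strings << and HEREDOC end the block too early can be shown with a small sketch. It assumes a highlighter that simply pairs a fixed start string with the next occurrence of a fixed end string, which is a simplification and not UltraEdit's actual code; fixed_delimiter_blocks is a hypothetical helper name.

```ruby
# Hypothetical helper: returns every span that begins with the fixed string
# `on` and ends with the next occurrence of the fixed string `off`.
def fixed_delimiter_blocks(text, on, off)
  blocks = []
  pos = 0
  while (start = text.index(on, pos))
    stop = text.index(off, start + on.length)
    break if stop.nil? # no end string left, so no further block

    blocks << text[start...(stop + off.length)]
    pos = stop + off.length
  end
  blocks
end

sample = "result = <<HEREDOC\nSome text.\nHEREDOC\n"
blocks = fixed_delimiter_blocks(sample, "<<", "HEREDOC")
p blocks # => ["<<HEREDOC"]
```

The only block found is the opener itself, because the identifier directly after << already counts as the end string. The body and the real terminator are never part of the block.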
Conclusion: There is currently only the possibility to write a Ruby heredoc syntax highlighting enhancement request and send it by email to IDM support.
PS: I don't know how to avoid wrong here document block matches by the Perl regular expression search if the Ruby file contains any invalid heredoc block. A block starting with <<HEREDOC and having HEREDOC not at the beginning of a line, but after indenting tabs/spaces, is invalid according to the official documentation. It results in the Perl regular expression find selecting everything up to the next HEREDOC at the beginning of a line, or not selecting any other block in the file if no matching HEREDOC at the beginning of a line is found up to the end of the file. A here document block with different identifier strings at the beginning and at the end is one more problem for the Perl regular expression.
File content used by me:
# Here Documents
# If you are writing a large block of text you may use a “here document”
# or “heredoc”:
expected_result1 = <<HEREDOC
This would contain specially formatted text.
That might span many lines
HEREDOC
# The heredoc starts on the line following <<HEREDOC and ends with the next
# line that starts with HEREDOC. The result includes the ending newline.
# You may use any identifier with a heredoc, but all-uppercase identifiers
# are typically used.
# You may indent the ending identifier if you place a “-” after <<:
expected_result2 = <<-INDENTED_HEREDOC
This would contain specially formatted text.
That might span many lines
INDENTED_HEREDOC
# Note that while the closing identifier may be indented, the content
# is always treated as if it is flush left. If you indent the content
# those spaces will appear in the output.
# To have indented content as well as an indented closing identifier, you
# can use a “squiggly” heredoc, which uses a “~” instead of a “-” after <<:
expected_result3 = <<~SQUIGGLY_HEREDOC
This would contain specially formatted text.
That might span many lines
SQUIGGLY_HEREDOC
# The indentation of the least-indented line will be removed from each
# line of the content. Note that empty lines and lines consisting solely of
# literal tabs and spaces will be ignored for the purposes of determining
# indentation, but escaped tabs and spaces are considered non-indentation
# characters.
# A heredoc allows interpolation and escaped characters. You may disable
# interpolation and escaping by surrounding the opening identifier with
# single quotes:
expected_result4 = <<-'EXPECTED'
One plus one is #{1 + 1}
EXPECTED
# p expected_result # prints: "One plus one is \#{1 + 1}\n"
# The identifier may also be surrounded with double quotes (which is the
# same as no quotes) or with backticks. When surrounded by backticks the
# HEREDOC behaves like Kernel#`:
puts <<-`HEREDOC`
cat #{__FILE__}
HEREDOC
# To call a method on a heredoc place it after the opening identifier:
expected_result5 = <<-EXPECTED.chomp
One plus one is #{1 + 1}
EXPECTED
# That is an invalid heredoc block because of HEREDOC line is indented at end.
expected_result6 = <<HEREDOC
This is an invalid multi-line block.
The identifier HEREDOC must be at beginning of the line without indents.
  HEREDOC
# A valid heredoc block after an invalid block.
expected_result7 = <<HEREDOC
This is a valid multi-line block.
The HEREDOC identifier is at beginning of the line.
HEREDOC
# One more invalid heredoc block definition because of different identifiers.
expected_result8 = <<~HEREdoc
This is an invalid multi-line block.
There is HEREdoc at beginning, but HEREDOC at end.
HEREDOC
# A valid heredoc block after an invalid block.
expected_result9 = <<~HEREdoc
This is a valid multi-line block.
There is HEREdoc at beginning and also HEREdoc at end.
HEREdoc
# That is once again an invalid heredoc block because of HEREDOC line
# is indented at end.
invalid_block = <<HEREDOC
This is an invalid multi-line block.
The identifier HEREDOC must be at beginning of the line without indents.
  HEREDOC
# A valid heredoc block after an invalid block.
sql_result = <<-SQL
SELECT * FROM #{table}
WHERE #{type} = true
SQL
# End of here document examples.