Defining string highlighting for a language without an escape symbol

Newbie

    Apr 01, 2010 #1

    Hi All,

    I just installed UltraEdit and started building language support for Stata. It's an interpreted language that has some similarities with C-type languages but lots of differences, which makes syntax highlighting tricky. I've tried many editors with varying success but I'm still looking for the 'perfect' one (I guess that's how I got to UE) ...

    So, in Stata there are a) simple strings, enclosed in double quotes - "this is a simple string"; and b) strings containing the " character, which are surrounded by so-called compound double quotes - `"this is a string that contains " symbol"'

    Double quotes are symmetric: both the opening and the closing quote are the same " symbol (ASCII 34).
    Compound double quotes open with ` (ASCII 96) + double quote, and close with double quote + single quote (ASCII 39).

    So as long as the string is in the compound double quotes it can contain any number of (unbalanced) " symbols. In addition, compound double quotes can be nested, so I can have something like:

    `"`"this is the first string"' `"this is the second one"'"' ...

    I kind of figured that making this work with just wordfile parameters won't be possible (but would love to be wrong). The question is: can it be done by any other means? I really like UltraEdit and write almost exclusively in Stata, so building good support for the language (even if this involves many custom scripts) is important to me.

    I would appreciate any advice you might have.
    Thanks,
    Zura

    Grand Master

      Apr 01, 2010 #2

      My first approach to highlighting your strings correctly was to use String Chars = " DisableMLS in the first line of the wordfile, plus:

      /Marker Characters = "`'"
      /C1"Compound quoted strings"

      `'

      But that produced a wrong result for your last example with the nested strings.

      So I tried highlighting all strings with marker characters: I replaced String Chars = " DisableMLS in the first line of the wordfile with Noquote and used the following:

      /Marker Characters = "`'"""
      /C1"Strings"

      ""
      `'

      That worked better, but did not highlight the "' at the end of your nested string example. Since " and ' are also listed on the line starting with /Delimiters =, I simply added two more lines to color group one.

      /Marker Characters = "`'"""
      /C1"Strings"

      " ""
      '
      `'

      Now the highlighting was correct, although to be honest the nested string is not really interpreted correctly. Finally, just out of interest, I used 2 color groups for the 2 marker string pairs.

      /Marker Characters = "`'"""
      /C1"Double quoted strings"

      " ""
      /C2"Compound quoted strings"
      '
      `'

      That also shows that the nested string example is not really highlighted correctly. The same goes for putting something between the 2 nested strings, as in

      `"`"this is the first string"' and `"this is the second one"'"'

      which shows that the solution I found does not really work, because and is not highlighted with either of the two colors.

      But then I had a strange idea. Since nesting of block comments is possible, what happens when the block comment feature is used instead of marker characters to highlight the compound quoted strings? So I deleted all the definitions listed above and used the following in the first line of the wordfile:

      Block Comment On = ` Block Comment Off = ' NestBlockComments String Chars = "

      A strange definition, but it works wonderfully as long as ` and ' are not used anywhere else outside a string definition.

      Well, it would be better to use

      Block Comment On Alt = ` Block Comment Off Alt = ' NestBlockComments String Chars = "

      because the alternate block comment can be given a color different from the one for line and block comments. But this did not produce the same result, which is probably a bug of UltraEdit with nested block comments. I will report this issue to IDM.
      Best regards from an UC/UE/UES for Windows user from Austria

      Newbie

        Apr 02, 2010 #3

        Hi Mofi,

        Great idea! I've tried all those different options of defining markers etc., but never thought of using block comments for this :)

        One thing though: the ` and ' symbols are used elsewhere too. The Stata language has the concept of a macro, a kind of variable in the program. A macro has a name and contents; everywhere a punctuated macro name appears in the code, the macro contents are substituted for the macro name.

        For example, here we define a macro named hello:

        Code:

        local hello = "Hello World!" // macros can be local or global to the procedure defining it

        and then to refer to the contents of that local macro we can write:

        Code:

        display `hello'

        Macros can be embedded everywhere: on the left or right side of any expression, in strings, nested in other macros, etc.

        Code:

        /*
        the following code will define 10 macros named m1 m2 ... m10
        and m7 will contain string "this is macro named m7"
        */
        forvalues i = 1/10 {
        	local m`i' = "this is macro named m`i'"
        }
        
        // so if I want to print contents of all these macros, I will write
        forvalues j = 1 / 10 {
        	display `m`j'' // notice nesting
        }
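
To make the nested reference in the last loop concrete, here is a rough Python sketch of innermost-first substitution. It is only a simplified model of what is described above, not Stata's actual implementation: the function name expand_macros, the max_passes guard, and the choice to expand an undefined macro to an empty string are assumptions made for the illustration.

```python
import re

# An innermost reference: backtick, a name containing no quote characters, single quote.
INNERMOST = re.compile(r"`([^`']+)'")

def expand_macros(text, macros, max_passes=50):
    """Expand `name' references from the innermost outward (simplified model)."""
    for _ in range(max_passes):
        # Replace every innermost reference with its contents.
        # Assumption for this sketch: an undefined name expands to "".
        text, count = INNERMOST.subn(lambda m: macros.get(m.group(1), ""), text)
        if count == 0:
            return text
    # Guard against macros whose contents keep producing new references.
    raise ValueError("macro expansion did not terminate")

macros = {"j": "7", "m7": "this is macro named m7"}
print(expand_macros("display `m`j''", macros))
```

The first pass turns `j' into 7, which leaves `m7'; the second pass replaces that with the contents of m7. This is why a highlighter that only sees the text cannot tell what a nested reference will finally evaluate to.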

        If I want to define a string that contains a " symbol, I'll use the compound double quotes:

        Code:

        local str1 `"this is text with " in it"'

        and when referring to this macro, I'll have to use compound quotes again:

        Code:

        display `"`str1'"'
        ...

        You can imagine how ugly a line of code will look when I have multiple levels of nesting of macros containing macros that evaluate to strings that might contain double quotes :) ... so it can be a fun experience to try to define all this logic correctly...

        Thanks again for your input,
        zura

        Grand Master

          Apr 02, 2010 #4

          What about using following definition:

          /L20"Stata" Line Comment = // Block Comment On = `" Block Comment Off = "' Block Comment On Alt = /* Block Comment Off Alt = */ NestBlockComments String Chars = " DisableMLS File Extensions = *
          /Delimiters = ! "tab%&'()*+,-/:;<=>?@[\]^{|}~`

          where tab is a placeholder for the horizontal tab character. I think this looks quite good on your examples. It is not possible for UltraEdit to know that str1 is a macro name and not a string. That would require UltraEdit to have a built-in Stata interpreter just for highlighting.
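
For anyone wondering why nested block delimiters cope with the nested compound quotes, the idea can be sketched in a few lines of Python. This is only a model of depth counting with `" as the opening and "' as the closing delimiter; it is not how UltraEdit implements NestBlockComments, and the function name compound_string_spans is made up for the illustration.

```python
def compound_string_spans(line):
    """Return (start, end) index pairs of the outermost `"..."' spans in line."""
    spans = []
    depth = 0   # current nesting level of compound quotes
    start = 0   # index where the current outermost span began
    i = 0
    while i < len(line) - 1:
        pair = line[i:i + 2]
        if pair == '`"':                  # opening compound quote
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "\"'" and depth > 0:  # closing compound quote
            depth -= 1
            i += 2
            if depth == 0:
                spans.append((start, i))
        else:
            i += 1                        # ordinary character, including a lone "
    return spans

example = '`"`"this is the first string"\' and `"this is the second one"\'"\''
print(compound_string_spans(example))
```

A lone " inside the span never changes the depth, so unbalanced double quotes are harmless, and the word and between the two inner strings stays inside the outer span.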

          The main remaining problem with the highlighting is that it would be better to swap the block comment and alternate block comment definitions, so that the real block and line comments are highlighted with the comment color and the strings using compound quotes with a different color.

          In the meantime I got a reply from IDM support to my report of the wrong highlighting when using the alternate block comment for the compound quoted strings. IDM support could reproduce the wrong behavior from my description and the attached test files, and added the problem to the bug tracking database to be fixed by the developers.

          Newbie

            Apr 02, 2010 #5

            Thanks again Mofi, this looks good. Now I have to define keyword groups with colors and function list expressions, and I'll be ready to resume working :)

            ...
            Not exactly related, but I'll ask it here anyway:
            Is it possible to capture and remove an optional keyword in a function definition? Let me explain what I mean.
            In Stata, functions can be defined like this:

            program [define] name
            ...
            end


            so if I write the search string

            /TGFindStr = "%[ ^t]++program[ ^t]+define[ ^t]+^(*^)$"

            everything is nice but obviously this will not find short definitions like:

            program name1
            ..
            end


            but if I use

            /TGFindStr = "%[ ^t]++program[ ^t]+^(*^)$"

            then the function list shows the word 'define' in front of the program name. Is there a way to capture both cases?

            Thanks,
            zurab

            Master

              Apr 03, 2010 #6

              What about https://www.ultraedit.com/resources/wf/stata7.uew? I don't know Stata, and I don't know if the existing wordfile is any good, but perhaps it can save you some work.

              Grand Master

                Apr 03, 2010 #7

                With a single function string regular expression this is not possible with the UltraEdit regexp engine; perhaps it is with the Perl engine. However, using two regular expressions does the job. The UltraEdit regular expressions using the pre-UE v16.00 function string syntax are:

                /Function String = "%[ ^t]++program[ ^t]+define[ ^t]+^(*^)$"
                /Function String 1 = "%[ ^t]++program[ ^t]+^(*^)$"


                These two expressions can't be used with the new function string syntax of UE v16.00, because that would result in the functions defined with define appearing twice in the function list, due to the different search algorithm.
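
As for the "perhaps with the Perl engine" remark: in Perl-style syntax the optional keyword can be written as an optional non-capturing group, so that only the name is captured. Below is a small sketch using Python's re module, whose syntax matches Perl for this construct. Whether UltraEdit's Perl engine accepts the same pattern in a function string is an assumption and not tested here; the names PROGRAM_DEF and program_names are made up for the example, and the pattern captures only the first word after the keyword(s) rather than the rest of the line.

```python
import re

# "program", then optionally "define", then the program name (first non-blank word).
PROGRAM_DEF = re.compile(
    r"^[ \t]*program[ \t]+(?:define[ \t]+)?(\S+)",
    re.MULTILINE,
)

def program_names(source):
    """Return the program names found in Stata source text."""
    return PROGRAM_DEF.findall(source)

source = """program define foo
...
end

    program name1
..
end
"""
print(program_names(source))
```

The group (?:define[ \t]+)? consumes the word define when it is present and is skipped otherwise, so both the long and the short definition form yield just the program name.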