Multi-language syntax highlighting for HTML files

    Feb 08, 2005#1

    I'm confused, and can't get this to work how I would expect/like:
    IDM wrote: UE v11.00 has improved syntax highlighting and it now supports multiple languages within a single file. This is specifically for HTML type files. To help facilitate this, we have added additional language indicators that should be added to the wordfile.txt file to indicate the type of language for any languages that may be included within another. Our default wordfile has these modifications.

    Example:

    If an HTML file includes PHP then the syntax highlighting section must exist in the main wordfile and the PHP section should include in the definition line: PHP_LANG

    Currently, UE uses the above language markers to correctly syntax highlight multiple languages within a file. In the future UE may make further use of these language markers in the wordfile.
    So, I'm expecting that, with the default wordfile.txt shipped with UltraEdit 11.00, the PHP code will be formatted as per the section defined as:

    Code:

    /L8"PHP" PHP_LANG Nocase Line Comment = // Line Comment Alt = # Block Comment On = /* Block Comment Off = */ Escape Char = \ String Chars = "' File Extensions = PHP PHTML INC PHP3 PHP4
    And the HTML code in the file will be formatted as per the section defined as:

    Code:

    /L3"HTML" Nocase Noquote HTML_LANG Block Comment On = <!-- Block Comment Off = --> Block Comment On Alt = <% Block Comment Off Alt = %> String Chars = "' File Extensions = HTM HTML ASP SHTML HTT HTX JSP
    However, my PHP file is lacking any highlighting in the HTML portions of the document. I'm sure that it's probably my misunderstanding of the help files, or my misconfiguration somewhere. But if anyone can get this figured out so I can have two types of highlighting working in my PHP and ASP files I'd appreciate knowing how.

    Cheers,

    Mike

      Feb 08, 2005#2

      I think I have solved this. To syntax highlight dual-language files:

      1) Define HTML to take the file extensions of your PHP/ASP files:

      Code:

      /L3"HTML" Noquote HTML_LANG Block Comment On = <!-- Block Comment Off = --> Block Comment On Alt = <% Block Comment Off Alt = %> String Chars = "' File Extensions = HTM HTML SHTML HTT HTX JSP HELP ASP PHP
      2) Define VBS/PHP highlighting to NOT take them:

      Code:

      /L8"PHP" PHP_LANG Nocase Line Comment = // Line Comment Alt = # Block Comment On = /* Block Comment Off = */ Escape Char = \ String Chars = "' File Extensions = MODULE
      UltraEdit then appears to open the file as /L3 (HTML), see that it contains PHP_LANG code, and thus /also/ apply /L8 (PHP) to the PHP blocks of code.

      I don't think I've got it quite right, as my tags <?php and ?> in my /C2:

      Code:

      /C2"Tags"
      <?
      <?php
      ?>
      are not rendered in the tag colors defined by the PHP highlighting.

      I've mailed IDM for clarification.

        Feb 08, 2005#3

        I've got the same problem here. I do get PHP highlighting in HTML files if I start the PHP section with <?PHP. I don't get it if I use <?. I don't get HTML highlighting in PHP files, either.

        What seems strange to me is that I can't find any indication in the wordfile.txt that shows me how UE determines when "foreign" code sections start. So how does UE "know" that a certain section is in PHP or ASP or whatever? It must be defined someplace, but it doesn't seem to be wordfile.txt...

        Confused,
        Tim

          Feb 08, 2005#4

          Yes, it seems this only works one way.

          Files highlighted with the language marked HTML_LANG can contain blocks of code in the supported [other]_LANG formats. Within such a block, the formatter switches to the language defined with [other]_LANG until the end of that block.

          Blocks appear to be hard-coded within UltraEdit, so for PHP it's:

          Code:

          <?php
          // Highlighted as PHP
          ?>
          And for ASP it's:

          Code:

          <%
          %>
          So, if a file is a .HTML, then:

          Code:

          This is highlighted as <b>html</b>.
          <?php
          echo('this will highlight as PHP');
          ?>
          <?php echo('this will highlight as HTML'); ?>
          <%
          Response.Write "This will highlight as ASP"
          %>
          <% Response.Write "This will highlight as HTML" %>
          This makes the highlighting a bit too hit-and-miss when working with the horrible ASP I'm maintaining. Plus, the changes mean I can't go back to my 'old' wordfile, which defined the HTML tags inside each of the PHP, ASP, HTML, JSP and ... definitions, because those definitions need the "old style" HTML_LANG support.

          Plus ASP_LANG doesn't appear to have the code-folding definitions in it. Or at least, not the ones VB_LANG has.

          Bit of a trade off for me here, I love some of the new features, but I am finding it hard to work with the new syntax highlighting.

            Feb 09, 2005#5

            Here's what I have found:

            In the past I used a combined ASP-HTML-VB wordfile to combine everything I was doing (ASP/HTML web stuff). As this does not seem to work quite right in the new version, I started playing with the new way to do things.

            Although ASP_LANG is available to be used, nothing in the default wordfile seems to define it. So, for starters, I imported my ASP-HTML-VB block. With View As set to this type, syntax highlighting works as before, but there is no code folding.

            Next, I left the ASP extension assigned solely to the standard HTML definition in the file. ASP code isn't syntax highlighted at all, but HTML code is, and code folding works in the HTML portions.

            Then I defined my ASP-HTML-VB block as ASP_LANG in the wordfile. Now syntax highlighting works in the ASP portion, but there is no code folding.

            Finally, I defined some /Open Fold Strings and /Close Fold Strings in my ASP block. Now code folding works (for what I have defined), with a weird caveat; this won't work:

            Code:

            <%
            If True Then
              Response.Write "OK"
            Else
              Response.Write "NO"
            End If
            %>
            Oddly, you need something between the <% starting block and the first line for which you want to do code folding (in my case, I defined the IF as a start string), and it can't just be a blank line. A blank line with a space on it, however, works, as would a comment or some line of code with no code fold start strings. Seems to be a bug.

            However, my defined INDENT strings don't work in the ASP block; only the HTML indents are working.

            So, I'm getting closer. I can get code folding working in both blocks (HTML and ASP), although I have to define the ASP portion manually. Indent strings seem to be ignored for the ASP portion. I could define them in the HTML block, but then they would overwrite the default indents, which I don't want to lose and then have to redefine manually.

            It would really be nice to be able to define INDENT/FOLD/etc. strings that are INCREMENTAL in nature, so that they are added to the defaults, rather than replacing them.

            Anyone have more success than this?

              Feb 09, 2005#6

              I've been corresponding with support on this highlighting issue, plus some toolbar issues and a crash bug saving new files. They've re-created everything so far, and are fixing the toolbar and crash bug. Not sure where they are at with the syntax highlighting problems:
              IDM wrote:
              > I was under the impression that it could work like the PHP example in
              > the wordfile. i.e. the PHP definition contains no HTML markup, but
              > contains the PHP_LANG tag, then it picks up HTML highlighting from the
              > language definition (which is identified by being the language with
              > HTML_LANG defined).

              You did everything right. I will look into it. In the meantime, it appeared to me that the file was syntax highlighted correctly if I used View As with HTML.

              > Attached is a test.asp page which contains two ASP blocks and an HTML
              > block, with my wordfile, only the ASP blocks are highlighted. Rogue
              > words in the (original) html block were highlighted as they were ASP
              > keywords in the text of the html (e.g. <b>Response</b>, the "Response"
              > was highlighted as it's an ASP Keyword)

              Thanks, I have reproduced this with your information.

                Feb 17, 2005#7

                I have passed on a bunch of stuff to support, as well, and they have been able to duplicate it (mostly related to syntax highlighting with multiple languages, but a few other issues, such as the function list). This was last week, and they tell me they are working on it. The last email I received (Friday, 2/11) sounded like they were at least a week out on any fixes, though. Hopefully there will be a new hotfix or version upgrade soon.

                  Feb 18, 2005#8

                  None of these PHP+HTML highlighting problems seems to be fixed in the latest version using the patch.

                  It is strange that if you select Highlight as HTML - then everything entered after <?php is highlighted okay - but not with the short <?.

                  No matter what I try, it doesn't seem to work, although my old wordfile from UltraEdit v10.20 seems to be working fine.

                  Anybody had any luck with fixing this?

                    Mar 07, 2005#9

                    The best results have manifested since I removed

                    Code:

                    Block Comment On Alt = <% Block Comment Off Alt = %>
                    from my 11.00 default wordfile's L3 block, the HTML block.

                    This, in combination with viewing .ASP files as HTML, causes highlighting to behave more like it should with a multi-language file. :|

                      Apr 29, 2012#10

                      There is nothing to fix because this is an issue of definition. Multi-line string highlighting has been supported since UltraEdit v11.00. It is enabled by default for all syntax highlighting languages. In the first line of a syntax highlighting language definition (= first line of a wordfile nowadays), it is possible to use EnableMLS to explicitly enable, or DisableMLS to disable, multi-line string highlighting.

                      In C/C++, multi-line strings are possible by ending every line of a multi-line string with a backslash (the escape character), and therefore EnableMLS can be used. On the other hand, most C/C++ programmers never define a string over multiple lines, and therefore DisableMLS is often better.

                      Multi-line strings are very common in HTML files, and EnableMLS should therefore be used in html.uew. The standard php.uew contains EnableMLS.

                      It's the user's decision. Multi-line string highlighting makes syntax highlighting a little slower. The effect is most visible while entering a string: once the opening double or single quote is typed, everything up to the next double or single quote in the file is highlighted as a string until the closing quote is typed, too.

                        May 07, 2012#11

                        For multi-language syntax highlighting support in UE v15.00 and later, except the German special edition UE v15.20.1.1000 SE, you have to do the following:
                        • At Advanced - Configuration - Editor Display - Syntax Highlighting, specify a directory containing the *.uew files, each with one language. The default path is %APPDATA%\IDMComp\UltraEdit\wordfiles, with %APPDATA% referencing the value of the environment variable APPDATA. The directory must be displayed with its real full path. If you want to use the default wordfiles directory, but a single wordfile or a different directory is currently set, simply delete the entire string and close the configuration dialog with the OK button. The next time the configuration is opened, the path is set correctly to the default wordfiles directory.
                        • Each *.uew file should contain exactly one syntax highlighting definition. Open all the *.uew files in the configured directory with UltraEdit and verify that.
                        • Verify that there is only one *.uew file which contains language marker HTML_LANG (for HTML).
                        • Verify that there is only one *.uew file which contains language marker CSS_LANG (for CSS).
                        • Verify that there is only one *.uew file which contains language marker JSCRIPT_LANG (for JavaScript).
                        • Verify that there is only one *.uew file which contains language marker PHP_LANG.
                        • Verify that there is only one *.uew file which contains language marker ASP_LANG (for ASP using VBScript).
                        • The file extensions of files containing two or more of the above languages must all be listed at the end of the first line of the wordfile containing HTML_LANG.
                        Other wordfiles can share a language marker string; XML_LANG, for example, can be used in several wordfiles.

                        I don't know what ECMA_LANG is for because I have never seen it in any wordfile. I suppose that UltraEdit interprets this language marker like JSCRIPT_LANG and therefore it should exist also only in one wordfile.

                        Files with a file extension associated directly to a wordfile containing CSS_LANG, JSCRIPT_LANG, PHP_LANG or ASP_LANG should always contain only this language and never any other language embedded as multi-language syntax highlighting is enabled only for files associated with the wordfile containing HTML_LANG. For example if your *.php files contain usually only PHP code and no other language, you have to specify file extension PHP nevertheless in the wordfile containing HTML_LANG if some of your *.php files contain HTML sections. You should specify file extension PHP only in the wordfile containing PHP_LANG if ALL your PHP files contain only PHP code.
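
The uniqueness checks in the list above can be partly automated. The following Python sketch scans a wordfiles directory and reports any of the five single-use language markers that appears in the first line of more than one *.uew file. It assumes the wordfiles are readable as ASCII or UTF-8 text and that the marker sits on the first line, as in the definitions quoted in this thread. The function names are made up for illustration and are not part of UltraEdit.

```python
import re
from collections import defaultdict
from pathlib import Path

# Markers that, per the checklist above, should each appear in only one wordfile.
UNIQUE_MARKERS = ("HTML_LANG", "CSS_LANG", "JSCRIPT_LANG", "PHP_LANG", "ASP_LANG")


def first_line(path):
    """Return the first line of a wordfile (the language definition line)."""
    # Assumption: ASCII/UTF-8 compatible text; undecodable bytes are replaced.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.readline()


def find_marker_conflicts(wordfiles_dir):
    """Return {marker: [file names]} for markers used by more than one wordfile."""
    users = defaultdict(list)
    for path in sorted(Path(wordfiles_dir).glob("*.uew")):
        line = first_line(path)
        for marker in UNIQUE_MARKERS:
            # \b makes sure only the whole marker word counts as a match.
            if re.search(rf"\b{marker}\b", line):
                users[marker].append(path.name)
    return {marker: names for marker, names in users.items() if len(names) > 1}
```

Called with the configured directory, for example `find_marker_conflicts(os.path.expandvars(r"%APPDATA%\IDMComp\UltraEdit\wordfiles"))` on Windows, an empty result means that each of the five markers is used by at most one wordfile. The sketch does not check the file extension lists.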

                        PS: Multi-language syntax highlighting works in the same way for UltraEdit prior to v15.00 and for German UE v15.20.1.1000 SE, with the difference that a single wordfile instead of a wordfiles directory must be specified. That single wordfile contains up to 20 syntax highlighting languages, but just one language for HTML, one for CSS, one for JavaScript, one for PHP and one for ASP. A single wordfile is also supported by UE v15.00 and later for backward compatibility after an upgrade.

                          Feb 05, 2021#12

                          This makes no sense to me. How does UltraEdit determine which second language to use in an HTML file? It's not specified in any docs or discussion.

                          The comments above say UltraEdit sees <% and knows that's ASP. Well, it's not ASP. It can be any number of languages. My UltraEdit v24.20.0.44 interprets it as VBScript. That's not right either; in my case it's actually Python.

                          No matter what I try, UltraEdit won't recognize <% as anything other than ASP. I made a new file extension that I put in the file html.uew. I added <% as a language marker to python.uew. I removed <% from vbscript.uew. I restarted UltraEdit after each modification, but the syntax highlighting of the embedded Python code never changed.

                          How do I tell UltraEdit that <% in HTML means Python syntax?

                            Feb 05, 2021#13

                            For the languages supported by UltraEdit for syntax highlighting inside an HTML file, I recommend reading:
                            As far as I know, there is no support for other languages that can be embedded in an HTML template, i.e. a file processed first by an interpreter which produces an HTML file that no longer contains the proprietary tags and the code of the template processor.

                            <% as start tag and %> as end tag have been used for ASP since Microsoft introduced it in December 1996. Sun Microsystems used the same tags when it introduced JSP in 1999. Nowadays there are a lot of web template engines built on common languages that are also used for purposes other than processing an HTML template file. Many of them use <% as start tag and %> as end tag. Was it a good idea that so many web template engine designers chose <% and %> as tags? I don't think so.

                            UltraEdit cannot know which language is used between the tags <% and %>. It is built-in defined in UltraEdit that the language of the code between these two tags is VBScript ASP or more precise the syntax highlighting language having the language marker keyword ASP_LANG.

                            It is neither possible to extend the multi-language syntax highlighting of UltraEdit for HTML files nor to redefine the mapping of the built-in supported start/end language marking strings to another language somewhere.

                            So an UltraEdit user editing HTML templates with an embedded web template processor language that UltraEdit does not support built-in has the following options:
                            1. Use the language selector at the bottom in the status bar to overrule the automatic language selection and syntax highlight the entire file with the selected language. That is definitely not the best solution, but it is at least a more or less handy one.
                            2. Tweak the syntax highlighting wordfiles to get the code between <% and %> highlighted with a different language, for example by
                              • deleting the wordfile vbscript.uew (never used in this case), which contains by default the language marker ASP_LANG, from the wordfiles directory (default: %APPDATA%\IDMComp\UltraEdit\wordfiles), and editing another wordfile like java.uew by replacing JAVA_LANG with ASP_LANG in it to get the code syntax highlighted as Java code (more or less), or
                              • copying the wordfile python.uew, which contains by default the language marker PYTHON_LANG, to a new file name like python_html.uew in the wordfiles directory (default: %APPDATA%\IDMComp\UltraEdit\wordfiles), then editing python_html.uew: replace PYTHON_LANG with ASP_LANG, also add the keyword EnableCFByIndent to the first line, preferably remove everything after File Extensions= on that line to avoid conflicts with python.uew, and redefine the language name to Python HTML so that the status bar shows which of the two Python syntax highlighting languages is active. Finally, edit the wordfile vbscript.uew and remove the language marker string ASP_LANG.
                            3. Add color groups to the HTML wordfile with the keywords, function strings, etc. suitable for the other language used in the HTML template, which UltraEdit does not detect automatically at all.
                            It should be clear that none of these solutions is really good. HTML_LANG is a very important language marker keyword, as lots of features of UltraEdit are available only for files syntax highlighted with the language having this language marker. PYTHON_LANG is another very important language marker keyword, as lots of features were also added to the code of UltraEdit for properly editing Python code. HTML with Python code in one file is really the worst combination for the UltraEdit syntax highlighting because these two languages have nearly nothing in common and both have a very special syntax in comparison to hundreds of other languages.
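
The second variant of option 2 boils down to a few text replacements on the first line of the copied wordfile. Below is a minimal Python sketch of those replacements, which is as untested against UltraEdit as the suggestions themselves. The function names are invented for illustration, and the sketch assumes a first line of the form `/L<number>"<name>" ... File Extensions = ...` as in the definitions quoted earlier in this thread; real python.uew and vbscript.uew first lines may contain other keywords.

```python
import re


def retarget_first_line(line, old_marker, new_marker, new_name):
    """Rewrite a wordfile definition line for use as an embedded language.

    Hypothetical helper: swaps the language marker, renames the language,
    adds EnableCFByIndent and empties the File Extensions list.
    """
    line = line.rstrip("\r\n")
    # Keep only the part before "File Extensions =" so the copy does not
    # compete with the original wordfile for file extensions.
    match = re.search(r"File Extensions\s*=", line)
    head = line[: match.start()].rstrip() if match else line.rstrip()
    # Swap the language marker, e.g. PYTHON_LANG -> ASP_LANG.
    head = re.sub(rf"\b{re.escape(old_marker)}\b", new_marker, head)
    # The language name is the quoted string directly after /L<number>.
    head = re.sub(r'^(/L\d+)"[^"]*"', lambda m: f'{m.group(1)}"{new_name}"', head)
    # Add the indent-based code folding keyword unless already present.
    if "EnableCFByIndent" not in head.split():
        head += " EnableCFByIndent"
    return head + " File Extensions ="


def remove_marker(line, marker):
    """Remove a language marker keyword (for example ASP_LANG) from a line."""
    return re.sub(rf"\s*\b{re.escape(marker)}\b", "", line)
```

`retarget_first_line` would be applied to the first line of python_html.uew with `"PYTHON_LANG"`, `"ASP_LANG"` and `"Python HTML"`, and `remove_marker` to the first line of vbscript.uew with `"ASP_LANG"`. All other lines of both wordfiles stay unchanged.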

                            I hope I could nevertheless help with some ideas on how to get HTML templates with embedded Python code highlighted in a more or less useful way.

                            PS: I have never seen until now an HTML file with embedded Python code. Therefore I have no experience on tweaking syntax highlighting of UltraEdit for such files. The suggestions above are just ideas and not proposals based on tests.
                            Best regards from an UC/UE/UES for Windows user from Austria

                              Feb 06, 2021#14

                              Thanks Mofi. I've been developing webpages for 20 years, quite familiar with structure of HTML, CSS, JS, etc. What's new to me is using UltraEdit to edit these files. Never used ASP but have worked with PHP, JSP, and other templating systems.

                              You said:
                              UltraEdit cannot know which language is used between the tags <% and %>. It is built-in defined in UltraEdit that the language of the code between these two tags is VBScript ASP or more precise the syntax highlighting language having the language marker keyword ASP_LANG.
                              That's a bit incomplete. UltraEdit cannot know on its own which language is used. I fully expect that. The issue is, I'm trying to tell UltraEdit which language is used in my particular files. And it won't let me.

                              As you said, <% is used for many templating languages. So why in the world would UltraEdit hard code <% as ASP with no way to alter that behavior? That's like programming 101. They have this intricate .uew filesystem for redefining editor behavior, yet they hard code the most basic feature. I'm speechless. Maybe I should just go back to my vim editor.

                              Anyway, I appreciate your proposed solutions. I already tried #1 before my original post (manually changing the syntax selector on Coding menu) and it doesn't work. When I choose any other syntax, the selector snaps back to VBScript immediately.

                              #3 isn't a good solution. It will highlight all text in the HTML file that are also Python keywords, even outside of <% %>.

                              #2 sounds reasonable though. I can just blow away the content in vbscript.uew and replace it with the text in python.uew (except the ASP_LANG line). Results don't have to be perfect, as long as it stops marking the rest of the line as a comment every time I start a string (VB designers really deserve a special place in the abyss for making ' a comment marker).

                              I thought of #2 before posting, but was reluctant as it's an awfully ham-fisted way to do things. Expected something more elegant from UltraEdit than "<% is used for 20 languages but we hard coded it to ASP wheeeee!".

                              FYI there are many Python based templating systems out there. Don't know  overall popularity. But used by some major sites including reddit (mako), YouTube (cheetah), Pinterest and LinkedIn (jinja). I think perhaps Netflix as well but can't recall.

                              Thanks again. Have a nice weekend.

                                Feb 07, 2021#15

                                ed_6 wrote:
                                Feb 06, 2021
                                So why in the world would UltraEdit hard code <% as ASP with no way to alter that behavior?
                                I am not an employee of IDM Computer Solutions, Inc and therefore cannot answer that question. I can only share my suppositions.

                                Multi-language syntax highlighting for HTML was introduced with UltraEdit for Windows v11.00, released in January (or February) 2005. HTML 4.01, CSS 2.1 and ECMAScript Edition 3 were the newest HTML related standards in 2005. I don't know how many web template engines existed in 2004/2005 and which one was the most popular. I suppose the most popular was Active Server Pages by Microsoft. UE v11.00 was pretty small in comparison to UE v28.00. The files in the program files folder of UE v11.00 require just 10.75 MB of disk space. A lot was hard-coded then that is coded more flexibly nowadays, as computers were much slower 15 years ago.

                                In the last 15 years, lots of things have changed in web design. Static HTML pages have become rare (unfortunately, in my opinion, as many websites use server or client based active code although it would not really be necessary if the HTML and CSS files were well written). While researching the facts for my previous answer, I found out that there are many server based web template engines nowadays using multiple languages. I was really surprised that there are so many web template engines and that many use the very popular (as very good) Python as interpreter. Further, I found out to my surprise that many of the web template engines also use <% %> to mark the start and end of a block to be processed by the web template engine. I would not be surprised if the product management of UltraEdit at IDM Computer Solutions, Inc is also not aware of those facts, as I was not before 2021-02-05.

                                I have taken care of the UltraEdit forums since 2004. I can remember only about two requests regarding multi-language HTML syntax highlighting, both about Smarty - see Syntax highlighting wordfile for HTML with smarty keywords? and Syntax highlighting wordfile for Smarty files with individual brackets. Smarty uses PHP, which is not so problematic as PHP is supported built-in by UltraEdit for multi-language syntax highlighting of HTML files.

                                I agree with you that it would be a good idea nowadays in year 2021 to extend the multi-language syntax highlighting support of UltraEdit for HTML files with a customizable mapping of start/end tag to a syntax highlighting language after knowing all the facts found out two days ago. Hard-coding the start/end tags and the language mapping for performance reasons should not be necessary anymore nowadays.

                                I could imagine adding to the syntax highlighting wordfile with the language marker HTML_LANG, which is by default the file html.uew, one or more Language Map definitions with Perl regular expression search strings in a form like:

                                Code:

                                /LMBegin "PHP_LANG"
                                /LMFindStart = "<\?php|<\?"
                                /LMFindEnd = "\?>|\z"
                                /LMFindInCmts = no
                                /LMEnd
                                /LMBegin "PYTHON_LANG"
                                /LMFindStart = "<%"
                                /LMFindEnd = "%>"
                                /LMFindInCmts = no
                                /LMEnd
                                
                                UltraEdit should not only support those language markers currently supported as listed on help page Syntax Highlighting, but also new language marker strings added by a user to syntax highlighting wordfiles for being able to map start/end tags  in the HTML syntax highlighting wordfile to the appropriate language definitions in other wordfiles.

                                /LMFindInCmts = no/yes, with yes being the default, would control whether the Perl regular expression engine also searches for the start/end tags within block and line comments.
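
To make the idea more concrete, here is a small Python sketch of how such a language map could split a file into host and embedded sections. Nothing in it exists in UltraEdit: the /LM keywords are only a proposal, and the names LANGUAGE_MAP and split_by_language are invented for this illustration. The sketch ignores /LMFindInCmts and writes Perl's \z as \Z, which is the spelling Python's re module uses for the absolute end of the text.

```python
import re

# Hypothetical map mirroring the proposed /LM blocks: marker -> (start, end).
LANGUAGE_MAP = {
    "PHP_LANG": (r"<\?php|<\?", r"\?>|\Z"),
    "PYTHON_LANG": (r"<%", r"%>"),
}


def split_by_language(text, language_map=LANGUAGE_MAP, host="HTML_LANG"):
    """Split text into (marker, chunk) pairs using the start/end patterns."""
    starts = {m: re.compile(s) for m, (s, _) in language_map.items()}
    ends = {m: re.compile(e) for m, (_, e) in language_map.items()}
    segments = []
    pos = 0
    while pos < len(text):
        # Find the earliest start tag of any mapped language.
        best = None
        for marker, pattern in starts.items():
            hit = pattern.search(text, pos)
            if hit and (best is None or hit.start() < best[1].start()):
                best = (marker, hit)
        if best is None:
            # No further embedded block: the rest belongs to the host language.
            segments.append((host, text[pos:]))
            break
        marker, hit = best
        if hit.start() > pos:
            segments.append((host, text[pos:hit.start()]))
        # The embedded block includes its end tag; a block without an end
        # match runs to the end of the text.
        end_hit = ends[marker].search(text, hit.end())
        stop = end_hit.end() if end_hit else len(text)
        segments.append((marker, text[hit.start():stop]))
        pos = stop
    return segments
```

For the input `<b>hi</b><?php echo 1; ?><p><% x = 1 %>end`, the function yields an HTML_LANG chunk, a PHP_LANG chunk, another HTML_LANG chunk, a PYTHON_LANG chunk and a final HTML_LANG chunk. A highlighter built on such a map would then apply the wordfile carrying the matching language marker to each chunk.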

                                Such a customizable language mapping would be useful not only for HTML files. It could be also useful for other languages like C/C++ source files with Doxygen or PC-Lint/FlexeLint comments to highlight the Doxygen/PC-Lint keywords within the C/C++ block or line comments.

                                What do you (ed_6  and others) think about my idea for a customizable language mapping? Should I request that as an enhancement with an email to IDM support?