Script for generating Markdown TOC

3
Newbie

    May 12, 2020 #1

    Dear UE users,

    I just discovered UE and I am really surprised I had not come across it before; I am really loving it so far.

    I have a question. I work with Markdown files a lot and need to generate TOCs for the files that I work on. I am wondering if there is a script for this as I have not been able to find one.

    Regards,

    Igor

    6,686585
    Grand Master

      May 13, 2020 #2

      As far as I know, no such script has been written and posted in the UltraEdit scripts forum. You might find a JavaScript script on other websites that creates a table of contents from Markdown headings; it could most likely be adapted easily for use with UltraEdit.

      It would be no problem for me to write one from scratch, even without an existing JavaScript example for this task. Please post example contents of a Markdown file, ideally one that already contains a valid table of contents for its headings, so that I can see what the script should produce when the file has no TOC at all, where to insert it, and how to find and update an existing TOC. I know Markdown syntax very well, so you don't have to describe it.

      However, it will be a lot of work to determine whether ===, ----, or #, ##, ###, etc. really mark a heading. So please also let me know which heading style you prefer, so that I write just the code you need, and whether the script must take code blocks and quotes into account, in which, for example, ---- must be ignored (as it must be when ---- is used for a horizontal rule). Unfortunately, Markdown parsing is not trivial: simple finds cannot be used if the script has to support Markdown syntax completely and identify the headings in the current file correctly in every case. Coding an UltraEdit script from scratch that generates or updates a table of contents in a Markdown file would most likely take many hours if the script had to fully support Markdown syntax, including all the special cases in which the character sequences that usually mark a heading must be interpreted literally.
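To illustrate why simple finds are not enough, here is a minimal plain JavaScript sketch (not using the UltraEdit API) that classifies a line by looking at the line above it. It assumes simplified CommonMark-style rules and ignores code blocks, quotes and HTML; the function name classifyLine and its return values are hypothetical, chosen only for this illustration.

```javascript
// Hypothetical helper: classify one Markdown line, given the line above it.
// Simplified CommonMark-style rules; code blocks, quotes and HTML are ignored.
function classifyLine(prevLine, line) {
  // ATX heading: 1 to 6 hashes (after at most 3 spaces), then a blank or end of line.
  var atx = /^ {0,3}(#{1,6})(?:[\t ]+|$)/.exec(line);
  if (atx) return "heading" + atx[1].length;

  // Is the line above plain paragraph text (not blank, heading, quote or list item)?
  var prevIsText = typeof prevLine === "string" && /\S/.test(prevLine) &&
    !/^ {0,3}(?:#|>|[-*+](?:[\t ]|$))/.test(prevLine);

  // Setext underline: a line of only "=" (level 1) or only "-" (level 2)
  // directly below paragraph text.
  if (prevIsText && /^ {0,3}=+[\t ]*$/.test(line)) return "heading1";
  if (prevIsText && /^ {0,3}-+[\t ]*$/.test(line)) return "heading2";

  // Thematic break (horizontal rule): three or more "-", "*" or "_".
  if (/^ {0,3}(?:(?:-[\t ]*){3,}|(?:\*[\t ]*){3,}|(?:_[\t ]*){3,})$/.test(line)) {
    return "rule";
  }

  // Bullet list item: "-", "*" or "+" followed by a blank.
  if (/^ {0,3}[-*+][\t ]/.test(line)) return "listItem";

  return /\S/.test(line) ? "text" : "blank";
}
```

The same "---" is a level 2 heading below a text line but a horizontal rule below a blank line or a list item, which is exactly the context dependency that a single find cannot express.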

      On second thought, it would be easy to first remove all code blocks, quote blocks and horizontal rules to avoid false positives, and then parse just the remaining text for headings.
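A minimal plain JavaScript sketch of that idea, assuming fenced code blocks start and end with a line of three backticks or three tildes, quotes start with ">", and only # style headings need to be found (so horizontal rules need no special treatment here). The function name findAtxHeadings is hypothetical.

```javascript
// Hypothetical helper: remove fenced code blocks and quotes first, then
// collect "#" style (ATX) headings from the remaining text.
function findAtxHeadings(text) {
  var cleaned = text
    .replace(/\r\n?/g, "\n") // normalize DOS/Windows and old Mac line endings
    // Fenced code block: from a line starting with three backticks or tildes
    // up to the next line starting with the same fence characters.
    .replace(/^ {0,3}(`{3}|~{3})[\s\S]*?^ {0,3}\1.*$/gm, "")
    // Block quote lines starting with ">".
    .replace(/^ {0,3}>.*$/gm, "");

  var headings = [];
  // 1 to 6 hashes, a blank, the text, and an optional closing "#" sequence.
  var re = /^ {0,3}(#{1,6})[\t ]+(.+?)(?:[\t ]+#+)?[\t ]*$/gm;
  var match;
  while ((match = re.exec(cleaned)) !== null) {
    headings.push({ level: match[1].length, text: match[2] });
  }
  return headings;
}
```

Because the false positives are removed up front, the heading search itself can stay a single regular expression.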
      Best regards from an UC/UE/UES for Windows user from Austria

      3
      Newbie

        May 14, 2020 #3

        Hi, Mofi :) 

        Thank you so much for your reply.

        At the moment, all I am really looking for is a simple script that produces a TOC like the one below and has an update option (i.e. when I change the file and run the script again, it incorporates the changes):

        Code:

        <!-- toc -->
        
        - [Level 1 Heading](#level-1-heading)
        - [Level 2 Heading First](#level-2-heading-first)
        - [Level 2 Heading Second](#level-2-heading-second)
        - [Level 3 Heading First](#level-3-heading-first)
        - [Level 3 Heading Second](#level-3-heading-second)
        - [Level 3 Heading Third](#level-3-heading-third)
        - [Level 2 Heading Third](#level-2-heading-third)
        
        <!-- /toc -->
        
        
        # Level 1 Heading 
        
        Some Text
        
        ## Level 2 Heading First
        
        Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ante ipsum, volutpat quis massa sit amet, ullamcorper varius mauris. Mauris et vestibulum risus, et dictum erat. Nullam eget sapien sem. Maecenas varius eleifend vehicula. Fusce viverra vestibulum tellus sit amet molestie. Suspendisse maximus faucibus velit sed feugiat. Morbi non turpis molestie, facilisis purus sit amet, malesuada massa. 
        
        ## Level 2 Heading Second
        
        Donec luctus, massa non posuere tempor, dolor nulla dapibus purus, hendrerit tristique odio lorem a libero. Cras accumsan dolor vitae turpis viverra, nec venenatis est pulvinar. Ut quis porta tortor, sed finibus odio. Proin scelerisque purus est, ut aliquet felis porttitor nec. Pellentesque sed odio semper, pretium leo dignissim, convallis velit. Nullam dictum leo ac vulputate dapibus.
        
        ### Level 3 Heading First
        
        Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ante ipsum, volutpat quis massa sit amet, ullamcorper varius mauris. Mauris et vestibulum risus, et dictum erat. Nullam eget sapien sem. Maecenas varius eleifend vehicula. Fusce viverra vestibulum tellus sit amet molestie. 
        
        ### Level 3 Heading Second
        
        Suspendisse maximus faucibus velit sed feugiat. Morbi non turpis molestie, facilisis purus sit amet, malesuada massa. Donec luctus, massa non posuere tempor, dolor nulla dapibus purus, hendrerit tristique odio lorem a libero. Cras accumsan dolor vitae turpis viverra, nec venenatis est pulvinar. 
        
        ### Level 3 Heading Third
        
        Ut quis porta tortor, sed finibus odio. Proin scelerisque purus est, ut aliquet felis porttitor nec. Pellentesque sed odio semper, pretium leo dignissim, convallis velit. Nullam dictum leo ac vulputate dapibus. 
        
        ## Level 2 Heading Third
        
        Donec luctus, massa non posuere tempor, dolor nulla dapibus purus, hendrerit tristique odio lorem a libero. Cras accumsan dolor vitae turpis viverra, nec venenatis est pulvinar. Ut quis porta tortor, sed finibus odio. Proin scelerisque purus est, ut aliquet felis porttitor nec. Pellentesque sed odio semper, pretium leo dignissim, convallis velit. Nullam dictum leo ac vulputate dapibus.

        My writing is fairly straightforward: all headings are marked with # and can go many levels deep (i.e. ##, ###, ####, etc.), and each heading is always on a line of its own. Several headings can have the same name.
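The TOC format shown above links each entry to an anchor derived from the heading text. As a plain JavaScript sketch, assuming GitHub-style anchor rules (lower case, punctuation removed, each space turned into a hyphen, and -1, -2, ... appended to repeated headings) and ASCII-only heading text, the entries could be built like this. The names slugify and buildTocLines are hypothetical, and other Markdown renderers may generate anchors differently.

```javascript
// Hypothetical helper approximating GitHub-style heading anchors (ASCII only).
function slugify(text) {
  return text
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9 _-]/g, "") // drop punctuation and other characters
    .replace(/ /g, "-");          // each space becomes a hyphen
}

// Build flat TOC lines like "- [Heading](#heading)" from "#" style heading lines.
function buildTocLines(headingLines) {
  var seen = {}; // how often each base anchor was already used
  var toc = [];
  for (var i = 0; i < headingLines.length; i++) {
    var match = /^#{1,6}[\t ]+(.+?)[\t ]*$/.exec(headingLines[i]);
    if (!match) continue; // not a "#" style heading line
    var text = match[1];
    var base = slugify(text);
    var anchor = base;
    if (Object.prototype.hasOwnProperty.call(seen, base)) {
      seen[base] += 1;
      anchor = base + "-" + seen[base]; // 2nd occurrence gets -1, 3rd gets -2, ...
    } else {
      seen[base] = 0;
    }
    toc.push("- [" + text + "](#" + anchor + ")");
  }
  return toc;
}
```

The counter per base anchor is what keeps links unique when several headings share the same name.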

        I had looked into finding and adapting a JS script, but they all depend on Node.js. It looks like I will have to revive my own JS knowledge to write the script; I would not want you to spend hours of your time on this. I will also have a look at the scripting documentation to see how scripts work in UE.

        Thank you for looking into this.

        6,686585
        Grand Master

          May 24, 2020 #4

          Here is an UltraEdit script to create or update a table of contents for a Markdown file according to your specifications.

          Code:

          if (UltraEdit.document.length > 0) // Is any file opened?
          {
              // There must be at least one string in this array.
              // The strings can be all just "-" for a flat list.
              // See the comment about #{1,7} below on how to reduce the number
              // of heading levels taken into account for the table of contents.
              var asTocLevels = [
                  "-",                        // Heading 1 in TOC
                  "    +",                    // Heading 2 in TOC
                  "        *",                // Heading 3 in TOC
                  "            -",            // Heading 4 in TOC
                  "                +",        // Heading 5 in TOC
                  "                    *",    // Heading 6 in TOC
                  "                        -" // Heading 7 in TOC
              ];
          
              var sTocBlockStart = "<!-- toc -->";
              var sTocBlockEnd   = "<!-- /toc -->";
          
             // Define environment for this script.
             UltraEdit.insertMode();
             if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
             else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();
          
             // Get initial caret position on running this script.
             var nLineNumber = UltraEdit.activeDocument.currentLineNum;
             var nColumnNumber = UltraEdit.activeDocument.currentColumnNum;
          
             // Load entire file contents into memory of JavaScript core engine.
             UltraEdit.activeDocument.selectAll();
             if (UltraEdit.activeDocument.isSel())
             {
                var sText = UltraEdit.activeDocument.selection;
          
                // Remove all blocks with preformatted text which can contain
                // lines starting with one or more hash characters like headings.
                sText = sText.replace(/(^|\r?\n)[\t ]*```[\s\S]+?```[\t ]*(?:\r?\n|$)/g,"$1");
                sText = sText.replace(/[\t ]*<pre\b[\s\S]+?<\/pre>[\t ]*/g,"");
                sText = sText.replace(/[\t ]*<code\b[\s\S]+?<\/code>[\t ]*/g,"");
          
                // Remove all blocks commented out with an HTML comment.
                sText = sText.replace(/<!--[\s\S]+?-->/g,"");
          
                // Determine line termination type (DOS/Windows or UNIX).
                var sLineTerm = (UltraEdit.activeDocument.lineTerminator != 1) ? "\r\n" : "\n";
                var sTocList = "";
                var nHeadings = 0;
          
                // Search for headings and copy them into an array of headings. #{1,7}
                // in search expression results in searching for headings level 1 to 7.
                // The heading levels for table of contents can be reduced by reducing
                // the second number like using #{1,3} for just heading level 1 to 3.
                // An underline of = or - counts only if the entire line consists of
                // these characters, so a list item like "- item" is not a heading.
                var asHeadings = sText.match(/(^|\r?\n) {0,3}(?:#{1,7}[\t ]+[^\r\n]+|[^\r\n\t *+\->][^\r\n]*\r?\n {0,3}(?:=(?==*[\t ]*(?:\r?\n|$))|-(?=-*[\t ]*(?:\r?\n|$))))/g);
          
                // Is there at least one heading found?
                if (asHeadings)
                {
                   var nLowestLevel = asTocLevels.length;
                   nHeadings = asHeadings.length;
                   // Reformat all found headings to entries in table of contents.
                   for (var nHeading = 0; nHeading < nHeadings; nHeading++)
                   {
                      // Remove all carriage returns, line-feeds, horizontal tabs
                      // and normal spaces from the beginning of each found heading.
                      var sHeading = asHeadings[nHeading].replace(/^[\r\n\t ]+/,"");
          
                      // Convert heading 1 defined with one or more equal signs on
                      // next line to heading 1 defined with one hash at beginning.
                      // \r?\n supports DOS/Windows as well as UNIX line terminators.
                      sHeading = sHeading.replace(/(.+)\r?\n {0,3}=$/,"# $1");

                      // Convert heading 2 defined with one or more hyphens on next
                      // line to heading 2 defined with two hashes at beginning.
                      sHeading = sHeading.replace(/(.+)\r?\n {0,3}-$/,"## $1");
          
                      // Replace all tabs and spaces after the hash(es) by one space.
                      sHeading = sHeading.replace(/^(#+)[\t ]+/,"$1 ");
          
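                      // Remove all carriage returns, line-feeds, horizontal tabs
                      // and normal spaces from the end of each found heading.
                      sHeading = sHeading.replace(/[\r\n\t ]+$/,"");

                      // Remove an optional closing sequence of hashes like in "## Heading ##".
                      sHeading = sHeading.replace(/[\t ]+#+$/,"");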
          
                      // Convert heading into an entry in table of contents according to level.
                      var sLevel = " ";
                      for (var nLevel = 0; nLevel < asTocLevels.length; nLevel++)
                      {
                         sLevel = "#" + sLevel;
                         if (sHeading.substr(0,sLevel.length) == sLevel)
                         {
                            if (nLevel < nLowestLevel) nLowestLevel = nLevel;
                            asHeadings[nHeading] = asTocLevels[nLevel] + sHeading.substr(nLevel+1);
                            break;
                         }
                      }
          
                      // Is heading level higher than highest level in array asTocLevels?
                      if (nLevel == asTocLevels.length)
                      {
                         // Use the highest level defined in array asTocLevels for this heading.
                         asHeadings[nHeading] = sHeading.replace(/^#+/,asTocLevels[asTocLevels.length-1]);
                      }
                   }
          
                   // Is the lowest heading level not 1 because the file contains,
                   // for example, just headings 2 and higher and there is more
                   // than one listing level defined for table of contents?
                   if (nLowestLevel && (asTocLevels.length > 1))
                   {
                      // Does the second TOC listing level and most likely all others
                      // start with an indenting space for a hierarchical list?
                      if (asTocLevels[1].charAt(0) == ' ')
                      {
                         // Remove the leading spaces according to lowest level as
                         // otherwise the table of contents would be formatted as
                         // preformatted text which is not wanted for the TOC.
                         var nSpaceCount = nLowestLevel * 4;
                         for (nHeading = 0; nHeading < nHeadings; nHeading++)
                         {
                            asHeadings[nHeading] = asHeadings[nHeading].substr(nSpaceCount);
                         }
                      }
                   }
                   // Join all entries in table of contents together to one string.
                   sTocList = asHeadings.join(sLineTerm) + sLineTerm;
                }
          
                // Define the parameters for searching the active file for the TOC block.
                UltraEdit.perlReOn();
                UltraEdit.activeDocument.findReplace.mode=0;
                UltraEdit.activeDocument.findReplace.matchCase=false;
                UltraEdit.activeDocument.findReplace.matchWord=false;
                UltraEdit.activeDocument.findReplace.regExp=true;
                UltraEdit.activeDocument.findReplace.searchDown=true;
                UltraEdit.activeDocument.findReplace.searchInColumn=false;
                UltraEdit.activeDocument.top();
          
                // Does the active file contain the table of contents block?
                if (UltraEdit.activeDocument.findReplace.find(sTocBlockStart + "[\\s\\S]+?" + sTocBlockEnd))
                {
                   UltraEdit.activeDocument.findReplace.mode=1;
                   // Are there one or more lines of table of contents?
                   if (UltraEdit.activeDocument.findReplace.find("^(?:[^\\r\\n<].+\\r?\\n)+"))
                   {
                      // Is existing TOC not identical to newly generated TOC?
                      if (UltraEdit.activeDocument.selection != sTocList)
                      {
                         // Get number of lines in existing table of contents.
                         var nTocLineCount = UltraEdit.activeDocument.selection.match(/\r?\n/g).length;
                         // Delete the existing table of contents.
                         UltraEdit.activeDocument.deleteText();
                         // Get line number of first line in table of contents.
                         var nTocFirstLine = UltraEdit.activeDocument.currentLineNum;
                         // Write new table of contents into the file.
                         UltraEdit.activeDocument.write(sTocList);
          
                         // Was the initial caret position below table of contents?
                         if (nLineNumber >= (nTocFirstLine + nTocLineCount))
                         {
                            // Update line number according to number of
                            // added or removed lines in table of contents.
                            nLineNumber += nHeadings - nTocLineCount;
                         }
                         // Was the initial caret position inside table of contents?
                         else if (nLineNumber >= nTocFirstLine)
                         {
                            // Set caret to beginning of first line of the TOC.
                            nLineNumber = nTocFirstLine;
                            nColumnNumber = 1;
                         }
                      }
                   }
                   else if (sTocList.length)
                   {
                      // The table of contents block exists, but there are
                      // no lines to update, just lines to insert here.
                      var nTocLineCount = 0;
                      // Get number of lines in existing table of contents block.
                      var asLineEndings = UltraEdit.activeDocument.selection.match(/\r?\n/g);
                      if (asLineEndings) nTocLineCount = asLineEndings.length;
                      // Delete the entire existing table of contents block.
                      UltraEdit.activeDocument.deleteText();
                      // Get line number of first line of table of contents block.
                      var nTocFirstLine = UltraEdit.activeDocument.currentLineNum;
                      // Write the entire table of contents block into the file.
                      UltraEdit.activeDocument.write(sTocBlockStart+sLineTerm+sLineTerm+
                                                     sTocList+sLineTerm+sTocBlockEnd);
          
                      // Was the initial caret position below table of contents block?
                      if (nLineNumber >= (nTocFirstLine + nTocLineCount))
                      {
                         // Update line number according to number of added lines.
                         nLineNumber += (nHeadings + 3) - nTocLineCount;
                      }
                      else if (nLineNumber >= nTocFirstLine)
                      {
                         // Set caret to beginning of first line of the TOC.
                         nLineNumber = nTocFirstLine + 2;
                         nColumnNumber = 1;
                      }
                   }
                }
                else if (sTocList.length)
                {
                   // The table of contents block is completely missing. Insert
                   // it at top of the active file and update initial line number.
                   UltraEdit.activeDocument.write(sTocBlockStart+sLineTerm+sLineTerm+
                                          sTocList+sLineTerm+sTocBlockEnd+sLineTerm);
                   nLineNumber += nHeadings + 4;
                }
          
                // Move the caret back to the position it had before the table of
                // contents was updated or inserted, or to the beginning of the
                // first line of the table of contents.
                UltraEdit.activeDocument.gotoLine(nLineNumber,nColumnNumber);
             }
          }
          
          Several customizations can easily be applied without JavaScript coding skills, as explained in the comments at the top of the script.
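For example, to get a flat list like in the sample TOC posted above, every string in asTocLevels can simply be "-". The following stand-alone plain JavaScript sketch mirrors how the script maps the number of hashes of a heading to a string in asTocLevels, with headings deeper than the array using its last entry. It is for illustration only, leaves out the script's removal of leading indentation when there is no level 1 heading, and the function name tocEntry is hypothetical.

```javascript
// Hypothetical stand-alone helper mirroring how the script maps the number
// of hashes of a heading to a marker string in asTocLevels.
function tocEntry(sHeading, asTocLevels) {
  var match = /^(#+) (.*)$/.exec(sHeading); // expects "## Text" with one space
  if (!match) return null;
  // One hash uses index 0; headings deeper than the array use its last entry.
  var nLevel = Math.min(match[1].length, asTocLevels.length) - 1;
  return asTocLevels[nLevel] + " " + match[2];
}

var asHierarchical = ["-", "    +", "        *"]; // nested list, 4 spaces per level
var asFlat = ["-", "-", "-"];                      // every heading level as "-"
```

With asFlat, "### Sub" becomes "- Sub", while asHierarchical turns it into "        * Sub".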

          The script should find most variations of Markdown-formatted headings, but it does not support all of them. For example, the script intentionally does not support headings like:

          Code:

          **Bold formatted phrase** on heading level 1
          ===
          
          *Italic formatted phrase* on heading level 2
          ---
          
          HTML headings defined with <h1, <h2, ... are not supported by the script at the moment. Support for them could be added by modifying the regular expression that searches for headings in the file contents and by adding one or more sHeading.replace(...) lines that first convert HTML headings to Markdown syntax and then to the table of contents format.
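A minimal plain JavaScript sketch of such a conversion step, assuming each HTML heading opens and closes on the same line and any inline tags inside it can simply be dropped. The function name htmlHeadingsToMarkdown is hypothetical and not part of the script above; it would run on the file contents before the heading search.

```javascript
// Hypothetical pre-processing step: turn one-line <h1>...</h1> to <h6>...</h6>
// headings into "#" style headings so a Markdown heading search can find them.
function htmlHeadingsToMarkdown(text) {
  return text.replace(
    /^[\t ]*<h([1-6])\b[^>]*>(.*?)<\/h\1>[\t ]*$/gim,
    function (whole, level, inner) {
      // Drop inline tags like <em> inside the heading text.
      var plain = inner.replace(/<[^>]+>/g, "").trim();
      // new Array(n + 1).join("#") yields n hashes.
      return new Array(Number(level) + 1).join("#") + " " + plain;
    }
  );
}
```

Converting to Markdown syntax first keeps the rest of the TOC generation unchanged, as described above.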

          Please let me know if the script interprets lines as headings that the Markdown library you use does not interpret as headings. Unfortunately, there is no common standard for Markdown syntax; various websites use their own extensions of, or restrictions on, Markdown syntax.

          Please also let me know about headings that the script ignores although you use them regularly in your Markdown files and they are valid according to the syntax documented by the Markdown Guide.

          3
          Newbie

            May 26, 2020 #5

            Hi Mofi,

            Thank you so much for your effort. The script works well.