Function strings for PHP files working also with default values


    Apr 22, 2012#1

    You have downloaded the currently newest user-submitted wordfile from the IDM server, which dates from June 2009 according to its file date. The wordfiles on the wordfiles download page are all submitted by users, sent to IDM by email, and not checked in any way by IDM before being uploaded to their server. Every user is encouraged to enhance a wordfile, fix mistakes, and send the updated wordfile to IDM by email to replace the one on the server.

    The only wordfiles managed by IDM are those installed with UE/UES. These wordfiles, too, are mainly enhanced through suggestions users send to IDM support by email.

    I have attached to this post a ZIP file with the wordfile php.uew installed with UE v18.00.0.1034, in case you are not using the currently latest version of UE. Please note that the function strings in this wordfile work only with UE v16.00 and later, as they are defined for the hierarchical function list view, which is not supported by previous versions of UltraEdit.

      Apr 24, 2012#2

      Hi to all PHP programmers.

      I'm not using php.uew because I'm not coding in PHP, but I looked into php.uew because of the user question above. In my opinion, the 4 regular expressions for functions are not really well defined. Let's take a look at

      Code:

      /TGBegin "Function"
      /TGFindStr = "%[^t ]++function[^t ]+^([a-ÿ0-9_&]+^)"
      /TGFindStr = "%[^t ]++static[^t ]+function[^t ]+^([a-ÿ0-9_&]+^)"
      /TGFindStr = "%[^t ]++function[^t ]+^([~{]+^)"
      /TGFindStr = "%[^t ]++[publicrotecdvas ]++[^t ]++function[^t ]+^([a-ÿ0-9_&]+^)"
      /TGEnd
      respectively

      Code:

      /Function String = "%[^t ]++function[^t ]+^([a-ÿ0-9_&]+^)"
      /Function String 1 = "%[^t ]++static[^t ]+function[^t ]+^([a-ÿ0-9_&]+^)"
      /Function String 2 = "%[^t ]++function[^t ]+^([~{]+^)"
      /Function String 3 = "%[^t ]++[publicrotecdvas ]++[^t ]++function[^t ]+^([a-ÿ0-9_&]+^)"
      The first regular expression matches (though not the complete line) lines consisting of
      • optional leading spaces/tabs
      • the keyword function as the next string
      • at least 1 space or tab
      • and the function name consisting of the characters &, 0-9, A-Z (because the search is not case sensitive), _ and a-ÿ.
      Only the function name is tagged for display in the function list view. I suppose that an opening round bracket follows the function name, possibly with spaces/tabs in between. A space or ( is therefore the function name delimiter on the right side.


      The second regular expression matches nearly the same as the first one. The difference is that the keyword static must exist to the left of function.


      The third regular expression is very interesting. Like the first one, it matches lines starting with the keyword function. But instead of limiting the character set for the name of a function, it matches everything up to an opening brace, including the parameters of a function with or without initialization, comments on the same line as the function, and line breaks. In my opinion it matches too much. But I'm not a PHP programmer and therefore don't know if this expression makes sense with such a wide character set (= all characters except { ).

      Regardless of how useful the third expression is for PHP files, it always matches the same functions as the first regular expression, which does not make sense. Either the third regular expression should be removed, or the first one should be removed, or both should be replaced by a single different regular expression.

      In most programming languages it is exactly defined which characters the name of a function can consist of, and usually also which characters are allowed as the first character. PHP programmers should know this syntax definition, so it should be quite easy to define a regular expression which matches only the function name and nothing else. If somebody also wants to see the arguments (parameters) of a function, they should be matched by an extra character set expression.


      The fourth regular expression is also interesting. In the second [...] the character 'c' is present twice. The second occurrence of 'c' should be removed. But the remaining

      [^t ]++[publicrotedvas ]++[^t ]++

      is still interesting. Because the space character is in all three square bracket expressions, it should be no problem to combine these 3 character sets and just use

      [publicrotedvas^t ]++

      And because all characters of keyword static are also in this character set, the fourth regular expression matches the same as second regular expression and therefore the second regular expression can be removed.

      Finally, because of ++ (zero or more) instead of just + (one or more) after the character set with the letters, the fourth regular expression also matches what the first regular expression matches, so the first regular expression is not really needed.
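To make the overlap easier to see, here is a rough Python sketch. The patterns are my own hand translation of the UltraEdit expressions (% becomes ^, ^t becomes \t, ++ becomes *), so they are an assumption and not what UltraEdit executes internally. The sample lines are made up for illustration.

```python
import re

# Hand-translated approximations of the first, second and fourth
# UltraEdit expressions (assumption, not UltraEdit's own engine).
NAME = r"([A-Za-ÿ0-9_&]+)"
FIRST = re.compile(r"^[\t ]*function[\t ]+" + NAME, re.I | re.M)
SECOND = re.compile(r"^[\t ]*static[\t ]+function[\t ]+" + NAME, re.I | re.M)
# Fourth expression with the three character sets merged into one.
FOURTH = re.compile(r"^[publicrotedvas\t ]*function[\t ]+" + NAME, re.I | re.M)

# Hypothetical sample lines.
samples = [
    "function plain_one()",
    "    static function static_one()",
    "public static function both()",
    "protected function guarded()",
]

for line in samples:
    hits = [bool(rx.match(line)) for rx in (FIRST, SECOND, FOURTH)]
    print(line.strip(), hits)

# The character set is not a keyword list: any word built from its
# letters is accepted in front of "function" as well.
print(bool(FOURTH.match("scalpel function oops()")))
```

The merged fourth pattern accepts every sample line, including those the first and second patterns accept, which is the redundancy described above. The last line shows the downside of a plain character set: a word that is not a keyword at all still passes.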


      Again, I'm not a PHP programmer, but the 4 regular expressions currently used in php.uew could perhaps easily be replaced by a single expression. I offer to help if a PHP expert can tell me which characters a function name can consist of, or point me to the page in the PHP specification explaining it.

      Knowing which keywords can exist to the left of the keyword function, it might be better to define, instead of just 1 regular expression with [publicrotedvas^t ]++, multiple expressions matching only the really possible keywords and keyword combinations, to bring the number of wrong function detections as close to 0 as possible.

      The regular expressions for finding PHP functions should be based on definite syntax rules and not on regular expressions found by trial and error, as appears to have been done here.

      Please note that I have not executed any of the 4 regular expressions on a PHP file because I don't have one. So what I wrote above is what I think happens according to the expressions.


        Apr 24, 2012#3

        PHP Manual wrote:A valid function name starts with a letter or underscore, followed by any number of letters, numbers, or underscores. As a regular expression, it would be expressed thus: [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*.
        As we have a regular expression from PHP.net that defines all possible function names, it is easy to make one function definition string.

        The official definition means the current definition in your example is incorrect, since it allows a digit at the start of a function name, which is not allowed.
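As a quick check of the naming rule, here is a small Python sketch using the pattern quoted from the PHP manual. The helper name and the candidate names are made up for illustration. One assumption to be aware of: PHP applies the \x7f-\xff range to bytes, while Python applies it to code points, so the non-ASCII behaviour is only an approximation.

```python
import re

# Pattern quoted from the PHP manual; fullmatch() applies it to the whole name.
PHP_NAME = re.compile(r"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*")


def is_valid_php_function_name(name: str) -> bool:
    """Return True if name satisfies the PHP manual's naming rule."""
    return PHP_NAME.fullmatch(name) is not None


# Hypothetical candidates: two valid names, one starting with a digit,
# and one starting with & (accepted by the old wordfile character set).
for candidate in ["my_function", "_private1", "1st_function", "&by_ref"]:
    print(candidate, is_valid_php_function_name(candidate))
```

The old character set [a-ÿ0-9_&] would accept all four candidates, while the manual's rule rejects the last two.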

        Of course, we still need to provide for

        Code:

        function
        public static function
        static function
        public function
        protected function
        private function
        before the actual function name definition, which the 4th definition in your example seems to do.

        I could not help but notice that the Require and Include definitions in the PHP wordfile are also in error. However, include and require are not supposed to be in the function list, so I have removed those from my file, as they don't belong there.

        Here is what I have in my wordfile definition for Function definition:

        Code:

        /TGBegin "Function"
        /TGFindStr = "%[^t ]++[publicrotecdvas ]++[^t ]++function[^t ]+^([a-ÿ0-9_&]+^)"
        /TGBegin "Parameter"
        /TGFindStr = "[ ^t^p]++^([~,]+^)"
        /TGFindBStart = "("
        /TGFindBEnd = ")"
        /TGBegin "Variable"
        /TGFindStr = "%[ ^t]++^(^$[a-z_0-9]+^)[ ^t]++=*;"
        /TGFindBStart = "{"
        /TGFindBEnd = "}"
        /TGEnd
        /TGEnd
        /TGEnd
        Mofi, I welcome any corrections you see need to be made. I am using UEStudio 12, and have enabled Use Function Tips Data for Function List. When I disable this, the function list is populated with variables and parameters, which is undesirable. Mofi, can you help me fix this? I don't have a definition on hand on how to use the new /TG keywords properly. I'd like to read over something on that.


          Apr 26, 2012#4

          Many thanks darkdragon-001 for this EXCELLENT post describing the syntax rules for PHP functions.

          Regarding 1.)

          I think it is very uncommon that a function is defined on same line as end of a block or another statement. Although the PHP parser allows this because of eliminating whitespace characters outside of strings like most interpreters and compilers do during parsing, I'm quite sure that a programmer usually starts a function on a new line. In C/C++ it would be possible to write a whole program with several functions in one very long line, but that is not readable for any programmer and therefore not done most of the time.

          We have to take into account that usually functions within a line comment should not be listed in the function list. Also functions on same line as begin of a block comment should be ignored. Examples:

          Code:

          // function My_PHP_Function_01() {}
          
          /* function My_PHP_Function_02() {} */
          What can't be excluded are functions within a block comment where the block comment starts in one of the lines above like in this example.

          Code:

          /* Block comment starts here
          function My_PHP_Function_03() {}
          Block comment ends here. */
          UltraEdit is not a PHP parser and therefore does not evaluate in which context a string is found with the defined regular expressions.
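A small Python sketch of this limitation, using a simplified line-anchored pattern of my own (an assumption, not one of the wordfile expressions): only spaces and tabs may precede the keyword function, so the two one-line comments are skipped, but the function inside the multi-line block comment is still found.

```python
import re

# Simplified, assumed approximation of a line-anchored function string:
# only spaces/tabs may precede the keyword "function".
FUNC = re.compile(r"^[\t ]*function[\t ]+([A-Za-z_][A-Za-z0-9_]*)", re.M)

source = """\
// function My_PHP_Function_01() {}

/* function My_PHP_Function_02() {} */
/* Block comment starts here
function My_PHP_Function_03() {}
Block comment ends here. */
"""

# Prints ['My_PHP_Function_03']: the regex has no idea it is inside a comment.
print(FUNC.findall(source))
```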

          In first [...] expressions below it would be possible to include ;}{
          But I think for a common php.uew this should not be done.


          Regarding 2.) to 5.)

          The easiest method to ignore all these keywords would be to simply start the regular expression with the keyword function. But that would also find functions within comments, as the first example above shows. Also the word function could be problematic. So it is best to define which characters or words can be between the start of a line and the keyword function.

          I think nobody uses comments between the start of a function definition line and the round bracket after function name. This would be very hard to read which is the opposite effect usually wanted with comments - make something clear. So we can ignore comments left to opening round bracket.

          Now we have to make our first decision. Should we define simply a character set containing all the characters of the possible keywords or should we define multiple regular expressions with entire words?

          The UltraEdit regular expression is very limited regarding using an OR expression. Versions of UltraEdit not supporting hierarchical function list are limited to a maximum of 6 function string definitions. That would be a reason to use a simple character set. On the other hand with Perl regular expression engine it would be no problem to define a regular expression which allows those 6 keywords optionally in any combination.

          I have attached a small PHP file with just function definitions, which I used to test the following expressions.

          The 6 keywords which can be to the left of the keyword function can be summarized with the expression [abcdefilnoprstuv] or even shorter [a-filnopr-v]. The following UltraEdit regular expression could be used for finding the My_PHP_Functions 1 to 21 and displaying just the name of the functions in the function list view.

          /Function String = "%[a-filnopr-v ^t]++function[^t^p ]+^([a-z_-ÿ][a-z0-9_-ÿ]++^)[^t^p ]++("

          But this regular expression produces a wrong result for the most common function definition

          Code:

          function My_PHP_Function_01()
          {
          }
          as this is displayed in the function list view as y_PHP_Function_01(

          This is caused by the fact that all letters of the keyword function are also matched by the character set for the preceding keywords. If final were ignored, the function string definition could be

          /Function String = "%[^t a-eilopr-v]++function[^t^p ]+^([a-z_-ÿ][a-z0-9_-ÿ]++^)[^t^p ]++("

          This expression also matches My_PHP_Function_01 perfectly, but all function definitions containing the keyword final are now missing. I don't know how common the keyword final is, so I tried to find an expression that takes this keyword into account as well.

          Therefore it is better to use 2 regular expressions, one for functions without a preceding keyword and one more for those with keywords. If we list the possible preceding keywords as follows

          abstract
          final
          private
          protected
          public
          static


          we can see that each of the first 6 characters of the first keyword can only come from a very limited list of characters.

          /Function String = "%[^t ]++function[^t^p ]+^([a-z_-ÿ][a-z0-9_-ÿ]++^)[^t^p ]++("
          /Function String 1 = "%[^t ]++[afps][birtu][abinos][altv][aeilr][^t act][a-filnopr-v ^t]++function[^t^p ]+^([a-z_-ÿ][a-z0-9_-ÿ]++^)[^t^p ]++("


          Using these two expressions results in a perfect list for functions in file test.php5.
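The six leading character sets of Function String 1 can be checked position by position. The following Python sketch is my own translation of just that prefix (^t becomes \t); it confirms that the first six characters of every keyword fit, while the word function itself does not, which is what keeps the two expressions from overlapping.

```python
import re

# The six leading character sets of Function String 1 in Python syntax.
# Each set covers one character position of the first keyword.
PREFIX = re.compile(r"[afps][birtu][abinos][altv][aeilr][\t act]")

KEYWORDS = ["abstract", "final", "private", "protected", "public", "static"]

for word in KEYWORDS:
    # "final" has only five letters, so its sixth position is whitespace.
    first_six = (word + " ")[:6]
    print(word, bool(PREFIX.fullmatch(first_six)))

# "function" itself fails at the fourth position ('c' is not in [altv]).
print("function", bool(PREFIX.match("function")))
```

Note that the sets are still only character sets: other six-letter combinations built from them pass too, so this narrows the false positives rather than excluding them.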


          Now let's look at using a Perl regular expression for the function string definition. The following case sensitive Perl regexp would work for the functions in test.php5.

          /Regexp Type = Perl
          /Function String = "(?-i)^[\t ]*(?:(?:abstract|final|private|protected|public|static)[\t ]*){0,3}function\s+([A-Za-z_\x7F-\xFF][0-9A-Za-z_\x7F-\xFF]*)\s*\("
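For anyone who wants to try the Perl expression outside UltraEdit, here is a sketch in Python. Two assumptions: the leading (?-i) is dropped because Python's re module is case sensitive by default and does not accept that flag in global position, and the PHP source is a made-up sample, not the test.php5 from this thread.

```python
import re

# The Perl expression from above, minus the leading (?-i).
FUNCTION = re.compile(
    r"^[\t ]*(?:(?:abstract|final|private|protected|public|static)[\t ]*){0,3}"
    r"function\s+([A-Za-z_\x7F-\xFF][0-9A-Za-z_\x7F-\xFF]*)\s*\(",
    re.MULTILINE,
)

# Hypothetical PHP source for illustration only.
php_source = """\
function My_PHP_Function_01()
{
}
    final public static function My_PHP_Function_02 ($a)
abstract protected function
    My_PHP_Function_03();
// function Commented_Out() {}
"""

# Prints the three function names; the commented line is skipped.
print(FUNCTION.findall(php_source))
```

Because \s also matches line breaks, a function name on the line after the keyword function is found as well.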


          But we have to take into account that Perl regular expressions for function strings are not supported by UltraEdit prior to v13.10 and UES prior to v6.30. Also, some versions of UE/UES have problems with Perl regular expression function strings on Unicode files. PHP is often used in HTML files, which are often encoded in UTF-8. Therefore many users of older versions of UE/UES might not be happy with the Perl regular expression function string definition.

          I'll stop here for a first approach. It would now be good if you tested the UltraEdit function string set as well as the Perl function string on your PHP files. If there are problems like functions not found or lines misinterpreted as function definition lines, please add them to test.php5 and attach this file packed with ZIP to your reply.

          The next step would be to take the arguments into account for those users of UE/UES not using a hierarchical function list view because not supported by used version of UE/UES. On the other hand for those users of UE/UES already using a hierarchical function list view we have to take a look on the function parameters expression string. But one step after the other. First check the expressions as posted here for just find and display the function names.


            Apr 26, 2012#5

            It should be noted that if you comment out an entire function (as when testing) using block comments and not line comments, then the function will still be found. This also should be rare or a non-issue.

            I have a large number of PHP files, with many thousands of lines and hundreds of functions. I have not yet found functions that are not being displayed properly, nor am I having problems with functions displaying that should not.

            I've already updated my php.uew file with what you've provided.

            Good work on the function string so far. I still need the function parameters expression string.


              Apr 27, 2012#6

              @rhapdog
              IDM introduced with UE v16.00 and UES v10.00 the hierarchical function list with unlimited regular expressions. With such a version the function strings can be configured directly in UE/UES using a dialog. It is not necessary to open the wordfile and edit the regular expressions there. Just open a PHP file, move caret into a PHP section so that the wordfile for PHP becomes active, open the function list view if not already visible, right click into the view and click on context menu item Configuration.

              In the dialog the regular expressions can be configured in a tree. Multiple top level groups can be defined. In the standard php.uew there are the top level groups Require, Include, GlobalVariable and Function. As you can see, the function list view can provide more information than just the functions. The top level group Function contains a subgroup definition named Parameter listing the parameters of a function and a second one named Variable for local variables of a function. A subgroup can contain another subgroup definition to build up a real tree. So it would be possible to have a group for classes on top level, a function subgroup within the class group, and the 2 subgroups for function parameters and local variables within the function subgroup.

              The unlimited regular expressions of top level groups are always executed on entire file. The unlimited regular expressions of subgroups are always executed only on part of a file. The regular expressions of top level groups are usually UltraEdit regular expressions, but can be also Perl regular expressions if manually adding /Regexp Type = Perl to the wordfile. Unfortunately the regular expressions of subgroups are always executed with the UltraEdit regular expression engine. I have reported this issue to IDM support by email long time ago and I have not checked since this report if the IDM developers have changed something regarding this issue.

              On which part of a file the UltraEdit regular expressions of a subgroup are executed depends on the Open tag and Close tag. Both tags are also interpreted as UltraEdit regular expressions. For most programming languages that does not matter because the opening and closing tags are usually simple characters like { and }.

              The search for the opening and closing tag starts in the file at the position of first character of the entire string found by the top level regular expression. So this position is usually the start of the function definition line.

              The end position for the search for the tags is unfortunately not the start of the next function definition line, but end of file. I think the reason is caused by the fact that UE/UES is not first running the regular expressions of top level groups to build up internally a database with the information block 1 from line a to line b belongs to group 1, block 2 from line c to line d belongs group 2, block 3 from line e to line f belongs group 1, ... That would be good for files where a block ends where next block starts like INI files. Instead when a string is found with a top level regular expression, the search branches to the subgroups and therefore the searches for the opening and closing tags are executed from start of the found string to end of file because not knowing where is the next top level string found by a top level regular expression search.

              However, because the common programming languages for which the hierarchical function list view is mainly designed usually define a block beginning and block ending tag, the hierarchical search works for most programming languages when the nesting of opening and closing tags is taken into account. And that's what UE/UES does by searching for the closing tag really matching the opening tag. (But tags found in strings and comments are not ignored!)

              What I have found also is that UE/UES often does not like it if the expressions for open / close tag or the other regular expressions of a subgroup contain % to limit the search to start of a line. Sometimes there are no problems with %, but sometimes using % in subgroup regular expressions causes a wrong behavior of UE/UES, mainly the line number on which the found string exists is wrongly remembered. I have reported also this issue with some examples, but have not checked if that was fixed in the meantime because those definitions were made for others and not for myself and I always found an acceptable workaround.

              However, if a hierarchical function list is not wanted, right clicking into function list view and left clicking on context menu item Flat List changes the function list view to flat list as before UE v16.00 and UES v10.00.

              If some groups are not wanted like local variables, they can be simply removed using the configuration dialog.

              I wanted in the function list view always only the function names without the parameters. If a flat function list is preferred with showing also the function parameters, the regular expressions for finding the functions must be enhanced by adding expressions to continue selecting characters up to closing ) of the function and the ^) in the UltraEdit regular expressions respectively ) in the Perl regular expression must be moved to right.

              There are several problems with parameters.

              Strings can contain all characters including those having a special meaning outside a string like ), { or /. It is an unresolvable problem if a function definition line contains a string default value containing such characters as a regular expression search is not working like a compiler or interpreter taking the context into account.

              Another common problem is putting each parameter on a separate line with a line or block comment to the right of the parameter definition. The regular expression does not know that those characters on the line are out of scope for the compiler / interpreter and therefore should be ignored. With the hierarchical function list view it is sometimes possible to find expressions that ignore comments within the argument list of a function, but for a flat list this is not possible. To avoid such problems I always comment function parameters either above the function definition lines or, in case of public functions, in the *.h file, as those files are not searched for function strings.

              Weekend is coming and I will surely find time to look on what can be done for arguments lists respectively parameters of PHP functions.


                Apr 28, 2012#7

                Mofi wrote: @rhapdog
                IDM introduced with UE v16.00 and UES v10.00 the hierarchical function list with unlimited regular expressions. With such a version the function strings can be configured directly in UE/UES using a dialog. It is not necessary to open the wordfile and edit the regular expressions.....
                I know that. I believe I've received a full tutorial on how to do something I already know how to do. What I was needing was the actual expressions to put there in the dialog. I'm having trouble getting it right.


                  Apr 29, 2012#8

                  About functions within a block comment:

                  As I wrote, code in comments is scanned by the regular expressions as well. The function string search done in the background would simply become slower if the entire file first had to be scanned for block and line comments before the remaining parts are scanned by the regular expressions. In other words, users would have to wait longer before the functions are displayed in the function list view. Do we want that just to not see commented functions?

                  My C/C++ files contain many, many comments, but no commented functions. I do not want to wait longer to see the functions after opening a file or switching a file just because UltraEdit first runs a comment recognition scan. For my smaller projects I have also often enabled showing functions of all project files. Showing this list would take even longer with first scanning for comments.

                  Also, UltraEdit is a general text editor not only used for programming languages. Others, like me, use it for other text files too and have also created syntax highlighting languages with function strings for those files. Some of those files can be very large (several hundred MB or even GB). Scanning those files for comments first (which also exist in those files) and then running the regular expressions on the remaining text is a challenge for such large files and would result in a dramatic decrease in speed.

                  I know that C/C++ compilers remove all comments in the preprocessor step before really interpreting the code. But compilers usually do not have to deal with files of several 100 MB as UltraEdit has to do (not for source files, but for other file types). And nesting of block comments is also often possible, making it quite a bit more difficult (= more time consuming) to find comments and remove them before scanning for function names (or for whatever the function strings are defined). Sure, UltraEdit already finds comments, as they are highlighted as block and line comments, and therefore it would be possible with some enhancements to run the regular expression searches only on the remaining code.

                  But what if a user wants in the function list also a function within a block comment because the function is just temporarily commented?

                  In my opinion, functions which are not used should be commented out only temporarily, not permanently. If an entire function is of no use (anymore), it should be deleted completely from the file and not just commented out. For C/C++ programmers writing secure code this is highly recommended according to MISRA C and MISRA C++. Unused functions can be stored just as well in a separate file that is not part of the project as in a project file.

                  About parameters of PHP functions:

                  First, thanks for adding parameters to test file test.php5. It has made my work really much easier.

                  As a regular expression search is not working like a compiler or interpreter taking comments, strings or hierarchical information into account (= the context of text), function definitions with string parameters with string values containing 1 or more commas, parentheses, braces or other characters with special meaning outside a string are a huge problem. It is perhaps impossible to define regular expressions taking such string values into account. I have not even tried to find one.

                  In many programming languages it is not allowed to initialize a function parameter in the function definition line. In C it is not possible at all. In C++ it is possible to define functions with default values, but the default values are usually specified in the function declaration rather than in the function definition. PHP is different because default values for function parameters are specified within the function definition itself. Perhaps good for PHP programmers, but bad for correct detection of function parameters with regular expression searches.

                  Arrays within the argument list of a function, defined by matching ( ... ), are also a problem because of the parentheses and the additional commas. I don't know how often this occurs in PHP files. If this could be avoided by defining such functions differently (= using a different coding style), it should be done differently.


                  We have to take 2 different use cases into account.

                  The first one is usage of a hierarchical function list view with displaying the parameters of a function in a subgroup below the function name.

                  The advantage here is that UltraEdit searches in this case first for the opening tag ( and next for the matching closing tag ). Therefore a definition of 1 or more arrays with parentheses within the function definition is no problem. String values containing both parentheses are also no problem. Only a string value containing just ) is a problem, as this is interpreted as the matching closing tag. But I'm quite sure that a string value within a function definition containing only ) is very, very rare in PHP files. So it does not matter that the search for the closing tag returns the wrong result in this special case.
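The tag matching can be pictured with the following Python sketch. It is only my model of the behaviour described here (counting nested tags without skipping strings or comments), not UltraEdit's actual implementation, and the function names and sample definitions are made up.

```python
def find_matching_close(text: str, open_pos: int,
                        open_tag: str = "(", close_tag: str = ")") -> int:
    """Return the index of the close tag matching the open tag at open_pos.

    Nesting is counted, but strings and comments are NOT skipped, which
    models the behaviour described above. Returns -1 if nothing matches.
    """
    depth = 0
    for index in range(open_pos, len(text)):
        char = text[index]
        if char == open_tag:
            depth += 1
        elif char == close_tag:
            depth -= 1
            if depth == 0:
                return index
    return -1


def parameter_block(definition: str) -> str:
    """Return the text between the first '(' and its counted match."""
    start = definition.index("(")
    end = find_matching_close(definition, start)
    return definition[start + 1:end]


# Nested array parentheses and a string with both parentheses are fine.
print(parameter_block("function a($x = array(1, 2), $y = 3) {"))
print(parameter_block("function b($s = '(both)', $y = 3) {"))
# A string containing only ")" ends the block too early.
print(parameter_block("function c($s = ')', $y = 3) {"))
```

The third call returns only the text up to the quote character, which is the rare failure case mentioned above.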

                  Using the following function definitions results in a quite good hierarchical function list.

                  Code:

                  /TGBegin "Function"
                  /TGFindStr = "%[^t ]++function[^t^p ]+^([a-z_-ÿ][a-z0-9_-ÿ]++^)[^t^p ]++("
                  /TGFindStr = "%[^t ]++[afps][birtu][abinos][altv][aeilr][^t act][a-filnopr-v ^t]++function[^t^p ]+^([a-z_-ÿ][a-z0-9_-ÿ]++^)[^t^p ]++("
                  /TGBegin "Parameter"
                  /TGFindStr = "[^t ^p]++^([~,]+^)"
                  /TGFindBStart = "("
                  /TGFindBEnd = ")"
                  /TGEnd
                  /TGBegin "Variable"
                  /TGFindStr = "%[^t ]++^(^$[a-z_-ÿ][a-z0-9_-ÿ]++^)[^t ]++=*;"
                  /TGFindBStart = "{"
                  /TGFindBEnd = "}"
                  /TGEnd
                  /TGEnd
                  The only problems I can see are with those functions in test.php5 having 1 or more initialized arrays, as the array values are separated by commas just like the function parameters. Especially phpArguments_13 is a function with an argument list which I think can never be listed correctly in the function list view because of an initialized array containing 2 initialized arrays. And that's just the easiest possible array definition with more than 1 dimension. No regular expression search will ever be able to correctly detect initialized multi-dimensional arrays.

                  BTW: One ) was missing at function phpArguments_12 in supplied test.php5. I corrected this mistake, packed test.php5 and replaced test.zip in above post with this corrected version.

                  rhapdog, please note that I have also slightly changed the function string for local variables within a PHP function according to general naming rule of PHP.

                  However, what can we do to improve the function list view result for those functions with initialized arrays. Well, we can add additional regular expressions to subgroup Parameter like following:

                  Code:

                  /TGBegin "Parameter"
                  /TGFindStr = "[^t ^p]++^(^$[~,()]+(*)^)"
                  /TGFindBStart = "("
                  /TGFindBEnd = ")"
                  /TGFindStr = "[^t ^p]++^([~,]+^)"
                  /TGFindBStart = "("
                  /TGFindBEnd = ")"
                  /TGEnd
                  Now simple initialized arrays are displayed at least also complete in the function list view additional to the wrong parameter list. For example phpArguments_11 and phpArguments_12

                  Code:

                  phpArguments_11
                     Parameter
                        $foo = array(1,2,3,4)
                        $foo = array(1
                        2
                        3
                        4)
                  phpArguments_12
                     Parameter
                        $foo = array('test' => 5, 'hello', 'foo' => "hehe")
                        $foo = array('test' => 5
                        'hello'
                        'foo' => "hehe")
                  Unfortunately UltraEdit does not exclude strings already found within the block defined by the opening and matching closing tag. Doing so would prevent the second regular expression from finding those strings again.
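The double listing for phpArguments_11 can be reproduced with a rough Python translation of the two Parameter expressions. The translations are my own approximation (^p is treated as CR/LF characters, * as .*), and I assume the subgroup search only sees the text between the opening and closing tag, as the listing suggests.

```python
import re

# Assumed Python translations of the two Parameter expressions:
#   [^t ^p]++^(^$[~,()]+(*)^)  ->  [\t \r\n]*(\$[^,()]+\(.*\))
#   [^t ^p]++^([~,]+^)         ->  [\t \r\n]*([^,]+)
WITH_ARRAY = re.compile(r"[\t \r\n]*(\$[^,()]+\(.*\))")
GENERIC = re.compile(r"[\t \r\n]*([^,]+)")

# Text between the opening ( and matching ) of phpArguments_11,
# reconstructed from the listing above.
block = "$foo = array(1,2,3,4)"

# First expression: the complete parameter with its array.
print(WITH_ARRAY.findall(block))
# Second expression: the same text again, split at every comma.
print(GENERIC.findall(block))
```

Both expressions run independently over the same block, so the four comma-separated fragments appear in addition to the complete parameter.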

                  I will think about redefining the second expression to be less general so that it does not find strings already found by first regular expression for parameters with a simple initialized array. Perhaps with 2 or more regular expressions not finding parameters with an array we can avoid that garbage in the parameter list in function list view.
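Why a comma-based expression splits initialized arrays into garbage can be illustrated outside UltraEdit. The following is only a minimal Python sketch, not UltraEdit syntax: the helper names split_naive and split_depth_aware and the extra parameter $bar are made up for illustration. It compares splitting on every comma (what [~,]+ effectively does) with splitting only on commas outside of parentheses, which requires counting the nesting depth and is therefore beyond a plain regular expression.

```python
def split_naive(params: str) -> list[str]:
    """Split on every comma, like a general 'anything but comma' expression."""
    return [p.strip() for p in params.split(",") if p.strip()]


def split_depth_aware(params: str) -> list[str]:
    """Split only on commas that are not inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in params:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma at depth 0 separates two parameters.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


# The array is the one of phpArguments_13; $bar is a hypothetical extra parameter.
params = "$foo = array(array(1,2), array(3,4)), $bar"
print(split_naive(params))
print(split_depth_aware(params))
```

The naive split produces five fragments, the depth-aware split the two real parameters. Strings containing commas or parentheses are not handled by this sketch.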


                  The second use case is a flat list, which is the only list available in UltraEdit prior to v16.00. We do not have many options here, because a regular expression alone cannot search for the matching ). That needs additional code support like the one available for the hierarchical function list view.

                  I can only offer

                  Code: Select all

                  /Function String = "%[^t ]++function[^t^p ]+^([a-z_-ÿ][a-z0-9_-ÿ]++[^t^p ]++([~)]++)^)"
                  /Function String 1 = "%[^t ]++[afps][birtu][abinos][altv][aeilr][^t act][a-filnopr-v ^t]++function[^t^p ]+^([a-z_-ÿ][a-z0-9_-ÿ]++[^t^p ]++([~)]++)^)"
                  which stops matching characters after ( at the first ) found instead of at the really matching ), or

                  Code: Select all

                  /Function String = "%[^t ]++function[^t^p ]+^([a-z_-ÿ][a-z0-9_-ÿ]++[^t^p ]++([~{]+^)"
                  /Function String 1 = "%[^t ]++[afps][birtu][abinos][altv][aeilr][^t act][a-filnopr-v ^t]++function[^t^p ]+^([a-z_-ÿ][a-z0-9_-ÿ]++[^t^p ]++([~{]+^)"
                  which matches everything after ( up to the beginning of the function body. UltraEdit removes line terminating characters from the found string, resulting in a single line string for the function list view.
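The difference between the two flat list variants can be tried with a rough Python approximation. This is a sketch under assumptions: the patterns below are simplified Python translations, not the UltraEdit expressions themselves, and the function demo is a made-up example.

```python
import re

# Hypothetical single line PHP function definition.
php = "function demo($a, $b = array(1, 2), $c) {"

# Variant 1: stop at the first ")" found after "(".
first_paren = re.search(r"function\s+(\w+\s*\([^)]*\))", php).group(1)

# Variant 2: take everything after "(" up to the "{" opening the body.
up_to_brace = re.search(r"function\s+(\w+\s*\([^{]+)", php).group(1).rstrip()

print(first_paren)  # demo($a, $b = array(1, 2)
print(up_to_brace)  # demo($a, $b = array(1, 2), $c)
```

Variant 1 truncates the list at the ) closing the array, while variant 2 keeps the whole argument list but would also run past a { inside a default value.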

                    Apr 30, 2012#9

                    darkdragon-001 wrote:In PHP (or at least the CakePHP framework) it's very common to use few parameter and instead use an $options array like follows ... So I think the empty array shoulld be recognized correctly! Further, multidimensional array declarations as default values are very rare.
                    Well, that's good because an array with () is no problem. Only arrays with initialization are difficult to recognize correctly.

                    darkdragon-001 wrote:PHP has one great advantage towards other programming languages in terms of recognition:
                    All variables MUST start with a dollar sign $! I think we should use this somehow.
                    That was also my idea as I played with expressions this morning. I found expressions that work quite well: a very greedy one for variables with an initialized array like $foo = array('test' => 5, 'hello', 'foo' => "hehe") of phpArguments_12 or $foo = array(array(1,2), array(3,4)) of phpArguments_13, and other expressions for recognition of simple arguments.

                    But UltraEdit has some problems. For some PHP function arguments the expressions produce completely unexpected results in the function list view. Running an UltraEdit regular expression Replace All with the regular expression as written in php.uew as search string and »^1« as replace string, to see what is found and what is tagged, produces the correct result. I think the hierarchical function list feature has some memory problems.

                    It's not the first time that I see unexpected results for a regular expression which works fine when executed manually with the Find/Replace command. Such unexpected results were a pain in the neck in the past when I created hierarchical function list views for other forum members for less well-known text file formats. I reported those issues I could see to IDM support and I will do this here, too. But I fear that I won't be able to find regular expressions that work well for PHP functions with initialized arrays or strings containing () as long as the IDM developers do not look into the issues I can see and fix them.

                    For example I have as first regular expression for function parameters:

                    /TGFindStr = "[^t ^p]++^(^$[a-z_-ÿ][~,()]+(?++)[~,){^p]++^)"
                    /TGFindBStart = "("
                    /TGFindBEnd = ")"


                    This expression is very greedy because of (?++), which matches everything from the first ( to the last ) instead of to the first ) as (*) would do. A manually executed Find/Replace with this expression shows only function parameters with a string value containing () or arrays with initialized values. But in the function list view it additionally lists foo1, of function phpArguments_1. That is complete nonsense and I don't have any idea why foo1, is displayed with this expression. Something is definitely not working correctly within UltraEdit for subgroups of the hierarchical function list view.
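The greedy versus non-greedy behavior described above can be reproduced with Python's re module. This is only an analogy under the assumption that (?++) behaves like a greedy .+ and (*) like a non-greedy .*? between the parentheses; the argument list is a made-up example.

```python
import re

# Hypothetical argument list with two simple initialized arrays.
args = "$a = array(1,2), $b = array(3,4)"

greedy = re.search(r"\(.+\)", args).group()  # first "(" to LAST ")"
lazy = re.search(r"\(.*?\)", args).group()   # first "(" to FIRST ")"

print(greedy)  # (1,2), $b = array(3,4)
print(lazy)    # (1,2)
```

The greedy form keeps a nested array together but also swallows a following parameter, which is why it can only be one expression among several.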

                    darkdragon-001 wrote:For many users it would be good at least match the parameters without default declarations correctly and omit the default values completely (it also takes less space to display -> this is something to take into account for the flat lists where the parameters are not listed one under each other).
                    Hm, that makes sense and would mean using a completely different strategy. In this case we can really scan the argument lists for variables starting with $ or references to variables starting with & $ or &$. For the special exceptions OtherClass $otherclass, array $input_array and callable $callback, appropriate expressions can surely be found as well. Maybe I will concentrate on this approach while I'm waiting for IDM's answer on the wrong results in the function list view, as it can take a long time before the issues are fixed and we can really look for expressions that take initialization values into account.

                    darkdragon-001 wrote:Is it possible to set /TGFindBEnd = ")[^t ]{"?
                    Good idea, but unfortunately it does not work and I don't know why. /TGFindBEnd = ")[^t ^p]++{" does not work either. The result is worse than with just /TGFindBEnd = ")", which works for all examples in test.php5. However, it does not work for something like

                    Code: Select all

                    function specialExplode2($string, $delim = ")", $lineTerm) {
                    }
                    as UltraEdit interprets the single ) within the string as the matching ) for the starting (. Because a string value can contain anything, including a complete function definition, there will surely never be an expression which can handle all possibilities as long as UltraEdit does not completely ignore strings (and comments) when scanning for the matching ), like a PHP interpreter does. But I'm quite sure that even with this limitation 99.999 % of all argument lists in PHP functions are recognized correctly by UltraEdit.
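What "ignoring strings on scan for the matching )" would mean can be sketched in a few lines of Python. This is not how UltraEdit works internally; the function find_matching_paren is a hypothetical helper which skips single and double quoted strings and does not handle comments.

```python
def find_matching_paren(text: str, start: int) -> int:
    """Return the index of the ')' matching the '(' at text[start], or -1.

    Parentheses inside quoted strings are ignored.
    """
    depth = 0
    quote = None  # quote character of the string being scanned, if any
    i = start
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1


line = 'function specialExplode2($string, $delim = ")", $lineTerm) {'
start = line.index("(")
naive_end = line.index(")", start)          # first ")" found, inside the string
real_end = find_matching_paren(line, start)  # the really matching ")"

print(line[start + 1:naive_end])
print(line[start + 1:real_end])
```

The naive search ends inside the string value of $delim, while the string-aware scan returns the complete argument list.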

                    darkdragon-001 wrote:The flat lists you provided do no longer match functions My_PHP_Function_04() to My_PHP_Function_21() -> those with keywords before function.
                    Yes, my fault. I had not tested the regular expressions yesterday before posting. I had forgotten to remove ^) within second regular expression in both variants. I have just corrected those mistakes in my previous post.


                      May 28, 2012#10

                      Amazing thread!!! Hours for me to try and understand!

                      I am having some difficulty using Mofi's Function and Parameter strings:
                      Mofi wrote:About functions within a block comment:
                      Using the following function definitions results in a quite good hierarchical function list.

                      Code: Select all

                      ...
                      /TGBegin "Function"
                      /TGFindStr = "%[^t ]++function[^t^p ]+^([a-z_-ÿ][a-z0-9_-ÿ]++^)[^t^p ]++("
                      /TGFindStr = "%[^t ]++[afps][birtu][abinos][altv][aeilr][^t act][a-filnopr-v ^t]++function[^t^p ]+^([a-z_-ÿ][a-z0-9_-ÿ]++^)[^t^p ]++("
                      /TGBegin "Parameter"
                      /TGFindStr = "[^t ^p]++^([~,]+^)"
                      /TGFindBStart = "("
                      /TGFindBEnd = ")"
                      /TGEnd
                      /TGBegin "Variable"
                      /TGFindStr = "%[^t ]++^(^$[a-z_-ÿ][a-z0-9_-ÿ]++^)[^t ]++=*;"
                      /TGFindBStart = "{"
                      /TGFindBEnd = "}"
                      /TGEnd
                      /TGEnd
                      ...
                      However, what can we do to improve the function list view result for those functions with initialized arrays. Well, we can add additional regular expressions to subgroup Parameter like following:

                      Code: Select all

                      /TGBegin "Parameter"
                      /TGFindStr = "[^t ^p]++^(^$[~,()]+(*)^)"
                      /TGFindBStart = "("
                      /TGFindBEnd = ")"
                      /TGFindStr = "[^t ^p]++^([~,]+^)"
                      /TGFindBStart = "("
                      /TGFindBEnd = ")"
                      /TGEnd
                      Now simple initialized arrays are displayed at least also complete in the function list view additional to the wrong parameter list. For example phpArguments_11 and phpArguments_12

                      Code: Select all

                      phpArguments_11
                         Parameter
                            $foo = array(1,2,3,4)
                            $foo = array(1
                            2
                            3
                            4)
                      phpArguments_12
                         Parameter
                            $foo = array('test' => 5, 'hello', 'foo' => "hehe")
                            $foo = array('test' => 5
                            'hello'
                            'foo' => "hehe")
                      When I select "Flat List", I find that the parameters are listed BEFORE the function:

                      Code: Select all

                      $foo = array(1,2,3,4)
                      $foo = array(1
                      2
                      3
                      4)
                      phpArguments_11
                      $foo = array('test' => 5, 'hello', 'foo' => "hehe")
                      $foo = array('test' => 5
                      'hello'
                      'foo' => "hehe")
                      phpArguments_12
                      Did I do something wrong on my end or is this what I should expect from the Function and Parameter strings for a Flat List?


                      On another related matter: It seems that I can't use Perl for top group if I am going to have a subgroup... I suspect that adding "/Regexp Type = Perl" to the wordfile prevents the subgroup UE regex from being processed correctly... and per Mofi's statement below, subgroup regex needs to be UE regex...

                      Mofi -- any update on being able to use Perl regex for subgroups?
                      Mofi wrote:@rhapdog
                      ...
                      The regular expressions of top level groups are usually UltraEdit regular expressions, but can be also Perl regular expressions if manually adding /Regexp Type = Perl to the wordfile. Unfortunately the regular expressions of subgroups are always executed with the UltraEdit regular expression engine. I have reported this issue to IDM support by email long time ago and I have not checked since this report if the IDM developers have changed something regarding this issue.
                      Thanks to all who spent so much time and effort to this thread.

                      And is there any way to Expand All in the hierarchical Function List so I don't have to manually expand (and I don't want a Flat List)?

                      With Regards-
                      Sam


                        May 28, 2012#11

                        Turning on the Flat List option while a hierarchical function list definition is used is in general not a good idea. The result is most often not usable, even with the Sort List option turned off. I have never done this before as it does not make sense to me. It looks like all strings found on the same line are always sorted according to the ASCII/ANSI table, even with the Sort List option turned off. That would explain why the parameters of a function, which start with the character $, are listed in the unsorted flat list above the name of the function found on the same line, which starts with a lower case letter.
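The character order behind this explanation is easy to verify. The Python lines below only illustrate the code point order of $ and lower case letters; they say nothing about how UltraEdit really sorts internally.

```python
# Two strings found on the same line of the PHP file.
found_on_line = ["phpArguments_11", "$foo = array(1,2,3,4)"]

# Sorting by character code puts "$" (36) before "p" (112).
ordered = sorted(found_on_line)

print(ord("$"), ord("p"))
print(ordered)
```
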


                        I have no updated information about Perl regular expression usage in hierarchical function string definitions. As you have already noticed, it looks like using /Regexp Type = Perl in a wordfile containing function strings for subgroups results in a completely non-working function list search. With release candidate 1 of UltraEdit v18.10 I get a useful function list result neither with Perl regular expressions used for all search strings, nor with Perl regular expressions used only for top level search strings and UltraEdit syntax for all other search strings. I think in some previous versions of UltraEdit at least the combination of Perl regular expressions for top level search strings and UltraEdit regular expressions for subgroups worked, but I'm not really sure. Currently I can only recommend not using Perl regular expressions when subgroups are defined in the wordfile, too.


                        Unfortunately there is no command to Expand All in a hierarchical function list view. Also updating the function list by executing Search - Function List results always in getting the updated function list view collapsed again except for top level. This behavior is usually good for really working with the function list, but is a very annoying behavior when experimenting with the function string regular expressions. I also don't know of any configuration setting which results in an always completely expanded function list view.


                        I have not posted any update regarding the project to find good function strings for a hierarchical function list view. The reason is that I was not able to find something useful because of many unexpected and unexplainable results with my approaches. I have reported in detail to IDM support what I have seen and asked 3 questions about why UltraEdit produces those totally unexpected results. But the IDM developers have not yet found time to look into these problems and answer my questions, as they worked on UltraEdit v18.10 this month. I will follow up on the questions as soon as UE v18.10 is released (most likely this week). But I think the IDM developers have to do a lot of maintenance work on the hierarchical function list feature and fix all the problems I have reported on this feature in the past before we can get a really well working hierarchical function list view for PHP code.

                        If somebody has much time and is really interested in the work on a hierarchical function list for PHP, the attached RAR archive contains my emails sent to IDM support and the replies. The RAR archive is encrypted and password protected to prevent search engines from indexing the files inside. The archive password is the name of the file.
                        php_function_string_mails.rar (4.72 KiB)   265
                        Test files and emails exchanged with IDM support regarding function strings for PHP.


                          May 28, 2012#12

                          Thanks Mofi -- and I appreciate you sharing the useful info / test files re: your correspondence with IDM


                            Dec 04, 2013#13

                            Hello PHP programmers!

                            I'm re-activating this topic as some things have changed in the meantime. When this topic was created, UltraEdit supported Perl regular expressions only for the top level of a hierarchical function list. So for the parameters (arguments) of a PHP function or for variables within the body of a PHP function only the UltraEdit regular expression engine was available.

                            But UltraEdit v19.00 and UEStudio v13.00 introduced not just an updated Perl regular expression engine with more features than before; the support for Perl regular expression function strings was enhanced, too. The Perl regular expression engine is now used for all regular expression strings in the function definition block of a syntax highlighting wordfile containing the line /Regexp Type = Perl. My knowledge about Perl regular expressions has also increased in the meantime. And finally, IDM decided to switch to the Perl regular expression engine for all function string definitions in the standard wordfiles installed with UltraEdit.

                            Therefore I looked at the file php.uew installed with UE v20.00.0.1054, at the excellent test file created by forum member darkdragon-001 which was further extended by me, and at all the explanations given by darkdragon-001 and in the referenced documentation he provided, as well as the other contributions on this topic.

                            The formerly attached ZIP file contained
                            • the enhanced PHP function list test file, and
                            • the wordfile php.uew which I want to suggest to IDM as the standard PHP wordfile for future versions of UE/UES.
                            The wordfile php.uew is from UE v20.00.0.1054, but with a modified and hopefully enhanced function definitions block:

                            Code: Select all

                            /Regexp Type = Perl
                            /TGBegin "Requires"
                            /TGFindStr = "^[\t ]*require(?:_once)*[\t "']+([0-9a-z$&*\-./:=?\\^_]+)["']*;"
                            /TGEnd
                            /TGBegin "Includes"
                            /TGFindStr = "^[\t ]*include(?:_once)*[\t "']+([0-9a-z$&*\-./:=?\\^_]+)["']*;"
                            /TGEnd
                            /TGBegin "Functions"
                            /TGFindStr = "^[\t ]*(?:(?:abstract|final|private|protected|public|static)[\t ]+)*function[\t ]+([a-z_\x7f-\xff][0-9a-z_\x7f-\xff]*)"
                            /TGBegin "Parameters"
                            /TGFindStr = "[\t ]*(\$[a-z_\x7f-\xff][0-9a-z_\x7f-\xff]*[\t ]*=[\t ]*(?:array[\t ]*\((?:[^()]*|(?:[\t ]*array[\t ]*\(.*?\))+[\t ]*)\)|".*?"(?<!\\")|'.*?'(?<!\\'))|[^,]+)"
                            /TGFindBStart = "\("
                            /TGFindBEnd = "\)\s*(?:\{|#|//|/\*)"
                            /TGEnd
                            /TGBegin "Variables"
                            /TGFindStr = "^[\t ]*\$([a-z_\x7f-\xff][0-9a-z_\x7f-\xff]*)[\t ]*="
                            /TGFindBStart = "\{"
                            /TGFindBEnd = "\}"
                            /TGEnd
                            /TGEnd
                            I don't know what the groups Requires and Includes are for and what should be displayed for those groups. So I have just simplified those 2 regular expressions copied from standard php.uew as installed with UE v20.00.0.1054.

                            The single Perl regular expression for the group Functions is my invention based on the test file provided by darkdragon-001. With this regular expression all functions in the test file are found and just their names are displayed at top level.

                            For subgroup Parameters, which lists the arguments of a function with their initial values, I found a Perl regular expression that works really well but is very complicated. I will explain it later once it has proven to work well on lots of PHP files.

                            For subgroup Variables, which lists the variables defined in the body of a function, I just modified and partly simplified the expression from php.uew as installed with UE v20.00.0.1054. I do not have examples for this subgroup. So I can only hope it does not produce too many false positives.

                            The most difficult things for correctly listing the parameters of a function with their initial values are strings containing a comma or a closing parenthesis, and arrays. As you can see, I found a solution at least for those variations listed in the test file. But PHP writers should understand that a Perl regular expression search is not a PHP interpreter. The Perl regular expression engine has no knowledge of the PHP language. So it is always possible to construct a string or an array on which even the complex Perl regular expression fails. My goal was to find an expression which works for 99.5% of all PHP functions with initializers. 100% would only be possible if the PHP interpreter itself produced the function list.

                            I want to offer additionally two more variants.

                            The first one is also for a hierarchical function list view. But in contrast to the one above, it shows just the names of the function parameters without their context, i.e. without type hints and initial values.

                            Code: Select all

                            /Regexp Type = Perl
                            /TGBegin "Requires"
                            /TGFindStr = "^[\t ]*require(?:_once)*[\t "']+([0-9a-z$&*\-./:=?\\^_]+)["']*;"
                            /TGEnd
                            /TGBegin "Includes"
                            /TGFindStr = "^[\t ]*include(?:_once)*[\t "']+([0-9a-z$&*\-./:=?\\^_]+)["']*;"
                            /TGEnd
                            /TGBegin "Functions"
                            /TGFindStr = "^[\t ]*(?:(?:abstract|final|private|protected|public|static)[\t ]+)*function[\t ]+([a-z_\x7f-\xff][0-9a-z_\x7f-\xff]*)"
                            /TGBegin "Parameters"
                            /TGFindStr = "\$([a-z_\x7f-\xff][0-9a-z_\x7f-\xff]*)"
                            /TGFindBStart = "\("
                            /TGFindBEnd = "\)\s*(?:\{|#|//|/\*)"
                            /TGEnd
                            /TGBegin "Variables"
                            /TGFindStr = "^[\t ]*\$([a-z_\x7f-\xff][0-9a-z_\x7f-\xff]*)[\t ]*="
                            /TGFindBStart = "\{"
                            /TGFindBEnd = "\}"
                            /TGEnd
                            /TGEnd
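The Functions expression and the name-only Parameters expression above can be tried outside UltraEdit with Python's re module. This is only a sketch under assumptions: the function list search is assumed to be case-insensitive, Python's engine is not the Perl regular expression engine of UltraEdit, the block matching with /TGFindBStart and /TGFindBEnd is replaced by simply scanning the rest of the line, and exampleMethod is a made-up function.

```python
import re

# Copied from the Functions and Parameters strings of the wordfile.
FUNCTION = re.compile(
    r"^[\t ]*(?:(?:abstract|final|private|protected|public|static)[\t ]+)*"
    r"function[\t ]+([a-z_\x7f-\xff][0-9a-z_\x7f-\xff]*)",
    re.IGNORECASE | re.MULTILINE,
)
PARAM_NAME = re.compile(r"\$([a-z_\x7f-\xff][0-9a-z_\x7f-\xff]*)", re.IGNORECASE)

php = """\
function phpArguments_12($foo = array('test' => 5, 'hello', 'foo' => "hehe")) {
}
    public static function exampleMethod(&$first, OtherClass $otherclass) {
}
"""

results = []
for match in FUNCTION.finditer(php):
    # Simplification: treat the rest of the line as the parameter block.
    line_end = php.index("\n", match.end())
    names = PARAM_NAME.findall(php[match.end():line_end])
    results.append((match.group(1), names))

for name, names in results:
    print(name, names)
```

Note that a default string value containing something like $x would also be listed as a parameter name by this simple expression.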
                            And the second variant is for those PHP programmers who prefer a flat function list. The subgroups Parameters and Variables are removed; instead the function name is displayed in the function list view together with the parameters in a single line, as defined in the PHP file.

                            Code: Select all

                            /Regexp Type = Perl
                            /TGBegin "Requires"
                            /TGFindStr = "^[\t ]*require(?:_once)*[\t "']+([0-9a-z$&*\-./:=?\\^_]+)["']*;"
                            /TGEnd
                            /TGBegin "Includes"
                            /TGFindStr = "^[\t ]*include(?:_once)*[\t "']+([0-9a-z$&*\-./:=?\\^_]+)["']*;"
                            /TGEnd
                            /TGBegin "Functions"
                            /TGFindStr = "^[\t ]*(?:(?:abstract|final|private|protected|public|static)[\t ]+)*function[\t ]+([a-z_\x7f-\xff][0-9a-z_\x7f-\xff]*[\t ]*\(.*?\))\s*(?:\{|#|//|/\*)"
                            /TGEnd
                            The regular expression for function names plus parameters is designed for single line function definitions. This could be changed if it is common in PHP files for the parameters of a function to be spread over multiple lines. The search for functions is faster with this flat list variant.
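The single line limitation can be seen when the flat list expression is run with Python's re module. Again this is only a sketch: case-insensitive matching is an assumption, Python's engine stands in for the Perl engine of UltraEdit, and multiLine is a made-up function.

```python
import re

# Copied from the flat list Functions string of the wordfile.
FLAT = re.compile(
    r"^[\t ]*(?:(?:abstract|final|private|protected|public|static)[\t ]+)*"
    r"function[\t ]+([a-z_\x7f-\xff][0-9a-z_\x7f-\xff]*[\t ]*\(.*?\))"
    r"\s*(?:\{|#|//|/\*)",
    re.IGNORECASE | re.MULTILINE,
)

php = """\
function phpArguments_11($foo = array(1,2,3,4)) {
}
function multiLine($a,
                   $b) {
}
"""

# "." does not match a line break here, so multiLine is not found.
found = FLAT.findall(php)
print(found)
```

Only phpArguments_11 with its complete argument list is found. The non-greedy .*? is extended up to the second ) because only that one is followed by {.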

                            I would highly welcome any feedback from PHP programmers as I'm not writing PHP code.

                            How well do those 3 variants work on your PHP files? Are there any functions not found? Are function parameters not listed correctly?

                            Are variables defined in body of a function not listed or are there strings listed in subgroup Variables which should not be listed?

                            And can somebody bring light into my darkness about the groups Requires and Includes?


                            Some more notes about included php.uew in comparison to php.uew installed with UE v20.00.0.1054:
                            • Keyword Nocase in the first line is removed by me as it looks like PHP is case-sensitive and the words are sorted also case-sensitive.
                            • A line termination was missing between system and tan in php.uew installed with UE v20.00.0.1054.
                            • The operators <> and >> are removed as > is a word delimiter.
                            • Color group Operators sorted now also alphabetically. (That was not really necessary.)
                            Note: ?> is usually not a valid "word" definition, but in this case for PHP it is nevertheless valid as ?> is specially interpreted by UltraEdit.

                              Dec 12, 2013#14

                              I read a little about the PHP statements include, include_once, require and require_once and improved the 2 regular expressions for those statements.

                              See the edited post above for the improved expressions or look at the attached ZIP file with my second version of the test file and php.uew.

                              Update: This wordfile is installed with UltraEdit v21.00 and UEStudio v14.10.
                              php_enhanced_wordfile_2.zip (26.93 KiB)   108
                              This ZIP file contains the test file and second version of enhanced php.uew for investigation by PHP programmers.
                              Best regards from an UC/UE/UES for Windows user from Austria