Functions list is empty for UTF-8 encoded PHP files after update to UltraEdit v25.10.0.50 (solved)


    Jul 16, 2018#1

    Hi Mofi,

    My functions list of PHP files is empty after update from UltraEdit for Windows v25.10.0.16 to v25.10.0.50.

    Do you know a possible reason for this?

    My environment: Windows 10 64-bit, 64-bit UltraEdit used in Classic mode.

    Regards, Mario

      Jul 17, 2018#2

      Following some advice from Mofi (his post has been deleted in the meantime), I copied
      %APPDATA%\IDMComp\UltraEdit\wordfiles\legacy\php.uew
      to
      %APPDATA%\IDMComp\UltraEdit\wordfiles\php.uew

      The function list now works again, using the legacy php.uew instead of the standard php.uew.

      So the mistake seems to be in the standard php.uew. What is happening there?


        Jul 17, 2018#3

        The main difference between legacy and standard wordfiles is the regular expression engine they use. Most standard wordfiles use the more powerful Perl regular expression engine, while the legacy wordfiles use the UltraEdit regular expression engine. The second difference is that the legacy wordfiles are usually no longer updated at all. The last time I could motivate IDM Computer Solutions to update the legacy wordfiles was for UE v24.00. So the list of words can differ between a standard and a legacy wordfile.

        The UltraEdit hotfix information lists the fixes applied to v25.10.0.50 in comparison to v25.10.0.16. One of these fixes apparently has a negative effect on searching for strings with the Perl regular expressions in the standard php.uew.

        My problem is that I can't help you find out why no strings are found for the function list with the Perl regular expression ^[\t ]*(?:(?:abstract|final|private|protected|public|static)[\t ]+)*function[\t ]+([a-z_\x7f-\xff][0-9a-z_\x7f-\xff]*) because I don't have any PHP file. Normally I manually run a find on a file with the regular expression string exactly as written in the wordfile to check whether the find works as expected. But I can't do that without a PHP file encoded in ANSI, UTF-8 or UTF-16 that contains at least three functions.

        I could imagine that \x7f-\xff is problematic because v25.10.0.50 contains a fix for hexadecimal Perl regular expression searches in ANSI encoded files, see Hexadecimal search with Perl regex partly not working anymore for ANSI encoded file in UE v24.xx and v25.00 (fixed). But I can't verify that without a PHP file.

        The regular expression matching the function name looks correct according to the PHP manual page User-defined functions, taking into account that UltraEdit always runs the finds for the function list strings case-insensitively, so a-z alone is equivalent to a-zA-Z.
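The kind of manual check described above can be sketched outside UltraEdit. The following is only an illustration in Python, whose `re` module is not the Perl engine used by UltraEdit; the flags `re.IGNORECASE` and `re.MULTILINE` are assumptions standing in for the case-insensitive, line-anchored find, and `SAMPLE_PHP` is a made-up sample, not the oxid.php attached to this thread.

```python
import re

# Function-name pattern as quoted from the standard php.uew in this thread.
# IGNORECASE mimics UltraEdit's case-insensitive find (so a-z covers A-Z);
# MULTILINE lets ^ anchor at the start of every line.
FUNCTION_RE = re.compile(
    r"^[\t ]*(?:(?:abstract|final|private|protected|public|static)[\t ]+)*"
    r"function[\t ]+([a-z_\x7f-\xff][0-9a-z_\x7f-\xff]*)",
    re.IGNORECASE | re.MULTILINE,
)

# Hypothetical sample source with three functions.
SAMPLE_PHP = """<?php
class Demo
{
    public function getTitle($id) { return $id; }
    PROTECTED STATIC FUNCTION Load_Data() {}
}
function render_page($name) {}
"""


def list_functions(source):
    """Return the captured function names in order of appearance."""
    return FUNCTION_RE.findall(source)


print(list_functions(SAMPLE_PHP))
```

If the expression is sound, all three names are listed, including the one written in upper case.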

        What happens when you use the standard php.uew with both occurrences of \x7f-\xff replaced by \x{007f}-\x{00ff} (Unicode hexadecimal) or €-ÿ (0x80 to 0xFF in Windows-1252 encoding) in the regular expression string for functions?

        In any case, I suggest reporting this issue to IDM support by email. Please attach a PHP file for which the standard php.uew no longer finds the strings for the function list with v25.10.0.50.
        Best regards from an UC/UE/UES for Windows user from Austria


          Jul 17, 2018#4

          I have attached a typical PHP file. You should see two functions with parameters and variables. The screenshot of the function list was created with the legacy php.uew.

          I hope that helps.
          functionlist.png (7.08 KiB)
          Screenshot of the function list created with the legacy php.uew
          oxid.php (1.48 KiB)
          A typical PHP file for checking the function list


            Jul 17, 2018#5

            Thanks for the ASCII encoded PHP file with UNIX line endings. That's a big help for me. But I have a problem: after opening your PHP file and the function list in 32-bit UltraEdit v25.10.0.50 with the default configuration (uedit32u.ini freshly created by UltraEdit on start), the function list shows everything as expected, see the attached image.
            php_function_list.png (58.88KiB)
            PHP function list as shown by 32-bit UE v25.10.0.50.

              Jul 17, 2018#6

              I use the 64-bit version and I have the problem on two machines.


                Jul 18, 2018#7

                I don't think the issue exists just because you are using the 64-bit version of UltraEdit. Nevertheless, I will try to reproduce it with 64-bit UE in a few hours.

                In the meantime, please do the following:
                1. Close the project in UltraEdit if one is open.
                2. Close all open files.
                3. Open Advanced - Settings or Configuration - Toolbars / menus - Miscellaneous and click on the button Clear history to delete (nearly) all histories stored in the INI file of UltraEdit. Then close the configuration dialog with the button Cancel.
                4. Exit UltraEdit, for example with Alt+F4.
                5. Compress just the file %APPDATA%\IDMComp\UltraEdit\uedit64u.ini or, even better, the entire directory %APPDATA%\IDMComp\UltraEdit recursively with all files and subdirectories into a 7-Zip, ZIP or RAR archive file. You can password-protect the archive file if you don't want search engines to index the files inside it.
                6. Post a reply with the archive file containing your INI file or your entire UltraEdit configuration attached so that I can investigate. Don't forget to post the password if you have password-protected the archive file.
                I will download the archive file, then delete the public copy, and next look at your configuration and try to reproduce the issue with it on my machines.
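For anyone who prefers scripting step 5, the recursive compression could look roughly like the Python sketch below. The function name `archive_config` and the archive name `ultraedit_config` are made up for this example. Note that the standard library ZIP writer used here cannot password-protect the archive, so 7-Zip or a similar tool is still needed for that.

```python
import os
import shutil
from pathlib import Path


def archive_config(config_dir, archive_base):
    """Zip config_dir recursively and return the path of the created archive.

    archive_base is the archive path without the .zip extension and should
    lie outside config_dir so the archive does not end up inside itself.
    """
    created = shutil.make_archive(str(archive_base), "zip", root_dir=str(config_dir))
    return Path(created)


if __name__ == "__main__":
    appdata = os.environ.get("APPDATA")  # set on Windows only
    if appdata:
        source = Path(appdata) / "IDMComp" / "UltraEdit"
        if source.is_dir():
            # Hypothetical target: ultraedit_config.zip in the home directory.
            print(archive_config(source, Path.home() / "ultraedit_config"))
```

UltraEdit should be closed before running this, as in step 4, so that the INI file is in its final state.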

                You can also contact IDM support by email with the archive file of your configuration attached if you prefer completely private help and don't want your configuration to be publicly available for a few hours.

                  Jul 18, 2018#8

                  There is no difference in PHP function list behavior between 32-bit and 64-bit UltraEdit.

                  But I could reproduce the issue with your configuration and found the cause and a workaround.

                  There are two issues:
                  1. UTF-8 is configured as Default encoding (for new files and file open when auto-detect fails) at Advanced - Settings or Configuration - File handling - Encoding, so the file oxid.php, which contains only ASCII characters and has no BOM, is loaded as a UTF-8 encoded file. Auto-detecting the encoding fails here because a file containing only ASCII characters and no BOM could be interpreted as either UTF-8 or ANSI encoded. The PHP file would be interpreted as ANSI encoded if it contained at least one non-ASCII character with a code value greater than 0x7F, such as an ANSI encoded German umlaut.
                    The default for this setting is ANSI, with which this file is interpreted as Windows-1252 encoded according to the Windows region and language settings on my Windows computers.
                  2. The Perl regular expression engine fails to find any string in Unicode files with the regular expressions in the standard wordfile php.uew that contain \x7f-\xff. This is really unexpected, as the file does not contain any non-ASCII character. That is the real issue causing an empty function list for all UTF-8 or UTF-16 encoded PHP files.
                  The quick solution is to download the attached ZIP file and extract the included php.uew into the directory %APPDATA%\IDMComp\UltraEdit\wordfiles, overwriting the existing file.
                  This wordfile contains \x{007f}-\x{00ff} instead of \x7f-\xff in all regular expressions containing this character range definition. With it, the Perl regular expression search works for ANSI and Unicode encoded PHP files. I have also made some minor improvements to some other Perl regular expressions.
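The replacement itself is a plain text substitution on the regular expression strings. A minimal sketch in Python, assuming the expression is handled as an ordinary string (the helper name `rewrite_range` is invented here, and nothing about the surrounding wordfile syntax is assumed or parsed):

```python
# Byte-style range as found in the standard php.uew and the Unicode
# hexadecimal form used in the replacement wordfile.
OLD_RANGE = r"\x7f-\xff"
NEW_RANGE = r"\x{007f}-\x{00ff}"


def rewrite_range(regex, new_range=NEW_RANGE):
    """Replace every occurrence of the \\x7f-\\xff range in a regex string."""
    return regex.replace(OLD_RANGE, new_range)


original = r"function[\t ]+([a-z_\x7f-\xff][0-9a-z_\x7f-\xff]*)"
print(rewrite_range(original))
```

The `\x{...}` notation is Perl syntax; Python's own `re` module does not accept it, so the result is only meant to be pasted back into the wordfile.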

                  UTF-8 selected for Default encoding (for new files and file open when auto-detect fails) can be kept because your configuration always saves UTF-8 encoded files without BOM. So there is no difference in the binary representation of the characters for PHP files containing only ASCII characters, and you obviously prefer UTF-8 encoding anyhow.
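The ambiguity behind issue 1 can be shown with a few bytes. This Python sketch is not UltraEdit's detection algorithm; it only demonstrates that ASCII-only content without BOM is byte-identical and equally valid in UTF-8 and Windows-1252 (`cp1252` in Python), while a single ANSI encoded umlaut makes the content invalid as UTF-8. The sample contents are invented.

```python
# ASCII-only PHP source, as in the attached oxid.php (content made up here).
ASCII_ONLY = b"<?php\nfunction hello() {}\n"

# A comment with German u-umlaut and sharp s stored as single Windows-1252
# bytes (0xFC and 0xDF).
ANSI_UMLAUT = "// Gr\u00fc\u00dfe\n".encode("cp1252")


def decodes_as(data, encoding):
    """Return True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Both interpretations are valid and yield the same text: nothing to detect.
print(decodes_as(ASCII_ONLY, "utf-8"), decodes_as(ASCII_ONLY, "cp1252"))
print(ASCII_ONLY.decode("utf-8") == ASCII_ONLY.decode("cp1252"))

# The lone byte 0xFC is not valid UTF-8, so only ANSI remains plausible.
print(decodes_as(ANSI_UMLAUT, "utf-8"), decodes_as(ANSI_UMLAUT, "cp1252"))
```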

                  I will report the issue to IDM support by email.

                  What is unclear to me is the basic syntax description for PHP labels (function names, variable names, ...). The PHP manual for user-defined functions explains:
                  Function names follow the same rules as other labels in PHP. A valid function name starts with a letter or underscore, followed by any number of letters, numbers, or underscores. As a regular expression, it would be expressed thus: [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*.
                  Is the regular expression meant for the byte stream read by the PHP interpreter, or for characters, taking the character encoding into account?

                  A function name like OmegaΩ in a UTF-8 encoded file would be encoded with the bytes 0x4F 0x6D 0x65 0x67 0x61 0xE2 0x84 0xA6 (with Ω being the ohm sign U+2126), so the case-sensitive regular expression [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]* matches this function name when run on the byte stream. But the same regular expression does not match OmegaΩ when run on the character stream, and neither does the case-insensitive [a-z_\x{007f}-\x{00ff}][0-9a-z_\x{007f}-\x{00ff}]* used in php.uew in the attached ZIP file.

                  I think the right expression for php.uew would be [a-z_\x{007f}-\x{fffd}][0-9a-z_\x{007f}-\x{fffd}]* to match any Unicode character with a code value in the range U+007F to U+FFFD. I know this is more or less theoretical because labels in PHP files most likely use mainly ASCII characters with a code value lower than 0x7F. However, if we change the Perl regular expressions in php.uew now, we should change them to the correct expression so that unusual function and variable names are matched as well.
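The difference between byte stream and character stream matching can be reproduced with Python's `re` module, which is used here only as a stand-in for the Perl engine. The sketch assumes that the Ω in the example name is the ohm sign U+2126, whose UTF-8 encoding starts with the bytes 0xE2 0x84 quoted above; the constant names are made up.

```python
import re

NAME = "Omega\u2126"  # assumption: the trailing character is OHM SIGN U+2126

# 1. PHP manual expression applied to the raw UTF-8 bytes.
BYTE_RE = re.compile(rb"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*")

# 2. Character-based expression limited to U+007F..U+00FF.
NARROW_RE = re.compile(r"[a-z_\x7f-\xff][0-9a-z_\x7f-\xff]*", re.IGNORECASE)

# 3. Character-based expression widened to U+007F..U+FFFD.
WIDE_RE = re.compile(r"[a-z_\x7f-\ufffd][0-9a-z_\x7f-\ufffd]*", re.IGNORECASE)

encoded = NAME.encode("utf-8")
print(" ".join(f"0x{b:02X}" for b in encoded))

# fullmatch() succeeds only if the whole name is covered by the pattern.
print("bytes :", BYTE_RE.fullmatch(encoded) is not None)
print("narrow:", NARROW_RE.fullmatch(NAME) is not None)
print("wide  :", WIDE_RE.fullmatch(NAME) is not None)
```

On bytes every byte of the multi-byte character falls into 0x7F to 0xFF, so the whole name is covered. On characters only the widened range reaches U+2126.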

                  Could a PHP programmer test whether a function name OmegaΩ in a UTF-8 encoded PHP file really works, by writing a function with that name that outputs something when called?

                  PS: I have also added the function name mysqli_query to color group 6, as this built-in function, available since PHP 5.0, was missing in the wordfile.

                  Update 1: I modified php.uew once more and replaced all occurrences of \x{00ff} by \x{fffd}.
                  Update 2: The attached wordfile php.uew was once more updated on 2019-05-08 with three additional functions added to color group 6 and with some words moved to other color groups.

                  Update 3: The wordfile php.uew in the attached ZIP file is installed by default with UltraEdit for Windows since v26.10.0.72 and UEStudio since v19.10.0.46 in the subdirectory wordfiles in the program files directory of UltraEdit/UEStudio. When the default paths are used, this updated php.uew just has to be copied manually after an update or upgrade of UltraEdit/UEStudio from
                  • %ProgramFiles%\IDM Computer Solutions\UltraEdit\wordfiles
                    or
                    %ProgramFiles(x86)%\IDM Computer Solutions\UltraEdit\wordfiles
                    or
                    %ProgramFiles%\IDM Computer Solutions\UEStudio\wordfiles
                    or
                    %ProgramFiles(x86)%\IDM Computer Solutions\UEStudio\wordfiles
                    to
                  • %APPDATA%\IDMComp\UltraEdit\wordfiles
                    or
                    %APPDATA%\IDMComp\UEStudio\wordfiles
                  php_uew.zip (41.54 KiB)
                  This ZIP file contains php.uew with improved Perl regular expression strings, last updated on 2019-05-08.
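The manual copy described in Update 3 could be scripted along the following lines. This is a hedged sketch in Python: the helper names are invented, the directories are simply the default paths listed above built from the usual Windows environment variables, and an existing php.uew in the target directory is overwritten without a backup.

```python
import os
import shutil
from pathlib import Path


def default_candidates(product="UltraEdit"):
    """Build the default program files wordfiles directories for a product."""
    dirs = []
    for var in ("ProgramFiles", "ProgramFiles(x86)"):
        base = os.environ.get(var)
        if base:
            dirs.append(Path(base) / "IDM Computer Solutions" / product / "wordfiles")
    return dirs


def copy_wordfile(candidates, target_dir, name="php.uew"):
    """Copy name from the first candidate directory containing it.

    Returns the path of the copy, or None if no candidate has the file.
    """
    for directory in candidates:
        source = Path(directory) / name
        if source.is_file():
            target_dir = Path(target_dir)
            target_dir.mkdir(parents=True, exist_ok=True)
            return Path(shutil.copy2(source, target_dir / name))
    return None


if __name__ == "__main__":
    appdata = os.environ.get("APPDATA")  # set on Windows only
    if appdata:
        target = Path(appdata) / "IDMComp" / "UltraEdit" / "wordfiles"
        print(copy_wordfile(default_candidates("UltraEdit"), target))
```

For UEStudio, the product name and the target directory would have to be changed to UEStudio accordingly.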

                    Jul 19, 2018#9

                    @mofi: The new file works as expected. Thanks a lot.

                    To your question about UTF-8 characters in function names:

                    It seems to have been possible in PHP 5:

                    Unicode identifiers (function names) for non-localization purposes advisable
                    Exotic names for methods, constants, variables and fields - Bug or Feature

                    This means that your approach is correct and that the Unicode regex should stay in place for backwards compatibility. PHP5.6 is still in LTS.


                      Jul 20, 2018#10

                      Thanks, Mario, for the two links. Artefacto confirms my supposition. It is possible to use characters other than 0-9A-Za-z_ in function names, variable names, etc., but whoever does that is playing with fire or looking for trouble, as Artefacto put it.

                      I updated the wordfile attached to my previous post once more and replaced all occurrences of \x{00ff} by \x{fffd} to match every possible label, even the troublemakers.