Regex to find words with at least 1 alphabetic and 1 digit?


    Feb 21, 2016 #1

    Is it possible to search only for words that are alphanumeric?

    Sample:

    Code:

    I need to know 12abd is a man of in7egri7y or not.

    The regex should find only the words 12abd and in7egri7y.


      Feb 21, 2016 #2

      Your task description has a serious problem: alphanumeric means consisting of alphabetic or numeric characters, and that is true for all space-separated sequences of characters in your example.

      A case-insensitive regular expression search for [a-z]+ with Match Whole Word enabled, using any regexp engine, finds only words consisting of alphabetic characters in the ASCII ranges A-Za-z. As macro code:

      Code:

      Find RegExp MatchWord "[a-z]+"

      A case-insensitive Perl regular expression search for (?:[a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]* with Match Whole Word enabled finds only words that consist of alphanumeric characters in the ASCII ranges 0-9A-Za-z and contain at least one numeric and one alphabetic character. As macro code:

      Code:

      PerlReOn
      Find RegExp MatchWord "(?:[a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]*"

      The following would also work:

      Code:

      PerlReOn
      Find RegExp "\b(?:[a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]*\b"
      Find RegExp "\<(?:[a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]*\>"
      
      \b ... any word boundary, does not match a character.

      \< ... beginning of word, does not match a character.

      \> ... end of word, does not match a character.

      (?:...) ... non-capturing group for the OR expression.

      | ... OR

      [a-z]+ ... one or more alphabetic characters.

      [0-9]+ ... one or more numeric characters.

      [a-z] ... one alphabetic character.

      [0-9] ... one numeric character.

      [a-z0-9]* ... 0 or more alphanumeric characters.
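
The building blocks above can be tried outside UltraEdit as well. The following is only a minimal sketch in Python, not the UltraEdit macro itself: it assumes Python's standard `re` module, uses `\b` in place of Match Whole Word, and `re.IGNORECASE` in place of the case-insensitive search option. The name `WORD_WITH_LETTER_AND_DIGIT` is made up for this illustration.

```python
import re

# ASCII-only pattern from the post. re.IGNORECASE stands in for the
# case-insensitive search option, \b for Match Whole Word.
WORD_WITH_LETTER_AND_DIGIT = re.compile(
    r"\b(?:[a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]*\b", re.IGNORECASE
)

sample = "I need to know 12abd is a man of in7egri7y or not."

# findall returns whole matches here because the only group is non-capturing.
matches = WORD_WITH_LETTER_AND_DIGIT.findall(sample)
print(matches)  # ['12abd', 'in7egri7y']
```

Words made up only of letters or only of digits are skipped because each alternative of the OR expression requires a letter next to a digit.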

      To also include non-ASCII alphabetic characters from the entire Unicode table, use the search string (?:[[:alpha:]]+[[:digit:]]|[[:digit:]]+[[:alpha:]])[[:alnum:]]*. It excludes the underscore, which is a word character matched by \w, so \w and [[:alnum:]] are not equivalent. \w is equivalent to [[:word:]], whereas \d is equivalent to [[:digit:]].
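
For comparison, here is a rough Python sketch of the same Unicode-aware idea. It is an approximation under stated assumptions, not an equivalent of the Boost expression: Python's standard `re` module does not support POSIX classes such as `[[:alpha:]]`, so the sketch substitutes `[^\W\d_]` for letters, `\d` for digits and `[^\W_]` for alphanumerics. Python's `\w` also accepts a few numeric characters that are not decimal digits, so the letter class is slightly wider than `[[:alpha:]]`. The name `UNICODE_WORD` is hypothetical.

```python
import re

# Approximation of the POSIX classes with Python's Unicode-aware \w and \d:
#   [^\W\d_]  word characters except digits and underscore (roughly letters)
#   [^\W_]    word characters except underscore (roughly alphanumerics)
UNICODE_WORD = re.compile(r"\b(?:[^\W\d_]+\d|\d+[^\W\d_])[^\W_]*\b")

sample = "Straße7 und 42 дом5 a_1"

# "a_1" is not found: the underscore is a word character, so there is no
# word boundary next to it, and it is excluded from the character classes.
matches = UNICODE_WORD.findall(sample)
print(matches)  # ['Straße7', 'дом5']
```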

      Are you confused now? If yes, read Boost Perl Regular Expression Syntax from top to bottom. The Boost C++ RegExp library is included in UltraEdit; which version of the library depends on the version of UltraEdit. UltraEdit also does not support everything offered by the Boost C++ RegExp library. For example, back-references with \g... are not yet supported by UltraEdit v22.20.0.49.
      Best regards from an UC/UE/UES for Windows user from Austria


        Re: Regex to find hexadecimal coded Unicode characters in HTML/XML file?

        Feb 22, 2016 #3

        Actually, I wanted to make a macro that finds hexadecimal Unicode character references, e.g. &#x22ef; and &#x21a0;, and converts their lower case letters to upper case, i.e. &#x22EF; and &#x21A0;. So I needed to know whether there is a regex that finds those expressions but not the ones that are completely numeric, e.g. &#x2013; and &#x2026;.

        If I search with

        Code:

        &#x[0-9a-z]+;

        it will find all of them.

          Feb 22, 2016 #4

          Well, for converting hexadecimal values a-f to upper case it does not really matter if values are found consisting only of digits.
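
To illustrate that point, and for the case where skipping the digits-only values does matter, here is a small Python sketch. The narrower pattern with a lookahead is an assumption added for illustration and was not posted in this thread; it uses Python's `re` module rather than the UltraEdit engine, and the names `ALL_REFS` and `REFS_WITH_LETTER` are hypothetical.

```python
import re

# Broad pattern from the question: matches every hexadecimal reference.
ALL_REFS = re.compile(r"&#x[0-9a-z]+;")

# Hypothetical narrower pattern (not from the thread): the lookahead
# (?=[0-9a-f]*[a-f]) demands at least one lower case letter a-f
# somewhere before the closing semicolon.
REFS_WITH_LETTER = re.compile(r"&#x(?=[0-9a-f]*[a-f])[0-9a-f]+;")

text = "&#x22ef; &#x2013; &#x21a0; &#x2026;"

all_found = ALL_REFS.findall(text)
with_letter = REFS_WITH_LETTER.findall(text)

# Upper-casing a digits-only value changes nothing, which is why the
# broad pattern is harmless for a pure case conversion.
digits_only_unchanged = "2013".upper() == "2013"

print(all_found)
print(with_letter)
```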

          The following case-sensitive Perl regexp Replace All finds hexadecimal Unicode values consisting of digits and/or lower case letters a-f and converts them to upper case on replace.

          Code:

          PerlReOn
          Find MatchCase RegExp "(?<=&#x)([0-9a-f]{4})(?=;)"
          Replace All "\U\1\E"

          Same as above but without a positive lookbehind and positive lookahead:

          Code:

          PerlReOn
          Find MatchCase RegExp "&#x([0-9a-f]{4});"
          Replace All "&#x\U\1\E;"

          Remove MatchCase if you also want to find mixed-case values such as &#x22eF; and change them to &#x22EF;.
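
The same replacement can be sketched in Python for testing outside the editor. This is a stand-in under assumptions, not the macro: Python's `re` module has no `\U...\E` case conversion in replacement strings, so a replacement function calls `upper()` instead. The function name `uppercase_hex_refs` and its `match_case` parameter are made up for this example, with `match_case` mimicking the MatchCase option.

```python
import re

def uppercase_hex_refs(text: str, match_case: bool = True) -> str:
    """Upper-case the four hex digits in references like &#x22ef;."""
    # Without match_case the search is case-insensitive, like a Find
    # without the MatchCase option.
    flags = 0 if match_case else re.IGNORECASE
    return re.sub(
        r"&#x([0-9a-f]{4});",
        # Python has no \U...\E, so the group is upper-cased in code.
        lambda m: "&#x" + m.group(1).upper() + ";",
        text,
        flags=flags,
    )

print(uppercase_hex_refs("&#x22ef; &#x21a0; &#x2013;"))
# &#x22EF; &#x21A0; &#x2013;
```

With the default `match_case=True`, a mixed-case value such as `&#x22eF;` is left alone; passing `match_case=False` converts it as well.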

            Feb 22, 2016 #5

            When I use your replace, expressions such as "&#x2013;" and "&#x22ef;" become "&#x<IDM-RE1>;". Every single one of them becomes "&#x<IDM-RE1>;" in the entire file :(

              Feb 22, 2016 #6

              I tested both macros with English UE v22.20.0.49 and now also with v14.10.0.1025 on an ASCII and a Unicode file containing your posted text. Both macros produced the expected result. The second macro version worked even with UE v13.20a+1 for the ASCII and the Unicode file. So I don't know why the replace does not work on your computer with the version of UltraEdit you are using.

              It would be possible to run just the Find with any regular expression engine and execute the command ToUpper in a loop until nothing is found anymore. But the Perl regexp Replace All should work as is.
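
The find-and-convert loop can be pictured with the following Python sketch. It only imitates the idea and is not UltraEdit macro code: a search for the next match stands in for Find, `upper()` on the matched digits stands in for ToUpper, and the loop ends when no further match is found. The names `HEX_REF` and `uppercase_by_find_loop` are hypothetical.

```python
import re

HEX_REF = re.compile(r"&#x([0-9a-f]{4});")

def uppercase_by_find_loop(text: str) -> str:
    """Find the next reference, upper-case it, repeat until none is left."""
    pos = 0  # plays the role of the caret position in the editor
    while True:
        match = HEX_REF.search(text, pos)
        if match is None:
            return text  # nothing found anymore, so the loop ends
        # Replace only the four hex digits; the length does not change.
        text = text[:match.start(1)] + match.group(1).upper() + text[match.end(1):]
        pos = match.end()  # continue searching after this reference

print(uppercase_by_find_loop("&#x22ef; x &#x2013; &#x21a0;"))
# &#x22EF; x &#x2013; &#x21A0;
```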