Regex to find words with at least 1 alphabetic and 1 digit?

don_bradman · Feb 21, 2016 · #1 · 2016-02-21T13:51+00:00

Is it possible to search only for text that is alphanumeric?

Sample:

I need to know 12abd is a man of in7egri7y or not.

The regex should find only the strings 12abd and in7egri7y.

Mofi · Feb 21, 2016 · #2 · 2016-02-21T18:08+00:00

Your task description has a serious problem: alphanumeric means consisting of alphabetic or numeric characters. This is true for all space separated sequences of characters in your example.

A case-insensitive regular expression search with any regexp engine with search string [a-z]+ and with Match Whole Word enabled finds just words consisting of alphabetic characters in ASCII ranges A-Za-z. As macro code:

Code:

Find RegExp MatchWord "[a-z]+"

A case-insensitive Perl regular expression search with search string (?:[a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]* and with Match Whole Word enabled finds just words consisting of alphanumeric characters in ASCII ranges 0-9A-Za-z and containing at least one numeric and one alphabetic character. As macro code:

Code:

PerlReOn
Find RegExp MatchWord "(?:[a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]*"

Also working would be:

Code:

PerlReOn
Find RegExp "\b(?:[a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]*\b"
Find RegExp "\<(?:[a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]*\>"

\b ... any word boundary, does not match a character.

\< ... beginning of word, does not match a character.

\> ... end of word, does not match a character.

(?:...) ... non-capturing group for the OR expression.

| ... OR

[a-z]+ ... one or more alphabetic characters.

[0-9]+ ... one or more numeric characters.

[a-z] ... one alphabetic character.

[0-9] ... one numeric character.

[a-z0-9]* ... 0 or more alphanumeric characters.
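For readers without UltraEdit at hand, the \b variant of the expression can be tried outside the editor. The following is only a minimal sketch in Python, assuming that Python's re module treats this particular expression the same way; the function name find_mixed_words is made up for illustration. Python's re does not support \< and \>, so only the \b form is shown, and re.IGNORECASE stands in for the case-insensitive search.

```python
import re

# Same expression as in the macro with \b; re.IGNORECASE stands in for the
# case-insensitive search and \b replaces the Match Whole Word option.
MIXED_WORD = re.compile(
    r"\b(?:[a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]*\b",
    re.IGNORECASE,
)


def find_mixed_words(text):
    """Return all words containing at least one letter and one digit."""
    # The pattern has only a non-capturing group, so findall returns
    # the complete matches.
    return MIXED_WORD.findall(text)


sample = "I need to know 12abd is a man of in7egri7y or not."
print(find_mixed_words(sample))  # ['12abd', 'in7egri7y']
```

Words consisting only of letters or only of digits are skipped because each alternative requires a change from one character class to the other.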

To also include non-ASCII alphabetic characters from the entire Unicode table, use the search string (?:[[:alpha:]]+[[:digit:]]|[[:digit:]]+[[:alpha:]])[[:alnum:]]* which excludes the underscore. The underscore is a word character matched by \w, so \w and [[:alnum:]] are not equal. \w is equivalent to [[:word:]], while \d is equivalent to [[:digit:]].
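The difference between \w and the alphanumeric classes can also be illustrated outside UltraEdit. Python's re module has no POSIX bracket classes such as [[:alpha:]], so the sketch below uses approximations that are my assumption and not part of the Boost syntax: [^\W\d_] for a letter, \d for a digit and [^\W_] for an alphanumeric character. The letter class is only approximate because it can also match a few numeric characters that \d does not cover. The function name find_mixed_words_unicode is hypothetical.

```python
import re

# Approximations (assumptions, see the note above):
#   [^\W\d_]  a word character that is neither a digit nor the underscore
#   \d        a decimal digit
#   [^\W_]    a word character except the underscore
MIXED_WORD_UNICODE = re.compile(
    r"\b(?:[^\W\d_]+\d|\d+[^\W\d_])[^\W_]*\b"
)


def find_mixed_words_unicode(text):
    """Return words with at least one letter and one digit, no underscore."""
    return MIXED_WORD_UNICODE.findall(text)


# \w matches the underscore, the alphanumeric approximation does not.
print(bool(re.fullmatch(r"\w", "_")))       # True
print(bool(re.fullmatch(r"[^\W_]", "_")))   # False

print(find_mixed_words_unicode("naïve7 über 42 x_1 ß9"))  # ['naïve7', 'ß9']
```

In this sketch x_1 is not found because the underscore is a word character, so there is no word boundary inside x_1, and the expression itself never matches an underscore.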

Are you confused now? If yes, read Boost Perl Regular Expression Syntax from top to bottom. The Boost C++ RegExp library is included in UltraEdit. Which version of the library is included depends on the version of UltraEdit, and UltraEdit does not support everything offered by the library. For example, back-references with \g... are not yet supported by UltraEdit v22.20.0.49.

don_bradman · Feb 22, 2016 · #3 · 2016-02-22T02:03+00:00

Actually, I wanted to make a macro which finds hexadecimal Unicode character references, e.g. &#x22ef; and &#x21a0;, and converts their lower case letters to upper case, i.e. &#x22EF; and &#x21A0;. So I needed to know if there is a regex which finds those expressions but not those which are completely numeric, e.g. &#x2013;, &#x2026; etc.

If I search with

Code:

&#x[0-9a-z]+;

it will find all of them.

Mofi · Feb 22, 2016 · #4 · 2016-02-22T05:44+00:00

Well, for converting hexadecimal values a-f to upper case it does not really matter if values are found consisting only of digits.

The following case-sensitive Perl regexp Replace All finds hexadecimal Unicode values consisting of digits and/or lower case letters a-f and converts them to upper case on replace.

Code:

PerlReOn
Find MatchCase RegExp "(?<=&#x)([0-9a-f]{4})(?=;)"
Replace All "\U\1\E"

Same as above but without a positive lookbehind and positive lookahead:

Code:

PerlReOn
Find MatchCase RegExp "&#x([0-9a-f]{4});"
Replace All "&#x\U\1\E;"

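Remove MatchCase if you also want to find values already containing upper case letters, for example a mixed-case value such as &#x22Ef;, and change them to &#x22EF; as well.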
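The same replace can be tried outside UltraEdit. The following is a minimal sketch in Python and not the macro itself. Python's re module does not support \U...\E in the replacement string, so a replacement function does the upper case conversion instead. The function name uppercase_hex_refs and the parameter match_case are made up for illustration.

```python
import re


def uppercase_hex_refs(text, match_case=True):
    """Convert the four hex digits of references like &#x22ef; to upper case.

    match_case=True mirrors the macro with MatchCase: only values written
    with digits and lower case a-f are found. With match_case=False,
    values with upper case or mixed-case letters are found as well.
    """
    flags = 0 if match_case else re.IGNORECASE
    pattern = re.compile(r"&#x([0-9a-f]{4});", flags)
    # The prefix "&#x" is written back unchanged; only group 1 is converted.
    return pattern.sub(lambda m: "&#x" + m.group(1).upper() + ";", text)


print(uppercase_hex_refs("&#x22ef; &#x2013; &#x21a0;"))
# &#x22EF; &#x2013; &#x21A0;
print(uppercase_hex_refs("&#x22Ef;", match_case=False))
# &#x22EF;
```

Values consisting only of digits are found too, but converting them to upper case changes nothing.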

don_bradman · Feb 22, 2016 · #5 · 2016-02-22T14:54+00:00

When I use your replace, expressions such as "&#x2013;" and "&#x22ef;" become "&#x<IDM-RE1>;". Every single one of them becomes "&#x<IDM-RE1>;" in the entire file.

Mofi · Feb 22, 2016 · #6 · 2016-02-22T17:41+00:00

I tested both macros with English UE v22.20.0.49 and now also with v14.10.0.1025 on an ASCII and a Unicode file with your posted text. Both macros produced the expected result. The second macro version worked even with UE v13.20a+1 for an ASCII and a Unicode file. So I don't know why the replace does not work on your computer with the version of UltraEdit you are using.

It would be possible to run just the Find with any regular expression engine and use the command ToUpper executed in a loop until nothing is found anymore. But the Perl regexp Replace All should work as is.
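The logic of such a find loop is roughly the following. This is only a sketch in Python to show the idea, not UltraEdit macro code: the search continues behind the last found string, the found hex digits are converted to upper case, and the loop ends once nothing is found anymore. The function name upper_in_loop is hypothetical, and str.upper() stands in for the ToUpper command.

```python
import re

# Find expression with lookbehind and lookahead as in the first macro:
# only the four hex digits are matched, not "&#x" and ";".
HEX_DIGITS = re.compile(r"(?<=&#x)[0-9a-f]{4}(?=;)")


def upper_in_loop(text):
    """Find one value after the other and convert each to upper case."""
    pos = 0
    while True:
        match = HEX_DIGITS.search(text, pos)
        if match is None:
            break  # nothing found anymore
        # Replace just the found hex digits; the length stays the same.
        text = text[:match.start()] + match.group(0).upper() + text[match.end():]
        pos = match.end()  # continue searching behind the found string
    return text


print(upper_in_loop("a &#x22ef; b &#x2013; c &#x21a0;"))
# a &#x22EF; b &#x2013; c &#x21A0;
```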

UltraEdit, UltraCompare, UEStudio forums