User to user discussion and support for UltraEdit, UEStudio, UltraCompare, and other IDM applications.

Find, replace, find in files, replace in files, regular expressions
Is it possible to search only for words that are alphanumeric?

Sample:

Code:
I need to know 12abd is a man of in7egri7y or not.

The regex should find only the strings 12abd and in7egri7y.
Your task description has a serious problem: alphanumeric means consisting of alphabetic or numeric characters. This is true for all space-separated sequences of characters in your example, including the purely alphabetic words.

A case-insensitive regular expression search with search string [a-z]+ and with Match Whole Word enabled finds just the words consisting only of alphabetic characters in the ASCII ranges A-Za-z. This works with any of the regexp engines. As macro code:

Code:
Find RegExp MatchWord "[a-z]+"
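
The difference between "alphanumeric" and "alphabetic only" can be checked outside of UltraEdit too. The following is just a small Python sketch, not UltraEdit macro code. It assumes that Python's \b on both sides behaves like Match Whole Word for this sample sentence, and the variable names are made up for the illustration.

```python
import re

sample = "I need to know 12abd is a man of in7egri7y or not."

# Whole words made of letters and/or digits: every word of the sample qualifies.
alnum_words = re.findall(r"\b[a-z0-9]+\b", sample, flags=re.IGNORECASE)

# Whole words made of letters only: the two mixed words drop out.
alpha_words = re.findall(r"\b[a-z]+\b", sample, flags=re.IGNORECASE)

print(alnum_words)
print(alpha_words)
```

So a plain alphanumeric class cannot separate 12abd and in7egri7y from the other words. The expression has to require at least one letter and at least one digit.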

A case-insensitive Perl regular expression search with search string (?:[a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]* and with Match Whole Word enabled finds just words consisting of alphanumeric characters in ASCII ranges 0-9A-Za-z and containing at least one numeric and one alphabetic character. As macro code:

Code:
PerlReOn
Find RegExp MatchWord "(?:[a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]*"

Either of the following two Find commands would also work, without Match Whole Word:

Code:
PerlReOn
Find RegExp "\b(?:[a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]*\b"
Find RegExp "\<(?:[a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]*\>"

\b ... any word boundary, does not match a character.

\< ... beginning of word, does not match a character.

\> ... end of word, does not match a character.

(?:...) ... non-marking (non-capturing) group for the OR expression.

| ... OR

[a-z]+ ... 1 or more alphabetic characters.

[0-9]+ ... 1 or more numeric characters.

[a-z] ... 1 alphabetic character.

[0-9] ... 1 numeric character.

[a-z0-9]* ... 0 or more alphanumeric characters.
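
As a quick cross-check of the pieces listed above, here is a small Python sketch running the \b variant on the sample sentence. Python's re module is not the Boost library used by UltraEdit and does not know \< and \>, so only the \b form is shown. The name MIXED_WORD is chosen just for this example.

```python
import re

# At least one letter and at least one digit, in either order,
# followed by any further letters or digits, as a whole word.
MIXED_WORD = re.compile(
    r"\b(?:[a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]*\b",
    re.IGNORECASE,
)

sample = "I need to know 12abd is a man of in7egri7y or not."
mixed = MIXED_WORD.findall(sample)
print(mixed)  # ['12abd', 'in7egri7y']
```

The two alternatives of the OR expression cover both cases: a word starting with letters must reach a digit, and a word starting with digits must reach a letter.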

To also include non-ASCII alphabetic characters from the entire Unicode table, use the search string (?:[[:alpha:]]+[[:digit:]]|[[:digit:]]+[[:alpha:]])[[:alnum:]]* instead. It excludes the underscore, which is a word character matched by \w. \w and [[:alnum:]] are therefore not equivalent: \w is equivalent to [[:word:]]. \d, however, is equivalent to [[:digit:]].
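
The underscore difference can be illustrated with a small Python sketch. Note the assumption: Python's re module has no [[:alpha:]], [[:digit:]] or [[:alnum:]] classes, so they are emulated here with [^\W\d_], \d and [^\W_]. This is only an approximation of the Boost classes, and the names LETTER, DIGIT and ALNUM are invented for the example.

```python
import re

# Emulated classes (assumption, not Boost syntax):
LETTER = r"[^\W\d_]"   # word character that is neither digit nor underscore
DIGIT = r"\d"          # Unicode decimal digit
ALNUM = r"[^\W_]"      # word character except underscore

MIXED_UNICODE = re.compile(
    rf"\b(?:{LETTER}+{DIGIT}|{DIGIT}+{LETTER}){ALNUM}*\b"
)

# The sample reads: Straße7 and 4über but not café or snake_1
sample = "Stra\u00dfe7 and 4\u00fcber but not caf\u00e9 or snake_1"
found = MIXED_UNICODE.findall(sample)
print(found)

# \w accepts the underscore, the emulated alnum class does not.
print(re.findall(r"\w+", "a_1"))      # ['a_1']
print(re.findall(r"[^\W_]+", "a_1"))  # ['a', '1']
```

snake_1 is not reported because the underscore is neither a letter nor a digit, and there is no word boundary between the underscore and the 1.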

Are you confused now? If so, read Boost Perl Regular Expression Syntax from top to bottom. The Boost C++ RegExp library is included in UltraEdit; which version of the library depends on the version of UltraEdit. UltraEdit also does not support everything offered by the Boost C++ RegExp library. For example, back-references with \g... are not yet supported by UltraEdit v22.20.0.49.
Best regards from Austria
Actually, I wanted to make a macro that finds hexadecimal Unicode character references, e.g. &#x22ef; and &#x21a0;, and converts their lower case letters to upper case, i.e. &#x22EF; and &#x21A0;. So I needed to know if there is a regex that finds those expressions but not the ones that are completely numeric, e.g. &#x2013; and &#x2026;.

If I search with

Code:
&#x[0-9a-z]+;

it will find all of them.
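
One possible way to skip the purely numeric values is a positive lookahead that demands at least one letter a-f before the semicolon. The following Python sketch shows the idea. It was written for Python's re module and has not been tested in UltraEdit, and the name HEX_WITH_LETTER is made up for the example.

```python
import re

# (?=[0-9a-f]*[a-f]) succeeds only if a letter a-f follows somewhere
# within the hexadecimal digits, so all-digit values are skipped.
HEX_WITH_LETTER = re.compile(r"&#x(?=[0-9a-f]*[a-f])[0-9a-f]+;")

sample = "&#x22ef; &#x2013; &#x21a0; &#x2026;"
found = HEX_WITH_LETTER.findall(sample)
print(found)  # ['&#x22ef;', '&#x21a0;']
```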
Well, for converting hexadecimal values a-f to upper case it does not really matter if values are found consisting only of digits.

The following case-sensitive Perl regexp Replace All finds 4-digit hexadecimal Unicode values consisting of digits and/or lower case letters a-f and converts them to upper case on replace.

Code:
PerlReOn
Find MatchCase RegExp "(?<=&#x)([0-9a-f]{4})(?=;)"
Replace All "\U\1\E"

Same as above but without a positive lookbehind and positive lookahead:

Code:
PerlReOn
Find MatchCase RegExp "&#x([0-9a-f]{4});"
Replace All "&#x\U\1\E;"
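
For comparison, the same replace can be sketched in Python. Python's re module does not support \U...\E in the replacement string, so a replacement function is used instead. This illustrates the logic and is not UltraEdit code; the function name uppercase_hex_entities is hypothetical.

```python
import re

# Same case-sensitive search expression as in the second macro.
ENTITY = re.compile(r"&#x([0-9a-f]{4});")

def uppercase_hex_entities(text):
    """Upper-case the four hex digits of each matching &#x....; reference."""
    return ENTITY.sub(lambda m: "&#x" + m.group(1).upper() + ";", text)

print(uppercase_hex_entities("&#x22ef; &#x21a0; &#x2013;"))
# &#x22EF; &#x21A0; &#x2013;
```

Values consisting only of digits are matched as well, but upper-casing digits changes nothing, so they come out unmodified.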

Remove MatchCase if you also want to find mixed-case values such as &#x22eF; and change them to &#x22EF;.
Best regards from Austria
When I use your replace, expressions such as "&#x2013;" and "&#x22ef;" become "&#x<IDM-RE1>;". Every single one of them becomes "&#x<IDM-RE1>;" in the entire file :(
I tested both macros with English UE v22.20.0.49 and now also with v14.10.0.1025 on an ASCII and a Unicode file with your posted text. Both macros produced the expected result. The second macro version worked even with UE v13.20a+1 for ASCII and Unicode file. So I don't know why the replace does not work on your computer with whatever version of UltraEdit used by you.

It would be possible to run just the Find with any regular expression engine and execute the command ToUpper in a loop until nothing is found anymore. But the Perl regexp Replace All should work as is.
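
The logic of such a find-and-ToUpper loop can be sketched in Python as follows. This only models the idea and is not macro code. It assumes that the find selects just the four hexadecimal digits (as the lookbehind/lookahead variant of the search would), so that the upper case conversion does not touch the x in &#x. The function name is hypothetical.

```python
import re

ENTITY = re.compile(r"&#x([0-9a-f]{4});")

def upper_hex_loop(text):
    """Find the next reference, upper-case its hex digits, repeat until nothing is found."""
    pos = 0
    while True:
        match = ENTITY.search(text, pos)
        if match is None:
            break  # nothing found anymore, the loop ends
        start, end = match.span(1)  # span of the hex digits only
        text = text[:start] + text[start:end].upper() + text[end:]
        pos = match.end()  # continue searching behind the current match
    return text

print(upper_hex_loop("&#x22ef; and &#x2013; and &#x21a0;"))
# &#x22EF; and &#x2013; and &#x21A0;
```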
Best regards from Austria