User to user discussion and support for UltraEdit, UEStudio, UltraCompare, and other IDM applications.

Find, replace, find in files, replace in files, regular expressions
4 posts Page 1 of 1
Hi, all. I am struggling with a replace and cannot figure out why. I have a large number of text files from which I need to remove a specific number format/phrase, replacing it with a newline, in all of them in one go. Some have a 1- or 2-digit number on the line before the phrase, which also needs to be removed, but some do not. (NO MORE THAN 2 DIGITS - I CANNOT USE [0-9]+ HERE, AS THERE ARE OTHERS THAT HAVE MORE DIGITS, where we need to leave the number in place but still remove the CONFIDENTIAL TS* PORTION.) See the example below for what needs to be removed: the 1- or 2-digit number line, if present, and the CONFIDENTIAL TS line. I can do it by running three different replaces with UltraEdit regex, but cannot seem to craft a single one as a "one-stop shop". Here are the three I've been running one after the other. I'd like to cover all three cases in one fell swoop, if possible. Any advice would be appreciated!

FIND: ^p[1-9][0-9]^pCONFIDENTIAL TS*^p
REPLACE: ^p

FIND: ^p[0-9]^pCONFIDENTIAL TS*^p
REPLACE: ^p

FIND: ^pCONFIDENTIAL TS*^p
REPLACE: ^p

EXAMPLE:
Office: 123-456-7899
Fax: 3555-000-000
Email: firstname.smith@domain.com
1
CONFIDENTIAL TSl00000001


Office: 123-456-7899
Fax: 3555-000-000
Email: firstname.smith@domain.com
7
CONFIDENTIAL TSl00000776

Office: 123-456-7899
Fax: 3555-000-000
Email: firstname.smith@domain.com
15
CONFIDENTIAL TS2000099876



Office: 123-456-7899
Fax: 3555-000-000
Email: firstname.smith@domain.com
123-667-6666
CONFIDENTIAL TS7000099876
----------------------------------------------------------------------------------------------------------------------------------------------------
SHOULD RESULT IN:
EXAMPLE:
Office: 123-456-7899
Fax: 3555-000-000
Email: firstname.smith@domain.com


Office: 123-456-7899
Fax: 3555-000-000
Email: firstname.smith@domain.com
Office: 123-456-7899
Fax: 3555-000-000
Email: firstname.smith@domain.com


Office: 123-456-7899
Fax: 3555-000-000
Email: firstname.smith@domain.com
123-667-6666
With the UltraEdit regular expression engine, the search string could be: ^{^p[1-9][0-9]++^pCONFIDENTIAL TS*$^}^{^pCONFIDENTIAL TS*$^}

With the Perl regular expression engine, the search string could be: (?:\r\n\d{1,2})?\r\nCONFIDENTIAL TS.*$

Both need an empty replace string. Let me know if I should explain one or both of these two expressions.

Note: In the UltraEdit regular expression search string, [0-9]++ matches any number of digits, so a line consisting only of digits is matched in full regardless of how many digits it has. A longer number is left untouched only if its line also contains at least 1 non-digit character.
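For anyone who wants to try the Perl expression outside UltraEdit, here is a minimal Python sketch of the same idea. It is an adaptation, not the exact search string: Python's "." also matches a carriage return, so [^\r\n]* stands in for .*$ to keep the line termination in place. The function name strip_confidential is hypothetical, and the sample text is taken from the example above with DOS line terminations assumed.

```python
import re

# Adapted from the Perl search string in this thread. In Python "." also
# matches "\r", so "[^\r\n]*" replaces ".*$" (assumption of this sketch).
PATTERN = re.compile(r"(?:\r\n\d{1,2})?\r\nCONFIDENTIAL TS[^\r\n]*")


def strip_confidential(text: str) -> str:
    """Remove the CONFIDENTIAL TS line and an optional 1- or 2-digit line above it."""
    return PATTERN.sub("", text)


sample = (
    "Email: firstname.smith@domain.com\r\n"
    "15\r\n"
    "CONFIDENTIAL TS2000099876\r\n"
    "\r\n"
    "Email: firstname.smith@domain.com\r\n"
    "123-667-6666\r\n"
    "CONFIDENTIAL TS7000099876\r\n"
)

cleaned = strip_confidential(sample)
print(cleaned)
```

The empty replacement string mirrors the advice above: the line termination after the CONFIDENTIAL TS line is not part of the match, so it remains in the text.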
Best regards from Austria
Thanks, Mofi. I see you used an "OR" for the UltraEdit one, which I had tried to do but couldn't get the first expression right to find either a 1- or 2-digit number. It looks like it's the [1-9][0-9]++^ part of it. However, I'm not sure I understand your note about the line with the number needing at least 1 non-digit character.

I will try the Perl one. I had something similar in Perl but did not have the (?: part at the beginning or the ? after the closing parenthesis. Could you explain what those do?

Thank you for your assistance.

LBurr
[0-9]++ in an UltraEdit regex search string means zero or more, not zero or one. This means that if there is a DOS line termination, then for example the number 123, and then another DOS line termination, the expression ^p[1-9][0-9]++^p matches although the number has 3 digits and not just 1 or 2. With any non-digit character in the line above the line with the string CONFIDENTIAL TS, the expression ^p[1-9][0-9]++^p never matches.
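The difference can be checked with a small Python sketch. It assumes a rough translation of the UltraEdit syntax into Python's re syntax (^p as \r\n and ++ as *), which is only an approximation of the UltraEdit engine. The variable names are made up for illustration.

```python
import re

# Rough Python equivalent of ^p[1-9][0-9]++^p (zero or more further digits).
zero_or_more = re.compile(r"\r\n[1-9][0-9]*\r\n")
# Strict variant: exactly 1 or 2 digits on the line.
one_or_two = re.compile(r"\r\n[0-9]{1,2}\r\n")

three_digits = "Email\r\n123\r\nCONFIDENTIAL TS1"
with_dashes = "Email\r\n123-667-6666\r\nCONFIDENTIAL TS1"

# The zero-or-more form accepts the 3-digit line, the strict form does not.
print(bool(zero_or_more.search(three_digits)))
print(bool(one_or_two.search(three_digits)))
# A non-digit character on the line prevents a match in both cases.
print(bool(zero_or_more.search(with_dashes)))
print(bool(one_or_two.search(with_dashes)))
```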

Explanation for Perl search string: (?:\r\n\d{1,2})?\r\nCONFIDENTIAL TS.*$

(?:...) ... non-marking / non-capturing group. The ?: after the opening parenthesis changes the group from a marking to a non-marking group. The Perl regular expression engine remembers the string found by the expression inside a marking group for back-referencing in the search or replace string. This is not done for a non-marking group. A non-marking group should always be used if the group serves a purpose other than back-referencing, as in this search string.

\r\n ... carriage return and line-feed.

\d{1,2} ... any digit at least 1 but not more than 2 times.

? ... the question mark has several meanings in Perl regex syntax, depending on the preceding character. After the multipliers + and * the question mark changes the matching behavior from greedy to non-greedy. After an opening parenthesis the question mark defines, together with the next 1 or 2 characters, the type of the group: non-marking with :, positive lookahead with =, negative lookahead with !, positive lookbehind with <=, negative lookbehind with <!, or a flag setting with valid flag letters (preceded by - to switch them off) before the closing parenthesis. Here, after the non-marking group, the question mark is interpreted as a multiplier meaning zero or one times. So the string found by the expression inside the non-marking group can occur once, but does not need to occur at all. In other words, the match for the expression in the non-marking group is optional. It would also be possible to use {0,1} after the closing parenthesis instead of the question mark.

.* ... any character except new line characters 0 or more times.

$ ... end of line without matching line termination or end of file if last line of file has no line termination.
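The elements explained above can be tried in isolation with a short Python sketch. Python's re module is used here only as a stand-in for a Perl-style engine; the test strings and variable names are invented for illustration.

```python
import re

# Non-marking (non-capturing) group: only the first group is remembered.
captured = re.match(r"(\d+)-(?:\d+)", "12-34").groups()

# "?" after a group makes it optional; "{0,1}" is the equivalent multiplier.
tests = ("c", "abc", "ababc")
optional_q = [bool(re.fullmatch(r"(?:ab)?c", s)) for s in tests]
optional_braces = [bool(re.fullmatch(r"(?:ab){0,1}c", s)) for s in tests]

# "?" after "*" switches from greedy to non-greedy matching.
greedy = re.match(r"a.*b", "aXbXb").group()
lazy = re.match(r"a.*?b", "aXbXb").group()

print(captured, optional_q, optional_braces, greedy, lazy)
```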

One more note about the UE regular expression search string: it requires that the last line of the file also has a line termination, because in UltraEdit and Unix syntax $ matches only if a newline character is found. End of file is not interpreted as end of line by the UltraEdit/Unix regex engine.
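If some files might end without a line termination, one possible workaround is to add one before running the replace. The following is only a sketch of such a preprocessing step done outside UltraEdit; the helper name ensure_final_newline is hypothetical, and DOS line terminations are assumed as the default.

```python
def ensure_final_newline(text: str, newline: str = "\r\n") -> str:
    """Append a line termination if the text does not already end with one."""
    if text and not text.endswith(("\r\n", "\n", "\r")):
        return text + newline
    return text


# The last line has no termination, so one is appended.
fixed = ensure_final_newline("Email: a@b.com\r\nCONFIDENTIAL TS1")
# Already terminated text is returned unchanged.
unchanged = ensure_final_newline("Email: a@b.com\r\n")
print(repr(fixed), repr(unchanged))
```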
Best regards from Austria