Find 1- OR 2-digit numbers along with phrase

LBurr · Oct 01, 2015, 14:49 UTC · #1

Hi, all. I am struggling with a replace and I cannot figure out why. I have a large number of text files from which I need to remove a specific number format/phrase and replace it with a newline, in all of them in one go. Some have a 1- OR 2-digit number on the line before the phrase, which also needs to be removed, but some do not. (NO MORE THAN 2 DIGITS: I CANNOT USE [0-9]+ HERE, AS THERE ARE OTHER ENTRIES WITH LONGER NUMBERS THAT WE NEED TO KEEP, but their CONFIDENTIAL TS* PORTION must still be removed.) See the example below: the 1- or 2-digit number lines and the CONFIDENTIAL TS lines are what needs to be removed.

I can do it by running three separate replaces with UltraEdit regular expressions, but I cannot seem to craft a single one for a "one-stop shop". Here are the three I've been running one after the other. I'd like to cover all three cases in one fell swoop, if possible. Any advice would be appreciated!

FIND: ^p[1-9][0-9]^pCONFIDENTIAL TS*^p
REPLACE: ^p

FIND: ^p[0-9]^pCONFIDENTIAL TS*^p
REPLACE: ^p

FIND: ^pCONFIDENTIAL TS*^p
REPLACE: ^p
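To check the three-step approach outside UltraEdit, here is a minimal Python sketch. It assumes the files use DOS (CR+LF) line endings. It translates ^p to \r\n and the UltraEdit * wildcard (any characters except a line break) to [^\r\n]*. The sample is a shortened, made-up version of the example below. Python's re module stands in for UltraEdit's engine here and is not the same thing.

```python
import re

# Hypothetical sample modeled on the example below, with DOS (CR+LF) line endings.
SAMPLE = "\r\n".join([
    "Email: a", "1", "CONFIDENTIAL TSl00000001", "",
    "Email: b", "15", "CONFIDENTIAL TS2000099876",
    "Email: c", "123-667-6666", "CONFIDENTIAL TS7000099876", "",
])

# Assumed Python translations of the three UltraEdit replaces:
# ^p -> \r\n, UltraEdit * (any characters except a line break) -> [^\r\n]*
THREE_PASSES = [
    r"\r\n[1-9][0-9]\r\nCONFIDENTIAL TS[^\r\n]*\r\n",  # 2-digit number line
    r"\r\n[0-9]\r\nCONFIDENTIAL TS[^\r\n]*\r\n",       # 1-digit number line
    r"\r\nCONFIDENTIAL TS[^\r\n]*\r\n",                # no number line
]

def three_pass_cleanup(text: str) -> str:
    # Run the replaces one after the other; each match becomes a single line break.
    for pattern in THREE_PASSES:
        text = re.sub(pattern, "\r\n", text)
    return text

print(three_pass_cleanup(SAMPLE).split("\r\n"))
# ['Email: a', '', 'Email: b', 'Email: c', '123-667-6666', '']
```

The order matters only in the sense that each pass handles one case. The 123-667-6666 line survives because none of the three patterns allows more than 2 digits on the number line.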

EXAMPLE:
Office: 123-456-7899
Fax: 3555-000-000
Email: [email protected]
1
CONFIDENTIAL TSl00000001

Office: 123-456-7899
Fax: 3555-000-000
Email: [email protected]
7
CONFIDENTIAL TSl00000776
Office: 123-456-7899
Fax: 3555-000-000
Email: [email protected]
15
CONFIDENTIAL TS2000099876

Office: 123-456-7899
Fax: 3555-000-000
Email: [email protected]
123-667-6666
CONFIDENTIAL TS7000099876
----------------------------------------------------------------------------------------------------------------------------------------------------
SHOULD RESULT IN:
EXAMPLE:
Office: 123-456-7899
Fax: 3555-000-000
Email: [email protected]

Office: 123-456-7899
Fax: 3555-000-000
Email: [email protected]
Office: 123-456-7899
Fax: 3555-000-000
Email: [email protected]

Office: 123-456-7899
Fax: 3555-000-000
Email: [email protected]
123-667-6666

Mofi · Oct 01, 2015, 15:41 UTC · #2

With an UltraEdit regular expression, the search string could be: ^{^p[1-9][0-9]++^pCONFIDENTIAL TS*$^}^{^pCONFIDENTIAL TS*$^}

With a Perl regular expression, the search string could be: (?:\r\n\d{1,2})?\r\nCONFIDENTIAL TS.*$

Both need an empty replace string. Let me know if I should explain one or both of these two expressions.

Note: The UltraEdit regular expression search string requires that a number line with more than 2 digits also contains at least 1 non-digit character (like 123-667-6666). Otherwise the entire number line is matched and removed, regardless of how many digits it has.
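Here is a minimal Python sketch of the one-pass Perl expression with an empty replace string. It assumes DOS line endings and uses a shortened, made-up sample. One adaptation is needed. In Python, . also matches a carriage return, so .*$ would swallow the CR of each CR+LF pair. [^\r\n]* is swapped in so the match stops at the end of the line.

```python
import re

# Hypothetical sample modeled on the example in the question (DOS line endings).
SAMPLE = (
    "Email: a\r\n1\r\nCONFIDENTIAL TSl00000001\r\n\r\n"
    "Email: b\r\n15\r\nCONFIDENTIAL TS2000099876\r\n"
    "Email: c\r\n123-667-6666\r\nCONFIDENTIAL TS7000099876\r\n"
)

# Python adaptation of (?:\r\n\d{1,2})?\r\nCONFIDENTIAL TS.*$
# [^\r\n]* replaces .*$ so the carriage return is not consumed.
ONE_PASS = re.compile(r"(?:\r\n\d{1,2})?\r\nCONFIDENTIAL TS[^\r\n]*")

def one_pass_cleanup(text: str) -> str:
    # Empty replacement string, as suggested for both expressions.
    return ONE_PASS.sub("", text)

print(one_pass_cleanup(SAMPLE).split("\r\n"))
# ['Email: a', '', 'Email: b', 'Email: c', '123-667-6666', '']
```

The 1- and 2-digit number lines are removed together with their CONFIDENTIAL TS lines, while 123-667-6666 is kept. The line break after each removed block is not part of the match, so it survives without needing ^p in the replace string.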

LBurr · Oct 01, 2015, 16:07 UTC · #3

Thanks, Mofi. I see you used an "OR" for the UltraEdit one. I had tried to do that too, but I couldn't get the first expression right to find either a 1- or 2-digit number. It looks like the [1-9][0-9]++ part is what I was missing. However, I'm not sure I understand your note that the line with the number must also contain at least 1 non-digit character, or else the expression will match the entire line regardless of how many digits it has.

I will try the Perl one. I had something similar in Perl, but without the (?: at the beginning and the ? after the closing parenthesis. Could you explain what those do?

Thank you for your assistance.

LBurr

Mofi · Oct 02, 2015, 05:31 UTC · #4

In an UltraEdit regex search string, [0-9]++ means zero or more times, not zero or one time. So if a DOS line termination is followed by, for example, the number 123 and then another DOS line termination, the expression ^p[1-9][0-9]++^p matches, even though the number has 3 digits rather than just 1 or 2. If the line above the line with the string CONFIDENTIAL TS contains any non-digit character, the expression ^p[1-9][0-9]++^p can never match.
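The difference between zero-or-more and zero-or-one can be checked with a short Python sketch. It assumes that UltraEdit's ++ behaves like Python's * and that "zero or one" maps to Python's ?. The number lines tested are made up, and each one is placed directly above a CONFIDENTIAL TS line.

```python
import re

# Assumed Python equivalents of the two multipliers discussed above:
# UltraEdit ++ (zero or more) -> Python *, zero or one -> Python ?
ZERO_OR_MORE = re.compile(r"\r\n[1-9][0-9]*\r\nCONFIDENTIAL TS")
ZERO_OR_ONE = re.compile(r"\r\n[1-9][0-9]?\r\nCONFIDENTIAL TS")

def matches(number_line: str) -> tuple[bool, bool]:
    # Put the candidate line directly above a CONFIDENTIAL TS line.
    text = f"Email: x\r\n{number_line}\r\nCONFIDENTIAL TS7000099876\r\n"
    return bool(ZERO_OR_MORE.search(text)), bool(ZERO_OR_ONE.search(text))

for line in ["7", "15", "123", "123-667-6666"]:
    print(line, matches(line))
# 7 (True, True)
# 15 (True, True)
# 123 (True, False)
# 123-667-6666 (False, False)
```

The 123 row shows the pitfall: zero-or-more accepts a pure 3-digit line. The 123-667-6666 row shows why a non-digit character on the number line keeps it safe.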

Explanation of the Perl search string: (?:\r\n\d{1,2})?\r\nCONFIDENTIAL TS.*$

(?:...) ... non-capturing group. The ?: after the opening parenthesis turns a capturing group into a non-capturing group. The Perl regular expression engine remembers the string matched by the expression inside a capturing group so that it can be back-referenced in the search or replace string. This is not done for a non-capturing group. A non-capturing group should always be used when the group is needed for something other than back-referencing, as in this search string.

\r\n ... carriage return and line feed (DOS line termination).

\d{1,2} ... any digit, at least 1 and at most 2 times.

? ... the question mark has several meanings in Perl regex syntax, depending on the preceding character. After the multipliers + and *, the question mark changes the matching behavior from greedy to non-greedy. After an opening parenthesis, the question mark together with the next 1 or 2 characters defines the type of the group: non-capturing with :, positive lookahead with =, negative lookahead with !, positive lookbehind with <=, negative lookbehind with <!, or flag values with - or + and a valid flag letter before the closing parenthesis. Here, after the non-capturing group, the question mark is interpreted as a multiplier meaning zero or one time. So the string matched by the expression inside the non-capturing group can exist once, but does not have to exist at all. In other words, the match for the expression in the non-capturing group is optional. It would also be possible to use {0,1} after the closing parenthesis instead of the question mark.

.* ... any character except newline characters, 0 or more times.

$ ... end of line (without matching the line termination), or end of file if the last line of the file has no line termination.
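The pieces explained above can be tried one at a time in Python, whose re module supports the same (?:...), \d{1,2}, ? and {0,1} syntax. As in Perl, the pattern relies on CR+LF line endings. [^\r\n]* stands in for .*$ because Python's . would also consume the carriage return. The sample strings and variable names are hypothetical.

```python
import re

PERL_STYLE = r"(?:\r\n\d{1,2})?\r\nCONFIDENTIAL TS[^\r\n]*"  # non-capturing group
CAPTURING = r"(\r\n\d{1,2})?\r\nCONFIDENTIAL TS[^\r\n]*"     # same, but capturing
BRACES = r"(?:\r\n\d{1,2}){0,1}\r\nCONFIDENTIAL TS[^\r\n]*"  # {0,1} instead of ?

text = "Email: x\r\n15\r\nCONFIDENTIAL TS2000099876\r\n"

# A capturing group remembers its match for back-referencing; a non-capturing one does not.
print(re.search(CAPTURING, text).groups())   # ('\r\n15',)
print(re.search(PERL_STYLE, text).groups())  # ()

# \d{1,2}: a 3-digit number line cannot be part of the optional group, so it survives.
three_digits = "Email: x\r\n123\r\nCONFIDENTIAL TS1\r\n"
print(repr(re.sub(PERL_STYLE, "", three_digits)))  # 'Email: x\r\n123\r\n'

# The ? after the group means zero or one time; {0,1} gives the same result.
print(re.sub(PERL_STYLE, "", text) == re.sub(BRACES, "", text))  # True
```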

One more note about the UE regular expression search string: it requires that the last line of the file also has a line termination, because in UltraEdit and Unix syntax $ returns true only if a newline character is found. End of file is not interpreted as end of line by the UltraEdit/Unix regular expression engine.
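Here is a rough Python model of this end-of-file difference, under stated assumptions. The lookahead (?=\r\n) imitates an end of line that requires a real line termination, as described for UltraEdit's $. The lookahead (?=\r\n|\Z) imitates the Perl $ described above, which also accepts the end of the file. Neither is UltraEdit's actual engine, and the sample text is made up.

```python
import re

# Hypothetical file content whose last line has no line termination.
text = "Email: x\r\n7\r\nCONFIDENTIAL TS1"

# Model of an end of line that needs a real CR+LF (assumption, mimics UltraEdit's $).
NEEDS_TERMINATOR = r"(?:\r\n\d{1,2})?\r\nCONFIDENTIAL TS[^\r\n]*(?=\r\n)"
# Model of Perl's $: end of line, or end of file if the last line is unterminated.
LINE_OR_FILE_END = r"(?:\r\n\d{1,2})?\r\nCONFIDENTIAL TS[^\r\n]*(?=\r\n|\Z)"

print(repr(re.sub(NEEDS_TERMINATOR, "", text)))  # 'Email: x\r\n7\r\nCONFIDENTIAL TS1'
print(repr(re.sub(LINE_OR_FILE_END, "", text)))  # 'Email: x'
```

In practice this means that, before running the UltraEdit expression, a file whose last line is a CONFIDENTIAL TS line needs a line termination appended.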