Regex find phone number

Newbie

    Jul 05, 2023#1

    I'm looking for a Perl regular expression to find a 10-digit number with optional spaces or other characters inside it. The regex should find these examples:

    1234567890
    123-456-7890
    123/456-7890
    123.456.7890
    123 456 7890

    This UltraEdit regex works:

    [0-9][~0-9]++[0-9][~0-9]++[0-9][~0-9]++[0-9][~0-9]++[0-9][~0-9]++[0-9][~0-9]++[0-9][~0-9]++[0-9][~0-9]++[0-9][~0-9]++[0-9]

    But I can't figure out the equivalent Perl regular expression.

    Master

      Jul 05, 2023#2

      Hi,

      for example this Perl regexp:

      (?<!\d)(\d[^\r\n\d]{0,3}){9}\d(?!\d)

      I limited the number of non-digit characters between two digits to at most 3, but you can adjust that as you wish. I also added lookarounds to ensure that there are no other digits directly before or after a match.
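To see how this pattern behaves outside UltraEdit, here is a minimal Python sketch. It assumes Python's re module as a stand-in for UltraEdit's Perl engine (both support the (?<!...) lookbehind and (?!...) lookahead used here) and only models a forward search. The pattern is split into commented pieces and tried on the examples from the first post.

```python
import re

# Fleggy's expression from post #2, split into commented pieces.
# Joined together, the pieces give exactly (?<!\d)(\d[^\r\n\d]{0,3}){9}\d(?!\d)
PHONE = re.compile(
    r"(?<!\d)"                 # lookbehind: no digit directly before the match
    r"(\d[^\r\n\d]{0,3}){9}"   # 9 times: one digit, then 0-3 chars that are neither digits nor line breaks
    r"\d"                      # the 10th digit
    r"(?!\d)"                  # lookahead: no digit directly after the match
)

# The examples from post #1.
EXAMPLES = [
    "1234567890",
    "123-456-7890",
    "123/456-7890",
    "123.456.7890",
    "123 456 7890",
]


def find_phone(text):
    """Return the first 10-digit match in text, or None if there is none."""
    m = PHONE.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for text in EXAMPLES:
        print(f"{text!r} -> {find_phone(text)!r}")
    # Eleven digits in a row: the lookarounds reject every 10-digit window.
    print(find_phone("12345678901"))  # prints None
```

In this sketch, a forward search in "1-234-567-8901" also returns 234-567-8901: a match starting at the leading 1 would be followed by an 11th digit and fail the lookahead.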

      BR, Fleggy

      Newbie

        Jul 05, 2023#3

        Wow, thanks a lot Fleggy! That works beautifully, except in one rare case that I forgot to mention: With US/Canada numbers, people sometimes put a 1 before the 10-digit number.

        For example, 1-234-567-8901.  In that case, if I search "Previous" from the end of the line, your regex finds 234-567-8901, which is perfect.

        But if there are no separators, as in 12345678901, I'd like a "Previous" search from the end of the line to find 2345678901. Is it possible to modify the regex to find it?

        Grand Master

          Jul 05, 2023#4

          Perhaps the search expression (?:1-)?(?<!\d)(?:\d[^\r\n\d]{0,3}){9}\d(?!\d) can be used. It optionally matches 1- once as the first part of the phone number string. This Perl regular expression works for the examples in both search directions with UltraEdit for Windows v2023.0.0.50. It could not be tested on strings that should not be matched, as the examples contain only strings to match.
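The effect of the optional prefix can be checked with a similar Python sketch. It again assumes Python's re module as a stand-in for UltraEdit's engine and models only a forward search, not UltraEdit's "Previous" direction. The 1- prefix becomes part of the match, while an unseparated 11-digit run is still rejected by the lookarounds.

```python
import re

# Expression from post #4: the 10-digit pattern, preceded by an optional
# "1-" prefix. (?:...) groups without capturing.
PHONE_WITH_PREFIX = re.compile(r"(?:1-)?(?<!\d)(?:\d[^\r\n\d]{0,3}){9}\d(?!\d)")


def find_phone_with_prefix(text):
    """Return the first match (including a leading "1-", if present) or None."""
    m = PHONE_WITH_PREFIX.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for text in ["123-456-7890", "1-234-567-8901", "12345678901"]:
        print(f"{text!r} -> {find_phone_with_prefix(text)!r}")
```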

          Why are the regular expression searches executed upwards from the end of a line?

          You can append $ to the expression so that a phone number matches only when it is found at the end of a line, and then search downwards as usual.
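In Python, $ matches only at the end of the whole string unless the re.MULTILINE flag is set, so a line-anchored version of the expression needs that flag there. A minimal sketch, assuming lines separated by \n only (with \r\n endings, $ would fall after the \r and the match would fail). The second sample line is made up for illustration.

```python
import re

# Post #4's expression with "$" appended, so a phone number only counts
# when it ends the line.
AT_LINE_END = r"(?:1-)?(?<!\d)(?:\d[^\r\n\d]{0,3}){9}\d(?!\d)$"

# First line taken from post #5; the second line is hypothetical.
TEXT = (
    "Pat Miller, 1234 Main Street, Mudville MD 01234, US, (123) 456-7890\n"
    "Jane Roe, 12 Elm Road, 123.456.7890"
)

# re.MULTILINE makes "$" match at every line end ...
per_line = re.findall(AT_LINE_END, TEXT, flags=re.MULTILINE)
# ... without it, "$" matches only at the end of the whole string.
whole_string = re.findall(AT_LINE_END, TEXT)

print(per_line)
print(whole_string)
```

Note that the match on the first line starts at 123 and leaves the ( in place, because the expression only allows separators between digits.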

          Note: The Perl regular expression engine interprets the character class \d as any digit according to Unicode, which is not the same as the character class [0-9], which matches only ASCII digits. For example, ൧౨೩/౪۵6-৭൮੯ is also matched by this Perl regular expression. See the answers to the Stack Overflow question "Should I use \d or [0-9] to match digits in a Perl regex?" If the searched file is Unicode encoded, (?:1-)?(?<![0-9])(?:[0-9][^\r\n0-9]{0,3}){9}[0-9](?![0-9]) can be used to be more restrictive.
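The same difference can be reproduced with Python's re module, where \d in a str pattern matches any Unicode decimal digit, while [0-9] matches only ASCII digits. This is Python's behaviour, used here only as an analogy to the Perl engine. The test string of ten Arabic-Indic digits in the 123-456-7890 layout is chosen for illustration.

```python
import re

# Ten Arabic-Indic digits (U+0660 to U+0669) laid out like 123-456-7890.
# They are Unicode decimal digits, but not ASCII 0-9.
ARABIC_INDIC = "\u0661\u0662\u0663-\u0664\u0665\u0666-\u0667\u0668\u0669\u0660"

UNICODE_DIGITS = r"(?:1-)?(?<!\d)(?:\d[^\r\n\d]{0,3}){9}\d(?!\d)"
ASCII_DIGITS = r"(?:1-)?(?<![0-9])(?:[0-9][^\r\n0-9]{0,3}){9}[0-9](?![0-9])"

# \d accepts any Unicode decimal digit, so this matches.
print(bool(re.search(UNICODE_DIGITS, ARABIC_INDIC)))
# [0-9] accepts only ASCII digits, so this does not.
print(bool(re.search(ASCII_DIGITS, ARABIC_INDIC)))
# The re.ASCII flag restricts \d to [0-9] as well.
print(bool(re.search(UNICODE_DIGITS, ARABIC_INDIC, flags=re.ASCII)))
```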
          Best regards from an UC/UE/UES for Windows user from Austria

          Newbie

            Jul 05, 2023#5

            Thanks a lot for the thorough information. My files are plain ASCII, so Unicode isn't a problem, and either [0-9] or \d is fine.

            I'm using this regex in a macro that cleans phone numbers by removing extra characters.  I run the macro when the cursor is at the end of the line because there might be other numeric information before the phone number which I don't want to change.  For example:
            Pat Miller, 1234 Main Street, Mudville MD 01234, US, (123) 456-7890 should change to:
            Pat Miller, 1234 Main Street, Mudville MD 01234, US, 1234567890 (change the phone number and leave the rest as it is)

            If there's a 1 before the phone number, it can be left alone or, better yet, deleted. Either way is fine. For example:
            Pat Miller, 1234 Main Street, Mudville MD 01234, US, 1(123) 456-7890 should change to:
            Pat Miller, 1234 Main Street, Mudville MD 01234, US, 11234567890 or
            Pat Miller, 1234 Main Street, Mudville MD 01234, US, 1234567890

            Grand Master

              Jul 05, 2023#6

              For the given examples, the following macro code can be used:

              InsertMode
              ColumnModeOff
              HexOff
              Top
              PerlReOn
              Find MatchCase RegExp "(?:1[ (\-]|\()?([0-9]{1,3})[^\r\n0-9]{1,2}([0-9]{1,3})[^\r\n0-9]{1,2}([0-9]{1,4})$"
              Replace All "\1\2\3"
              Best regards from an UC/UE/UES for Windows user from Austria
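To follow what the macro's Find and Replace All steps do, the same expression can be run through Python's re.sub. This is a sketch under assumptions: Python stands in for UltraEdit, and lines end in \n. The pattern is split into commented pieces. The replacement keeps only the three captured digit blocks, so the optional leading 1, a lone opening parenthesis and all separators are dropped.

```python
import re

# The Find expression from the macro in post #6, split into commented
# pieces. Joined together, it is the same string as in the macro.
PHONE_AT_EOL = re.compile(
    r"(?:1[ (\-]|\()?"   # optional prefix: "1 ", "1(", "1-" or a lone "("
    r"([0-9]{1,3})"      # group 1: first digit block
    r"[^\r\n0-9]{1,2}"   # 1-2 separator chars (no digits, no line breaks)
    r"([0-9]{1,3})"      # group 2: second digit block
    r"[^\r\n0-9]{1,2}"   # 1-2 separator chars
    r"([0-9]{1,4})$",    # group 3: last digit block, must end the line
    re.MULTILINE,        # let "$" match at the end of every line
)


def clean_phone_numbers(text):
    """Replace each line-final phone number with its three digit groups."""
    return PHONE_AT_EOL.sub(r"\1\2\3", text)


if __name__ == "__main__":
    sample = (
        "Pat Miller, 1234 Main Street, Mudville MD 01234, US, (123) 456-7890\n"
        "Pat Miller, 1234 Main Street, Mudville MD 01234, US, 1(123) 456-7890"
    )
    print(clean_phone_numbers(sample))
```

In this sketch both sample lines from post #5 end in 1234567890. A number written without any separators, such as 1234567890, is not matched at all (each separator run needs at least one character), so it is simply left unchanged.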

              Newbie

                Jul 05, 2023#7

                Thank you very much for your solutions! Now I have to figure out what the expressions mean.