Find over multiple lines in a column


    Jan 19, 2017#1

    I need to recognize and reformat email addresses (and other similar data formats) in a text file. The information in this file is divided into 2 columns separated by a tab character, but a single data structure may span multiple lines within one column. The last lines of each 'record' are optional and, if present, span both columns.

    Example: see the attached image 2017-01-19 01_55_29-Start.png (15.96 KiB).


    I am able to recognize the email address by using the UltraEdit regular expression [^.^-_0-9a-z]+^@[^.^-_0-9a-z^p]+.[a-z]+, even over multiple lines if there are no columns. But I'm curious if I can do a search over multiple lines in a specific column. And if possible, how do I eliminate the tab or CR/LF in the email address string?

    A straightforward approach could be to start by reformatting the file into one column (for example, moving the right column under the last line of the left column), but the last line may contain free text, which makes it difficult to recognize the last line of the left column.

    Anybody have an idea?

    Art


      Jan 19, 2017#2

      The case-insensitive UltraEdit tagged regular expression search string which might work for this task is:

      ^([0-9a-z.^-]+@[0-9a-z]*[.^-]^)^p^(*^t^)^([0-9a-z.^-]+^)$

      The replace string to use is: ^1^3^p^2

      The search string treats a string containing @ as spanning two lines only if the last character on the line is a dot or a dash.
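For readers more used to Perl-style syntax, the UltraEdit expression above can be approximated in Python. This is only a sketch: the translation of the pattern, the sample text, the LF line endings and the name `join_split_email` are assumptions made for illustration and were not tested in UltraEdit.

```python
import re

# Assumed Perl-style translation of the UltraEdit expression:
#   group 1: start of the address, ending in a dot or dash at the end of a line
#   group 2: left column of the next line, including the tab
#   group 3: continuation of the address in the right column
PATTERN = re.compile(
    r"([0-9a-z.\-]+@[0-9a-z]*[.\-])\n(.*\t)([0-9a-z.\-]+)$",
    re.IGNORECASE | re.MULTILINE,
)


def join_split_email(text):
    """Pull the continuation up to the first line (like the replace string ^1^3^p^2)."""
    return PATTERN.sub(r"\1\3\n\2", text)


# Made-up sample: two tab-separated columns, address split after a dot.
sample = "left1\tjohn.doe@example.\nleft2\tcom\n"
result = join_split_email(sample)
print(repr(result))
```

The second line keeps its left column and the tab, while its right column ends up empty.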
      Best regards from an UC/UE/UES for Windows user from Austria


        Jan 19, 2017#3

        Hi,

        Just for inspiration: you can modify the following Perl regex to search in columns. Column mode must be ON!

        E.g. to find abc and efg, each preceded by exactly 10 characters on its line:
        (?<=^.{10})abc.*\r\n.{10}efg
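The lookbehind idea can be tried outside UltraEdit as well. The following is a minimal sketch in Python, whose `re` module accepts this fixed-width lookbehind; the sample text is made up and Python's behaviour is not guaranteed to match UltraEdit's Perl engine in every detail.

```python
import re

# Two made-up lines; both search words are preceded by exactly 10 characters.
text = "0123456789abc xyz\n0123456789efg\n"

# (?<=^.{10}) asserts that exactly 10 characters of the same line precede the match.
pattern = re.compile(r"(?<=^.{10})abc.*\r?\n.{10}efg", re.MULTILINE)

match = pattern.search(text)
print(match.start(), repr(match.group(0)))
```

Note that the match also contains the first 10 characters of the second line, because only the first line is anchored by the lookbehind.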

        BR, Fleggy


          Jan 19, 2017#4

          Thanks Fleggy,

          I have been testing with it for quite some time now. I'm not unfamiliar with regex, but I can't get the Perl regex to work for me.

          Code:

          kjbasjbas ew8q9rfy73476134fuo13496      email:                                  
          [email protected] wqpiuqerp;b wdoiqjhwri  [email protected]                          
          fqkhwf8jnbuqrhfpqweprfqiruhfquihr       def oqrjgqeprgj[]qegrjg'j               
          qe[ghje[itojh[iqeqérgji[qerjgiqjrjíqrjgqpjergopjegopqj]]]]]                     
          
          First I had to discover that tab stops have no influence on column mode. A tab is just regarded as one character, and any tab setting is ignored (which is understandable, by the way, since this is an application setting).
          See the example above. With your solution I should be able to select the word email: (on line 1, starting at column 40) and at least the word "test" (on line 2 starting at column 40), but I cannot get it to work. Searching the internet didn't turn up any other suggestions.
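The point about tabs counting as a single character can be illustrated with a short Python sketch. The sample line and the tab stop of 8 are assumptions; expanding tabs to spaces before searching is one possible workaround, not something suggested in this thread.

```python
import re

line = "abc\temail:"  # left column, one tab, then the right column

# With the raw tab, "email:" is preceded by only 4 characters ...
raw = re.search(r"(?<=^.{4})email:", line)

# ... but after expanding the tab (tab stop of 8 assumed) it is preceded by 8,
# which corresponds to the position seen on screen.
expanded = line.expandtabs(8)
visual = re.search(r"(?<=^.{8})email:", expanded)

print(raw is not None, visual is not None)
```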

          In your example, the find regex (?<=^.{40})email:.*\r\n.{40}test.*$ should select the entire string email:[email protected], but only selects email and test.
          The search regex (?<=^.{40})email:.*\r\n.{40}[email protected] doesn't find anything.

          I hope that you can help me out. Otherwise I will consider a more rigid solution, like copying each column of each record to a new file. Searching in one column is a lot easier. :)

          Peter


            Jan 20, 2017#5

            Hi Art,

            I modified your pattern a little (so that it does not select trailing whitespace):

            (?<=^.{40})email:.*\r\n.{40}test.*?(?=\s*$)
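How the lazy `.*?` and the lookahead `(?=\s*$)` keep trailing whitespace out of the match can be checked with a small Python sketch. The 40-character left column and the address `test@example.com` are made up for illustration; the behaviour in UltraEdit may differ.

```python
import re

left = "x" * 40  # made-up left column, padded with 40 non-tab characters
text = left + "email:\n" + left + "test@example.com   \n"

pattern = re.compile(
    r"(?<=^.{40})email:.*\r?\n.{40}test.*?(?=\s*$)",
    re.MULTILINE,
)

m = pattern.search(text)
# The lazy .*? stops as soon as only whitespace remains up to the line end.
print(repr(m.group(0)[-20:]))
```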

            It works for me in your sample using UE 23.20.0.43 and UE 24.00.0.10 BETA (both x64). What is your version? Try to change the setting Editor display -> Cursor/Caret -> allow positioning beyond line end (I have this option ON). Or maybe Mofi would have an idea why it doesn't work for you.

            BTW I found a bug in UE connected to this case and will report it. Always do CTRL+HOME before searching this pattern.

            BR, Fleggy


            EDIT: I don't use TABs - always SPACEs only.


              Jan 22, 2017#6

              Thanks Fleggy!

              I work with the latest Windows version.

              Your post helped me a lot, but unfortunately it didn't bring the solution I was looking for. The difficulty is that email addresses may span 1, 2 or even 3 lines within this column. I didn't succeed in finding a Perl regex pattern to solve this, but your pattern to search in specific columns helped me in other search actions as well.

              This is how I solved the email find.

              Code:

              InsertMode
              ColumnModeOff
              HexOff
              Key Ctrl+HOME
              Loop 2000
                IfEof
                  ExitLoop
                Else
                  Key Ctrl+HOME
                  PerlReOn
                  Find RegExp "(?<=^.{40})EMAIL:\r\n"
                  IfFound
                    GotoEndOfPrevWordSelect
                    GotoLine 0 41
                    Key DOWN ARROW
                    StartSelect
                      Key END
                    EndSelect
                    Cut
                    Key UP ARROW
                    Key END
                    Paste
                    ClearClipboard
                    Key END
                    StartSelect
                      Key LEFT ARROW
                    EndSelect
                    PerlReOn
                    Find RegExp SelectText "[\.\-@]" 'Line breaks in the selection are always after a ./-/@, so if this is the last character I have to consider the second line as well.
                    IfFound
                      GotoLine 0 41
                      Key DOWN ARROW
                      Key DOWN ARROW
                      StartSelect
                        Key END
                      EndSelect
                      Cut
                      Key UP ARROW
                      Key UP ARROW
                      Key END
                      Paste
                      ClearClipboard
                      Key END
                      StartSelect
                        Key LEFT ARROW
                      EndSelect
                      PerlReOn
                      Find RegExp SelectText "[\.\-@]" 'Line breaks in the selection are always after a ./-/@, so if this is the last character I have to consider the third line as well.
                      IfFound
                        GotoLine 0 41
                        Key DOWN ARROW
                        Key DOWN ARROW
                        Key DOWN ARROW
                        StartSelect
                          Key END
                        EndSelect
                        Cut
                        Key UP ARROW
                        Key UP ARROW
                        Key UP ARROW
                        Key END
                        Paste
                        ClearClipboard
                        Key END
                        StartSelect
                          Key LEFT ARROW
                        EndSelect
                        PerlReOn
                        Find RegExp SelectText "\w" 'The last character of the third must be alphanumeric. If not, the constructed email address is incorrect.
                        IfNotFound
                          'Still have to fix a rollback
                        EndIf
                      EndIf
                    EndIf
                  Else
                    ExitLoop
                  EndIf
                EndIf
              EndLoop
              Key Ctrl+HOME
              
              IMHO it is really annoying that UE macros have no support for remarks in the code.

              I did encounter other bugs as well, like the undo after a macro containing Perl find/replace commands. It totally messed up the file and I had to start over again.


                Jan 23, 2017#7

                Art, see the sticky macro forum topic "Macro examples and reference for beginners and experts" for how to save a macro as text with comments, in addition to its compiled form in a macro file, and how to get this text representation syntax highlighted and indented.

                The command Top can be used for Key Ctrl+HOME.

                The macro reference file states that IfEof should be used only if it is guaranteed that the caret ever reaches the end of the file. When using Find in a loop with an indefinite number of iterations, it is highly recommended to exit the loop with IfFound or its opposite IfNotFound instead of IfEof, because the caret is not moved to the end of the file when the searched string is not found.
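The same loop design, exiting on "not found" rather than on "end of file", can be sketched in Python. The function name `find_all` is hypothetical and the code only mirrors the idea; it is not a translation of any macro command.

```python
def find_all(text, needle):
    """Collect the offsets of all occurrences, stopping when the search fails."""
    hits = []
    pos = 0
    while True:
        pos = text.find(needle, pos)
        if pos == -1:  # comparable to IfNotFound followed by ExitLoop
            break
        hits.append(pos)
        pos += len(needle)  # continue searching after the current hit
    return hits


print(find_all("EMAIL: a EMAIL: b", "EMAIL:"))
```

A failed search never advances the position to the end of the text, so a test for "end reached" would not terminate this loop either.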

                For example your macro code could be saved into a *.uem file as follows:

                Code:

                InsertMode
                ColumnModeOff
                HexOff
                PerlReOn
                Top
                Clipboard 9
                Loop 0
                    Find RegExp "(?<=^.{40})EMAIL:\r\n"
                    IfNotFound
                        ExitLoop
                    EndIf
                    GotoLine 0 41
                    Key DOWN ARROW
                    StartSelect
                    Key END
                    EndSelect
                    Cut
                    Key UP ARROW
                    Key END
                    Paste
                    Key END
                    StartSelect
                    Key LEFT ARROW
                    EndSelect
                //  Line breaks in the selection are always after a ./-/@, so if this
                //  is the last character I have to consider the second line as well.
                    Find RegExp SelectText "[.\-@]"
                    IfFound
                        GotoLine 0 41
                        Key DOWN ARROW
                        Key DOWN ARROW
                        StartSelect
                        Key END
                        EndSelect
                        Cut
                        Key UP ARROW
                        Key UP ARROW
                        Key END
                        Paste
                        Key END
                        StartSelect
                        Key LEFT ARROW
                        EndSelect
                //      Line breaks in the selection are always after a ./-/@, so if this
                //      is the last character I have to consider the third line as well.
                        Find RegExp SelectText "[.\-@]"
                        IfFound
                            GotoLine 0 41
                            Key DOWN ARROW
                            Key DOWN ARROW
                            Key DOWN ARROW
                            StartSelect
                            Key END
                            EndSelect
                            Cut
                            Key UP ARROW
                            Key UP ARROW
                            Key UP ARROW
                            Key END
                            Paste
                            Key END
                            StartSelect
                            Key LEFT ARROW
                            EndSelect
                //          The last character of the third must be alphanumeric.
                //          If not, the constructed email address is incorrect.
                            Find RegExp SelectText "\w"
                            IfNotFound
                //          Still have to fix a rollback.
                            EndIf
                        EndIf
                    EndIf
                EndLoop
                Top
                ClearClipboard
                Clipboard 0
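For readers who do not use UltraEdit, the intent of the macro above can be sketched in Python. This is an approximation under stated assumptions, not a translation: the right column is assumed to start at a fixed character offset of 40, the label is assumed to stand alone in the right column, and the name `join_email_lines` is hypothetical.

```python
COL = 40  # assumed zero-based offset of the right column


def join_email_lines(lines, col=COL, max_parts=3):
    """Append the right column of following lines to an EMAIL: label line.

    The first continuation is always taken; a further one is taken only
    while the joined text ends in '.', '-' or '@' (at most max_parts lines).
    """
    out = [line.rstrip("\r\n") for line in lines]
    for i, line in enumerate(out):
        if line[col:].rstrip().upper() != "EMAIL:":
            continue
        joined = ""
        for j in range(i + 1, min(i + 1 + max_parts, len(out))):
            joined += out[j][col:].strip()
            out[j] = out[j][:col].rstrip()  # keep only the left column
            if not joined or joined[-1] not in ".-@":
                break
        out[i] = line.rstrip() + joined
    return out


# Made-up sample with an address split over three lines.
sample = [
    "a".ljust(40) + "EMAIL:",
    "b".ljust(40) + "john.doe@",
    "c".ljust(40) + "example.",
    "d".ljust(40) + "com",
    "e".ljust(40) + "other",
]
fixed = join_email_lines(sample)
print(fixed[0])
```

As in the macro, the case where the last part does not end in an alphanumeric character is not rolled back here.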
                
                I think the reformatting task could be done more easily using a macro with several UltraEdit tagged regular expression or Perl regular expression replaces using backreferences. But it is very difficult to help you with this reformatting task without examples showing us (nearly) real data lines before and the same lines after reformatting. We have to make up our own example lines, and, as the past has proven many times, those most likely do not represent the real data. So for better help with your reformatting task, post two code blocks showing us lines before and after reformatting with all possible variations (lines not to modify, lines with an email address over 2 lines, lines with an email address over 3 lines, etc.).

                BTW: The dot in Perl/Unix syntax has no special meaning inside a character class (square brackets) and therefore need not be escaped with a backslash there. Escaping a dot so that it is interpreted as a literal character is necessary in a Perl/Unix regular expression only outside a character class.
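The rule about the dot can be verified quickly with Python's `re` module, which treats the dot inside a character class the same way; the variable names are chosen only for this illustration.

```python
import re

dot_in_class = re.compile(r"[.]")       # literal dot only
escaped_in_class = re.compile(r"[\.]")  # same meaning, the backslash is redundant
dot_outside = re.compile(r".")          # any character except a line break

for candidate in (".", "a"):
    print(
        repr(candidate),
        bool(dot_in_class.fullmatch(candidate)),
        bool(escaped_in_class.fullmatch(candidate)),
        bool(dot_outside.fullmatch(candidate)),
    )
```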


                  Feb 03, 2017#8

                  Well, actually I managed to do it without the macro, using 4 expressions for different patterns and without the column setting. For example, this expression corrects the email address in the right column (columns divided by TABs) when the email address and the label are separated over 3 lines:

                  Code:

                  InsertMode
                  ColumnModeOff
                  HexOff
                  Top
                  PerlReOn
                  Find RegExp "\tEMAIL:\r\n^(.*?)[ \t]+([0-9a-z\-_.]{1,100}\@[0-9a-z\-_.]{1,100}\.[a-z]{2,6})"
                  Replace All "\t<emailnew>\L\2\E</emailnew>\r\n\1\r\n"
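The replace above can be approximated in Python for experimenting outside UltraEdit. Python's `re` has no `\L...\E`, so the lowercasing is done in a replacement function; the case-insensitive flag, the sample text and the name `tag_email` are assumptions for this sketch.

```python
import re

PATTERN = re.compile(
    r"\tEMAIL:\r\n^(.*?)[ \t]+"
    r"([0-9a-z\-_.]{1,100}@[0-9a-z\-_.]{1,100}\.[a-z]{2,6})",
    re.IGNORECASE | re.MULTILINE,  # case-insensitive search is assumed
)


def tag_email(text):
    """Move the address up to the label line, lowercased and wrapped in a tag."""
    def repl(m):
        # Stands in for \L\2\E in the original replace string.
        email = m.group(2).lower()
        return "\t<emailnew>" + email + "</emailnew>\r\n" + m.group(1) + "\r\n"
    return PATTERN.sub(repl, text)


# Made-up sample with CRLF line endings and a tab between the columns.
sample = "left1\tEMAIL:\r\nleft2 \tJohn.Doe@Example.COM\r\n"
result = tag_email(sample)
print(repr(result))
```

In this sketch the replacement ends with a line break while the match does not consume the one following the address, so an empty line follows the left column text; whether that is intended in the original is not stated.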
                  
                  Thanks for your help!