Find string B somewhere before string A

Newbie

    Jan 28, 2005#1

    Perhaps someone can help.

    In a group of 15,000 html files, a few contain a certain error I need to fix.

    Every file contains string A and string B.

    String B should appear only after string A, never before it.

    What I want to do is this:

    Find in Files, all instances where string B appears before string A. (That means anywhere before... it could be immediately before, or several lines before).

    UltraEdit will then generate a list of instances. (I can fix them manually since there won't be many.)

    I can't seem to figure out what expression will find this. Do concepts like "somewhere before" and "somewhere after" (maybe spanning multiple lines) translate into any regexp syntax?

    Any insight is appreciated.

    Grand Master

      Jan 28, 2005#2

      Try Find in Files with the Results to Edit Window option and the following regular expression in UltraEdit style:

      string B[~|]+string A

      The character | is just an example; use any character that does not occur in any of your files.
      Best regards from an UC/UE/UES for Windows user from Austria
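
For readers without UltraEdit, the same "B somewhere before A" test can be sketched in Python. This is only an illustration using Python's `re` module, not UltraEdit's engine; the marker strings and the helper name `b_precedes_a` are assumptions made for the example. Unlike `[~|]+`, the `.*?` used here also matches when string B sits immediately before string A.

```python
import re

# Hypothetical markers; substitute the real strings.
STRING_A = "string A"
STRING_B = "string B"

# re.escape protects any regex metacharacters in the markers.
# re.DOTALL lets "." match line breaks; ".*?" is non-greedy, so the
# match ends at the first STRING_A that follows STRING_B.
B_BEFORE_A = re.compile(
    re.escape(STRING_B) + r".*?" + re.escape(STRING_A), re.DOTALL
)

def b_precedes_a(text: str) -> bool:
    """Return True if STRING_B occurs anywhere before some STRING_A."""
    return B_BEFORE_A.search(text) is not None
```

A file whose text makes `b_precedes_a` return True is one of the files that needs fixing.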

      Basic User

        Jan 28, 2005#3

        The following macro lets you test whether StringA comes before StringB
        in a file:

        InsertMode
        ColumnModeOff
        HexOff
        UnixReOn
        EndSelect
        Top
        Find "StringA"
        IfFound
        StartSelect
        SelectToBottom
        Key Ctrl+END
        Find  "StringB"
        Replace All SelectText "StringB"
        IfFound
        EndSelect
        Do what you want, you know StringA is before StringB
        EndIf
        EndIf
        
        HTH (Hope This Helps)
        Never forget: "Above the clouds, The sky is blue and the sun shine"

          Newbie

          Jan 31, 2005#4

          OK.. success! Thank you both for the replies.

          How funny that while looking for an arcane syntax, I didn't think of the most basic concept... use Find. Find string B, then find string A. If found, it means *error*!

          So, I wrote a macro that does this. It works from a list of the html filenames: it opens each file, searches for the error, and, if the error is found, adds the file to a list of only the files with the error.

          May not be the most efficient thing to do, but it worked!
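
The same batch check can be sketched outside UltraEdit. The Python below is a minimal sketch, not the macro described above: the function names, the `*.html` pattern, and the UTF-8 encoding are all assumptions. It follows the same idea: find the first string B, then look for string A after it.

```python
from pathlib import Path

def b_before_a(text, string_a, string_b):
    """Find the first string_b, then look for string_a after it."""
    pos_b = text.find(string_b)
    if pos_b == -1:
        return False
    # Any string_a that follows some string_b also follows the first one.
    return text.find(string_a, pos_b + len(string_b)) != -1

def files_with_error(folder, string_a, string_b, pattern="*.html"):
    """Return names of files under folder where string_b precedes string_a."""
    bad = []
    for path in sorted(Path(folder).rglob(pattern)):
        # errors="replace" keeps the scan going on oddly encoded files.
        text = path.read_text(encoding="utf-8", errors="replace")
        if b_before_a(text, string_a, string_b):
            bad.append(path.name)
    return bad
```

Calling `files_with_error("some_folder", "string A", "string B")` returns the short list of files to fix by hand; `some_folder` is a placeholder.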

          I tried to use your regexp, Mofi, because it would be much simpler to execute than what I did. But no success. Entering it just as you wrote it (substituting my strings) came up with nothing found. That was with UE regexp (and I tried Unix regexp just to be sure I wasn't missing something).

          What does the "~" character mean? I could not find this anywhere. If you would not mind would you elaborate in English what the expression does? Thanks!

          Grand Master

            Jan 31, 2005#5

            "~" is described in the help under UltraEdit-style regular expressions.

            [~|]+ means: Find one or more occurrences of any character (including \r\n) except |.

            The problem with string B[~|]+string A is that UltraEdit sometimes has trouble identifying where [~|]+ should stop. It should stop at the first occurrence of "string A", but this does not always work.
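
The "where to stop" question can be illustrated with Python's `re` module, where the negated class is written `[^|]` rather than `[~|]`. This is only a sketch of the general greedy versus non-greedy distinction in a different engine; it is not a claim about how UltraEdit behaves internally, and the sample text is made up.

```python
import re

# Made-up sample: one "string B" followed by two occurrences of "string A".
text = "string B\nline 1\nstring A\nline 2\nstring A\n"

# Greedy: [^|]+ grabs as much as possible, then backs off to the LAST "string A".
greedy = re.search(r"string B[^|]+string A", text)

# Non-greedy: [^|]+? grows one character at a time and stops at the FIRST "string A".
lazy = re.search(r"string B[^|]+?string A", text)

# Count how many "string A" markers each match swallowed.
greedy_count = greedy.group(0).count("string A")
lazy_count = lazy.group(0).count("string A")
```

Either form is enough to flag a file in Find in Files; the difference only matters when the matched block itself is to be selected or replaced.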

            Newbie

              Jan 31, 2005#6

              Ah, you're right it IS in the UE help. It didn't turn up on my search for "~". This time, I went down the list of UltraEdit syntax regular expressions & eyeballed them one by one. Yes it was there.

              I see many uses for this expression if only I could get it to work! (In my case it doesn't have a problem where to stop, it has a problem finding anything at all.)

              It must be something I'm doing wrong... which is impossible for you to see, of course :). I'll keep at it, something will come.

              --

              By the way, maybe this is dense of me but why is the "except |" needed? As I understand it, the goal is to find String B and String A connected by one or more occurrences of any characters including line breaks. That in itself would accomplish the purpose. I don't quite see what the "except |" is for.

              Basic User

                Feb 01, 2005#7

                It is because * (which matches any character) does not match a
                line break. So an expression like this:
                "StringA*++StringB"
                only finds a match on the same line.

                Regards,
                Alain
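
Python's `re` module draws the same distinction and makes it easy to see. This is a sketch in a different regex dialect, offered as an analogy only: there `.` plays the role of the any-character wildcard and `[^|]` the role of `[~|]`. The sample text is made up.

```python
import re

# Made-up sample with the two markers on different lines.
text = "StringA\nsome line\nStringB"

# By default "." does not match "\n", so this can only match within one line.
same_line_only = re.search(r"StringA.*StringB", text)

# A negated class matches every character except "|", line breaks included.
across_lines = re.search(r"StringA[^|]*StringB", text)

print(same_line_only)             # None: the wildcard stops at the line break
print(across_lines is not None)   # True: the negated class spans the lines
```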

                Newbie

                  Feb 03, 2005#8

                  Very interesting.

                  So if I understand correctly, you're saying that the "except" expression is a workaround for the pain of specifying all the different possible characters that might be included in the range.... & that it's much easier to turn it on its head and say, "Any character that isn't expressly excluded is included!" That way, the line breaks and other various characters are automatically included without having to specify them.

                  Is that the reason for it? Just guessing but it seems to make sense.

                    Feb 03, 2005#9

                    Also I am thinking that it solves a particular dilemma around line breaks which you seemed to be getting at.... that is, of having to specify line breaks when the number of them is unknown. For some reason, this can't be expressed directly in a regexp, or at least not in this implementation of it. Do I understand that correctly?

                    Grand Master

                      Feb 03, 2005#10

                      Yes, you understand it correctly. [~c]+, with c being a character that definitely does not occur in the text, is the expression to use for selecting a whole block without knowing the number of line breaks. But as I already mentioned, UltraEdit sometimes does not stop selecting at the correct position defined by the string after [~c]+.

                      Newbie

                        Feb 09, 2005#11

                        Thank you so much for the help.