Search specific pattern in columns with specific header

    May 30, 2007#1

    Hi,

    I have many text files which contain columns with headers and data. The position of the header names can be different from file to file so I cannot use fixed column numbers.

    I need a macro that searches for a column header (HEADER1) and then searches for a pattern like ?.??? (all values with 3 decimals).
    All other files which do not contain this pattern should be closed.

    Is that possible?


      May 31, 2007#2

      Yes, that is possible. I only need 1 or 2 examples of your source files (header line + some content lines), best enclosed in BBCode tags.

      Are your files CSV or fixed column files?

      It is not a problem that the header is not always at the same position. The macro can search in the first line of the file for "HEADER1" and, when found, convert a copy of that line into a regular expression string depending on the format of the file: CSV or fixed column. This regular expression string, extended with [0-9].[0-9][0-9][0-9], is then used to find the number in the correct column. If nothing is found, the file is closed and the macro continues with the next file until all files are evaluated.
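For readers without UltraEdit, the idea described above can be sketched in Python. This is only an illustration under assumptions: the helper name build_column_regex is made up here, the columns are taken to be tab-delimited, and Python's regular expression syntax stands in for the UltraEdit syntax used by the macro.

```python
import re

def build_column_regex(header_line, header_name="HEADER1"):
    """Return a compiled regex matching a d.ddd value in the header_name column.

    Hypothetical helper, not part of the macro; assumes tab-delimited columns.
    """
    columns = header_line.rstrip("\r\n").split("\t")
    index = columns.index(header_name)   # raises ValueError if the header is missing
    skip = r"[^\t]*\t" * index           # skip every column left of the target column
    # One digit, a dot, exactly three digits, then a tab or the end of the line.
    return re.compile(r"^" + skip + r"[0-9]\.[0-9]{3}(?:\t|\r?$)", re.MULTILINE)

# Usage: build the pattern from a header line, then search a data line.
pattern = build_column_regex("Description\tABC\tCBA\tHEADER1\ts-out")
found = pattern.search("A1174-3123\t3\t1.5\t0.004\t3") is not None
```

Because each skipped column is written as "anything except a tab, then a tab", a value with 3 decimals in a different column cannot satisfy the pattern.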

      Your version of UltraEdit is also important for that macro.
      Best regards from an UC/UE/UES for Windows user from Austria


        May 31, 2007#3

        OK, here is a typical text file:

        NAME            A11B4
        DESCRIPTION     ABC
        TABELLE
        Description     ABC     CBA     HEADER1         s-out
        A1174-3123      3       1.5     0.004455        3
        A1174-4123      4       10      1.0866          4
        A1174-5123      5       1.5     1.013459        5
        COUNT
        1174-3123       1       0

        Lines 1 to 3 are the general header of the file.

        The lines from Description to A1174-5123 are the main area containing the data.
        The Description line contains the header names such as HEADER1;
        the data is below it.

        The lines from COUNT to the end are the footer.

        The columns are separated by tabs. All columns with HEADER1 should be searched for values with 3 decimals (?.???). All open files that do not contain at least one such line should be closed.
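The requirement can also be written down as a small check outside UltraEdit. The following Python sketch is an illustration only: the function name has_three_decimal_value is hypothetical, and it assumes exactly one tab between columns and that the data lines follow the header line.

```python
import re

# One digit, a dot, exactly three digits (the ?.??? pattern for numeric values).
THREE_DECIMALS = re.compile(r"[0-9]\.[0-9]{3}")

def has_three_decimal_value(text, header_name="HEADER1"):
    """Return True if any line below the header has a d.ddd value in that column."""
    column = None
    for line in text.splitlines():
        fields = line.split("\t")
        if column is None:
            # Still looking for the header line that names the column.
            if header_name in fields:
                column = fields.index(header_name)
            continue
        # Footer lines may be shorter than the header line, so check the length.
        if len(fields) > column and THREE_DECIMALS.fullmatch(fields[column]):
            return True
    return False
```

A file for which this returns False corresponds to a file the macro should close.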


          May 31, 2007#4

          Okay, the following macro hopefully does the job. I assume there is always exactly 1 tab between the columns, so that this part of the file is like a CSV file with the tab as delimiter.

          Make sure all open files are saved before running the macro. The macro must temporarily modify every file, but does not really change the contents. All files which remain open are indicated as modified although the contents of the still open files are not changed by the macro (except for missing line termination at end of the file).

          The macro property Continue if a Find with Replace not found must be checked for this macro.

          InsertMode
          ColumnModeOff
          HexOff
          UnixReOff
          Clipboard 9
          Top
          "ThIs Is ThE FiRsT FiLe!"
          NextWindow
          Loop
          Bottom
          IfColNum 1
          Else
          "
          "
          EndIf
          Top
          Find MatchCase "HEADER1"
          IfFound
          Key Ctrl+LEFT ARROW
          StartSelect
          Key HOME
          Copy
          EndSelect
          Top
          Paste
          "
          "
          Key UP ARROW
          SelectLine
          Find RegExp "[~^t^p]+^t"
          Replace All SelectText "*^^^^t"
          EndSelect
          Top
          "%"
          Key END
          "[0-9].[0-9][0-9][0-9][^t^r^n]"
          StartSelect
          Key HOME
          Cut
          EndSelect
          DeleteLine
          Find RegExp "^c"
          IfNotFound
          Top
          EndIf
          Else
          Top
          EndIf
          IfSel
          Top
          Find MatchCase "ThIs Is ThE FiRsT FiLe!"
          Replace ""
          IfFound
          Find RegExp "^c"
          ExitLoop
          EndIf
          Find RegExp "^c"
          NextWindow
          Else
          Find MatchCase "ThIs Is ThE FiRsT FiLe!"
          Replace ""
          IfFound
          CloseFile NoSave
          ExitLoop
          Else
          CloseFile NoSave
          EndIf
          EndIf
          EndLoop
          ClearClipboard
          Clipboard 0

          Here is the macro again in UEM format with comments. See "Macro examples and reference for beginners and experts" for how to set up UltraEdit to best view macro code in this format. I have used 4 spaces instead of every tab (with the command Tabs To Spaces) to get correct HTML output here.

          InsertMode
          ColumnModeOff
          HexOff
          UnixReOff
          Clipboard 9
          //  Mark the first file with a special string to know when to exit the loop.
          Top
          "ThIs Is ThE FiRsT FiLe!"
          /*! The first file must be evaluated as last file because it probably does not
              contain the string of interest. The macro then could not close it to
              avoid an endless loop, although it should be closed. So better evaluate
              the first file as last file. !*/
          NextWindow
          Loop
          /*! Insert a line termination at end of the file if last line is not already terminated.
              This is necessary when the column HEADER1 is the last column and so after ?.??? the
              line termination follows. !*/
              Bottom
              IfColNum 1
              Else
                  "
                  "
              EndIf
              Top
          /*! Back at top of the file search for the header. If not found, ignore this file and
              later close it, because it surely does not contain ?.??? in the requested column. !*/
              Find MatchCase "HEADER1"
              IfFound
          /*! Header found! Copy everything from start of the current line
              to beginning of HEADER1 into a new line at top of the file. !*/
                  Key Ctrl+LEFT ARROW
                  StartSelect
                  Key HOME
                  Copy
                  EndSelect
                  Top
                  Paste
                  "
                  "
                  Key UP ARROW
                  SelectLine
          /*! Convert now this part of the header line into an UltraEdit style regular expression
              with the required part to find ?.??? at end of the column or line, if the HEADER1
              column is the last column. A header line like
          
              Description     ABC     CBA     HEADER1     ...
          
              will be converted into
          
              %*^t*^t*^t[0-9].[0-9][0-9][0-9][^t^r^n]
          
          !*/
                  Find RegExp "[~^t^p]+^t"
                  Replace All SelectText "*^^^^t"
                  EndSelect
                  Top
                  "%"
                  Key END
                  "[0-9].[0-9][0-9][0-9][^t^r^n]"
          //  Copy this line into the user clipboard 9 and delete the line.
                  StartSelect
                  Key HOME
                  Cut
                  EndSelect
                  DeleteLine
          //  Search for the regular expression in the clipboard. This works only with UE style.
                  Find RegExp "^c"
          //  This useless looking code is necessary for the second Find/Replace in the Else branch.
                  IfNotFound
                      Top
                  EndIf
          //  This useless looking code is necessary for the second Find/Replace in the Else branch.
              Else
                  Top
              EndIf
              IfSel
          /*! The regular expression has found ?.??? in the correct column. So don't close
              this file, but exit the loop when this file is the first/last file to evaluate.
              But before always position the cursor to the string of interest. !*/
                  Top
                  Find MatchCase "ThIs Is ThE FiRsT FiLe!"
                  Replace ""
                  IfFound
                      Find RegExp "^c"
                      ExitLoop
                  EndIf
                  Find RegExp "^c"
                  NextWindow
              Else
          /*! No HEADER1 or no ?.??? in the column of HEADER1 - close the file. But first check
              if this file is the first/last file to evaluate and exit the loop if this is true. !*/
                  Find MatchCase "ThIs Is ThE FiRsT FiLe!"
                  Replace ""
                  IfFound
                      CloseFile NoSave
                      ExitLoop
                  Else
                      CloseFile NoSave
                  EndIf
              EndIf
          EndLoop
          ClearClipboard
          Clipboard 0
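
The loop control of the macro (mark the first file, start with the second one, stop when the marked file comes around again) may be easier to follow as a small simulation. The Python sketch below is not a translation of the macro. The name filter_open_files and the keep callback are hypothetical, file names are assumed to be unique, and it assumes that the next window becomes active after a file is closed.

```python
from collections import deque

def filter_open_files(open_files, keep):
    """Simulate the macro's window loop on a list of file names.

    keep(name) stands in for "HEADER1 found and ?.??? found in its column".
    Returns the names of the files that remain open.
    """
    if not open_files:
        return []
    windows = deque(open_files)
    first = windows[0]          # plays the role of the marker string
    windows.rotate(-1)          # NextWindow: start with the second file
    while True:
        current = windows[0]
        if current == first:
            # Marked file reached: it is evaluated last, then the loop ends.
            if not keep(current):
                windows.popleft()
            break
        if keep(current):
            windows.rotate(-1)  # file stays open, NextWindow
        else:
            windows.popleft()   # CloseFile NoSave
    return list(windows)
```

Evaluating the marked file last means the loop always terminates, even when every other file is closed along the way.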

            Jun 04, 2007#5

            Thanks Mofi,
            I tried your macro but it doesn't work for me. It always closes all files, including those which contain the search string (?.???).

            To make things hopefully easier, the following conditions are given:

            - All open files already contain the required header (HEADER1), so after the data of HEADER1 is a tab (?.???^t)

            - HEADER1 is never the first or last column

            - The columns are not of fixed size

            - The column number of HEADER1 can differ from file to file

            - Column separator is one TAB between data, but can be several tabs between headers (headers can be missing, but not HEADER1)


              Jun 04, 2007#6

              Strange. I see no problems with Mofi's macro. I reproduced three files from the specs above: 2 that are supposed to be closed and one that stays open because it contains the pattern ?.???. And the macro did exactly what it was supposed to do.

              So maybe the next step is for you to zip 2 files: one with the ?.??? pattern and one without. Upload the zip file to a server or service of your choice and post a link to it. (Zip files cannot be uploaded to this forum.)


                Jun 04, 2007#7

                HansFink wrote:
                - All open files already contain the required header (HEADER1), so after the data of HEADER1 is a tab (?.???^t)

                That is already handled by the macro. 4 lines could be removed from the macro, but to be safe I would not do that.

                HansFink wrote:
                - HEADER1 is never the first or last column

                Then you can remove ^r^n and the code part

                Bottom
                IfColNum 1
                Else
                "
                "
                EndIf

                HansFink wrote:
                - The columns are not of fixed size
                - The column number of HEADER1 can differ from file to file

                That is what the macro is designed for. It uses the tabs to identify the correct column.

                HansFink wrote:
                - Column separator is one TAB between data, but can be several tabs between headers (headers can be missing, but not HEADER1)

                I think this is the problem, because I assumed there is always at least 1 character between the tabs in the header line and no column with an empty column header.

                Insert the following below the command SelectLine to also handle empty column headers (not tested):

                Find "^t"
                Replace All SelectText "#^t"
                Top
                SelectLine


                As you can see, 1 character is now always inserted before every tab, and then the line is reselected for the following regular expression replace, which converts the header line into a regular expression string with a correct *^t for every column, even for those with no column header.
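
Why the placeholder character matters can be shown with a short Python sketch. It only imitates the two replace steps with Python regular expressions: the names header_to_skip_pattern and row_matches are hypothetical, and the header prefix is assumed to be the text of the header line up to, but not including, HEADER1.

```python
import re

def header_to_skip_pattern(header_prefix, fill_empty=True):
    """Turn the header text left of HEADER1 into "skip one column" tokens."""
    if fill_empty:
        # Same idea as replacing ^t with #^t: no header field is empty afterwards.
        header_prefix = header_prefix.replace("\t", "#\t")
    # Every non-empty field plus its tab becomes "anything except tab, then tab".
    # An empty field is not matched here, so its tab stays behind as a literal tab.
    return re.sub(r"[^\t\r\n]+\t", lambda m: r"[^\t]*\t", header_prefix)

def row_matches(header_prefix, row, fill_empty):
    """Check whether the row has a d.ddd value followed by a tab in the target column."""
    pattern = "^" + header_to_skip_pattern(header_prefix, fill_empty) + r"[0-9]\.[0-9]{3}\t"
    return re.search(pattern, row) is not None

# Header of S_1.txt left of HEADER1 (column 4 has no header) and its first data row.
prefix = "Description\tA\tAbc\t\t"
row = "S_100 1010\t3\t26\t0\t0.018\t4"
with_placeholder = row_matches(prefix, row, fill_empty=True)
without_placeholder = row_matches(prefix, row, fill_empty=False)
```

Without the placeholder, the leftover literal tab demands an empty data field where the row actually contains 0, so the search fails and the file would be closed.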

                If the macro is still not working, please upload some example files in a zip-archive and post a link to it as suggested by jorrasdk.

                  Jun 04, 2007#8

                  Thanks again, but it still closes all files. Maybe my version of UltraEdit is not compatible (11.20b) or my example file is not good enough.

                  Here are two better file examples.

                  S_1.txt contains search string and should remain open
                  There is no header for column 4 in line 6

                  NAME          S_1.5 2-3x
                  SHORTNAME     S_1
                  TRANS         0
                  TABELLE
                  tName         A             ABC           D_IC
                  Description   A             Abc                         HEADER1       s-out
                  S_100 1010    3             26            0             0.018         4
                  S_125 1010    3.25          28            0             0.03          4
                  S_130 1010    3.3           32            0             0.0186        4
                  CONT
                  S_100 1010    0
                  S_125 1010    1
                  S_130 1010    1
                  
                  R_1.txt does not contain search string and should be closed

                  NAME          T_1.5 2-3x
                  SHORTNAME     T_1
                  TRANS         0
                  TABELLE
                  typeName      A             ABC           XYZ
                  Description   A             Abc           Xyz           HEADER1       s-out
                  T_1-4         4             5.7           17            0.2           4
                  T_1-5         5             5             18.5          0.2           5
                  T_1-1/4IN     6.35          6.35          21.175        0.2           6.35
                  CONT
                  T_1-4         1
                  T_1-5         1
                  T_1-1/4IN     1
                  


                    Jun 04, 2007#9

                    I have tested the macro with the suggested modifications, as you can see below, and it worked perfectly. S_1.txt remains open and the line with 0.018 is selected, from the start of the line to the tab after 0.018. I have tested it with UltraEdit v11.20b too.

                    InsertMode
                    ColumnModeOff
                    HexOff
                    UnixReOff
                    Clipboard 9
                    Top
                    "ThIs Is ThE FiRsT FiLe!"
                    NextWindow
                    Loop
                    Top
                    Find MatchCase "HEADER1"
                    IfFound
                    Key Ctrl+LEFT ARROW
                    StartSelect
                    Key HOME
                    Copy
                    EndSelect
                    Top
                    Paste
                    "
                    "
                    Key UP ARROW
                    SelectLine
                    Find "^t"
                    Replace All SelectText "#^t"
                    Top
                    SelectLine

                    Find RegExp "[~^t^p]+^t"
                    Replace All SelectText "*^^^^t"
                    EndSelect
                    Top
                    "%"
                    Key END
                    "[0-9].[0-9][0-9][0-9]^t"
                    StartSelect
                    Key HOME
                    Cut
                    EndSelect
                    DeleteLine
                    Find RegExp "^c"
                    IfNotFound
                    Top
                    EndIf
                    Else
                    Top
                    EndIf
                    IfSel
                    Top
                    Find MatchCase "ThIs Is ThE FiRsT FiLe!"
                    Replace ""
                    IfFound
                    Find RegExp "^c"
                    ExitLoop
                    EndIf
                    Find RegExp "^c"
                    NextWindow
                    Else
                    Find MatchCase "ThIs Is ThE FiRsT FiLe!"
                    Replace ""
                    IfFound
                    CloseFile NoSave
                    ExitLoop
                    Else
                    CloseFile NoSave
                    EndIf
                    EndIf
                    EndLoop
                    ClearClipboard
                    Clipboard 0

                    The 2 files look as follows after converting the spaces into tabs with a very simple regular expression. » is a tab and · is a normal space; the lines end with DOS line terminations.

                    NAME»S_1.5·2-3x
                    SHORTNAME»S_1
                    TRANS»0
                    TABELLE
                    tName»A»ABC»D_IC
                    Description»A»Abc»»HEADER1»s-out
                    S_100·1010»3»26»0»0.018»4
                    S_125·1010»3.25»28»0»0.03»4
                    S_130·1010»3.3»32»0»0.0186»4
                    CONT
                    S_100·1010»0
                    S_125·1010»1
                    S_130·1010»1

                    NAME»T_1.5·2-3x
                    SHORTNAME»T_1
                    TRANS»0
                    TABELLE
                    typeName»A»ABC»XYZ
                    Description»A»Abc»Xyz»HEADER1»s-out
                    T_1-4»4»5.7»17»0.2»4
                    T_1-5»5»5»18.5»0.2»5
                    T_1-1/4IN»6.35»6.35»21.175»0.2»6.35
                    CONT
                    T_1-4»1
                    T_1-5»1
                    T_1-1/4IN»1
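
The regular expression used for the space-to-tab conversion is not shown in the thread. One way to get the same result is sketched below in Python, under the assumption that the posted S_1.txt and R_1.txt listings use fixed columns of 14 characters. That width is only inferred from the samples, and the function name fixed_width_to_tabs is hypothetical.

```python
import re

COLUMN_WIDTH = 14  # assumption: inferred from the posted samples, not stated in the thread

def fixed_width_to_tabs(line, width=COLUMN_WIDTH):
    """Convert one space-padded fixed-width line into a tab-delimited line."""
    # Cut the line into chunks of `width` characters; the last chunk may be shorter.
    chunks = re.findall(".{1,%d}" % width, line)
    # Drop only the padding at the end of each chunk, so inner spaces
    # such as the one in "S_100 1010" are preserved.
    return "\t".join(chunk.rstrip(" ") for chunk in chunks)

# Usage: rebuild a padded header line with an empty fourth column and convert it.
fields = ["Description", "A", "Abc", "", "HEADER1", "s-out"]
padded = "".join(field.ljust(COLUMN_WIDTH) for field in fields).rstrip()
converted = fixed_width_to_tabs(padded)
```

Splitting by width instead of collapsing runs of spaces keeps the empty header column, which is what produces the two consecutive tabs before HEADER1 in the S_1.txt listing above.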

                      Jun 04, 2007#10

                      Now it works, thanks. That saves a lot of time.