Search for lines not of a certain length?

    Sep 30, 2004 #1

    Hoping someone can help me:

    We have a Unix file, 10 million records, record length = 147. Normally, we'll see the Line Feed at position 148 for all records. There are a number of records that are sporadically affected throughout the file where the Line Feed is in another column besides 148. This has affected positioning of fields in subsequent records.

    How do I search using UE to determine the count of records affected?

    How do I identify those records to the client to show them that they've incorrectly set the Line Feeds?

    Thanks in advance.

      Oct 01, 2004 #2

      Make a copy of the file. Open the copy and run an UltraEdit-style regular expression Replace with:

      Find: %???...???^p
      Replace:

      ???...??? stands for 147 '?' characters, each matching a single character other than a newline.
      Execute Replace All, and repeat it until nothing more is found.

      The result is a file that contains only the lines with fewer or more than 147 characters.
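
The question also asked for a count of the affected records. Outside UltraEdit, that count can be obtained with a short script. The following is only a minimal Python sketch and not part of the original answer: the function name `length_histogram` and the sample data are hypothetical, and the expected length of 147 is taken from the question.

```python
from collections import Counter

EXPECTED_LENGTH = 147  # record length from the question, excluding the line feed


def length_histogram(lines):
    """Count how many records have each length, ignoring the line terminator."""
    return Counter(len(line.rstrip("\r\n")) for line in lines)


# Tiny in-memory sample standing in for the real file (hypothetical data).
sample = ["A" * 147 + "\n", "B" * 120 + "\n", "C" * 147 + "\n", "D" * 150 + "\n"]
histogram = length_histogram(sample)

# Every record whose length differs from the expected one is "affected".
affected = sum(count for length, count in histogram.items() if length != EXPECTED_LENGTH)
print(affected)
```

For a real file, an open file object can be passed instead of the list, since iterating over it yields one line at a time and the 10 million records never need to be in memory at once. The histogram also shows which wrong lengths occur most often, which may help when explaining the problem to the client.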

      Update:

      With the Perl regular expression engine introduced in UE v12.00, it is easy to find lines that are not of a certain length. For example, if every line should be 76 characters long and you want to find the lines with fewer or more than 76 characters, the following Perl regular expression search string can be used:

      ^(?:.{0,75}|.{77,})$

      Explanation:

      ^ ... start search at beginning of a line.

      (?:...) ... non-capturing group needed here for the OR expression.

      .{0,75} ... any character except the newline characters carriage return and line feed, zero to 75 times.

      | ... OR

      .{77,} ... any character except the newline characters carriage return and line feed, 77 or more times.

      $ ... end of line (the line terminator itself is not part of the match).
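
To see how the expression behaves, it can be tried outside UltraEdit. The following is a small sketch using Python's `re` module, which is an assumption made for illustration and not the engine UltraEdit uses. The sample text is made up. One known difference: in Python, `.` excludes only the line feed, so a carriage return in a DOS file would be counted as a character.

```python
import re

# Same pattern as in the post; MULTILINE lets ^ and $ anchor at every line.
WRONG_LENGTH = re.compile(r"^(?:.{0,75}|.{77,})$", re.MULTILINE)

# Four sample lines with 76, 75, 76 and 80 characters (hypothetical data).
text = "\n".join(["a" * 76, "b" * 75, "c" * 76, "d" * 80])

# Collect the length of every line the pattern reports as wrong.
bad_lengths = [len(match.group(0)) for match in WRONG_LENGTH.finditer(text)]
print(bad_lengths)  # [75, 80]
```

Because the first alternative allows zero characters, an empty line is reported as well.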
      Best regards from an UC/UE/UES for Windows user from Austria

        Oct 04, 2004 #3

        Thanks for the info.
        I don't want to seem lazy, but as a novice UE user: what are the specific steps to run when searching for line feeds in positions other than 148? Don't forget, there are some line feeds that appear beyond position 147.
        Thanks again!

          Oct 04, 2004 #4

          OK, step by step:

          Make a copy of your file with Windows Explorer.

          Open the copy with UltraEdit.

          Press Ctrl+R to open the replace dialog.

          Enter in the field "Find What:"
          • 1 percentage character '%'
          • 147 question mark characters '?' (enter 10, copy and append it 13 times, add 7)
          • 1 '^' followed by 1 'p' for newline character (DOS)
          Clear field "Replace With:".

          Activate "Regular Expressions".

          Make sure that "Current File" is selected and "Close after replace" is not selected.

          Press button "Replace All" a few times, until you get the message "Search string not found".

          You now have a file containing only those lines where the line feed is not in column 148. You can then search for those lines in the huge original file. You may be able to automate this with a macro.
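
The manual steps leave only the bad lines, but not their positions in the original file. As an alternative outside UltraEdit, a script can report the line numbers directly. This is only a sketch under assumptions: the function name `find_bad_records` is hypothetical, the file is read as raw bytes so that one byte counts as one column, and a small temporary file stands in for the real data.

```python
import os
import tempfile

EXPECTED_LENGTH = 147  # bytes per record before the line feed, per the question


def find_bad_records(path, expected=EXPECTED_LENGTH):
    """Yield (line_number, actual_length) for each record of the wrong length."""
    with open(path, "rb") as handle:  # binary mode: count bytes, no decoding
        for number, raw in enumerate(handle, start=1):
            length = len(raw.rstrip(b"\r\n"))  # ignore the line terminator
            if length != expected:
                yield number, length


# Demo on a small temporary file (hypothetical data in place of the real file).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"A" * 147 + b"\n" + b"B" * 100 + b"\n" + b"C" * 194 + b"\n" + b"D" * 147 + b"\n")

report = list(find_bad_records(tmp.name))
os.remove(tmp.name)
print(report)  # [(2, 100), (3, 194)]
```

The file is read one line at a time, so a file with 10 million records does not have to fit in memory. The reported line numbers can then be used with UltraEdit's go-to-line function to inspect each record.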

            Apr 12, 2007 #5

            Hello

            We have to make statistical returns to government bodies in huge fixed length text files. Sometimes these files can have structural errors, where some lines are not the correct length. We then edit these lines manually to correct them.

            Problem is, finding these incorrect lines in huge files is difficult! Does anyone know of a way of searching for a line that is either greater or less than a certain length?

            We have UE version 13.

              Apr 12, 2007 #6

              How about the Perl compatible regular expression

              ^(?:.{0,30}|.{50,100})$
              meaning "match a line that either is less than 31 characters or 50-100 characters long"?
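
The boundaries of this expression can be checked with a quick test. The sketch below uses Python's `re` module as a stand-in for UltraEdit's Perl engine, which is an assumption made for illustration. The test lengths are arbitrary.

```python
import re

# Pattern from the post: 0 to 30 characters, or 50 to 100 characters.
pattern = re.compile(r"^(?:.{0,30}|.{50,100})$")

# Try single lines of several lengths around the boundaries.
lengths = [0, 30, 31, 49, 50, 100, 101]
matched = [n for n in lengths if pattern.match("x" * n)]
print(matched)  # [0, 30, 50, 100]
```

Lines of 31 to 49 characters and lines longer than 100 characters are not reported. To catch every line above a limit, the upper bound can be left open, as in `.{50,}`.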

              HTH,
              Tim

                Apr 12, 2007 #7

                pietzcker, thanks, but I cannot get your regex suggestion to work. To clarify, I need to search for lines, containing any text, that are longer than 723 characters.

                I need to document this solution for other UE users here and it would be a neater solution than Mofi's

                Regards,
                Jon

                  Apr 12, 2007 #8

                  To follow up on pietzcker's suggestion:

                  From the Advanced menu, open "Configuration". Find "Search" and then "Regular expression engine".

                  Choose "Perl compatible regular expressions".

                  Return to your file.

                  Open the Find dialog (Ctrl+F), check "regular expression" and search for:

                  ^.{724,}$
                  Note: Since UE v14.00 and UES v6.50, the regular expression engine can be selected directly in the Find/Replace dialog, in the advanced section that opens when you click the Advanced button (if it is not already visible).

                    Apr 12, 2007 #9

                    Thank you gentlemen, that works a treat, turning on Perl expressions does help!
                    Regards,
                    Jon