Extract some lines to new file to build new, sorted list with placeholders for missing lines?

Newbie

    Apr 07, 2011#1

    I'm working with a list of about 5 million lines in this format.
    Sample:

    ----------
    1.Ticker: AAA
    2.Company name: abcde
    3.Current Price: 12345
    4.EPS: 23456
    5.P/E: 45678
    6.Adjusted P/E: 23124
    7.Book Value: 56789
    8.ROA: 67890
    9.ROE: 78901
    10.RSI: 89.12
    11.MA50: 80.890
    12.MA100: 89.879
    ----------
    1.Ticker: CCC
    2.Company name: adsad bcde corp.
    3.Current Price: 1234563
    4.EPS: 
    5.P/E: 458
    6.Adjusted P/E: 89852
    8.ROA: 
    9.ROE: 896785
    11.MA50: 87764
    12.MA100: 12386
    ----------
    ....so on
    The information about each ticker is listed between two "----------" lines. Now I want to extract some of the lines and copy them to a new document while keeping the format. If any information is missing, N/A should be returned instead.

    Say I need to extract Ticker, Company name, Current Price, Book Value, RSI, then the final result would look like this:
    Result

    ----------
    1.Ticker: AAA
    2.Company name: abcde
    3.Current Price: 12345
    7.Book Value: 56789
    10.RSI: 89.12
    ----------
    1.Ticker: CCC
    2.Company name: adsad bcde corp.
    3.Current Price: 1234563
    N/A
    N/A
    ----------
    ....so on
    What is the best solution to deal with this case :cry: ? Thanks in advance :lol:

    Grand Master

      Apr 07, 2011#2

      If you need this only once, run a Perl regular expression Find searching for

      Ticker|Company name|Current Price|Book Value|RSI|----------

      with the advanced find option List Lines Containing String enabled before pressing the Next button. Then press the Clipboard button, close the dialog, open a new file and paste the found and copied lines. Press Ctrl+Home to move the caret to the top of the file.

      Now only the lines with N/A need to be added. That can be done with a few additional Perl regular expression Replace All commands.

      Search for ^(--.*\r\n)([^1]) and use \11.N/A\r\n\2 as replace string.
      Search for ^(1\..*\r\n)([^2]) and use \12.N/A\r\n\2 as replace string.
      Search for ^(2\..*\r\n)([^3]) and use \13.N/A\r\n\2 as replace string.
      Search for ^(3\..*\r\n)([^7]) and use \17.N/A\r\n\2 as replace string.
      Search for ^(7\..*\r\n)([^1]) and use \1N/A\r\n\2 as replace string.

      And finally search for ^[1237]\.N/A and use N/A as replace string.
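
For anyone who prefers a script, the same filter-then-fill idea can be sketched in Python. This is only an illustration under assumptions: the function name `extract_fields` is made up, the text is assumed to use LF line endings instead of `\r\n`, and the file is assumed to end with a separator line. The replacement uses `\g<1>` because Python's `re` module would read `\11` as a reference to group 11. Like the Replace All steps above, it only works when the lines of each record are already in ascending order.

```python
import re

# Lines to keep: the five wanted fields plus the record separator.
WANTED = re.compile(r"Ticker|Company name|Current Price|Book Value|RSI|----------")

# (search pattern, replacement) pairs mirroring the Replace All steps above.
# (?=.) stops the final separator, which has no following line, from matching.
STEPS = [
    (r"^(--.*\n)(?!1\.)(?=.)", r"\g<1>1.N/A\n"),
    (r"^(1\..*\n)(?!2\.)(?=.)", r"\g<1>2.N/A\n"),
    (r"^(2\..*\n)(?!3\.)(?=.)", r"\g<1>3.N/A\n"),
    (r"^(3\..*\n)(?!7\.)(?=.)", r"\g<1>7.N/A\n"),
    (r"^(7\..*\n)(?!10\.)(?=.)", r"\g<1>N/A\n"),
]


def extract_fields(text: str) -> str:
    """Keep only the wanted lines and insert N/A for each missing one."""
    kept = [line for line in text.splitlines() if WANTED.search(line)]
    result = "\n".join(kept) + "\n"
    for pattern, replacement in STEPS:
        result = re.sub(pattern, replacement, result, flags=re.MULTILINE)
    # Strip the helper numbers from the placeholders, leaving a bare N/A.
    return re.sub(r"^[1237]\.N/A", "N/A", result, flags=re.MULTILINE)
```

The helper numbers such as `7.N/A` exist only so that the next step can recognize the placeholder as "line 7"; the last substitution removes them again.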

      Newbie

        Apr 08, 2011#3

        Thanks Mofi! But I still have a question.
        What if there are about 20 lines between the two "----------" lines and they are not in the same order, like this:

        ----------
        1.Ticker: AAA
        3.Current Price: 12345
        11.MA50: 80.890
        5.P/E: 45678
        6.Adjusted P/E: 23124
        7.Book Value: 56789
        8.ROA: 67890
        16.MACD: 89
        4.EPS: 23456
        2.Company name: abcde
        10.RSI: 89.12
        12.MA100: 89.879
        14.UpBB: 768
        9.ROE: 78901
        15.BoBB: 657
        13.ADX: 23
        ----------
        1.Ticker: CCC
        2.Company name: adsad bcde corp.
        3.Current Price: 1234563
        4.EPS: 3434
        5.P/E: 458
        6.Adjusted P/E: 89852
        8.ROA: 42342
        9.ROE: 896785
        11.MA50: 87764
        12.MA100: 12386
        ----------
        Say I want to extract lines 1 to 10. Is there any way to do these two things:
        + Sort these lines by the number at the start of each line, and
        + Return N/A if any line is missing
        Thanks in advance :lol:

        Grand Master

          Apr 08, 2011#4

          That new requirement now really calls for a macro or script. A well-written script would definitely do the entire job faster, but writing a macro is faster for a macro expert like me. The macro property Continue if search string not found must be checked for this macro. It does the entire job, including copying the lines starting with -- or 1. or 2. or 3. ... or 10.

          InsertMode
          ColumnModeOff
          HexOff
          PerlReOn
          Bottom
          IfColNumGt 1
          InsertLine
          EndIf
          Top
          Clipboard 9
          ClearClipboard
          Loop 0
          Find RegExp "^(--|1\.|2\.|3\.|4\.|5\.|6\.|7\.|8\.|9\.|10\.).*\r\n"
          IfNotFound
          ExitLoop
          EndIf
          CopyAppend
          EndLoop
          NewFile
          Paste
          ClearClipboard
          Clipboard 0
          Key UP ARROW
          Key END
          IfColNum 1
          ExitMacro
          EndIf
          Top
          Loop 0
          Find RegExp "^(?:\d+\..*\r\n)+"
          IfNotFound
          ExitLoop
          EndIf
          SortAsc Numeric 1 -1 0 0 0 0 0 0
          Find "----------"
          EndLoop
          Bottom
          Key BACKSPACE
          Top
          Find RegExp "^(--.*\r\n)(?!1\.)"
          Replace All "\11.N/A\r\n"
          Find RegExp "^(1\..*\r\n)(?!2\.)"
          Replace All "\12.N/A\r\n"
          Find RegExp "^(2\..*\r\n)(?!3\.)"
          Replace All "\13.N/A\r\n"
          Find RegExp "^(3\..*\r\n)(?!4\.)"
          Replace All "\14.N/A\r\n"
          Find RegExp "^(4\..*\r\n)(?!5\.)"
          Replace All "\15.N/A\r\n"
          Find RegExp "^(5\..*\r\n)(?!6\.)"
          Replace All "\16.N/A\r\n"
          Find RegExp "^(6\..*\r\n)(?!7\.)"
          Replace All "\17.N/A\r\n"
          Find RegExp "^(7\..*\r\n)(?!8\.)"
          Replace All "\18.N/A\r\n"
          Find RegExp "^(8\..*\r\n)(?!9\.)"
          Replace All "\19.N/A\r\n"
          Find RegExp "^(9\..*\r\n)(?!10\.)"
          Replace All "\110.N/A\r\n"
          Find RegExp "^\d+\.(?=N/A)"
          Replace All ""
          Bottom
          InsertLine
          Top
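
As a rough idea of what the script alternative mentioned above could look like, here is a minimal Python sketch. It is not a translation of the macro: it collects each record in a dictionary keyed by the leading number and then emits the wanted range in order, so sorting and N/A filling happen in one pass. The function name `normalize_records` and its `first`/`last` parameters are assumptions made for illustration, and the separator is assumed to be exactly ten hyphens.

```python
from typing import Iterable, Iterator

SEPARATOR = "----------"


def normalize_records(lines: Iterable[str], first: int = 1, last: int = 10) -> Iterator[str]:
    """Yield separators plus fields first..last in numeric order, N/A for gaps."""
    fields = {}          # leading number -> full line of the current record
    in_record = False    # True once the first separator has been seen
    for raw in lines:
        line = raw.strip()
        if line == SEPARATOR:
            if in_record:
                # Close the previous record: emit every wanted number in order.
                for number in range(first, last + 1):
                    yield fields.get(number, "N/A")
            yield SEPARATOR
            fields = {}
            in_record = True
            continue
        number, dot, _ = line.partition(".")
        if dot and number.isdigit() and first <= int(number) <= last:
            fields[int(number)] = line
    if fields:
        # The input did not end with a separator: flush the last record anyway.
        for number in range(first, last + 1):
            yield fields.get(number, "N/A")
```

Because the function is a generator, a file object can be passed in directly and the result written line by line, so a 5 million line file never has to be held in memory as a whole.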

            Apr 08, 2011#5

            I just modified the second loop in the above macro to make the macro a little faster. Instead of using

            Loop 0
            Find "----------^p"
            IfNotFound
            ExitLoop
            EndIf
            Key HOME
            StartSelect
            Find Select "----------"
            IfSel
            Key HOME
            SortAsc Numeric 1 -1 0 0 0 0 0 0
            EndSelect
            Else
            EndSelect
            ExitLoop
            EndIf
            EndLoop


            the second loop is now

            Loop 0
            Find RegExp "^(?:\d+\..*\r\n)+"
            IfNotFound
            ExitLoop
            EndIf
            SortAsc Numeric 1 -1 0 0 0 0 0 0
            Find "----------"
            EndLoop


            Then I realized that collecting all lines of interest in the clipboard and copying them to a new file is horribly slow because of all the display updates. So I thought, why not do it in reverse: copy all lines and then delete from the new file all lines that are of no interest? That is also possible and avoids all the display updates. Here is the macro working with this method. The changed part is the block from Clipboard 9 down to the blank line.

            InsertMode
            ColumnModeOff
            HexOff
            PerlReOn
            Clipboard 9
            SelectAll
            Copy
            NewFile
            Paste
            ClearClipboard
            Clipboard 0
            IfColNumGt 1
            InsertLine
            EndIf
            Top
            Find RegExp "^(?!(--|1\.|2\.|3\.|4\.|5\.|6\.|7\.|8\.|9\.|10\.)).*\r\n"
            Replace All ""
            Key END
            IfColNum 1
            ExitMacro
            EndIf

            Top
            Loop 0
            Find RegExp "^(?:\d+\..*\r\n)+"
            IfNotFound
            ExitLoop
            EndIf
            SortAsc Numeric 1 -1 0 0 0 0 0 0
            Find "----------"
            EndLoop
            Bottom
            Key BACKSPACE
            Top
            Find RegExp "^(--.*\r\n)(?!1\.)"
            Replace All "\11.N/A\r\n"
            Find RegExp "^(1\..*\r\n)(?!2\.)"
            Replace All "\12.N/A\r\n"
            Find RegExp "^(2\..*\r\n)(?!3\.)"
            Replace All "\13.N/A\r\n"
            Find RegExp "^(3\..*\r\n)(?!4\.)"
            Replace All "\14.N/A\r\n"
            Find RegExp "^(4\..*\r\n)(?!5\.)"
            Replace All "\15.N/A\r\n"
            Find RegExp "^(5\..*\r\n)(?!6\.)"
            Replace All "\16.N/A\r\n"
            Find RegExp "^(6\..*\r\n)(?!7\.)"
            Replace All "\17.N/A\r\n"
            Find RegExp "^(7\..*\r\n)(?!8\.)"
            Replace All "\18.N/A\r\n"
            Find RegExp "^(8\..*\r\n)(?!9\.)"
            Replace All "\19.N/A\r\n"
            Find RegExp "^(9\..*\r\n)(?!10\.)"
            Replace All "\110.N/A\r\n"
            Find RegExp "^\d+\.(?=N/A)"
            Replace All ""
            Bottom
            InsertLine
            Top
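
The two central steps of this second macro, deleting every line of no interest with a negative lookahead and then sorting each run of numbered lines, can be illustrated outside of UltraEdit as well. The following Python sketch assumes LF line endings and a text that ends with a line break; the name `prune_and_sort` is hypothetical. It does not insert the N/A placeholders.

```python
import re

# A line of interest starts with -- or with 1. to 10.
KEEP = r"--|(?:[1-9]|10)\."


def prune_and_sort(text: str) -> str:
    """Delete unwanted lines, then sort each run of numbered lines numerically."""
    # Step 1: remove every line that does NOT start with one of the kept prefixes.
    pruned = re.sub(rf"^(?!{KEEP}).*\n", "", text, flags=re.MULTILINE)

    # Step 2: each match is one run of consecutive numbered lines (one record).
    def sort_run(match):
        lines = match.group(0).splitlines(keepends=True)
        lines.sort(key=lambda line: int(line.split(".", 1)[0]))
        return "".join(lines)

    return re.sub(r"(?:^\d+\..*\n)+", sort_run, pruned, flags=re.MULTILINE)
```

The sort key converts the leading number to an integer, so that 10 sorts after 9 instead of after 1, which is what the Numeric option of SortAsc does in the macro.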

            Newbie

              Apr 08, 2011#6

              Mofi, that macro solved all my problems. That's great. Many thanks :lol: :lol: :lol: