How to split 500 MB file (250,000 invoices) into manageable chunks?

    Dec 07, 2018 #1

    I currently use UltraEdit Professional Text/HEX Editor version 18.20.0.1021.

    I have a large invoice file that I need to split and would appreciate some pointers and ideas.

    Each invoice follows this structure, with the first one or two characters of each line acting as a LINE_TYPE.
    All invoices start with a LINE_TYPE of H1.
    The H lines hold the invoice header information (document_number, name & address).
    The P & L lines (of which there can be any number) hold product detail.

    Code:

    H1TEXTTEXTEXTEXT...............Up to 500 characters - terminated by CRLF
    H2TEXTTEXTEXTEXTETC
    H3TEXTTEXTEXTEXTETC
    H4TEXTTEXTEXTEXTETC
    H5TEXTTEXTEXTEXTETC
    H6TEXTTEXTEXTEXTETC
    H7TEXTTEXTEXTEXTETC
    H8TEXTTEXTEXTEXTETC
    H9TEXTTEXTEXTEXTETC
    P TEXTTEXTEXTEXTETC
    L TEXTTEXTEXTEXTETC

    I would like to split the source into files that each contain 'x' invoices (e.g. 25000).
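
The requested grouping can be sketched in plain JavaScript. This is only an illustration of the logic on an in-memory array of lines, written for a current JavaScript runtime such as Node.js; it is not an UltraEdit script and has not been tried on a 500 MB file. The function name `splitInvoices` and the sample data are made up for this sketch, and it assumes that every invoice starts with a line beginning with `H1` and that `invoicesPerFile` is a positive integer.

```javascript
// Sketch only: groups lines into chunks of at most `invoicesPerFile` invoices.
// Assumption: every invoice starts with a line beginning with "H1".
function splitInvoices(lines, invoicesPerFile) {
  const chunks = [];
  let current = [];
  let invoicesInCurrent = 0;

  for (const line of lines) {
    if (line.startsWith("H1")) {
      // A new invoice starts here; close the chunk first if it is already full.
      if (invoicesInCurrent === invoicesPerFile) {
        chunks.push(current);
        current = [];
        invoicesInCurrent = 0;
      }
      invoicesInCurrent++;
    }
    // Every line is kept, in its original order.
    current.push(line);
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Tiny made-up sample: three invoices, two invoices per chunk.
const sampleLines = ["H1 a", "H2 a", "P  a", "H1 b", "L  b", "H1 c", "H2 c"];
const sampleChunks = splitInvoices(sampleLines, 2);
console.log(sampleChunks.length); // 2
console.log(sampleChunks[1]);     // the last chunk holds only the third invoice
```

A chunk is closed only when the next `H1` line arrives, so all lines of an invoice stay together in the same chunk.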

    Regards

      Dec 07, 2018 #2

      This can be done with a script, but I have some questions before I start coding it.
      1. Are there more H1 lines after the P and L lines, or are all H1 lines at the top of the file?
        I think there are more H1 lines after the P and L lines, but this is not really clear to me from the example and its description.
      2. Should the P and L lines between the H1 lines and below the last H1 line of a block also be stored in the created files?
      3. What should be the file name of every created file?
        For example, the created files could be saved in the same directory as the file that is active when the script starts, using the name of that file with _x appended before the file extension, where x is a fixed-width two-digit or three-digit number in the range 01 to 99 or 001 to 999.
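
The naming scheme suggested in question 3 could look like the following sketch. It is plain JavaScript for a current runtime such as Node.js, not UltraEdit script code; the helper name `chunkFileName`, its parameters and the example paths are hypothetical and only serve to illustrate appending a zero-padded number before the file extension.

```javascript
// Hypothetical helper: builds "<name>_<NN><ext>" from a file path.
function chunkFileName(path, fileNumber, digits) {
  const lastSlash = Math.max(path.lastIndexOf("\\"), path.lastIndexOf("/"));
  const lastDot = path.lastIndexOf(".");
  // Treat the dot as an extension separator only if it lies inside the
  // file name and is not the first character of the file name.
  const hasExt = lastDot > lastSlash + 1;
  const base = hasExt ? path.slice(0, lastDot) : path;
  const ext = hasExt ? path.slice(lastDot) : "";
  // Pad the file number with leading zeros to a fixed width.
  const number = String(fileNumber).padStart(digits, "0");
  return base + "_" + number + ext;
}

console.log(chunkFileName("C:\\Temp\\invoices.dat", 7, 2)); // C:\Temp\invoices_07.dat
console.log(chunkFileName("C:\\Temp\\invoices", 12, 3));    // C:\Temp\invoices_012
```

A fixed width keeps the created files in numeric order when they are sorted by name; a number with more digits than `digits` is left unpadded.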
      Best regards from an UC/UE/UES for Windows user from Austria

        Dec 07, 2018 #3

        Firstly, thank you very much for taking the time to reply and offer assistance.

        I'll do my best to answer your questions.

        Question 1: You are right in that all the lines of each invoice will be grouped together, followed by the next invoice and so on (H1, H2, ..., L, L, P, P, H1, H2, ..., L, P, P, H1, ...).

        Question 2: Yes, all lines should be output to the created files in the same order. You will also see in the attachment that there are several additional LINE_TYPES, not just L & P, but really we're only interested in the H1 LINE_TYPE as it will always signal the start of a new invoice. It is possible that an invoice may only consist of a header (lines H1-H9) and that no product lines (L, P, etc.) exist. In this case the invoice should still be counted and any lines found should be output to the created file.
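
The counting rule from the answer to question 2 can be illustrated with a short sketch: only lines starting with `H1` are counted, so an invoice consisting of header lines alone is counted like any other. This is plain JavaScript for a current runtime such as Node.js, not an UltraEdit script; the function name `countInvoices` and the sample lines are made up for illustration.

```javascript
// Sketch only: counts invoices by counting lines that start with "H1".
// All other line types (H2-H9, L, P, CM, D, VA, VB, ...) are ignored.
function countInvoices(lines) {
  let count = 0;
  for (const line of lines) {
    if (line.startsWith("H1")) count++;
  }
  return count;
}

// Made-up sample: the second invoice has header lines only.
const invoiceLines = [
  "H1 first", "H2 first", "L  first",
  "H1 header only", "H2 header only", "H9 header only",
  "H1 third", "P  third"
];
console.log(countInvoices(invoiceLines)); // 3
```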

        Question 3: Something along the lines of DMS_HISTORY_SRC_x.dat would be nice with x being the incremented number. The inclusion of preceding zeros would not be necessary. Whatever is easiest to code will be perfectly fine.

        Thank you again.

        Regards

        Data example with a small number of dummy invoices:

        Code:

        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        LWHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH                     0040     88.00          E W10000030     88.00     88.00  1
        PW      8.00       1000000000000            GGGGGGGGGGGG                                      E 100W            8.00   4
        CMSOME TEXT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
        D  1     123456  1
        VA                      SOME MORE TEXT                            88888.00  1
        VB                      SOME MORE TEXT                            88888.00  1
        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        LWHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH                     0040     88.00          E W10000030     88.00     88.00  1
        PW      8.00       1000000000000            GGGGGGGGGGGG                                      E 100W            8.00   4
        CMSOME TEXT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
        D  1     123456  1
        VA                      SOME MORE TEXT                            88888.00  1
        VB                      SOME MORE TEXT                            88888.00  1
        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        LWHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH                     0040     88.00          E W10000030     88.00     88.00  1
        PW      8.00       1000000000000            GGGGGGGGGGGG                                      E 100W            8.00   4
        CMSOME TEXT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
        D  1     123456  1
        VA                      SOME MORE TEXT                            88888.00  1
        VB                      SOME MORE TEXT                            88888.00  1
        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        LWHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH                     0040     88.00          E W10000030     88.00     88.00  1
        PW      8.00       1000000000000            GGGGGGGGGGGG                                      E 100W            8.00   4
        CMSOME TEXT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
        D  1     123456  1
        VA                      SOME MORE TEXT                            88888.00  1
        VB                      SOME MORE TEXT                            88888.00  1
        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        LWHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH                     0040     88.00          E W10000030     88.00     88.00  1
        PW      8.00       1000000000000            GGGGGGGGGGGG                                      E 100W            8.00   4
        CMSOME TEXT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
        D  1     123456  1
        VA                      SOME MORE TEXT                            88888.00  1
        VB                      SOME MORE TEXT                            88888.00  1
        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        LWHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH                     0040     88.00          E W10000030     88.00     88.00  1
        PW      8.00       1000000000000            GGGGGGGGGGGG                                      E 100W            8.00   4
        CMSOME TEXT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
        D  1     123456  1
        VA                      SOME MORE TEXT                            88888.00  1
        VB                      SOME MORE TEXT                            88888.00  1
        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        LWHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH                     0040     88.00          E W10000030     88.00     88.00  1
        PW      8.00       1000000000000            GGGGGGGGGGGG                                      E 100W            8.00   4
        CMSOME TEXT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
        D  1     123456  1
        VA                      SOME MORE TEXT                            88888.00  1
        VB                      SOME MORE TEXT                            88888.00  1
        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        LWHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH                     0040     88.00          E W10000030     88.00     88.00  1
        PW      8.00       1000000000000            GGGGGGGGGGGG                                      E 100W            8.00   4
        CMSOME TEXT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
        D  1     123456  1
        VA                      SOME MORE TEXT                            88888.00  1
        VB                      SOME MORE TEXT                            88888.00  1
        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        LWHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH                     0040     88.00          E W10000030     88.00     88.00  1
        PW      8.00       1000000000000            GGGGGGGGGGGG                                      E 100W            8.00   4
        CMSOME TEXT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
        D  1     123456  1
        VA                      SOME MORE TEXT                            88888.00  1
        VB                      SOME MORE TEXT                            88888.00  1
        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        LWHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH                     0040     88.00          E W10000030     88.00     88.00  1
        PW      8.00       1000000000000            GGGGGGGGGGGG                                      E 100W            8.00   4
        CMSOME TEXT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
        D  1     123456  1
        VA                      SOME MORE TEXT                            88888.00  1
        VB                      SOME MORE TEXT                            88888.00  1
        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        LWHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH                     0040     88.00          E W10000030     88.00     88.00  1
        PW      8.00       1000000000000            GGGGGGGGGGGG                                      E 100W            8.00   4
        CMSOME TEXT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
        D  1     123456  1
        VA                      SOME MORE TEXT                            88888.00  1
        VB                      SOME MORE TEXT                            88888.00  1
        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        LWHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH                     0040     88.00          E W10000030     88.00     88.00  1
        PW      8.00       1000000000000            GGGGGGGGGGGG                                      E 100W            8.00   4
        CMSOME TEXT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
        D  1     123456  1
        VA                      SOME MORE TEXT                            88888.00  1
        VB                      SOME MORE TEXT                            88888.00  1
        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        LWHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH                     0040     88.00          E W10000030     88.00     88.00  1
        PW      8.00       1000000000000            GGGGGGGGGGGG                                      E 100W            8.00   4
        CMSOME TEXT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
        D  1     123456  1
        VA                      SOME MORE TEXT                            88888.00  1
        VB                      SOME MORE TEXT                            88888.00  1
        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        LWHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH                     0040     88.00          E W10000030     88.00     88.00  1
        PW      8.00       1000000000000            GGGGGGGGGGGG                                      E 100W            8.00   4
        CMSOME TEXT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
        D  1     123456  1
        VA                      SOME MORE TEXT                            88888.00  1
        VB                      SOME MORE TEXT                            88888.00  1
        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        H1Aaaaaaaaaaaa   C9999     999999         20170418ABCDE TNDOL       UAB
        H2W9999          Raine Wannikid                                                   1122334455667N               1
        H3MMM EEEEEEE EEEE-EEEEEEEE               EE EEEEEE EEEE                          EEEEEEEE
        H4SSSSSSS                                 HHHHHHHHHHHHH                           HHH HHH                                    9999999
        H5ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      W            W           W                W        WW        W
        H6    W    W    W    WWWWWWWWW                  WW        WW                                                                 9999999
        H7LLLLLLL   20000101        20000101VF7NC5FS0AY562024FHCB097063 1598     16754pC  1CB7    200001012000010120000101200001012000010120000101   8888888888888   8888888888888   88888
        H8JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ             8.00     88.00      0.00     88.00      0.00      0.00NOT FOUND NOT FOUND
        H9                                                            Y
        LWHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH                     0040     88.00          E W10000030     88.00     88.00  1
        PW      8.00       1000000000000            GGGGGGGGGGGG                                      E 100W            8.00   4
        CMSOME TEXT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
        D  1     123456  1
        VA                      SOME MORE TEXT                            88888.00  1
        VB                      SOME MORE TEXT                            88888.00  1

          Dec 08, 2018 #4

          Here is the UltraEdit script for this task:

          Code:

          if (UltraEdit.document.length)   // Is any file opened?
          {
             var nInvoicesPerFile = 25000; // Number of invoices (H1 lines) per file.
             var sLeadingZeros = "00";     // Leading zeros for file number in file names.
          
             // Set working environment required for this job.
             UltraEdit.ueReOn();
             UltraEdit.insertMode();
             UltraEdit.columnModeOff();
          
             // Move cursor to top of active file and run the initial search.
             UltraEdit.activeDocument.top();
             var nDataFile = UltraEdit.activeDocumentIdx;
             UltraEdit.document[nDataFile].findReplace.mode=0;
             UltraEdit.document[nDataFile].findReplace.matchCase=true;
             UltraEdit.document[nDataFile].findReplace.matchWord=false;
             UltraEdit.document[nDataFile].findReplace.regExp=true;
             UltraEdit.document[nDataFile].findReplace.searchDown=true;
             UltraEdit.document[nDataFile].findReplace.searchInColumn=false;
          
             // Do nothing if there is no line starting with H1 in this file.
             if (UltraEdit.document[nDataFile].findReplace.find("%H1"))
             {
                // This file is probably the correct file for this script.
          
                // Get file name of current file with path, but without extension to
                // save the new files in the same directory as the current file with
                // same name, but with an underscore and file number appended before
                // the file extension. That is a quick and dirty solution working
                // only for files not opened via FTP from within UltraEdit/UEStudio
                // on Windows using backslash as directory separator.
                var sFilePathName = "DMS_HISTORY_SRC_";
                var sFileExtension = ".dat";
                var nLastBackSlash = UltraEdit.activeDocument.path.lastIndexOf('\\');
                if (nLastBackSlash >= 0)
                {
                   nLastBackSlash++;
                   sFilePathName = UltraEdit.activeDocument.path.substr(nLastBackSlash);
                   var nLastDot = sFilePathName.lastIndexOf('.');
                   if (nLastDot > 0) // File name with file extension.
                   {
                      sFileExtension = sFilePathName.substr(nLastDot);
                      sFilePathName = UltraEdit.activeDocument.path.substring(0,nLastBackSlash+nLastDot);
                   }
                   else              // File name without file extension.
                   {
                      sFilePathName = UltraEdit.activeDocument.path;
                   }
                   sFilePathName += '_';
                }
          
                // The script uses clipboard 9 to copy the blocks to the new files.
                // So the clipboard of operating system can be used for example by
                // other applications while the script is doing the file splitting.
                UltraEdit.selectClipboard(9);
          
                var nFileNumber = 0;       // Counts the number of saved files.
                var nInvoicesCount = 1;    // Counts the number of H1 lines in block.
          
                if (!UltraEdit.outputWindow.visible)
                {
                   UltraEdit.outputWindow.showWindow(true);
                }
                UltraEdit.outputWindow.write("Splitting up the file ...");
          
                // Split the file after every nInvoicesPerFile found lines starting
                // with H1 until no more line with H1 is found in large input file.
                while (1)
                {
                   // Open a new file. Displayed maximized, it hides the large input
                   // file, so the document window of the input file need not be
                   // updated after every find and the splitting finishes faster.
                   UltraEdit.newFile();
          
                   // Get the line number of current line being first line of block.
                   var nLineStart = UltraEdit.document[nDataFile].currentLineNum;
          
                   // Search for lines starting with H1 and count them.
                   while (UltraEdit.document[nDataFile].findReplace.find("%H1"))
                   {
                      nInvoicesCount++;
                      if (nInvoicesCount > nInvoicesPerFile) break;
                   }
          
                   // Increment the file number by one, convert it to a decimal
                   // string and pad it with leading zeros if it is shorter than
                   // sLeadingZeros, a string of zero or more 0 characters. Then
                   // concatenate file path (except on unnamed file), file name,
                   // file number and file extension to a full file name.
                   nFileNumber++;
                   var sFileNumber = nFileNumber.toString(10);
                   if (sFileNumber.length < sLeadingZeros.length)
                   {
                      sFileNumber = sLeadingZeros.substr(sFileNumber.length) + sFileNumber;
                   }
                   var sFullFileName = sFilePathName + sFileNumber + sFileExtension;
          
                   // Did the find loop above stop on the first H1 line of the next
                   // block, i.e. was one H1 line more found than fits into a file?
                   if (nInvoicesCount > nInvoicesPerFile)
                   {
                      // Move caret to beginning of current H1 line, get line number
                      // of this line, select everything upwards to first line
                      // of block and copy selected block into user clipboard 9.
                      var nLineEnd = UltraEdit.document[nDataFile].currentLineNum;
                      UltraEdit.document[nDataFile].gotoLine(nLineEnd,1);
                      UltraEdit.document[nDataFile].gotoLineSelect(nLineStart,1);
                      UltraEdit.document[nDataFile].copy();
          
                      // Paste the copied block into active new file, save and close it.
                      UltraEdit.activeDocument.paste();
                      UltraEdit.saveAs(sFullFileName);
                      UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
          
                      // Write a short message into the output window as feedback
                      // for the user waiting for the script to finish.
                      UltraEdit.outputWindow.write("Saved file " + sFullFileName);
          
                      // Move caret to second column of first line of next block.
                      UltraEdit.document[nDataFile].gotoLine(nLineEnd,2);
          
                      // Prepare invoices counter for next block.
                      nInvoicesCount = 1;
                   }
                   else  // This is the last block to copy to a new file.
                   {
                      // Move caret to first line of last block.
                      UltraEdit.document[nDataFile].gotoLine(nLineStart,1);
                      // Select to end of file, copy the block, cancel the selection,
                      // paste the block into active new file, make sure the last
                      // line ends also with a line termination, save and close it.
                      UltraEdit.document[nDataFile].selectToBottom();
                      UltraEdit.document[nDataFile].copy();
                      UltraEdit.document[nDataFile].bottom();
                      UltraEdit.activeDocument.paste();
                      if (UltraEdit.activeDocument.isColNumGt(1))
                      {
                         UltraEdit.activeDocument.insertLine();
                         if (UltraEdit.activeDocument.isColNumGt(1))
                         {
                            UltraEdit.activeDocument.deleteToStartOfLine();
                         }
                      }
                      UltraEdit.saveAs(sFullFileName);
                      UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
                      UltraEdit.document[nDataFile].setActive();
          
                      // Write a short message into the output window and exit loop.
                      UltraEdit.outputWindow.write("Saved file " + sFullFileName);
                      break;
                   }
                }
          
                // Free memory and switch back to clipboard of operating system.
                UltraEdit.clearClipboard();
                UltraEdit.selectClipboard(0);
             }
          }
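
The file naming steps of the script can be tried outside UltraEdit as a plain JavaScript function. This is only a minimal sketch: the helper name buildChunkFileName is hypothetical and not part of the script, it assumes Windows paths with backslashes, and unlike the script it has no fallback name and extension for unnamed files.

```javascript
// Hypothetical helper, not part of the UltraEdit script above.
// Builds "<path without extension>_<zero padded number><extension>".
function buildChunkFileName(sPath, nFileNumber, sLeadingZeros) {
   var sFilePathName = sPath;
   var sFileExtension = "";

   // Index of first character of the file name (0 if there is no backslash).
   var nNameStart = sPath.lastIndexOf('\\') + 1;
   var sFileName = sPath.substring(nNameStart);

   // A dot at index 0 is treated as part of the name, like in the script.
   var nLastDot = sFileName.lastIndexOf('.');
   if (nLastDot > 0) {
      sFileExtension = sFileName.substring(nLastDot);
      sFilePathName = sPath.substring(0, nNameStart + nLastDot);
   }

   // Pad with leading zeros only if the number string is shorter
   // than the string of zeros.
   var sFileNumber = nFileNumber.toString(10);
   if (sFileNumber.length < sLeadingZeros.length) {
      sFileNumber = sLeadingZeros.substring(sFileNumber.length) + sFileNumber;
   }
   return sFilePathName + '_' + sFileNumber + sFileExtension;
}
```

For example, buildChunkFileName("C:\\Temp\\invoices.dat", 3, "00") gives C:\Temp\invoices_03.dat, while a file number with more digits than sLeadingZeros is used unpadded.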
          
          Some additional information:
          • This script requires that the configuration setting Disable line numbers at Advanced - Configuration - Editor Display - Miscellaneous is not enabled, which matters if you have followed the recommendations in the power tip large file text editor. The script relies on UltraEdit counting the lines, which unfortunately also results in the line number in the status bar being updated during script execution.
          • I tried to write the script so that it avoids as many window updates as possible. But UltraEdit is designed as a GUI text editor for interactive use, so it updates the line number information on the status bar at the bottom of the UltraEdit main window while the script runs. This makes the script quite slow for this file splitting task. It would be much faster to split primarily by number of lines instead of number of H1 lines, i.e. select a block from the current line downwards to the next H1 line after a specified number of lines like 350,000. That would reduce the number of executed finds dramatically, and UltraEdit could most likely finish this task in a few seconds.
          • I tested the script with UltraEdit for Windows v18.20.0.1028, first on your small example data with variable nInvoicesPerFile set to 3. Then I selected all, copied and pasted the data several times until the example file had 4,988,880 lines and a file size of 520,975,520 bytes (496.84 MiB). With nInvoicesPerFile set to 25000, the script needed 41 minutes to create 14 files: 7 files with 365,628 lines and 38,181,520 bytes, 6 files with 365,622 lines and 38,180,980 bytes, and 1 file with 235,752 lines and 24,619,000 bytes. Unless a different splitting method like the one suggested in the previous point is used, it is most likely better to use PowerShell, Python or any other script interpreter for this task.
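
The idea of splitting primarily by number of lines can be sketched as a plain JavaScript function that only computes where each block would start. This is an untested illustration of the approach, not a replacement script: findBlockStarts and its parameters are hypothetical names, and the lines are assumed to be available as an array of strings. In an UltraEdit script each boundary would presumably correspond to one jump forward by the line count followed by a single find for the next H1 line, instead of one find per invoice.

```javascript
// Hypothetical helper: returns the 0-based line indexes at which blocks start.
// Every block has at least nMinLinesPerFile lines and ends just before a
// line starting with H1, so no invoice is cut into two pieces.
function findBlockStarts(asLines, nMinLinesPerFile) {
   if (nMinLinesPerFile < 1) {
      throw new Error("nMinLinesPerFile must be at least 1");
   }
   var anStarts = [0];
   var nNext = nMinLinesPerFile;   // Earliest possible start of next block.
   while (nNext < asLines.length) {
      // Move down to the next line starting with H1.
      while (nNext < asLines.length && asLines[nNext].indexOf("H1") !== 0) {
         nNext++;
      }
      if (nNext >= asLines.length) break;   // Rest belongs to last block.
      anStarts.push(nNext);
      nNext += nMinLinesPerFile;
   }
   return anStarts;
}
```

With this approach the created files hold roughly the same number of lines rather than the same number of invoices, which is the trade-off for needing far fewer searches.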
          Best regards from an UC/UE/UES for Windows user from Austria

            Dec 08, 2018#5

            This is great! Thank you! 
            41 minutes is a small price to pay. 
            I look forward to trying it out.
            Many Kind Regards!