Splitting Big Files

    Mar 22, 2005#1

    Is there an easy way to split files in UltraEdit?

    I have read the posts below that show how to use macros to select and copy every 7 lines and create new files but I wondered if there was another way?

    I have a csv file of 100MB or so with over 1 million lines and I would like to break it up small enough to load it into Excel in chunks.

    Any advice gratefully accepted.

    thanks,
    Craig

      Mar 23, 2005#2

      First, make a copy of your big CSV file. Open the copy in UltraEdit without using a temp file, then run the following macro as often as needed. Each run cuts the first 65535 lines out of the copy and saves them to a new file, and you have to enter the file name for the new file every time.

      InsertMode
      ColumnModeOff
      HexOff
      UnixReOff
      GotoLine 65536
      SelectToTop
      Cut
      NewFile
      Paste
      SaveAs ""
      CloseFile

      If the byte count of each line is constant, you could also use the split file feature of Total Commander from http://www.ghisler.com/
      Best regards from an UC/UE/UES for Windows user from Austria
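
For readers who would rather script the split than run the macro repeatedly, here is a minimal sketch of the same idea in Python. It is not part of the original thread. The function name `split_by_lines`, the `part_0001.csv` naming scheme, and the output directory are assumptions made for illustration. Unlike the macro, it leaves the source file unchanged and numbers the output files itself.

```python
from itertools import islice
from pathlib import Path


def split_by_lines(source, out_dir, lines_per_file=65535, stem="part"):
    """Write consecutive chunks of `lines_per_file` lines to numbered files.

    Returns the list of files written. The last file holds whatever remains.
    The file naming (stem_0001.csv, ...) is a hypothetical choice.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Binary mode avoids any assumption about the text encoding and keeps
    # the original line endings (CRLF or LF) byte for byte.
    with open(source, "rb") as src:
        index = 1
        while True:
            # islice reads at most lines_per_file lines and stops early at EOF.
            chunk = list(islice(src, lines_per_file))
            if not chunk:
                break
            target = out_dir / f"{stem}_{index:04d}.csv"
            with open(target, "wb") as dst:
                dst.writelines(chunk)
            written.append(target)
            index += 1
    return written
```

Only one chunk is held in memory at a time, so the size of the source file should not matter much.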

        Mar 23, 2005#3

        Mofi,

        thanks for your reply. I was thinking it might be a bit inefficient to run a macro to select and paste such a big range but I was surprised how well the macro ran.

        Eventually I'll stop being so lazy and write something to handle it myself in Java, but in the meantime, thank you for taking the time to reply. I appreciate your help.

        regards,
        Craig

          Mar 29, 2007#4

          OK, Mofi, I have searched the existing threads but still don't see the full answer. I have the split part up to taking the 1st 25k lines, copy to new file and save, but it leaves the "P" mark for the cut 25K lines. I wanted to delete these lines completely so that I could then select the first 25K lines again, copy to a new file, save, and so on. Then, some way to stop the process when there are no more rows with data. My file is currently about 175K lines.

          My original post:

          I am trying to write a macro that does the following:

          1) Selects the 1st 25,000 lines of the file (required by the target system)
          2) Opens a new file (must be blank) and pastes the 25000 lines in
          3) Save and close the new file
          4) Takes the next 25,000 lines in the original file and repeats the process with a second new file name
          5) Ends when the bottom of the original file is reached

          I managed to figure out how to cut/paste to the new file but am unable to do the following:

          a) How do I delete the special end of line character in the original file? The idea being I cut/paste, the next 25K lines move up to the top and then I run some type of loop until reaching a completely blank row.

          b) What is the best way to loop this process including new file names?

          c) Do I need to clear the newly created files each time or is there a way to replace what is already in the file?

          Any suggestions would be greatly appreciated.

            Mar 30, 2007#5

            I took the second macro from the topic Splitting text file and adapted it, hopefully correctly, to your needs.

            To save the new files automatically with an increasing number, you must FIRST create my universal CountUp macro. The source code with a description can be found in the topic How to insert an incrementing number in a file using a counter in a macro?

            THEN create the following macro. You have to adapt the file name with path (C:\Temp\Test_0000.tmp in the macro, highlighted in red in the original post). If you do not expect to produce more than 999 files, you can change every 0000 to 000 or fewer zeros (a single 0 should also be enough for you).

            Whether the column number 1 at the end of GotoLineSelect 25001 1 (highlighted in blue in the original post) is needed depends on your version of UE. The column number is required since UE v12.20 and must be removed for earlier versions.

            Make sure that only your source file is open, or that it is the rightmost one in the file tab order, because of possible problems with the setting Move to nearest left tab after current tab is closed.

            InsertMode
            ColumnModeOff
            HexOff
            Bottom
            IfColNum 1
            Else
            "
            "
            EndIf
            Top
            "0000"
            SelectToTop
            Clipboard 8
            Cut
            Clipboard 9
            Loop
            GotoLineSelect 25001 1
            IfSel
            Cut
            EndSelect
            NewFile
            Paste
            Top
            "C:\Temp\Test_0000.tmp
            "
            Key UP ARROW
            Find "0000"
            Clipboard 8
            PlayMacro 1 "CountUp"
            Key HOME
            StartSelect
            Key END
            Clipboard 9
            Copy
            EndSelect
            DeleteLine
            SaveAs "^c"
            CloseFile
            Else
            ExitLoop
            EndIf
            EndLoop
            CloseFile NoSave
            Clipboard 9
            ClearClipboard
            Clipboard 8
            ClearClipboard
            Clipboard 0

              Apr 02, 2007#6

              Thanks Mofi! This works great except that it never ends. My file is about 175,000 rows, so it should end after 8 files are created, but it keeps going. Should your "If's" already be taking care of that?

              Mofi, also, the exact row count for now is 171,019, so I would expect 6 full files (25000 rows each, which happens) and then one partial file with the last 21,019 rows as the 7th file. Instead, the 7th file and all the files created afterwards, until I cancel the macro, show only 88 rows and then a partial 89th row. Any ideas?

              I use version 13. I am on a temp copy for now.

                Apr 03, 2007#7

                Well, it worked perfectly on my computer. I tested it with UE v13.00+4 using a small file and a smaller line number, and the last file contained fewer lines. GotoLineSelect with a line number that is too high should select everything to the end of the file, so the last part of the file is cut into the last file. After this has been done, the source file is completely empty and GotoLineSelect cannot select anything anymore, which should make the Else branch of IfSel active, which means ExitLoop.

                I have now also created a file with 171,019 lines, and the macro really does not work with UE v13.00+4. It works for the first 6 files covering lines 1 - 150,000, but the remaining 21,019 lines are not saved correctly into the last file. This is definitely a bug in UE.

                It looks like there is a synchronization problem with the last GotoLineSelect 25001 1 when fewer than 25,000 lines remain. In this situation the macro continues before the cursor has been moved in selection mode to the bottom of the file. So only some of the lines (on my computer a few hundred to a few thousand) are selected before the macro continues, which results in more files than expected (10 files too many on my computer).

                I could not find any workaround for this synchronization problem. I will send a bug report email to IDM support with my test file and the test macro.

                The only solution I have for you is to run the loop only a specified number of times (6 for your source file) and then save the remaining part of the file as the last file. I know this is not really good, because the loop count in the macro must be edited to the correct number (line count / 25000, rounded down) before the macro is executed. But currently I have no better idea how to handle this UE bug, and I have tried a lot.

                InsertMode
                ColumnModeOff
                HexOff
                Bottom
                IfColNum 1
                Else
                "
                "
                EndIf
                Top
                "0000"
                SelectToTop
                Clipboard 8
                Cut
                Clipboard 9
                Loop 6
                GotoLineSelect 25001 1
                IfSel
                Cut
                EndSelect
                NewFile
                Paste
                Top
                "C:\Temp\Test_0000.tmp
                "
                Key UP ARROW
                Find "0000"
                Clipboard 8
                PlayMacro 1 "CountUp"
                Key HOME
                StartSelect
                Key END
                Clipboard 9
                Copy
                EndSelect
                DeleteLine
                SaveAs "^c"
                CloseFile
                Else
                ExitLoop
                EndIf
                EndLoop
                IfEof
                CloseFile NoSave
                Else
                "C:\Temp\Test_0000.tmp
                "
                Key UP ARROW
                Find "0000"
                Clipboard 8
                PlayMacro 1 "CountUp"
                Key HOME
                StartSelect
                Key END
                Clipboard 9
                Copy
                EndSelect
                DeleteLine
                SaveAs "^c"
                CloseFile
                EndIf

                Clipboard 9
                ClearClipboard
                Clipboard 8
                ClearClipboard
                Clipboard 0
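
The loop count that has to be edited into the macro above (line count / 25000, rounded down) can be worked out with a short script before running it. This is only an illustrative sketch in Python, not something from the thread, and the helper names `count_lines` and `plan_split` are made up for the example.

```python
def count_lines(path):
    """Count lines by streaming the file in binary mode (no encoding assumption)."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def plan_split(total_lines, lines_per_file=25_000):
    """Return (number of full files, lines left over for the last file)."""
    full_files, remainder = divmod(total_lines, lines_per_file)
    return full_files, remainder


# The 171,019-line file discussed above: 6 full files plus 21,019 lines.
full_files, remainder = plan_split(171_019)
print(full_files, remainder)  # 6 21019
```

The first value is the number to put after Loop. A remainder of 0 means the source file is empty after the loop, which is the case the IfEof branch handles.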

                  Apr 03, 2007#8

                  Thanks. Nice to know I am not completely crazy. Now which version were you using before when it worked? The client I am at is using version 12 so it might actually work for them. I will try it. If not, I will use the limited macro you sent. Thanks again for all your help.

                  Ended up going with the modified Loop 6 macro as they are moving to version 13 too.

                  The only issue is that I saved the macro file and emailed it to the person who will maintain it. When we opened the 171K source file on his PC and ran my 1st macro, half the file mods were off by 1 column. When I went to change the column # in the macro, running the macro no longer worked. The first step of that macro replaces the " marks with nothing. It appears not to recognize any of the " marks. If I open the same file in my UltraEdit, the macro is fine. Is there some setting in his UE that is different? I have to hand over this whole process to him to maintain. We are both on version 13. Thanks for your help.

                    Apr 04, 2007#9

                    I used UE v13.00+4 when I wrote the first version, which worked. But it worked only because I used a very small source file and GotoLineSelect 6 1, not a large file of several MB and line number 25001 for this command. I did not expect that to make a difference, but we and IDM now know that it does because of a synchronization problem. IDM support could reproduce this and forwarded it to the developers.

                    About the macro loading/editing problem, look at the topic Selecting a block (range) to the end of file in macro.

                    In the meantime I have found a workaround. It is MUCH slower, but it really works independently of the number of lines in the source file. In my first posted version of the macro, GotoLineSelect 25001 1 must be replaced by a call of a second submacro, named for example "Down 25k lines", using PlayMacro 1 "Down 25k lines". The submacro must be created before editing the main macro and must contain the following commands:

                    Loop 25000
                    Key DOWN ARROW
                    IfEof
                    ExitLoop
                    EndIf
                    EndLoop
                    SelectToTop


                    Key PGDN would be faster but how many lines a page has depends on current window size and so it is not really good practice to use it here.

                    Edit: The problem with macro continuation before cursor reaches the bottom of the file when the line number is much greater than the number of lines was fixed with UE v13.10 and UES v6.30.

                      Apr 15, 2010#10

                      Hello,


                      I have very large (3 GB) pipe-delimited (|) .CSV files containing millions of records. I would like to split such a file into chunks based on the values in the first column. The first line contains the headers, and I would like to keep it at the top of each split file. The file should be split on the unique names occurring in the first column, so that in the end I have one split file for each name that occurs in the first column, each starting with the same header as the source file. Each split file should be named after the first-column name used to split it.
                      The data looks something like this:

                      Vendor|Period|Inventory Number|Owner User Name|Owner EMP ID
                      ATTU1|JAN10|123657898446551|NAME1|NLMMCD1
                      ATTU1|JAN10|123657898446552|NAME2|NLMMCD2
                      ATTU1|JAN10|123657898446553|NAME3|NLMMCD3
                      ATTU2|JAN10|123657898446554|NAME4|NLMMCD4
                      ATTU2|JAN10|123657898446555|NAME5|NLMMCD5
                      ATTU3|JAN10|123657898446556|NAME6|NLMMCD6
                      etc.

                      I would like to split using the names in the Vendor column and this example should give me three files named ATTU1 to ATTU3 and each file should contain the same header.
                      Is this possible using UE8.0? Is there a macro available to do this?

                        Apr 29, 2010#11

                        You have perhaps already done this manually or with a different tool, but it should be possible. The reason I did not reply earlier with a possible solution is that I no longer have UE v8.00. So the macros below, which worked on your example, were tested with UE v11.20b, and I can only hope that they also work with the extremely old version 8.00.

                        Both macros must have the property Continue if a Find with Replace not found checked and the property Show Cancel Dialog for this macro unchecked.

                        The macro you must create first is named FindVendorLines. Don't change the name. The macro is called by this case-sensitive name from the second macro, which must be created next. The code for this macro is:

                        Loop
                        Clipboard 8
                        Find RegExp "%^c|*^p"
                        IfFound
                        Clipboard 9
                        CopyAppend
                        Else
                        ExitLoop
                        EndIf
                        EndLoop

                        The second macro can have any name you want. The code for this macro is:

                        InsertMode
                        ColumnModeOff
                        HexOff
                        UnixReOff
                        Bottom
                        IfColNum 1
                        Else
                        "
                        "
                        EndIf
                        Top
                        SelectLine
                        Clipboard 7
                        Copy
                        EndSelect
                        Key HOME
                        Loop
                        IfEof
                        ExitLoop
                        EndIf
                        SelectWord
                        Clipboard 8
                        Copy
                        SelectLine
                        Clipboard 9
                        Copy
                        PlayMacro 1 "FindVendorLines"
                        EndSelect
                        Key HOME
                        Key DOWN ARROW
                        NewFile
                        Clipboard 7
                        Paste
                        Clipboard 9
                        Paste
                        Top
                        Clipboard 8
                        Paste
                        ".csv"
                        SelectToTop
                        Cut
                        SaveAs "^c"
                        CloseFile
                        EndLoop
                        ClearClipboard
                        Clipboard 9
                        ClearClipboard
                        Clipboard 7
                        ClearClipboard
                        Clipboard 0

                        This second macro must be executed on the huge CSV file. It is important that no vendor string contains any UltraEdit regular expression character; see the list of UltraEdit regular expression characters in the help of UltraEdit. It is also important that no vendor string contains a character that is not allowed in a file name, because the vendor string is used for saving the files.
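
As a cross-check outside UltraEdit, the same split by vendor can be sketched in Python. This is an illustrative sketch rather than anything posted in the thread. The function name `split_by_first_field`, the UTF-8 default encoding, and the `<vendor>.csv` output naming are assumptions. The caveat about vendor strings containing characters not allowed in file names applies here as well.

```python
from pathlib import Path


def split_by_first_field(source, out_dir, delimiter="|", encoding="utf-8"):
    """Write one file per distinct first-field value, each starting with the header.

    Returns the sorted list of first-field values found. The encoding
    default is an assumption; pass the real one for your data.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    seen = set()
    current_key, current_file = None, None
    # newline="" keeps the original line endings untouched.
    with open(source, "r", newline="", encoding=encoding) as src:
        header = src.readline()
        try:
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                if not line.endswith(("\n", "\r")):
                    line += "\n"  # make sure the last record is terminated
                key = line.split(delimiter, 1)[0]
                if key != current_key:
                    # Only one output file is open at a time, so the number
                    # of vendors is not limited by open-file limits.
                    if current_file is not None:
                        current_file.close()
                    is_new = key not in seen
                    current_file = open(out_dir / f"{key}.csv",
                                        "w" if is_new else "a",
                                        newline="", encoding=encoding)
                    if is_new:
                        current_file.write(header)
                        seen.add(key)
                    current_key = key
                current_file.write(line)
        finally:
            if current_file is not None:
                current_file.close()
    return sorted(seen)
```

On the sample data from the question this would produce ATTU1.csv, ATTU2.csv and ATTU3.csv. Rows for one vendor do not have to be contiguous, because a file that already exists from this run is reopened in append mode.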

                          Oct 25, 2010#12

                          Hi, Mofi.
                          I have a good idea for your second macro.
                          The key point is a dynamic name for "Save As...".
                          When you split a big file into several smaller files, I think you can write a macro that lets the file name change automatically.
                          I am a beginner and don't know how to implement this functionality, but I think you can try.

                            Oct 26, 2010#13

                            I have already supplied this functionality. The explanation for the macro in my second post mainly describes how the new files are saved with an auto-increasing number in the file name, and most of the code of that macro exists just for this functionality. However, if you want a standalone macro for saving a file with an auto-increasing number, here it is.

                            Macro FileNameNumber

                            InsertMode
                            ColumnModeOff
                            Top
                            Clipboard 8
                            Paste
                            "|"
                            Find RegExp Up "[0-9]"
                            EndSelect
                            Key LEFT ARROW
                            OverStrikeMode
                            Loop 0
                            IfCharIs "0"
                            "1"
                            ExitLoop
                            EndIf
                            IfCharIs "1"
                            "2"
                            ExitLoop
                            EndIf
                            IfCharIs "2"
                            "3"
                            ExitLoop
                            EndIf
                            IfCharIs "3"
                            "4"
                            ExitLoop
                            EndIf
                            IfCharIs "4"
                            "5"
                            ExitLoop
                            EndIf
                            IfCharIs "5"
                            "6"
                            ExitLoop
                            EndIf
                            IfCharIs "6"
                            "7"
                            ExitLoop
                            EndIf
                            IfCharIs "7"
                            "8"
                            ExitLoop
                            EndIf
                            IfCharIs "8"
                            "9"
                            ExitLoop
                            EndIf
                            IfCharIs "9"
                            "0"
                            Key LEFT ARROW
                            IfColNum 1
                            InsertMode
                            "1"
                            ExitLoop
                            EndIf
                            Key LEFT ARROW
                            IfCharIs "0123456789"
                            Else
                            Key RIGHT ARROW
                            InsertMode
                            "1"
                            ExitLoop
                            EndIf
                            EndIf
                            EndLoop
                            InsertMode
                            Top
                            StartSelect
                            Find Select "|"
                            Key LEFT ARROW
                            Cut
                            EndSelect
                            Delete
                            This macro consists mainly of the code from macro CountUp. Just the code at top is slightly changed and the code at bottom is simplified for the purpose of this macro.

                            The final file name is stored in user clipboard 8 after playing this macro with the command

                            PlayMacro 1 "FileNameNumber"

                            and therefore just the command SaveAs "^c" must be used to save the current file with the file name with the auto-increasing number inside.

                            Please note that first the macro FileNameNumber must be created before any other macro stored in the same macro file playing this macro can be created.

                            Further, take into account that clipboard 8 is used as a string variable buffer for the file name. So make sure to use another clipboard in the main macro that plays macro FileNameNumber to increase the number in the file name. In other words, after saving a new file with the command SaveAs "^c", the next command before using any clipboard again in the code execution sequence should be Clipboard x, where x is 0 to 9 except 8.

                            Lastly, the macro can't be used as is without an initialization of the file name in clipboard 8. The main macro must contain code to copy a valid file name, with or without path, into user clipboard 8. I suggest always initializing clipboard 8 with a file name with full path, because a new file saved with a file name without path is saved in the current working directory of UE/UES. That could be the program directory of UE/UES, which is often write-protected, so saving the files fails. Here is a code example for initializing clipboard 8 with a file name.

                            Top
                            "C:\Temp\Temp_00.txt"
                            SelectToTop
                            Clipboard 8
                            Cut
                            Clipboard 0


                            Of course the file name string could be also manually copied into clipboard 8 before running any macro.

                            It is important that the file name contains a number. Whether this number starts with 0 or is, for example, 2395 does not matter. The number of leading zeros also does not matter. But it is advisable to use the right number of zeros in the initial file name string according to the expected number of files, so that you do not get files numbered 1, 2, 3, ..., 8, 9, 10, etc. but instead 01, 02, 03, ..., 08, 09, 10, etc.
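
The counting rule described above (increment the number in the name, keep the leading zeros, let the number grow by a digit when it overflows) can be illustrated with a few lines of Python. This is a sketch for illustration only, not a translation of the macro. The function name `next_file_name` is made up, and the choice to increment the last run of digits in the string is an assumption based on the macro searching upward from the end of the name.

```python
import re


def next_file_name(name):
    """Return `name` with its last run of digits increased by one.

    Leading zeros are preserved where possible: _00 becomes _01,
    and _99 becomes _100.
    """
    matches = list(re.finditer(r"\d+", name))
    if not matches:
        raise ValueError("file name must contain a number")
    last = matches[-1]
    digits = last.group()
    bumped = str(int(digits) + 1).zfill(len(digits))
    return name[:last.start()] + bumped + name[last.end():]


print(next_file_name(r"C:\Temp\Temp_00.txt"))    # C:\Temp\Temp_01.txt
print(next_file_name(r"C:\Temp\Test_0999.tmp"))  # C:\Temp\Test_1000.tmp
```

Note that with this rule a digit in the file extension (for example .mp3) would be the one that gets incremented, so the sketch is only suitable for names whose last number is the counter.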

                            As an example for usage of macro FileNameNumber let us assume that a macro is needed to create 10 files with a file name entered by the user and the content for the 10 files should be the current content of the Windows clipboard.

                            InsertMode
                            ColumnModeOff
                            HexOff
                            UnixReOff
                            NewFile
                            GetString "Please insert the file name with path."
                            Find Up "."
                            IfNotFound
                            "_00.txt"
                            Else
                            EndSelect
                            Key LEFT ARROW
                            "_00"
                            Key END
                            EndIf
                            SelectToTop
                            Clipboard 8
                            Cut
                            CloseFile NoSave
                            Loop 10
                            Clipboard 0
                            NewFile
                            Paste
                            PlayMacro 1 "FileNameNumber"
                            SaveAs "^c"
                            CloseFile NoSave
                            EndLoop
                            ClearClipboard
                            Clipboard 0

                              Oct 29, 2010#14

                              Excellent!
                              Thank you Mofi!
                              I think the code segment "Save As '^c'" is a miracle.
                              The string "^c" may have more meanings.
                              Actually, I don't know what it means.

                              By the way, suppose there is a huge file, maybe bigger than 100MB.
                              There are two types of information in it: one is ERROR, the other is INFOR.
                              I don't know whether a macro can do this or not:
                              I hope the macro can pick out the ERROR information into another file (e.g. error.txt) and at the same time pick out the INFOR information into a different file (e.g. infor.txt).
                              That means the macro should have concurrent processing ability.

                                Oct 29, 2010#15

                                On the UltraEdit help page Edit Macro command, the command SaveAs is explained, and there you can read what ^s and ^c mean. ^s is replaced during execution with the currently selected text in the active file, and ^c is replaced by the content of the active clipboard.

                                If you want to copy all lines containing ERROR into a new file and you need to do this only once, it is better not to use a macro; do it manually.
                                • Go to the top of the file by pressing Ctrl+Home.
                                • Press Ctrl+F to open the Find dialog and enter ERROR as search string. Uncheck all other standard settings.
                                • Press button Advanced if the advanced options are not already visible.
                                • Enable the option List Lines Containing String.
                                • Execute the find by pressing button Next.
                                • A dialog opens showing all lines containing the word ERROR. Press button Clipboard and close the dialog.
                                • Press Ctrl+N to open a new file and Ctrl+V to paste the copied lines into the new file. That's it.
                                I have posted a macro which does the same as above, see the topic Search string and copy all found lines to clipboard.

                                That macro adapted to your needs is below. In the original post the small modifications were highlighted in red and the line that is no longer needed was formatted in gray. That line is not needed because an UltraEdit regular expression is used to find entire DOS-terminated lines containing the word ERROR. These modifications are not strictly necessary, but they make the macro faster.

                                InsertMode
                                ColumnModeOff
                                HexOff
                                UnixReOff
                                Bottom
                                IfColNum 1
                                Else
                                "
                                "
                                EndIf
                                Top
                                Clipboard 9
                                ClearClipboard
                                Loop
                                Find MatchCase RegExp "%*ERROR*^p"
                                IfFound
                                SelectLine
                                CopyAppend
                                Else
                                ExitLoop
                                EndIf
                                EndLoop
                                NewFile
                                Paste
                                ClearClipboard
                                Clipboard 0
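
On the question of filling error.txt and infor.txt at the same time: no concurrency is needed for that, since a single pass over the file can route each line to the matching output. The following Python sketch is not from the thread and only illustrates the idea. The function name `route_lines`, the UTF-8 default encoding, and the output file names are assumptions. The match is case-sensitive, like the MatchCase find in the macro above.

```python
from contextlib import ExitStack


def route_lines(source, routes, encoding="utf-8"):
    """Copy each line containing a marker to that marker's output file.

    `routes` maps a marker string (e.g. "ERROR") to an output path.
    Returns how many lines were written per marker.
    """
    counts = {marker: 0 for marker in routes}
    with ExitStack() as stack:
        # newline="" keeps the original line endings untouched.
        src = stack.enter_context(
            open(source, "r", newline="", encoding=encoding))
        outputs = {
            marker: stack.enter_context(
                open(path, "w", newline="", encoding=encoding))
            for marker, path in routes.items()
        }
        # One pass over the source; the file is streamed line by line,
        # so a file of 100 MB or more is not loaded into memory at once.
        for line in src:
            for marker, out in outputs.items():
                if marker in line:
                    out.write(line)
                    counts[marker] += 1
    return counts
```

A call such as `route_lines("big.log", {"ERROR": "error.txt", "INFOR": "infor.txt"})` would fill both files in one run; `big.log` is a placeholder name.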
