Hello,
I have a problem that I could not figure out, and it is really stressing me.
I have multiple CSV files with data for many stocks, all in the same format. I want to split each of these very large files into multiple files according to the value in the fourth data column (SYM_ROOT). The data looks like the following:
DATE TIME_M EX SYM_ROOT SYM_SUFFIX SIZE PRICE TR_CORR TR_SEQNUM TR_ID TR_SOURCE TR_RF
02/01/2018 7:01:50.176780000 K APA 200 42.4 00 114501 52983525027889 C
02/01/2018 7:01:50.370307000 K APA 100 42.4 00 114601 52983525027890 C
02/01/2018 9:30:01.005430000 N AR 18462 19.24 00 814401 52983644893497 C
02/01/2018 9:30:01.988385000 Y AR 100 19.2 00 839801 52983525027899 C
02/01/2018 8:49:46.103487000 D ARCH 1434 93.16 00 420601 71675222775888 C T
02/01/2018 9:31:05.105643000 N ARCH 1190 93.84 00 1645101 52983812076865 C
02/01/2018 8:49:45.295761007 D ARLP 2777 19.7 00 5490 1 N Q
02/01/2018 8:59:16.906599111 D ARLP 30 19.85 00 6036 1 N N
02/01/2018 9:30:09.062350000 N AT 8823 2.35 00 919201 52983661088591 C
02/01/2018 9:33:26.099352000 D AT 300 2.375 00 3215501 71675240855880 C T
02/01/2018 9:30:00.089795661 Q AXAS 34473 2.52 00 9410 1 N
02/01/2018 9:30:00.089826676 Q AXAS 34473 2.52 00 9412 2 N
02/01/2018 4:31:49.886368000 T BBL 500 40.62 00 71001 62879129944153 C
02/01/2018 6:01:21.022612000 T BBL 200 40.69 00 82401 62879129948234 C
02/01/2018 9:30:00.105336000 N BC 7491 55.45 00 765401 52983594498378 C
02/01/2018 9:30:00.551813000 D BC 82 55.19 00 796801 71675222898246 C T
02/01/2018 9:30:01.052533000 N BCEI 1422 27.83 00 820801 52983594177109 C
02/01/2018 9:30:07.129540000 D BCEI 100 27.55 00 904501 71675223294802 C T
02/01/2018 13:02:48.518108378 D BKEP 100 5.3 00 1188175 51 N Q
02/01/2018 13:02:48.519017865 Q BKEP 2151 5.3 00 1188176 4 N
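The values are separated by horizontal tabs (as stated in the facts below) and no value contains a tab, so the split key can be read by splitting each line on the tab character and taking the fourth field. Below is a minimal Python sketch of that step. It assumes that a row with an empty SYM_SUFFIX still contains the empty field between two tabs; the forum rendering above shows spaces, so this is an assumption. The helper name `split_key` is hypothetical.

```python
def split_key(line: str) -> str:
    """Return the SYM_ROOT value (fourth tab-separated column) of one line."""
    # Strip the line ending first so a short line cannot leak "\n" into the key;
    # maxsplit=4 stops splitting after the fourth field, the rest is not needed.
    fields = line.rstrip("\r\n").split("\t", 4)
    return fields[3]


# Hypothetical row built from the first sample line, with an empty SYM_SUFFIX field.
row = "02/01/2018\t7:01:50.176780000\tK\tAPA\t\t200\t42.4\t00\t114501\t52983525027889\tC\t\n"
print(split_key(row))  # APA
```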
The result should be one file per SYM_ROOT value, each starting with the header line, for example:
File 1:
DATE TIME_M EX SYM_ROOT SYM_SUFFIX SIZE PRICE TR_CORR TR_SEQNUM TR_ID TR_SOURCE TR_RF
02/01/2018 7:01:50.176780000 K APA 200 42.4 00 114501 52983525027889 C
02/01/2018 7:01:50.370307000 K APA 100 42.4 00 114601 52983525027890 C
File 2:
DATE TIME_M EX SYM_ROOT SYM_SUFFIX SIZE PRICE TR_CORR TR_SEQNUM TR_ID TR_SOURCE TR_RF
02/01/2018 9:30:01.005430000 N AR 18462 19.24 00 814401 52983644893497 C
02/01/2018 9:30:01.988385000 Y AR 100 19.2 00 839801 52983525027899 C
File 3:
DATE TIME_M EX SYM_ROOT SYM_SUFFIX SIZE PRICE TR_CORR TR_SEQNUM TR_ID TR_SOURCE TR_RF
02/01/2018 8:49:46.103487000 D ARCH 1434 93.16 00 420601 71675222775888 C T
02/01/2018 9:31:05.105643000 N ARCH 1190 93.84 00 1645101 52983812076865 C
File n: ...
More important facts:
- I use UltraEdit version 28.20.0.70 on Windows 10 Home.
- I use UltraEdit with the default configuration settings.
- The CSV files are ASCII files with Unix line endings.
- The CSV files use the horizontal tab character as separator between the values.
- No value contains a horizontal tab character or spans multiple lines, i.e. there are no double-quoted values in any CSV file.
- The CSV files are all very large: hundreds of MB or even several GB, with around 60 to 70 million lines.
- The data rows (lines) in the CSV files are always sorted alphabetically by the value in the fourth data column.
So all data rows that belong in one new file form a single contiguous block in the CSV file, and such a block can itself be millions of lines long.
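Because the rows are grouped by SYM_ROOT, the split can be done in a single sequential pass without loading a whole file into memory: only one output file needs to be open at a time, and it can be closed as soon as the key changes. The sketch below is not an UltraEdit macro or script; it is a minimal standalone Python 3.9+ alternative illustrating that approach. Assumptions: the first line of each input file is the header, output files are named `<SYM_ROOT>.csv` in a target directory (hypothetical naming), and the function `split_by_symbol` is a hypothetical name. It stops with an error if a symbol reappears after its block has ended, i.e. if the input is not actually sorted.

```python
import os
import sys


def split_by_symbol(input_path: str, output_dir: str) -> list[str]:
    """Split a tab-separated file into one file per SYM_ROOT value (4th column).

    Assumes rows are grouped (sorted) by SYM_ROOT, so only one output file
    is open at a time and memory use does not grow with the file size.
    Returns the list of output file paths written, in order.
    """
    os.makedirs(output_dir, exist_ok=True)
    written = []
    finished = set()  # symbols whose block has already ended
    current_key = None
    out = None
    # newline="" keeps the Unix line endings exactly as they are in the input.
    with open(input_path, "r", encoding="ascii", newline="") as src:
        header = src.readline()
        try:
            for line in src:
                if not line.strip():
                    continue  # skip blank lines, e.g. an empty last line
                key = line.rstrip("\r\n").split("\t", 4)[3]
                if key != current_key:
                    if out is not None:
                        out.close()
                        finished.add(current_key)
                    if key in finished:
                        raise ValueError(f"input not grouped by SYM_ROOT: {key!r} reappears")
                    path = os.path.join(output_dir, f"{key}.csv")
                    out = open(path, "w", encoding="ascii", newline="")
                    out.write(header)  # every output file starts with the header line
                    written.append(path)
                    current_key = key
                out.write(line)
        finally:
            if out is not None:
                out.close()
    return written


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python split_by_symbol.py INPUT_FILE OUTPUT_DIR", file=sys.stderr)
    else:
        for p in split_by_symbol(sys.argv[1], sys.argv[2]):
            print(p)
```

For the sample above, running it on a hypothetical `trades.csv` with output directory `out` would create `out/APA.csv`, `out/AR.csv`, `out/ARCH.csv` and so on. One caveat on Windows: a symbol that matches a reserved device name such as `CON` or `AUX` is not handled by this sketch.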
Could you please help me with an UltraEdit macro or script to do this?
Thank you so much!