User to user discussion and support for UltraEdit, UEStudio, UltraCompare, and other IDM applications.

Help with writing and playing macros
18 posts Page 1 of 2
Hello everyone!

My text file is getting larger and larger every day. So I need to split it up into smaller files. :oops:

Text file with size 10 MB: (There is no line break between the people.)

Code: Select all
1LINE_jimmy_home                  http://jimmy.blogspot.com/
1LINE_jimmy_company               https://jimmy.company.blogspot.com/
1LINE_jimmy_twi                   https://twtter.jimmy.com/
2LINE_Sam_home                    http://sam.home.com/index.php
2LINE_Sam_blog                    http://sam.blog.com
3LINE_Jane_work                   http://jane.company.net/floor2
3LINE_Jane_twitter                http://twitter.jane.com/
...
...
...
99999LINE_jenna_home              http://twitter.jenna.com/
99999LINE_jenna_work              http://workjenna.com/

=========================================================

Each text file separated like:

1LINE.txt
Code: Select all
1LINE_jimmy_home                  http://jimmy.blogspot.com/
1LINE_jimmy_company               https://jimmy.company.blogspot.com/
1LINE_jimmy_twi                   https://twtter.jimmy.com/

2LINE.txt
Code: Select all
2LINE_Sam_home                    http://sam.home.com/index.php
2LINE_Sam_blog                    http://sam.blog.com

3LINE.txt
Code: Select all
3LINE_Jane_work                   http://jane.company.net/floor2
3LINE_Jane_twitter                http://twitter.jane.com/

...
...

99999LINE.txt
Code: Select all
99999LINE_jenna_home              http://twitter.jenna.com/
99999LINE_jenna_work              http://workjenna.com/

I'm a total newbie. I don't know how to do this.
Please help me.
Do the code blocks display the real content of the large file?

What separates the string at the beginning of each line, which identifies the lines of a block and should be used as the file name, from the rest of the line to copy into the smaller file? Is the separator the first underscore found on each line?
It is not really clear to me what the content of the file really looks like.

Does the string at the beginning of a line never contain a character which is not allowed in a file name, like ? * : ?

Could the string at the beginning of a line contain characters with a special meaning in a Perl regular expression, like . + $ ?

Which version of UltraEdit do you use?

Is it possible to use an UltraEdit script instead of an UltraEdit macro?

Scripts support variables, which makes this splitting task easier to code.

The best approach would be to copy the first 10-30 lines of your large file into a new text file named input.txt. Next, create the smaller files for those lines manually. Then pack input.txt and the smaller files with ZIP or RAR and attach the archive to your next post together with the answers to my questions. Those example files would make it very clear what the macro or script should do, and we could also test the macro or script after coding it.
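To illustrate why variables make this kind of task easier, here is a minimal sketch in Python rather than in UltraEdit's own script language. It assumes that every relevant line starts with digits followed by LINE_ and that the file name is that ID plus .txt. The names ID_RE, split_by_id and write_files are made up for this illustration. Unlike a search for consecutive lines, it also collects lines with the same ID that are not adjacent.

```python
import re
from pathlib import Path

# Assumption: an ID is one or more digits followed by "LINE", then an underscore.
ID_RE = re.compile(r"^(\d+LINE)_")


def split_by_id(text):
    """Group lines by their ID and return {file name: file content}."""
    groups = {}
    for line in text.splitlines():
        match = ID_RE.match(line)
        if match is None:
            continue  # skip blank lines and lines without an ID
        name = match.group(1) + ".txt"
        groups.setdefault(name, []).append(line)
    # Every output file ends with a line termination.
    return {name: "\n".join(lines) + "\n" for name, lines in groups.items()}


def write_files(files, out_dir):
    """Write each entry of {file name: content} into out_dir."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    for name, content in files.items():
        (target / name).write_text(content, encoding="utf-8")
```

Calling write_files(split_by_id(text), "out") would create out/1LINE.txt, out/2LINE.txt and so on for text read from the large file.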
Best regards from Austria
Big Thanks Mofi! And I'm so sorry for my poor explanation.
I am not even sure that I can give you more details in this post.
Version: UltraEdit v12

"10001LINE" is something like a personal ID. Each person has a unique number followed by "LINE".

It doesn't matter how it is done. I just wish I could split one big file up into smaller files.
If possible, please help me with an easy and simple way because I am a horrible newbie. :(

As you asked, I attached a .zip file (deleted by Mofi later).

Thanks again, have a good time, Mofi
Here is a macro that I quickly recorded and then edited.

Code: Select all
InsertMode
ColumnModeOff
HexOff
PerlReOn
Bottom
IfColNumGt 1
InsertLine
EndIf
Top
Clipboard 9
Loop
Find MatchCase RegExp "^(\d+LINE_).*\r?\n(?:\1.*\r?\n)*"
IfNotFound
ExitLoop
EndIf
Copy
NewFile
Paste
Top
Find MatchCase RegExp "^(\d+LINE)"
SaveAs "^s.txt"
CloseFile NoSave
IfEof
ExitLoop
EndIf
EndLoop
ClearClipboard
Clipboard 0
Top
UnixReOff

This macro worked with UE v21.30.0.1016 on your input file and produced the output files as provided by you, with the one difference that the last line in every file also has a line termination. Let me know if it does not work in UE v12.xx so that I can look for a different solution.

Edit:

The macro works as posted with UE v12.20b+1. It fails with UE v12.00a+1 because the Perl regexp engine in that version does not support back references in the search string. The regular expression search string is valid in UE v12.10, but the macro does not work with that version either because of several bugs in the Perl regexp engine which were introduced with UE v12.00 and fixed later in v12.20. In other words, whether this macro works depends on the exact version of UltraEdit v12.

It would be possible to code this differently using the UltraEdit regexp engine and bookmarks, which would be slower but would also work with older versions of UltraEdit. However, you should think about upgrading to the latest version of UltraEdit.
Best regards from Austria
OMG big thanks Mofi!
It works like a charm!
I really appreciate your help. I couldn't have done it without you!
But I was kind of shocked because your code was shorter and simpler than I expected. Of course, that means you are great.

BTW: Mofi, one more question.
Could I get the .txt files with URLs only? I mean, without the person's name, ID, etc.
Only the part that starts with the URL, like "http..." (or "https...").
Well, it is of course no problem to remove everything to the left of http or ftp on all copied lines before saving the new file.

Code: Select all
InsertMode
ColumnModeOff
HexOff
Bottom
IfColNumGt 1
InsertLine
EndIf
Top
TrimTrailingSpaces
PerlReOn
Find MatchCase RegExp "(\r?\n|\r)\1+"
Replace All "\1"
Clipboard 9
Loop
PerlReOn
Find MatchCase RegExp "^(\d+LINE_).*\r?\n(?:\1.*\r?\n)*"
IfNotFound
ExitLoop
EndIf
Copy
NewFile
Paste
Top
SelectLine
Copy
EndSelect
Top
Find MatchCase RegExp "^.*?(http|ftp)"
Replace All "\1"
Paste
SelectToTop
UnixReOff
Find MatchCase RegExp "[~^r^n0-9A-Za-z]+"
Replace All SelectText "_"
EndSelect
Top
Find MatchCase RegExp "_^{http^}^{ftp^}*$"
Replace ""
SelectToTop
Copy
EndSelect
DeleteLine
SaveAs "^c.txt"
CloseFile NoSave
IfEof
ExitLoop
EndIf
EndLoop
ClearClipboard
Clipboard 0
Top
UnixReOff
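The replace step in the macro above that removes everything to the left of the URL can also be tried outside UltraEdit. The following is only a sketch in Python with the same search and replace strings. The function name keep_urls_only is hypothetical. Note the built-in assumption: a name on the left side that itself contains http or ftp would be cut too early.

```python
import re

# Same idea as the macro's search ^.*?(http|ftp) with replace \1.
# The non-greedy .*? stops at the first "http" or "ftp" on each line.
LEFT_OF_URL = re.compile(r"^.*?(http|ftp)", re.MULTILINE)


def keep_urls_only(block):
    """Remove everything to the left of the URL on every line of a block."""
    return LEFT_OF_URL.sub(r"\1", block)
```

Lines without http or ftp are left unchanged because the dot never matches a line-feed.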

Explanation of Perl regexp search string: ^(\d+LINE_).*\r?\n(?:\1.*\r?\n)*

^ ... anchors the match at the beginning of a line.

(...) ... is a marking group. Whatever is matched by the expression inside the parentheses is stored for each find/replace and can be back referenced within the search or replace string with \1, as done later in this expression.

\d+LINE_ ... one or more digits followed by the literal string LINE_. This is the expression inside the marking group.

.* ... any character except newline characters (carriage return and line-feed), zero or more times.

\r? ... a carriage return which may exist, but need not (it is missing in UNIX files).

\n ... a line-feed which must exist.

(?:...) ... a non marking group. The string matched by the expression inside is not stored for back referencing. The group exists only so that the multiplier at the end can be applied to the entire expression inside.

\1 ... back references the string matched by the marking group at the beginning of the first line. So the next line(s) must start with the same string as the first found line.

.*\r?\n ... once more the rest of the line with line termination.

* ... is here, at the end of the search string, a multiplier for the entire expression in the non marking group. This expression is applied greedily zero or more times. So it matches all lines below the first found line which start with the same string as the first found line. Greedy means here as many lines as possible.

One more note: A Perl regular expression search in UltraEdit cannot match an unlimited number of characters. So the result for a smaller file could be wrong if thousands of lines in the large file start with the same string.
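The search string explained above can be tested outside UltraEdit as well. Here is a small sketch in Python, whose regular expression module supports the same syntax for this expression. The flag re.MULTILINE is needed so that ^ matches at the beginning of every line. The function name find_blocks is hypothetical, and appending a missing final line termination is an assumption that mirrors what the macro does at the bottom of the file.

```python
import re

# The search string from the macro, unchanged.
BLOCK_RE = re.compile(r"^(\d+LINE_).*\r?\n(?:\1.*\r?\n)*", re.MULTILINE)


def find_blocks(text):
    """Return (prefix, block) pairs for consecutive lines with the same ID prefix."""
    if text and not text.endswith("\n"):
        text += "\n"  # make sure the last line is terminated, like the macro does
    return [(m.group(1), m.group(0)) for m in BLOCK_RE.finditer(text)]
```

Each returned block is the text that the macro would copy into one new file.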
Best regards from Austria
Thanks! It also works fine.
But Mofi, sorry for asking a third time. When I had to do this job for a specific person's ID, I used to find the person I needed, cut the lines one by one manually, create .txt files, and paste the lines in one by one. :oops:

Could there be any shortcut to do this job for a specific person's ID using your code?

I mean, I would enter a person's ID number or name manually in some prompt, and the job would then be done for that person automatically. (If possible, it should work with more than one person. If not, at least with one person.)
Okay, here is one more macro for individual blocks of lines based on the name of a person or an identification number. Unlike UltraEdit scripts, UltraEdit macros cannot store an entered string in a variable. Therefore it is necessary to write the entered string into a new line at the top of the large file, which is deleted at the end.

Code: Select all
InsertMode
ColumnModeOff
HexOff
Bottom
IfColNumGt 1
InsertLine
EndIf
Top
TrimTrailingSpaces
PerlReOn
Find MatchCase RegExp "(\r?\n|\r)\1+"
Replace All "\1"
",
"
Top
Key END
GetString "Enter persons or ids separated by commas:"
Clipboard 9
UnixReOff
Loop
Find MatchCase Up ","
Delete
IfNotFound
ExitLoop
EndIf
Find MatchCase RegExp "?+$"
Cut
Find MatchCase "^c"
IfFound
Key HOME
PerlReOn
Find MatchCase RegExp "^(\d+LINE_).*\r?\n(?:\1.*\r?\n)*"
Copy
NewFile
Paste
Top
SelectLine
Copy
EndSelect
Top
Find MatchCase RegExp "^.*?(http|ftp)"
Replace All "\1"
Paste
SelectToTop
UnixReOff
Find MatchCase RegExp "[~^r^n0-9A-Za-z]+"
Replace All SelectText "_"
EndSelect
Top
Find MatchCase RegExp "_^{http^}^{ftp^}*$"
Replace ""
SelectToTop
Copy
EndSelect
DeleteLine
SaveAs "^c.txt"
CloseFile NoSave
EndSelect
Top
Key END
EndIf
EndLoop
ClearClipboard
Clipboard 0
Best regards from Austria
Mofi! Your code is awesome.
It works just like I would expect it to!
So far, I have tested it with 1 to 7 persons, and it is working great.

I can't thank you enough.
Have a good time, Mofi
Hi, Mofi!

It's me again. I have another question. :o

About your second macro: Could the .txt file names be the whole profile part, like ID number, name, etc. (everything on the left), without the URL part (everything on the right)? I mean, the whole left part would be the .txt file name.

And of course, if a person has more than one line, the profile part of the first line would be the .txt file name.

For example, as you can see above, jimmy has 3 lines. So the file name would be "1LINE_jimmy_home.txt".

Note: There could also be characters other than word characters (letters, digits, underscores) to the left of the URL, like commas and spaces.
Okay, I updated macro 2 (split the entire file into smaller files) and macro 3 (get individual blocks into new files) to fulfill the new requirement for the file name. Each string consisting of 1 or more characters which are not letters or digits, like underscore, comma, space, question mark, ..., is replaced by a single underscore to get a file name consisting only of letters, digits and underscores between the words.
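The file name rule described above can be sketched in Python as follows. This is not the macro itself, only an illustration under two assumptions: the URL starts with http://, https:// or ftp://, and a trailing underscore left over from the spaces before the URL is dropped. The names URL_START and make_file_name are invented for this example.

```python
import re

# Assumption: the URL part of a line begins with one of these schemes.
URL_START = re.compile(r"(?:https?|ftp)://")


def make_file_name(first_line):
    """Build a file name from the part of the first line left of the URL."""
    match = URL_START.search(first_line)
    left = first_line[:match.start()] if match else first_line
    # Each run of characters that are not letters or digits becomes one underscore.
    safe = re.sub(r"[^0-9A-Za-z]+", "_", left)
    return safe.rstrip("_") + ".txt"
```

For the first line of jimmy's block with "home, sundale" on the left, this yields a name built only from letters, digits and single underscores.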

Recoding the macros was hard work because of some bugs in UE v12.20b+1, all of which were fixed years ago in later versions of UltraEdit. Coding the macros would have been much easier for the currently latest UE v21.30.0.1016. I could also reduce the size of the third macro with some small improvements at the beginning.
Best regards from Austria
Thank you is not enough to express my appreciation for all your hard work.
But, Mofi, first of all, I am So Sorry. Please forgive me. :oops:

I made a horrible typo. My version number is "v21" (the latest) not "v12". OMG could you forgive me? :oops: :oops: :oops:

I tested your latest modified macros 2 and 3 over and over again, and they didn't work at all.

So I read your instructions slowly and repeatedly. And then I noticed the color of the version number you wrote. Red!

OMG

Could you make them work properly for the latest v21?
But I'd like the URL part to be saved as it is, I mean just the URLs without any modification, like underscores. Is it possible?
This is what I expect:

Before:
Code: Select all
1LINE_jimmy_home 3A-NHATTANVILLE, NY 100272 st.32   http://jimmy.blogspot.com/
1LINE_jimmy_company                                                https://jimmy.company.blogspot.com/
1LINE_jimmy_twi                                                        https://twtter.jimmy.com/

After:
1LINE_jimmy_home 3A-NHATTANVILLE, NY 100272 st.32.txt
Code: Select all
http://jimmy.blogspot.com/
https://jimmy.company.blogspot.com/
https://twtter.jimmy.com/

Those 3 URLs are the contents of the .txt file.

Before:
Code: Select all
2LINE_Sam_pay monthly         http://sam.blog.com/pay.xls
2LINE_Sam_picture                http://sam.home.com/holidays.zip

After:
2LINE_Sam_pay monthly.txt
Code: Select all
http://sam.blog.com/pay.xls
http://sam.home.com/holidays.zip

Before:
Code: Select all
3LINE_Jane_work II              http://jane.company.net/floor2

After:
3LINE_Jane_work II.txt
Code: Select all
http://jane.company.net/floor2
smallville wrote: My version number is "v21" (the latest) not "v12".

You would have saved us both a lot of time by copying the version information right from the About dialog of UltraEdit into the edit field in the browser window.

Yes, the version information can be selected with the mouse and copied with Ctrl+C, or by right-clicking to open the context menu and left-clicking on Copy.

smallville wrote: But I'd like the URL part to be saved as it is, I mean just urls without any modification, like underscores.

It was totally unexpected for me that the macros working with UE v12.20b+1 did not produce the same correct output files with UE v21.30.0.1016. But the contents of the output files were indeed wrong with the currently latest version of UltraEdit.

I could quickly find out why. There is a bug in UE v21.30.0.1016 which, as I found out later, was introduced with UE v19.00. The caret is erroneously moved to the top of the file after running the first Perl regular expression Replace All, which runs from the beginning of the last line of the file to the end of the file to remove the URL just from the last line. On a Replace All the position of the caret should never change at all. Therefore the second Perl regular expression Replace All, which replaces 1 or more characters not suitable for a file name by an underscore, was executed on the entire file instead of on the last line only.

Of course I will report this bug to IDM support quickly, as it can easily result in many macros and scripts not working correctly.

I updated macros 2 and 3 once more with a workaround for this issue. The two replaces for preparing the file name are now done with the UltraEdit regexp engine, where this erroneous caret move does not occur. The macros now work with UE v21.30.0.1016 as well as with UE v12.20b+1 and most likely with all other versions of UltraEdit between those 2 versions.

By the way: With UE v21.xx (any UE >= v13.00) it would have been better to code everything as a script, as I already mentioned in my first post. With a script, the comma separated names or ids entered by the user could have been stored directly in a string variable for further processing instead of being written into the input file. The file name could also have been prepared directly in a string variable instead of in the output file. And finally, with an input file which is not too large, it would even have been possible to search for the blocks in memory instead of in the input file, which would avoid lots of display updates and would therefore be much faster. However, now we have the macros and I don't want to recode them once more as scripts.
Best regards from Austria
Thanks Mofi!

About macro 3: it works unless I enter something that several people have in common, like "home".

For example, if I enter "home" with macro 3, it makes a .txt file only for the first person who has "home" in his/her profile (left part). In this case, only jimmy gets a .txt file. Sam and jenna don't get a .txt file. Could I have .txt files for all of them?

And about macro 2: it does not work properly with blank/empty lines between the lines. It works only for jimmy. For the others, it creates 2 .txt files per person where it should create 1 .txt file per person. (Of course in this case macro 3 won't work either.) Could the macro be adapted to work for such a file content, too?

Code: Select all
1LINE_jimmy_home, sundale         http://jimmy.blogspot.com/
1LINE_jimmy_company               https://jimmy.company.blogspot.com/
1LINE_jimmy_twi                   https://twtter.jimmy.com/
2LINE_Sam_home                    http://sam.home.com/index.php

2LINE_Sam_blog                    http://sam.blog.com
3LINE_Jane_work                   http://jane.company.net/floor2

3LINE_Jane_twitter                http://twitter.jane.com/

99999LINE_jenna_home              http://twitter.jenna.com/

99999LINE_jenna_work              http://workjenna.com/
I updated macros 2 and 3 once more with commands to remove all trailing spaces and delete all blank lines before searching, copying to a new file and saving each new file. The code which creates the file name for each new file was also edited once more to produce correct results for all versions of UE since UE v12.20b+1.
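The cleanup described above (trim trailing spaces, then delete blank lines) can be illustrated with a short Python sketch. The second expression is the same one used in the macros with Replace All "\1". The first one only approximates what the TrimTrailingSpaces command does and is an assumption of this example, as is the function name remove_blank_lines.

```python
import re


def remove_blank_lines(text):
    """Strip trailing spaces/tabs, then collapse repeated line terminators."""
    # Step 1: remove spaces and tabs directly before a line terminator
    # or at the very end of the text.
    text = re.sub(r"[ \t]+(?=\r?\n|\r|$)", "", text)
    # Step 2: a line terminator followed by one or more copies of itself
    # is replaced by a single terminator, which deletes the blank lines.
    return re.sub(r"(\r?\n|\r)\1+", r"\1", text)
```

The trimming has to come first because a line containing only spaces is not matched by the second expression.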

Two macros are necessary for the new requirement of finding all blocks containing ANYWHERE (not just on the left side) one of the strings entered at the beginning, and saving ALL lines with the same ID as a line containing the searched string. Nested loops are not possible in a macro, only in a script, which is the reason why 2 macros are necessary now.

The first macro must be created first, with the name SaveAllFound:

Code: Select all
Loop
Find MatchCase "^c"
EndSelect
IfNotFound
ExitLoop
EndIf
Key HOME
Find MatchCase RegExp "%[0-9]+LINE_"
Clipboard 9
Copy
EndSelect
Top
Find MatchCase "^c"
EndSelect
Key HOME
PerlReOn
Find MatchCase RegExp "^(\d+LINE_).*\r?\n(?:\1.*\r?\n)*"
Copy
EndSelect
NewFile
Paste
Top
SelectLine
Copy
EndSelect
Top
Find MatchCase RegExp "^.*?(http|ftp)"
Replace All "\1"
Paste
SelectToTop
UnixReOff
Find MatchCase RegExp "[~^r^n0-9A-Za-z]+"
Replace All SelectText "_"
EndSelect
Top
Find MatchCase RegExp "_^{http^}^{ftp^}*$"
Replace ""
SelectToTop
Copy
EndSelect
DeleteLine
SaveAs "^c.txt"
CloseFile NoSave
ClearClipboard
Clipboard 8
UnixReOff
IfEof
ExitLoop
EndIf
EndLoop
ClearClipboard

The second macro can have any name:

Code: Select all
InsertMode
ColumnModeOff
HexOff
Bottom
IfColNumGt 1
InsertLine
EndIf
Top
TrimTrailingSpaces
PerlReOn
Find MatchCase RegExp "(\r?\n|\r)\1+"
Replace All "\1"
",
"
Top
Key END
GetString "Enter persons or ids separated by commas:"
UnixReOff
Loop
Find MatchCase Up ","
Delete
IfNotFound
ExitLoop
EndIf
Find MatchCase RegExp "?+$"
Clipboard 8
Cut
PlayMacro 1 "SaveAllFound"
EndSelect
Top
Key END
EndLoop
Clipboard 0

This second macro must be executed by the user to find all blocks containing ANYWHERE one of the entered strings separated by commas.
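What the two macros do together is a nested loop: for each entered string, for each line of each block. As a rough sketch of that logic in Python (not an UltraEdit script), the following assumes IDs of the form digits plus LINE followed by an underscore and a case-sensitive search like MatchCase in the macros. The names ID_RE and select_blocks are hypothetical.

```python
import re

# Assumption: an ID is one or more digits followed by "LINE", then an underscore.
ID_RE = re.compile(r"^(\d+LINE)_")


def select_blocks(text, query):
    """Return {ID: lines} for every ID with a line containing one of the terms."""
    # Split the comma separated input and ignore empty entries.
    terms = [term.strip() for term in query.split(",") if term.strip()]
    groups = {}
    for line in text.splitlines():
        match = ID_RE.match(line)
        if match:  # blank lines and lines without an ID are skipped
            groups.setdefault(match.group(1), []).append(line)
    # Keep ALL lines of an ID if ANY of its lines contains ANY term.
    return {
        pid: lines
        for pid, lines in groups.items()
        if any(term in line for term in terms for line in lines)
    }
```

With the input "home" this selects every person with "home" anywhere on one of their lines, not only the first one found.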

Note: This was the last time that I edited the macros because of new requirements. In future you have to make further edits yourself, or somebody else here in the user-to-user forum may help you with further questions. I don't want to spend more time doing the coding job for you.
Best regards from Austria