How to split a single file into multiple text files based on first string on each line?

Newbie

    Jul 06, 2018 #1

    I visit and search the UltraEdit forum so often that I need to bookmark its topics in a text file.
    I prefer a txt file because it's very simple and easy to search and manage. My UE version is 25.10.0.32.

    So here is my question.

    Before:
    Bookmark(unique number)_TITLE    URL

    After:
    Bookmark(unique number)_TITLE.txt

    Example:

    Before:

    Code:

    Bookmark0_forums.ultraedit.com index.html http://forums.ultraedit.com/
    Bookmark01_copy.filtered.Bookmarks to clipboard via macro.page1     http://forums.ultraedit.com/viewtopic.php?t=3609
    Bookmark01_copy.filtered.Bookmarks to clipboard via macro.page5 http://forums.ultraedit.com/viewtopic.php?t=3609
    Bookmark01_copy.filtered.Bookmarks to clipboard via macro.page2 http://forums.ultraedit.com/viewtopic.php?t=3609
    Bookmark01_copy.filtered.Bookmarks to clipboard via macro.page80000  http://forums.ultraedit.com/viewtopic.php?t=3609
    Bookmark01_copy.filtered.Bookmarks to clipboard via macro.page00     http://forums.ultraedit.com/viewtopic.php?t=3609
    Bookmark01_copy.filtered.Bookmarks to clipboard via macro.page010       http://forums.ultraedit.com/viewtopic.php?t=3609
    Bookmark2_filtered-Bookmarks [1]          http://forums.ultraedit.com/copy-filtered-Bookmarks-to-clipboard-via-macro-t17765.html
    Bookmark2_filtered Bookmarks (02)          http://forums.ultraedit.com/copy-filtered-Bookmarks-to-clipboard-via-macro-t17765.html
    Bookmark310005_to clipboard_via_macro                   http://forums.ultraedit.com/copy-filtered-Bookmarks-to-clipboard-via-macro-t17765.html
    Bookmark12_..(saved bookmark & added) how to remove dup!          http://forums.ultraedit.com/how-do-i-remove-duplicate-Bookmarks-t2282.html
    Bookmark05_Find, replace, find in files, replace in files, regular expressions    http://forums.ultraedit.com/find-replace-regular-expressions-f8/
    Bookmark00007_000003 next page via cliboard           https://forums.ultraedit.com/copy-filtered-BOOKMARKs-to-clipboard-via-macro-t17765.html
    Bookmark00007_01 next page        https://forums.ultraedit.com/copy-filtered-BOOKMARKs-to-clipboard-via-macro-t17765.html
    Bookmark00007_2 previous page https://forums.ultraedit.com/copy-filtered-BOOKMARKs-to-clipboard-via-macro-t17765.html
    Bookmark010_question!(@board3) from                     https://forums.ultraedit.com/regular-expression-to-extract-url-strings-from-a-x-t17790.html
    Bookmark010_extract url strings from    https://forums.ultraedit.com/regular-expression-to-extract-url-strings-from-a-x-t17790-s15.html
    Bookmark010_answer cf links http://forums.ultraedit.com/regular-expression-to-extract-url-strings-from-a-x-t17790-s15.html
    After:

    So in this case 8 txt files should be created.
    ==> *** Each text file should contain all the URLs copied from the lines with the same number *** (and the title on the first of those lines becomes the file name of that txt file).

    Bookmark0_forums.ultraedit.com index.html.txt
    Bookmark01_copy.filtered.Bookmarks to clipboard via macro.page1.txt
    Bookmark2_filtered Bookmarks [1].txt
    Bookmark310005_to clipboard_via_macro.txt
    Bookmark12_..(saved bookmark & added) how to remove dup!.txt
    Bookmark05_Find, replace, find in files, replace in files, regular expressions.txt
    Bookmark00007_000003 next page via cliboard.txt
    Bookmark010_question!(@board3).txt
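
The grouping rule described above can be sketched outside UltraEdit to make the expected result concrete. This is only an illustration in Python, not an UltraEdit macro; the names `group_by_bookmark` and `SAMPLE` are hypothetical, and the sample uses just a few of the lines from the example.

```python
import re

# A few lines from the example above; the layout is "Bookmark<number>_TITLE <whitespace> URL".
SAMPLE = """\
Bookmark0_forums.ultraedit.com index.html http://forums.ultraedit.com/
Bookmark01_copy.filtered.Bookmarks to clipboard via macro.page1     http://forums.ultraedit.com/viewtopic.php?t=3609
Bookmark01_copy.filtered.Bookmarks to clipboard via macro.page5 http://forums.ultraedit.com/viewtopic.php?t=3609
Bookmark010_answer cf links http://forums.ultraedit.com/regular-expression-to-extract-url-strings-from-a-x-t17790-s15.html
"""

# "Bookmark", one or more digits, then the underscore that ends the unique number.
PREFIX = re.compile(r"^(Bookmark\d+_)")

def group_by_bookmark(text):
    """Map each 'Bookmark<number>_' prefix to its lines, in order of first appearance."""
    groups = {}
    for line in text.splitlines():
        match = PREFIX.match(line)
        if match:
            groups.setdefault(match.group(1), []).append(line)
    return groups

groups = group_by_bookmark(SAMPLE)
for prefix, lines in groups.items():
    print(prefix, len(lines))
```

Because the underscore is part of the prefix, `Bookmark0_`, `Bookmark01_` and `Bookmark010_` end up in three separate groups even though they start with the same digits.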

    This is my poor, so-called "copied and pasted" macro.
    Credit to all the users who asked questions in this forum :)

    It isn't working.
    It's funny how things never happen the way I expect them to. :)
    "Bookmark" and "the unique number_" should tell it where to break up the files.
    Please help me! :)

    Code:

    Loop 0
    Find MatchCase "^c"
    EndSelect
    IfNotFound
    ExitLoop
    EndIf
    Key HOME
    Find MatchCase RegExp "Bookmark+\d_"
    Clipboard 9
    Copy
    EndSelect
    Top
    Find MatchCase "^c"
    EndSelect
    Key HOME
    PerlReOn
    Find MatchCase RegExp "(Bookmark+\d_).*\r?\n(?:\1.*\r?\n)*"
    Copy
    EndSelect
    NewFile
    Paste
    Top
    SelectLine
    Copy
    Top
    Find MatchCase RegExp "^.*?(http|https)"
    Replace All "\1"
    Bottom
    Paste
    Key UP ARROW
    SelectLine
    UltraEditReOn
    Find MatchCase RegExp SelectText "^{http^}^*$"
    Replace All ""
    Bottom
    Key UP ARROW
    SelectLine
    Find MatchCase RegExp SelectText "[~^r^n0-9A-Za-z]+"
    Replace All ""
    EndSelect
    Bottom
    Key UP ARROW
    StartSelect
    Key END
    Copy
    EndSelect
    DeleteLine
    SaveAs "^c.txt"
    CloseFile NoSave
    ClearClipboard
    Clipboard 8
    UltraEditReOn
    IfEof
    ExitLoop
    EndIf
    EndLoop
    ClearClipboard

    Grand Master

      Jul 06, 2018 #2

      The macro code for this task is:

      Code:

      InsertMode
      ColumnModeOff
      HexOff
      Bottom
      IfColNumGt 1
      InsertLine
      EndIf
      Top
      PerlReOn
      Clipboard 9
      Loop 0
      Find MatchCase RegExp "^(Bookmark\d+_).+(?:\r?\n)?(?:\1.+(?:\r?\n)?)*"
      IfNotFound
      ExitLoop
      EndIf
      Copy
      NewFile
      UnixMacToDos
      Paste
      Top
      Find MatchCase RegExp "([\t ]+(?:http|ftp))"
      Replace "\r\n\1"
      Top
      StartSelect
      Key END
      EndSelect
      Copy
      Find MatchCase RegExp SelectText "[\"*/:<>?\\\|]"
      Replace All "_"
      Clipboard 8
      Copy
      Clipboard 9
      Paste
      Delete
      Clipboard 8
      SaveAs "^c.txt"
      CloseFile NoSave
      ClearClipboard
      Clipboard 9
      EndSelect
      Key HOME
      EndLoop
      ClearClipboard
      Clipboard 0
      Top
      
      I added extra code to make sure the string used for the file name does not contain characters which are not possible in file names, by replacing each of them with an underscore.
      Best regards from an UC/UE/UES for Windows user from Austria
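
The file name clean-up described in this post can be illustrated with a short Python sketch. It reuses the character class from the macro's `Find` command (`"`, `*`, `/`, `:`, `<`, `>`, `?`, `\`, `|`); the function name `safe_file_name` is hypothetical and the snippet is not part of the macro.

```python
import re

# Same set of characters the macro replaces: " * / : < > ? \ |
INVALID = re.compile(r'["*/:<>?\\|]')

def safe_file_name(title):
    """Replace every character that is not allowed in a file name with '_' and add .txt."""
    return INVALID.sub("_", title) + ".txt"

print(safe_file_name("Bookmark010_answer cf links"))
print(safe_file_name('Bookmark99_what? a "quoted" a/b title'))
```

Titles without any of these characters pass through unchanged, so most of the file names in the example above stay exactly as written.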

      Newbie

        Jul 06, 2018 #3

        Thank you very much. I really appreciate it. Your macro works flawlessly and fast,
        and it's very different from what I thought it would be like :)

        By the way, while studying your macro I found the word "ftp" in the code. Does that mean this macro even works with URLs starting with ftp:// ?
        Thanks again, have a wonderful time!

        Grand Master

          Jul 06, 2018 #4

          I needed something to find out where the bookmark text ends and where the URL begins. So I decided that the occurrence of either http or ftp marks the beginning of the URL. That is very simple and not fail-safe in case the bookmark text also contains http or ftp somewhere, but there seems to be nothing better, except using a more complex Perl regular expression to determine the beginning of a URL.
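
The weakness mentioned here can be shown with a small Python sketch. `LOOSE` treats whitespace followed by http or ftp as the start of the URL; `STRICT` additionally requires an optional `s` and `://`, which is one possible stricter pattern. The bookmark line `TRICKY` and the helper `split_line` are made up for this illustration.

```python
import re

# Whitespace followed by "http" or "ftp" marks the start of the URL.
LOOSE = re.compile(r"[\t ]+(?=(?:http|ftp))")
# Stricter: the scheme must be followed by an optional "s" and "://".
STRICT = re.compile(r"[\t ]+(?=(?:http|ftp)s?://)")

def split_line(line, pattern):
    """Split one bookmark line into (title, url) at the first match of pattern."""
    parts = pattern.split(line, maxsplit=1)
    return (parts[0], parts[1]) if len(parts) == 2 else (line, "")

# Hypothetical bookmark whose title contains the word "http".
TRICKY = "Bookmark5_use http proxy    http://example.com/"

print(split_line(TRICKY, LOOSE))   # title is cut off at the word "http"
print(split_line(TRICKY, STRICT))  # title is kept whole
```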

          Newbie

            Jul 06, 2018 #5

            @Mofi Can this macro put only the URLs into the text files?

            And is there any way to stop a macro while it is running?
            I often end up with an error while testing my poor macros, and there is a nag screen with a "Cancel" button,
            but sometimes I can't even click on "Cancel", especially when the macro is looping fast,
            so I have to kill UE using the Windows Task Manager :)

            Thanks Mofi! I'm very scared of modifying your exclusive and amazing macro.
            It feels like a sort of sin to me :)

            Grand Master

              Jul 06, 2018 #6

              Here is the macro rewritten to write only the URLs into the text files, sorted alphabetically and with duplicate lines removed.

              Code:

              InsertMode
              ColumnModeOff
              HexOff
              Bottom
              IfColNumGt 1
              InsertLine
              EndIf
              Top
              PerlReOn
              Clipboard 9
              Loop 0
              Find MatchCase RegExp "^(Bookmark\d+_).+(?:\r?\n)?(?:\1.+(?:\r?\n)?)*"
              IfNotFound
              ExitLoop
              EndIf
              Copy
              NewFile
              UnixMacToDos
              Paste
              Top
              Find MatchCase RegExp "[\t ]+((?:http|ftp)s?://)"
              Replace "\r\n\1"
              Top
              StartSelect
              Key END
              EndSelect
              Find MatchCase RegExp SelectText "[\"*/:<>?\\\|]"
              Replace All "_"
              Copy
              DeleteLine
              Find MatchCase RegExp "^(?:(?!(?:http|ftp)s?://).)+"
              Replace All ""
              SortAsc RemoveDup 1 -1 0 0 0 0 0 0
              SaveAs "^c.txt"
              CloseFile NoSave
              EndSelect
              Key HOME
              EndLoop
              ClearClipboard
              Clipboard 0
              Top
              
              The macro also has an improved detection of the beginning of a URL: it searches for http:// or https:// or ftp:// or ftps://.
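
What this URLs-only variant produces for one group of lines can be sketched in Python as follows. The sketch assumes a URL never contains whitespace and uses Python's default string ordering, which may differ from UltraEdit's sort options; `urls_only` is a hypothetical helper, and the lines are taken from the example in the first post.

```python
import re

# A URL starts with http://, https://, ftp:// or ftps:// and runs to the next whitespace.
URL = re.compile(r"(?:http|ftp)s?://\S+")

def urls_only(lines):
    """Return the URLs found on the given lines, without duplicates, in sorted order."""
    found = set()
    for line in lines:
        match = URL.search(line)
        if match:
            found.add(match.group(0))
    return sorted(found)

GROUP = [
    "Bookmark00007_000003 next page via cliboard           https://forums.ultraedit.com/copy-filtered-BOOKMARKs-to-clipboard-via-macro-t17765.html",
    "Bookmark00007_01 next page        https://forums.ultraedit.com/copy-filtered-BOOKMARKs-to-clipboard-via-macro-t17765.html",
    "Bookmark00007_2 previous page https://forums.ultraedit.com/copy-filtered-BOOKMARKs-to-clipboard-via-macro-t17765.html",
]

for url in urls_only(GROUP):
    print(url)
```

All three lines of this group point to the same address, so the resulting file would contain a single URL.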

              To break a running macro, press and hold the ESC key.

              Newbie

                Jul 06, 2018 #7

                @Mofi, thank you so much!

                The latter version even seems to remove duplicate URLs, right? I think that is very useful for managing a bookmark list.
                Yet I like both versions!

                Can you check them with the bookmark list txt file which I attached to this message?
                It's kind of weird, because both of your macros work fine and fast with the example above,
                but in this case they don't work properly.
                Could you confirm whether it is just me or whether it also happens on your end?

                If possible, please adjust both of them to this unusual case.
                Attachment: UE_bookmark_list.txt (81.62 KiB)

                Grand Master

                  Jul 07, 2018 #8

                  The first problem is caused by the empty line 213, which breaks up the lines starting with Bookmark9502_ into two blocks according to the Perl regular expression used in the macros to select lines starting with the same bookmark number string. That could be easily solved by first deleting all empty lines before starting the loop.
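
The effect of an empty line on the block-selecting expression can be reproduced with Python's `re` module, which accepts the same pattern. This only illustrates the regular expression itself, not the UltraEdit selection bug described below; the two bookmark lines and the helper names are hypothetical.

```python
import re

# Same expression as in the macros: one line starting with "Bookmark<number>_",
# followed by any number of directly adjacent lines starting with the same prefix.
BLOCK = re.compile(r"^(Bookmark\d+_).+(?:\r?\n)?(?:\1.+(?:\r?\n)?)*", re.MULTILINE)

# Hypothetical data: two lines with the same number, separated by an empty line.
TEXT = (
    "Bookmark9502_first title   http://example.com/1\n"
    "\n"
    "Bookmark9502_second title  http://example.com/2\n"
)

def drop_empty_lines(text):
    """Remove lines that are empty or contain only whitespace."""
    return "".join(line for line in text.splitlines(keepends=True) if line.strip())

def count_blocks(text):
    """Count how many separate blocks the expression finds."""
    return sum(1 for _ in BLOCK.finditer(text))

print(count_blocks(TEXT))                    # the empty line splits the group in two
print(count_blocks(drop_empty_lines(TEXT)))  # after clean-up it is a single block
```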

                  But the main problem is that UltraEdit v25.10.0.50 and also former versions like v22.20.0.49 wrongly select the lines starting with Bookmark9501_ or Bookmark9502_, depending on the version of UltraEdit. I have just reported this bug by email to IDM support.

                  So I had to decide between two options: working around this bug in the macro, as done many years ago with UltraEdit versions that did not support Perl regular expressions (which are what make it possible at all to select multiple lines starting with the same string), or writing an UltraEdit script for this task which does as much as possible in memory. I decided to write a script, because doing as much as possible in memory avoids UltraEdit window refreshes, so the task finishes in a shorter time than with a macro solution. A script also makes it possible to add better error handling and better user information.

                  The attached ZIP file contains the two commented UltraEdit scripts to split the active file with bookmarks into multiple files containing either the entire bookmark lines or just the URLs, sorted alphabetically (case-sensitive) with duplicate lines removed. The two scripts are nearly identical; only a few lines differ.

                  Both scripts write the names of the saved files, with the full path of the active file (or no path if the active file is a new, unnamed file), into the output window and append a summary line. The output window is opened automatically if at least one file was created by the script. A double click on a file name in the output window opens the corresponding file.
                  Attachment: split_bookmarks_scripts.zip (3.11 KiB)
                  This ZIP file contains the two UltraEdit scripts to write the bookmarks into several text files.
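
The attached scripts are not reproduced in this thread. As a rough idea of the in-memory approach described in this post, here is a Python sketch that groups the lines, builds the file names, writes the files and prints their names with a summary line. It is an assumption-laden illustration, not the author's script: `split_bookmarks`, the sample text and the example.com URLs are all made up.

```python
import re
import tempfile
from pathlib import Path

PREFIX = re.compile(r"^(Bookmark\d+_)")
URL_START = re.compile(r"[\t ]+(?=(?:http|ftp)s?://)")
INVALID = re.compile(r'["*/:<>?\\|]')

def split_bookmarks(text, out_dir):
    """Write one .txt file per bookmark number and return the created paths."""
    groups = {}
    for line in text.splitlines():
        match = PREFIX.match(line)
        if match:  # empty and unrelated lines are simply skipped
            groups.setdefault(match.group(1), []).append(line)

    created = []
    for lines in groups.values():
        # The title of the first line of a group becomes the file name.
        title = URL_START.split(lines[0], maxsplit=1)[0]
        path = Path(out_dir) / (INVALID.sub("_", title) + ".txt")
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        created.append(path)
    return created

SAMPLE = (
    "Bookmark00007_01 next page        https://example.com/a\n"
    "\n"
    "Bookmark00007_2 previous page https://example.com/a\n"
    "Bookmark010_question? from    https://example.com/b\n"
)

with tempfile.TemporaryDirectory() as tmp:
    names = [path.name for path in split_bookmarks(SAMPLE, tmp)]
    for name in names:
        print(name)
    print(f"{len(names)} file(s) created.")
```

Collecting all lines in a dictionary first means an empty line in the middle of a group no longer matters, which is one reason an in-memory approach is less fragile than selecting blocks with a single regular expression.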

                  Newbie

                    Jul 08, 2018 #9

                    @Mofi thank you! I really appreciate it.
                    You didn't have to go out of your way to write these scripts for this unusual case, because the two macros you made before still work in most cases :)
                    I could have waited for the next update with the bug fix,
                    and I hope the next version of UE works fine with the macros you made.


                    By the way, I would suggest dividing the macro section of the forum by UE version.
                    I mean, I search the forum and find useful tips or macros, but most of them are not really compatible with the UE version I'm using.
                    Of course, advanced users won't have any problem because they are able to modify or adjust them freely,
                    but newbies like me end up disappointed when the macros are out of date, not available, or not compatible with whatever version they use.
                    So how about having sections like "before version 17xxxx" and "after version 25xxxx"?
                    Just my 2 cents :)