Macro for large file - convert to individual chapters

    Sep 09, 2007#1

    Hi. I need help with a macro please.

    The file looks like this: (on a smaller scale of what I am wanting to do)

    Input:

    Psalms
    
    1 Happy is the man that has not walked in the counsel of the wicked ones,
    And in the way of sinners has not stood,
    And in the seat of ridiculers has not sat.
    
     2 But his delight is in the law of Jehovah,
    And in his law he reads in an undertone day and night.
    
     3 And he will certainly become like a tree planted by streams of water,
    That gives its own fruit in its season
    And the foliage of which does not wither,
    And everything he does will succeed.
    
     4 The wicked are not like that,
    But are like the chaff that the wind drives away.
    
     5 That is why the wicked ones will not stand up in the judgment,
    Nor sinners in the assembly of righteous ones.
    
     6 For Jehovah is taking knowledge of the way of righteous ones,
    But the very way of wicked ones will perish.
    
    2 Why have the nations been in tumult
    And the national groups themselves kept muttering an empty thing?
    
     2 The kings of earth take their stand
    And high officials themselves have massed together as one
    Against Jehovah and against his anointed one,
     3 [Saying:] “Let us tear their bands apart
    And cast their cords away from us!”

      Sep 18, 2007#2

      Thanks heaps Mofi for the macro. (And the point of .PNG file format for images.)

      Here are the results... both the input html source code and the output, once your macro has been run (attached archive file already deleted). It is almost there... any help to get it to finish off the macro would be much appreciated.
      I see that chapter 2 is in the same output file as chapter one as well as others (see examples).

      Thanks again Mofi. You're the man!

        Sep 18, 2007#3

        We could have saved a lot of time if you had attached the HTML source file first. I have deleted all posts except the first one and your post with the real source.

        I have completely rewritten the macros. Now there are 2 macros, which you can combine into 1 macro if you want. The macro property "Continue if a Find with Replace not found" (or "Continue if search string not found") must be checked for both macros.

        The first macro converts the HTML file to a text file. So that the chapters can be found later, the conversion inserts a page break (^b) immediately before every chapter number, which is what you wrote in your first post that you wanted to do manually.

        Note: The space character in the Replace All command below Find MatchCase "&nbsp;" is not a normal space. It is the non-breaking space (decimal 160, hex A0). Before copying the macro code into the edit macro dialog, check this with Search - Character Properties while the cursor is on the left side of the non-breaking space.

        After this conversion the first macro selects whole file and exits. Why?

        Well, the output after deleting all the HTML elements is not very pleasant to read. So it is a good idea to reformat all paragraphs with the command Format - Reformat Paragraph, using appropriate settings which you can specify at Format - Paragraph Formatting - Paragraph Setup/Formatting. Reformatting selected paragraphs is not possible via macro or script; it must be done manually. The paragraph settings are saved in uedit32.ini, so you only have to specify them once.

        If you don't want or need the paragraph reformatting, you can delete the last command from first macro and the first 4 commands of the second macro and combine the 2 macros to 1.

        Macro WordHtml2Text

        InsertMode
        ColumnModeOff
        HexOff
        Top
        Find "<meta name=Generator content="Microsoft Word"
        IfNotFound
        ExitMacro
        EndIf
        Top
        UnixReOff
        StartSelect
        Find RegExp Select "<body*>"
        Delete
        EndSelect
        Find MatchCase "&nbsp;"
        Replace All " "
        Find MatchCase RegExp "<b><span style='font-size:9.0pt;font-family:Arial'>^([0-9]+^)</span></b>"
        Replace All "^b^1"
        Find RegExp "<[~>]+>"
        Replace All ""
        StartSelect
        Find RegExp Select "[~ ^t^p]"
        Key LEFT ARROW
        Delete
        EndSelect
        Bottom
        StartSelect
        Find RegExp Up Select "[~ ^t^p]"
        Key RIGHT ARROW
        Key RIGHT ARROW
        Delete
        EndSelect
        SelectAll
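
For readers who want to follow the conversion outside UltraEdit, here is a minimal Python sketch of the same steps. It is not part of the macro file. The function name `word_html_to_text` is hypothetical, the chapter marker pattern is copied from the macro's Find string and may not match other Word exports, and the final trimming only approximates what the macro does with its cursor movements.

```python
import re

# Pattern copied from the macro's Find string (an assumption for other files:
# real Word HTML may use a different font size, family, or quoting).
CHAPTER_RE = re.compile(
    r"<b><span style='font-size:9\.0pt;font-family:Arial'>(\d+)</span></b>"
)


def word_html_to_text(html: str) -> str:
    """Strip the HTML and mark every chapter number with a form feed."""
    # Drop everything up to and including the opening <body ...> tag.
    body = re.search(r"<body[^>]*>", html, flags=re.IGNORECASE)
    if body:
        html = html[body.end():]
    # Replace the entity with a real non-breaking space (decimal 160), as the macro does.
    html = html.replace("&nbsp;", "\u00a0")
    # Put a form feed (UltraEdit's ^b page break) in front of each chapter number.
    html = CHAPTER_RE.sub(lambda m: "\f" + m.group(1), html)
    # Remove every remaining tag.
    text = re.sub(r"<[^>]+>", "", html)
    # Trim spaces, tabs and line ends only, so a leading form feed survives.
    return text.strip(" \t\r\n")
```

The explicit character set in the last line matters: a bare `strip()` would also remove a form feed at the very start of the text and lose the first chapter marker.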

        The second macro does the job you wanted in the first place: it splits the now properly marked chapters into several files, each containing 1 chapter and saved with an appropriate name in the same folder as the original HTML file, or in a default folder (C:\ in this macro) if the HTML source has never been saved.

        Macro Split2Chapters

        InsertMode
        ColumnModeOff
        HexOff
        UnixReOff
        Top
        Clipboard 9
        Find RegExp "[a-z]*$"
        Copy
        EndSelect
        Key END
        Clipboard 8
        CopyFilePath
        NewFile
        Paste
        Find Up "\"
        Replace "\"
        IfFound
        DeleteToEndofLine
        Else
        "C:\"
        EndIf
        Clipboard 9
        Paste
        TrimTrailingSpaces
        Bottom
        "_.txt"
        SelectAll
        Copy
        CloseFile NoSave
        Clipboard 8
        Loop
        Find "^b"
        IfNotFound
        ExitLoop
        EndIf
        Key RIGHT ARROW
        Key LEFT ARROW
        StartSelect
        Find Select "^b"
        IfSel
        Key LEFT ARROW
        Copy
        EndSelect
        Key RIGHT ARROW
        Key LEFT ARROW
        Else
        EndSelect
        Key RIGHT ARROW
        Key LEFT ARROW
        SelectToBottom
        Copy
        EndSelect
        Key UP ARROW
        Bottom
        EndIf
        NewFile
        Paste
        " "
        StartSelect
        Find RegExp Up Select "[~ ^t^p]"
        Key RIGHT ARROW
        Key RIGHT ARROW
        Delete
        EndSelect
        Top
        Find RegExp "[0-9]+"
        Cut
        "("
        Paste
        ")"
        Top
        Clipboard 9
        Paste
        Key LEFT ARROW
        Key LEFT ARROW
        Key LEFT ARROW
        Key LEFT ARROW
        Clipboard 8
        Paste
        Top
        Find RegExp "%*.txt"
        Cut
        SaveAs "^c"
        CloseFile
        EndLoop
        ClearClipboard
        Clipboard 9
        ClearClipboard
        Clipboard 0
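
The splitting step can be sketched in Python as well. This is only an illustration under stated assumptions, not a replacement for the macro: the names `split_chapters`, `chapter_file_name` and `write_chapters` are hypothetical, the book title is assumed to be the first non-empty line before the first page break, and the text is assumed to contain the form feed markers produced by the first macro.

```python
import re
from pathlib import Path


def split_chapters(text: str):
    """Return (book title, {chapter number: chapter text}) for form-feed marked text."""
    head, *parts = text.split("\f")
    # Assumption: the title is the first non-empty line before the first marker.
    title = next((ln.strip() for ln in head.splitlines() if ln.strip()), "")
    chapters = {}
    for part in parts:
        number = re.match(r"\d+", part)
        if not number:
            continue  # no chapter number directly after the marker, skip the block
        # Like the macro, put the chapter number in parentheses and trim the end.
        body = "(" + number.group() + ")" + part[number.end():]
        chapters[int(number.group())] = body.rstrip()
    return title, chapters


def chapter_file_name(title: str, number: int) -> str:
    """Build names such as Psalms_1.txt, the scheme of the first Split2Chapters."""
    return f"{title}_{number}.txt"


def write_chapters(text: str, folder: Path) -> None:
    """Write one file per chapter into the given folder."""
    title, chapters = split_chapters(text)
    for number, body in chapters.items():
        (folder / chapter_file_name(title, number)).write_text(body, encoding="utf-8")
```

Calling `write_chapters(converted_text, Path("."))` would write the chapter files into the current folder, whereas the macro uses the folder of the source file.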
        Best regards from an UC/UE/UES for Windows user from Austria

          Sep 19, 2007#4

          You are the MAN Mofi!!!
          Thanks so much for that!

          I didn't even need to reformat it... it was fine as it is!

          Sorry again for mucking you around. I have learnt a lot about how to post my problems, and I know that if there's a next time... I will make it a lot simpler for you.


          Thanks again Mofi!
          The macro king!

            Oct 04, 2007#5

            Hey Mofi.

            Is there any chance of changing the naming part of the macro when it saves it?
            From:
            "Genesis_1"
            To:
            "Genesis_001"

            So...
            "Genesis_2" becomes "Genesis_002"
            "Genesis_3" becomes "Genesis_003"
            .......................
            "Genesis_50" becomes "Genesis_050"
            and so on.

            Thanks in advance.

              Oct 04, 2007#6

              Yes, this can be done. Normally I would do such a file renaming with Total Commander's Multi-Rename Tool, which is incredibly powerful but extremely easy to use: it renames thousands of files with a few mouse clicks within 20 seconds.

              However, why should the macro not save the files with the preferred naming scheme when that is possible? So here is the solution.

              First you have to create a new macro named SaveChapterFile.

              Attention: The name is case-sensitive.

              And this macro must be saved in the same file as the other 2 macros or the merged macro. It is important that you first create this sub macro.

              An additional macro is required because nesting of loops (a loop inside another loop) is not possible in the macro environment. An inner loop is necessary to insert the correct number of leading zeros into the file name of the current chapter, based on the number of digits of the last (= highest) chapter number.

              As the name of the new sub macro already indicates, it creates the file name for the current file and saves it.

              Macro SaveChapterFile

              Top
              Clipboard 9
              Paste
              Paste
              Key UP ARROW
              Find RegExp "0+$"
              Replace ""
              Clipboard 8
              Paste
              Loop
              Key UP ARROW
              IfCharIs "0"
              Key DOWN ARROW
              Find Up "_"
              Replace "_0"
              Key END
              Else
              ExitLoop
              EndIf
              EndLoop
              DeleteLine
              Key END
              ".txt"
              SelectToTop
              Cut
              EndSelect
              Delete
              SaveAs "^c"
              CloseFile

              Okay, after creating this macro, the code of the existing macro Split2Chapters must be completely replaced with the following code:

              InsertMode
              ColumnModeOff
              HexOff
              UnixReOff
              Bottom
              Find Up "^b"
              EndSelect
              Key LEFT ARROW
              Key RIGHT ARROW
              SelectWord
              Clipboard 7
              Copy
              EndSelect
              Top
              Clipboard 9
              Find RegExp "[a-z]*$"
              Copy
              EndSelect
              Key END
              Clipboard 8
              CopyFilePath
              NewFile
              Clipboard 7
              Paste
              SelectToTop
              Find RegExp "[0-9]"
              Replace All SelectText "0"
              Cut
              EndSelect
              Clipboard 8
              Paste
              Find Up "\"
              Replace "\"
              IfFound
              DeleteToEndofLine
              Else
              "C:\"
              EndIf
              Clipboard 9
              Paste
              TrimTrailingSpaces
              Bottom
              "_"
              Clipboard 7
              Paste
              ClearClipboard
              Clipboard 9
              InsertLine
              SelectAll
              Copy
              CloseFile NoSave
              Clipboard 8
              Loop
              Find "^b"
              IfNotFound
              ExitLoop
              EndIf
              Key RIGHT ARROW
              Key LEFT ARROW
              StartSelect
              Find Select "^b"
              IfSel
              Key LEFT ARROW
              Copy
              EndSelect
              Key RIGHT ARROW
              Key LEFT ARROW
              Else
              EndSelect
              Key RIGHT ARROW
              Key LEFT ARROW
              SelectToBottom
              Copy
              EndSelect
              Key UP ARROW
              Bottom
              EndIf
              NewFile
              Paste
              " "
              StartSelect
              Find RegExp Up Select "[~ ^t^p]"
              Key RIGHT ARROW
              Key RIGHT ARROW
              Delete
              EndSelect
              Top
              Find RegExp "[0-9]+"
              Cut
              "("
              Paste
              ")"
              PlayMacro 1 "SaveChapterFile"
              EndLoop
              ClearClipboard
              Clipboard 9
              ClearClipboard
              Clipboard 0

              In the main loop, only the part that creates the file name and saves the file has been replaced by the command PlayMacro, because sub macro SaveChapterFile now does this part. The main changes are at the top of this macro: the last chapter number is searched, all of its digits are replaced with zeros to get the correct number of digits, and those zeros are appended to the file name after the underscore. This macro no longer appends a file extension; that is done later in sub macro SaveChapterFile. And the string in clipboard 9 with "path\book title_000" is now a real line with a line termination instead of only a string.
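
The padding rule described above (as many digits as the last chapter number has) can be written compactly in Python. This is a hedged sketch with a hypothetical helper name, `padded_names`; like the macro, it assumes the last chapter in the file is also the highest one.

```python
def padded_names(title: str, numbers: list[int]) -> list[str]:
    """Pad every chapter number to the digit count of the last chapter number."""
    # Assumption taken from the macro: the last chapter is the highest one.
    width = len(str(numbers[-1]))
    return [f"{title}_{n:0{width}d}.txt" for n in numbers]


# A book ending with chapter 10 gets two digits.
print(padded_names("Genesis", [1, 2, 10]))
```

With a last chapter of 9 no zeros are added at all, and with 176 every name gets three digits.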

              Macro WordHtml2Text is not modified. If you want to merge macro WordHtml2Text with Split2Chapters again, you again have to delete the last command (line) of WordHtml2Text and the first 4 commands (lines) of Split2Chapters.

                Oct 05, 2007#7

                It didn't work Mofi. I get the same output file name "Genesis_1".
                Don't worry about it though.

                I will also take a look at that multi-rename tool too.

                Cheers.

                  Oct 05, 2007#8

                  You must have done something wrong because I have tested it with your "before HTML file" and the macro created Genesis_01.txt to Genesis_10.txt because 10 is the last chapter number in the HTML file.

                    Oct 05, 2007#9

                    Mofi wrote: You must have done something wrong because I have tested it with your "before HTML file" and the macro created Genesis_01.txt to Genesis_10.txt...
                    Fair enough, it has done that on mine too (now). But my post above didn't ask for Genesis_01.txt, it asked for Genesis_001.txt through to Genesis_050.txt (and then on to Genesis_176.txt, which would be the highest chapter number).
                    With your above macro... it creates Genesis_01.txt, Genesis_02.txt, Genesis_03.txt, Genesis_04.txt...... Genesis_10.txt,
                    and then after this...
                    Genesis_11.txt, Genesis_12.txt, Genesis_13.txt, and so on.
                    It's missing a zero basically.

                    Not really any need to worry about it if the macro can't do it.
                    Thanks tonnes anyway.

                      Oct 05, 2007#10

                      I have tested the macros again with your test file, in which I changed the last chapter number from 10 to 176. The macro creates Genesis_001.txt, ..., Genesis_009.txt, Genesis_176.txt. So it works perfectly.

                      Your HTML source hopefully contains all 176 chapters, with chapter 176 as the last one in the file.
                      chapters.zip (1.37 KiB)
                      Archive contains Chapters.mac - the macro file with the macros SaveChapterFile, Split2Chapters, WordHtml2Text and CreateChapters (WordHtml2Text and Split2Chapters merged).

                        Oct 08, 2007#11

                        Mofi wrote: You must have done something wrong because I have tested it with your "before HTML file" and the macro created Genesis_01.txt to Genesis_10.txt...
                        Mofi wrote: I have tested the macros again with your test file where I have changed last chapter number 10 to 176. And the macro creates Genesis_001.txt, ..., Genesis_009.txt, Genesis_176.txt. So it works perfect.
                        Yes Mofi, we are getting the same results. But these results are slightly wrong.
                        Let me explain.
                        At the moment, if I have chapters ranging from Genesis 1 to Genesis 10 (ie: 1 digit to 2 digits) then my output is like this:

                        Genesis_01.txt
                        Genesis_02.txt
                        Genesis_03.txt
                        Genesis_04.txt
                        Genesis_05.txt
                        Genesis_06.txt
                        Genesis_07.txt
                        Genesis_08.txt
                        Genesis_09.txt
                        Genesis_10.txt
                        But if I have chapters ranging from Genesis 1 to Genesis 176 (or just change the number 10 to 176) then the output is like this:

                        Genesis_001.txt
                        Genesis_002.txt
                        Genesis_003.txt
                        Genesis_004.txt
                        Genesis_005.txt
                        Genesis_006.txt
                        Genesis_007.txt
                        Genesis_008.txt
                        Genesis_009.txt
                        Genesis_176.txt
                        To me (and correct me if I'm wrong) it seems that the number of digits in the last chapter number determines how many digits the saved txt file names have. For example:
                        If I have Genesis chapter 1 through to chapter 9 then the output is like this:

                        Genesis_1.txt
                        Genesis_2.txt
                        Genesis_3.txt
                        Genesis_4.txt
                        Genesis_5.txt
                        Genesis_6.txt
                        Genesis_7.txt
                        Genesis_8.txt
                        Genesis_9.txt
                        If I have Genesis chapter 1 through to chapter 10 then the output is like this:

                        Genesis_01.txt
                        Genesis_02.txt
                        Genesis_03.txt
                        Genesis_04.txt
                        Genesis_05.txt
                        Genesis_06.txt
                        Genesis_07.txt
                        Genesis_08.txt
                        Genesis_09.txt
                        Genesis_10.txt
                        And if I have Genesis chapter 1 through to chapter 100 then the output is like this:

                        Genesis_001.txt
                        Genesis_002.txt
                        Genesis_003.txt
                        Genesis_004.txt
                        Genesis_005.txt
                        Genesis_006.txt
                        Genesis_007.txt
                        Genesis_008.txt
                        Genesis_009.txt
                        ..........................
                        Genesis_100.txt
                        Is this correct?

                        But this is not what I want...
                        As I originally asked (I'm not rubbing it in, just trying to cover my back, because I know I mucked you around earlier in this thread), I always want three digits, no matter whether there is only one chapter or 176 chapters.

                        Basically:
                        If the chapter range is between 1 and 9, then...
                        chapters 1 to 9 have two zeros preceding them (i.e. 001, 002, 003, ..., 009).
                        If the chapter range is between 1 and 99, then...
                        chapters 1 to 9 have two zeros preceding them, and chapters 10 to 99 have one zero preceding them (i.e. 001, 002, 003, ..., 009, 010, 011, 012, ..., 099).
                        If the chapter range is between 1 and 999, then...
                        chapters 1 to 9 have two zeros preceding them, chapters 10 to 99 have one zero preceding them, and chapters 100 to 999 have no zeros preceding them (i.e. 001, 002, 003, ..., 009, 010, 011, 012, ..., 099, 100, 101, 102, ..., 999).


                        Sorry if you don't really understand that.
                        Genesis has 50 chapters total, whereas Psalms has 176; but I want both to be saved with 3 numbers (ie: same file name length).
                        In English: I want both a minimum and maximum of three digits ALWAYS.

                          Oct 08, 2007#12

                          Yes, the macros are written to dynamically use the number of digits of the highest (last) chapter number, so that all files are saved with the same number of digits. That is what I supposed you wanted, and what I wrote in the explanation for the rewritten macro Split2Chapters.

                          Now you tell me for the first time that you always want the chapter number in the file name with exactly 3 digits.

                          Okay, no problem. That makes macro Split2Chapters simpler. You should really try to understand the macros so that you can adapt them to your needs yourself when necessary. The macros are not so difficult to understand, as you know the input and the output.

                          Here is the upper part of macro Split2Chapters, up to the command Loop, which now prepares the file name with a fixed 3 digits for the chapter number.

                          InsertMode
                          ColumnModeOff
                          HexOff
                          UnixReOff
                          Top
                          Clipboard 9
                          Find RegExp "[a-z]*$"
                          Copy
                          EndSelect
                          Key END
                          Clipboard 8
                          CopyFilePath
                          NewFile
                          Paste
                          Find Up "\"
                          Replace "\"
                          IfFound
                          DeleteToEndofLine
                          Else
                          "C:\"
                          EndIf
                          Clipboard 9
                          Paste
                          TrimTrailingSpaces
                          Bottom
                          "_000
                          "
                          SelectAll
                          Copy
                          CloseFile NoSave
                          Clipboard 8
                          Loop

                          I have uploaded in my previous post an updated ZIP archive which contains macro file ChaptersFixed.mac where this modification is done in Split2Chapters and CreateChapters.
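
For comparison, the fixed-width naming requested in this thread takes a single format specification in Python. This is a minimal sketch, and the helper name `fixed_width_name` is hypothetical.

```python
def fixed_width_name(title: str, number: int, width: int = 3) -> str:
    """Build a file name with the chapter number zero-padded to a fixed width."""
    # Numbers longer than the width are kept whole, not truncated.
    return f"{title}_{number:0{width}d}.txt"


for chapter in (1, 50, 176):
    print(fixed_width_name("Genesis", chapter))
```

This gives the same file name length for a book with 50 chapters as for one with 176.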

                            Oct 08, 2007#13

                            TheChipstar wrote: "Genesis_2" becomes "Genesis_002"
                            "Genesis_3" becomes "Genesis_003"
                            .......................
                            "Genesis_50" becomes "Genesis_050"
                            Yes... this is what I originally asked for, sorry for the confusion.
                            I just think you like the challenge and so went for the harder option. Haha.

                            And yes, I am slowly learning the language. I can recognize similar functions to VBA coding, so I'm getting there.

                            Thanks once again, you didn't even have to do any of this at all! So I appreciate it!
                            Thanks Mofi.