Looking for 2 scripts: Remove Word formatting & Remove HTML formatting.



    May 20, 2014 #1

    Does anyone have UE scripts that can do these?

    Thanks!
    GeekDrop.com
    Help Community set in a Geeky Fashion ;)
    Stop By Sometime!


      May 21, 2014 #2

      What do you mean by "remove Word formatting"?

      Do you mean removing the special tags and attributes that Microsoft Word writes into an HTML file when using Save As with the file type Web Page (*.htm;*.html)?

      If so, use the file type Web Page, Filtered (*.htm;*.html) in the Save As dialog of Microsoft Word instead.
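The filtered Save As option only helps while the document is still in Word. For markup that has already been pasted elsewhere, a rough regex approximation in JavaScript (the language UltraEdit scripts are written in) could delete a few typical Word-specific constructs while keeping ordinary tags such as <p> and <b>. This is a hypothetical helper, not Word's own filter: the name stripWordMarkup is made up, it assumes the markup uses <o:p> tags, class=Mso... attributes and inline style attributes, and it drops every style attribute, which may remove more than Word's filtered export would.

```javascript
// Hypothetical helper: removes some Word-specific markup but keeps ordinary tags.
// Assumes Word-style markup; this is not a full equivalent of Word's filtered export.
function stripWordMarkup(html) {
  return html
    // Remove Office namespace paragraph tags such as <o:p> and </o:p>.
    .replace(/<\/?o:p>/g, "")
    // Remove class attributes whose value starts with "Mso" (quoted or unquoted).
    .replace(/\s+class=(?:"Mso[^"]*"|'Mso[^']*'|Mso\w+)/g, "")
    // Remove every inline style attribute (a deliberate, aggressive choice).
    .replace(/\s+style=(?:"[^"]*"|'[^']*')/g, "");
}

// Example:
// stripWordMarkup('<p class=MsoNormal style="margin:0">Hi <b>there</b><o:p></o:p></p>')
// returns '<p>Hi <b>there</b></p>'
```

Classes that do not start with "Mso" (for example class="note") are left alone, so any styling the forum itself relies on survives.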

      To remove all HTML formatting, open the HTML file in a browser, press Ctrl+A and Ctrl+C, switch to UltraEdit, open a new file, and press Ctrl+V.

      Alternatively, search for <[^>]+?> with the Perl regular expression engine and use an empty string as the replace string. But be aware that any < or > in normal text that is not encoded as an HTML entity (&lt; or &gt;) would produce a wrong result for this regular expression replace.
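The same replace can be sketched as a plain JavaScript function, which also makes the caveat easy to see. The name stripTags is hypothetical; the pattern is the one given above.

```javascript
// Hypothetical helper name; the pattern is the Perl-style one suggested above.
function stripTags(html) {
  // Non-greedy match from "<" to the next ">"; every match is deleted.
  return html.replace(/<[^>]+?>/g, "");
}

// Works on well-formed markup:
// stripTags("<p>Hello <b>world</b></p>") returns "Hello world"
// Fails on an unencoded "<" in the text, because "< b and c >" is taken as a tag:
// stripTags("if a < b and c > d") returns "if a  d"
```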

      See Importing MS Word DOC files.
      Best regards from an UC/UE/UES for Windows user from Austria


        May 22, 2014 #3

        Heya Mofi,

        What I meant was: I run a forum myself, and occasionally people submit a post that was written in MS Word; they copy it to the clipboard and paste it into a post on my site. The result is that the source of that post contains a TON of garbage code. More often than not, it also throws off the formatting of the site itself, and manually editing out the MS Word code so that only the actual post content remains is very... frustrating.

        Here's an example of what it looks like: http://pastebin.com/a6ge80uH

        I would love a nice UE script so that I could just paste all that code into UE, run the script, and have it clean out all the MS Word code for me. The HTML script I mentioned is the same concept, just for stripping HTML tags. Sometimes simply opening the page in a browser and copying the content isn't as good or efficient as a script that strips out the HTML code.


          May 22, 2014 #4

          Here is a quickly written script that removes Word HTML formatting, working directly on the data in the system clipboard.


          // Work with the Windows clipboard (UltraEdit clipboard 0).
          UltraEdit.selectClipboard(0);
          // Remove xml blocks.
          var sWordData = UltraEdit.clipboardContent.replace(/<xml.*?>[\S\s]+?<\/xml>/g,"");
          // Remove style blocks.
          sWordData = sWordData.replace(/<style.*?>[\S\s]+?<\/style>/g,"");
          // Remove conditional comment blocks.
          sWordData = sWordData.replace(/<!--\[if.*?>[\S\s]*?<!\[endif\]-->/g,"");
          // Remove all tags, including one-letter tags like <b> and <p>.
          sWordData = sWordData.replace(/<[!\/A-Za-z][^>]*>/g,"");
          // Replace all non-breaking spaces (&#160; or &nbsp;) by normal spaces.
          sWordData = sWordData.replace(/&(?:#160|nbsp);/g," ");
          // Replace all horizontal tabs by normal spaces.
          sWordData = sWordData.replace(/\t/g," ");
          // Remove all spaces at end of a line.
          sWordData = sWordData.replace(/ +\r\n/g,"\r\n");
          // Replace a sequence of spaces by a single space.
          sWordData = sWordData.replace(/  +/g," ");
          // Remove a single space at beginning of a line.
          sWordData = sWordData.replace(/\n /g,"\n");
          // Replace multiple blank lines by a single blank line.
          UltraEdit.clipboardContent = sWordData.replace(/(?:\r\n){3,}/g,"\r\n\r\n");
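The script converts only non-breaking spaces, so other HTML entities stay encoded in the result. A minimal sketch of a hypothetical helper, decodeBasicEntities, could decode a few common named entities; it assumes only these entities matter and leaves all other numeric or named entities untouched. Decoding happens after the tags are removed, because turning &lt; and &gt; into < and > earlier would create text that the tag-removal regex treats as markup.

```javascript
// Hypothetical helper (not part of the script above): decodes a few common
// named HTML entities that are still present after all tags were removed.
function decodeBasicEntities(text) {
  return text
    .replace(/&(?:nbsp|#160);/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, "\"")
    // Decode &amp; last so that "&amp;lt;" becomes the text "&lt;", not "<".
    .replace(/&amp;/g, "&");
}

// Example: decodeBasicEntities("a &lt;b&gt; &amp; &quot;c&quot;") returns 'a <b> & "c"'
```

With the function defined in the same script file, the last line of the script could become UltraEdit.clipboardContent = decodeBasicEntities(sWordData.replace(/(?:\r\n){3,}/g,"\r\n\r\n")); so the decoding runs once, on the final text.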


            May 22, 2014 #5

            Very nice script, thanks Mofi! :D