Remove everything after </html> in all *.htm and *.html files

    Jan 05, 2015 #1

    Hello,

    I am on UltraEdit version 18.10.0.1010.

    A virus infected all .html and .htm files on my hard drive (about 20,000 files) by inserting a portion of malicious code after the closing </html> tag.

    I would like to run a search and replace command that deletes everything after the </html> tag.

    Would anyone be so kind as to suggest a proper expression for doing that? I've tried many without luck.

      Jan 05, 2015 #2

      Run a replace on all *.htm files recursively (*.html files are automatically included as well) with the Perl regular expression option enabled and the search not case-sensitive. Search for (</html>\s*?)\S[\s\S]* and use \1 as the replace string.

      (...) ... a capturing group, so that the string found inside the parentheses can be back-referenced with \1 in the replace string.

      </html> ... this string, in any letter case.

      \s*? ... whitespace characters such as spaces, tabs, carriage returns and line feeds (and some others most likely not present in the *.html files) zero or more times, non-greedy.

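      \S ... a non-whitespace character. This ensures that only files with something other than whitespace after </html> are matched, so clean files are left unchanged.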

      [\s\S]* ... a whitespace or a non-whitespace character (= any character) zero or more times, greedy. Because this last expression matches any character and is greedy, it matches everything up to the end of the file.
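For anyone who wants to experiment with the expression outside UltraEdit, here is a minimal sketch using Python's re module, which understands the same \s, \S, non-greedy *? and \1 syntax. It is not UltraEdit's regex engine, so behavior on unusual input may differ; PATTERN and strip_after_html are names made up for this illustration. The pattern is written with re.VERBOSE so that each piece carries the comment from the breakdown above.

```python
import re

# The search pattern from the answer, spelled out with re.VERBOSE (whitespace
# and "#" comments inside the pattern are ignored). IGNORECASE mirrors the
# "not case-sensitive" option.
PATTERN = re.compile(
    r"""
    (              # group 1, referenced as \1 in the replacement
      </html>      # the closing tag, in any letter case
      \s*?         # whitespace zero or more times, non-greedy
    )
    \S             # first non-whitespace character after the tag
    [\s\S]*        # any character zero or more times, greedy: up to end of text
    """,
    re.IGNORECASE | re.VERBOSE,
)


def strip_after_html(text: str) -> str:
    # Replace the whole match with group 1, i.e. keep "</html>" plus the
    # whitespace directly after it and drop everything that follows.
    return PATTERN.sub(r"\1", text)


infected = "<body>ok</body>\r\n</HTML>\r\n<script>evil()</script>\r\n"
clean = "<body>ok</body>\r\n</html>\r\n"

m = PATTERN.search(infected)
print(repr(m.group(1)))                  # '</HTML>\r\n'
print(repr(strip_after_html(infected)))  # '<body>ok</body>\r\n</HTML>\r\n'
print(PATTERN.search(clean) is None)     # True: a clean file is not matched
```

The last line shows why the \S is there: a file with only whitespace after </html> produces no match, so a Replace All leaves it untouched.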

      Try this Perl regular expression replace first on a single HTML file opened in UltraEdit, as I did with the currently latest version of UltraEdit. If it works as expected, run it using Replace in Files. It is best to run this Perl regular expression Replace All twice, so that the second run verifies that all *.htm and *.html files have been cleaned of the malware code.
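Where UltraEdit is not at hand, the same procedure (a recursive replace over *.htm and *.html, then a second run to verify) could be sketched in Python as below. Assumptions: the files use an ASCII-compatible encoding such as ANSI/Latin-1 or UTF-8 (UTF-16 files would not match the byte pattern); clean_tree is a hypothetical helper name; files are rewritten in place, so run it on a backup copy first.

```python
import re
from pathlib import Path

# Byte-level version of the same pattern: it works on the raw file contents,
# so line endings (CR LF vs LF) and the encoding of the kept part are preserved.
# For bytes patterns, \s and \S only cover ASCII whitespace.
TRAILING_JUNK = re.compile(rb"(</html>\s*?)\S[\s\S]*", re.IGNORECASE)


def clean_tree(root: Path) -> int:
    """Strip everything after </html> in all *.htm and *.html files below root.

    Returns the number of files that were modified.
    """
    changed = 0
    for path in root.rglob("*"):
        # Check the extension ourselves, case-insensitively, so that both
        # .htm and .html files are covered on every platform.
        if not path.is_file() or path.suffix.lower() not in (".htm", ".html"):
            continue
        data = path.read_bytes()
        new_data, count = TRAILING_JUNK.subn(rb"\1", data)
        if count:
            path.write_bytes(new_data)
            changed += 1
    return changed
```

Calling clean_tree(Path("site")) twice, where "site" is a placeholder for the folder to clean, should report the number of changed files on the first call and 0 on the second, which matches the "run it twice to verify" advice.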
      Best regards from an UC/UE/UES for Windows user from Austria

        Jan 05, 2015 #3

        Thanks a lot for the nice and detailed description :) It worked perfectly. Regards!