Replacing text between two sets of tags

Derek · Apr 13, 2018 · #1 · 2018-04-13T17:51+00:00

I have 200+ webpages in a folder. What I want to do is replace the text between the <h1></h1> tags with what is between the <title></title> tags so that both are the same. Is there an easy way to do this?

Mofi · Apr 14, 2018 · #2 · 2018-04-14T13:20+00:00

Yes, this is possible, for example with a Perl regular expression Replace in Files using the search string (?s)(?<=<title>)(.+?)(</title>.+?<h1>).*?(?=</h1>) and the replace string \1\2\1, as long as element title is above element h1 in every file.

(?s) ... the dot also matches newline characters; see the topic "." (dot) in Perl regular expressions doesn't include newline characters CRLF? for details.

(?<=<title>) ... a positive lookbehind to find <title> without matching it as part of the found string.

(...) ... first marking/capturing group. The string found by the expression inside can be back-referenced in search or replace string with \1 (or $1).

.+? ... find one or more characters non-greedy. This expression matches the string between the start and end tag of element title.

(...) ... second marking/capturing group. The string found by the expression inside can be back-referenced in search or replace string with \2 (or $2).

</title>.+?<h1> ... matches everything from the beginning of the end tag of element title to the end of the start tag of element h1, including newline characters because of (?s) at the beginning of the search string. The size of the block matched by this expression is not unlimited, but I suppose the block size is no problem for your task, as HTML/XHTML files usually don't have several MiB between element title and element h1.

.*? ... find zero or more characters non-greedy. This expression matches the string between the start and end tag of element h1, which can also be an empty string if <h1></h1> is present in the file.

(?=</h1>) ... a positive lookahead to stop matching zero or more characters on finding the end tag of element h1 without matching it as part of found string.

Derek · Apr 14, 2018 · #3 · 2018-04-14T14:15+00:00

Brilliant! Just ran it in "Replace in Files" and it worked a treat. Thank you so much.