How to check for dot, comma, colon and semicolon before a closing tag?

    Apr 24, 2018 #1

    I'm trying to find out whether a dot, comma, semicolon or colon appears just before the closing tag </title> in my file. A semicolon should count only when it is not the end of a 4-digit hex entity such as &#x00E9;.

    Text sample

    <title>Dhind</title>
    <title>WT.</title>
    <title>Plant Leaves:</title>
    <title>Denia;</title>
    <title>Erod&#x00E9;</title>
    The search pattern should positively match:

    <title>WT.</title>
    <title>Plant Leaves:</title>
    <title>Denia;</title>
    I tried this Perl regular expression search string: ([.,:]|((?<!&#x[0-9a-z][0-9a-z][0-9a-z][0-9a-z]);))</title>
    Can someone help me make it shorter?

      Apr 25, 2018 #2

      Hi Don,

      This regex is shorter and case-insensitive. It matches the whole run of [,:.;] characters before the closing tag </title>, and it works even for the following line:

      <title>Erod&#x00E9;.;</title>

      F: (?i)(?:[,:.;](?<!&#x[0-9A-F]{4};))+(?=</title>)
      R: empty

      BR, Fleggy
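
For anyone who wants to check the pattern outside the editor, here is a minimal sketch in Python. The pattern is copied unchanged from the post above; the sample list and the helper name trailing_punctuation are only illustrative, and Python's re module is assumed to behave like the Perl engine for this particular expression (the lookbehind is fixed-width, so it should).

```python
import re

# Pattern from the post above, unchanged.
PATTERN = re.compile(r"(?i)(?:[,:.;](?<!&#x[0-9A-F]{4};))+(?=</title>)")

# Sample lines from the thread, plus the extra line from the reply.
SAMPLES = [
    "<title>Dhind</title>",
    "<title>WT.</title>",
    "<title>Plant Leaves:</title>",
    "<title>Denia;</title>",
    "<title>Erod&#x00E9;</title>",
    "<title>Erod&#x00E9;.;</title>",
]


def trailing_punctuation(line):
    """Return the punctuation run right before </title>, or None."""
    match = PATTERN.search(line)
    return match.group(0) if match else None


# Map each sample line to what the pattern finds in it.
results = {line: trailing_punctuation(line) for line in SAMPLES}

for line, found in results.items():
    print(f"{line!r:40} -> {found!r}")
```

The semicolon that closes the entity is rejected by the lookbehind, so only the punctuation after it is reported for the last sample.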

        Apr 25, 2018 #3

        Thanks, Fleggy 🙂

          Apr 25, 2018 #4

          And this one is faster, with an optimized lookbehind (checked only once, after the first character) and a possessive quantifier (*+)  ;)

          (?i)[,:.;](?<!&#x[\dA-F]{4};)[,:.;]*+(?=</title>)
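
A rough sketch of the same search used as a replace with an empty string, again in Python. Python's re module accepts possessive quantifiers only from version 3.11 on, so the fallback to a plain greedy * on older versions is my own assumption, not part of the post; the names HEAD, TAIL, FAST and strip_trailing_punctuation are hypothetical.

```python
import re

# The pattern from the post, split into parts so the quantifier can be swapped.
HEAD = r"(?i)[,:.;](?<!&#x[\dA-F]{4};)"
TAIL = r"(?=</title>)"

try:
    # Possessive *+ as written in the post; needs Python 3.11 or newer.
    FAST = re.compile(HEAD + r"[,:.;]*+" + TAIL)
except re.error:
    # Assumption: on older Pythons a greedy * gives the same matches,
    # only without the possessive speed-up.
    FAST = re.compile(HEAD + r"[,:.;]*" + TAIL)


def strip_trailing_punctuation(text):
    """Remove the punctuation run before </title>, keeping hex entities."""
    return FAST.sub("", text)


text = "\n".join([
    "<title>Dhind</title>",
    "<title>WT.</title>",
    "<title>Erod&#x00E9;</title>",
    "<title>Erod&#x00E9;.;</title>",
])
cleaned = strip_trailing_punctuation(text)
print(cleaned)
```

The lookbehind runs only for the first punctuation character of a run; the rest of the run is consumed without further checks, which is where the speed-up comes from.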