Hello,
Splitting text into sentences on the full stop, question mark, or exclamation mark is a common device. The question mark and exclamation mark do not pose much of a problem, but the full stop raises certain issues as a sentence delimiter because of its varied use, as shown in the examples below:
The temperature was 32.8 degrees Celsius. (Temperature)
His B.Sc. degree was deemed insufficient. (Acronym)
He owed the bank USD 4000.50 which he had not paid back. (Currency)
On 27.07.2004 a major earthquake occurred. (Date)
It was 17.05 by the clock. (Time)
just to name a few.
A Perl script would do the job, but since I am working on dynamic data where on-the-fly detection is needed, I am looking for a regex that correctly ignores the above cases and identifies only valid sentence boundaries.
input:
Quote:
The temperature was 32.8 degrees Celsius. His B.Sc. degree was deemed insufficient. He owed the bank USD 4000.50 which he had not paid back. On 27.07.2004 a major earthquake occurred. It was 17.05 by the clock.
What I need is for the regex to treat a full stop as a delimiter only where it actually ends a sentence.
The expected output would be:
The temperature was 32.8 degrees Celsius.
His B.Sc. degree was deemed insufficient.
He owed the bank USD 4000.50 which he had not paid back.
On 27.07.2004 a major earthquake occurred.
It was 17.05 by the clock.
and not, for example:
His B.
Sc.
degree was deemed insufficient.
One of the techniques I tried was the following regex:
Code:
\.\w[A-Z]
which stated that:
Locate a full stop followed by a word in caps (with or without a space), or a full stop at EOF. But it just didn't work.
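To make the failure concrete, here is a minimal sketch in Python (chosen only for illustration; the names ATTEMPT, BOUNDARY and split_sentences are hypothetical and not part of any existing tool). It first shows that the pattern above finds nothing in the sample, because \w[A-Z] demands exactly one word character followed by a capital letter right after the full stop. It then tries one possible alternative, which is only an assumption that holds for this sample: treat a full stop as a sentence end when it is followed by whitespace and then a capital letter.

```python
import re

TEXT = ("The temperature was 32.8 degrees Celsius. His B.Sc. degree was deemed "
        "insufficient. He owed the bank USD 4000.50 which he had not paid back. "
        "On 27.07.2004 a major earthquake occurred. It was 17.05 by the clock.")

# The pattern from the post: a full stop, ONE word character, then a capital.
# Nothing in the sample has that shape (".8 ", ".Sc", ".50", ".07" all fail).
ATTEMPT = re.compile(r"\.\w[A-Z]")

# Hypothetical alternative (an assumption, not a general solution):
# split on whitespace that follows a full stop and precedes a capital letter.
BOUNDARY = re.compile(r"(?<=\.)\s+(?=[A-Z])")


def split_sentences(text):
    """Split text at the whitespace between a full stop and a capital letter."""
    return BOUNDARY.split(text.strip())


attempt_hits = ATTEMPT.findall(TEXT)
sentences = split_sentences(TEXT)

print(attempt_hits)
for sentence in sentences:
    print(sentence)
```

On the sample this leaves 32.8, B.Sc., 4000.50, 27.07.2004 and 17.05 intact, since none of those full stops is followed by whitespace plus a capital. It would still split wrongly after an abbreviation that precedes a capitalized word (for instance a title before a name), so it is a starting point rather than a complete answer.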
AWK or Perl would do the job, but since the data is dynamic and has to be processed online, the only solution is a regex.
Many thanks in advance.
Sorry, I had posted this earlier without a subject and I guess it did not get registered.