Script to modify linking tags

zero_cool · Jul 28, 2016 · #1 · 2016-07-28T21:34+00:00

My file contains multiple display equation tags in the form

<formula id="dqn1">....</formula> , <formula id="dqn2-6">....</formula>, <formula id="dqn7-8">....</formula>, <formula id="dqn9">....</formula> <formula id="dqn10">....</formula>

and cross-references (links) to them in the form

<xref ref-type="d-formula" rid="dqn1">(1)</xref>, <xref ref-type="d-formula" rid="dqn3">(3)</xref> , <xref ref-type="d-formula" rid="dqn5">(5)</xref> <xref ref-type="d-formula" rid="dqn7">(7)</xref>–<xref ref-type="d-formula" rid="dqn8">(8)</xref>

I would like a script that changes the references (3), (5) and (7)–(8) above to

<xref ref-type="d-formula" rid="dqn2-6">(3)</xref>
<xref ref-type="d-formula" rid="dqn2-6">(5)</xref>
<xref ref-type="d-formula" rid="dqn7-8">(7)–(8)</xref>

and keep the other tags intact.

How could that be done?

Mofi · Jul 29, 2016 · #2 · 2016-07-29T07:55+00:00

What is the rule which defines that

<xref ref-type="d-formula" rid="dqn3">(3)</xref>

should be modified to

<xref ref-type="d-formula" rid="dqn2-6">(3)</xref>

and

<xref ref-type="d-formula" rid="dqn5">(5)</xref>

should also be modified to

<xref ref-type="d-formula" rid="dqn2-6">(5)</xref>

That is completely unclear to me.

An UltraEdit regular expression Replace All can be used for the last two references.

Find what: ^(rid="dqn^)^([0-9]+^)^(">([0-9]+)^)</xref>–<xref ref-type="d-formula" rid="dqn^([0-9]+^)">^(([0-9]+)</xref>^)
Replace with: ^1^2-^4^3–^5

Same as above with Unix or Perl regular expression engine:

Find what: (rid="dqn)([0-9]+)(">\([0-9]+\))</xref>–<xref ref-type="d-formula" rid="dqn([0-9]+)">(\([0-9]+\)</xref>)
Replace with: \1\2-\4\3–\5
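For anyone who wants to check the Perl-style expression outside of UltraEdit, here is a minimal sketch in Python. Python is only an assumption for the demonstration and is not part of the UltraEdit solution. The search and replace strings are copied unchanged from above, and the sample line is the range reference from the first post.

```python
import re

# Perl-style search string from above; the en dash sits between the two links.
PATTERN = re.compile(
    r'(rid="dqn)([0-9]+)(">\([0-9]+\))</xref>–'
    r'<xref ref-type="d-formula" rid="dqn([0-9]+)">(\([0-9]+\)</xref>)'
)
# Replace string from above: first number, hyphen, second number, then the labels.
REPLACEMENT = r'\1\2-\4\3–\5'

sample = ('<xref ref-type="d-formula" rid="dqn7">(7)</xref>–'
          '<xref ref-type="d-formula" rid="dqn8">(8)</xref>')

merged = PATTERN.sub(REPLACEMENT, sample)
print(merged)
```

Note that the expression merges every pair of links joined by an en dash. It does not check whether a formula with the resulting id really exists in the file.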

zero_cool · Jul 29, 2016 · #3 · 2016-07-29T15:08+00:00

The rule is: if there is a display equation tag of the form "<formula id="dqnDIGIT1-DIGIT2">" and there are individual links whose numbers lie in the range from DIGIT1 to DIGIT2, then those links must point to the merged equation.

For example, if the file contains an equation of the form "<formula id="dqn1-4">" and links to equations (1), (2), (3) and (4) which are for some reason in the form

<xref ref-type="d-formula" rid="dqn1">(1)</xref>, <xref ref-type="d-formula" rid="dqn2">(2)</xref>, <xref ref-type="d-formula" rid="dqn3">(3)</xref> and <xref ref-type="d-formula" rid="dqn4">(4)</xref>

then each rid="dqnDIGIT" portion must be changed to rid="dqn1-4", since there is no

<formula id="dqn1">, <formula id="dqn2">, ... <formula id="dqn4">

equation tag in the file; those equations have been merged into one.

So the script needs to search the file for "<formula id="dqnDIGIT1-DIGIT2">". For each one found, it looks for rid="dqnDIGIT1", then adds 1 and looks for rid="dqn(DIGIT1+1)", and so on until it reaches DIGIT2, replacing the rid value of every link found with "dqnDIGIT1-DIGIT2".
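The rule just described can be sketched as follows. This is only an illustration in Python, not an UltraEdit script, and the names retarget_links, FORMULA_RANGE and SINGLE_RID are made up for the example. Instead of counting up from DIGIT1 to DIGIT2 and searching each time, it first builds a lookup from every number inside a merged range to the merged id and then rewrites all single-number rid attributes in one pass.

```python
import re

# Merged formulas such as <formula id="dqn2-6">: group 1 is the id, groups 2/3 the bounds.
FORMULA_RANGE = re.compile(r'<formula id="(dqn(\d+)-(\d+))">')
# Links to a single number such as rid="dqn3"; rid="dqn2-6" does not match.
SINGLE_RID = re.compile(r'rid="dqn(\d+)"')

def retarget_links(text):
    """Point single-number rid values at the merged formula covering them."""
    merged_id = {}
    for match in FORMULA_RANGE.finditer(text):
        first, last = int(match.group(2)), int(match.group(3))
        for number in range(first, last + 1):
            merged_id[number] = match.group(1)  # e.g. 3 -> "dqn2-6"

    def fix(match):
        number = int(match.group(1))
        if number in merged_id:
            return 'rid="%s"' % merged_id[number]
        return match.group(0)  # no merged formula covers it: keep as is

    return SINGLE_RID.sub(fix, text)

sample = (
    '<formula id="dqn1">....</formula>\n'
    '<formula id="dqn2-6">....</formula>\n'
    '<xref ref-type="d-formula" rid="dqn1">(1)</xref>\n'
    '<xref ref-type="d-formula" rid="dqn3">(3)</xref>\n'
)
result = retarget_links(sample)
print(result)
```

The sketch only changes rid values. It does not join two adjacent links such as (7)–(8) into one xref element; that would still need a separate replace.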

I hope I made it a little bit clearer.

Mofi · Jul 29, 2016 · #4 · 2016-07-29T18:15+00:00

Okay, now I have understood the rule for modification of single formula references. But there is one more case which must be taken into account.

The file contains the formulas:

<formula id="dqn1">....</formula>
<formula id="dqn2-6">....</formula>
<formula id="dqn7-8">....</formula>
<formula id="dqn9">....</formula>
<formula id="dqn10">....</formula>

And the file also contains the formula references

<xref ref-type="d-formula" rid="dqn1">(1)</xref>
<xref ref-type="d-formula" rid="dqn3">(3)</xref>
<xref ref-type="d-formula" rid="dqn5">(5)</xref>
<xref ref-type="d-formula" rid="dqn7">(7)</xref>–<xref ref-type="d-formula" rid="dqn8">(8)</xref>
<xref ref-type="d-formula" rid="dqn5">(5)</xref>–<xref ref-type="d-formula" rid="dqn10">(10)</xref>

The first reference is kept unmodified. The value of attribute rid of the second and third references can be updated according to your rule. The fourth reference, a range, can also be updated as described.

But how should the fifth reference, also a range, be handled? Should it be modified to

<xref ref-type="d-formula" rid="dqn2-6">(5)–(10)</xref>

or should it be modified to

<xref ref-type="d-formula" rid="dqn2-6">(5)</xref>–<xref ref-type="d-formula" rid="dqn10">(10)</xref>
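What makes the fifth reference different from the fourth can be shown with a small sketch. It is written in Python purely for illustration, and the helper name covering_id is made up. It resolves both end points of a range against the formula ids listed above and reports whether they end up in the same formula. It does not decide which of the two results is the wanted one.

```python
import re

def covering_id(number, formula_ids):
    """Return the formula id that covers `number`, or None if there is none."""
    for formula_id in formula_ids:
        match = re.fullmatch(r'dqn(\d+)(?:-(\d+))?', formula_id)
        if match is None:
            continue
        first = int(match.group(1))
        # A single formula like "dqn9" covers just one number.
        last = int(match.group(2) or first)
        if first <= number <= last:
            return formula_id
    return None

formula_ids = ["dqn1", "dqn2-6", "dqn7-8", "dqn9", "dqn10"]

for start, end in [(7, 8), (5, 10)]:
    a = covering_id(start, formula_ids)
    b = covering_id(end, formula_ids)
    print((start, end), a, b, "same formula" if a == b else "different formulas")
```

For (7)–(8) both end points resolve to dqn7-8, so a single link is unambiguous. For (5)–(10) the end points resolve to dqn2-6 and dqn10, which is why the two variants above differ.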

zero_cool · Jul 30, 2016 · #5 · 2016-07-30T01:19+00:00

Let's say the file contains the formulas: