Perl regular expression to list procedure names in an Oracle package

spaceBAR · Jun 22, 2010, 07:46 UTC · #1

I created these Perl regular expressions and use them in the "Function List" as a group parsing rule for listing the function or procedure names in an Oracle package:

Code:

(f)unction( )+([a-zA-Z0-9,_]+$)
(p)rocedure( )+([a-zA-Z0-9,_]+$)

They correctly list the names in the format I want, for example:
p procedure_name
f function_name

I would like to refine them to:
1. Skip lines where a match is found but the line is a comment, meaning "--" appears anywhere before the match on the same line.
2. Skip lines where a match is found inside a 'comment block', meaning the match is between a "/*" and a "*/", where the "/*" could be on a line before the match and the "*/" could be on a line after it.

I believe #1 is possible without making it too slow, but #2 might make it too slow to be worth it...

I'm thinking #1 can be accomplished with a negative lookbehind, but I haven't mastered the 'look-around' YET!!!

Any assistance/direction will be appreciated.

tia

bulgrien · Jun 23, 2010, 03:54 UTC · #2

While it may be possible to code a regular expression to do what you are asking... my immediate thought is that you should email IDM directly and request an enhancement that provides a configurable option to ignore text identified as comments in the file format/syntax highlighting rules when identifying and listing procedure/function names. It sounds like a good idea that could benefit many other UE users.

Mofi · Jun 23, 2010, 06:11 UTC · #3

Excluding function strings inside a block comment would definitely be very hard. Do you really have so many functions or procedures in comments that keeping them out of the function list is worth the slowdown in function string scanning caused by very complex regular expressions?

To exclude function or procedure definition lines that are commented out with a line comment, I suggest the following expression:

^[^\-\r\n]*?\<([fp])(?:unction|rocedure)\> +([0-9a-z_]+)$

From my point of view, it is best practice to use as few regular expressions as possible, and to keep them as simple as possible, so that function string scanning is as fast as possible. The goal is to find, ideally, a single expression which finds the strings of interest in the files I really have, not everything that is possible in theory.

^[^\-\r\n]*? means: start the search at the beginning of a line and match zero or more characters, excluding hyphens, carriage returns and line feeds. That excludes functions and procedures inside a line comment. Of course, functions or procedures with a single hyphen to the left of the keyword are also excluded, which is not strictly correct. However, if you don't have function or procedure definition lines with a hyphen to the left of the keyword, this does not matter and this simple expression can be used nevertheless.
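
As a rough way to check this line-comment behaviour outside UltraEdit, here is a small Python sketch. It is only an approximation: Python's re module has no \< and \>, so \b is used instead, and re.IGNORECASE plus re.MULTILINE stand in for the case-insensitive, line-based scan of the function list. The helper name list_names and the sample lines are made up for illustration.

```python
import re

# Assumption: Python's re stands in for UltraEdit's Perl regex engine,
# so the word-boundary markers \< and \> are written as \b here.
PATTERN = re.compile(
    r"^[^\-\r\n]*?\b([fp])(?:unction|rocedure)\b +([0-9a-z_]+)$",
    re.IGNORECASE | re.MULTILINE,
)

def list_names(source):
    """Return 'f name' / 'p name' entries, joining both groups with a space."""
    return [f"{m.group(1)} {m.group(2)}" for m in PATTERN.finditer(source)]

# Hypothetical sample lines, not taken from a real package.
sample = "\n".join([
    "procedure do_work",          # listed
    "  function get_total",       # listed, leading spaces are allowed
    "-- procedure old_work",      # skipped: '--' precedes the keyword
    "x := a - b; procedure odd",  # skipped too: a single hyphen is enough
])

print(list_names(sample))  # ['p do_work', 'f get_total']
```

The last sample line shows the caveat described above: any hyphen to the left of the keyword suppresses the match, not only a "--" comment marker.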

\<([fp])(?:unction|rocedure)\> then finds the keyword function or procedure. The non-capturing OR expression avoids a second regular expression string, which reduces scan time. Using a single, slightly more complex expression is generally faster than using 2 expressions. Sure, the expression as is would also accept the words punction and frocedure, but does this matter? Do those words exist in your files? Probably not, so this expression is not perfect, but good enough for the task.
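
To see which words the combined keyword part accepts, a minimal Python sketch can help. It assumes \b as a substitute for \< and \> (Python's re module does not support them) and re.IGNORECASE for the case-insensitive scan. The function name accepted is hypothetical.

```python
import re

# Assumption: \b replaces the \< and \> word boundaries of the expression,
# and re.IGNORECASE mimics the case-insensitive function string scan.
KEYWORD = re.compile(r"\b([fp])(?:unction|rocedure)\b", re.IGNORECASE)

def accepted(word):
    """Return the captured first letter if the whole word is accepted, else None."""
    m = KEYWORD.fullmatch(word)
    return m.group(1) if m else None

for word in ["function", "PROCEDURE", "punction", "frocedure", "functions"]:
    print(word, "->", accepted(word))
```

The two real keywords and the two nonsense words punction and frocedure are accepted, while functions is rejected by the trailing word boundary.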

I'm currently using UltraEdit v16.10.0.1027, which concatenates the tagged expressions with a space character, so it is not necessary to tag a space character.

The scan for function strings is always done with a case-insensitive search. Therefore [a-zA-Z] is not needed, because [a-z] matches exactly the same characters and makes the expression a little shorter and therefore again a little faster.
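
The equivalence of the two character classes under a case-insensitive search can be checked with a short Python sketch, assuming re.IGNORECASE behaves like the scan described above. The sample names are invented, apart from REFRESH_BXR_OUTPUT.

```python
import re

# Assumption: re.IGNORECASE stands in for the case-insensitive scan.
SHORT = re.compile(r"[0-9a-z_]+", re.IGNORECASE)
LONG = re.compile(r"[a-zA-Z0-9_]+", re.IGNORECASE)

names = ["REFRESH_BXR_OUTPUT", "get_total", "Mixed_Case9", "bad-name"]

# For each name, record whether the short and the long class accept it.
results = {n: (bool(SHORT.fullmatch(n)), bool(LONG.fullmatch(n))) for n in names}
for name, (short_ok, long_ok) in results.items():
    print(name, short_ok, long_ok)
```

Both classes agree on every name, including the rejected bad-name, which contains a hyphen.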

^ for start of line and $ for end of line should be outside capturing brackets whenever possible. They are just anchors and do not match a character. It is not wrong to include them inside a capturing bracket, but because they don't "select" a character, they should be outside a capturing bracket.

If that expression does not work in your Oracle files, please post the lines where it misses a function or procedure, or where it finds something which should not be in the function list. Put the lines inside a code block when they contain multiple spaces or spaces at the start of the line.

Once again, the goal is to find as few and as simple expressions as needed to find the strings of interest in your files, not expressions which work for every function or procedure definition line that could exist in Oracle files in theory. Keep it simple, because simple solutions are generally faster than complex ones.

spaceBAR · Jun 23, 2010, 07:07 UTC · #4

Thanks for the input!!!

I finally figured out the Perl regular expressions to use as a group parsing rule in the "Function List" to list the function or procedure names in an Oracle package. They ignore single-line comments (i.e. lines that have "--" before the word "function" or "procedure"), and they also ignore matches in a comment block as long as the word "function" or "procedure" is not the first word on a line, and that doesn't happen much at all!!!!

Code:

^ *(f)unction(\s*[a-zA-Z0-9,_ ]+)
^ *(p)rocedure(\s*[a-zA-Z0-9,_]+)

I'm not sure why but this expression doesn't find a match:

Code:

^[^\-\r\n]*?\<([fp])(?:unction|rocedure)\>  +([0-9a-z_]+)$

For example, try it on this sample:
PROCEDURE REFRESH_BXR_OUTPUT(out_result OUT NUMBER)
IS
/******************************************************************
Author: Siddha Anra (29-Jun-2009)
procedure truncates and refreshes the data in BXR_OUTPUT table.
*****************************************************************/

Thanks for your assistance!

Mofi · Jun 23, 2010, 10:55 UTC · #5

spaceBAR wrote: I'm not sure why but this expression doesn't find a match:
Code:
^[^\-\r\n]*?\<([fp])(?:unction|rocedure)\>  +([0-9a-z_]+)$

There are 2 spaces after \> and before +, so the expression demands 2 or more spaces after the keyword function or procedure.
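
The effect of the extra space can be reproduced with a small Python sketch. It assumes \b in place of \< and \> (not available in Python's re module) and uses re.IGNORECASE and re.MULTILINE to approximate the function list scan. The first two test lines are shortened variants of the posted sample line.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE

# Assumption: \b replaces \< and \>. The two patterns differ only in the
# number of spaces in front of the + quantifier.
ONE_SPACE = re.compile(
    r"^[^\-\r\n]*?\b([fp])(?:unction|rocedure)\b +([0-9a-z_]+)$", FLAGS)
TWO_SPACES = re.compile(
    r"^[^\-\r\n]*?\b([fp])(?:unction|rocedure)\b  +([0-9a-z_]+)$", FLAGS)

lines = {
    "single space": "PROCEDURE REFRESH_BXR_OUTPUT",
    "double space": "PROCEDURE  REFRESH_BXR_OUTPUT",
    "with parameters": "PROCEDURE REFRESH_BXR_OUTPUT(out_result OUT NUMBER)",
}

# For each line: (matched by one-space pattern, matched by two-space pattern).
found = {
    label: (bool(ONE_SPACE.search(text)), bool(TWO_SPACES.search(text)))
    for label, text in lines.items()
}
for label, result in found.items():
    print(label, result)
```

The two-space variant matches only the line with two spaces after the keyword. The third line is matched by neither pattern in this emulation, because the trailing $ requires the name to be the last thing on the line and the sample line continues with a parameter list.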

And be careful with \s, because it matches not only spaces and tabs but also carriage returns, line feeds, form feeds and vertical tabs. So \s does NOT equal [ \t]; it equals [ \t\r\n\f\v]. If only spaces and tabs can exist to the left of the keyword function or procedure, you can also use this simpler expression:

Code:

^[ \t]*([fp])(?:unction|rocedure)\> +([0-9a-z_]+)
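
For anyone who wants to try this last expression outside UltraEdit, here is a hedged Python sketch. It assumes \b as a replacement for \> (unsupported in Python's re module), and re.IGNORECASE plus re.MULTILINE as stand-ins for the function list scan. The package text is hypothetical and only loosely modelled on the sample posted above. The second part compares it with a \s-based rule on a keyword that ends a line.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE

# Assumption: \b stands in for \>, which Python's re module does not support.
RULE = re.compile(r"^[ \t]*([fp])(?:unction|rocedure)\b +([0-9a-z_]+)", FLAGS)

# Hypothetical package text, loosely modelled on the sample in this thread.
source = "\n".join([
    "PROCEDURE REFRESH_BXR_OUTPUT(out_result OUT NUMBER)",
    "IS",
    "/* This procedure truncates the table. */",
    "-- FUNCTION old_total RETURN NUMBER",
    "  FUNCTION get_total RETURN NUMBER",
    "procedure truncates and refreshes the data",
])

# Join the two tagged groups with a space, as the function list does.
entries = [f"{m.group(1)} {m.group(2)}" for m in RULE.finditer(source)]
print(entries)  # ['P REFRESH_BXR_OUTPUT', 'F get_total', 'p truncates']

# \s also matches line breaks, so a \s-based rule can capture across lines,
# while the space-only rule finds nothing here.
broken = "procedure\nIS"
with_s = re.search(r"^ *(p)rocedure(\s*[a-zA-Z0-9,_]+)", broken, FLAGS)
print(repr(with_s.group(2)), RULE.search(broken))  # '\nIS' None
```

The lines starting with "/*" or "--" are skipped because only spaces and tabs are allowed to the left of the keyword. The last source line is still listed, which is the remaining limitation already mentioned in this thread: a comment block line whose first word is function or procedure cannot be told apart from a real definition by this expression.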