Extracting SHA1 hashes matched by a regular expression from file

fernandom · Feb 05, 2018 19:26 UTC · #1

Hi,

I wanted to extract some SHA1 hashes from random text files. So, let's say I have a file like this:

a c0c13ddb9e5ec04ab1561bfaebe4920a6f1bdda9
asklskalk
 as dsads fcaa0fed49330b5d07E1c7a300a545cf41ec8df0 8dkds8z
 sadadasda

asad 1b866886c46430c871a91ead6e607fad9e291de4 fcaa0fed49330b5d07E1c7a300a545cf41ec8df0

lll

As SHA1 hashes are always 40 characters long and use only hexadecimal digits, I can extract them with a simple grep command:

```
$ grep -Eo '[[:xdigit:]]{40}' teste.txt
c0c13ddb9e5ec04ab1561bfaebe4920a6f1bdda9
fcaa0fed49330b5d07E1c7a300a545cf41ec8df0
1b866886c46430c871a91ead6e607fad9e291de4
fcaa0fed49330b5d07E1c7a300a545cf41ec8df0
```

I've been trying to simulate this grep command (in particular the -o option, which makes grep print only the matching part of the line) with UltraEdit scripting. It took me a while because I found no easy way to know when the find command has wrapped back to the first occurrence (UE's find is circular). So I don't know when to stop the loop: UltraEdit.activeDocument.isFound() always returns true unless I delete or change the occurrences found, which I don't want to do.

The only way I found is to save the position for the first match and then keep comparing the other ones with it. It's not elegant though. Current working code is like this:

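```
var doc = UltraEdit.activeDocument;

// Start at the top of the file and search downwards with a regular expression.
doc.top();
doc.findReplace.matchWord=true;
doc.findReplace.searchDown=true;
doc.findReplace.regExp=true;

UltraEdit.clearClipboard();

// i counts the matches; first holds the position of the first match.
var i, first;

for (i=0, first=0; ; i++) {
   doc.findReplace.find("[[:xdigit:]]{40}");

   // Stop when nothing is found or when the find has wrapped around to the first match.
   if (! doc.isFound() || first == doc.currentPos)
      break;

   if (i == 0)
      first = doc.currentPos;
   else
      UltraEdit.clipboardContent += "\n";   // Separate the matches with line breaks.

   // Append the selected match to the clipboard.
   doc.copyAppend();
}

UltraEdit.outputWindow.write(i + " occurrences found.");

// Paste the collected matches into a new file.
if (i > 0) {
   UltraEdit.newFile();
   doc.paste();
}
```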

So, is there any easier way to accomplish this? I'm using the latest version of UE on Mac.
Thanks in advance,
Fernando

Mofi · Feb 06, 2018 06:30 UTC · #2

I use only UltraEdit for Windows, so I can only hope that what I write below also holds for UltraEdit for Mac.

The setting Continue find at end of file in Configuration/Preferences at Search - Miscellaneous can be unchecked to stop a downward find from continuing at the top of the file after reaching the end, and an upward find from continuing at the end of the file after reaching the top.
When a find or replace runs from within a macro or script, UltraEdit (for Windows) never continues at the top or end of the file. This avoids an endless find/replace loop, provided the script writer evaluates the return value of the find/replace functions correctly.
The find/replace scripting functions themselves return true or false, so a find/replace function can be used directly in an IF condition in a script. The isFound() function exists mainly for historical reasons: the macro commands were initially implemented 1:1 as scripting commands. I never use isFound() and isNotFound() in scripts and instead evaluate the return value of findReplace.find() and findReplace.replace(). So I don't know whether isFound() and isNotFound() work as they should and return true/false depending on the last find/replace result.
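
A minimal sketch of a loop driven by the return value of find(), under the assumption that a scripted find does not wrap around (as described above for Windows; not verified on Mac). The helper name collectMatches is made up for illustration, and the guard at the end is only there so the snippet does nothing when run outside of UltraEdit:

```javascript
// Hypothetical helper: collects every match of a Perl regular expression
// in the given UltraEdit document, from the top of the file downwards.
function collectMatches(doc, pattern) {
   var matches = [];

   doc.top();
   doc.findReplace.mode = 0;           // Search in the current file.
   doc.findReplace.searchDown = true;
   doc.findReplace.regExp = true;

   // find() returns true as long as another match exists. Assumption: it
   // returns false at the end of the file instead of wrapping around.
   while (doc.findReplace.find(pattern)) {
      matches.push(doc.selection);     // The found string is selected.
   }
   return matches;
}

// Run only inside UltraEdit, where the UltraEdit object exists.
if (typeof UltraEdit !== "undefined") {
   UltraEdit.perlReOn();               // Use the Perl regular expression engine.
   var found = collectMatches(UltraEdit.activeDocument, "\\b[[:xdigit:]]{40}\\b");
   UltraEdit.outputWindow.write(found.length + " occurrences found.");
   if (found.length > 0) {
      UltraEdit.newFile();
      UltraEdit.activeDocument.write(found.join("\n"));
   }
}
```

Collecting the strings in an array instead of the clipboard means neither the clipboard content nor the position of the first match has to be tracked.
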
I have already written grep replacement scripts, see Find strings with a regular expression and output them to new file. Running this script on the posted example with the search string \b[0-9A-Fa-f]{40}\b creates a new file with the lines:
```
c0c13ddb9e5ec04ab1561bfaebe4920a6f1bdda9
fcaa0fed49330b5d07E1c7a300a545cf41ec8df0
1b866886c46430c871a91ead6e607fad9e291de4
fcaa0fed49330b5d07E1c7a300a545cf41ec8df0
```

Note: The search string [[:xdigit:]]{40} works with the Perl regular expression engine of UltraEdit (from the Boost library), although \b[[:xdigit:]]{40}\b is better because it matches only strings of exactly 40 hexadecimal characters. The regular expression support of the JavaScript core used by my scripts does not support [:xdigit:], which is why [0-9A-Fa-f] or [\dA-Fa-f] must be used with, for example, FindStringsToNewFile.js.
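
The effect of the word boundaries can be checked with plain JavaScript, independent of UltraEdit. This is just a sketch on the example text from the first post; the function name extractSha1 and the 41-character test string are made up for illustration:

```javascript
// Example text from the first post.
var text = [
   "a c0c13ddb9e5ec04ab1561bfaebe4920a6f1bdda9",
   "asklskalk",
   " as dsads fcaa0fed49330b5d07E1c7a300a545cf41ec8df0 8dkds8z",
   " sadadasda",
   "",
   "asad 1b866886c46430c871a91ead6e607fad9e291de4 fcaa0fed49330b5d07E1c7a300a545cf41ec8df0",
   "",
   "lll"
].join("\n");

// Hypothetical helper: returns all strings of exactly 40 hexadecimal
// characters. JavaScript has no [:xdigit:], so the class is written out.
function extractSha1(input) {
   // match() with the g flag returns null when nothing matches.
   return input.match(/\b[0-9A-Fa-f]{40}\b/g) || [];
}

var hashes = extractSha1(text);   // The same four strings grep -Eo prints.

// A string of 41 hexadecimal characters is not a SHA1 hash.
var tooLong = "0" + hashes[0];
var loose  = tooLong.match(/[0-9A-Fa-f]{40}/g) || [];   // Matches the first 40 characters.
var strict = extractSha1(tooLong);                      // No match because of \b.
```

Without \b, the pattern also accepts the first 40 characters of a longer run of hexadecimal characters, which is usually not wanted.
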

fernandom · Feb 07, 2018 19:18 UTC · #3

Thanks for your answer! Your script works much better :)