User to user discussion and support for UltraEdit, UEStudio, UltraCompare, and other IDM applications.

Help with writing and playing macros
11 posts Page 1 of 1
I have a problem that I hope someone can help me with :)
I have a large file, and in this file I have some duplicate (and more) values for the same "primary key". The "key" is always in the same position in the lines.
Is it possible to make a macro that deletes or removes the lines that have duplicate values, and if so, how?
My file has more than 10.0000 lines and I would hate to do this manually :evil:
I also have to keep the file as it is, so I can't import it into Excel.

I have UE32 ver 11.00b.
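
To make the request concrete, here is a minimal JavaScript sketch of one way to read it: keep the first line for each key and drop later lines with the same key. The function name and the parameters keyStart and keyLength are hypothetical, and the sample data is invented. Note that the macros in this thread compare whole lines instead of a key column.

```javascript
// Keep only the first line for each key. The key is the text found at a
// fixed column range in every line. keyStart (0-based column) and keyLength
// are hypothetical parameters chosen for this illustration.
function removeLinesWithDuplicateKey(lines, keyStart, keyLength)
{
   var seen = {};
   var kept = [];
   for (var i = 0; i < lines.length; i++)
   {
      // The "k:" prefix avoids clashes with built-in property names.
      var key = "k:" + lines[i].substring(keyStart, keyStart + keyLength);
      if (!Object.prototype.hasOwnProperty.call(seen, key))
      {
         seen[key] = true;
         kept.push(lines[i]);
      }
   }
   return kept;
}

var sample = ["0001 alpha", "0002 beta", "0001 gamma"];
var result = removeLinesWithDuplicateKey(sample, 0, 4);
// result is ["0001 alpha", "0002 beta"]
```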
The following macro should do the job, but only if no line exists which contains regular expression characters of UltraEdit style like +[]^%$ ... See help of UltraEdit about regular expressions in UltraEdit style. Unix style cannot be used here, because ^c is not available in a Unix style search.

InsertMode
ColumnModeOff
HexOff
UnixReOff
Bottom
IfColNum 1
Else
"
"
EndIf
Top
Clipboard 9
Loop
IfEof
ExitLoop
EndIf
Key END
IfColNumGt 1
StartSelect
Key HOME
Cut
EndSelect
Find MatchCase RegExp "%^c^p"
Replace All ""
Paste
EndIf
Key DOWN ARROW
EndLoop
ClearClipboard
Clipboard 0
Top
UnixReOn

Remove the last command (UnixReOn) if you use regular expressions in UltraEdit style by default instead of Unix style.
For UltraEdit v11.10c and lower see Advanced - Configuration - Find - Unix style Regular Expressions.
For UltraEdit v11.20 and higher see Advanced - Configuration - Searching - Unix style Regular Expressions.
The macro commands UnixReOn/UnixReOff modify this setting.

I have an idea how to do it without a regular expression search, but it is much more tricky and I now have no time to develop this macro set (it cannot be done with a single macro).
Best regards from Austria
Thank you :D I don't see how the macro works, but it actually does :D This will do the job.

Thanx from Norway
Hi Norwegian guy, Tag Mofi,

I added some functionality to Mofi's good macro:
Make a new tab and list those double(+) lines there, cause I wanna know which lines were in the file 2+ times.
Then I sort them (you can kick the sort line if you want)

Check it out. :D

rds Bego

Edited by Mofi: Macro source code removed - see below for the improved version.
Normally using all newest english version incl. each hotfix. Win 10 64 bit
This macro gets better and better. I'm glad there are some helpful people out there :)

Thank you
Thanks, Bego, for the idea to collect the duplicate lines as additional info, and for the information that IfFound and IfNotFound can also be used after a replace. That was new to me although I have written dozens of UltraEdit macros. Even an experienced user like me can learn from others. Thanks again.

I have modified the macro again. Now it also works for files with lines with UltraEdit style regular expression characters and it does not need a second macro as I first thought would be necessary. It now could be also converted to a macro with Unix style regular expressions instead of UltraEdit style. Only 5 simple regular expressions must be changed for Unix style.

The removing duplicate line replace command is now case-sensitive. Remove MatchCase parameter if it should ignore case.

The collection of the duplicate lines is done now with clipboard 8, which improves execution speed a lot. The duplicate lines are sorted. If someone wants this macro without collecting the duplicate line info, remove the red colored lines.
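
The logic described above can be sketched outside UltraEdit in plain JavaScript. This is only an illustration under assumptions, not the macro itself: the function name and the matchCase parameter are hypothetical, arrays stand in for the file and for clipboard 8, and the sort uses JavaScript's default string order, which may differ from SortAsc. Blank lines are left untouched, as the macro below also ignores them.

```javascript
// Keep the first occurrence of every line, drop later identical lines and
// collect (sorted) the lines which had at least one duplicate.
function removeDuplicateLines(lines, matchCase)
{
   var firstSeen = {};   // comparison key -> first line with this key
   var reported = {};    // comparison key -> already in duplicates list
   var kept = [];
   var duplicates = [];
   for (var i = 0; i < lines.length; i++)
   {
      var line = lines[i];
      if (line === "")   // blank lines are never treated as duplicates
      {
         kept.push(line);
         continue;
      }
      var key = "k:" + (matchCase ? line : line.toLowerCase());
      if (!Object.prototype.hasOwnProperty.call(firstSeen, key))
      {
         firstSeen[key] = line;
         kept.push(line);
      }
      else if (!reported[key])
      {
         reported[key] = true;
         duplicates.push(firstSeen[key]);
      }
   }
   duplicates.sort();
   return { kept: kept, duplicates: duplicates };
}

var outcome = removeDuplicateLines(["b", "a", "b", "", "a", "", "c", "b"], true);
// outcome.kept is ["b", "a", "", "", "c"]
// outcome.duplicates is ["a", "b"]
```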

This macro is now added to my private collection of useful macros - see sticky forum topic Macro examples and reference for beginners and experts which contains a macro file with the macros DelDupInfo+ (macro below) and DelDupInfo- (macro below without the red lines).

The macro property Continue if a Find with Replace not found or Continue if search string not found must be checked for this macro.


InsertMode
ColumnModeOff
HexOff
UnixReOff
Bottom
IfColNum 1
Else
"
"
EndIf
Top
Find MatchCase RegExp "%^([~^p]^)"
Replace All "#MOFI_RULES#^1"
Clipboard 8
ClearClipboard

Clipboard 9
Loop
Find MatchCase RegExp "%#MOFI_RULES#*$"
IfNotFound
ExitLoop
EndIf
Cut
Find MatchCase "^c^p"
Replace All ""
IfFound
Paste
Find MatchCase Up "#MOFI_RULES#"
Key HOME
Clipboard 8
Find MatchCase RegExp "%#MOFI_RULES#*^p"
CopyAppend
EndSelect
Key HOME
Clipboard 9
Else

Paste
Key DOWN ARROW
Key HOME
EndIf
EndLoop
ClearClipboard
Top
Find MatchCase RegExp "%#MOFI_RULES#"
Replace All ""
NewFile
Clipboard 8
Paste
ClearClipboard
Top
Find MatchCase RegExp "%#MOFI_RULES#"
Replace All ""
IfNotFound
"NO DUPLICATES :-)
"
Else
SortAsc 1 -1 0 0 0 0 0 0
EndIf
NextWindow

Clipboard 0


Add UnixReOn or PerlReOn (v12+ of UE) at the end of the macro if you do not use UltraEdit style regular expressions by default - see search configuration. Macro command UnixReOff sets the regular expression option to UltraEdit style.


Edit info: Some comments added - see below!

This macro does not work for Unix files opened in Unix mode. Such a file must be converted to DOS before macro execution, either temporarily (on file load) or permanently, because ^p matches CRLF!
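
Why the line ending matters can be shown with a small JavaScript sketch. It assumes that the search string is the line followed by CRLF, which is what ^p stands for here; the helper names toDosLineEndings and countOccurrences are made up for this illustration.

```javascript
// Convert any line ending (CRLF, CR or LF) to CRLF.
function toDosLineEndings(text)
{
   return text.replace(/\r\n|\r|\n/g, "\r\n");
}

// Count non-overlapping occurrences of needle in text.
function countOccurrences(text, needle)
{
   if (needle === "") return 0;
   var count = 0;
   var pos = text.indexOf(needle);
   while (pos !== -1)
   {
      count++;
      pos = text.indexOf(needle, pos + needle.length);
   }
   return count;
}

var unixText = "abc\nxyz\nabc\n";
// A line plus CRLF is never found in a file with LF only.
var before = countOccurrences(unixText, "abc\r\n");                   // 0
// After conversion to DOS line endings both lines are found.
var after = countOccurrences(toDosLineEndings(unixText), "abc\r\n");  // 2
```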

The macro is designed to remove duplicate lines only if a line matches another line 100%. If two lines look identical on screen but differ in their trailing spaces, the duplicate is neither removed nor reported. If you want to ignore trailing spaces and they can be deleted, use the command TrimTrailingSpaces at the top of the macro after the command Top.
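
The effect of trailing spaces on an exact comparison, as a short JavaScript sketch. The helper below is a stand-in written for this example, not the editor command, and it assumes that spaces and tabs at the end of a line count as trailing whitespace.

```javascript
// Stand-in for trimming: remove spaces and tabs at the end of a line.
function trimTrailingSpaces(line)
{
   return line.replace(/[ \t]+$/, "");
}

// Exact comparison of two lines, optionally ignoring trailing whitespace.
function isDuplicate(a, b, ignoreTrailingSpaces)
{
   if (ignoreTrailingSpaces)
   {
      a = trimTrailingSpaces(a);
      b = trimTrailingSpaces(b);
   }
   return a === b;
}

var exact = isDuplicate("Line example", "Line example ", false);   // false
var trimmed = isDuplicate("Line example", "Line example ", true);  // true
```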

2007-11-01: The macro has been rewritten completely because it damaged the file when there are soft-wrapped lines. The new macro works now also for a file with soft-wrapped lines. Also IfEof has been eliminated to let the macro work on Unicode files too, independent of the version of UltraEdit. IfEof works for Unicode files only since UE v13.20.
Best regards from Austria
Bego: Thanks again for this interesting information (deleted). I will take this into consideration for future macros (see improved version of the macro above).

The "#MOFI_RULES#" string is used as a replacement for the regular expression character % to be able to correctly handle lines like these without a regular expression (lines with different preceding and trailing spaces):

Code: Select all
Line example
 Line example
 Line example
Another Line example

Nothing should be changed when running the macro on those 4 lines. The third line contains a trailing space, the second line does not! Select lines 2 and 3 and you will see the difference.
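
The need for the marker can be reproduced with a plain substring search in JavaScript. This is a sketch under assumptions: the function name is hypothetical, the lines are terminated with CRLF, and the third sample line carries the trailing space mentioned above.

```javascript
// Search for the first line (plus line ending) in the text below it.
// Every line is prefixed with prefix, which is either "" or the marker.
// Returns the offset of the first match or -1 if nothing is found.
function searchFirstLineBelow(lines, prefix)
{
   var eol = "\r\n";
   var rest = "";
   for (var i = 1; i < lines.length; i++)
   {
      rest += prefix + lines[i] + eol;
   }
   return rest.indexOf(prefix + lines[0] + eol);
}

var example = ["Line example", " Line example", " Line example ", "Another Line example"];

// Without marker the search string matches the tail of the second line.
var withoutMarker = searchFirstLineBelow(example, "");            // 1
// With marker at start of every line there is no false match.
var withMarker = searchFirstLineBelow(example, "#MOFI_RULES#");   // -1
```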
Best regards from Austria
Hi sas2000,

thanks for your uedit32.ini. With your configuration I was able to reproduce the problem and find the reason why the macro failed and created a wrong output (= damaged file).

The problem was the soft-wrapping you have enabled. I normally have soft-wrapping of lines not active. I use it normally only when editing HTML files, but have it not active when running macros or scripts.

I did not know how many macro commands depend on wrapping mode on/off. Key HOME, Key END and SelectLine, which I used before in this macro to select a line with or without line ending, are always executed on the currently displayed line, which is not the entire real line if the line is currently soft-wrapped. As a result, the previous macro worked perfectly until it reached the first soft-wrapped line.

Additionally you have option Replace All is From Top of File active as you can see in the Replace dialog which makes the output even worse.

I have completely rewritten the macro to get correct output(s) also when soft-wrapped lines exist.

I have already deleted all of our previous posts. You can delete now the zip archives and files on your website.

As a result of turning my attention to what happens when a macro not designed for active word-wrap mode is run on soft-wrapped lines, I now also have to update the macros DelDupInfo- and DelDupInfo+ in my macro collection and add many notes to my macro reference. But first I have to find out which macro commands work differently depending on word-wrap mode on/off. That will take some time.

sas2000, many thanks!
Best regards from Austria
2Mofi

It works ok now :D.

Having read your macro, I think it doesn't exist, but do you know any macro commands to switch soft-wrapping and Replace All is From Top of File on/off? My knowledge of macros is quite limited, and this way I would avoid problems in my own macros. I've tried:

SoftWrapOff
WrapOff
WrapWordOff
WordWrapOff

but none of them works. Can you help me?

Thanks. :!:
Except for the active regular expression engine, none of the configuration settings can be changed by a macro or script. I have already written twice to IDM support that the replace option Replace All is From Top of File should be disabled internally temporarily while a macro or script is running to make the output predictable. For scripts this is the case since UE v13.20, for macros since UE v13.20a.
Best regards from Austria
The macros do not work on Unicode files containing characters which are not included in the system code page, because ^c works only with ASCII/ANSI strings.

Here is the macro DelDupInfo- converted to an UltraEdit script. The script converts the search string from UTF-16 Little Endian to UTF-8, which the Find and Replace commands of UltraEdit support for a search/replace string. The script can also be used for ASCII/ANSI files.

The function IsUnicode must be additionally included for UE v14.20 to v15.20 when script is executed on Unicode files.

Code: Select all
// Please include here the function IsUnicode for UE v14.20 to v15.20.
// See https://www.ultraedit.com/forums/viewtopic.php?f=52&t=5441

function utf16to8(str)
{
   /* Copyright (C) 1999 Masanao Izumo <iz@onicos.co.jp>
   * Version: 1.0
   * LastModified: Dec 25 1999
   * This library is free.  You can redistribute it and/or modify it.
   * http://www.onicos.com/staff/iz/amuse/javascript/expert/utf.txt */

   var out, i, len, c;

   out = "";
   len = str.length;
   for(i = 0; i < len; i++)
   {
      c = str.charCodeAt(i);
      if ((c >= 0x0001) && (c <= 0x007F))
      {
         out += str.charAt(i);
      }
      else if (c > 0x07FF)
      {
         out += String.fromCharCode(0xE0 | ((c >> 12) & 0x0F));
         out += String.fromCharCode(0x80 | ((c >>  6) & 0x3F));
         out += String.fromCharCode(0x80 | ((c >>  0) & 0x3F));
      }
      else
      {
         out += String.fromCharCode(0xC0 | ((c >>  6) & 0x1F));
         out += String.fromCharCode(0x80 | ((c >>  0) & 0x3F));
      }
   }
   return out;
}

if (UltraEdit.document.length > 0)  // Is any file opened?
{
   // Define environment for this script.
   UltraEdit.insertMode();
   UltraEdit.columnModeOff();
   UltraEdit.ueReOn();

   var sSearch = "";
   var bUnicodeFile = false;
   if (typeof(UltraEdit.activeDocument.encoding) == "number")
   {
      // Is the file encoded in UTF-16 Little Endian or Big Endian or UTF-8?
      if (UltraEdit.activeDocument.encoding == 1200) bUnicodeFile = true;
      else if (UltraEdit.activeDocument.encoding == 1201) bUnicodeFile = true;
      else if (UltraEdit.activeDocument.encoding == 65001) bUnicodeFile = true;
   }
   else
   {
      if(typeof(IsUnicode) == "function")
      {
         bUnicodeFile = IsUnicode();
      }
      else
      {
         if (UltraEdit.outputWindow.visible == false) UltraEdit.outputWindow.showWindow(true);
         UltraEdit.outputWindow.write("Function IsUnicode not included. Please add this function to the script.");
         UltraEdit.outputWindow.write("See https://www.ultraedit.com/forums/viewtopic.php?f=52&t=5441");
      }
   }
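   // Optional debug output: var_dump is not defined in this script,
   // so it is called only if such a function has been included.
   if (typeof(var_dump) == "function") var_dump(bUnicodeFile);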
   // First go to end of the file and check if the last line of the file has a
   // line termination. If not insert it because the script must compare whole
   // lines. After inserting the line termination, verify if the cursor is now
   // really at column 1. With auto indent enabled and the last line in the
   // file has preceding whitespace, UE/UES has inserted those whitespace
   // also on the last line of the file and the cursor is therefore not at
   // column 1.

   UltraEdit.activeDocument.bottom();
   if (UltraEdit.activeDocument.isColNumGt(1))
   {
      UltraEdit.activeDocument.insertLine();
      if (UltraEdit.activeDocument.isColNumGt(1))
      {
         UltraEdit.activeDocument.deleteLine();
      }
   }

   // Insert at start of every line a special marker string. This is needed
   // because the script should find only whole duplicate lines without using
   // a regular expression search. Without this marker string at start of the
   // line a shorter line could completely match also with a longer line which
   // has additional characters at start of the line.

   UltraEdit.activeDocument.top();
   // UltraEdit.activeDocument.trimTrailingSpaces();

   UltraEdit.activeDocument.findReplace.mode=0;
   UltraEdit.activeDocument.findReplace.matchCase=true;
   UltraEdit.activeDocument.findReplace.matchWord=false;
   UltraEdit.activeDocument.findReplace.regExp=true;
   UltraEdit.activeDocument.findReplace.searchDown=true;
   if (typeof(UltraEdit.activeDocument.findReplace.searchInColumn) == "boolean")
   {
      UltraEdit.activeDocument.findReplace.searchInColumn=false;
   }
   UltraEdit.activeDocument.findReplace.preserveCase=false;
   UltraEdit.activeDocument.findReplace.replaceAll=true;
   UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;
   UltraEdit.activeDocument.findReplace.replace("%^([~^p]^)","#MOFI_RULES#^1");

   // User clipboard 9 always contains the current line including the marker
   // string whose duplicates are searched for in the file below the line.
   UltraEdit.selectClipboard(9);

   // A regular expression find is used to select entire next line without
   // the line ending characters. This method is better than the method used
   // in previous versions of the script with Key END and Key HOME because it
   // works also for long lines which are currently wrapped. Further the
   // regular expression search automatically ignores blank lines. If this
   // regular expression find does not find something, the end of the file
   // is reached and therefore the loop must be exited.

   while (UltraEdit.activeDocument.findReplace.find("%#MOFI_RULES#*$"))
   {
      // Cut the entire line without the line ending to clipboard 9.
      // Then replace all duplicates with a case-sensitive search and
      // replace all. Remove the find option MatchCase to ignore the case.
      UltraEdit.activeDocument.cut();
      UltraEdit.activeDocument.findReplace.regExp=false;
      if (bUnicodeFile)
      {
         // Convert the UTF-16 string in clipboard to UTF-8.
         sSearch = utf16to8(UltraEdit.clipboardContent) + "^p";
      }
      else
      {  // Get the ASCII/ANSI string directly from clipboard.
         sSearch = UltraEdit.clipboardContent + "^p";
      }
      UltraEdit.activeDocument.findReplace.replace(sSearch,"");
      UltraEdit.activeDocument.findReplace.regExp=true;
      // Use below command Delete instead of Paste and Key DOWN ARROW if you also
      // want the line itself which has duplicates deleted from the source file.
      UltraEdit.activeDocument.paste();
      UltraEdit.activeDocument.key("DOWN ARROW");
      UltraEdit.activeDocument.key("HOME");
   }

   // Clipboard 9 is not needed anymore and so it can be cleared
   // to free the RAM used for the current content of clipboard 9.
   UltraEdit.clearClipboard();
   // Back at top of the file remove all marker strings inserted
   // at start of the script to mark start of a line.
   UltraEdit.activeDocument.top();
   UltraEdit.activeDocument.findReplace.replace("%#MOFI_RULES#","");
   // Last switch back to the Windows clipboard.
   UltraEdit.selectClipboard(0);
}
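
To see what the conversion in utf16to8 produces, here is a compact restatement of the same bit operations that returns the resulting byte values as a hexadecimal string instead of a string of characters. The function name utf8HexOfCodeUnits is made up for this illustration. Like utf16to8 above, it works on each UTF-16 code unit separately, so characters stored as surrogate pairs are not combined.

```javascript
// Same bit operations as in utf16to8, but the resulting byte values
// are returned as hexadecimal text to make them visible.
function utf8HexOfCodeUnits(str)
{
   var hex = [];
   function push(byteValue)
   {
      hex.push((byteValue < 16 ? "0" : "") + byteValue.toString(16).toUpperCase());
   }
   for (var i = 0; i < str.length; i++)
   {
      var c = str.charCodeAt(i);
      if ((c >= 0x0001) && (c <= 0x007F))
      {
         push(c);                          // 1 byte: 0xxxxxxx
      }
      else if (c > 0x07FF)
      {
         push(0xE0 | ((c >> 12) & 0x0F));  // 3 bytes: 1110xxxx
         push(0x80 | ((c >> 6) & 0x3F));   //          10xxxxxx
         push(0x80 | (c & 0x3F));          //          10xxxxxx
      }
      else
      {
         push(0xC0 | ((c >> 6) & 0x1F));   // 2 bytes: 110xxxxx
         push(0x80 | (c & 0x3F));          //          10xxxxxx
      }
   }
   return hex.join(" ");
}

var hexA = utf8HexOfCodeUnits("A");          // "41"
var hexE = utf8HexOfCodeUnits("\u00E9");     // "C3 A9" (e with acute accent)
var hexEuro = utf8HexOfCodeUnits("\u20AC");  // "E2 82 AC" (euro sign)
```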
Best regards from Austria