How to check for duplicate characters in a single line of text?


Basic User

    Apr 30, 2018 #1

    Hi.

    I have a single line of 1200+ characters.
    Is it possible for UltraEdit to detect duplicate characters within this single line?

    Here's just a sample of the characters. The full line also contains punctuation, symbols, etc.:
    ÔôÖöØøÚúÙùÜüŽžἀἁἂἃἄἅἆἇἈἉἊἋἌἍἎἏἐἑἒἓἔἕἘ

    To simplify: if the line were simply "abcda", it would detect that there are two "a" characters and highlight each duplicate it finds so I could delete it, then I would press Next in a dialog to move on to the next duplicate, and so on.

    Thanks.

    Grand Master

      Apr 30, 2018 #2

      Yes, this check is possible with UltraEdit.

      However, this requires coding an UltraEdit script that loops over the line and checks each character against each following character in the same line, similar to the script posted at Results of findDupLines.js on the Macros & Scripts downloads page, but for duplicate characters in the active line instead of duplicate lines in the entire file.
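
      A minimal sketch of that nested-loop check, written as plain JavaScript rather than as an UltraEdit script so it can run anywhere (the function name findDuplicateChars is hypothetical). It compares each character with every following character and records where each duplicate is, which is the information needed to highlight the duplicates one by one. Array.from() splits the string into Unicode code points, so the reported positions are code point indices; for characters above U+FFFF they would differ from JavaScript string offsets.

```javascript
// Nested-loop duplicate check: compare each character with every following
// character of the same line and record the position of each duplicate.
function findDuplicateChars(line) {
  const chars = Array.from(line);   // split into Unicode code points
  const isDuplicate = new Array(chars.length).fill(false);
  const duplicates = [];            // { char, firstIndex, dupIndex } entries
  for (let i = 0; i < chars.length; i++) {
    if (isDuplicate[i]) continue;   // already reported against an earlier occurrence
    for (let j = i + 1; j < chars.length; j++) {
      if (chars[j] === chars[i]) {  // case-sensitive comparison
        duplicates.push({ char: chars[i], firstIndex: i, dupIndex: j });
        isDuplicate[j] = true;
      }
    }
  }
  return duplicates;
}

// Usage with the example from the question:
console.log(findDuplicateChars("abcda"));
// [ { char: 'a', firstIndex: 0, dupIndex: 4 } ]
```

      For a line of about 1200 characters, the quadratic number of comparisons (roughly 720,000) is still negligible.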

      It is also possible to remove all duplicate characters from the active line with a quite simple UltraEdit macro, provided it does not matter that the characters in the line also get sorted alphabetically.
      Best regards from an UC/UE/UES for Windows user from Austria

      Basic User

        Apr 30, 2018 #3

        Mofi wrote: It is necessary to code an UltraEdit script for this task to check with a loop each character on a line against each following character in same line.
        That's beyond my capabilities.
        Mofi wrote: It is also possible with a quite simple UltraEdit macro to achieve no duplicate character in active line if it does not matter if the characters are sorted alphabetically in line too.
        That sounds ideal but I don't know where to begin.

        Thanks anyway.

        Grand Master

          May 01, 2018 #4

          The macro solution was quick to create: I recorded the macro and then edited it to take some possible variations into account.


          InsertMode
          ColumnModeOff
          HexOff
          SelectLine
          IfSel
          Clipboard 9
          Copy
          NewFile
          UTF8ToASCII
          ASCIIToUnicode
          UnixMacToDos 
          Paste
          IfColNum 1
          Key BACKSPACE
          EndIf
          Top
          UltraEditReOn
          Find MatchCase RegExp "^(?^)"
          Replace All "^1^p"
          SortAsc RemoveDup RemDupByAllKeys RemKey1 RemKey2 RemKey3 RemKey4 1 -1 0 0 0 0 0 0
          Find MatchCase "^p"
          Replace All ""
          Bottom
          InsertLine
          SelectAll
          Copy
          CloseFile NoSave
          Paste
          ClearClipboard
          Clipboard 0
          EndIf
          
          This macro selects the active line and copies it via user clipboard 9 into a new file, which is converted to UTF-16 encoding with DOS line endings regardless of which encoding and line ending type are configured for new files.

          If the selected line ends with a line termination, which is the usual case unless it is the last line of the active file and has no line termination, the caret is at column 1 after pasting (checked with IfColNum 1), and the command Key BACKSPACE deletes this line termination in the new file.

          Next, a very simple Replace All with an UltraEdit tagged regular expression is executed from the top of the file to insert a DOS line ending after each character. After this replace, the new file contains multiple lines, each with just one character, instead of a single line.
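
          For readers more familiar with other regex engines: in UltraEdit syntax, ? matches any single character except a line termination, ^( and ^) enclose a tagged group, ^1 in the replace string is the tagged text, and ^p is a DOS line ending. A rough JavaScript counterpart of this one replace, offered only as an illustration (splitToCharLines is a hypothetical name, and this is not what UltraEdit runs internally):

```javascript
// Illustration only: JavaScript counterpart of the UltraEdit replace
// Find "^(?^)"  ->  Replace "^1^p"  (append a DOS line ending after every character).
// "." matches any character except line terminators; the "u" flag makes it
// match a whole code point; "$&" inserts the matched text in the replacement.
function splitToCharLines(line) {
  return line.replace(/./gu, "$&\r\n");
}

console.log(JSON.stringify(splitToCharLines("abcda")));
// "a\r\nb\r\nc\r\nd\r\na\r\n"
```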

          Then a case-sensitive sort that removes duplicate lines is executed, which sorts the characters and removes the duplicate characters.

          A simple Replace All that searches for DOS line endings and removes them all converts the one-character lines back into a single line without line termination.

          Next, a line termination is appended at the bottom of the file with the command InsertLine, and the whole content is selected and copied to user clipboard 9.

          Finally, the new file is closed without saving, the updated line is pasted back into the active file, overwriting the still selected line, user clipboard 9 is cleared, and the clipboard of the operating system (clipboard 0) is made the active clipboard again.
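
          The macro steps described above can be traced on a plain string. The following JavaScript function (dedupeLikeMacro is a hypothetical name) is an emulation, not UltraEdit code. It assumes DOS line endings and that JavaScript's default sort, which compares UTF-16 code units, is close enough to UltraEdit's case-sensitive sort; that may not hold for every character.

```javascript
// Emulation of the macro's effect on one selected line (assumes CRLF endings).
function dedupeLikeMacro(selectedLine) {
  // Key BACKSPACE step: drop the line termination copied with the line, if any.
  const text = selectedLine.replace(/(?:\r\n|\r|\n)$/, "");
  // Replace All "^(?^)" -> "^1^p": one character per "line" (code points here).
  const lines = Array.from(text);
  // SortAsc RemoveDup: remove duplicates (case-sensitive), then sort ascending.
  const unique = Array.from(new Set(lines)).sort();
  // Replace All "^p" -> "": join back into one line; InsertLine: append CRLF.
  return unique.join("") + "\r\n";
}

console.log(JSON.stringify(dedupeLikeMacro("cabda\r\n")));
// "abcd\r\n"
```

          Like InsertLine in the macro, the sketch appends a line termination even when the original line had none.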

            May 01, 2018 #5

            Here is an UltraEdit script solution. For a file containing Unicode characters with a code value greater than 255, it requires at least UltraEdit for Windows v24.00 or UEStudio v17.00.


            if (UltraEdit.document.length > 0)  // Is any file opened?
            {
               // Define environment for this script.
               UltraEdit.insertMode();
               if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
               else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();
            
               // Select the active line and continue only if something is really selected.
               UltraEdit.activeDocument.selectLine();
               if (UltraEdit.activeDocument.isSel())
               {
                  // Copy the selected line into a string variable. nCharCount starts with
                  // the length including the line termination characters; the loop below
                  // subtracts them so that the message reports only the characters of the line.
                  var sActiveLine = UltraEdit.activeDocument.selection;
                  var nCharCount = sActiveLine.length;
            
                  // Create an updated line with all duplicate characters removed.
                  var sUpdatedLine = "";
                  for (var nCharIndex = 0; nCharIndex < sActiveLine.length; nCharIndex++)
                  {
                     if (sUpdatedLine.indexOf(sActiveLine[nCharIndex]) < 0) sUpdatedLine += sActiveLine[nCharIndex];
                     if (sActiveLine[nCharIndex] == '\r') nCharCount--;
                     if (sActiveLine[nCharIndex] == '\n') nCharCount--;
                  }
            
                  // Has the length of the line changed on removing duplicate characters?
                  if (sActiveLine.length != sUpdatedLine.length)
                  {
                     // Overwrite the still selected line in the file with the updated line.
                     UltraEdit.activeDocument.write(sUpdatedLine);
                     // Write information about the result into the output window.
                     var nCharDiff = sActiveLine.length - sUpdatedLine.length;
                     UltraEdit.outputWindow.write("Removed " + nCharDiff + " duplicate character" +
                                                  ((nCharDiff  > 1) ? "s" : "") + " from line with " +
                                                  nCharCount + " characters.");
                  }
                  else  // The line does not contain any duplicate character.
                  {
                     // Cancel the selection and write information about the result into the output window.
                     UltraEdit.activeDocument.cancelSelect();
                     UltraEdit.outputWindow.write("Found no duplicate character in line with " +
                                                  nCharCount + " character" + ((nCharCount != 1) ? "s." : "."));
                  }
               }
            }
            
            The script does not sort the characters by their code values; it just removes duplicate characters, keeping the first occurrence of each one.
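
            To try the duplicate removal logic outside UltraEdit, here is a sketch of the same idea as a standalone JavaScript function (removeDuplicateChars is a hypothetical name). Like the script, it keeps the first occurrence of each character in the original order. One deliberate difference: the script reads sActiveLine[nCharIndex], which in JavaScript returns UTF-16 code units, so a character above U+FFFF (stored as a surrogate pair) could be split; iterating with for...of walks whole code points instead. The sketch uses Set and for...of (ES2015), which may not be available in the script engine of every UltraEdit version.

```javascript
// Remove duplicate characters, keeping the first occurrence of each
// character in its original position (no sorting).
function removeDuplicateChars(line) {
  const seen = new Set();
  let result = "";
  for (const ch of line) {          // iterates over code points, not code units
    if (!seen.has(ch)) {
      seen.add(ch);
      result += ch;
    }
  }
  return result;
}

console.log(JSON.stringify(removeDuplicateChars("abcda\r\n")));
// "abcd\r\n"
```

            The line termination survives because CR and LF each occur only once at the end of the line.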

            The output window, which the script does not open explicitly, contains information about the processed line.