Replace UTF-8 character code sequences by ASCII characters with script

Newbie

    Sep 23, 2010 #1

    Hi all,

    I'm a bit new to JavaScript, but after spending a lot of time on this, I have given up.

    I have a file with about 100,000 lines containing pairs of characters that have to be replaced.
    For example, Ã¢ has to become a, and Ã© has to become e. There are a few of these combinations. All of them start with Ã except one, which starts with Ä.

    For some reason I'm not able to find and replace two-character sequences such as Ã©. For example, this code doesn't work:


    var abc = "Ã©";
    var def = "e";
    UltraEdit.activeDocument.findReplace.replace(abc, def);

    I also tried to write code that finds Ã, selects the character to its right, and then checks which combination it is. Maybe it's because I'm not that good with JavaScript, but that didn't work out either.
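Outside UltraEdit, the idea of finding Ã and deciding from the next character which pair it is can be sketched in plain JavaScript as a lookup table plus one regular expression that matches any key of the table. This is a minimal sketch that assumes the whole file is available as one string (for example in Node.js). MOJIBAKE_TO_ASCII, escapeRegExp and replaceMojibake are hypothetical names, and only the two pairs named above are filled in, because the pair starting with Ä is not specified.

```javascript
// Hypothetical lookup table: two-character sequence (UTF-8 bytes displayed
// as ANSI) -> ASCII replacement. Only the pairs named in the question.
const MOJIBAKE_TO_ASCII = {
  "\u00C3\u00A2": "a", // "Ã¢", the UTF-8 bytes of "â"
  "\u00C3\u00A9": "e", // "Ã©", the UTF-8 bytes of "é"
};

// Escape characters that are special in a JavaScript regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One pattern that matches any key of the table, e.g. /Ã¢|Ã©/g.
const PAIR_PATTERN = new RegExp(
  Object.keys(MOJIBAKE_TO_ASCII).map(escapeRegExp).join("|"),
  "g"
);

// Replace every known pair in a single pass; unknown pairs are left as-is.
function replaceMojibake(text) {
  return text.replace(PAIR_PATTERN, (pair) => MOJIBAKE_TO_ASCII[pair]);
}

console.log(replaceMojibake("caf\u00C3\u00A9 cr\u00C3\u00A2ne")); // cafe crane
```

Matching whole pairs from the table, rather than any character after Ã, means a stray Ã that is not part of a listed pair is left untouched.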

    I'm using UltraEdit version 16.00.0.1036 on Windows XP Professional Service Pack 3.

    Thanks a lot in advance!

    Joost

    Grand Master

      Sep 23, 2010 #2

      It is tricky to replace UTF-8 character code sequences with a script, because either the script file itself is interpreted as a UTF-8 encoded file, or the search string is read as one UTF-8 character instead of as two ANSI characters.
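The two readings of the same bytes can be illustrated outside UltraEdit with a short Node.js sketch. Buffer is Node-specific, and Node's "latin1" decoding stands in for the Windows ANSI code page here; it agrees with Windows-1252 for these two bytes, though not for every byte value.

```javascript
// The two bytes UTF-8 uses for "é" (U+00E9).
const bytes = Buffer.from([0xc3, 0xa9]);

// Decoded as UTF-8, the pair is a single character.
const asUtf8 = bytes.toString("utf8");

// Decoded one byte per character, the same pair is two characters: "Ã" and "©".
const asAnsi = bytes.toString("latin1");

console.log(asUtf8, asUtf8.length); // é 1
console.log(asAnsi, asAnsi.length); // Ã© 2
```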

      One solution is to use a regular expression engine and put the engine's escape character before the second character of the sequence, so that the two characters are no longer interpreted as one UTF-8 character. For example, the following script works with the Perl engine, whose escape character is the backslash. The backslash is written twice in the string below because it is also the escape character in JavaScript string literals.

      UltraEdit.activeDocument.findReplace.mode=0;
      UltraEdit.activeDocument.findReplace.matchCase=true;
      UltraEdit.activeDocument.findReplace.matchWord=false;
      UltraEdit.activeDocument.findReplace.regExp=true;
      UltraEdit.activeDocument.findReplace.searchAscii=false;
      UltraEdit.activeDocument.findReplace.searchDown=true;
      UltraEdit.activeDocument.findReplace.searchInColumn=false;
      UltraEdit.activeDocument.findReplace.preserveCase=false;
      UltraEdit.activeDocument.findReplace.replaceAll=false;
      UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;

      var abc = "Ã\\©";
      var def = "e";
      UltraEdit.perlReOn();
      UltraEdit.activeDocument.findReplace.replace(abc,def);
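The effect of the doubled backslash can be checked in plain JavaScript. This sketch uses JavaScript's own RegExp, not UltraEdit's Perl engine, so it only demonstrates the JavaScript string side. \u00C3 and \u00A9 are escapes for Ã and ©, so the string is the same as the literal "Ã\\©" in the script above.

```javascript
// Same string as "Ã\\©": JavaScript turns the doubled backslash into one,
// so the regex engine receives three characters: Ã, a backslash, ©.
const abc = "\u00C3\\\u00A9";
console.log(abc.length); // 3

// JavaScript's RegExp (non-Unicode mode) reads "\©" as a literal ©,
// so the pattern matches the two-character pair Ã© ...
const re = new RegExp(abc);
console.log(re.test("caf\u00C3\u00A9")); // true

// ... but not the single character é.
console.log(re.test("caf\u00E9")); // false
```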


      Alternatively, you can use the UltraEdit regular expression engine, which uses ^ as its escape character:

      var abc = "Ã^©";
      var def = "e";
      UltraEdit.ueReOn();
      UltraEdit.activeDocument.findReplace.replace(abc,def);
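With several pairs, the replacements can be driven from a table. The following is an untested sketch for UltraEdit's scripting engine. replace(), ueReOn() and the findReplace properties come from the scripts above. Setting replaceAll to true (so that every occurrence is replaced, not only the next one) and calling top() to start each pass at the beginning of the file are assumptions about the API that should be checked against the help of your version. PAIRS and escapePair are hypothetical names. The pairs are written with \u escapes on the assumption that keeping the script file pure ASCII sidesteps the script-encoding problem described above.

```javascript
// Hypothetical table of two-character pairs -> ASCII replacements.
// \u escapes keep this script file pure ASCII (an assumption, see above).
var PAIRS = [
  ["\u00C3\u00A2", "a"], // Ã¢
  ["\u00C3\u00A9", "e"]  // Ã©
];

// Put the engine's escape character between the two characters:
// "\\" (one backslash at runtime) for Perl, "^" for the UltraEdit engine.
function escapePair(pair, escapeChar) {
  return pair.charAt(0) + escapeChar + pair.substring(1);
}

// Runs only inside UltraEdit; in plain JavaScript this object does not exist.
if (typeof UltraEdit !== "undefined") {
  var fr = UltraEdit.activeDocument.findReplace;
  UltraEdit.ueReOn(); // UltraEdit engine, escape character ^
  fr.mode = 0;
  fr.matchCase = true;
  fr.matchWord = false;
  fr.regExp = true;
  fr.searchDown = true;
  fr.replaceAll = true; // assumption: replace all occurrences, not just the next
  for (var i = 0; i < PAIRS.length; i++) {
    // Assumption: top() moves the caret to the start of the file.
    UltraEdit.activeDocument.top();
    fr.replace(escapePair(PAIRS[i][0], "^"), PAIRS[i][1]);
  }
}
```

To use the Perl engine instead, call UltraEdit.perlReOn() and pass "\\" as the escape character to escapePair.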

      Best regards from an UC/UE/UES for Windows user from Austria

      Newbie

        Sep 27, 2010 #3

        Thanks a lot for this answer! I'll give it a try in my script.

        Thanks a lot!