Character manipulation - Resetting an individual bit

    Apr 28, 2008#1

    Hi,

    Firstly, this is less a find / replace question than a character manipulation question, so apologies if it's posted in the wrong place.

    I have a number of files produced from a data feed where traditional 7-bit ASCII characters are sent through, but the top bit (the 8th bit) is being used / abused to carry a flag. The engine that processes this data is aware of the flag and handles the data as appropriate, but the abuse of this bit makes the data unreadable to humans.

    E.g.
    The ASCII character for A is 0x41 (65) - 1000001
    However in the files I have this may come through as
    0x41 (65) - 01000001
    or 0xC1 (193) - 11000001

    To make the data human readable I need to be able to reset this top-bit to 0.

    Does anyone know of a simple way to do this, or should I be looking to create a script, or am I in the custom tools arena?

    Thanks,

    Jon

      Apr 29, 2008#2

      Hi schofield860 - this is my suggestion for a script that will remove bit 8. It may not be particularly fast if you are working on very large files, though. Also, you will need UE13 or above. Have fun!

      Code:

      // This script will remove bit 8 in the active file.
      // Works only on DOS files (not UTF/Unicode)
      
      // Use perl regular expressions
      UltraEdit.perlReOn();
      
      // Start from the top
      UltraEdit.activeDocument.top();
      
      // Set up search defaults:
      UltraEdit.activeDocument.findReplace.matchCase=false;
      UltraEdit.activeDocument.findReplace.matchWord=false;
      UltraEdit.activeDocument.findReplace.regExp=true;
      UltraEdit.activeDocument.findReplace.searchDown=true;
      UltraEdit.activeDocument.findReplace.searchInColumn=false;
      
      // Search for all 8 bit chars 0x80-0xFF
      while (UltraEdit.activeDocument.findReplace.find("[\\x80-\\xFF]")) {
       // get selected char
       var char = UltraEdit.activeDocument.selection;
       
       // remove bit 8
       var newChar = to7Bit(char);
       
       // Write the 7 bit char back into the document:
       UltraEdit.activeDocument.write(newChar);
      }
      
      // Return to the top
      UltraEdit.activeDocument.top();
      
      // Helper function that clears bit 8 of the input character
      function to7Bit(char) {
       // Get decimal character code 
       var dec = char.charCodeAt(0);
       
       // Clear bit 8 with a bitwise AND mask (01111111 binary = 127 dec = 0x7F);
       // unlike XOR with 128, this never sets the bit if it is already clear
       var masked = dec & 0x7F;
      
       // return new char converting the decimal code with fromCharCode
       return String.fromCharCode(masked);
      }
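
The bit-clearing step can also be tried on its own, outside UltraEdit. The following is a minimal sketch in plain JavaScript, not part of the script above: `clearTopBit` and `clearTopBits` are hypothetical helper names, and the sketch assumes every character code lies in the single-byte range 0x00-0xFF. It uses a bitwise AND with 0x7F, so a character whose top bit is already clear passes through unchanged.

```javascript
// Hypothetical helper names; not part of the UltraEdit scripting API.
// Assumption: all character codes are single-byte values (0x00-0xFF).

// Clear bit 8 (value 0x80) of a single character code.
function clearTopBit(code) {
  return code & 0x7F;
}

// Apply the mask to every character of a string.
function clearTopBits(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    out += String.fromCharCode(clearTopBit(text.charCodeAt(i)));
  }
  return out;
}

// 0xC1 and 0xC3 are "A" and "C" with the flag bit set; 0x42 is a plain "B".
var flagged = String.fromCharCode(0xC1, 0x42, 0xC3);
var readable = clearTopBits(flagged);
```

With the sample input, `readable` holds the three characters A, B and C.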

        Apr 30, 2008#3

        Thanks for your quick reply (actually forgot to check back until today).

        Not too bothered about the performance as it's not for a live processing environment, just to enable me to read the data.

        I'm going to have a play with the script and will let you know how I get on. (I will try and add some code to check the underlying file is the correct format etc before proceeding - this will be a good excuse to learn a bit more about UE's scripting.)

        Thanks again,

        Jon
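
For the format check mentioned in the last post, one possible starting point is to scan the text before converting it. This is only a sketch under assumptions: `inspectText` is a hypothetical helper, not an UltraEdit function, and it assumes the file content has already been read into a JavaScript string with one character per byte. How to obtain that string inside UltraEdit is left open here.

```javascript
// Hypothetical helper; not part of the UltraEdit scripting API.
// Assumption: the text holds one character per byte of the file.
function inspectText(text) {
  var result = { total: text.length, flagged: 0, outOfRange: 0 };
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    if (code > 0xFF) {
      // Not a single-byte value, so the text is probably Unicode.
      result.outOfRange++;
    } else if (code & 0x80) {
      // Top bit set: this character carries the flag.
      result.flagged++;
    }
  }
  return result;
}

// Sample: plain "A", flagged "A" (0xC1), and a euro sign (U+20AC).
var report = inspectText("A" + String.fromCharCode(0xC1) + "\u20AC");
var safeToConvert = report.outOfRange === 0;
```

Here the euro sign pushes `outOfRange` above zero, so `safeToConvert` is false and a script built on this check would stop before touching the file.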