Do you know anything about
character encoding, i.e. with which bytes a character is stored in a text file?
The
trade mark sign is encoded in a
UTF-16 Little Endian encoded file with the two bytes
22 21. The same character is encoded in a UTF-16 Big Endian encoded file with the bytes
21 22 (reverse byte order in comparison to Little Endian). And in a
UTF-8 encoded file the character is encoded with the three bytes
E2 84 A2 which are displayed as
â„¢ when a UTF-8 encoded file is interpreted and displayed as an "ANSI" file using code page
Windows-1252. A text file using an encoding with just one byte per character like Windows-1252 contains the trade mark sign with the byte
99. The byte values posted here are the hexadecimal values of the bytes.
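If you want to verify these byte values yourself, here is a small Python sketch. It has nothing to do with UltraEdit; the helper name hex_bytes is just made up for this illustration, and Python's codec name cp1252 stands for Windows-1252.

```python
# Hex byte sequences of U+2122 (trade mark sign) in the encodings named above.
TM = "\u2122"

def hex_bytes(text, encoding):
    """Return the encoded bytes as space-separated uppercase hex pairs."""
    return text.encode(encoding).hex(" ").upper()

for name in ("utf-16-le", "utf-16-be", "utf-8", "cp1252"):
    print(f"{name:10} {hex_bytes(TM, name)}")

# The three UTF-8 bytes misread as Windows-1252 give three wrong characters.
mojibake = TM.encode("utf-8").decode("cp1252")
print(mojibake)
```

The last line prints the three characters you see when a UTF-8 encoded file is displayed as an "ANSI" file.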
Microsoft called all single-byte character encodings that use a code page (a table mapping 256 characters to the 256 possible byte values) in GUI applications "ANSI" encodings. That name is not really correct, because not all of those character encodings were standardized by the
American National Standards Institute (for the U.S.) or the
International Organization for Standardization (ISO, which defines international standards). Windows-1252 is a code page that was never defined as a real standard, but it is nevertheless supported by virtually all applications capable of interpreting text data.
So it depends on
how you search for the trade mark sign, or rather for a character that is NOT a trade mark sign after a sequence of other characters.
A
Find/Replace in current file should always work, because UltraEdit usually detects the character encoding automatically on file open and therefore knows with which bytes the character
™ is represented in the current file.
On using
Find/Replace in Files it gets more complicated because the searched text files can be encoded differently. "ANSI" encoded files and UTF-8 encoded files with no byte order mark (BOM) are very hard for an application to distinguish, see
How does automatic UTF-8 encoding detection work in UltraEdit and UEStudio? UltraEdit does not analyze the entire file on using
Find/Replace in Files to automatically detect whether a file that is not encoded in UTF-16 LE, UTF-16 BE or UTF-8 with BOM is an "ANSI" file or a UTF-8 file without BOM. That would dramatically slow down
Find/Replace in Files. So it is recommended that a user running a
Find/Replace in Files that searches for non-ASCII characters enables the advanced find/replace in files option
Use encoding and select the right encoding like
65001 (UTF-8) for running the
Find/Replace in Files on UTF-8 encoded files.
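To illustrate why the detection is expensive, here is a rough Python sketch of such a check. This is NOT how UltraEdit detects encodings; the function name guess_encoding and the fallback to Windows-1252 are my assumptions for the example. A BOM is found by looking at the first two or three bytes, but telling "ANSI" from UTF-8 without BOM requires decoding the file content.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Guess the encoding of raw file bytes (hypothetical helper)."""
    # Cheap part: a BOM identifies the encoding from the first bytes.
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be"
    # Expensive part: without a BOM all bytes must be checked for valid UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: everything that is not valid UTF-8 is Windows-1252.
        return "cp1252"
    # Pure ASCII data also ends up here; it is identical in both encodings.
    return "utf-8"

print(guess_encoding("PRODUCT\u2122".encode("utf-8")))
print(guess_encoding("PRODUCT\u2122".encode("cp1252")))
```

Doing this for every file of a large directory tree before the search even starts is what would slow down a find in files, which is why selecting the encoding explicitly is the better choice.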
Unfortunately, you have not posted which version of UltraEdit you use on which operating system, how the files on which the find/replace is executed are encoded, and which find/replace you execute. So I can't help with detailed instructions on how to run the finds/replaces for a better result.
If some text files are Windows-1252 encoded and others are UTF-8 encoded, as it looks like, I recommend first running a
Find in Files or
Replace in Files with option
Use encoding checked and
65001 (UTF-8) selected, using the search string
PRODUCT[~™] because in this case UltraEdit definitely knows that it has to search for the bytes
50 52 4F 44 55 43 54 (on a case-sensitive search) with the next three bytes NOT
E2 84 A2.
Next run
Find in Files or
Replace in Files a second time with option
Use encoding checked and this time
1252 (ANSI - Latin I) selected, as this tells UltraEdit to search for the bytes
50 52 4F 44 55 43 54 (on a case-sensitive search) with next byte NOT
99.
Well, the second search for Windows-1252 encoded
PRODUCT[~™] is really problematic when it runs on UTF-8 encoded files, because the find is also positive where
™ is present in a UTF-8 encoded file: the byte after PRODUCT is then E2, which is not 99.
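The false positive can be reproduced on byte level with a short Python sketch. The byte pattern PRODUCT[^\x99] is only my approximation of what the Windows-1252 search does; it uses Python's re module and not UltraEdit's search engine.

```python
import re

# The same text stored with the two encodings in question.
utf8_line = "PRODUCT\u2122 name".encode("utf-8")    # ... 54 E2 84 A2 20 ...
ansi_line = "PRODUCT\u2122 name".encode("cp1252")   # ... 54 99 20 ...

# Second pass: PRODUCT followed by any single byte other than 99.
ansi_pass = re.compile(rb"PRODUCT[^\x99]")

# Correctly no match in the Windows-1252 encoded data.
print(ansi_pass.search(ansi_line))
# Wrong positive match in the UTF-8 encoded data, next byte is E2.
print(ansi_pass.search(utf8_line))
```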
It might also work using
Use encoding with
Auto-detect selected, depending on the version of UltraEdit used.
Best is perhaps a
Find in Files or
Replace in Files with
Perl regular expression search string
PRODUCT(?!\xE2\x84\xA2|\x99) without using
Use encoding, which produces a positive match on every
PRODUCT that is followed neither by the three bytes
E2 84 A2 nor by the byte
99.
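The effect of the negative lookahead can be checked with Python's re module on raw bytes. This is only a sketch of the idea: Python's regular expression engine is not the one used by UltraEdit, and the sample byte strings are made up by me.

```python
import re

# PRODUCT not followed by E2 84 A2 (UTF-8) and not followed by 99 (Windows-1252).
pattern = re.compile(rb"PRODUCT(?!\xE2\x84\xA2|\x99)")

samples = {
    "UTF-8 with sign":        "PRODUCT\u2122".encode("utf-8"),
    "Windows-1252 with sign": "PRODUCT\u2122".encode("cp1252"),
    "no sign":                b"PRODUCT name",
}

for label, data in samples.items():
    found = pattern.search(data) is not None
    print(f"{label:24} match: {found}")
```

Only the sample without the trade mark sign is matched. Note that this works on single-byte and UTF-8 encoded data only; in a UTF-16 encoded file the letters of PRODUCT are stored with two bytes each.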
I have added a ZIP file with sample files that all contain the same two lines of text, but no file is binary equal to any other file. You can use these sample files to test the
Find/Replace in Files you want to execute on your files. It should also help to understand what character encoding means.