Finding in bytes (hex)

rseiler · Jul 24, 2016 · #1 · 2016-07-24T01:51+00:00

I just noticed that the Find dialog says "Hex strings should have two characters per byte," but for the purposes of searching, that's counterintuitive, at least to me. If you enter a long string and it happens to have an odd number of characters accidentally, your search will fail even if that string is actually there. It's just too easy to make that mistake.

Why does it have to be this way in UE?

Mofi · Jul 24, 2016 · #2 · 2016-07-24T09:02+00:00

There is a misunderstanding. The rule of two characters per byte applies to a hexadecimal search, not to an ASCII search.

In Hex Edit mode (binary file) you can for example search for the ASCII string UltraEdit with Find ASCII option checked.

But if this string is stored in the binary file in Unicode with UTF-16 Little Endian encoding, the ASCII search won't find it. In this case you need to search for UltraEdit with a hexadecimal search, with the Find ASCII option not checked. The hexadecimal search string in this case would be

55 00 6C 00 74 00 72 00 61 00 45 00 64 00 69 00 74 00

or alternatively without the spaces

55006C007400720061004500640069007400
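As a quick cross-check outside UltraEdit, the same byte sequence can be produced with a few lines of Python. This is only a sketch; the helper name `utf16le_hex` is made up for illustration, and `bytes.hex` with a separator needs Python 3.8 or later.

```python
def utf16le_hex(text: str, spaced: bool = True) -> str:
    """Return the UTF-16 LE bytes of `text` as an uppercase hex string."""
    raw = text.encode("utf-16-le")  # this codec does not prepend a byte order mark
    return (raw.hex(" ") if spaced else raw.hex()).upper()


print(utf16le_hex("UltraEdit"))
print(utf16le_hex("UltraEdit", spaced=False))
```

Either output can be pasted into the Find dialog as a hexadecimal search string.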

Now it should be clear what the information below the search string edit field means, which is:

UE wrote: Hex strings should have two characters per byte and each byte may be separated with a space.
Example: FF FE FD or FFFEFD

Suppose you make a mistake and miss a hexadecimal digit, as in the string 55006C00740072006004500640069007400. Can you quickly see where the hexadecimal digit is missing? How should UltraEdit find out where the hexadecimal digit is missing in the search string?

But what about a hexadecimal string in which each byte is separated by a space and one hexadecimal digit is missing, as in

55 00 6C 00 74 00 72 00 6 00 45 00 64 00 69 00 74 00

It would be easy to detect with code that a hexadecimal digit is missing in this string, and also where. But how should this be handled? Should the search now use 0x06 or 0x60? Both would be wrong, and the user who made the typing mistake would be misled into searching for a byte sequence that does not exist in the binary file instead of being notified about the missing hexadecimal digit in the search string.
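To illustrate that detection is the easy part, here is a minimal Python sketch that reports every space-separated token that is not exactly two hexadecimal digits. The function name `incomplete_bytes` is hypothetical, and this is not how UltraEdit itself works.

```python
def incomplete_bytes(spaced_hex: str) -> list[tuple[int, str]]:
    """Return (token index, token) for each token that is not two hex digits."""
    hex_digits = set("0123456789abcdefABCDEF")
    bad = []
    for index, token in enumerate(spaced_hex.split()):
        if len(token) != 2 or not set(token) <= hex_digits:
            bad.append((index, token))
    return bad


broken = "55 00 6C 00 74 00 72 00 6 00 45 00 64 00 69 00 74 00"
print(incomplete_bytes(broken))  # [(8, '6')]

# Neither way of padding the stray digit gives the intended byte 0x61.
print([hex(int("06", 16)), hex(int("60", 16))])  # ['0x6', '0x60']
```

The code can say where the digit is missing, but not which digit was meant, so reporting the error is the only safe reaction.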

See also Find Unicode text in hex edit mode for a scripting solution to easily find a text encoded in UTF-16 LE in a file displayed currently in hex edit mode.

rseiler · Jul 24, 2016 · #3 · 2016-07-24T15:21+00:00

OK, let me give an example (and I'm talking hex searches only, the ASCII option you mentioned is unchecked).

This exists in the file:
55 00 6C 00 74 00 72 00 61 01

You copy this from elsewhere (no spaces, since that's just often the way) and paste it into Find:

55006C0074007200610

That search will fail, even though it's in the file. It fails because there is an odd, not even, number of digits.

If you somehow happen to notice that's the situation, fine. If you go through the bother of putting spaces between each byte, fine, you'll notice it right away. But why even think of such things? The more likely result is that you will conclude that it doesn't exist in the file, which is wrong.

I understand that a byte in reality is two characters, but we're talking searching in UE here, not that.

fleggy · Jul 24, 2016 · #4 · 2016-07-24T15:45+00:00

Hi rseiler, the sequence 55006C0074007200610 can be found only as plain text in a text file. If you consider it a hexadecimal pattern, then the search must fail because there is no match at the last byte (it is incomplete). A hexadecimal Find searches for whole bytes, not nibbles (1 nibble === 4 bits). You have to accept that the length of a hexadecimal search string must be even.
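The difference between whole-byte and nibble matching can be sketched in Python. The function `find_nibbles` below is purely hypothetical: it shows what a nibble-tolerant search would have to do, namely compare hex digits instead of bytes. It is not a description of UltraEdit's search.

```python
def find_nibbles(data: bytes, hex_digits: str) -> int:
    """Return the byte offset where the hex digits match, starting on a byte boundary, or -1."""
    haystack = data.hex()                      # two lowercase hex digits per byte
    needle = hex_digits.replace(" ", "").lower()
    pos = haystack.find(needle)
    while pos != -1 and pos % 2:               # skip matches that start mid-byte
        pos = haystack.find(needle, pos + 1)
    return pos // 2 if pos != -1 else -1


data = bytes.fromhex("55 00 6C 00 74 00 72 00 61 01")

# Nibble-level comparison accepts the trailing half byte.
print(find_nibbles(data, "55006C0074007200610"))  # 0

# A whole-byte conversion rejects the same string outright.
try:
    bytes.fromhex("55006C0074007200610")
except ValueError as error:
    print("not a whole number of bytes:", error)
```

Python's own `bytes.fromhex` takes the same position as the Find dialog here: an odd number of digits is an error rather than something to guess around.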

BR, Fleggy

rseiler · Jul 24, 2016 · #5 · 2016-07-24T17:56+00:00

I feel that I must grudgingly accept.

Thanks.

Jul 24, 2016 · #6 · 2016-07-24T23:00+00:00

Coming at this from another angle then, I thought of a way in which the mistake could never be made: as long as the ASCII options aren't checked, enforce searching only in byte chunks. In other words, "Find Next" doesn't even activate if you have a stray nibble. As opposed to it "working" but finding nothing, which is completely misleading.

Bonus: For visual flair, automatically format the contents of the "Find what" field so that, for example, "55006C" becomes "55 00 6C".
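For illustration, the formatting idea amounts to something like this Python sketch (`group_hex_bytes` is a made-up name, not an UltraEdit function):

```python
def group_hex_bytes(hex_digits: str) -> str:
    """Regroup a hex string into space-separated pairs of digits."""
    compact = "".join(hex_digits.split())      # drop any existing whitespace
    return " ".join(compact[i:i + 2] for i in range(0, len(compact), 2))


print(group_hex_bytes("55006C"))   # 55 00 6C
print(group_hex_bytes("55006"))    # 55 00 6
```

With an odd number of digits the stray nibble ends up alone at the end, which at least makes the problem visible.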

Mofi · Jul 25, 2016 · #7 · 2016-07-25T07:44+00:00

rseiler, your first idea is good. I suggest sending it by email to IDM support as an enhancement request, see top of the page.

I have sent similar requests to IDM support in the past, such as automatically checking the Find ASCII option when the edit field loses input focus and the search string contains any character other than a space and 0-9A-Fa-f.

But I think it is not good to disable the Find Next button when Find ASCII is not enabled and the number of hexadecimal digits is not even. The button would be disabled and enabled constantly while the bytes are typed, unless the entire byte stream is pasted from somewhere else as you have done. That would be irritating for the user. I think it would be better to check the string on a click on Find Next and, when Find ASCII is not enabled and the number of hexadecimal digits is odd, show an error message instead of running a search that always fails in this case.

Likewise, separating the two hexadecimal digits of each byte with a space should happen only after a click on Find Next, and only when Find ASCII is not enabled and the number of hexadecimal digits is odd, to make it easier for the user to find the missing hex digit. The user should be free to enter any string and then select the appropriate options below, so it is not good to insert the spaces automatically while the search string is being entered. Just because Find ASCII is currently not enabled, the user should not be prevented from entering an ASCII/ANSI string. On the other hand, inserting spaces automatically after a click on Find Next, when the search is indeed hexadecimal and the number of hex digits is odd, could also be counterproductive. If a hex digit really is missing somewhere in the middle, the inserted spaces make it harder to correct the string, because many spaces must be moved after the missing digit is inserted. The wrong search string format could also be caused by unintentionally leaving the Find ASCII option unchecked. In that case inserting the spaces would be counterproductive as well.

Summary: I will suggest by email to IDM support that on a click on Find Next with the Find ASCII option not enabled, UltraEdit checks before running the hexadecimal search that the entered search string consists only of [ 0-9A-Fa-f] and that its length after removing all spaces is even. Otherwise an error message should be displayed informing the user that either there is a non-hexadecimal character in the search string or the number of hexadecimal digits is not even.
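The proposed check is small enough to sketch in Python. The function name and the message texts are assumptions for illustration only, not anything IDM has implemented.

```python
import re
from typing import Optional

# Only spaces and hexadecimal digits are allowed in a hex search string.
_HEX_OR_SPACE = re.compile(r"[ 0-9A-Fa-f]*")


def check_hex_search_string(search: str) -> Optional[str]:
    """Return an error message, or None if the string is usable for a hex search."""
    if not _HEX_OR_SPACE.fullmatch(search):
        return "The search string contains a character that is not a hexadecimal digit or a space."
    if len(search.replace(" ", "")) % 2:
        return "The search string has an odd number of hexadecimal digits."
    return None


for candidate in ("FF FE FD", "55006C0074007200610", "UltraEdit"):
    print(repr(candidate), "->", check_hex_search_string(candidate))
```

The check would run only on a click on Find Next, so typing is never interrupted and the user still gets a clear message instead of a search that silently finds nothing.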

rseiler · Jul 25, 2016 · #8 · 2016-07-25T18:11+00:00

Yes, I think that's a great improvement on the first idea. Exactly that, or something very similar, should happen.

As for the auto-formatting one, that is a little more fraught (and certainly less important), and the issues you bring up are definite considerations. I'm not sure which way to go with that now.