Killing UltraEdit during "Save selection as file" corrupted original huge file

Newbie

    Oct 14, 2016 #1

    Hi,

    I was recently using UltraEdit on Windows 10 to handle a tremendously large binary file (~50 GB, a core dump from a memory-intensive process that I couldn't let run to completion). Due to the size, I followed all of the suggestions here, meaning the file was opened directly on disk rather than through a temporary copy.

    My aim was to extract a particular ~2 GB section for further processing. However, since this section started about 40 GB into the file, the "goto" command didn't work (it can seek to at most byte 0xffffffff, and my target began well past that point). As a coarse first step, I made a selection running from a point before the target section to a point after it and ran "Save selection as file," knowing that in the saved file both endpoints would fall within the first 0xffffffff bytes and I could then extract the exact section my program expected to analyze.
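
    To make the offset arithmetic concrete, here is roughly the reasoning, with made-up offsets for illustration (my real ones differed):

    Code: Select all

    GIB = 1024 ** 3
    goto_limit = 0xFFFFFFFF                    # highest offset UE's "goto" accepts

    target_start = 40 * GIB                    # illustrative start of the ~2 GB section
    target_end = target_start + 2 * GIB

    # coarse selection: start somewhat before the target, end somewhat after it
    coarse_start = 39 * GIB
    coarse_end = 43 * GIB

    # offsets of the target inside the file created by "Save selection as file"
    new_start = target_start - coarse_start    # 1 GiB
    new_end = target_end - coarse_start        # 3 GiB
    assert new_start <= goto_limit and new_end <= goto_limit   # both reachable by "goto"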

    However, the file that UE created was empty. Confused, I thought it might have had an issue with my selection, as sometimes with large files the highlighted portion isn't an accurate reflection of the true selection. I began the same procedure a second time, but before selecting anything UE froze and claimed "not responding." I let it hang for a couple minutes before closing it. It said unsaved data might be lost, but I was confident nothing I had done would actually modify the file.

    Upon reopening the file, I found that the target section was no longer there. The latter ~20 GB of the file had been overwritten with 00 bytes, signaling something was corrupted. The other file was still empty. The target section was lost and likely unrecoverable (I've been trying...).

    Whether this is entirely my fault for killing the non-responsive process or it's the result of some other bug, I would at least like to suggest a change to the "Save selection as file" feature. If possible, the new file shouldn't become visible to the user until the selection has been copied into it, and it shouldn't be created at all if the operation fails.

    However, I feel it is possible that the issue lies in the handling of file contents past byte 0xffffffff. Given the behavior of "goto", it seems not unlikely that many built-ins are simply not equipped to handle bytes 0x100000000 and beyond, and that my attempt somehow caused the program to misbehave and alter the file.

    After the fact, I wrote a script in Python that easily extracts a specified section of arbitrarily large files, with no 0xffffffff offset restriction, but unfortunately I had no valid file to run it on by that point. I do like UE's environment, but this problem cost me the result of ~1 week's computation.
    :cry:
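
    For reference, the core of that script is nothing more than seeking past the 4 GiB mark and copying a byte range in chunks; a rough sketch (file names and offsets are placeholders) looks like this:

    Code: Select all

    # Extract a byte range from an arbitrarily large file; names/offsets are placeholders.
    CHUNK = 16 * 1024 * 1024                       # 16 MiB per read keeps RAM usage low

    def extract_range(src_path, dst_path, start, length):
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            src.seek(start)                        # Python offsets are 64-bit, >4 GiB is fine
            remaining = length
            while remaining > 0:
                chunk = src.read(min(CHUNK, remaining))
                if not chunk:                      # stop early if the file is shorter
                    break
                dst.write(chunk)
                remaining -= len(chunk)

    if __name__ == "__main__":
        # e.g. pull ~2 GB starting ~40 GB into the dump
        extract_range("core.dump", "section.bin", 40 * 1024**3, 2 * 1024**3)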

    Grand Master

      Oct 15, 2016 #2

      In general it is never a good idea to kill a process which has become unresponsive to user actions or Windows system events because it is running something in its main thread instead of in a background thread designed to be interruptible. The process should be given the chance to become responsive again, i.e. to finish its current task. When it is known that the process is working on a huge amount of data, it should be given 30 minutes or more to complete. I have seen Windows 8/8.1/10 updates that already indicated 100 % done after a reboot but nevertheless needed 40 additional minutes to really complete the update and automatically reboot the machine once more. And I have already repaired several Windows installations because their users powered off the computer thinking the Windows update was hanging.

      Before killing a process, it should at least be verified that there has been no file system activity for 5 minutes or more, either by watching the hard disk activity LED (if available) or by using a tool like Sysinternals Process Monitor, to avoid corrupting file data or, even worse, the file system.

      Okay, back to UltraEdit. It is possible to edit files larger than 4 GiB in hex edit mode in UltraEdit, but as you also encountered, there are some limitations for this special use case.

      The first one is that a 32-bit variable is still used for the address offset in a file opened in hex edit mode, even if the file is larger than 4 GiB. This can be seen in the address field of the document window, on the status bar at the bottom, and in the Goto command not supporting an address/offset greater than 0xFFFFFFFF (4294967295). I reported already in October 2013, in an email with the subject Negative position values in hex edit mode on files larger than 2 GiB, that a 64-bit integer variable should be used for the address offset shown in the file and on the status bar. But it looks like I was the only one reporting this issue, as it is still not fixed by a developer of UE.
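
      Those negative position values are easy to illustrate with a few lines of Python, assuming the offset is really stored in a signed 32-bit variable (an assumption based only on the observed behavior, not on knowledge of the UE source code):

      Code: Select all

      def as_int32(offset):
          """Interpret the low 32 bits of a file offset as a signed 32-bit value."""
          low = offset & 0xFFFFFFFF                       # the variable keeps only 32 bits
          return low - 0x100000000 if low >= 0x80000000 else low

      for offset in (0x7FFFFFFF, 0x80000000, 0x100000000, 40 * 1024**3):
          print(f"real offset {offset:#011x} -> displayed as {as_int32(offset)}")

      # real offset 0x07fffffff -> displayed as 2147483647
      # real offset 0x080000000 -> displayed as -2147483648  (negative above 2 GiB)
      # real offset 0x100000000 -> displayed as 0             (wraps at 4 GiB)
      # real offset 0xa00000000 -> displayed as 0             (40 GiB wraps as well)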

      The second one is that on the first scroll UltraEdit scans the whole file opened in hex edit mode for line terminators when Disable line numbers is not checked in the configuration, although line counting is completely useless in hex edit mode. This useless parsing for newline characters is indicated on the status bar. Because of it the user has to wait needlessly for several seconds after the first scroll in a huge file opened in hex edit mode with Disable line numbers not checked before repositioning in the file becomes fast. I reported this issue by email to IDM support in August 2007.

      I had never used the command Save Selection As before on a selected block in a binary file opened in hex edit mode. So I quickly created a binary file with 6,680,972,800 bytes (copied image backups together into a single file) and opened it in UltraEdit without using a temporary file. I used the vertical scroll bar to get to a position beyond 0xFFFFFFFF, pressed Ctrl+Shift+End and executed Save Selection As. But all I got was the error message Cannot allocate memory. Well, my old notebook with just 2 GiB of total RAM obviously did not have a single free RAM block large enough for this operation, and I have turned off the usage of virtual memory, i.e. the usage of pagefile.sys.

      For that reason I moved back to file position offset 0x100000000 (4 GiB), selected just 4096 bytes and executed Save Selection As once again. That worked without any problem. I increased the selected block to over 400 KiB and saved the selection, which worked again. I increased the selection to over 4 MiB and saved the selection, which worked again but already took a few seconds to finish.

      Last I selected over 650 MiB beyond 0x100000000 and executed Save Selection As, which also worked. During this operation I watched the file system accesses of UltraEdit using Sysinternals Process Monitor and could not see anything really critical that I would report to IDM support immediately. The source file was not corrupted by the operation, as UltraEdit made only read accesses on it. Well, I did not kill UltraEdit.

      UltraEdit v22.20 for Windows XP (I used my old Windows XP machine for this test) first read the entire selected block in chunks of 64 KiB, 16 KiB and 48 KiB from the opened file into memory and then wrote the read data from memory in chunks of 256 KiB to the target file. This is an efficient method for copying large amounts of data on a magnetic hard disk or another storage device with rotating media, as it avoids the read/write head continuously seeking back and forth between the source and target file areas.

      But to avoid the RAM limitation on saving (copying) a really huge selected block, it would be better to run a loop which
      1. reads a larger block of several MiB (like 4, 8, 16 or 32 MiB) from the source file into memory,
      2. writes that block from memory into the new file,
      3. and continues with the first step until the entire selection has been saved (copied) from the source into the target file.
      And if the selection made in text or in hex edit mode is beyond some threshold value like 256 MiB, it would be good to do this block copying in a breakable background thread, showing in the meantime a dialog with a progress indication and a button to abort the operation; a sketch of such a loop is shown below.
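
      A minimal sketch of such a chunked copy loop in Python, with a progress callback and an abort flag (chunk size, offsets and file names are only example values; UltraEdit itself would of course implement this in its own code):

      Code: Select all

      import threading

      CHUNK = 16 * 1024 * 1024          # example chunk size, could also be 4, 8 or 32 MiB

      def save_selection(src_path, dst_path, start, length, abort, progress=None):
          """Copy `length` bytes starting at `start` into a new file, chunk by chunk,
          so memory usage stays low and the loop can be aborted at any chunk boundary."""
          done = 0
          with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
              src.seek(start)
              while done < length and not abort.is_set():
                  chunk = src.read(min(CHUNK, length - done))
                  if not chunk:
                      break
                  dst.write(chunk)
                  done += len(chunk)
                  if progress:
                      progress(done, length)   # e.g. update a progress dialog
          return done

      # Run the copy in a background thread so the UI thread stays responsive.
      abort = threading.Event()
      worker = threading.Thread(
          target=save_selection,
          args=("huge.bin", "selection.bin", 0x100000000, 650 * 1024**2, abort),
          kwargs={"progress": lambda done, total: print(f"{100 * done // total}% done")},
      )
      worker.start()
      # An "Abort" button would call abort.set(); here we simply wait for completion.
      worker.join()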

      All this could be requested in an enhancement request email to IDM support for working with Save Selection As on a huge file opened either in text or in hex edit mode. Please feel free to do so. I'm not using UltraEdit for editing such large files and therefore I'm not really interested in such an enhancement myself.
      Best regards from an UC/UE/UES for Windows user from Austria

      Newbie

        Oct 15, 2016 #3

        Thanks Mofi for the reply and for making the post title a bit more descriptive for future viewers.

        I do think UE should be using 64-bit addressing to handle the rare cases such as mine where 32 bits are insufficient. As you mentioned, though, it is very difficult to do anything meaningful with a file that large without running into RAM limitations and computation times on the order of hours.
        Mofi wrote: Last I selected over 650 MiB beyond 0x100000000 and executed Save Selection As which also worked. During this operation I watched the file system accesses of UltraEdit using Sysinternals Process Monitor and could not see anything really critical which I would report to IDM support immediately. The source file was not corrupted by the operation as UltraEdit just made read accesses on it. Well, I did not kill UltraEdit.
        I'm glad you did more thorough testing to verify that killing the process was the likely culprit. I am still surprised that killing a read-only process is able to corrupt the source file. I didn't have the file opened in read-only mode which was foolish of me, but I'm not certain that would have made a difference.

        I will send an email to support to see if they are interested in changing this behavior, but I think I won't be using UE for this kind of processing anymore. The extraction could have been carried out in under a minute with a Python script (perhaps only so quickly because I have 16 GB RAM with up to 59 GB of pagefile and an SSD). UE is still best for viewing the file, but if I encounter another situation like this in the future I will use read-only mode.

        Thanks again!