Displaying text containing extended ASCII line drawing characters

    Aug 02, 2015 #1

    I think I may have finally found a way to save a text file that contains high ASCII line drawing characters so that they display correctly in UltraEdit. A few minutes ago, I captured part of a screen from a DOS program window, pasted it into an empty Notepad window, and saved the file, overriding the default encoding to choose UTF-8. Moving to my main computer, a Windows 7 x64 machine, I opened the file in UltraEdit, as I habitually do, and, what to my wondering eyes should appear, but line drawing characters, as lines!
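A minimal Python sketch of why the UTF-8 save can work, assuming the pasted text reaches the editor as Unicode: the line drawing characters have their own Unicode code points, so UTF-8 can store them without depending on any code page. The file name `capture.txt`, the sample frame and both helper functions are made up for illustration. Notepad on Windows 7 writes a byte order mark when UTF-8 is selected, which `utf-8-sig` imitates here.

```python
import os
import tempfile

# Hypothetical sample: a small frame of line drawing characters.
FRAME = "╔═══╗\n║ A ║\n╚═══╝\n"

def save_utf8_with_bom(path, text):
    # "utf-8-sig" prepends the byte order mark EF BB BF when writing.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)

def load_utf8(path):
    # "utf-8-sig" strips a leading byte order mark when reading, if present.
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        return f.read()

path = os.path.join(tempfile.mkdtemp(), "capture.txt")
save_utf8_with_bom(path, FRAME)

with open(path, "rb") as f:
    raw = f.read()
restored = load_utf8(path)

print(raw.startswith(b"\xef\xbb\xbf"))  # file starts with the BOM
print(restored == FRAME)                # frame survives the round trip
print(len("╔".encode("utf-8")))         # bytes needed for one corner character
```

Each frame character takes three bytes in UTF-8, so the file grows compared with a single-byte OEM encoding, but any editor that recognizes UTF-8 can show the lines without knowing which code page the console used.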

    If this is true, it resolves an issue of long standing, and could benefit others, too, perhaps.

    I am posting in this forum for two reasons.
    1. Share my discovery with the community of UltraEdit users.
    2. Solicit input regarding whether I have missed something important that I must know before I take this discovery to the next stage.
    To the latter end, I am subscribing to this thread.

      Aug 03, 2015 #2

      Some comments on your post:
      1. High ASCII is the wrong term for Extended ASCII, see ASCII Table and Description.
      2. DOS program window is the wrong term for console window or command prompt window.

        Windows 9x supported DOS applications via command.com because those Windows versions were based on DOS.

        But since Windows NT4, the Windows NTVDM (NT Virtual DOS Machine) was started automatically in the background to run 16-bit DOS applications, while 32-bit console applications were processed directly by the command processor cmd.exe. Nowadays there is no native Windows support for DOS applications anymore, and only 32-bit and 64-bit console applications can be executed, because most computers now run 64-bit Windows 10/8.1/8/7/Vista, which does not include NTVDM. (This description is simplified; in reality it is much more complicated.)
      3. Which code page is used in a console window depends on the Region and Language settings.

        Opening a command prompt window and running, for example, the command chcp displays the code page used by default by the command processor in console windows.

        Code:

        Active code page: 850

        It is also possible to run the command mode con in a command prompt window to display the code page used by device CON:

        Code:

        Status for device CON:
        ----------------------
            Lines:          300
            Columns:        80
            Keyboard rate:  31
            Keyboard delay: 1
            Code page:      850
      4. In Western European countries code page 850 is usually used, while in North American countries code page 437 is used by default. Both are OEM code pages (the Microsoft term for those old MS-DOS based code pages). Their characters with code values 128 to 255 differ from those of code page Windows-1252, which Windows GUI applications like UltraEdit use by default in Western European and North American countries when Unicode is not used at all.
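The difference between the OEM code pages and Windows-1252 can be illustrated with a short Python sketch. It decodes the same six byte values with each code page; the helper name `decode_as` is hypothetical, and the sketch relies only on Python's built-in `cp437`, `cp850` and `cp1252` codecs.

```python
# The six byte values that draw a double-line frame in the OEM code pages.
FRAME_BYTES = bytes([0xC9, 0xCD, 0xBB, 0xBA, 0xC8, 0xBC])

def decode_as(data, code_page):
    # Map a Windows code page number to Python's codec name, e.g. 850 -> "cp850".
    return data.decode("cp" + str(code_page))

# Decode the same bytes with both OEM code pages and with Windows-1252.
results = {cp: decode_as(FRAME_BYTES, cp) for cp in (437, 850, 1252)}
# results[437] and results[850] are both "╔═╗║╚╝"; results[1252] is "ÉÍ»ºÈ¼".

# Byte 0xB5 shows that the two OEM code pages also differ from each other:
# it is "╡" in code page 437 and "Á" in code page 850.
differs = (decode_as(b"\xb5", 437), decode_as(b"\xb5", 850))
```

This is why a file written with an OEM code page shows accented letters and symbols instead of lines when an editor assumes Windows-1252.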
      With the right settings, it is very easy in UltraEdit to create, view and edit .bat, .cmd or .nfo files using the OEM code page that matches the Windows region and language settings, while all other files viewed and edited in the same instance of UltraEdit use the suitable Windows code page.

      So I have no problem creating a batch file with the following lines in UltraEdit:

      Code:

      @echo off
      echo ╔═══════════════════════════════════╗
      echo ║ This text is using code page 865. ║
      echo ╚═══════════════════════════════════╝
      pause
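As a rough sketch of what saving such a batch file involves outside UltraEdit, the Python below encodes the text with an OEM code page before writing it. The function `write_oem_batch`, the file name `frame.bat`, the sample lines and the default of code page 850 are assumptions for illustration; the right code page is whatever chcp reports on the target machine.

```python
import os
import tempfile

# Hypothetical batch file content with a double-line frame.
BATCH_LINES = [
    "@echo off",
    "echo ╔═══════╗",
    "echo ║ Hello ║",
    "echo ╚═══════╝",
    "pause",
]

def write_oem_batch(path, lines, code_page=850):
    # Encode with the OEM code page so that each frame character becomes
    # the single byte a console using that code page expects.
    # Batch files conventionally use CRLF line endings.
    data = "\r\n".join(lines).encode("cp" + str(code_page)) + b"\r\n"
    with open(path, "wb") as f:
        f.write(data)
    return data

path = os.path.join(tempfile.mkdtemp(), "frame.bat")
data = write_oem_batch(path, BATCH_LINES)

# The top border is one corner byte, seven 0xCD bytes and another corner byte.
print(data.splitlines()[1][5:].hex())
```

Encoding fails with a `UnicodeEncodeError` if a line contains a character the chosen code page lacks, which is a useful early warning that the console could not display it either.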
      Well, marking a block in a console window, copying this block to the clipboard and pasting it into a Windows GUI application is another story. There are multiple clipboard formats.

      How Notepad detects
      • which format the text in the clipboard has, and therefore
      • which code page was most likely used for the text copied to the clipboard, and
      • which conversion to apply on paste
      so that the pasted text appears in a Unicode encoded text file as it did in the source window is something I don't know. Perhaps Microsoft uses a private clipboard format.
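Windows defines several standard clipboard text formats, among them CF_TEXT, CF_OEMTEXT and CF_UNICODETEXT. The Python sketch below does not touch the clipboard and is not Notepad's actual logic; it only simulates why the choice of conversion on paste matters. All three function names are hypothetical, and code pages 850 and 1252 are assumed as the OEM and Windows code pages.

```python
def paste_with_wrong_code_page(oem_bytes, ansi_code_page=1252):
    # Simulates a paste that treats OEM bytes as Windows code page text.
    return oem_bytes.decode("cp" + str(ansi_code_page))

def paste_with_oem_code_page(oem_bytes, oem_code_page=850):
    # Simulates a paste that converts from the console's OEM code page.
    return oem_bytes.decode("cp" + str(oem_code_page))

def repair_mojibake(text, ansi_code_page=1252, oem_code_page=850):
    # Undo the wrong decoding: recover the original bytes, then decode as OEM.
    raw = text.encode("cp" + str(ansi_code_page))
    return raw.decode("cp" + str(oem_code_page))

copied = "╔═╗".encode("cp850")                 # bytes as an OEM console holds them
garbled = paste_with_wrong_code_page(copied)   # "ÉÍ»"
correct = paste_with_oem_code_page(copied)     # "╔═╗"
repaired = repair_mojibake(garbled)            # "╔═╗"
```

The repair only works when every byte survives the wrong decoding. Python's `cp1252` codec leaves five byte values undefined, so arbitrary OEM text can raise a `UnicodeDecodeError` in the first step.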

      But my small batch example showed me that the developers at IDM could improve the detection of the encoding of text in the clipboard.
      Best regards from an UC/UE/UES for Windows user from Austria

        Aug 04, 2015 #3

        Mofi:

        Thanks for your excellent and thorough tutorial, written from the perspective of someone who lives outside my home country.
        Mofi wrote: Some comments on your post:
        1. High ASCII is the wrong term for Extended ASCII, see ASCII Table and Description.
        2. DOS program window is the wrong term for console window or command prompt window.

          Windows 9x supported DOS applications via command.com because those Windows versions were based on DOS.
        In particular, thanks for adding the technically correct terms for all of the above. Although I know all of them, I used the more colloquial terms for the benefit of those in the audience who use UltraEdit for something other than programming.

        Thanks also for reminding me about the chcp and mode commands, which I used so infrequently, even when DOS ruled the world, that I had forgotten about them.

        Thankfully, I am fairly familiar with clipboard formats, and I know, for example, that plain text copied into the clipboard is often available in two or more formats, e.g., Plain Text and Unformatted Unicode Text, both of which can be seen in the Paste Special dialog box displayed by any Microsoft Office application.

        Your response, and the absence of others, tells me what I needed to know.

        Thank you!