HTML Tidy always outputs nonsense characters

    Jul 06, 2010 #1

    No matter how I adjust the options, HTML Tidy always turns my web page into unrecognizable characters, as in the attached image. Does anyone know what's wrong? Thanks.
    Attachment: 646464.jpg (67.69 KiB)

      Jul 06, 2010 #2

      To have any chance of finding the reason for this output, please attach to a further post your HTML (PHP) input file and the entire [HTMLTidyOptions] section from %appdata%\IDMComp\UltraEdit\uedit32.ini. Please also tell us which version of UltraEdit you use. Is it the English UltraEdit v16.10.0.1035?

      When I parse one of my HTML files with HTML Tidy in UE v16.10.0.1035, the only output in the output window is
      HTML Tidy Parsing ...OK
      and the tidied document in the ** HTML Tidy Output ** document window is correctly encoded and displayed.

      The head section of my HTML file contains the line

      <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">

      and the file is an ANSI file using this code page. More precisely, it is an ASCII file, because HTML entities are used for all non-ASCII letters.
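      To illustrate what "an ASCII file because HTML entities are used" means, here is a minimal Python sketch. The helper name `to_ascii_html` is hypothetical; it writes every non-ASCII character as a named entity from the standard-library table `html.entities.codepoint2name`, or as a numeric reference when no name exists. The result contains only ASCII bytes, so it reads the same whether a tool treats the file as ASCII, ISO-8859-1 or UTF-8, which is presumably why the encoding setting does not matter for such a file.

```python
from html.entities import codepoint2name


def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity (hypothetical helper)."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            parts.append(ch)  # plain ASCII is kept unchanged
        elif cp in codepoint2name:
            parts.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &uuml;
        else:
            parts.append(f"&#{cp};")  # numeric character reference
    return "".join(parts)


encoded = to_ascii_html("Müller Möhre Öse")
print(encoded)  # M&uuml;ller M&ouml;hre &Ouml;se
print(encoded.isascii())  # True
```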
      Best regards from an UC/UE/UES for Windows user from Austria

        Jul 07, 2010 #3

        I fixed this. I downloaded a portable trial version of UE and HTML Tidy worked there, so I mirrored every configuration setting from it and found that I had set UE to always create new files in UTF-8 format.
        After changing that back to ASCII (the default), it works well. I don't know why, though. Thank you :D

          Jul 08, 2010 #4

          Thanks for posting what caused the strange display. HTML Tidy always outputs in ASCII/ANSI. UltraEdit should therefore either ignore the setting for new files when capturing the output of HTML Tidy into a new file, or correctly convert the Tidy output from ANSI to UTF-8 or UTF-16. I will report this issue by email to IDM.
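          The conversion Mofi asks for can be sketched in Python: decode the captured Tidy output with the ANSI code page first, then encode it for whatever encoding the new file uses. The code page Windows-1252 (`cp1252`) and the function name `convert_ansi_output` are assumptions for illustration; UltraEdit's actual implementation is not known. The last lines show why skipping the conversion fails for UTF-16: pairs of ANSI bytes are read as single 16-bit code units and turn into unrelated characters.

```python
# Hypothetical example: text HTML Tidy wrote as ANSI (Windows-1252 assumed).
tidy_text = "HTML Tidy Parsing ... Grüße"
captured = tidy_text.encode("cp1252")


def convert_ansi_output(raw: bytes, target: str, ansi_codepage: str = "cp1252") -> bytes:
    """Decode captured ANSI bytes first, then encode them for the new file (hypothetical)."""
    return raw.decode(ansi_codepage).encode(target)


utf8_file = convert_ansi_output(captured, "utf-8")
utf16_file = convert_ansi_output(captured, "utf-16-le")
print(utf8_file.decode("utf-8"))       # HTML Tidy Parsing ... Grüße
print(utf16_file.decode("utf-16-le"))  # HTML Tidy Parsing ... Grüße

# Without conversion, ANSI bytes read as UTF-16 pair up into unrelated characters.
unconverted = captured.decode("utf-16-le", errors="replace")
print(unconverted == tidy_text)  # False
```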

          Edited on 2010-08-31: With UltraEdit v16.20.0.1009 this issue is partly fixed. If new files are UTF-8 encoded according to the configuration setting, UltraEdit first converts the new file to ASCII/ANSI before writing the HTML Tidy output to it, so the parser output is readable in the new window. But if new files are UTF-16 encoded according to the configuration setting, the HTML Tidy output in the new file is still wrong and not readable at all.

          Edited on 2010-12-03: The remaining problem with UTF-16 as default for new files is fixed with UltraEdit v16.30.0.1000. The HTML Tidy output file is now always an ASCII/ANSI file if the input HTML file is an ASCII/ANSI file, regardless of the configuration setting for the encoding of new files.
          UltraEdit v16.30.0.1000 also introduced a new feature: an HTML file encoded in UTF-8 results in a UTF-8 Tidy output file, regardless of the configuration setting for the encoding of new files and of the Tidy char-encoding option. However, UE v16.30.0.1000 does not display the UTF-8 encoded output correctly as a Unicode file. The output file must be saved, closed and reopened to be displayed correctly. I reported this new issue by email to IDM.

            HTML Tidy: conversion of umlauts ä, ö, ü ...

            Jan 11, 2012 #5

            I use the German UE 14.20 to write HTML with German umlauts (ä, ö, ü, ...).

            When I run HTML Tidy, they are converted from

            Müller Möhre Öse
            to

            MÃ¼ller MÃ¶hre Ã–se
            How can I avoid this?
            Here are my settings:
            http://imageshack.us/photo/my-images/21 ... mltidy.png

            I tried different settings for "Char encoding", but it made no difference.

            Peter

              Jan 11, 2012 #6

              You have probably read the explanation of the char-encoding option. The list of values offered by UltraEdit is not up to date and does not include all possible values for this option. If you want full control, store the HTML Tidy options you want to use with non-default values in a text file and specify this file instead of using the options configured in the dialog, which are stored in uedit32.ini.

              According to your report, I suppose your HTML file is encoded in UTF-8, so you should see U8-DOS or U8-UNIX in the status bar at the bottom of the UltraEdit window. And the head section of your HTML file probably (and hopefully) also contains

              <meta http-equiv="Content-Type" content="text/html; charset=utf-8" >

              or something similar, depending on the document type. In this case HTML Tidy also outputs everything in UTF-8. But your version of UltraEdit does not recognize that the captured output is encoded in UTF-8, so everything captured from HTML Tidy is written into an ANSI file. If you save the HTML Tidy output as a file, close the document window and reopen it, you will see that the German umlauts look fine, because UltraEdit now recognizes the characters as UTF-8 encoded.
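              The effect described above can be reproduced in a few lines of Python, assuming the ANSI code page is Windows-1252 (`cp1252`): each umlaut becomes two UTF-8 bytes, and an ANSI window shows those two bytes as two separate characters. Because the bytes themselves are unchanged, decoding them again as UTF-8 (which is effectively what reopening the saved file does) restores the text.

```python
original = "Müller Möhre Öse"

# A UTF-8 document: each umlaut is stored as two bytes (ü -> C3 BC).
utf8_bytes = original.encode("utf-8")

# Captured into an ANSI (assumed Windows-1252) window, every byte is one character.
shown_as_ansi = utf8_bytes.decode("cp1252")
print(shown_as_ansi)  # MÃ¼ller MÃ¶hre Ã–se

# The bytes were never changed, so reading them as UTF-8 again repairs the display.
repaired = shown_as_ansi.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```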

              Newer versions of UltraEdit support UTF-8 encoded HTML Tidy output; see my post above.

                Jan 12, 2012 #7

                Hello Mofi,

                thanks for the info and links.
                Mofi wrote: According to your report I suppose your HTML file is encoded with UTF-8. So you should see in the status bar at bottom of the UltraEdit window U8-DOS or U8-UNIX. And your HTML file contains in the head section probably (and hopefully) also

                <meta http-equiv="Content-Type" content="text/html; charset=utf-8" >
                Yes. U8-DOS and charset=utf-8.

                But nevertheless...

                HTML Tidy reports:

                HTML Tidy Parsing ...
                line 1 column 1 - Warning: specified input encoding (utf-8) does not match actual input encoding (utf-16)
                ...
                Info: Doctype given is "-//W3C//DTD HTML 4.01 Transitional//EN"
                Info: Document content looks like HTML 4.01 Transitional
                And saving and reopening the file as you recommended above does not help.
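                As far as I know, HTML Tidy reports this kind of mismatch when its input starts with a byte order mark (BOM) that contradicts the configured encoding, which would suggest the text Tidy received was UTF-16 rather than the UTF-8 file on disk; the thread does not confirm how UE 14.20 passes the document to Tidy. The Python sketch below shows a simple sniffing heuristic of that kind: BOM check, then a UTF-8 validity test, then an assumed Windows-1252 fallback. The function `guess_encoding` is hypothetical and is not the algorithm used by UltraEdit or HTML Tidy.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Guess a text encoding from raw bytes (simplified, hypothetical heuristic)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # UTF-8 with a BOM; this codec strips the BOM when decoding
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the utf-16 codec reads the BOM to pick the byte order
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "cp1252"  # not valid UTF-8: assume the ANSI code page
    return "utf-8"  # valid UTF-8 without a BOM (plain ASCII also lands here)


sample = "Müller".encode("utf-16")  # Python's plain utf-16 codec writes a BOM
encoding = guess_encoding(sample)
print(encoding, sample.decode(encoding))  # utf-16 Müller
```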

                Edit: I have now taken the [HTMLTidyOptions] section from uedit32.ini and made a new CFG file from it. Then I changed

                char-encoding=0 (which is not a defined value) to ascii, to utf8, and to nothing (I removed the line).
                Result: always the same ...

                The HTML Tidy report also contains:

                line 61 column 21 - Warning: <img> attribute "gelï¿¿ï¿¿schte" lacks value
                So it already has a problem here ...

                Peter

                  Jan 12, 2012 #8

                  Semi-solution:

                  a) The CFG file has to use ":" instead of "=":

                  wrong:
                  char-encoding=utf8
                  correct:
                  char-encoding: utf8
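                  The conversion Peter did by hand, from INI-style `name=value` lines in the [HTMLTidyOptions] section to Tidy's `name: value` configuration format, can be sketched in Python with the standard `configparser` module. This assumes the INI keys are already valid Tidy option names, which the thread does not confirm, and the sample INI content is hypothetical. Values UltraEdit may store as list indexes, such as `char-encoding=0`, are not Tidy values, so the hypothetical `overrides` argument replaces them explicitly.

```python
import configparser
from typing import Dict, Optional


def ini_section_to_tidy_config(ini_text: str,
                               section: str = "HTMLTidyOptions",
                               overrides: Optional[Dict[str, str]] = None) -> str:
    """Rewrite 'name=value' INI entries as Tidy 'name: value' lines (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    options = dict(parser[section])  # keeps the order of the INI section
    # Replace values that are not valid Tidy values, e.g. a numeric char-encoding.
    options.update(overrides or {})
    return "".join(f"{name}: {value}\n" for name, value in options.items())


# Hypothetical INI content; the real section in uedit32.ini may differ.
ini = """[HTMLTidyOptions]
char-encoding=0
indent=yes
"""
print(ini_section_to_tidy_config(ini, overrides={"char-encoding": "utf8"}), end="")
# char-encoding: utf8
# indent: yes
```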

                  An example can be found here: http://www.w3.org/People/Raggett/tidy

                  b) As a result, umlauts in strings are no longer replaced, but umlauts in file names still are. The file also has to be saved to refresh the strings (see postings above):

                  Characters displayed in the result of Tidy:

                                              <img alt="Hier fehlt ein Bild." title=
                                              "Abfrage zum endgÃ¼ltigen LÃ¶schen des Ordners " src=
                                              "media/Outlook_L%C3%B6schen.png">
                  Characters displayed in the file after "Save As ... UTF-8":

                                              <img alt="Hier fehlt ein Bild." title=
                                              "Abfrage zum endgültigen Löschen des Ordners " src=
                                              "media/Outlook_L%C3%B6schen.png">
                  Peter
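                  The umlaut in the file name is not garbled but URI-escaped: `%C3%B6` is the UTF-8 byte sequence of ö written in percent-encoding, which browsers and web servers treat as the original name. This is probably done by Tidy's `fix-uri` option, which escapes characters that are not allowed in URI attribute values; setting it to no should keep the literal umlaut, though this has not been verified with the Tidy version bundled with UE. Python's `urllib.parse.quote` produces the same escaping:

```python
from urllib.parse import quote, unquote

filename = "media/Outlook_Löschen.png"

# quote() encodes non-ASCII characters as UTF-8 and percent-escapes each byte;
# '/' is left alone because it is in the default safe set.
escaped = quote(filename)
print(escaped)  # media/Outlook_L%C3%B6schen.png

# unquote() reverses the escaping and recovers the original file name.
print(unquote(escaped) == filename)  # True
```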

                    Jan 18, 2012 #9

                    I found an alternative:
                    http://int64.org/projects/htmltrim

                    I found it via a link from the W3C.

                    It is 7 years old, but it has a nice GUI and a lot of options that can be stored in project files. Maybe it is also based on HTML Tidy, but now the umlauts are OK. Next I will try to edit and check the HTML with UE and its included HTML Tidy, and then pretty-print it with HTMLTRIM.

                    Peter