Tidy and typographer's quotes

    Nov 09, 2007

    I really like using the alphabetical references (I don't know the correct terminology) for quotes and the like: &ldquo;, &rdquo;, &hellip;, ...

    When I first tried HTML Tidy in UltraEdit, using a sample config file I got from somewhere, they all got translated to numerics: &#8220;, &#8221;, &#8230;, ... I tried all sorts of changes to the config file without success. Now I can't even get it to do the numerics; it's inserting the characters directly (&ldquo;Xmas&rdquo; turns into “Xmas”, and I can't even get back to the numerics).

    My config file right now is:

    indent: auto
    indent-spaces: 2
    wrap: 72
    markup: yes
    output-xhtml: yes
    input-xml: no
    show-warnings: yes
    quote-marks: no
    quote-nbsp: yes
    quote-ampersand: yes
    break-before-br: no
    uppercase-tags: no
    uppercase-attributes: no
    char-encoding: utf-8

    and my documents start out with:

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
        "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >

    Help?

      Nov 10, 2007

      Here is the manual for HTML Tidy. The correct term for &ldquo; and &rdquo; is HTML entities. The keyword "entities" appears several times on the manual page, which you should read through once. The numeric values are the Unicode code points of these characters.
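To illustrate how the three forms in this thread relate (named entity, numeric reference, literal character), here is a small Python sketch using the standard html module. It only demonstrates the mapping; it is not anything HTML Tidy itself runs, and entity_forms is a hypothetical helper name.

```python
from html import unescape
from html.entities import name2codepoint

def entity_forms(name: str) -> tuple[str, str, str]:
    """Return the named entity, decimal numeric reference and literal character."""
    cp = name2codepoint[name]              # e.g. "ldquo" -> 8220 (U+201C)
    return f"&{name};", f"&#{cp};", chr(cp)

for name in ("ldquo", "rdquo", "hellip"):
    named, numeric, literal = entity_forms(name)
    # All three spellings stand for the same character
    assert unescape(named) == unescape(numeric) == literal
    print(named, numeric, f"U+{ord(literal):04X}")
# &ldquo; &#8220; U+201C
# &rdquo; &#8221; U+201D
# &hellip; &#8230; U+2026
```

The numeric reference is simply the decimal code point, which is why Tidy can switch freely between the forms.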

      However, if you add the following line to your configuration file, the existing well-formed HTML entities should be preserved in the UTF-8 encoded XHTML file (not tested):

      preserve-entities: yes

      The option char-encoding: utf-8, which is of course correct, is what causes the HTML entities to be converted into UTF-8 characters.
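A rough Python illustration of that conversion, assuming the input is the entity form from the first post: the entities are resolved to characters, and a UTF-8 file then stores each curly quote as three bytes. This only mimics the effect of char-encoding: utf-8; it is not Tidy's code.

```python
from html import unescape

source = "&ldquo;Xmas&rdquo;"      # entity form, as typed in the first post
text = unescape(source)             # entities resolved to the characters themselves
utf8 = text.encode("utf-8")         # bytes a UTF-8 encoded output file would contain

print(len(source), len(text))       # 18 6
print(utf8)                         # b'\xe2\x80\x9cXmas\xe2\x80\x9d'
```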
      Best regards from an UC/UE/UES for Windows user from Austria

        Nov 12, 2007

        Thank you. That did not work, but it led me to the solution, which is to use <?xml version="1.0" encoding="iso-8859-1"?> in the document and

        char-encoding: latin1
        preserve-entities: yes

        in the config, and the result still validates.
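One plausible reason the Latin-1 setup behaves differently (an assumption, not confirmed in this thread): ISO-8859-1 has no code positions for the typographer's quotes or the ellipsis, so an encoder writing Latin-1 cannot store them as raw characters and has to keep them as references. The Python sketch below shows this with Python's own codec; Tidy's exact behaviour may differ, and fits_latin1 is a hypothetical helper.

```python
def fits_latin1(text: str) -> bool:
    """True if every character can be stored directly in ISO-8859-1."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True

quoted = "\u201cXmas\u201d"                      # typographer's quotes around Xmas
print(fits_latin1("Xmas"), fits_latin1(quoted))  # True False

# A Latin-1 writer has to fall back to character references for the quotes
print(quoted.encode("latin-1", errors="xmlcharrefreplace"))
# b'&#8220;Xmas&#8221;'
```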

          Nov 12, 2007

          Ah yes, on the main page of the HTML Tidy project I now read:

          11 February 2007
          The configuration option preserve-entities has been added.

          That's why it did not work: the HTML Tidy shipped with UltraEdit is an older version.
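For anyone stuck with a Tidy build that predates preserve-entities, one possible workaround (not something suggested in this thread, and not a Tidy or UltraEdit feature) is to post-process Tidy's output and turn non-ASCII characters back into named entities. A minimal Python sketch, where to_named_entities is a hypothetical helper; it assumes the output has already been decoded to a string and contains no non-ASCII text in places where entity references would not be interpreted.

```python
from html.entities import codepoint2name

def to_named_entities(text: str) -> str:
    """Replace every non-ASCII character with a named entity if HTML defines
    one, otherwise with a decimal numeric character reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                          # plain ASCII is left alone
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")   # e.g. U+201C -> &ldquo;
        else:
            out.append(f"&#{cp};")                  # no HTML name: numeric reference
    return "".join(out)

print(to_named_entities("\u201cXmas\u201d\u2026"))
# &ldquo;Xmas&rdquo;&hellip;
```

Because the result is pure ASCII, it is valid under either the utf-8 or the iso-8859-1 XML declaration.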