Why is there a difference in character count with spaces in comparison to Microsoft Word?


MikeW

    May 25, 2017#1

    UE 24.00.0.76

    I made a test file (previously attached) using a lorem ipsum generator. The character count dialog in Microsoft Word disagreed with UltraEdit's count with spaces, so I ran a Replace on just spaces. The total number of spaces replaced, added to the Characters (no spaces) value at that point, matched MS Word's character count with spaces.


                             MS Word   UltraEdit
    Words                       3070       3,070
    Paragraphs/Lines             140         140
    Characters (no spaces)     18210      18,210
    Characters (with spaces)   21140      21,418

    The total number of spaces that UltraEdit replaced is 2930, which, when added to the Characters (no spaces) value in UltraEdit, equals Microsoft Word's count for Characters (with spaces).

    Mike

    Grand Master

      May 25, 2017#2

      The difference can be explained by a different interpretation of which characters count as spaces.

      In both UltraEdit and Microsoft Word, Characters (no spaces) is the number of characters excluding normal spaces, horizontal tabs, carriage returns and line-feeds. Microsoft Word additionally treats a NO-BREAK SPACE as a space character for this count. UltraEdit treats every character that Unicode defines as whitespace as a space character and excludes all of them from this count.

      In Microsoft Word, Characters (with spaces) is the number of characters including normal spaces, no-break spaces and horizontal tabs, but excluding carriage returns and line-feeds.
      In UltraEdit, Characters (with spaces) is the number of characters including all Unicode whitespace characters.
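A minimal Python sketch of the two counting rules described above. It assumes UltraEdit's "whitespace according to Unicode" is the Unicode White_Space property (the 25 code points listed in PropList.txt) and models Word only with the characters named in this post (space, no-break space, tab, CR, LF). The names `word_like_counts` and `ue_like_counts` are hypothetical; neither program exposes such functions, and Word may handle other characters (page breaks, vertical tabs) differently.

```python
# Assumption: UltraEdit's notion of whitespace matches the Unicode White_Space property.
UNICODE_WHITESPACE = frozenset(
    "\u0009\u000a\u000b\u000c\u000d\u0020\u0085\u00a0\u1680"
    + "".join(chr(c) for c in range(0x2000, 0x200B))  # U+2000 .. U+200A
    + "\u2028\u2029\u202f\u205f\u3000"
)

# Word behaviour as described in the thread (Word 2007): only these count as spaces.
WORD_SPACES = frozenset(" \u00a0\t")
LINE_BREAKS = frozenset("\r\n")


def word_like_counts(text: str) -> tuple[int, int]:
    """Return (no spaces, with spaces) following the Word rules above."""
    # CR and LF are never counted, not even in "with spaces".
    with_spaces = sum(1 for ch in text if ch not in LINE_BREAKS)
    no_spaces = sum(
        1 for ch in text if ch not in LINE_BREAKS and ch not in WORD_SPACES
    )
    return no_spaces, with_spaces


def ue_like_counts(text: str) -> tuple[int, int]:
    """Return (no spaces, with spaces) following the UltraEdit rules above."""
    # Every Unicode whitespace character is excluded from "no spaces"
    # and included in "with spaces", so "with spaces" is simply the length.
    no_spaces = sum(1 for ch in text if ch not in UNICODE_WHITESPACE)
    return no_spaces, len(text)


sample = "ab cd\r\nef\u2002g"  # contains a CR+LF and an EN SPACE
print(word_like_counts(sample))  # (8, 9)
print(ue_like_counts(sample))    # (7, 11)
```

With the sample above, the EN SPACE counts as a visible character under the Word-like rule but as whitespace under the UltraEdit-like rule, and the CR+LF pair shows up only in UltraEdit's "with spaces" total.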

      Your example file contains 2930 normal spaces, as a Find for the space character followed by a click on the button Count all correctly reports. 18210 + 2930 = 21140, as output by Microsoft Word.

      Your example file contains 3208 whitespace characters, as a Perl regular expression Find for \s followed by a click on the button Count all correctly reports. 18210 + 3208 = 21418, as output by UltraEdit.

      The file has 139 carriage return + line-feed pairs, which are 278 vertical whitespace characters and account exactly for the difference (3208 - 2930 = 278). The last line has no line terminator.
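The reconciliation above can be reproduced on any text with a short Python sketch that mirrors the two Find/Count all checks. `reconcile` is a hypothetical helper, and Python's `\s` (Unicode-aware for `str` patterns) is only an approximation of the `\s` in UltraEdit's Perl regular expression engine. It also assumes, like the lorem ipsum file, that the only whitespace present is normal spaces and CR+LF line terminators.

```python
import re

# Figures reported for the test file in this thread:
#   18210 + 2930 = 21140 (Word), 18210 + 3208 = 21418 (UltraEdit),
#   3208 - 2930 = 278 = 2 * 139 CR+LF pairs.


def reconcile(text: str) -> dict:
    """Mirror the two Find/Count all checks (hypothetical helper)."""
    normal_spaces = text.count(" ")                 # Find " " -> Count all
    all_whitespace = len(re.findall(r"\s", text))   # Perl-style Find \s -> Count all
    no_spaces = len(text) - all_whitespace          # characters that are not whitespace
    return {
        "no_spaces": no_spaces,
        # Holds only when spaces are the sole non-line-break whitespace.
        "word_with_spaces": no_spaces + normal_spaces,
        "ue_with_spaces": no_spaces + all_whitespace,
        # Whitespace that is not a normal space: here, the CR and LF characters.
        "vertical_ws": all_whitespace - normal_spaces,
        "crlf_pairs": text.count("\r\n"),
    }


sample = "a b\r\nc d\r\ne"  # 3 lines, last one without a line terminator
print(reconcile(sample))
# {'no_spaces': 5, 'word_with_spaces': 7, 'ue_with_spaces': 11, 'vertical_ws': 4, 'crlf_pairs': 2}
```

For a file shaped like the one in this thread, `vertical_ws` should always be twice `crlf_pairs`, which is the 278 = 2 * 139 relationship above.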

      In my view, UltraEdit does a better job here than Microsoft Word 2007. For example, MS Word 2007 treats an EN SPACE (code value 2002 hexadecimal) as a non-space character, while UltraEdit treats it as a space character according to Unicode.
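To see why the EN SPACE qualifies as whitespace according to Unicode, a small Python sketch using the standard `unicodedata` module can print the relevant properties. The `classify` helper is hypothetical and for illustration only. It shows that NO-BREAK SPACE (which Word counts as a space) and EN SPACE (which Word 2007 does not) are both in the same Unicode category, Zs (space separator).

```python
import unicodedata


def classify(ch: str) -> tuple[str, str, str, bool]:
    """Return code point, Unicode name, general category and Python's isspace()."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<control>"),  # control characters such as TAB have no name
        unicodedata.category(ch),
        ch.isspace(),
    )


# Normal space, no-break space, EN SPACE and horizontal tab.
for ch in ["\u0020", "\u00a0", "\u2002", "\t"]:
    print(classify(ch))
```

The tab reports category Cc (control) rather than Zs, yet it is still whitespace under the Unicode White_Space property, which is why "whitespace" cannot be decided by the Zs category alone.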

      See the post Remove / delete blank and empty lines for more information about whitespace characters according to Unicode.
      Best regards from a UC/UE/UES for Windows user from Austria

      MikeW

        Jun 07, 2017#3

        Thank you for the explanation. I learned something I had never given much thought to.

        Best regards, Mike