UE 28.00.0.114 on Win 7 x64 does not correctly sort a word list when using a Croatian locale sort


    May 02, 2021#1

    Hi,

    I just tried to sort a word list and noticed that words in my language, Croatian, are not sorted correctly.
    I also tried Serbian and Bosnian, both very similar code page 1250 (Central Europe) languages. I tried only the Latin variants, not Serbian Cyrillic etc., and the sort is not correct there either.
    I posted two pictures to show the difference, in case the forum encodes the characters differently:

    Correct sorting (done manually):
    correct_croatian_sort.png (21.38KiB)

    a
    b
    c
    č
    ć
    d
    dž
    đ
    e
    f
    g
    h
    i
    j
    k
    l
    lj
    m
    n
    nj
    o
    p
    r
    s
    š
    t
    u
    v
    z
    ž
    
    Incorrect sorting - this is what UltraEdit is doing:
    incorrect_croatian_sort.png (31.06KiB)

    š
    ž
    a
    b
    c
    č
    ć
    d
    dž
    đ
    e
    f
    g
    h
    i
    j
    k
    l
    lj
    m
    n
    nj
    o
    p
    r
    s
    t
    u
    v
    z
    
    Is this some kind of bug or are my settings not correct?

    Thanks


      May 03, 2021#2

      In Advanced - Settings or Configuration - File handling - Encoding, I selected the list item 1250 (ANSI - Central Europe) for the setting Default code page (for ANSI encoding) and the list item hr-HR, Croatian (Croatia) for the setting Locale (used for sort and time/date), then closed the configuration dialog with a click on button X. I was already using UE v28.10.0.0.

      First I copied the correctly sorted lines into a new file with encoding 1250 (ANSI - Central Europe) and executed the Sort with these options:

      Sort order: Ascending
      Remove duplicates (RD): checked with Where all selected keys match selected
      Ignore case: checked
      Numeric sort: unchecked, disabled
      Tab delimited sort: unchecked
      Custom delimited sort: unchecked
      Sort columns: Key 1 with 1 for Start column and -1 for End column (= entire line) and option RD checked; keys 2 to 4 all have 0 for the start/end columns and the remove duplicates option unchecked
      Use locale (slower): checked

      The sort result is the incorrect one. The lines are definitely not sorted correctly, although all sort settings are set correctly.
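For anyone reproducing this, it can help to look at the raw byte values the five accented letters have in code page 1250. The short Python sketch below only prints those values (the name `cp1250_bytes` is made up for illustration). It shows that š and ž, the two letters that end up misplaced, are the only ones encoded below 0xA0. Whether that has anything to do with the bug is a guess and is not confirmed anywhere in this thread.

```python
# Byte value of each accented Croatian letter in Windows code page 1250.
cp1250_bytes = {ch: ch.encode("cp1250")[0] for ch in "šžčćđ"}

for ch, value in cp1250_bytes.items():
    # Mark the letters that fall into the 0x80-0x9F range of the code page.
    note = "below 0xA0" if value < 0xA0 else ""
    print(ch, f"0x{value:02X}", note)
```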

      Next I converted the file from ASCII (in reality ANSI) to Unicode (UTF-16 LE) and ran the sort with the same options as before. This time the sort result was correct.

      I undid the sort, converted the file from Unicode/UTF-8 to UTF-8 (Unicode editing) (UTF-8) and ran the sort once again. The sort was correct again.

      I ran the sort once more on the UTF-8 encoded Unicode file, but this time with the option Use locale (slower) not checked, and got a different result:

      a
      b
      c
      d
      dž
      e
      f
      g
      h
      i
      j
      k
      l
      lj
      m
      n
      nj
      o
      p
      r
      s
      t
      u
      v
      z
      ć
      č
      đ
      š
      ž
      
      So the sort behavior on a Unicode encoded file definitely depends on the sort option Use locale (slower).
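The result without Use locale (slower) has the same order as a plain sort by Unicode code point. As an illustration only, and not a statement about how UltraEdit implements its sort, this Python sketch reproduces that order, because Python compares strings by code point without any locale rules:

```python
# Letters and digraphs from the word list in this thread.
letters = ["a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i", "j",
           "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s", "š", "t", "u", "v",
           "z", "ž"]

# Plain sorted() orders str values by Unicode code point.
codepoint_sorted = sorted(letters)

# The five accented letters all lie above U+007F, so they land after "z".
tail = codepoint_sorted[-5:]
print(tail)
print([f"U+{ord(ch):04X}" for ch in tail])
```

The accented letters come last because their code points (U+0107, U+010D, U+0111, U+0161, U+017E) are all greater than that of "z" (U+007A).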

      Conclusion: A locale sort with hr-HR does not work for an ANSI encoded file with code page 1250. This is an issue you should report by email to IDM support so that it gets fixed in a future version. A locale sort with hr-HR does work for a Unicode encoded file. As a workaround, the file can be converted to Unicode, sorted correctly, and then converted back to ANSI encoding using code page 1250.
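The same convert, sort, convert back idea can also be done outside the editor. The following is a minimal Python sketch, not UltraEdit functionality: `croatian_key` and `sort_cp1250_lines` are hypothetical helper names, and the alphabet order is taken from the manually sorted list in the first post. It is a simplification in one respect: it always treats "dž", "lj" and "nj" as single letters, which is wrong for words where the two characters merely happen to be adjacent.

```python
# Croatian alphabet in dictionary order; "dž", "lj", "nj" count as single letters.
ALPHABET = ["a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i", "j",
            "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s", "š", "t", "u", "v",
            "z", "ž"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def croatian_key(word: str) -> list:
    """Map a word to a list of alphabet ranks, matching digraphs first."""
    text = word.lower()  # ignore case, like the Ignore case sort option
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in RANK:  # digraph such as "lj"
            key.append(RANK[pair])
            i += 2
        else:
            # Characters outside the alphabet sort after it, by code point.
            key.append(RANK.get(text[i], len(ALPHABET) + ord(text[i])))
            i += 1
    return key


def sort_cp1250_lines(data: bytes) -> bytes:
    """Decode cp1250 bytes, sort the lines, and encode back to cp1250."""
    lines = data.decode("cp1250").splitlines()
    lines.sort(key=croatian_key)
    return "\n".join(lines).encode("cp1250")


# Small demonstration with a few scrambled letters.
raw = "ž\nš\nnj\nn\nlj\nđ\ndž\nd\nć\nč\nc".encode("cp1250")
result = sort_cp1250_lines(raw).decode("cp1250").split("\n")
print(result)
```

For real word lists, a proper collation library or the operating system's locale support would be a better choice than a hand-written key.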

      Note: The code page selected for the active ANSI encoded file does not affect the locale sort, as a code page is not precise enough: a single ANSI code page covers multiple locales (languages). For that reason the Locale configuration setting determines how a locale sort is done. For some unknown reason, with UltraEdit for Windows v28.00.0.114 and v28.10.0.0 this works correctly for hr-HR only on Unicode encoded files.
      Best regards from an UC/UE/UES for Windows user from Austria


        May 03, 2021#3

        Thank you, Mofi, for the detailed answer and the suggestion to use Unicode. I had tried UTF but not Unicode before.
        Glad to see you still provide help to users; I have not been here for at least a decade.