Hello,
I have a large UTF-8 file with more than 200 thousand strings in different scripts (code blocks): Latin, Arabic, Devanagari, Chinese and Japanese.
I need to extract only the following from the file:
all strings made up of Basic Latin characters (U+0021 to U+007E),
all strings in the Devanagari range (U+0900 to U+097F),
and store them in two separate files.
Many thanks in advance. I have never tried character identification in UltraEdit, hence the request. At present I sort the file (around 2,000,000 records), which is very painful.
A sample file is given below:
wanavati
wanowrie
wapcos
warada
warangal
ward no
warishnagar
warispura
warlees
warnali
waroda
warshiya
warud
wasdi
washermenpet
washim
wasimal
wathar
wayangwade
wazirpur
webworld
wecors
wester
westgodavari
wests
wharnsby
wheelers
whitefield
whitefields
winchester
wind
windermere
winze
wireles
wkshp
वनवाडी
वयांगवाडे
वरोडा
वर्कशॉप
वसडी
वसिमल
वानावती
वायरलेस
वारंगल
वारधा
वारिसनगर
वारिसपूरा
वारूड
वार्नाली
वार्शिया
वाशरमेनपेट
वाशिम
विंड
विनचेस्टर
विन्झे
विन्डरमियर
वॅस्टगोदावरी
वॅस्टर
वेकोर्स
वेधर
वेपकोस
वेबवर्ल्ड
वेस्ट्स
वॉर्ड नं
व्हाइटफील्ड्स
व्हाइटफ़ील्ड
व्हार्न्सबी
व्हीलर्ज़
वज़ीरपुर
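
For reference, the extraction described above (one output file for Basic Latin lines, one for Devanagari lines) could be sketched outside UltraEdit roughly as follows. This is only a minimal Python sketch, not an UltraEdit script. The function names `classify` and `split_file` are made up for illustration. It assumes one string per line, and it also accepts a plain space (U+0020) inside a string, because sample entries such as "ward no" contain one even though the stated range starts at U+0021. Lines that mix scripts, or that contain characters outside both ranges (for example a zero-width joiner), are skipped.

```python
import re
from typing import Optional, Tuple

# Whole-line patterns built from the ranges in the post.
# Assumption: a plain space is allowed inside a string (e.g. "ward no").
LATIN_LINE = re.compile(r"[\u0021-\u007E ]+")
DEVANAGARI_LINE = re.compile(r"[\u0900-\u097F ]+")


def classify(line: str) -> Optional[str]:
    """Return "latin", "devanagari", or None for any other or mixed-script line."""
    text = line.strip()  # drop the line ending and surrounding whitespace
    if not text:
        return None
    if LATIN_LINE.fullmatch(text):
        return "latin"
    if DEVANAGARI_LINE.fullmatch(text):
        return "devanagari"
    return None


def split_file(src: str, latin_out: str, devanagari_out: str) -> Tuple[int, int]:
    """Copy matching lines of src into the two output files.

    Returns (latin_count, devanagari_count). The input is read line by
    line, so a file with millions of records is never loaded whole.
    """
    counts = {"latin": 0, "devanagari": 0}
    # "utf-8-sig" reads UTF-8 with or without a leading byte order mark.
    with open(src, encoding="utf-8-sig") as fin, \
         open(latin_out, "w", encoding="utf-8") as flat, \
         open(devanagari_out, "w", encoding="utf-8") as fdev:
        targets = {"latin": flat, "devanagari": fdev}
        for line in fin:
            kind = classify(line)
            if kind is not None:
                targets[kind].write(line.strip() + "\n")
                counts[kind] += 1
    return counts["latin"], counts["devanagari"]
```

A call such as `split_file("input.txt", "latin.txt", "devanagari.txt")` (hypothetical file names) would write the two files in a single pass and keep the original line order, so no sorting is needed. The same two character classes, anchored to the whole line, could presumably be tried in a regular expression search inside the editor as well, though the exact escape syntax depends on the regex engine selected there.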