- Dear Mofi, the second solution worked brilliantly and very fast, and I can now segregate English and Devanagari. With a bit of tweaking I can allocate code blocks and pipe them out to different files, which could mean that the tool could easily segregate different code pages into different files. I will...
  Posted in Scripts
dictdoc
Feb 19, 2012
- Hello, I have a large file in UTF-8 format with more than 200 thousand strings in different scripts (code blocks/code pages): Latin, Arabic, Devanagari, Chinese, Japanese. I need to extract from the file only the following: all strings having basic Latin characters: 0021-007E, all strings ...
  Posted in Scripts
dictdoc
Feb 16, 2012
- I am trying to generate NGrams for name analysis. I have two files open in UltraEdit. File 1 contains the basic NGram and File 2 the character that has to be added recursively to the NGram. The following example will make the case clear. File 1: bb bc bd be bf. File 2: b c. Expected output: bbb bcb bdb...
  Posted in Scripts
dictdoc
Apr 15, 2011
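
The NGram example in the post above can be sketched in Python as follows. This is only an illustration, not the UltraEdit script from the thread. The function name `extend_ngrams` is hypothetical, and the output order beyond the first three items shown in the post (bbb bcb bdb) is an assumption: each character from File 2 is appended to every NGram from File 1 in turn.

```python
def extend_ngrams(ngrams, suffixes):
    """Append each suffix character to every n-gram.

    The outer loop runs over the suffixes, so all n-grams extended with the
    first suffix come before those extended with the second (assumed order).
    """
    return [ngram + suffix for suffix in suffixes for ngram in ngrams]


if __name__ == "__main__":
    file1 = ["bb", "bc", "bd", "be", "bf"]  # basic NGrams (File 1)
    file2 = ["b", "c"]                      # characters to add (File 2)
    print(" ".join(extend_ngrams(file1, file2)))
```

Feeding the result back in as the new File 1 would grow the NGrams by one more character on each pass.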
- Hello, my problem and the solution I need are as follows. PROBLEM STATEMENT: I have two sets of files. File 1 is bilingual, i.e. it is English and another language, with the structure: English=Foreign Language. File 2 is basically new words that I want to add. These are monolingual, i.e. only in En...
  Posted in Scripts
dictdoc
Jan 04, 2011
- Sorry, I guess I should have been a bit clearer. The structure of the dictionary is as follows: Headword,/Phonetic representation/, meanings and examples. It so happens that at times two dictionary entries are on the same line, as in the case below: Headword,/Phonetic representation/, meanings and examp...
  Posted in Find/Replace/Regular Expressions
dictdoc
Jan 24, 2010
- Hello, this is a continuation of my earlier regex query, where I wanted to know how to find the occurrence of two regular expression types within a line. I also have IPA (phonetic representations) in the dictionary, written as /IPA/. How do I handle a slash in the Perl regex given? ...
  Posted in Find/Replace/Regular Expressions
dictdoc
Jan 23, 2010
- Hello, I am a newbie to regex and would like a regex which will enable me to find a specific number of blank spaces on a line. I have UTF data in Hindi which has 2-3-4-5-6 words on the same line separated by blanks, e.g. "a b", "a b c", "a b c d", "a b c d e", where a, b, c, d, e and so on stand for words. I would ...
  Posted in Find/Replace/Regular Expressions
dictdoc
Jan 19, 2010
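
For the blank-counting question above, one possible approach is sketched below in Python rather than in UltraEdit's own regex engine. It assumes words are separated by exactly one space and that the goal is to pick out lines with exactly n blanks (n + 1 words); the function name `lines_with_blank_count` is hypothetical.

```python
import re


def lines_with_blank_count(lines, n):
    """Return the lines containing exactly n single-space separators.

    The pattern reads: one word, then exactly n repetitions of
    "one space followed by a word". \\S matches any non-whitespace
    character, so Devanagari words are handled like Latin ones.
    """
    pattern = re.compile(rf"\S+(?: \S+){{{n}}}")
    return [line for line in lines if pattern.fullmatch(line)]


if __name__ == "__main__":
    sample = ["a b", "a b c", "a b c d", "a b c d e", "राम घर गया"]
    print(lines_with_blank_count(sample, 2))
```

In a Perl-style search dialog the equivalent pattern for two blanks would be along the lines of `^\S+(?: \S+){2}$`.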