- Replies
- Views
- Last post
- I have a whole set of regexes, given below, which I want to run all together instead of running each one separately. These allow me to check for mismatches within a dictionary and ensure that a wrong word is not mapped to the headword entry. I tried to write a macro with each regex listed and with a l...
- 2 Replies
- 2.3K Views
- Last post by dictdoc
Jan 04, 2014
- My problem is as described below. I am working on name variants, and one of the tools developed in C is a Metaphone Engine. Basically, the engine looks for name variants and conjoins them to provide similar homographs. The engine is designed for Indian languages but the examples given below are in Eng...
- 3 Replies
- 1.8K Views
- Last post by Ovg
Dec 22, 2013
- Hello, I am trying to sort a file in Urdu by the character with which each word ends. Each word is on a separate line. An example in English would help: lychee fruit banana apple pear cherry. I need the sort to be on the last letter of the word and then sorted recursively in reverse order: banana lychee This...
- 2 Replies
- 2.5K Views
- Last post by dictdoc
Oct 08, 2013
- I am working on an Urdu to Hindi dictionary and I have created the following file structure: Headword=Gloss1,Gloss2,Gloss3, i.e. glosses delimited by a comma. It so happens that in some cases (around 6000+ in a file of over 200,000+) the glosses are duplicated. Since this may be a recurrent phenomenon...
- 3 Replies
- 2.1K Views
- Last post by Mofi
Aug 27, 2013
- Hello, I am compiling an open-source stemmer dictionary for English and eventually for other Indian languages. The Engine which I have written has spewed out all lemmatised/expanded forms of the words: Nouns, Adjectives, Adverbs etc. Each set of expanded forms is separated by a hard return. Since ea...
- 2 Replies
- 2.6K Views
- Last post by dictdoc
Jul 11, 2013
- Hello, I have a file which has the following structure: word space frequency. The file has around 30,000 headwords, each along with its frequency. The words have different lengths. What I need is a script which can sort the file on length of the headword and once the file is sorted on length: smallest t...
- 1 Reply
- 2.2K Views
- Last post by Mofi
Mar 22, 2013
- Hello, splitting a sentence using the full-stop/question-mark/exclamation is a common device. Whereas the question-mark and exclamation do not pose too much of a problem, the full-stop as a sentence delimiter raises certain issues because of its varied use, as shown in the examples below: The temperat...
- 2 Replies
- 7.3K Views
- Last post by Mofi
Jul 30, 2012
- Dear all, I am working with names and I have a large file of names in which some words are written together (up to 4 or 5) and their corresponding single forms are also present in the word-list. An example would make this clear: annamarie mariechristine johnsmith johnjoseph smith john smith anna marie...
- 1 Reply
- 2.3K Views
- Last post by Mofi
Jun 10, 2012
- Hello, I have a database with the following structure: Headword in a foreign language followed by frequency (delimited with a comma) and eventually followed by a list of glosses in English, each gloss delimited with a comma. A small sample is given below: कुर्यवंशी,4,kuryanshi,kuryavanashi,kuryawanshi,...
- 2 Replies
- 3.4K Views
- Last post by dictdoc
Apr 30, 2012
- Hello, I have a large file in UTF-8 format with around 200 thousand plus strings which are in different scripts (code-blocks/code-pages): Latin, Arabic, Devanagari, Chinese, Japanese. I need to extract from the file only the following: All strings having basic Latin characters: 0021-007E, all strings ...
- 3 Replies
- 2.6K Views
- Last post by dictdoc
Feb 19, 2012
- Hello, my problem and the solution which I need are as follows: PROBLEM STATEMENT: I have two sets of files. File 1 is bi-lingual, i.e. it is English and another language with the structure: English=Foreign Language. File 2 is basically new words that I want to add. These are mono-lingual, i.e. only in En...
- 5 Replies
- 3.0K Views
- Last post by spottiswoad
Oct 29, 2011
- I am trying to generate NGrams for name analysis. I have two files open in UltraEdit. File 1 contains the basic NGram and File 2 the character that has to be added recursively to the NGram. The following example will make the case clear. File 1: bb bc bd be bf File 2: b c Expected output: bbb bcb bdb...
- 2 Replies
- 1.8K Views
- Last post by dictdoc
Apr 16, 2011
- Hello, I am a newbie to regex and would like a regex which will enable me to find a specific number of blank spaces on a line. I have UTF data in Hindi which has 2-3-4-5-6 words on the same line separated by blanks e.g. a b a b c a b c d a b c d e where a,b,c,d,e and so on stand for words. I would ...
- 5 Replies
- 3.2K Views
- Last post by pietzcker
Jan 24, 2010