Dear all,
I am working with names and have a large file of names in which some words are written together (up to 4 or 5), and their corresponding single forms are also present in the word list.
An example would make this clear:
annamarie
mariechristine
johnsmith
johnjosephsmith
john
smith
anna
marie
mary
christine

The program should split the words in the list, basing itself on the single forms which are there. Thus:

annamarie anna-marie
mariechristine marie christine
johnsmith john smith

In the case of the last compound, johnjosephsmith, since joseph is missing from the list, the program could suitably tag the missing element and show the word as:

johnjosephsmith john !joseph! smith

The script/macro would prove especially helpful in separating words in languages such as German, which have a large number of compounded words.
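
To make the requested behaviour concrete, here is a minimal sketch in Python (not the awk script mentioned in this post). It is only one possible reading of the description above: the function names segment, render and split_names are hypothetical, the parts are joined with spaces, and it assumes that an entry counts as a single form when no combination of other entries spells it out exactly. Characters that no single form covers are collected into a !...! tag.

```python
def segment(word, lexicon):
    """Cover `word` with lexicon entries, using as few unknown characters as possible.

    Returns (unknown_count, parts); each part is a (text, is_known) pair.
    The word itself is never used as a part, so it cannot "split" into itself.
    """
    n = len(word)
    best = [(0, [])]  # best[i] describes the cheapest cover of word[:i]
    for end in range(1, n + 1):
        # Default: treat the character just before `end` as unknown.
        cost, parts = best[end - 1]
        candidate = (cost + 1, parts + [(word[end - 1], False)])
        # Better: end with a known entry, if that leaves fewer unknown characters.
        for start in range(end):
            piece = word[start:end]
            if piece != word and piece in lexicon:
                cost, parts = best[start]
                if cost < candidate[0]:
                    candidate = (cost, parts + [(piece, True)])
        best.append(candidate)
    return best[n]


def render(parts):
    """Join parts with spaces, merging runs of unknown characters into !gap! tags."""
    out, gap = [], ""
    for text, known in parts:
        if known:
            if gap:
                out.append("!" + gap + "!")
                gap = ""
            out.append(text)
        else:
            gap += text
    if gap:
        out.append("!" + gap + "!")
    return " ".join(out)


def split_names(words):
    """Return {word: split form} for every listed word that contains known parts."""
    words = [w.strip() for w in words if w.strip()]
    everything = set(words)
    # Assumption: single forms are the entries that other entries cannot spell out exactly.
    atoms = {w for w in everything if segment(w, everything)[0] > 0}
    result = {}
    for word in words:
        unknown, parts = segment(word, atoms)
        if unknown < len(word):  # at least one known part was found
            result[word] = render(parts)
    return result


if __name__ == "__main__":
    sample = ["annamarie", "mariechristine", "johnsmith", "johnjosephsmith",
              "john", "smith", "anna", "marie", "mary", "christine"]
    for word, split in split_names(sample).items():
        print(word, split)
```

On the sample list this prints the four compounds with their parts, the last one as "johnjosephsmith john !joseph! smith", and leaves the single forms alone. One known weakness of this sketch: any listed word that merely contains a shorter listed word is treated as a compound, so a real list would probably need a minimum part length or a manual check of the tagged output.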
I have an awk script which does something similar, but it takes its words from an external dictionary, whereas here I need to bootstrap from the word list itself.
Any help given would be gratefully acknowledged.