Hello,
I have a file which has the following structure
word space frequency
The file contains around 30,000 headwords, each followed by its frequency. The words have different lengths. What I need is a script that sorts the file by the length of the headword, smallest to largest, and then sorts each set of words of the same length by frequency, highest first.
At present I do this in Excel using the =Len(text) formula, but this is getting tedious.
I am giving below a sample input file
about 1903238
and 14291859
are 1487971
but 2994482
can 1915289
come 1541623
for 3296048
from 2207336
get 2081392
have 5930242
here 1558771
him 1571291
just 1756270
know 2221467
like 1845600
not 3091071
now 1453264
one 1988291
out 1812292
right 1410555
say 2345958
she 2123744
that 7834407
the 29962169
there 1957160
they 2684414
think 1398723
this 3814998
was 1399013
what 3327049
when 1465219
who 1543711
with 3983564
would 1346905
you 12345509
your 2329896
The expected output would be:
the 29962169
and 14291859
you 12345509
for 3296048
not 3091071
but 2994482
say 2345958
she 2123744
get 2081392
one 1988291
can 1915289
out 1812292
him 1571291
who 1543711
are 1487971
now 1453264
was 1399013
that 7834407
have 5930242
with 3983564
this 3814998
what 3327049
they 2684414
your 2329896
know 2221467
from 2207336
like 1845600
just 1756270
here 1558771
come 1541623
when 1465219
there 1957160
about 1903238
right 1410555
think 1398723
would 1346905
As you can see, the file has been sorted on length and then on frequency count value.
Any help given would avoid the tedium of loading the file each time in Excel. Many thanks in advance.
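
For illustration, the sort described above could be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions: each line holds a headword and an integer frequency separated by whitespace, the file is UTF-8, and the function name and the script name in the usage comment are made up for the example. Frequency is sorted highest first within each length, as in the expected output above.

```python
import sys


def sort_by_length_then_frequency(lines):
    """Sort 'word frequency' lines by word length (shortest first),
    then by frequency (highest first) within each length."""
    entries = []
    for line in lines:
        # Split on the last run of whitespace: headword on the left,
        # frequency on the right.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, freq = parts
        entries.append((word, int(freq)))
    # Negating the frequency gives descending order inside each length group.
    entries.sort(key=lambda entry: (len(entry[0]), -entry[1]))
    return [f"{word} {freq}" for word, freq in entries]


# Hypothetical usage: python sort_words.py input.txt > output.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as handle:
        for row in sort_by_length_then_frequency(handle):
            print(row)
```

Words with the same length and the same frequency keep their original relative order, since Python's sort is stable.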