Mofi, I appreciate the time and attention you have given to my thread.
Your scripts are always of the highest quality, well written and well commented.
Thank you.
While testing, I noticed that it now correctly captures the "Copy of..." prefix of filenames.
Good.
But it does not seem to match words that come after the first ones in the filename.
For example, if I have:
Code: Select all
F:\tests\Satoshi Nakamoto created Bitcoin in 2009.docx
F:\tests\He mysteriously vanished in 2011.txt
F:\tests\makes the digital currency seem almost mystical.txt
F:\tests\No interference, no regulation and no surveillance.txt
F:\tests\Bitcoin guru mysteriously vanished.txt
F:\tests\Digital currency seem mystical.txt
The list of similar names would be:
Code: Select all
1. F:\tests\He mysteriously vanished in 2011.txt
2. F:\tests\Bitcoin guru mysteriously vanished.txt
3. F:\tests\makes the digital currency seem almost mystical.txt
4. F:\tests\Digital currency seem mystical.txt
(The numbers are added only for the explanation below.)
Because:
1 and 2 have "mysteriously vanished" in sequence, and
3 and 4 have "digital currency" in sequence.
If I run the latest version of the script, the result is empty.
It would be better if the script could find all similarities of 3 or fewer words in sequence, regardless of their position in the filename.
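To illustrate the kind of comparison I mean, here is only a rough sketch in plain JavaScript, not tested inside the actual script. The function names getWords and shareWordRun are made up for this example and are not part of your script. I am also assuming that two consecutive words is the minimum for a match, and that words are separated by spaces or simple punctuation.

```javascript
// Hypothetical helper (not from the original script): returns the
// lowercase words of a file name without its path and file extension.
function getWords(sFullName) {
    var sName = sFullName.replace(/^.*[\\\/]/, "");  // remove the path
    sName = sName.replace(/\.[^.]*$/, "");           // remove the extension
    // Assumption: words are separated by whitespace or simple punctuation.
    var asParts = sName.toLowerCase().split(/[\s,;._()\-]+/);
    var asWords = [];
    for (var i = 0; i < asParts.length; i++) {
        if (asParts[i].length > 0) asWords.push(asParts[i]);
    }
    return asWords;
}

// Hypothetical helper: true if both file names contain the same run of
// nMinWords consecutive words anywhere in the name (case-insensitive).
function shareWordRun(sName1, sName2, nMinWords) {
    var asWords1 = getWords(sName1);
    var asWords2 = getWords(sName2);
    for (var i = 0; i + nMinWords <= asWords1.length; i++) {
        var sRun = asWords1.slice(i, i + nMinWords).join(" ");
        for (var j = 0; j + nMinWords <= asWords2.length; j++) {
            if (asWords2.slice(j, j + nMinWords).join(" ") == sRun) {
                return true;
            }
        }
    }
    return false;
}
```

With nMinWords set to 2, shareWordRun should report a match for files 1 and 2 and for files 3 and 4 of my list, but not for any pairing with the "Satoshi Nakamoto..." or "No interference..." files.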
Reading the latest script, I could see that the core of the comparison is right here:
Code: Select all
for (nFileNameIndex = 0; nFileNameIndex < asFullFileNames.length; ++nFileNameIndex)
{
   var bDuplicate = false;
   var nCompareIndex = nFileNameIndex + 1;
   sFileName1 = asFileNames[nFileNameIndex];
   nLength1 = sFileName1.length;
   while (nCompareIndex < asFileNames.length)
   {
      sFileName2 = asFileNames[nCompareIndex];
      nLength2 = sFileName2.length;
      // The string comparisons are case-sensitive which is the reason
      // for using above the toLowerCase function for each file name.
      // The two file names are simply compared only if the length of
      // both file names is identical. Otherwise more smart file name
      // comparisons are done by comparing just the shorter file name
      // with the end of the longer file name and if no match the shorter
      // file name without its file extension with the beginning of the
      // longer file name to identify files with similar file names.
      // The file extensions must be identical in any case.
      if (nLength1 < nLength2)
      {
         if (sFileName1 != sFileName2.substr(nLength2 - nLength1))
         {
            sShorterFileName = GetFileName(sFileName1);
            if (sShorterFileName != sFileName2.substr(0,sShorterFileName.length))
            {
               nCompareIndex++;
               continue;
            }
            if (GetFileExt(sFileName1) != GetFileExt(sFileName2))
            {
               nCompareIndex++;
               continue;
            }
         }
      }
      else if (sFileName1 != sFileName2)
      {
         nCompareIndex++;
         continue;
      }
      // Is that the first time that the file name was found
      // a second time in the list of file names?
      if (!bDuplicate)
      {
         // Append the full file name to the array of duplicate
         // file names first.
         asDuplicateFiles.push(asFullFileNames[nFileNameIndex]);
         bDuplicate = true;
      }
      // Now append the similar or duplicate file name in another directory
      // and remove that file name from both file name arrays to avoid
      // processing similar and duplicate file names more than once.
      asDuplicateFiles.push(asFullFileNames[nCompareIndex]);
      asFullFileNames.splice(nCompareIndex,1);
      asFileNames.splice(nCompareIndex,1);
   }
}
Unfortunately, changing it is beyond my programming skills.
So, if you could improve it to find all similarities of 3 or fewer words in sequence, regardless of their position in the filename, that would be great.
I have attached a directory tree for testing below.
tests.rar (3.79 KiB)