Hi everyone,
This is my first post on an UltraEdit forum. I'm writing a script to take a long excerpt of Unicode text (in Japanese), break it into sentences, and load each sentence as a string into an array. I will process the array later.
Everything works just fine, up until the point when I check the contents of the array. It seems that some of the characters are getting corrupted. I'm guessing there's some sort of formatting issue, but I really don't know how to solve the problem.
First, here's my JavaScript code:
Code: Select all
// Ask the user how each sentence entry ends (typically, this is the Japanese period character)
var strEntryTerminator = UltraEdit.getString("What ends each sentence?",1);
// Report what the user entered in the output window
UltraEdit.outputWindow.write("Entry terminator is" + strEntryTerminator);
// Use DOS-style line terminator for Windows Notepad Unicode .txt files
var lineTerminator = "\r\n";
// Move to the top of the file and configure find/replace
UltraEdit.activeDocument.top();
UltraEdit.activeDocument.findReplace.mode=0; //Replace all in current file
UltraEdit.activeDocument.findReplace.replaceAll=true; //Replace all instances
// Remove all line terminators in the file, making the data one continuous line of text
UltraEdit.activeDocument.findReplace.replace(lineTerminator, "");
// Collapse repeated spaces (nine passes) so that there is a maximum of one space between text
var SpaceDeletion = 1;
while (SpaceDeletion < 10) {
    // Replace two spaces with one; each pass shortens the remaining runs
    UltraEdit.activeDocument.findReplace.replace("  ", " ");
    SpaceDeletion++;
}
// Replace a period plus a space with a period (removing leading spaces from entries)
UltraEdit.activeDocument.findReplace.replace(strEntryTerminator + " ", strEntryTerminator);
// Replace a period with a period plus a terminator, which will put each sentence on its own line
UltraEdit.activeDocument.findReplace.replace(strEntryTerminator, strEntryTerminator + lineTerminator);
// Select all data in the document
UltraEdit.activeDocument.selectAll();
// Selection becomes variable
var mySelection = UltraEdit.activeDocument.selection;
// Split lines at lineTerminator and load them into an array
var resultArr = mySelection.split(lineTerminator);
// Display total number of records in debug window
var resultLength = resultArr.length;
UltraEdit.outputWindow.write(resultLength + " total entries");
// Write array values in debug window
for (var i = 0; i < resultArr.length; i++) {
    UltraEdit.outputWindow.write("Value: " + i + " \"" + resultArr[i]);
}
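For comparison, the same steps can be sketched on a plain string, without going through find/replace in the document. This is only a rough equivalent of the script above, not what it currently does: `splitIntoSentences` is a hypothetical helper name, it assumes the terminator is a literal string such as 。, and unlike the script it also strips bare `\n` line endings.

```javascript
// Hypothetical helper: mirrors the find/replace steps above on a plain string.
function splitIntoSentences(text, terminator) {
    // Remove all line terminators (\r\n or bare \n), making one continuous line
    var flat = text.replace(/\r?\n/g, "");
    // Collapse runs of spaces into a single space
    flat = flat.replace(/ {2,}/g, " ");
    // Split on the terminator; a string argument is treated literally
    var parts = flat.split(terminator);
    var sentences = [];
    for (var i = 0; i < parts.length; i++) {
        // Drop the leading space left behind after a terminator
        var body = parts[i].replace(/^ +/, "");
        if (i < parts.length - 1) {
            // Every piece except the last was followed by a terminator
            sentences.push(body + terminator);
        } else if (body !== "") {
            // Keep trailing text that has no terminator
            sentences.push(body);
        }
    }
    return sentences;
}
```

In the script this could be called as `splitIntoSentences(mySelection, strEntryTerminator)`. Note that it does not produce the empty final entry that the find/replace version leaves behind.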
For example, if my input text is this:
Code: Select all
ブラックホール(英語:black hole)とは、きわめて高密度で大質量で、きわめて強い重力のために、物質だけでなく光さえも脱出できない天体のこと[1]。
きわめて強い重力のために光さえも抜け出せなくなった時空の領域、とされている。
「ブラック・ホール」(黒い穴)という名は、アメリカの物理学者ジョン・ホイーラーが1967年にこうした天体を呼ぶために編み出した[2]。
それ以前は「collapsar[3] コラプサー」(崩壊した星)などと呼ばれていた。
... the output window shows this:
Code: Select all
Running script: C:\Program Files\IDM Computer Solutions\UltraEdit\scripts\JapaneseDocumentToSRS.js
========================================================================================================
Entry terminator is縲・
5 total entries
Value: 0 "・スu・ス・ス・スb・スN・スz・ス[・ス・ス・スi・スp・ス・スFblack hole・スj・スニは、・ス・ス・ス・ス゚て搾ソス・ス・ス・スx・スナ大質・スハで、・ス・ス・ス・ス゚て具ソス・ス・ス・スd・スヘのゑソス・ス゚に、・ス・ス・ス・ス・ス・ス・ス・ス・スナなゑソス・ス・ス・ス・ス・ス・ス・ス・スE・スo・スナゑソス・スネゑソス・スV・スフのゑソス・ス・ス[1]・スB
Value: 1 "・ス・ス・ス・ス゚て具ソス・ス・ス・スd・スヘのゑソス・ス゚に鯉ソス・ス・ス・ス・ス・ス・ス・ス・ス・ス・スo・ス・ス・スネゑソス・スネゑソス・ス・ス・ス・ス・ス・スフ領茨ソスA・スニゑソス・ス・ストゑソス・ス・スB
Value: 2 "縲後ヶ繝ゥ繝・け繝サ繝帙・繝ォ縲搾シ磯サ偵>遨エ・峨→縺・≧蜷阪・縲√い繝。繝ェ繧ォ縺ョ迚ゥ逅・ュヲ閠・ず繝ァ繝ウ繝サ繝帙う繝シ繝ゥ繝シ縺・967蟷エ縺ォ縺薙≧縺励◆螟ゥ菴薙r蜻シ縺カ縺溘a縺ォ邱ィ縺ソ蜃コ縺励◆[2]縲・
Value: 3 "・ス・ス・ス・スネ前・スヘ「collapsar[3] ・スR・ス・ス・スv・スT・ス[・スv・スi・ス・ス・スオゑソス・ス・ス・スj・スネどと呼ばゑソストゑソス・ス・ス・スB
Value: 4 "
Any ideas?
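One way to narrow this down might be to log the code units of each string instead of the string itself, since plain ASCII should survive whatever encoding the output window uses. The 縲・ in the log resembles what the UTF-8 bytes of 。 look like when read as Shift-JIS, which would be consistent with an encoding mismatch on output rather than damaged data, though that is only a guess. The helper below is a minimal sketch; `toCodeUnits` is a hypothetical name, not part of the UltraEdit API.

```javascript
// Hypothetical debugging helper: renders each UTF-16 code unit of a string as U+XXXX.
function toCodeUnits(str) {
    var units = [];
    for (var i = 0; i < str.length; i++) {
        var hex = str.charCodeAt(i).toString(16).toUpperCase();
        // Left-pad to four hex digits
        while (hex.length < 4) {
            hex = "0" + hex;
        }
        units.push("U+" + hex);
    }
    return units.join(" ");
}
```

For example, `UltraEdit.outputWindow.write(toCodeUnits(strEntryTerminator));` should log `U+3002` if the terminator 。 is intact inside the script. If it does, the strings in the array are probably fine and the corruption happens when they are written to the output window.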