Hi everyone,
This is my first post on an UltraEdit forum. I'm writing a script to take a long excerpt of Unicode text (in Japanese), break it into sentences, and load each sentence as a string into an array. I will process the array later.
Everything works just fine, up until the point when I check the contents of the array. It seems that some of the characters are getting corrupted. I'm guessing there's some sort of formatting issue, but I really don't know how to solve the problem.
First, here's my JavaScript code:
Code: Select all
// Ask the user how each sentence entry ends (typically, this is the Japanese period character)
var strEntryTerminator = UltraEdit.getString("What ends each sentence?",1);
// Report what the user inputted to the debug window
UltraEdit.outputWindow.write("Entry terminator is" + strEntryTerminator);
// Use DOS-style line terminator for Windows Notepad Unicode .txt files
var lineTerminator = "\r\n";
// Establish our search string for the loop condition
UltraEdit.activeDocument.findReplace.mode=0; //Replace all in current file
UltraEdit.activeDocument.findReplace.replaceAll=true; //Replace all instances
// Remove all line terminators in the file, making the data one continuous line of text
UltraEdit.activeDocument.findReplace.replace(lineTerminator, "");
// Remove multiple spaces (up to ten) so that there is a maximum of one space between text
var SpaceDeletion = 1;
while (SpaceDeletion < 10) {
   // Replace two consecutive spaces with a single space
   UltraEdit.activeDocument.findReplace.replace("  ", " ");
   SpaceDeletion++;
}
// Replace a period plus a space with a period (removing leading spaces from entries)
UltraEdit.activeDocument.findReplace.replace(strEntryTerminator + " ", strEntryTerminator);
// Replace a period with a period plus a terminator, which will put each sentence on its own line
UltraEdit.activeDocument.findReplace.replace(strEntryTerminator, strEntryTerminator + lineTerminator);
// Select all data in the document
UltraEdit.activeDocument.selectAll();
// Store the selected text in a variable
var mySelection = UltraEdit.activeDocument.selection;
// Split lines at lineTerminator and load them into an array
var resultArr = mySelection.split(lineTerminator);
// Display total number of records in debug window
var resultLength = resultArr.length;
UltraEdit.outputWindow.write(resultLength + " total entries");
// Write array values in debug window
for (var i = 0; i < resultArr.length; i++) {
   UltraEdit.outputWindow.write("Value: " + i + " \"" + resultArr[i]);
}
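In case it helps anyone reproduce this outside UltraEdit, here is a minimal sketch of the same steps applied to a plain JavaScript string. The function name `splitIntoSentences` and the sample text are made up for illustration, and it uses no UltraEdit calls at all.

```javascript
// Hypothetical helper (not part of my script): mirrors the find/replace steps on a plain string.
function splitIntoSentences(text, entryTerminator) {
  var lineTerminator = "\r\n";
  // Remove all line terminators so the text becomes one continuous line
  var flat = text.split(lineTerminator).join("");
  // Collapse runs of spaces down to a single space
  while (flat.indexOf("  ") !== -1) {
    flat = flat.split("  ").join(" ");
  }
  // Drop the space that follows a terminator
  flat = flat.split(entryTerminator + " ").join(entryTerminator);
  // Put each sentence on its own line, then split at the line terminator
  flat = flat.split(entryTerminator).join(entryTerminator + lineTerminator);
  return flat.split(lineTerminator);
}

// Made-up sample: two sentences, a stray space, and a line break in the middle
var sample = "これは一文目。 これは\r\n二文目。";
var sentences = splitIntoSentences(sample, "。");
for (var i = 0; i < sentences.length; i++) {
  console.log("Value: " + i + " \"" + sentences[i]);
}
```

On plain strings this leaves the characters intact and produces a trailing empty entry, like the empty last value in my output, so I suspect the splitting logic itself is not what corrupts the text.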
Everything seems to work just fine. However, some characters are corrupted in the process.
For example, if my input text is this:
Code: Select all
ブラックホール(英語:black hole)とは、きわめて高密度で大質量で、きわめて強い重力のために、物質だけでなく光さえも脱出できない天体のこと[1]。
それ以前は「collapsar[3] コラプサー」(崩壊した星)などと呼ばれていた。
... the output window shows this:
Code: Select all
Running script: C:\Program Files\IDM Computer Solutions\UltraEdit\scripts\JapaneseDocumentToSRS.js
Entry terminator is縲・
5 total entries
Value: 0 "・スu・ス・ス・スb・スN・スz・ス[・ス・ス・スi・スp・ス・スFblack hole・スj・スニは、・ス・ス・ス・ス゚て搾ソス・ス・ス・スx・スナ大質・スハで、・ス・ス・ス・ス゚て具ソス・ス・ス・スd・スヘのゑソス・ス゚に、・ス・ス・ス・ス・ス・ス・ス・ス・スナなゑソス・ス・ス・ス・ス・ス・ス・ス・スE・スo・スナゑソス・スネゑソス・スV・スフのゑソス・ス・ス[1]・スB
Value: 1 "・ス・ス・ス・ス゚て具ソス・ス・ス・スd・スヘのゑソス・ス゚に鯉ソス・ス・ス・ス・ス・ス・ス・ス・ス・ス・スo・ス・ス・スネゑソス・スネゑソス・ス・ス・ス・ス・ス・スフ領茨ソスA・スニゑソス・ス・ストゑソス・ス・スB
Value: 2 "縲後ヶ繝ゥ繝・け繝サ繝帙・繝ォ縲搾シ磯サ偵>遨エ・峨→縺・≧蜷阪・縲√い繝。繝ェ繧ォ縺ョ迚ゥ逅・ュヲ閠・ず繝ァ繝ウ繝サ繝帙う繝シ繝ゥ繝シ縺・967蟷エ縺ォ縺薙≧縺励◆螟ゥ菴薙r蜻シ縺カ縺溘a縺ォ邱ィ縺ソ蜃コ縺励◆[2]縲・
Value: 3 "・ス・ス・ス・スネ前・スヘ「collapsar[3] ・スR・ス・ス・スv・スT・ス[・スv・スi・ス・ス・スオゑソス・ス・ス・スj・スネどと呼ばゑソストゑソス・ス・ス・スB
Value: 4 "
Any ideas?
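One thing I noticed: the 縲 in the reported entry terminator looks to me like what you get when the UTF-8 bytes of the Japanese period are read as Shift-JIS, though that is only my guess at the cause. Here is a minimal sketch of that kind of mismatch in plain JavaScript. It assumes a runtime that provides `TextEncoder` and a `TextDecoder` that accepts the `shift_jis` label (a modern browser, or Node.js with full ICU); it is not meant to run inside the UltraEdit script engine.

```javascript
// Assumption: TextEncoder/TextDecoder exist and the decoder supports Shift_JIS.
var original = "。"; // the Japanese period I typed as the entry terminator

// Encode the character as UTF-8 bytes
var utf8Bytes = new TextEncoder().encode(original);

// Decode the same bytes with the wrong encoding (Shift_JIS)
var garbled = new TextDecoder("shift_jis").decode(utf8Bytes);

// Decode the same bytes with the matching encoding (UTF-8)
var restored = new TextDecoder("utf-8").decode(utf8Bytes);

console.log("Read as Shift_JIS: " + garbled);
console.log("Read as UTF-8 matches original: " + (restored === original));
```

If that is what is happening, the text is being written in one encoding and interpreted in another somewhere between the document and the script, but I don't know where.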