Alignment all data in columns

sdnbhd · Nov 25, 2021#12021-11-25T17:57+00:00

I have data which look something like this:

879899834569:940239023808243:Adelaide:Chloe::::::1/1/0001 12:00:00 AM::
667755634560:635298544373715:Godsway:Amedome::::Lajang::1/1/0001 12:00:00 AM::
787888664559:935534723298821:Nnan:Qwame Abrantepa:laki-laki:Kumasi:::Student:1/1/0001 12:00:00 AM::
657766654558:465344555646913:Fouseni:Gyui::Accra::::1/1/0001 12:00:00 AM::
458556734544:877556896580782:Augustina:Owusua:perempuan:::::1/1/0001 12:00:00 AM::
233233834536:789999742727084:Dramani:Ibrahim:laki-laki:::::1/1/0001 12:00:00 AM::

I need this to be arranged:

Code: Select all

COLUMNA       COLUMNB          COLUMNC    COLUMND          COLUMNE    COLUMNF           COLUMNG  COLUMNH   COLUMNI
879899834569  940239023808243  Adelaide   Chloe                                                            1/1/0001 12:00:00 AM
667755634560  635298544373715  Godsway    Amedome                                                Lajang    1/1/0001 12:00:00 AM
787888664559  935534723298821  Nnan       Qwame Abrantepa  laki-laki  Kumasi            Student            1/1/0001 12:00:00 AM
657766654558  465344555646913  Fouseni    Gyui                        Accra                                1/1/0001 12:00:00 AM
458556734544  877556896580782  Augustina  Owusua           perempuan                                       1/1/0001 12:00:00 AM
233233834536  789999742727084  Dramani    Ibrahim          laki-laki                                       1/1/0001 12:00:00 AM

I am a newbie and need help with either a script or macro to get the wanted format.

Thanks in advance.

Mofi · Nov 25, 2021#22021-11-25T20:24+00:00

It looks like the file is a CSV file using unusually a colon as value separator with two completely useless empty data columns at end of each row. The date/time values are formatted wrong as the time contains also colons and therefore the entire date/time value must be enclosed in " to be formatted valid in this CSV file.

UltraEdit has built-in the command Convert to fixed-width to scan a CSV file with a separator defined by the user for maximum length of all data values within all data columns in the CSV and then convert the CSV file to a fixed-width file. But this command can be used only after enclosing the date/time values in " (and removing the useless colons at the end).

A case-sensitive Perl regular expression replace all with the search string \b(\d+/\d+/\d+ [0-1][0-9]:[0-5][0-9]:[0-5][0-9] [AP]M):: and the replace string "\1" can be used to enclose the date/time value in double quotes and to remove the two colons at end of each row.

I wrote already once a script version of command Convert to fixed-width, see Converting CSV data to fixed width data? For that reason it was very easy for me to write the script for this task as there needed to be modified the character code value of the separator at top and insert the case-sensitive Perl regular expression replace all to enclose the date/time values in double quotes and remove the two colons at end of each line before using the already existing code.

Code: Select all

// Character code value of separator. Use 0x2C for comma, 0x3B for
// semicolon, 0x7C for vertical bar, or 0x09 for a horizontal tab.
var nSeparatorCode = 0x3A;  // Colon is unusually the value separator.

// Array of comma separated list of field widths. For example [5,10,20]
// means first data column should have a width of 5 characters, second
// of 10 characters and third of 20 characters. The number of values in
// first line of CSV file determine the number of data columns. If the
// field widths array is empty or contains less field width numbers than
// values in first line of CSV file, the script automatically determines
// the required width for each data column depending on longest value in
// each data column on which the field width is not defined here at top.
var anFieldWidths = [];

// Number of spaces to add to automatic determined width of a data column.
// For example if longest value in a data column has 22 characters and
// the value for this variable is 3, the field width of this data column
// is 25. The variable value must be 0 or greater. This variable is used
// only for field widths automatic determined by the script, not for the
// widths defined in the array above. The number of additional spaces is
// never added to last field width.
var nAutoWidthSpaces = 3;

// Character as replacement for a carriage return and character as
// replacement for a line-feed in a field value. Each string can be also
// an empty string to remove all carriage returns and line-feeds from
// field values. Also a string with more than one character is possible.
var sCarriageReturn = "«";
var sLineFeed = "»";

// Write the reformatted lines into a new file or overwrite active file.
var bNewFile = true;


if (UltraEdit.document.length > 0)  // Is any file opened?
{
   // Define environment for this script.
   UltraEdit.insertMode();
   if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
   else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();

   // Run first a case-sensitive Perl regular expression replace all
   // searching for date/time with one more date appended and split
   // that data value up into two data values if existing at all.
   UltraEdit.activeDocument.top();
   UltraEdit.perlReOn();
   UltraEdit.activeDocument.findReplace.mode=0;
   UltraEdit.activeDocument.findReplace.matchCase=true;
   UltraEdit.activeDocument.findReplace.matchWord=false;
   UltraEdit.activeDocument.findReplace.regExp=true;
   UltraEdit.activeDocument.findReplace.searchDown=true;
   if (typeof(UltraEdit.activeDocument.findReplace.searchInColumn) == "boolean")
   {
      UltraEdit.activeDocument.findReplace.searchInColumn=false;
   }
   UltraEdit.activeDocument.findReplace.preserveCase=false;
   UltraEdit.activeDocument.findReplace.replaceAll=true;
   UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;
   UltraEdit.activeDocument.findReplace.replace('"?\\b(\\d+/\\d+/\\d+ [0-1][0-9]:[0-5][0-9]:[0-5][0-9] [AP]M)[\\t ":]*?(\\d+/\\d+/\\d+):*$', '"\\1":\\2');
   if (UltraEdit.activeDocument.isFound())
   {
      UltraEdit.activeDocument.top();
      // Enclose the date/time values in double quotes and
      // replace all colons at end of each line by one colon.
      UltraEdit.activeDocument.findReplace.replace('"?\\b(\\d+/\\d+/\\d+ [0-1][0-9]:[0-5][0-9]:[0-5][0-9] [AP]M)"?:+$', '"\\1":');
   }
   else  // Different data with no additional date column.
   {
      // Enclose the date/time values in double quotes and
      // remove the last two colons at end of each line.
      UltraEdit.activeDocument.findReplace.replace('"?\\b(\\d+/\\d+/\\d+ [0-1][0-9]:[0-5][0-9]:[0-5][0-9] [AP]M)"?::', '"\\1"');
   }

   UltraEdit.activeDocument.selectAll();
   if (UltraEdit.activeDocument.isSel())
   {
      // Parse the selected CSV data character by character to support also
      // 1 or more delimiters as well as newline characters within a double
      // quoted value. Additionally just store in memory on which indices in
      // CSV character stream the data values start and end instead of using
      // array of strings to avoid copying the strings in memory as much as
      // possible for more efficient processing of the CSV data.

      var bShowOutput = false;     // Show output window after finishing conversion.
      var bNewDataRow = false;     // For detection of a new data row.
      var bWithinQuotes = false;   // For detection of double quoted values.
      var anValueIndices = [0,0];  // Start and end indices of value in stream.
      var anRowValueIndices = [];  // Array of value indices pairs in a data row.
      var anAllValueIndices = [];  // Array of all data row arrays in entire file.
      var nMultiLineValue = 0;     // Is there any multi-line value in CSV file?

      var nCharIndex = 0;
      do
      {
         var nCharCode = UltraEdit.activeDocument.selection.charCodeAt(nCharIndex);
         if (!bWithinQuotes)
         {
            // Is the current character the separator?
            if (nCharCode == nSeparatorCode)
            {
               anValueIndices[1] = nCharIndex;
               anRowValueIndices.push(anValueIndices);
               anValueIndices = [nCharIndex+1,nCharIndex+1];
            }
            // Is the current character a carriage return or a line-feed?
            else if ((nCharCode == 0x0D) || (nCharCode == 0x0A))
            {
               if (!bNewDataRow)
               {
                  bNewDataRow = true;
                  anValueIndices[1] = nCharIndex;
                  anRowValueIndices.push(anValueIndices);
                  anAllValueIndices.push(anRowValueIndices);
                  anRowValueIndices = [];
               }
               anValueIndices = [nCharIndex+1,nCharIndex+1];
               continue;
            }
            else if (nCharCode == 0x22)
            {
               bWithinQuotes = true;
            }
            bNewDataRow = false;
         }
         else  // Everything enclosed in double quotes must be interpreted
         {     // as part of data value including delimiter and newline
               // character until next double quote character is found.
            if (nCharCode == 0x22)
            {
               bWithinQuotes = false;
            }
            else if (nCharCode == 0x0D)
            {
               nMultiLineValue |= 1;
            }
            else if (nCharCode == 0x0A)
            {
               nMultiLineValue |= 2;
            }
         }
      }
      while (++nCharIndex < UltraEdit.activeDocument.selection.length);

      // Does the file not end with newline characters?
      if (!bNewDataRow)
      {
         anValueIndices[1] = nCharIndex;
         anRowValueIndices.push(anValueIndices);
         anAllValueIndices.push(anRowValueIndices);
      }

//      UltraEdit.outputWindow.showWindow(true);
//      UltraEdit.outputWindow.write("\nDump of array anAllValueIndices:\n");
//      var_dump(anAllValueIndices);
//      for (var nLine = 0; nLine < anAllValueIndices.length; nLine++)
//      {
//         var sPluralS = (anAllValueIndices[nLine].length != 1) ? "s" : " ";
//         var sLineData = "The " + anAllValueIndices[nLine].length.toString(10)+
//                         " value indices pair" + sPluralS + " of data row " +
//                         (nLine+1).toString(10) + ": ";
//         for (var nPair = 0; nPair < anAllValueIndices[nLine].length; nPair++)
//         {
//            if (nPair != 0) sLineData += ", ";
//            sLineData += anAllValueIndices[nLine][nPair][0].toString(10);
//            sLineData += '-';
//            sLineData += anAllValueIndices[nLine][nPair][1].toString(10);
//         }
//         UltraEdit.outputWindow.write(sLineData);
//      }
//      UltraEdit.outputWindow.write("");

      // The number of values in first row is the number of data columns.
      // The number of rows in file is the number of data rows in output.
      var nColCount = anAllValueIndices[0].length;
      var nRowCount = anAllValueIndices.length;

      var nFieldWidth;
      var sFieldValue;
      var nCol = anFieldWidths.length;

      // Extend the array with field width numbers with automatic determined
      // field widths if the array with the number of field widths contains
      // less values than data values in first row of CSV file.
      while (nCol < nColCount)
      {
         nFieldWidth = 0;
         for (nRow = 0; nRow < nRowCount; nRow++)
         {
            if (nCol < anAllValueIndices[nRow].length)
            {
               sFieldValue = UltraEdit.activeDocument.selection.substring(anAllValueIndices[nRow][nCol][0],anAllValueIndices[nRow][nCol][1]);

               sFieldValue = sFieldValue.replace(/^\"(.*)\"$/,"$1");
               sFieldValue = sFieldValue.replace(/\"\"/g,"");

               if (nMultiLineValue) // Is there any multi-line value in file?
               {
                  sFieldValue = sFieldValue.replace(/\r/g,sCarriageReturn);
                  sFieldValue = sFieldValue.replace(/\n/g,sLineFeed);
               }
               if (sFieldValue.length > nFieldWidth)
               {
                  nFieldWidth = sFieldValue.length;
               }
            }
         }

         if (nCol < (nColCount-1))  // Add number of additional spaces
         {                          // except on last data column.
            nFieldWidth += nAutoWidthSpaces;
         }
         anFieldWidths.push(nFieldWidth);

         nCol++;
         bShowOutput = true;
         UltraEdit.outputWindow.write("Field width for data column " + nCol +
                                      " determined with " + nFieldWidth + " characters.");
      }

      // Build string with just spaces for each data column for fast
      // extending with spaces later each value to required length.
      var asFieldWidths = new Array(nColCount);
      for (nCol = 0; nCol < nColCount; nCol++)
      {
         var sSpaces = "";
         nFieldWidth = anFieldWidths[nCol];
         while (nFieldWidth)
         {
            sSpaces += ' ';
            nFieldWidth--;
         }
         asFieldWidths[nCol] = sSpaces;
      }

      // Convert the lines in CSV file to fixed width lines now.
      var sValues = (nColCount != 1) ? " values." : " value.";
      var asRows = new Array(nRowCount);

      for (nRow = 0; nRow < nRowCount; nRow++)
      {
         // Verify the number of values in that row.
         var nValueCount = nColCount;
         if (nValueCount > anAllValueIndices[nRow].length)
         {
            bShowOutput = true;
            nValueCount = anAllValueIndices[nRow].length;
            UltraEdit.outputWindow.write("Warning: Line " + (nRow+1) +
                                         " has just " + nValueCount +
                                         " instead of " + nColCount + sValues);
         }
         else if (nValueCount < anAllValueIndices[nRow].length)
         {
            bShowOutput = true;
            UltraEdit.outputWindow.write("Warning: Line " + (nRow+1) + " has " +
                                         anAllValueIndices[nRow].length +
                                         " instead of " + nColCount + sValues);
         }

         var sDataRow = "";
         for (nCol = 0; nCol < nValueCount; nCol++)
         {
            nFieldWidth = anFieldWidths[nCol];
            sFieldValue = UltraEdit.activeDocument.selection.substring(anAllValueIndices[nRow][nCol][0],anAllValueIndices[nRow][nCol][1]);

            sFieldValue = sFieldValue.replace(/^\"(.*)\"$/,"$1");
            sFieldValue = sFieldValue.replace(/\"\"/g,"");

            if (nMultiLineValue) // Is there any multi-line value in file?
            {
               var sReplacedValue;
               if (nMultiLineValue & 1)
               {
                  sReplacedValue = sFieldValue.replace(/\r/g,sCarriageReturn);
                  if (sReplacedValue != sFieldValue)
                  {
                     bShowOutput = true;
                     UltraEdit.outputWindow.write("Replaced in data row " + (nRow+1) + " in value " +
                                                  (nCol+1) + " at least one carriage return by '" +
                                                  sCarriageReturn + "'.");
                     sFieldValue = sReplacedValue;
                  }
               }
               if (nMultiLineValue & 2)
               {
                  sReplacedValue = sFieldValue.replace(/\n/g,sLineFeed);
                  if (sReplacedValue != sFieldValue)
                  {
                     bShowOutput = true;
                     UltraEdit.outputWindow.write("Replaced in data row " + (nRow+1) + " in value " +
                                                  (nCol+1) + " at least one line-feed by '" +
                                                  sLineFeed + "'.");
                     sFieldValue = sReplacedValue;
                  }
               }
            }

            if (sFieldValue.length > nFieldWidth)
            {
               bShowOutput = true;
               UltraEdit.outputWindow.write("Warning: Truncated in data row " + (nRow+1) +
                                            " value " + (nCol+1) + " from " +
                                            sFieldValue.length + " to " +
                                            nFieldWidth + " character" +
                                            ((nFieldWidth != 1) ? "s." : "."));
               sFieldValue = (nFieldWidth > 0) ? sFieldValue.substr(0,nFieldWidth) : "";
            }

            if (sFieldValue.length < nFieldWidth)
            {
               sFieldValue += asFieldWidths[nCol].substr(sFieldValue.length);
            }

            sDataRow += sFieldValue;
         }

         while (nCol < nColCount)
         {
            // Add spaces for each value missing in that row.
            sDataRow += asFieldWidths[nCol];
            nCol++;
         }

         // Add this row to the array of reformatted rows.
         asRows[nRow] = sDataRow;
      }

      // Determine line terminator type from active file.
      var sLineTerm;
      if (UltraEdit.activeDocument.lineTerminator < 1) sLineTerm = "\r\n";
      else if (UltraEdit.activeDocument.lineTerminator == 1) sLineTerm = "\n";
      else sLineTerm = "\r";

      if (bNewFile) // Write the reformatted lines into a new file?
      {
         // Replace the file extension of active file by ".txt".
         // This quick and dirty solution works only for files
         // which really have a file extension.
         var sNewFileName = UltraEdit.activeDocument.path.replace(/\.[^.\\]+$/,".txt");
         // Append .txt if new file name is identical to active file name.
         if (sNewFileName == UltraEdit.activeDocument.path) sNewFileName += ".txt";

         // Cancel the selection in input file.
         UltraEdit.activeDocument.top();

         // Create a new file with same line ending type as input file.
         UltraEdit.newFile();
         UltraEdit.activeDocument.unixMacToDos();
         if (sLineTerm == "\n") UltraEdit.activeDocument.dosToUnix();
         else if (sLineTerm == "\r") UltraEdit.activeDocument.dosToMac();

         UltraEdit.activeDocument.write(asRows.join(sLineTerm)+sLineTerm);
         UltraEdit.saveAs(sNewFileName);
         UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
      }
      else
      {
         // Overwrite the entire data in input file.
         UltraEdit.activeDocument.write(asRows.join(sLineTerm)+sLineTerm);
         // Close file with saving it.
         UltraEdit.closeFile(UltraEdit.activeDocument.path,1);
      }

      if (bShowOutput) UltraEdit.outputWindow.showWindow(true);
   }
}

Please read the comments at top of the script if you want to make easily changes on the output.

The active file on execution of the script is first modified by the Perl regular expression replace all and then a new file is created with the data converted to fixed-width. The new file is saved with same name as active file into the same directory as active file with file extension replaced by .txt respectively with .txt appended if the active file has the file extension .txt. The new file with fixed-width data is closed automatically by the script and so just the active file remains opened with the modification made by the case-sensitive Perl regular expression replace all.

sdnbhd · Nov 26, 2021#32021-11-26T08:49+00:00

Hi @Mofi thank you very much for your help! Really appreciate it!

Below is the sample of my code, found got one different format type (first row)

Code: Select all

786579734497:768796678888747:Nana:Le-Roy:laki-laki:Kumasi:Cape:Lajang::"1/1/0001 12:00:00 AM"03/17/1996
879899834569:940239023808243:Adelaide:Chloe::::::1/1/0001 12:00:00 AM::
667755634560:635298544373715:Godsway:Amedome::::Lajang::1/1/0001 12:00:00 AM::
787888664559:935534723298821:Nnan:Qwame Abrantepa:laki-laki:Kumasi:::Student:1/1/0001 12:00:00 AM::
657766654558:465344555646913:Fouseni:Gyui::Accra::::1/1/0001 12:00:00 AM::
458556734544:877556896580782:Augustina:Owusua:perempuan:::::1/1/0001 12:00:00 AM::
233233834536:789999742727084:Dramani:Ibrahim:laki-laki:::::1/1/0001 12:00:00 AM::

When I executed with the script, it appear like this (with double quote and date):

Code: Select all

COLUMNA       COLUMNB          COLUMNC    COLUMND          COLUMNE    COLUMNF           COLUMNG  COLUMNH   COLUMNI
786579734497  768796678888747  Nana       Le-Roy           laki-laki  Kumasi            Cape     Lajang    "1/1/0001 12:00:00 AM"03/17/1996
879899834569  940239023808243  Adelaide   Chloe                                                            1/1/0001 12:00:00 AM
667755634560  635298544373715  Godsway    Amedome                                                Lajang    1/1/0001 12:00:00 AM
787888664559  935534723298821  Nnan       Qwame Abrantepa  laki-laki  Kumasi            Student            1/1/0001 12:00:00 AM
657766654558  465344555646913  Fouseni    Gyui                        Accra                                1/1/0001 12:00:00 AM
458556734544  877556896580782  Augustina  Owusua           perempuan                                       1/1/0001 12:00:00 AM
233233834536  789999742727084  Dramani    Ibrahim          laki-laki                                       1/1/0001 12:00:00 AM

My desired output as below:

Code: Select all

COLUMNA       COLUMNB          COLUMNC    COLUMND          COLUMNE    COLUMNF           COLUMNG  COLUMNH   COLUMNI               COLUMNJ
786579734497  768796678888747  Nana       Le-Roy           laki-laki  Kumasi            Cape     Lajang    1/1/0001 12:00:00 AM  03/17/1996
879899834569  940239023808243  Adelaide   Chloe                                                            1/1/0001 12:00:00 AM
667755634560  635298544373715  Godsway    Amedome                                                Lajang    1/1/0001 12:00:00 AM
787888664559  935534723298821  Nnan       Qwame Abrantepa  laki-laki  Kumasi            Student            1/1/0001 12:00:00 AM
657766654558  465344555646913  Fouseni    Gyui                        Accra                                1/1/0001 12:00:00 AM
458556734544  877556896580782  Augustina  Owusua           perempuan                                       1/1/0001 12:00:00 AM
233233834536  789999742727084  Dramani    Ibrahim          laki-laki                                       1/1/0001 12:00:00 AM

Seeking for your help on this, sorry I am very new to this. Appreciate for your help, thanks

Thanks a lot for your help! But this doesn't seems perform for large file?

Mofi · Nov 30, 2021#42021-11-30T08:20+00:00

I updated the script code in my first post to support also the second variant with one more data column with a date value in addition to first variant.

It is right that the script is not designed for large CSV data on which it is not possible to load the entire data into memory of JavaScript core interpreter inside UltraEdit. It was not anything written about the amount of data to handle by the script.

It is possible to use just the first part of the script with the Perl regular expression replaces to reformat the two variants of the CSV file with colon as separator to valid CSV files and use next manually the built-in command Convert to fixed-width of UltraEdit which can be used also on very large files with hundreds of MBs or even some GB. In this case the script would be just:

Code: Select all

if (UltraEdit.document.length > 0)  // Is any file opened?
{
   // Define environment for this script.
   UltraEdit.insertMode();

   if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
   else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();

   // Run first a case-sensitive Perl regular expression replace all
   // searching for date/time with one more date appended and split
   // that data value up into two data values if existing at all.
   UltraEdit.activeDocument.top();
   UltraEdit.perlReOn();
   UltraEdit.activeDocument.findReplace.mode=0;
   UltraEdit.activeDocument.findReplace.matchCase=true;
   UltraEdit.activeDocument.findReplace.matchWord=false;
   UltraEdit.activeDocument.findReplace.regExp=true;
   UltraEdit.activeDocument.findReplace.searchDown=true;
   if (typeof(UltraEdit.activeDocument.findReplace.searchInColumn) == "boolean")
   {
      UltraEdit.activeDocument.findReplace.searchInColumn=false;
   }
   UltraEdit.activeDocument.findReplace.preserveCase=false;
   UltraEdit.activeDocument.findReplace.replaceAll=true;
   UltraEdit.activeDocument.findReplace.replaceInAllOpen=false;
   UltraEdit.activeDocument.findReplace.replace('"?\\b(\\d+/\\d+/\\d+ [0-1][0-9]:[0-5][0-9]:[0-5][0-9] [AP]M)[\\t ":]*?(\\d+/\\d+/\\d+):*$', '"\\1":\\2');
   if (UltraEdit.activeDocument.isFound())
   {
      UltraEdit.activeDocument.top();
      // Enclose the date/time values in double quotes and
      // replace all colons at end of each line by one colon.
      UltraEdit.activeDocument.findReplace.replace('"?\\b(\\d+/\\d+/\\d+ [0-1][0-9]:[0-5][0-9]:[0-5][0-9] [AP]M)"?:+$', '"\\1":');
   }
   else  // Different data with no additional date column.
   {
      // Enclose the date/time values in double quotes and
      // remove the last two colons at end of each line.
      UltraEdit.activeDocument.findReplace.replace('"?\\b(\\d+/\\d+/\\d+ [0-1][0-9]:[0-5][0-9]:[0-5][0-9] [AP]M)"?::', '"\\1"');
   }
}

The complete script as it is currently coded is optimized for reformatting files with just some MB done completely in internal memory of JavaScript core interpreter.