Transpose 2 rows with 40 comma delimited values into 40 rows with 2 columns

GoalSetter40s · PostSep 01, 2017#12017-09-01T06:59+00:00

I have a two line CSV file that was output from a form. The first line is a comma delimited header row containing 40 field labels and the second line is the corresponding comma delimited 40 value data record row. I want to transpose this into 40 rows with 2 columns. I need to do this on a regular basis, so if possible I'd like to run a macro or a script. What's the best way to do this?

Mofi · PostSep 01, 2017#22017-09-01T08:20+00:00

Here is a very quick written script for this task without comments, without error checking and without parsing the CSV data right in case of comma or newline characters within a double quoted value.

Code: Select all

if (UltraEdit.document.length > 0)  // Is any file opened?
{
   // Define environment for this script.
   UltraEdit.insertMode();
   if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
   else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();

   UltraEdit.activeDocument.selectAll();
   if (UltraEdit.activeDocument.isSel())  // Is the file not empty?
   {
      var sLineTerm;
      if (UltraEdit.activeDocument.lineTerminator < 1) sLineTerm = "\r\n";
      else if (UltraEdit.activeDocument.lineTerminator == 1) sLineTerm = "\n";
      else sLineTerm = "\r";

      var asLines = UltraEdit.activeDocument.selection.split(sLineTerm);
      if (asLines[asLines.length-1] == "") asLines.pop();

      var asData = new Array(asLines.length);
      for (var nLine = 0; nLine < asLines.length; nLine++)
      {
         var asValues = asLines[nLine].split(',');
         asData[nLine] = asValues;
      }

      var sData = "";
      for (var nRow = 0; nRow < asData[0].length; nRow++)
      {
         sData += asData[0][nRow];
         for (var nLine = 1; nLine < asLines.length; nLine++)
         {
            sData += ',' + asData[nLine][nRow];
         }
         sData += sLineTerm;
      }

      UltraEdit.activeDocument.write(sData);
   }
}

Example input data:

Code: Select all

Label1,Label2,Label3,Label4,Label5
Data1,Data2,Data3,Data4,Data5
DataA,DataB,DataC,DataD,DataE

Example output data:

Code: Select all

Label1,Data1,DataA
Label2,Data2,DataB
Label3,Data3,DataC
Label4,Data4,DataD
Label5,Data5,DataE

It handles all variations of data columns and data rows as long as the file is not too large as it can be seen on the input/output example.

GoalSetter40s · PostSep 01, 2017#32017-09-01T16:47+00:00

This is working great. Thank you very much!

There are a few fields that have commas within them, so they are delimited with double quotes. Can you please make this small edit for me?

Mofi · PostSep 02, 2017#42017-09-02T14:13+00:00

Here is the small edited script to parse CSV data correct according to explanation of Comma-Separated Values format.

Code: Select all

if (UltraEdit.document.length > 0)  // Is any file opened?
{
   // Define environment for this script.
   UltraEdit.insertMode();
   if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
   else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();

   UltraEdit.activeDocument.selectAll();
   if (UltraEdit.activeDocument.isSel())
   {
      // Parse the selected CSV data character by character to support also
      // 1 or more delimiters as well as newline characters within a double
      // quoted value. Additionally just store in memory on which indices in
      // CSV character stream the data values start and end instead of using
      // array of strings to avoid copying the strings in memory as much as
      // possible for more efficient processing of the CSV data.

      var bNewDataRow = false;     // For detection of a new data row.
      var bWithinQuotes = false;   // For detection of double quoted values.
      var anValueIndices = [0,0];  // Start and end indices of value in stream.
      var anRowValueIndices = [];  // Array of value indices pairs in a data row.
      var anAllValueIndices = [];  // Array of all data row arrays in entire file.

      var nCharIndex = 0;
      do
      {
         var nCharCode = UltraEdit.activeDocument.selection.charCodeAt(nCharIndex);
         if (!bWithinQuotes)
         {
            // Is the current character a comma?
            if (nCharCode == 0x2C)
            {
               anValueIndices[1] = nCharIndex;
               anRowValueIndices.push(anValueIndices);
               anValueIndices = [nCharIndex+1,nCharIndex+1];
            }
            // Is the current character a carriage return or a line-feed?
            else if ((nCharCode == 0x0D) || (nCharCode == 0x0A))
            {
               if (!bNewDataRow)
               {
                  bNewDataRow = true;
                  anValueIndices[1] = nCharIndex;
                  anRowValueIndices.push(anValueIndices);
                  anAllValueIndices.push(anRowValueIndices);
                  anRowValueIndices = [];
               }
               anValueIndices = [nCharIndex+1,nCharIndex+1];
               continue;
            }
            else if (nCharCode == 0x22)
            {
               bWithinQuotes = true;
            }
            bNewDataRow = false;
         }
         else  // Everything enclosed in double quotes must be interpreted
         {     // as part of data value including delimiter and newline
               // character until next double quote character is found.
            if (nCharCode == 0x22)
            {
               bWithinQuotes = false;
            }
         }
      }
      while (++nCharIndex < UltraEdit.activeDocument.selection.length);

      // Does the file not end with newline characters?
      if (!bNewDataRow)
      {
         anValueIndices[1] = nCharIndex;
         anRowValueIndices.push(anValueIndices);
         anAllValueIndices.push(anRowValueIndices);
      }

//      UltraEdit.outputWindow.showWindow(true);
//      UltraEdit.outputWindow.write("\nDump of array anAllValueIndices:\n");
//      var_dump(anAllValueIndices);
//      for (var nLine = 0; nLine < anAllValueIndices.length; nLine++)
//      {
//         var sPluralS = (anAllValueIndices[nLine].length != 1) ? "s" : " ";
//         var sLineData = "The " + anAllValueIndices[nLine].length.toString(10)+
//                         " value indices pair" + sPluralS + " of line " +
//                         (nLine+1).toString(10) + ": ";
//         for (var nPair = 0; nPair < anAllValueIndices[nLine].length; nPair++)
//         {
//            if (nPair != 0) sLineData += ", ";
//            sLineData += anAllValueIndices[nLine][nPair][0].toString(10);
//            sLineData += '-';
//            sLineData += anAllValueIndices[nLine][nPair][1].toString(10);
//         }
//         UltraEdit.outputWindow.write(sLineData);
//      }
//      UltraEdit.outputWindow.write("");

      // Determine line terminator type from active file.
      var sLineTerm;
      if (UltraEdit.activeDocument.lineTerminator < 1) sLineTerm = "\r\n";
      else if (UltraEdit.activeDocument.lineTerminator == 1) sLineTerm = "\n";
      else sLineTerm = "\r";

      // The number of values in first row is the number of data rows in output.
      // The number of rows in file is the number of data columns in output.
      var nRowCount = anAllValueIndices[0].length;
      var nColCount = anAllValueIndices.length;

      // Convert data rows to data columns and data columns to data rows.
      var sData = "";
      for (var nRow = 0; nRow < nRowCount; nRow++)
      {
         sData += UltraEdit.activeDocument.selection.substring(anAllValueIndices[0][nRow][0],anAllValueIndices[0][nRow][1]);
         for (var nCol = 1; nCol < nColCount; nCol++)
         {
            sData += ',';
            if (nRow < anAllValueIndices[nCol].length)
            {
               sData += UltraEdit.activeDocument.selection.substring(anAllValueIndices[nCol][nRow][0],anAllValueIndices[nCol][nRow][1]);
            }
         }
         sData += sLineTerm;
      }

      // Write the reformatted data to file with overwriting existing selection.
      UltraEdit.activeDocument.write(sData);

      if (bWithinQuotes)   // Special error correction for double quote
      {                    // begin with no matching ending double quote.
         UltraEdit.activeDocument.key("UP ARROW");
         UltraEdit.activeDocument.key("END");
         UltraEdit.activeDocument.write('"');
         UltraEdit.activeDocument.bottom();
      }
   }
}

Example input data:

Code: Select all

Value 1,Value 2,Value 3,Value 4
It,"Why do","1,5","15:44:00"
is,so many,20,"15:44:02"
so,"applications fail",,"15:44:32"
easy,parsing CSV,"Value with
newline characters in string.","15:44:48"
!,"data right?","42""","15:44:54"

Example output data:

Code: Select all

Value 1,It,is,so,easy,!
Value 2,"Why do",so many,"applications fail",parsing CSV,"data right?"
Value 3,"1,5",20,,"Value with
newline characters in string.","42"""
Value 4,"15:44:00","15:44:02","15:44:32","15:44:48","15:44:54"

Some notes:

The script is still not written for CSV format validation according to RFC 4180. Point 5 of the definition is not checked. This could be added, but the purpose of this this script is definitely not validation of CSV data in file. The script could not automatically fix such errors. It could only detect and report it, and then break processing the data as correction must be done by a person.
Application programmers writing code for importing data of a CSV file without being able to import values containing newline characters in double quoted values should take a look on this simply code. It's really a shame for every programmer writing a CSV import function when a well formatted CSV file according to RFC 4180 can't be imported correct.
The script contains a small code block commented out with command Comment Add. I have extra added this code although not needed by myself on writing the script if somebody is interested in how this code works and wants to better understand what all those arrays are for in script code.

Transpose 2 rows with 40 comma delimited values into 40 rows with 2 columns

Transpose 2 rows with 40 comma delimited values into 40 rows with 2 columns

Choose Display Mode