Converting CSV data to fixed width data?

Converting CSV data to fixed width data?

12
Basic UserBasic User
12

    May 14, 2018#1

    Hi all,

    I have a list that needs to go to a warehouse. The warehouse has an odd column layout that goes something like this:
    • Data column 1: W with a width always just one character
    • Data column 2: Name with a width of fixed 30 characters
    • Data column 3: Street address with a width of fixed 30 characters
    • Data column 4: City with a width of fixed 20 characters
    • Data column 5: St with a width of fixed 2 characters
    • Data column 6: ZIP with a width of fixed 5 characters
    The end result looks like this:

    Code: Select all

    WNAME.                         1234 anyplace street.               Anytown.               GE55512
    I have my macro parsing the CSV so it looks right and columns go in the right order, but I don't see how to apply a column structure automatically within the macro.
    Hope this makes sense, but how do I make it so the text after the first comma starts at space #2 and the text after comma #2 starts at space #32, etc.?

    6,686585
    Grand MasterGrand Master
    6,686585

      May 15, 2018#2

      Do you know the command Convert to Fixed Column (in menu Column on using traditional menus)?

      This command is exactly what you need to convert the CSV file using comma as separator into a fixed width file with a fixed width for each data column.

      This command is not available as macro or scripting command most likely because of the number of options and the individual list of field widths.

      I have not yet coded an UltraEdit script as replacement for command Convert to Fixed Column. I have coded a script for the opposite command Convert to Character Delimited.

      A conversion from comma (character) separated to fixed width could be perhaps done with an UltraEdit macro with some regular expression replaces, but it would be better to code this task as UltraEdit script because of a script has support for string variables and mathematical operations to take individual lengths of the data into account on inserting the right number of spaces for each data on each row instead of the separator.

      However, use the command Convert to Fixed Column if you don't need to make this conversion automated in future without user interaction.
      Best regards from an UC/UE/UES for Windows user from Austria

      12
      Basic UserBasic User
      12

        May 15, 2018#3

        Hi Mofi,

        As always, thank you so much for your helpful information!

        I did find both the convert to fixed column (which works very well). I also found your fixed column to CSV example and tried to reverse it.  

        When you said:
        ... UltraEdit script because of a script has support for string variables and mathematical operations to take individual lengths of the data into account on inserting the right number of spaces for each data on each row instead of the separator.
        I didn't consider macros and scripts to be separate concepts. 😆 I am open to doing either, whichever gets the job done. The client wants this process to be automated. If you feel like messing with it, could you please give me a script example of how I would convert a CSV column1, column2, column3 where column1 can have 2 characters, #2 can have 20 and #3 can have 10?

        Code: Select all

        Col1(2spaces),Col2(20spaces),Col3(10spaces)
        State,Street address,last name
        
        end result would be:
        GA123Anyplacestreet   Smith     
        Thanks again!

        6,686585
        Grand MasterGrand Master
        6,686585

          May 16, 2018#4

          Here is the script replacement for command Convert to fixed column.

          Please read the comments above the variables at top of the script which have to be configured to customize the script for the CSV to fixed width conversion.

          Code: Select all

          // Character code value of separator. Use 0x2C for comma, 0x3B for
          // semicolon, 0x7C for vertical bar, or 0x09 for a horizontal tab.
          var nSeparatorCode = 0x2C;
          
          // Array of comma separated list of field widths. For example [5,10,20]
          // means first data column should have a width of 5 characters, second
          // of 10 characters and third of 20 characters. The number of values in
          // first line of CSV file determine the number of data columns. If the
          // field widths array is empty or contains less field width numbers than
          // values in first line of CSV file, the script automatically determines
          // the required width for each data column depending on longest value in
          // each data column on which the field width is not defined here at top.
          var anFieldWidths = [];
          
          // Number of spaces to add to automatic determined width of a data column.
          // For example if longest value in a data column has 22 characters and
          // the value for this variable is 3, the field width of this data column
          // is 25. The variable value must be 0 or greater. This variable is used
          // only for field widths automatic determined by the script, not for the
          // widths defined in the array above. The number of additional spaces is
          // never added to last field width.
          var nAutoWidthSpaces = 3;
          
          // Character as replacement for a carriage return and character as
          // replacement for a line-feed in a field value. Each string can be also
          // an empty string to remove all carriage returns and line-feeds from
          // field values. Also a string with more than one character is possible.
          var sCarriageReturn = "«";
          var sLineFeed = "»";
          
          // Write the reformatted lines into a new file or overwrite active file.
          var bNewFile = true;
          
          
          if (UltraEdit.document.length > 0)  // Is any file opened?
          {
             // Define environment for this script.
             UltraEdit.insertMode();
             if (typeof(UltraEdit.columnModeOff) == "function") UltraEdit.columnModeOff();
             else if (typeof(UltraEdit.activeDocument.columnModeOff) == "function") UltraEdit.activeDocument.columnModeOff();
          
             UltraEdit.activeDocument.selectAll();
             if (UltraEdit.activeDocument.isSel())
             {
                // Parse the selected CSV data character by character to support also
                // 1 or more delimiters as well as newline characters within a double
                // quoted value. Additionally just store in memory on which indices in
                // CSV character stream the data values start and end instead of using
                // array of strings to avoid copying the strings in memory as much as
                // possible for more efficient processing of the CSV data.
          
                var bShowOutput = false;     // Show output window after finishing conversion.
                var bNewDataRow = false;     // For detection of a new data row.
                var bWithinQuotes = false;   // For detection of double quoted values.
                var anValueIndices = [0,0];  // Start and end indices of value in stream.
                var anRowValueIndices = [];  // Array of value indices pairs in a data row.
                var anAllValueIndices = [];  // Array of all data row arrays in entire file.
                var nMultiLineValue = 0;     // Is there any multi-line value in CSV file?
          
                var nCharIndex = 0;
                do
                {
                   var nCharCode = UltraEdit.activeDocument.selection.charCodeAt(nCharIndex);
                   if (!bWithinQuotes)
                   {
                      // Is the current character the separator?
                      if (nCharCode == nSeparatorCode)
                      {
                         anValueIndices[1] = nCharIndex;
                         anRowValueIndices.push(anValueIndices);
                         anValueIndices = [nCharIndex+1,nCharIndex+1];
                      }
                      // Is the current character a carriage return or a line-feed?
                      else if ((nCharCode == 0x0D) || (nCharCode == 0x0A))
                      {
                         if (!bNewDataRow)
                         {
                            bNewDataRow = true;
                            anValueIndices[1] = nCharIndex;
                            anRowValueIndices.push(anValueIndices);
                            anAllValueIndices.push(anRowValueIndices);
                            anRowValueIndices = [];
                         }
                         anValueIndices = [nCharIndex+1,nCharIndex+1];
                         continue;
                      }
                      else if (nCharCode == 0x22)
                      {
                         bWithinQuotes = true;
                      }
                      bNewDataRow = false;
                   }
                   else  // Everything enclosed in double quotes must be interpreted
                   {     // as part of data value including delimiter and newline
                         // character until next double quote character is found.
                      if (nCharCode == 0x22)
                      {
                         bWithinQuotes = false;
                      }
                      else if (nCharCode == 0x0D)
                      {
                         nMultiLineValue |= 1;
                      }
                      else if (nCharCode == 0x0A)
                      {
                         nMultiLineValue |= 2;
                      }
                   }
                }
                while (++nCharIndex < UltraEdit.activeDocument.selection.length);
          
                // Does the file not end with newline characters?
                if (!bNewDataRow)
                {
                   anValueIndices[1] = nCharIndex;
                   anRowValueIndices.push(anValueIndices);
                   anAllValueIndices.push(anRowValueIndices);
                }
          
          //      UltraEdit.outputWindow.showWindow(true);
          //      UltraEdit.outputWindow.write("\nDump of array anAllValueIndices:\n");
          //      var_dump(anAllValueIndices);
          //      for (var nLine = 0; nLine < anAllValueIndices.length; nLine++)
          //      {
          //         var sPluralS = (anAllValueIndices[nLine].length != 1) ? "s" : " ";
          //         var sLineData = "The " + anAllValueIndices[nLine].length.toString(10)+
          //                         " value indices pair" + sPluralS + " of data row " +
          //                         (nLine+1).toString(10) + ": ";
          //         for (var nPair = 0; nPair < anAllValueIndices[nLine].length; nPair++)
          //         {
          //            if (nPair != 0) sLineData += ", ";
          //            sLineData += anAllValueIndices[nLine][nPair][0].toString(10);
          //            sLineData += '-';
          //            sLineData += anAllValueIndices[nLine][nPair][1].toString(10);
          //         }
          //         UltraEdit.outputWindow.write(sLineData);
          //      }
          //      UltraEdit.outputWindow.write("");
          
                // The number of values in first row is the number of data columns.
                // The number of rows in file is the number of data rows in output.
                var nColCount = anAllValueIndices[0].length;
                var nRowCount = anAllValueIndices.length;
          
                var nFieldWidth;
                var sFieldValue;
                var nCol = anFieldWidths.length;
          
                // Extend the array with field width numbers with automatic determined
                // field widths if the array with the number of field widths contains
                // less values than data values in first row of CSV file.
                while (nCol < nColCount)
                {
                   nFieldWidth = 0;
                   for (nRow = 0; nRow < nRowCount; nRow++)
                   {
                      if (nCol < anAllValueIndices[nRow].length)
                      {
                         sFieldValue = UltraEdit.activeDocument.selection.substring(anAllValueIndices[nRow][nCol][0],anAllValueIndices[nRow][nCol][1]);
          
                         sFieldValue = sFieldValue.replace(/^\"(.*)\"$/,"$1");
                         sFieldValue = sFieldValue.replace(/\"\"/g,"");
          
                         if (nMultiLineValue) // Is there any multi-line value in file?
                         {
                            sFieldValue = sFieldValue.replace(/\r/g,sCarriageReturn);
                            sFieldValue = sFieldValue.replace(/\n/g,sLineFeed);
                         }
                         if (sFieldValue.length > nFieldWidth)
                         {
                            nFieldWidth = sFieldValue.length;
                         }
                      }
                   }
          
                   if (nCol < (nColCount-1))  // Add number of additional spaces
                   {                          // except on last data column.
                      nFieldWidth += nAutoWidthSpaces;
                   }
                   anFieldWidths.push(nFieldWidth);
          
                   nCol++;
                   bShowOutput = true;
                   UltraEdit.outputWindow.write("Field width for data column " + nCol +
                                                " determined with " + nFieldWidth + " characters.");
                }
          
                // Build string with just spaces for each data column for fast
                // extending with spaces later each value to required length.
                var asFieldWidths = new Array(nColCount);
                for (nCol = 0; nCol < nColCount; nCol++)
                {
                   var sSpaces = "";
                   nFieldWidth = anFieldWidths[nCol];
                   while (nFieldWidth)
                   {
                      sSpaces += ' ';
                      nFieldWidth--;
                   }
                   asFieldWidths[nCol] = sSpaces;
                }
          
                // Convert the lines in CSV file to fixed width lines now.
                var sValues = (nColCount != 1) ? " values." : " value.";
                var asRows = new Array(nRowCount);
          
                for (nRow = 0; nRow < nRowCount; nRow++)
                {
                   // Verify the number of values in that row.
                   var nValueCount = nColCount;
                   if (nValueCount > anAllValueIndices[nRow].length)
                   {
                      bShowOutput = true;
                      nValueCount = anAllValueIndices[nRow].length;
                      UltraEdit.outputWindow.write("Warning: Line " + (nRow+1) +
                                                   " has just " + nValueCount +
                                                   " instead of " + nColCount + sValues);
                   }
                   else if (nValueCount < anAllValueIndices[nRow].length)
                   {
                      bShowOutput = true;
                      UltraEdit.outputWindow.write("Warning: Line " + (nRow+1) + " has " +
                                                   anAllValueIndices[nRow].length +
                                                   " instead of " + nColCount + sValues);
                   }
          
                   var sDataRow = "";
                   for (nCol = 0; nCol < nValueCount; nCol++)
                   {
                      nFieldWidth = anFieldWidths[nCol];
                      sFieldValue = UltraEdit.activeDocument.selection.substring(anAllValueIndices[nRow][nCol][0],anAllValueIndices[nRow][nCol][1]);
          
                      sFieldValue = sFieldValue.replace(/^\"(.*)\"$/,"$1");
                      sFieldValue = sFieldValue.replace(/\"\"/g,"");
          
                      if (nMultiLineValue) // Is there any multi-line value in file?
                      {
                         var sReplacedValue;
                         if (nMultiLineValue & 1)
                         {
                            sReplacedValue = sFieldValue.replace(/\r/g,sCarriageReturn);
                            if (sReplacedValue != sFieldValue)
                            {
                               bShowOutput = true;
                               UltraEdit.outputWindow.write("Replaced in data row " + (nRow+1) + " in value " +
                                                            (nCol+1) + " at least one carriage return by '" +
                                                            sCarriageReturn + "'.");
                               sFieldValue = sReplacedValue;
                            }
                         }
                         if (nMultiLineValue & 2)
                         {
                            sReplacedValue = sFieldValue.replace(/\n/g,sLineFeed);
                            if (sReplacedValue != sFieldValue)
                            {
                               bShowOutput = true;
                               UltraEdit.outputWindow.write("Replaced in data row " + (nRow+1) + " in value " +
                                                            (nCol+1) + " at least one line-feed by '" +
                                                            sLineFeed + "'.");
                               sFieldValue = sReplacedValue;
                            }
                         }
                      }
          
                      if (sFieldValue.length > nFieldWidth)
                      {
                         bShowOutput = true;
                         UltraEdit.outputWindow.write("Warning: Truncated in data row " + (nRow+1) +
                                                      " value " + (nCol+1) + " from " +
                                                      sFieldValue.length + " to " +
                                                      nFieldWidth + " character" +
                                                      ((nFieldWidth != 1) ? "s." : "."));
                         sFieldValue = (nFieldWidth > 0) ? sFieldValue.substr(0,nFieldWidth) : "";
                      }
          
                      if (sFieldValue.length < nFieldWidth)
                      {
                         sFieldValue += asFieldWidths[nCol].substr(sFieldValue.length);
                      }
          
                      sDataRow += sFieldValue;
                   }
          
                   while (nCol < nColCount)
                   {
                      // Add spaces for each value missing in that row.
                      sDataRow += asFieldWidths[nCol];
                      nCol++;
                   }
          
                   // Add this row to the array of reformatted rows.
                   asRows[nRow] = sDataRow;
                }
          
                // Determine line terminator type from active file.
                var sLineTerm;
                if (UltraEdit.activeDocument.lineTerminator < 1) sLineTerm = "\r\n";
                else if (UltraEdit.activeDocument.lineTerminator == 1) sLineTerm = "\n";
                else sLineTerm = "\r";
          
                if (bNewFile) // Write the reformatted lines into a new file?
                {
                   // Replace the file extension of active file by ".txt".
                   // This quick and dirty solution works only for files
                   // which really have a file extension.
                   var sNewFileName = UltraEdit.activeDocument.path.replace(/\.[^.\\]+$/,".txt");
                   // Append .txt if new file name is identical to active file name.
                   if (sNewFileName == UltraEdit.activeDocument.path) sNewFileName += ".txt";
          
                   // Cancel the selection in input file.
                   UltraEdit.activeDocument.top();
          
                   // Create a new file with same line ending type as input file.
                   UltraEdit.newFile();
                   UltraEdit.activeDocument.unixMacToDos();
                   if (sLineTerm == "\n") UltraEdit.activeDocument.dosToUnix();
                   else if (sLineTerm == "\r") UltraEdit.activeDocument.dosToMac();
          
                   UltraEdit.activeDocument.write(asRows.join(sLineTerm)+sLineTerm);
                   UltraEdit.saveAs(sNewFileName);
                   UltraEdit.closeFile(UltraEdit.activeDocument.path,2);
                }
                else
                {
                   // Overwrite the entire data in input file.
                   UltraEdit.activeDocument.write(asRows.join(sLineTerm)+sLineTerm);
                   // Close file with saving it.
                   UltraEdit.closeFile(UltraEdit.activeDocument.path,1);
                }
          
                if (bShowOutput) UltraEdit.outputWindow.showWindow(true);
             }
          }
          
          This script is not designed for very large CSV files as it loads entire file into memory for reformatting the lines.

          The CSV parsing part is from Transpose 2 rows with 40 comma delimited values into 40 rows with 2 columns.
          Best regards from an UC/UE/UES for Windows user from Austria

          12
          Basic UserBasic User
          12

            May 17, 2018#5

            Mofi, I don't know how you do it, but in my sample export file it worked first time. Thanks!

            I am going to work with it so it will add stuff on different lines for the final file. Basically the first line is address, then second is item detail.

            Thanks again!

            6,686585
            Grand MasterGrand Master
            6,686585

              Nov 08, 2018#6

              I updated the script in my previous post. It is better to first replace in a multi-line value all carriage returns and line-feeds by their replacement strings before truncating a too long value. The updated script is optimized also a bit more.
              Best regards from an UC/UE/UES for Windows user from Austria