Line breaks in TXT files?

    Sep 05, 2017#1

    I'm trying to get text to display properly when converting a TXT file to PDF. However, when I used UltraEdit to add line breaks inside the TXT file, the conversion results were even worse: there were even fewer line breaks in the PDF, despite there being more in the TXT. (I'd attach the files so you all can see the results, but the TXT and PDF extensions apparently aren't allowed.)

    Is there any way I can add line breaks and/or paragraph breaks in a TXT file using UltraEdit that will carry over when I convert from TXT to PDF?


      Sep 06, 2017#2

      You can compress files into a ZIP or RAR archive and attach the archive file to a post as long as the file is smaller than 500 KiB.

      It looks like you don't know anything about line terminators in text files. So I need to give you a small lesson. There are 3 types of line terminators/endings:
      1. DOS ... carriage return + line-feed is the standard for text files on Microsoft DOS and Microsoft Windows.
      2. UNIX ... line-feed only is the standard for text files on UNIX, Linux and Mac since OS X.
      3. MAC ... carriage return only is the standard for text files on Mac before OS X.
      Many Windows applications support only text files with carriage return + line-feed as line terminator. UltraEdit supports all 3 line terminator types and also detects them automatically on opening a text file. UltraEdit indicates the line terminator type of the active file in the status bar at the bottom of the main UltraEdit window.

      So pressing RETURN or ENTER inserts a line terminator matching the line terminator type of the active file. When converting the text file to a PDF file, it depends on the application used which line terminator types are supported. On Windows it is best to make sure the text file has DOS (Windows) line terminators, as many Windows applications support only that type of line terminator.
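To check which of the three terminator types a file actually contains, the raw bytes can be counted. The following is a minimal Python sketch; the function names and the sample bytes are made up for illustration, and it does not claim to reproduce how UltraEdit detects the type shown in its status bar.

```python
def count_terminators(data: bytes) -> dict:
    """Count each line terminator type in raw file bytes."""
    crlf = data.count(b"\r\n")
    # A bare LF is any LF that is not part of a CR+LF pair.
    lf_only = data.count(b"\n") - crlf
    # A bare CR is any CR that is not part of a CR+LF pair.
    cr_only = data.count(b"\r") - crlf
    return {"DOS": crlf, "UNIX": lf_only, "MAC": cr_only}


def dominant_type(data: bytes) -> str:
    """Return the most frequent terminator type (ties resolve to the first key)."""
    counts = count_terminators(data)
    return max(counts, key=counts.get)


# Hypothetical sample: three DOS-terminated lines and one UNIX-terminated line.
sample = b"first\r\nsecond\r\nthird\nfourth\r\n"
print(count_terminators(sample))  # {'DOS': 3, 'UNIX': 1, 'MAC': 0}
print(dominant_type(sample))      # DOS
```

For a real file, the bytes would come from `open(path, "rb").read()` so that Python does not translate the line endings while reading.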

      See also forum topic UE symbol explanations for line terminators / line endings.
      Best regards from an UC/UE/UES for Windows user from Austria


        Sep 06, 2017#3

        I read through both of your links. Both original.TXT and edit.TXT seem to have the same encoding for their line breaks (0D 0A). I also used the same program to convert them (PDF-TOOLS). However, one still converts very differently compared to the other. I'm not sure what aspect I'm missing about this. I've now attached all the files in a zip.
        Mofi wrote: Many Windows applications support only text files with carriage return + line-feed as line terminator. UltraEdit supports all 3 line terminator types and detects them also automatically on opening a text file. UltraEdit indicates the line terminator type of active file in status bar at bottom of main UltraEdit window.
        Do you mean the bar that says "1252 (ANSI - Latin 1)"?
        ue.zip (9.74 KiB)


          Sep 07, 2017#4

          In both files most lines end with hexadecimal 0D 0A, which is carriage return and line-feed. But in both files there are also lines ending with just 0A. So both text files are a mixture of DOS and UNIX. The PDF converter interprets only 0D 0A as a line termination and ignores a 0A that is not preceded by 0D. It would be really interesting to know how these two text files were created because if the creation was done with an application, the application has some bugs.
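The effect of mixed line endings on a converter that accepts only 0D 0A can be illustrated with a small Python sketch. The two helper functions and the sample bytes are hypothetical; how PDF-TOOLS actually splits lines internally is not known, this only mimics the observed behavior.

```python
import re


def lines_crlf_only(data: bytes) -> list:
    """Split as a tool would that treats only CR+LF (0D 0A) as a line ending."""
    return data.split(b"\r\n")


def lines_any_terminator(data: bytes) -> list:
    """Split on CR+LF, bare LF, or bare CR, as a tool supporting all three types would."""
    return re.split(rb"\r\n|\n|\r", data)


# Hypothetical mixed file: "two" ends with a bare line-feed (0A).
mixed = b"one\r\ntwo\nthree\r\nfour"

print(len(lines_crlf_only(mixed)))       # 3, because "two\nthree" stays one line
print(len(lines_any_terminator(mixed)))  # 4
```

So adding more line breaks in the editor does not help as long as some of them are stored as 0A only.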

          To solve this, configure UltraEdit the way I have used it for more than 15 years: convert every 0D not followed by 0A to 0D 0A, and every 0A not preceded by 0D also to 0D 0A, so that every file is loaded as a DOS/Windows text file.
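The same conversion rule can be sketched outside the editor in Python. This is only an illustration of the rule stated above, not UltraEdit's implementation; the function name and the sample bytes are assumptions.

```python
import re

# CR+LF is listed first so an existing pair is consumed as one unit and left
# unchanged; a bare CR or a bare LF is replaced by CR+LF.
_ANY_TERMINATOR = re.compile(rb"\r\n|\r|\n")


def to_dos(data: bytes) -> bytes:
    """Convert every line terminator in the raw bytes to CR+LF (0D 0A)."""
    return _ANY_TERMINATOR.sub(b"\r\n", data)


# Hypothetical sample with one bare LF, one CR+LF and one bare CR.
sample = b"a\nb\r\nc\rd"
converted = to_dos(sample)

print(converted)                     # b'a\r\nb\r\nc\r\nd'
print(len(converted) - len(sample))  # 2
```

Each bare terminator grows the data by exactly one byte, so the size difference equals the number of lines that did not already end with 0D 0A.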

          Open Advanced - Settings or Configuration - File Handling - DOS/Unix/Mac Handling and configure:
          1. Default file type for new files: DOS
          2. Unix/Mac file detection/conversion: Automatically convert to DOS format
          3. Only recognize DOS terminated lines (CR/LF) as new lines for editing: not checked
          4. Save file as input format (Unix/Mac/DOS): checked
          5. Status bar shows original line terminator format (on disk): checked
          And in the configuration at File Handling - Conversions, enable On paste convert line endings to destination type (Unix/Mac/DOS). This avoids pasting UNIX-terminated lines (ending with just 0A), copied to the clipboard in a different application, into a text file opened in UltraEdit with DOS-terminated lines. With this setting enabled, UltraEdit makes the same line ending conversion on every paste as on opening a file.

          Then opening original.txt results in loading it as a DOS/Windows text file, as most lines end with carriage return and line-feed. Inserting a space, immediately removing it with the BACKSPACE key and saving the modified file increases the file size from 3317 bytes to 3348 bytes. This means UltraEdit has added the missing carriage return to the 31 lines ending with just a line-feed. With these settings edit.txt is also opened as a DOS/Windows text file, and saving it after a dummy modification increases the file size from 3275 to 3286 bytes, i.e. 11 additional carriage returns are inserted.

          Now both text files contain only 0D 0A as line ending and can be converted to PDF, and the PDF files have the same number of lines as the text files in UltraEdit.

          And yes, the status bar is the bar at the bottom with the encoding selector; the box before it shows DOS, UNIX or MAC depending on the (main) line ending type of the active file.


            Sep 07, 2017#5

            I understand now, thank you!!! Who creates the hex codes? Are there any resources where I can learn about different codes and what they reference/do?
            Mofi wrote: It would be really interesting to know how these two text files were created because if the creation was done with an application, the application has some bugs.
            I got CSV dataset files from OKCUPID. I didn't create them, I only know that someone at OKCUPID did. I was simply handed them (figuratively speaking) for research projects. I'm just editing them (slightly) and organizing them for my own ease of use.


              Sep 07, 2017#6

              The processor of a computer understands just 0s and 1s. How all those 0s and 1s in a file are interpreted depends on standards. How text is encoded with 0s and 1s is explained by the last referenced power tip page. This power tip page is very important for that reason.
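As a small illustration of how characters map to bytes, the following Python sketch encodes the letter A followed by a DOS line terminator using code page 1252, the encoding shown in the status bar mentioned earlier in this thread. The sample text is an assumption chosen for illustration.

```python
text = "A\r\n"  # letter A, carriage return, line-feed

# Encode with Windows code page 1252 (ANSI - Latin 1).
encoded = text.encode("cp1252")

# Hexadecimal view, as a hex editor would show it.
print(encoded.hex(" "))  # 41 0d 0a

# The same three bytes as the 0s and 1s stored in the file.
bits = [format(byte, "08b") for byte in encoded]
print(bits)  # ['01000001', '00001101', '00001010']
```

The hexadecimal values 0D and 0A are simply a shorter way of writing those bit patterns; which character or control code a byte stands for is defined by the encoding standard.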

              CSV files can contain values enclosed in double quotes with 1 or more line breaks. Those line breaks inside a value are often encoded with just a line-feed as line ending. For example, Microsoft Excel creates such CSV files on export/save in CSV format. The data row itself ends with carriage return + line-feed.

              UltraEdit has the configuration setting Only recognize DOS terminated lines (CR/LF) as new lines, which can be enabled to get CSV files with data rows ending with CR+LF displayed as lines in text mode even when data values contain 1 or more line-feeds as line endings. Using this setting also requires selecting the configuration setting Never prompt to convert files to DOS to avoid an automatic conversion of all line-feeds without a preceding carriage return to CR+LF.

              With this special configuration, UltraEdit displays a line-feed without a carriage return using whatever glyph the font in use defines for this normally invisible control character. Most fonts display a line-feed as a rectangle in this mode.
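A CSV file of this kind can be reproduced with Python's standard csv module, whose default "excel" dialect ends each row with CR+LF and keeps a line-feed inside a quoted value as it is. This is a sketch with made-up data; it shows the file structure only and says nothing about how Excel or UltraEdit work internally.

```python
import csv
import io

# newline="" keeps the terminators written by the csv module untouched.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)  # default "excel" dialect: rows end with CR+LF
writer.writerow(["id", "comment"])
# Hypothetical value containing an embedded line-feed.
writer.writerow([1, "first line\nsecond line"])

raw = buffer.getvalue()
print(repr(raw))  # 'id,comment\r\n1,"first line\nsecond line"\r\n'

# Only CR+LF counts as a line ending: one line per data row.
rows_crlf_only = raw.split("\r\n")[:-1]  # drop the empty piece after the last CR+LF
print(len(rows_crlf_only))  # 2

# Every terminator counts: the quoted value is broken across two lines.
lines_any = raw.splitlines()
print(len(lines_any))  # 3
```

Treating only CR+LF as a line ending keeps each data row on a single line, which is the view the setting described above provides.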


                Sep 12, 2017#7

                Thanks so much for all the help and info.

                By the way, sorry for the late response. I wanted time to do more research. I not only read all the articles you linked me to, but many others as well. I have a much better understanding now. :)