How to remove line breaks in a numbered list?

    Oct 28, 2014#1

    I converted a PDF into a text file, and as a result most numbered list items are broken across several lines. I would like each numbered item to be on a single line, regardless of its length, because I want to insert the lines as records into a database.

    Example: I want to convert

    Code:

    1. some text
     and more text
     and a little more
    2. more stuff
    3. other stuff
    and some
    more other stuff

    to

    Code:

    1. some text and more text and a little more
    2. more stuff
    3. other stuff and some more other stuff

    Any help would be greatly appreciated.

      Oct 29, 2014#2

      Search->Replace
      Check Regular expressions: and choose Perl
      Check Replace all from top of file

      Step One
      Find what: \r\n
      Replace with: a single space character (ASCII code 32)

      Step Two
      Check Regular expressions: and choose Perl
      Check Replace all from top of file

      Find what: (.)(\d+\.)
      Replace with: $1\r\n$2
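
For readers who want to try the two steps outside the editor, here is a minimal Python sketch of the same idea. It is only an approximation under stated assumptions: Python's `re` module stands in for the editor's Perl regular expression engine, the input uses DOS line terminators, and the function name `join_two_step` and the sample text are made up for illustration.

```python
import re

def join_two_step(text: str) -> str:
    """Hypothetical helper mirroring the two replace steps described above."""
    # Step one: replace every DOS line terminator with a single space.
    flat = re.sub(r"\r\n", " ", text)
    # Step two: put a line terminator back in front of every
    # "one or more digits followed by a dot".
    return re.sub(r"(.)(\d+\.)", r"\1\r\n\2", flat)

# Sample modelled on the list in the question (assumed DOS line endings).
sample = (
    "1. some text\r\n and more text\r\n and a little more\r\n"
    "2. more stuff\r\n"
    "3. other stuff\r\nand some\r\nmore other stuff"
)

result = join_two_step(sample)

# The raw result keeps doubled and trailing spaces, so collapse them for display.
tidy = [" ".join(line.split()) for line in result.splitlines()]
for line in tidy:
    print(line)
```

Two things are worth noting in this sketch: the character captured by `(.)` is usually the space left over from step one, so lines end with a trailing space, and any digits followed by a dot inside the text (a decimal number, for example) also start a new line.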
      It's impossible to lead us astray for we don't care even to choose the way.

        Oct 29, 2014#3

        Perhaps a better one-step solution: run a Perl regular expression replace searching for [\t ]*\r?\n[\t ]*(?![\r\n\d]) and using a single space as the replace string.
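
The one-step replace can be tried outside the editor with a short Python sketch. This assumes Python's `re` module behaves like the editor's Perl engine for this pattern; the names `JOIN_WRAPPED` and `join_wrapped_lines` and the sample text are hypothetical.

```python
import re

# Search string from the post above, compiled once.
JOIN_WRAPPED = re.compile(r"[\t ]*\r?\n[\t ]*(?![\r\n\d])")

def join_wrapped_lines(text: str) -> str:
    """Hypothetical helper: replace each matching line break with one space."""
    return JOIN_WRAPPED.sub(" ", text)

# Sample modelled on the list in the question (no terminator after the last line).
sample = (
    "1. some text\n and more text\n and a little more\n"
    "2. more stuff\n"
    "3. other stuff\nand some\nmore other stuff"
)

joined = join_wrapped_lines(sample)
print(joined)
```

In Python, a line terminator at the very end of the input is also replaced by a space, because the negative lookahead succeeds when no character follows; the sample therefore has no final terminator.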
        Best regards from an UC/UE/UES for Windows user from Austria

          Oct 30, 2014#4

          Mofi wrote: A perhaps better one step solution.
          Yes, indeed!!! Could you please briefly explain "[\t ]*\r?\n[\t ]*(?![\r\n\d])"? I almost understand this regex, but I would like full clarity.
          Thank you for your help!

            Oct 30, 2014#5

            [\t ]* ... finds zero or more tabs or spaces at the end of a line, to the left of

            \r?\n ... an optional carriage return and a line-feed (= DOS or UNIX line terminator, but not the carriage-return-only terminator of old MAC files), followed by

            [\t ]* ... zero or more tabs or spaces at the beginning of the next line,

            (?![\r\n\d]) ... where the next line does NOT start with a carriage return, a line-feed or a digit after the optional tabs and spaces. This is a negative lookahead.
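
The same breakdown can be written as an annotated pattern. The sketch below uses Python's `re.VERBOSE` mode purely as an illustration; the name `PATTERN` is hypothetical, and the editor itself takes the compact one-line form.

```python
import re

# In verbose mode, whitespace and "#" comments outside character classes are ignored.
PATTERN = re.compile(
    r"""
    [\t ]*          # zero or more tabs or spaces at the end of the current line
    \r?\n           # optional carriage return plus line-feed (DOS or UNIX terminator)
    [\t ]*          # zero or more tabs or spaces at the beginning of the next line
    (?![\r\n\d])    # negative lookahead: next character is not CR, LF or a digit
    """,
    re.VERBOSE,
)

# Trailing spaces, the CRLF and the leading tab collapse into one space.
print(repr(PATTERN.sub(" ", "a  \r\n\tb")))

# A lone carriage return (old Mac style) is not treated as a line terminator.
print(repr(PATTERN.sub(" ", "a\rb")))

# A line break in front of a digit is left alone.
print(repr(PATTERN.sub(" ", "a\n2. b")))
```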

            This expression does not work for something like

            Code:

            4. I have a
             6 years old son.
            5. His name is John.

            The second line also starts with a digit. In Perl regular expressions only a lookbehind must be of fixed length; a lookahead may contain a quantifier. Therefore a negative lookahead such as (?!\d+\.) can be used to check if there is a dot after a number with 1 or more digits, which gives the search string [\t ]*\r?\n[\t ]*(?![\r\n])(?!\d+\.)

            It would also be possible to use multiple negative lookaheads of fixed length. So another search string for this task would be [\t ]*\r?\n[\t ]*(?![\r\n])(?!\d\.)(?!\d\d\.)(?!\d\d\d\.) if the maximum number of the list (= maximum number of digits, here 3) is known.
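
The difference between the search strings can be checked with a small Python sketch. It assumes Python's `re` module as a stand-in for the editor's engine; the names `SIMPLE`, `STRICT` and `QUANTIFIED` are hypothetical, and the continuation line is written without a leading space so that it really starts with the digit.

```python
import re

# First search string: refuses to join whenever the next line starts with a digit.
SIMPLE = re.compile(r"[\t ]*\r?\n[\t ]*(?![\r\n\d])")

# Multiple fixed-length lookaheads: refuses to join only before "1." to "999.".
STRICT = re.compile(r"[\t ]*\r?\n[\t ]*(?![\r\n])(?!\d\.)(?!\d\d\.)(?!\d\d\d\.)")

# Python's re accepts a quantifier inside a lookahead, so this variant
# handles list numbers of any length.
QUANTIFIED = re.compile(r"[\t ]*\r?\n[\t ]*(?![\r\n])(?!\d+\.)")

sample = "4. I have a\n6 years old son.\n5. His name is John."

simple_result = SIMPLE.sub(" ", sample)   # leaves the wrapped line broken
strict_result = STRICT.sub(" ", sample)   # joins it, keeps item 5 separate

print(simple_result)
print(strict_result)
print(QUANTIFIED.sub(" ", sample))
```

One observation from this sketch: if the continuation line starts with a space before the digit, the second `[\t ]*` can backtrack to match nothing, the lookahead then sees the space, and even the first pattern joins the line (leaving a doubled space). The same backtracking applies to indented list numbers, so it may be worth testing on a copy of the real data first.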

              Oct 30, 2014#6

              Thank you VERY MUCH, Mofi!!! Very useful answer!