XenonSurf wrote:Is there any other method to select text in multiple lines from TAB to TAB without converting to fixed columns?
No. But CSV files can be easily edited using regular expression replaces. For example, if you want just data columns 3 and 5 from a CSV file with 8 data columns from lines 500 to 1000, you can first copy lines 500 to 1000 into a new file, and then use a tagged regular expression Replace All to delete everything except data columns 3 and 5. For the example I use the semicolon as separator character because tab characters are displayed as space(s) in browsers.
Code:
data col1;data col2;data col3;data col4;data col5;data col6;data col7;data col8
data col1;;data col3;;data col5;;data col7;data col8
data col1;data col2;;data col4;data col5;data col6;data col7;data col8
The Perl regular expression search string to delete everything except data columns 3 and 5 in the above CSV file with a semicolon as separator is ^(?:.*?;){2}(.*?;).*?;(.*?);.*$ and the replace string is \1\2. The result is:
Code:
data col3;data col5
data col3;data col5
;data col5
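For anyone who wants to try the same replace outside the editor, here is a minimal sketch in Python. It assumes that Python's re module behaves like the Perl engine for this simple pattern; re.MULTILINE is needed there so that ^ and $ anchor at every line, and the variable names are made up for illustration.

```python
import re

# Sample CSV data from the post, semicolon as separator.
csv_text = (
    "data col1;data col2;data col3;data col4;data col5;data col6;data col7;data col8\n"
    "data col1;;data col3;;data col5;;data col7;data col8\n"
    "data col1;data col2;;data col4;data col5;data col6;data col7;data col8\n"
)

# Same search string as in the post; MULTILINE makes ^ and $ work per line.
pattern = re.compile(r"^(?:.*?;){2}(.*?;).*?;(.*?);.*$", re.MULTILINE)

# \1\2 keeps only column 3 (with its semicolon) and column 5.
result = pattern.sub(r"\1\2", csv_text)
print(result, end="")
```

The printed text should be the same three lines as in the result block above.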
Explanation for the search string:
^ ... start search at beginning of a line.
.*? ... matches 0 or more characters of any value except the line terminating characters carriage return and line-feed. The . means any character except line terminating characters. The * is the multiplier and means 0 or more. The question mark after .* tells the Perl regular expression engine to match as few characters as possible up to the next fixed character, which is a semicolon in this example. Because . also matches a semicolon, the question mark is needed to avoid matching up to one of the later semicolons instead of the next one in the line.
; is the fixed separator character.
(?:.*?;) ... the expression explained above is put into round brackets to build a group. This group is used only for repeating the expression as explained next. The ?: immediately after the opening round bracket tells the Perl regular expression engine not to tag the string found by the expression inside the brackets. This is called a non-capturing or non-tagging group. This will become clearer after reading the complete explanation.
{2} ... means that the preceding expression in the non-tagging group should be applied two times. Therefore data columns 1 and 2 are matched by (?:.*?;){2}.
(.*?;) ... .*?; again matches 0 or more characters up to the next semicolon as above, and this expression is also enclosed in round brackets. The difference is the missing ?: immediately after the opening round bracket. This means that the Perl regular expression engine should tag the string found by this part of the entire search expression. This part of the found string is referenced in the replace string by \1.
.*?; ... the already explained expression, here matching data column 4.
(.*?) ... matches data column 5, this time without the semicolon. Again the string found by this expression is tagged, and as this is the second tagged group, this part of the found string is referenced in the replace string by \2.
;.*$ ... matches the semicolon after the data of the fifth column and everything up to the end of the line without matching the line terminating characters themselves.
So the search string always matches the entire line, but after the replace only the data of column 3, the semicolon and the data of column 5 remain. Everything else is removed from all lines.
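To see what the individual parts match, the pieces can be inspected one by one. This is only a sketch in Python, assuming its re module handles lazy quantifiers and non-capturing groups the same way as the Perl engine for this pattern; the variable names are illustrative.

```python
import re

line = "data col1;data col2;data col3;data col4;data col5;data col6;data col7;data col8"

# Lazy .*? stops at the next semicolon.
lazy = re.match(r".*?;", line).group(0)
# Greedy .* runs on to the last semicolon in the line.
greedy = re.match(r".*;", line).group(0)

# The full search string from the post applied to one line.
m = re.match(r"^(?:.*?;){2}(.*?;).*?;(.*?);.*$", line)
whole = m.group(0)   # the entire match, which is the whole line
col3 = m.group(1)    # first tagged group, referenced by \1
col5 = m.group(2)    # second tagged group, referenced by \2

print(repr(lazy))
print(repr(greedy))
print(repr(col3), repr(col5))
```

The group (?:.*?;) does not get a number, which is why data column 3 is \1 and not \2.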
Instead of ; the expression \t must be used for a CSV file using the tab character as separator.
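A sketch of the tab variant in Python, with every ; in the search string swapped for \t. The two sample lines are invented for illustration and are not from the original file; the same assumption applies that Python's re module behaves like the Perl engine here.

```python
import re

# Hypothetical tab separated sample with 8 data columns per line.
tsv_text = (
    "a1\ta2\ta3\ta4\ta5\ta6\ta7\ta8\n"
    "b1\t\tb3\t\tb5\t\tb7\tb8\n"
)

# Same structure as the semicolon search string, with \t as separator.
pattern = re.compile(r"^(?:.*?\t){2}(.*?\t).*?\t(.*?)\t.*$", re.MULTILINE)

# Keep column 3 (with its tab) and column 5.
result = pattern.sub(r"\1\2", tsv_text)
print(result, end="")
```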
That is not very easy to read, I know. But after some practice you will find it very easy to apply the few regular expression patterns needed in various ways to reformat a CSV file according to your current requirements.
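The workflow from the beginning of this post (take only a range of lines, then reduce them to data columns 3 and 5) could look roughly like this in a script. It is a sketch only: the function name keep_columns_3_and_5 and the 4-line sample are hypothetical, and a real file would first have to be read into a list of lines.

```python
import re

# Search string from the post; no MULTILINE needed for single lines.
PATTERN = re.compile(r"^(?:.*?;){2}(.*?;).*?;(.*?);.*$")

def keep_columns_3_and_5(lines, first_line, last_line):
    """Return lines first_line..last_line (1-based, inclusive),
    reduced to data column 3 and data column 5."""
    selected = lines[first_line - 1:last_line]
    return [PATTERN.sub(r"\1\2", line) for line in selected]

# Hypothetical sample with 4 lines and 8 data columns each.
sample = [
    f"r{n}c1;r{n}c2;r{n}c3;r{n}c4;r{n}c5;r{n}c6;r{n}c7;r{n}c8"
    for n in range(1, 5)
]

# Lines 2 to 3 here; for the example in the post it would be 500 and 1000.
reduced = keep_columns_3_and_5(sample, 2, 3)
print(reduced)
```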