Given an input like this:
this,is,a,test,string,containing,multiple
lines,of,string,with,numb3rs,and,w0rds
I want to delete the second and fourth word in each line using sed. Words are strictly alphanumeric.
[ACCEPTED]
The most natural tool for this is cut [1]:
cut -d , -f 1,3,5-
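As a quick sanity check (the file name input.csv is an assumption for illustration), the cut command keeps fields 1, 3, and 5 onward of the question's sample input:

```shell
# Write the sample input to a file, then drop fields 2 and 4 with cut.
printf '%s\n' \
  'this,is,a,test,string,containing,multiple' \
  'lines,of,string,with,numb3rs,and,w0rds' > input.csv

cut -d , -f 1,3,5- input.csv
# this,a,string,containing,multiple
# lines,string,numb3rs,and,w0rds
```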
With sed, use \([^,]*,\) to match one field:
sed 's/^\([^,]*,\)\([^,]*,\)\([^,]*,\)\([^,]*,\)/\1\3/'
[1] https://www.gnu.org/software/coreutils/manual/html_node/cut-invocation.html

sed 's/^\([^,]*,\)[^,]*,\([^,]*,\)[^,]*,/\1\2/'. Note that the sed ones assume all lines have at least 5 fields. sed 's/,[^,]*//3; s/,[^,]*//' would take care of those. - Stéphane Chazelas
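A quick demonstration of the occurrence-count variant; the second input line is an invented short example to show what happens with fewer than five fields:

```shell
# The 3rd ",field" occurrence (field 4) is removed first, then the 1st
# (field 2), so a line with fewer than 5 fields is still handled.
printf '%s\n' 'this,is,a,test,string,containing,multiple' 'a,b,c' |
  sed 's/,[^,]*//3; s/,[^,]*//'
# this,a,string,containing,multiple
# a,c
```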
If you just want to remove the 2nd and 4th fields on each line, you can do:
$ perl -F, -lane 'print join ",", @F[0,2,4..$#F]' file
this,a,string,containing,multiple
lines,string,numb3rs,and,w0rds
The -n tells perl to read the input file and apply the script given by -e to each line, and -l handles line endings. The -a causes perl to act like awk and split its input on the character given by -F, saving the result in the array @F. Then join ",", @F[0,2,4..$#F] makes a new string by joining the 1st and 3rd fields (arrays start at 0) with the 5th field and everything up to the end of the array ($#F is the highest index in the array), and print prints this string.
It's not sed, but you can use Miller (https://github.com/johnkerl/miller) and run
<input mlr --csv -N unsparsify then cut -x -f 2,4
to have
this,a,string,containing,multiple
lines,string,numb3rs,and,w0rds
awk '{ n=split($0,a,","); delete a[2]; delete a[4]; out=""; for (i=1;i<=n;i++) if (i in a) out = out (out=="" ? "" : ",") a[i]; print out }' inputfile
This second one does not work, despite this line in man gawk:
Assigning a value to an existing field causes the whole record to be rebuilt when $0 is referenced. Similarly, assigning a value to $0 causes the record to be resplit, creating new values for the fields.
gawk 'BEGIN{ FS=","; OFS="," }{ $2=""; $4=""; a=$0; $0=a; print $0 }' inputfile
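For reference, a quick run of the assignment approach (simplified to plain awk, assuming any POSIX awk behaves the same here) shows what actually happens: fields 2 and 4 are emptied, not removed, so their separators remain:

```shell
# Assigning "" rebuilds $0 with OFS between all fields, including the
# now-empty ones, leaving double commas in the record.
printf 'this,is,a,test,string,containing,multiple\n' |
  awk 'BEGIN{FS=","; OFS=","}{ $2=""; $4=""; print $0 }'
# this,,a,,string,containing,multiple
```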
EDIT: The above does not work because assigning "" to a field empties it but leaves its separators in place, which is why this works instead (note that after the first gsub the fields are renumbered, so $4 now refers to the original fifth field, as the output below shows):
gawk 'BEGIN{ FS=","; OFS="," }{ gsub(FS $2,""); gsub(FS $4,""); print $0 }'
output:
this,a,test,containing,multiple
lines,string,with,and,w0rds
this,,a,,string,, : so two commas after "this", which is the same as my last version. - Luuk
Using only sed as requested:
$ sed 's/,[^,]*//1;s/,[^,]*//2' file1
this,a,string,containing,multiple
lines,string,numb3rs,and,w0rds
awk 'BEGIN{FS=",";OFS=","}{$2=$4="\b";print $0}' file
When the line is printed to a terminal, $2 renders as a backspace: the cursor moves one position back and the next comma overwrites the previous one. But if you had to remove the last column with $NF="\b", the trailing comma would remain, because no character follows to overwrite it. In other cases it could also behave unexpectedly. - thanasisp
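The record really does contain literal backspace characters; piping through cat -v (a quick check, not in the original answer) makes them visible:

```shell
# The \b only looks like a deletion on a terminal; the data itself
# still contains BS (0x08) bytes, shown as ^H by cat -v.
printf 'this,is,a,test\n' |
  awk 'BEGIN{FS=",";OFS=","}{$2=$4="\b";print $0}' | cat -v
# this,^H,a,^H
```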
Using Raku (formerly known as Perl_6)
~$ raku -ne 'put join ",", .split(",")[0,2,4..*-1];' file
Above is an answer written in Raku, a member of the Perl-family of programming languages. Among other things, Raku features high-level support for Unicode.
A breakdown of the one-liner:
- the -ne command-line flags invoke Raku's awk-like, non-autoprinting linewise mode,
- $_.split, which can be shortened to just .split, splits each line on the comma,
- the [0,2,4..*-1] index pulls out the desired elements, which are joined and output.

Sample Input:
this,is,a,test,string,containing,multiple
lines,of,string,with,numb3rs,and,w0rds
Sample Output:
this,a,string,containing,multiple
lines,string,numb3rs,and,w0rds