Unix & Linux: How can I delete from a file every second and fourth comma separated word using sed?
[0] [7] Paralyz3d
[2020-04-19 08:46:49]
[ text-processing sed ]
[ https://unix.stackexchange.com/questions/581068/how-can-i-delete-from-a-file-every-second-and-fourth-comma-separated-word-using ]

Given an input like this

this,is,a,test,string,containing,multiple
lines,of,string,with,numb3rs,and,w0rds

I want to delete every second and fourth word in each line using sed. Words are strictly alphanumeric.

"Every 2nd word" would include the 4th word. Do you mean "the 2nd and 4th word on each line"? - Kusalananda
Sorry, I did not specify more precisely. I meant exactly the 2nd and 4th word on each line. - Paralyz3d
Do you want to remove only the 2nd/4th word even though a field may contain multiple words? Or does a field only ever contain a single word? - Kusalananda
(1) Should the separating commas be removed, too? Please add a sample output. - FelixJN
(1) Please show us your expected output. "Every second word" means you want to delete the 2nd, 4th, 6th, 8th etc. "Every fourth" would mean deleting the 4th, 8th, 12th, 16th etc. Without an example output it's very hard to understand what you mean. - terdon
[+4] [2020-04-19 09:25:48] Gilles 'SO- stop being evil' [ACCEPTED]

The most natural tool for this is cut [1].

cut -d , -f 1,3,5-

With sed, use \([^,]*,\) to match one field.

sed 's/^\([^,]*,\)\([^,]*,\)\([^,]*,\)\([^,]*,\)/\1\3/'
[1] https://www.gnu.org/software/coreutils/manual/html_node/cut-invocation.html#cut-invocation
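
A minimal sketch for trying both commands side by side. The first two lines of the sample come from the question; the third line, with only three fields, is a hypothetical addition to show that the sed pattern needs four commas to match and otherwise leaves the line unchanged. The variable names are illustrative only.

```shell
# Sample from the question plus one hypothetical short line (three fields).
sample='this,is,a,test,string,containing,multiple
lines,of,string,with,numb3rs,and,w0rds
only,three,fields'

# cut keeps fields 1, 3, and 5 onwards, whatever the field count is.
with_cut=$(printf '%s\n' "$sample" | cut -d , -f 1,3,5-)

# sed captures four "field," groups at the start of the line and keeps
# groups 1 and 3; a line with fewer than four commas does not match.
with_sed=$(printf '%s\n' "$sample" |
    sed 's/^\([^,]*,\)\([^,]*,\)\([^,]*,\)\([^,]*,\)/\1\3/')

printf 'cut:\n%s\n\nsed:\n%s\n' "$with_cut" "$with_sed"
```

On the short line, cut prints only,fields while sed prints the line as it was.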

Or sed 's/^\([^,]*,\)[^,]*,\([^,]*,\)[^,]*,/\1\2/'. Note that the sed ones assume all lines have at least 5 fields. sed 's/,[^,]*//3; s/,[^,]*//' would take care of those. - Stéphane Chazelas
What if you want the second and the 113th ? Should you write a very long request? - Sandburg
1
[+2] [2020-04-19 13:20:47] terdon

If you just want to remove the 2nd and 4th fields on each line, you can do:

$ perl -F, -lane 'print join ",", @F[0,2,4..$#F]' file
this,a,string,containing,multiple
lines,string,numb3rs,and,w0rds

The -n tells perl to read an input file and apply the script given by -e to each line. The -l strips the trailing newline from each input line and adds one back on each print. The -a causes perl to act like awk and split its input on the character given by -F and save the result in the array @F. Then, join ",", @F[0,2,4..$#F] makes a new string by joining the 1st and 3rd fields (arrays start from 0) and then the 5th field and everything else until the end of the array ($#F is the highest index in the array), and print prints this string.


2
[+1] [2020-04-19 09:10:57] aborruso

It's not sed, but you can use Miller (https://github.com/johnkerl/miller) and run

<input mlr --csv -N unsparsify then cut -x -f 2,4

to have

this,a,string,containing,multiple
lines,string,numb3rs,and,w0rds

3
[+1] [2020-04-19 09:42:17] Luuk
awk '{ split($0,a,","); delete a[4]; delete a[2]; for (i=1;i<=length(a); i++){ if(a[i]!="") printf "%s,", a[i] }; printf "\n";}' inputfile

This second one does not work, despite this line in man gawk:

Assigning a value to an existing field causes the whole record to be rebuilt when $0 is referenced. Similarly, assigning a value to $0 causes the record to be resplit, creating new values for the fields.

gawk 'BEGIN{ FS=","; OFS="," }{ $2=""; $4=""; a=$0; $0=a; print $0 }' inputfile

EDIT: Above does not work because of the FS and how awk handles them, and that's why this works:

gawk 'BEGIN{ FS=","; OFS="," }{ gsub(FS $2,""); gsub(FS $4,""); print $0 }'

output:

this,a,test,containing,multiple
lines,string,with,and,w0rds
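
For comparison, a minimal sketch that selects fields by position rather than by content: it rebuilds each line from the fields to keep, so no empty fields or doubled commas are left behind (emptying a field with $2="" keeps its separators). The function name drop_2_and_4 is hypothetical, and this is one possible approach, not a rewrite of the commands above.

```shell
# Hypothetical helper: print each line without its 2nd and 4th
# comma-separated fields. Reads the named files, or stdin if none given.
drop_2_and_4() {
    awk -F, '{
        out = ""; sep = ""
        for (i = 1; i <= NF; i++) {
            if (i == 2 || i == 4) continue   # skip by position, not by text
            out = out sep $i
            sep = ","                        # separator only between kept fields
        }
        print out
    }' "$@"
}

# Usage on the sample from the question.
printf '%s\n' \
    'this,is,a,test,string,containing,multiple' \
    'lines,of,string,with,numb3rs,and,w0rds' | drop_2_and_4
```

Because the loop only looks at field numbers, lines with fewer than four fields and fields with repeated contents are handled the same way.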

@Cyrus, but this, indeed shorter, version will output this,,a,,string,, so two commas after this, which is same as my last version. - Luuk
added an edit with working second version... - Luuk
4
[+1] [2025-12-12 10:49:03] canupseq

Using only sed as requested:

$ sed 's/,[^,]*//1;s/,[^,]*//2' file1 
this,a,string,containing,multiple
lines,string,numb3rs,and,w0rds

5
[0] [2022-04-18 08:59:40] mrqiao001
awk 'BEGIN{FS=",";OFS=","}{$2=$4="\b";print $0}' file

This works for this case, where inner fields are deleted along with their delimiters. In general, though, the backspace character only moves the cursor back when the output is rendered; it does not actually delete content. Here, when $2 is rendered, the cursor goes one position back, and the next comma overwrites the previous one. But if you had to remove the last column and use $NF="\b", the last comma would remain, because there are no more characters to overwrite it. In other cases, it could have unexpected behaviour. - thanasisp
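
The point in the comment above can be checked by looking at the bytes instead of the terminal rendering. A minimal sketch, assuming a hypothetical one-line sample (a,b,c,d,e) that is not from the thread; the variable names are illustrative only.

```shell
# Hypothetical sample line, fed on stdin instead of from a file.
sample='a,b,c,d,e'

# The answer's command: fields 2 and 4 become a single backspace byte.
with_backspaces=$(printf '%s\n' "$sample" |
    awk 'BEGIN{FS=",";OFS=","}{$2=$4="\b";print $0}')

# Delete the backspace bytes to see what is really left in the stream:
# both commas around each replaced field are still there.
visible=$(printf '%s\n' "$with_backspaces" | tr -d '\b')

printf '%s\n' "$visible"
```

This prints a,,c,,e, so anything that reads the output as data (a pipe, a file, another program) still sees the extra delimiters plus the backspace bytes.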
6
[0] [2025-10-04 08:31:28] jubilatious1

Using Raku (formerly known as Perl_6)

~$ raku -ne 'put join ",", .split(",")[0,2,4..*-1];'  file

Above is an answer written in Raku, a member of the Perl-family of programming languages. Among other things, Raku features high-level support for Unicode.

  • We start by using the -ne command line flags, which invoke Raku's awk-like non-autoprinting linewise mode.
  • The input is taken by $_.split, which can be shortened to just .split.
  • Once you split on comma, you can index with [0,2,4..*-1] to pull out the desired elements, which are joined and output.

Sample Input:

this,is,a,test,string,containing,multiple
lines,of,string,with,numb3rs,and,w0rds

Sample Output:

this,a,string,containing,multiple
lines,string,numb3rs,and,w0rds

https://docs.raku.org


7