I have a text file containing 1000 lines in this format:
001122 abc def ghi
334455 xyz aaa bbb
667788 ccc ccc ddd
How can I convert it into this format using a Linux command by adding spaces to certain columns?
00 11 22 abc def ghi
33 44 55 xyz aaa bbb
66 77 88 ccc ccc ddd
Naive but straightforward:
$ sed 's/\(..\)\(..\)\(..\)/\1 \2 \3/' file
00 11 22 abc def ghi
33 44 55 xyz aaa bbb
66 77 88 ccc ccc ddd
That is, match and capture the first three groups of two characters on each line, and space them out by inserting spaces between them in the replacement string.
Fancy but requires thinking:
$ sed 's/../ &/3; s/../ &/2' file
00 11 22 abc def ghi
33 44 55 xyz aaa bbb
66 77 88 ccc ccc ddd
The first expression replaces the 3rd match of .. on each line with a space followed by whatever that .. matched. The second expression then does the same for the 2nd match.
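The order of the two expressions matters. A small sketch of why, using one line of the sample data (the variable names are only for illustration): if the 2nd match is handled first, the inserted space shifts how the rest of the line pairs up for the next expression.

```shell
line='001122 abc def ghi'

# Highest occurrence first: inserting before the 3rd pair leaves pairs 1 and 2 untouched.
right=$(printf '%s\n' "$line" | sed 's/../ &/3; s/../ &/2')

# Lowest occurrence first: the new space becomes part of a pair for the second expression.
wrong=$(printf '%s\n' "$line" | sed 's/../ &/2; s/../ &/3')

printf '%s\n' "$right" "$wrong"
```

This should print 00 11 22 abc def ghi followed by 00 1 122 abc def ghi.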
A simple sed command is all that is needed (replace filename with the actual file):
sed -E 's|([0-9]{2})([0-9]{2})([0-9]{2})[[:blank:]]*(.*)|\1 \2 \3 \4|g' filename
If you want to change the source file (filename) in place, pass in the -i option:
sed -i -E 's|([0-9]{2})([0-9]{2})([0-9]{2})[[:blank:]]*(.*)|\1 \2 \3 \4|g' filename
Explanation:
([0-9]{2}) matches groups of 2 digits 3 times
(.*) matches the rest of the line, which here is all the letters
[[:blank:]]* matches space characters including tabs
\1 through \4 are matched groups
Note that this will only work with GNU sed. Almost all mainstream Linux distributions come with GNU sed. If you are using macOS, your sed is BSD sed, unless you have installed GNU sed, which is then available as gsed.
On the | delimiter: although it does not make any difference in this instance, when you have strings like https://, you won't have to bother escaping them as https:\/\/ if you use |. - GMaster
-i without a backup file suffix is also GNU only AFAIK. - Ed Morton
sed groks -i with no option-argument. - Kusalananda
-i requires an argument with that sed" and I thought it was the default sed on MacOS which I think is BSD - Ed Morton
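Given the disagreement above about how -i behaves across sed implementations, one way to sidestep it is to write to a temporary file and copy the result back. This is only a sketch: the function name space_pairs_inplace is made up for illustration, and it assumes mktemp is available (it is not required by POSIX, though it is common) and that sed accepts -E.

```shell
# Hypothetical helper: rewrite a file "in place" without relying on sed -i.
space_pairs_inplace() {
    f=$1
    tmp=$(mktemp) || return 1   # assumption: mktemp exists on this system
    # Same substitution as in the answer, written to a temporary file first;
    # the original is only overwritten if sed succeeded.
    sed -E 's|([0-9]{2})([0-9]{2})([0-9]{2})[[:blank:]]*(.*)|\1 \2 \3 \4|' "$f" > "$tmp" &&
        cat "$tmp" > "$f"       # overwrite contents, keeping the file's permissions
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage: space_pairs_inplace filename
```

Copying with cat rather than mv keeps the original file's ownership and permissions, at the cost of not being atomic.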
Using any awk in any shell on every UNIX box and letting you specify which column to change and independent of the characters in that column:
$ awk -v c=1 '{gsub(/../,"& ",$c); sub(/ $/,"",$c)}1' file
00 11 22 abc def ghi
33 44 55 xyz aaa bbb
66 77 88 ccc ccc ddd
$ awk -v c=2 '{gsub(/../,"& ",$c); sub(/ $/,"",$c)}1' file
001122 ab c def ghi
334455 xy z aaa bbb
667788 cc c ccc ddd
$ awk -v c=3 '{gsub(/../,"& ",$c); sub(/ $/,"",$c)}1' file
001122 abc de f ghi
334455 xyz aa a bbb
667788 ccc cc c ddd
A generic version for any number/position of spaces in awk:
awk -v s='2,4' '{f=!split(s,a,",");for(i in a){r="^.{"a[i]+f++"}";gsub(r,"& ")}}1'
00 11 22 abc def ghi
⋮
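For readers who find the one-liner dense, here is a longer sketch of the same idea. It is not the code above: it swaps the regex for substr(), so it does not depend on interval expressions, and it walks the offsets in order with a counting loop instead of for (i in a), whose iteration order awk leaves unspecified. The function name insert_spaces is made up, and the offsets are assumed to be given in ascending order.

```shell
# Hypothetical helper: insert a space after each given character offset.
# $1: comma-separated offsets in ascending order, e.g. 2,4
insert_spaces() {
    awk -v s="$1" '{
        n = split(s, a, ",")
        for (i = 1; i <= n; i++) {
            pos = a[i] + (i - 1)   # each earlier insert shifts later offsets by one
            $0 = substr($0, 1, pos) " " substr($0, pos + 1)
        }
        print
    }'
}

printf '%s\n' '001122 abc def ghi' | insert_spaces 2,4
```

With the sample line this should print 00 11 22 abc def ghi. As written the function reads standard input only.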
A more powerful version, where other characters than space can be inserted:
spacers(){
awk -v s="$1" '{f=!split(s,a,/[^*0-9]*/);split(s,p,/[*0-9]*/);
for(i in a){if(""==b=a[i])continue;
r="^.{"(b!="*"?b+f++:length($0))"}";
gsub(r,"&"p[i+1])}} 1' $2;}
That way, you can do e.g.:
spacers '0|2 4 6|10@yahoo.com |* |' file
|00 11 22| abc@yahoo.com | def ghi |
which is great for creating org-mode tables and piping directly to clipboard.
Note: The shell-function also accepts data through STDIN.
(Earlier versions of this answer contained a generic awk-solution, that used sed for the final replace)
If the input data is exactly as depicted, GNU cut is an option. Note that the --output-delimiter has to be explicitly set to a space.
This makes for a very rigid solution unlike some of the other answers, lacking both the flexibility to deal with variable string length in the first field and the ability to designate an arbitrary field to operate on.
cut -c1-2,3-4,5- --output-delimiter=' ' <file
00 11 22 abc def ghi
33 44 55 xyz aaa bbb
66 77 88 ccc ccc ddd
Being completely lazy about typing here,
sed -E "s/([0-9]{2})/\1 /g; s/ +/ /g" file1
Put a space after every pair of digits and then reduce the multiple spaces to a singleton.
Or, perhaps even lazier:
sed 's/./& /4;s/./& /2' file1
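The two commands are not interchangeable on other data. A quick sketch of the difference, using a made-up input line that has digits outside the first column: the first command is pattern-based and touches every pair of digits on the line, while the second is position-based and only touches characters 2 and 4.

```shell
line='001122 abc 4567'   # made-up input with digits in a later column

# Pattern-based: every pair of digits on the line gets a trailing space.
by_pattern=$(printf '%s\n' "$line" | sed -E 's/([0-9]{2})/\1 /g; s/ +/ /g')

# Position-based: a space is added after characters 4 and 2 of the line only.
by_position=$(printf '%s\n' "$line" | sed 's/./& /4;s/./& /2')

# Brackets make any trailing space visible.
printf '[%s]\n' "$by_pattern" "$by_position"
```

This should print [00 11 22 abc 45 67 ] and [00 11 22 abc 4567]. Note the trailing space in the first result, left behind when a digit pair ends the line.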
Is "certain column" always "first column", or are you looking for a solution that lets you specify which column to space by its number? - Ed Morton