Dear all, I have a big data file, let's say file.dat, containing two columns.
e.g. file.dat (showing a few rows):
0.0000 -23.4334
0.0289 -23.4760
0.0578 -23.5187
0.0867 -23.5616
0.1157 -23.6045
0.1446 -23.6473
0.1735 -23.6900
0.2024 -23.7324
0.2313 -23.7745
0.2602 -23.8162
0.2892 -23.8574
0.3181 -23.8980
0.3470 -23.9379
0.3759 -23.9772
0.4048 -24.0156
0.4337 -24.0532
0.4627 -24.0898
0.4916 -24.1254
Note: the data file has a blank line at the end.
I want to find/extract the maximum and minimum values from both columns, e.g. for column-1:
max - 0.4916
min - 0.0000
and similarly for column-2:
max - -23.4334
min - -24.1254
For column-1:
awk 'BEGIN{min=9}{for(i=1;i<=1;i++){min=(min<$i)?min:$i}print min;exit}' file.dat
0.0000
cat file.dat | awk '{if ($1 > max) max=$1}END{print max}'
0.4916
For column-2:
awk 'BEGIN{min=9}{for(i=2;i<=2;i++){min=(min<$i)?min:$i}print min;exit}' file.dat
-23.4334
cat file.dat | awk '{if ($2 > max) max=$2}END{print max}'
**no output showing**
Please help me find the min and max values from column-2. Note: the data file has a blank line at the end.
[Accepted answer]
The issue in your code,
awk 'BEGIN{min=9}{for(i=2;i<=2;i++){min=(min<$i)?min:$i}print min;exit}' file.dat
... is that you exit immediately after processing the first line of input. The middle block needs to be triggered for every line; then, in an END block, you can print the values you have found. You already do this in another of your snippets:
awk '{if ($1 > max) max=$1}END{print max}'
Another issue is that you initialize min with a magic number (9 in the first snippet I quoted; in the second snippet, max is implicitly 0, since variables that are not explicitly initialized have the value 0 when used in calculations). If this magic number does not fall within the range of numbers in the actual data, then the calculated min and/or max values will be wrong. It is better to initialize both min and max to some value found in the data.
To keep track of both min and max values, you need two variables, and both of them need to be checked against the data on every line to see whether they need updating.
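Put together for column-2, that could look like this (a sketch, keeping the one-liner style of the question; min and max are seeded from the first line, and an NF guard skips the blank last line):

```shell
# Seed min and max from the first record, then check every
# data line; NF skips the blank line at the end of the file.
awk 'NR == 1 { min = max = $2 }
     NF { if ($2 < min) min = $2
          if ($2 > max) max = $2 }
     END { print "min - " min; print "max - " max }' file.dat
```

For the data in the question this prints min - -24.1254 and max - -23.4334.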
As awk supports arrays, it would be natural to use arrays for min and max, with one array element per column. This is what I have done in the code below.
Generalized to any number of columns:
NF == 0 {
    # Skip any line that does not have data.
    next
}

!initialized {
    # Initialize the max and min for each column from the
    # data on the first line of input that has data.
    # Then immediately skip to the next line.
    nf = NF
    for (i = 1; i <= nf; ++i)
        max[i] = min[i] = $i
    initialized = 1
    next
}

{
    # Loop over the columns to see if the max and/or min
    # values need updating.
    for (i = 1; i <= nf; ++i) {
        if (max[i] < $i) max[i] = $i
        if (min[i] > $i) min[i] = $i
    }
}

END {
    # Output max and min values for each column.
    for (i = 1; i <= nf; ++i)
        printf("Column %d: min=%s, max=%s\n", i, min[i], max[i])
}
Given this script and the data in the question:
$ awk -f script.awk file
Column 1: min=0.0000, max=0.4916
Column 2: min=-24.1254, max=-23.4334
The condition NF == 0 on the first block (which is tested for every line) ensures that we skip blank lines: it means "there are zero fields (columns) of data on this line". The variable initialized is zero (logically false) from the start, and is set to one (logically true) as soon as the first line that has data is read.
The nf variable is initialized to NF (the number of fields) on the line that we initialize the min and max values from. This is so that the output in the END block works even if the last line has zero fields.
Actually, you can combine all instructions into one awk program:
awk 'NR==1 { min1=max1=$1; min2=max2=$2 }
     NR>1  { if ($1<min1) {min1=$1} else if ($1>max1) {max1=$1}
             if ($2<min2) {min2=$2} else if ($2>max2) {max2=$2} }
     END   { printf("Column1 min: %f\nColumn1 max: %f\nColumn2 min: %f\nColumn2 max: %f\n", min1, max1, min2, max2) }' file.dat
This will initialize the minimum and maximum values for both columns with the respective values of the first row (rule with condition NR==1), and then scan the successive rows to see if the values are larger than the current maximum/smaller than the current minimum, respectively (rule with condition NR>1).
At the end of file (rule with condition END), it prints the result.
Notice that this assumes there are no empty lines. If there are, you have to replace the NR>1 condition with NR>1 && NF>0. If there can be empty lines before the first data line, use:
awk '!init && NF>0 { init=1; min1=max1=$1; min2=max2=$2 }
     init==1 && NF>0 { if ($1<min1) {min1=$1} else if ($1>max1) {max1=$1}
                       if ($2<min2) {min2=$2} else if ($2>max2) {max2=$2} }
     END { printf("Column1 min: %f\nColumn1 max: %f\nColumn2 min: %f\nColumn2 max: %f\n", min1, max1, min2, max2) }' file.dat
This uses a variable init to check whether a non-empty line has already been found, and uses the contents of the first non-empty line to pre-set the current maximum/minimum for both columns. Only once init is set (after this initialization) are non-empty lines considered for updating the statistics.
As a general remark, you never need to cat a file and pipe the result into awk.
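As an illustration, these three are equivalent (a sketch based on the column-1 max one-liner from the question, with an NF guard added for the blank last line); the first spawns a needless cat process:

```shell
# Useless use of cat: the pipe costs an extra process.
cat file.dat | awk 'NF && $1 > max { max = $1 } END { print max }'
# Let awk open the file itself.
awk 'NF && $1 > max { max = $1 } END { print max }' file.dat
# Or redirect stdin.
awk 'NF && $1 > max { max = $1 } END { print max }' < file.dat
```

All three print 0.4916 for the file in the question.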
Using datamash [1] and printf:
for f in 1 2; do
    printf 'Column #%s\nmax - %s\nmin - %s\n\n' "$f" \
        $(datamash -W max "$f" min "$f" < file.dat)
done
...or without a loop:
printf 'Column #%s\nmax - %s\nmin - %s\n\n' \
$(datamash -W max 1 min 1 max 2 min 2 < file.dat |
tr -s '\t' '\n' | paste - - | nl)
Output of either:
Column #1
max - 0.4916
min - 0
Column #2
max - -23.4334
min - -24.1254
[1] https://savannah.gnu.org/projects/datamash/

The for loop isn't really necessary. It's more efficient to call datamash max 1 min 1 max 2 min 2 once, but the printf formatting would be gnarlier then... - agc
Try this (the NF guard skips the trailing blank line, whose empty $2 would otherwise be treated as 0 and overwrite max):
awk 'NF { if (max == "") max = $2; else if ($2 > max) max = $2 } END { print max }' file
awk 'NF { if (min == "") min = $2; else if ($2 < min) min = $2 } END { print min }' file
If the file can contain blank lines: awk '{if (NF>1) {if (min == "") min=$2 ; else if ($2 < min) min=$2}}END{print min}' file - Cyrus
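The two passes can also be merged into one (a sketch; the "" sentinel seeds both variables from the first data line, and the NF guard skips blank lines):

```shell
# One pass: "" marks "not yet initialized"; NF skips blank lines.
awk 'NF { if (min == "") min = max = $2
          if ($2 < min) min = $2
          if ($2 > max) max = $2 }
     END { print "min " min; print "max " max }' file
```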
This is for col 1 (it calculates the average, min, and max; the NF guard skips the blank line, which sort may place anywhere):
sort -n -k 1 file | awk 'NF { SUM += $1; if (++n == 1) MIN = $1; MAX = $1 }
    END { print "Average - " SUM/n, "Min - " MIN, "Max - " MAX }'
This is for col 2:
sort -n -k 2 file | awk 'NF { SUM += $2; if (++n == 1) MIN = $2; MAX = $2 }
    END { print "Average - " SUM/n, "Min - " MIN, "Max - " MAX }'
Issues in your code:
- You exit before all the input has been processed. In fact, an exit is not needed.
- min > $1 will be false, as min doesn't have a value yet (so it defaults to 0 numerically).
A generic solution for any number of fields (within reason) on any line (the number of fields need not be constant), which only assumes that an empty field contains a null ("") and accepts all the values that awk accepts (strings are usually converted to 0), is this:
awk '
{
    if (nf < NF) { nf = NF }    # track the max number of fields
                                # to print at the end
    for (i = 1; i <= NF; i++) {
        f = $i + 0              # convert each field to a number
        # Either initialize (if empty)
        # or capture max and min.
        if (max[i] == "" || max[i] < f) { max[i] = f }
        if (min[i] == "" || min[i] > f) { min[i] = f }
    }
}
END {
    for (i = 1; i <= nf; i++) { print i, min[i], max[i] }
}' file
On this short file (an example):
0.1735 -23.6900
0.2024 -23.7324
0.2313 -23.7745
0.2602 -23.8162 23 -12 PREC
0.2892 -23.8574 46 -23
0.3181 -23.8980
The output will be:
1 0.1735 0.3181
2 -23.898 -23.69
3 23 46
4 -23 -12
5 0 0
For your file it will print:
1 0 0.4916
2 -24.1254 -23.4334
You could use NF > nf { nf = NF } for that if you wanted. - Kusalananda
@Kusalananda NF, ok. I see, well that's interesting. I'll leave my answer as it is, I think, as there's nothing wrong with it. It's valid for any number of columns and it covers the case in the question.