I have a scenario where summing a numeric column on Unix gives me an incorrect total once the values carry 8 or 9 decimal places. How do I fix it?
The command I used:
awk -F '"?\\|"?' '{T+=$(2)} END {printf "%.2f\n",T}' demofile.txt
This is the link to my previous question: Why is there a difference between these two sum commands? [1]
Is there a better way to deal with it, using awk, bc or dc, so that I can get an accurate sum?
Demo data
1|"12.8"|demo1
2|"13.5678341234567"|demo1
3|"14.578"|demo1
4|"15.58"|demo1
5|"16.56784"|demo1
6|"17.578"|demo1
7|"18.678"|demo1
8|"19.568890123"|demo1
9|"20.588792"|demo1
You don't say the file size (i.e. how many rows you are adding). The download claimed 18.3 MB before the site showed up as "Dangerous" and "Fraud Alert". If the average row length is 18 bytes, that's about a million floats being added, and we don't know the span of the values. The total you show in the question has 13.2 digits, so the average value per line is around 7 digits, with unknown variability.
If you keep adding values like 27.865326635297 to a running total that is approaching 13 whole-number digits, then only the 27.87 (rounded) part is going to make it into the total, because the .00532... falls outside the 15- or 16-digit range of a double. Sometimes those errors cancel out, sometimes they don't: Monte Carlo arithmetic.
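The absorption is easy to demonstrate with plain awk and IEEE doubles (the numbers below are illustrative, not from the question's file):

```shell
# A 13-digit running total has a unit-in-the-last-place of roughly 0.0002,
# so most decimal places of the added value are simply dropped.
awk 'BEGIN {
    t = 1234567890123          # running total near 13 whole-number digits
    t += 27.865326635297       # exact result would end ...150.865326635297
    printf "%.12f\n", t        # the printed tail differs from the exact one
}'
```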
Check the output of awk --version. If it mentions MPFR and MP, your awk is compiled with extended-precision math, and you can just add -M -v PREC=113 to your awk command. That PREC is the mantissa length that gets you quadruple-precision real arithmetic -- 33-digit accuracy.
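A sketch of the question's command with extended precision switched on, assuming a gawk built with MPFR (the field separator is copied from the question):

```shell
# -M (--bignum) enables MPFR arithmetic in gawk; PREC=113 bits corresponds
# to IEEE quadruple precision, about 33 significant decimal digits.
gawk -M -v PREC=113 -F '"?\\|"?' '{ t += $2 } END { printf "%.13f\n", t }' demofile.txt
```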
https://www.gnu.org/software/gawk/manual/gawk.html#Arbitrary-Precision-Arithmetic
gawk? - Kamil Maciorowski
This is a method based on the dc command (assuming it has adequate precision compiled in). It dresses up the second column with dc commands, and works to 60-digit (about 200-bit) precision.
This runs on the data lines provided previously, plus a couple of extreme values. It shows intermediate sums; to remove these, remove the 'p' just before the \n where awk emits $2.
Paul--) cat awkToDc
#! /bin/bash
function Data { cat <<'EOF'
1|"12.8"|demo1
2|"13.5678341234567"|demo1
3|"14.578"|demo1
4|"15.58"|demo1
5|"16.56784"|demo1
6|"17.578"|demo1
7|"18.678"|demo1
8|"19.568890123"|demo1
9|"20.588792"|demo1
10|"55555555555555555555000000000000"|demo1
11|"20.588792"|demo1
12|"0.000000000000000000077777777777"|demo1
EOF
}
function dataDC {
AWK='
BEGIN { FS = "\042"; printf ("60 k 0\n"); }
{ printf ("%s + p\n", $2); }
END { printf ("p q\n"); }
'
awk "${AWK}"
}
Data | dataDC | dc
Clarification of the emitted dc commands (which are in reverse Polish notation):
'60 k' sets the arithmetic precision, and '0' initialises the total.
'+' adds the value from $2 to the running total; 'p' prints it for illustration.
'p q' prints the final total and quits.
Paul--) ./awkToDc
12.8
26.3678341234567
40.9458341234567
56.5258341234567
73.0936741234567
90.6716741234567
109.3496741234567
128.9185642464567
149.5073562464567
55555555555555555555000000000149.5073562464567
55555555555555555555000000000170.0961482464567
55555555555555555555000000000170.096148246456700000077777777777
55555555555555555555000000000170.096148246456700000077777777777
Paul--)
I now have four tested techniques (run against your test file of 722277 rows), with accuracy ratings.
Using gawk at 200-bit precision and dc at 60-digit precision, both agree on the same 33-digit total, which I suspect is exact.
25396577843.7560139069641121618832
Using gawk at standard IEEE double accuracy (good for 15 or 16 digits) agrees with only the first 12 of those digits. I assume a million additions erode the accuracy as the exponents become increasingly disjoint.
25396577843.7769622802734375
I also found a recursive addition algorithm in standard awk. It initially accumulates values according to the last 5 digits of NR, making 100,000 subtotals. It then totals those, reducing the number of key digits to 4, 3, 2, 1, and finally to a single total. Each number therefore goes through only around 60 additions. That result agrees with the first 16 digits of the high-precision totals, which is as good as can be expected.
25396577843.756011962890625
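My reconstruction of that bucketed scheme, as a sketch rather than the exact script (assumes an awk that supports whole-array delete, e.g. gawk or mawk; shown summing 1..1000 instead of the real file):

```shell
seq 1 1000 | awk '
    { b[NR % 100000] += $1 }                 # stage 1: up to 100000 subtotals
    END {
        # fold the buckets one decimal digit at a time: 10000, 1000, ..., 1
        for (m = 10000; m >= 1; m = int(m / 10)) {
            delete c
            for (i in b) c[i % m] += b[i]
            delete b
            for (i in c) b[i] = c[i]
        }
        printf "%.6f\n", b[0]                # all keys collapse to 0 at m = 1
    }'
```

Because each partial sum stays small relative to the values it absorbs, far fewer low-order digits are lost than in one long left-to-right accumulation.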
Check out Kahan summation [1]: it keeps track of the rounding error and compensates for it. A must for such huge sums.
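A sketch of Kahan summation in plain awk, reusing the question's field separator; the carry variable c re-injects the low-order digits that each addition to the large total would otherwise drop:

```shell
awk -F '"?\\|"?' '
    {
        y = $2 - c          # correct the incoming value by the lost part
        t = s + y           # big + small: low-order digits of y may vanish
        c = (t - s) - y     # algebraically zero; captures what was dropped
        s = t
    }
    END { printf "%.13f\n", s }' demofile.txt
```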
[1] https://en.wikipedia.org/wiki/Kahan_summation_algorithm

With csvtool and bc:
$ csvtool -t '|' col 2 A | paste -sd + - | bc
149.5073562464567
Try yes 0.1 | head -n 10000000 | awk '{sum+=$1} END {printf "%.5f", sum}' for a demonstration instead of a file. - pLumo