Yeah, summed area tables are pretty horrible on precision. The output can, at best, be no more precise than the mantissa of the format the table is stored in. To see how bad it gets, just consider the worst case: you're going to be subtracting two large numbers to get a small one.
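To make that worst case concrete, here's a rough sketch that emulates FP32 rounding in plain Python and rebuilds a dim value from a two-entry running sum. The helper names (`f32`, `recover_dim`) are made up for illustration, and a one-dimensional running sum stands in for a real 2D SAT.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def recover_dim(bright: float, dim: float) -> float:
    """Store [bright, dim] as an FP32 running sum, then recover dim
    by subtracting the two adjacent table entries."""
    s0 = f32(bright)            # first table entry
    s1 = f32(s0 + f32(dim))     # second entry: the addition is rounded to FP32
    return f32(s1 - s0)         # large minus large gives small

# Running sum around 2^16: the dim value only keeps about 7 fractional bits.
print(recover_dim(2.0 ** 16, 0.3))

# Running sum around 2^24: the dim value is lost entirely.
print(recover_dim(2.0 ** 24, 0.5))
```

The subtraction itself is exact here; the damage was already done when the dim value was added into the large running sum and rounded.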
So, if we want 8 bits of accuracy in the output, and FP32 gives us a 24-bit mantissa (23 stored bits plus the implicit leading one), we can afford to lose no more than 16 bits of accuracy in that subtraction. Thus, the sum of all numbers in the texture must be less than 2^16 times the dimmest region of the texture for accurate representation in FP32.
Quite unfortunately, 256x256 = 2^16, so the total is 2^16 times the average texel, and the average must be no brighter than the dimmest region for accurate representation of a 256x256 FP16 texture in an FP32 SAT. In other words, the texture would have to be essentially uniform. For each halving of the texture size, we gain a factor of four in allowed dynamic range (e.g. 128x128 allows 4:1 dynamic range).
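The same budget as a back-of-the-envelope calculation, in case anyone wants to plug in other sizes. This is only a sketch of the arithmetic above; `allowed_dynamic_range` and its parameters are hypothetical names, and it assumes the 24-bit FP32 mantissa and the 8-bit output target used here.

```python
FP32_MANTISSA_BITS = 24  # 23 stored bits plus the implicit leading 1

def allowed_dynamic_range(size: int, output_bits: int = 8,
                          mantissa_bits: int = FP32_MANTISSA_BITS) -> float:
    """Upper bound on (average texel / dimmest region) for a size x size
    texture, given how many mantissa bits the subtraction may lose."""
    spare_bits = mantissa_bits - output_bits   # 16 bits for FP32 and 8-bit output
    texel_count = size * size                  # total sum = texel_count * average
    return 2 ** spare_bits / texel_count

for size in (256, 128, 64, 32):
    print(f"{size}x{size}: {allowed_dynamic_range(size):g}:1")
```

A result of 1:1 means the average can't exceed the dimmest region at all, which is the 256x256 case above.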