Help: Numerical precision issues?

KimB

Hi guys, I've been banging my head against a wall with this problem for about a week now, and though it doesn't really apply to this forum, I was hoping that those who browse here have the best chance of answering my question.

Anyway, here's the deal. I programmed a likelihood code, called a Blackwell-Rao estimator, for analyzing CMB data. It works great as long as I only make use of the first few spherical harmonics, but it very quickly starts giving erroneous results as I include more l's. I need the code to work out to about l = 30, and right now it's only working properly up to roughly l = 5-7.

Here's the money shot of the essential code block:
Code:
  ldvector samplelikelihood(numsamples);
  for (ii = 0; ii < numsamples; ii++)
  {
    ldvector thislnlike(lmax-1);

    for (jj = 0; jj < lmax-1; jj++)
    {
      long double ll = lval.val[jj];
      thislnlike.val[jj] = (2*ll+1.0)/2.0*(log((long double)sigmal[ii].val[jj]/Cl.val[jj]) - sigmal[ii].val[jj]/Cl.val[jj])
                         - log((long double)sigmal[ii].val[jj]);
    }
    //Total lnlikelihood for this sample:
    samplelikelihood.val[ii] = thislnlike.sum(VSORT);
  }
  //ln(likelihoods) calculated, now calculate optimal offset
  double maxlnlike = 10.0;
  double offset = -1e100;
  //Find maximum ln(likelihood) stored
  for (jj = 0; jj < lmax-1; jj++)
    if (offset < samplelikelihood.val[jj])
      offset = samplelikelihood.val[jj];
  //Set offset to a bit below this for maximal numerical stability
  offset -= maxlnlike;
  
  //Offset each ln(likelihood) and exponentiate
  for (jj = 0; jj < numsamples; jj++)
    samplelikelihood.val[jj] = exp(samplelikelihood.val[jj] - offset)/(long double)numsamples;
  
  //Sum likelihoods for each l to get likelihood for this sample
  likelihood = log(samplelikelihood.sum(VSORT)) + offset;
A few notes: ldvector is my long double array container class. The argument to ldvector's sum() member function is just a flag that describes how the array is summed.

I'm attempting to implement the estimator as described in the following paper:
http://www.arxiv.org/abs/astro-ph/0411737

The equations I'm attempting to implement with the above code are equations 11 and 13.

Anyway, I'd appreciate it if anybody here could help me brainstorm other ways in which this code might be breaking down. Thanks much.
 
Chalnoth,
I really don't have time to look through this properly but (eliminating the typedefs for clarity) glancing at:
Code:
     (log(sigmal[ii].val[jj]/Cl.val[jj]) - etc...
you seem to be doing things like log(x/y) - log(z).

[EDIT: Hmmmm.. I might have lost count of the brackets!]

Why don't you transform that to log(x/(y*z))?


Anyway, as a rule of thumb, if you're losing precision, look at the factorizations and try a different one that minimises the differences between values in additions/subtractions.
 
Sorry if this sounds dumb, but I really do want to help you on this as it's important to me and more so to you.

Have you tried using a Taylor series on the equation to get more accuracy, and have you increased the range over a period of time to ensure numerical stability?

PS: A little dig: it seems that physicists don't make good software designers. ;)

EDIT:
Looking at the paper I got lost; however, looking at the equations, they are nothing I haven't tackled before.
If you are losing precision then you have no choice but to do your calculations over a period of time using approximation methods, which give you exact results within a small interval.
What you're asking for isn't much. However, can you give me a specific formula you are losing precision in, so I can work out a numerically stable alternative which gives you what you're looking for?
 
One note: the C exp() and log() functions expect a 'double' argument, so your long-doubles will be downconverted to ordinary doubles. Check if your C compiler supports the functions expl() and logl() (long double versions of the exp and log functions).
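If your compiler has them, a quick throwaway test (illustrative only, not part of your code) shows whether they actually buy you anything; 1e-18 is representable in the 80-bit long double mantissa but sits below the ulp of a double near 1.0:
Code:
#include <cmath>
#include <cstdio>

int main()
{
    long double x = 1.0L + 1e-18L;  // representable as a long double, rounds to 1.0 as a double
    // Round trip through the double-precision functions: the 1e-18 is lost
    printf("double     : %.20Le\n", (long double)log(exp((double)x)) - 1.0L);
    // Round trip through the long double functions: the 1e-18 (roughly) survives
    printf("long double: %.20Le\n", logl(expl(x)) - 1.0L);
    return 0;
}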

Somehow I do not expect this to be the solution to your problem, though. It also puzzles me a bit that the text states that Eq.13 can be solved analytically (as opposed to numerically).
 
Simon F said:
Chalnoth,
I really don't have time to look through this properly but (eliminating the typedefs for clarity) glancing at:
Code:
     (log(sigmal[ii].val[jj]/Cl.val[jj]) - etc...
you seem to be doing things like log(x/y) - log(z).

[EDIT: Hmmmm.. I might have lost count of the brackets!]

Why don't you transform that to log(x/(y*z))?
Right, so that's already been done. Notice that one logarithm is multiplied by a number and the other isn't, so combining them would require raising a quantity to a power inside the logarithm to put the terms under the same log. I don't think that would help things.

But in response to your suggestion, I decided to try splitting up the divisions within the logarithms. There was no change.

Anyway, as a rule of thumb, if you're losing precision, look at the factorizations and try a different one that minimises the differences between values in additions/subtractions.
Well, this is why I implemented the sum() class member function: with the VSORT flag enabled, it first sorts the array elements from smallest to largest. In a situation where there is a lot of dynamic range, I figure I'll get the best numerical precision by summing the array from the smallest element to the largest. (The other option I have is for when there isn't a lot of dynamic range: VBISECT, where the array is repeatedly bisected and always summed in pairs.)
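In sketch form, the two strategies amount to something like this (a simplified illustration over a plain std::vector, not the actual ldvector code):
Code:
#include <algorithm>
#include <vector>

// VSORT-style: sort ascending, then accumulate from the smallest element up
long double sum_sorted(std::vector<long double> v)
{
    std::sort(v.begin(), v.end());
    long double total = 0.0L;
    for (size_t i = 0; i < v.size(); i++)
        total += v[i];
    return total;
}

// VBISECT-style: recursive pairwise summation over [lo, hi); assumes hi > lo
long double sum_pairwise(const std::vector<long double>& v, size_t lo, size_t hi)
{
    if (hi - lo == 1)
        return v[lo];
    size_t mid = lo + (hi - lo) / 2;
    return sum_pairwise(v, lo, mid) + sum_pairwise(v, mid, hi);
}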

I just tried separating the three terms in the "thislnlike.val[jj]=" equation into three separate arrays to be summed individually, and there was no change to the output.
 
arjan de lumens said:
One note: the C exp() and log() functions expect a 'double' argument, so your long-doubles will be downconverted to ordinary doubles. Check if your C compiler supports the functions expl() and logl() (long double versions of the exp and log functions).
Actually, I tested this, and found that by making the arguments long doubles, I could get an extra factor of about 10^4 in precision. I didn't expect this to be the case myself, and considered attempting to write my own higher-precision log/exp functions, but fortunately tested it first, with code that was something like this:
Code:
for (n = 0; n < N; n++)
{
   long double eps = powl(10.0L, -n);                 // 10^-n
   cout << log(exp(1.0L + eps)*exp(-1.0L)) << endl;   // argument type switched between double and long double for the test
}
With double precision in the argument, the lowest output reached was 10^-15 (though it was off by 10%). With long double precision, the lowest output was 10^-19 (again, off by about 10%).

Somehow I do not expect this to be the solution to your problem, though. It also puzzles me a bit that the text states that Eq.13 can be solved analytically (as opposed to numerically).
Well, by analytically I think it just means that in the context of the equation, we know C_l and sigma_l exactly. No inference of any of the values is required.
 
Chalnoth said:
Actually, I tested this, and found that by making the arguments long doubles, I could get an extra factor of about 10^4 in precision. I didn't expect this to be the case myself, and considered attempting to write my own higher-precision log/exp functions, but fortunately tested it first, with code that was something like this:
Code:
for (n = 0; n < N; n++)
{
   long double eps = powl(10.0L, -n);                 // 10^-n
   cout << log(exp(1.0L + eps)*exp(-1.0L)) << endl;   // argument type switched between double and long double for the test
}
With double precision in the argument, the lowest output reached was 10^-15 (though it was off by 10%). With long double precision, the lowest output was 10^-19 (again, off by about 10%).
In such a case, you are effectively placing yourself at the mercy of fast-math and register-allocation optimizations in your compiler. In particular, on the x86 FPU, computations are commonly done using the 80-bit 'long double' type for as long as you only do register-to-register operations, and then rounded down to the ordinary 32/64-bit float/double types when they are written to memory, unless you specify everything as 'long double'. This sometimes leads to surprising losses of precision and invariance the moment the compiler runs out of registers.
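A tiny illustration of what that rounding-on-store does (assuming an x87-style 80-bit long double; the volatile just forces the value through memory):
Code:
#include <cstdio>

int main()
{
    long double full = 1.0L + 1e-18L;        // fits in the 80-bit long double mantissa
    volatile double spilled = (double)full;  // forced through a 64-bit slot: the 1e-18 is rounded away
    long double back = spilled;

    printf("kept as long double: %.20Le\n", full - 1.0L);  // ~1e-18
    printf("spilled via double : %.20Le\n", back - 1.0L);  // 0
    return 0;
}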
 
What sort of number ranges are you dealing with? I needed MSE error estimates that always sum to exactly 0 (for my image compression) and used fixed-point math, but with the fractional part adjusted to always make maximal use of the data type (int / int64).
It's a bit odd when you compute a_1 + a_2 + ... + a_n = b, but then subtract all the a's from b (in a different order) and don't end up with 0. But that's floating-point math for you... ;)
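A classic two-line demonstration of that (the small term sits below the ulp of the large one, so the grouping decides whether it survives):
Code:
#include <cstdio>

int main()
{
    double big = 1e16, small = 1.0;   // ulp(1e16) == 2.0 in double precision
    double s1 = (big + small) - big;  // the 1.0 is absorbed: prints 0
    double s2 = (big - big) + small;  // the 1.0 survives: prints 1
    printf("%g %g\n", s1, s2);
    return 0;
}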
 
Chalnoth said:
Well, this is why I implemented the sum() class member function: with the VSORT flag enabled, it first sorts the array elements from smallest to largest. In this way, in a situation where there is a lot of dynamic range, I figure I'll get the best numerical precision by summing the array from the smallest array element to the largest (the other option that I have is for when there isn't a lot of dynamic range: VBISECT, where the array is continually bisected and always summed in pairs).
That sounds sensible, but I was thinking more of the typical problems of things like (a+b)*(a-b) being better than (a^2 - b^2)... not that I actually noticed any in the code, but then I didn't look too hard <shrug>.

Also, it's important to remember the old adage:
"Doing arithmetic with floating-point numbers is like moving piles of sand. Each time you do it, you lose a little sand and pick up a little dirt."

i.e., don't do something multiple times if you can do it once. What I have noticed is:
Code:
 //Offset each ln(likelihood) and exponentiate
  for (jj = 0; jj < numsamples; jj++)
    samplelikelihood.val[jj] = exp(samplelikelihood.val[jj] - offset)/(long double)numsamples;
  
  //Sum likelihoods for each l to get likelihood for this sample
  likelihood = log(samplelikelihood.sum(VSORT)) + offset;
Your multiple divisions by "(long double)numsamples" will introduce a number of small errors (unless numsamples is a power of 2 ;) ), so why don't you defer your scaling until after the summation?

Even if it doesn't help your accuracy, it'll make it faster ;-)
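Something along these lines, sketched against your own block above (mathematically identical, since log(S/N) = log(S) - log(N)):
Code:
  //Offset each ln(likelihood) and exponentiate - no per-element division
  for (jj = 0; jj < numsamples; jj++)
    samplelikelihood.val[jj] = exp(samplelikelihood.val[jj] - offset);

  //Scale by numsamples once, after the summation
  likelihood = log(samplelikelihood.sum(VSORT)/(long double)numsamples) + offset;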
 
Colourless said:
My advice, make sure your compiler also isn't trying to be 'smart' and using SSE2 instead of x87
Hmm, looked at this, and the default behavior for GCC is supposed to be to not make use of SSE2. I added the option "-mfpmath=387" just in case, as described here, but it made no difference.

Edit:
Hmm, just looked back and realized that -mfpmath=sse is the default option for x86_64, but since the above change made no difference, oh well...
 
[maven] said:
What sort of number ranges are you dealing with?
Well, in the first array sum, the dynamic range is only about 10^2 (since this sum is over l's, it covers about 30 array elements). In the second array sum, it's about 10^80 (over roughly 4000 array elements).

The dynamic range between the three terms in the sum "thislnlike.val[jj]=" is likewise approximately 10^2.

Now, since the problem is still there no matter how many Gibbs samples I make use of, I don't think the problem has anything to do explicitly with that array summation (each choice for the sigma_l's is one Gibbs sample, denoted by the array index ii in my code).
 
Does it really need to be fast? (ie. is throwing your hands in the air and simply busting out some bigfloat class an option?)

Also you could try some other tricks with the sorted summing, for instance keep a running sum and a total sum and add the running sum to the total sum if it's larger. Or even look for the two closest numbers each time and only sum those.
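For instance, one way to implement something in the spirit of that last idea: keep the values in a min-heap and always combine the two smallest entries, so every addition happens between comparably sized numbers. A rough sketch, not drop-in ldvector code, assuming the values are non-negative as they are after the exponentiation step:
Code:
#include <queue>
#include <vector>
#include <functional>

// Repeatedly pop the two smallest values and push their sum back.
long double sum_smallest_first(const std::vector<long double>& v)
{
    std::priority_queue<long double, std::vector<long double>,
                        std::greater<long double> > heap(v.begin(), v.end());
    while (heap.size() > 1)
    {
        long double a = heap.top(); heap.pop();
        long double b = heap.top(); heap.pop();
        heap.push(a + b);
    }
    return heap.empty() ? 0.0L : heap.top();
}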
 
MfA said:
Does it really need to be fast? (ie. is throwing your hands in the air and simply busting out some bigfloat class an option?)
Well, the limitation is in generating the C_l's, not doing this likelihood calculation, so a few times slowdown might be okay. Know of a nice, easy to use bigfloat class?

Also you could try some other tricks with the sorted summing, for instance keep a running sum and a total sum and add the running sum to the total sum if it's larger. Or even look for the two closest numbers each time and only sum those.
Well, I don't think it's the sums, though. I mean, there doesn't appear to be any difference between sorting and not sorting the array before a simple sequential sum. If I'm not barking up the wrong tree, and it is numerical precision, I think it has something to do with the interaction between the log/exp functions, which generate the monstrous dynamic range, and the sum over Gibbs samples.

Anyway, I did find one error in the above code: in finding the maximum ln(likelihood) to use as an offset, I used the wrong upper limit for the counter variable, so it was only checking the first ~30 values out of nearly 4000! Fixing that didn't help much, though :(
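For completeness, the corrected max search just runs over all the samples instead of stopping at lmax-1:
Code:
  //Find maximum ln(likelihood) stored - loop over numsamples, not lmax-1
  for (jj = 0; jj < numsamples; jj++)
    if (offset < samplelikelihood.val[jj])
      offset = samplelikelihood.val[jj];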
 
What's the deal with the lval.val[] array? As far as I can tell from the paper and your code, the only thing the lookup into this array does is to convert an integer (the 'jj' loop counter) to a long double; why not just a cast?
 
arjan de lumens said:
What's the deal with the lval.val[] array? As far as I can tell from the paper and your code, the only thing the lookup into this array does is to convert an integer (the 'jj' loop counter) to a long double; why not just a cast?
Well, I implemented that, mistakenly, for binning the Cl's. It really should go; I just haven't gotten around to removing it.
 
Chalnoth said:
Well, the limitation is in generating the C_l's, not doing this likelihood calculation, so a few times slowdown might be okay. Know of a nice, easy to use bigfloat class?
Sorry, never used them... but Google should be helpful enough ;)
 
MfA said:
Sorry, never used them... but Google should be helpful enough ;)
Yeah, actually I already found one myself :) It took a little bit to get it compiled properly and everything, but anyway. It's called LiDIA, and it implements high-precision math in a class-based way, with lots of operator overloading and such. At the lowest precision level on my x86_64 machine it's precise to about one part in 10^31. But it's more than just a few times slower than normal fp math, unfortunately.

Typically, my test code takes about 10-20 seconds to run. With bigfloats used in the code block I posted above, the run wasn't yet halfway done after about three hours, so making use of them is out :( I'll let the runs I started keep going, though, as a last-gasp check for numerical precision issues.

Right now I'm going to go back and make certain that I'm properly interpreting the values of the sigma_l's that are in the data file.
 
Woot! I solved most of my problem :)

Numerical precision is still potentially an issue, but at least now I'm getting reasonable results from the likelihood estimator! It looks like I had a problem where I was misinterpreting the input file that I used for generating the Cl's.

But I'm still getting some migration of the results depending upon how many l's I use in the calculation, so numerical precision isn't ruled out yet (hopefully the bigfloat runs finish sometime this year...).
 
Chalnoth, if you're using C++ via #include <cmath>, then the standard math functions are overloaded for float/double/long double (forwarding to C's logl, logf, expl, expf, etc. functions). So by casting the input parameter to long double you cause the more precise function to be called.
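For example (nothing from your code, just to show the dispatch):
Code:
#include <cmath>

void overload_demo()
{
    long double x = 2.0L;
    long double a = std::log(x);          // long double overload (logl underneath)
    double      b = std::log((double)x);  // double overload
    float       c = std::log(2.0f);       // float overload (logf)
    (void)a; (void)b; (void)c;            // silence unused-variable warnings
}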

I haven't used the library you mention, but it looks like something that does much more than just give you some extra precision. If you don't need many thousands of bits of accuracy, you should try Bailey's double-double and quad-double libraries, available from this page:

high precision software directory
qd.tar.gz

These give you mantissa precisions of ~100 and ~200 bits respectively (by using Dekker's error free transformations to manipulate unevaluated sums of 2 and 4 doubles).
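The basic building block is tiny; Knuth's TwoSum, one of those error-free transformations, looks roughly like this (illustrative, not the library's actual code):
Code:
// Computes s and e such that s + e == a + b exactly (s is the rounded sum,
// e the rounding error), assuming round-to-nearest and no re-association
// of the floating-point expressions by the compiler.
void two_sum(double a, double b, double& s, double& e)
{
    s = a + b;
    double bb = s - a;
    e = (a - (s - bb)) + (b - bb);
}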

Alternatively, if you have access to a Solaris SPARC machine, they have software libraries for IEEE quad precision arithmetic (113 mantissa bits), although I'm not sure what the state of their special function implementations is.

Regards,
Serge
 