Sqrt() (Not actually a 3D question)

Discussion in 'Rendering Technology and APIs' started by NoteMe, Feb 14, 2004.

  1. ector

    Newcomer

    Joined:
    Nov 3, 2002
    Messages:
    111
    Likes Received:
    2
    Location:
    Sweden
    What are you doing that causes Sqrt to be a bottleneck? Are you sure it is?
    I can't think of any kind of game that would need to do so many sqrts...
     
  2. NoteMe

    Newcomer

    Joined:
    Feb 14, 2004
    Messages:
    59
    Likes Received:
    0

    I did it both ways to test....
     
  3. NoteMe

    Newcomer

    Joined:
    Feb 14, 2004
    Messages:
    59
    Likes Received:
    0

    You "non-reader of all posts" :D ... just kidding with you, but I will quote myself for you...

     
  4. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    Code:
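    #include <stdint.h>
    #include <string.h>

    // Approximates 1/sqrt(v) for v > 0; assumes a 32-bit IEEE 754 float.
    inline float rsqrtf(float v){
        float v_half = v * 0.5f;
        int32_t i;
        memcpy(&i, &v, sizeof i);           // read the float's bits without aliasing problems
        i = 0x5f3759df - (i >> 1);          // magic-constant initial guess
        memcpy(&v, &i, sizeof v);           // back to float
        return v * (1.5f - v_half * v * v); // one Newton-Raphson step
    }
    
    The 'correct' number to use in place of 0x5f3759df (the line `i = 0x5f3759df - (i >> 1);`) is 5F400000h. This will result in rsqrtf(1) == 1. There's lots of interesting information about this approximation (and others) in this thread at flipCode: arcus cosinus. If you need a better approximation, one Newton-Raphson iteration makes it quite precise.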
    Unless x = 0... A little while ago my application's performance was suddenly halved. After a lot of searching, I found out that 0 * Inf = NaN. And doing further computations with NaN is extremely slow. Here's a fool-proof SSE implementation:
    Code:
    inline float fsqrt(float x)
    {
        __asm
        {
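            rsqrtss xmm0, x     // xmm0 = approx. 1/sqrt(x); +Inf when x == 0
            rcpss xmm0, xmm0    // xmm0 = approx. 1/xmm0; 1/Inf = 0, so no NaN
            movss x, xmm0       // store the result back into x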
        }
    
        return x;
    }
    
    Although rcpss is an approximation instruction as well, I found that it doesn't noticeably reduce the total precision of the function.
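    The 0 * Inf = NaN problem can be reproduced without SSE. The sketch below is only a model: exact_rsqrt is a made-up stand-in that uses 1.0f / std::sqrt instead of rsqrtss, and it assumes default IEEE 754 behaviour (no fast-math compiler flags).

```cpp
#include <cmath>

// Hypothetical stand-in for rsqrtss: exact instead of approximate,
// but it also yields +Inf for an input of 0.
inline float exact_rsqrt(float x) { return 1.0f / std::sqrt(x); }

// sqrt(x) computed as x * rsqrt(x): at x == 0 this is 0 * Inf = NaN.
inline float sqrt_by_mul(float x) { return x * exact_rsqrt(x); }

// sqrt(x) computed as 1 / rsqrt(x): at x == 0 this is 1 / Inf = 0.
inline float sqrt_by_rcp(float x) { return 1.0f / exact_rsqrt(x); }
```

    For positive inputs both variants give roughly the same answer; only the reciprocal form survives x == 0.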
     
  5. Scali

    Regular

    Joined:
    Nov 19, 2003
    Messages:
    2,127
    Likes Received:
    0
    An alternative fool-proof implementation would be something like:

    Code:
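    inline float fsqrt(float x)
    {
        static const float one = 1.0f;
    
        __asm
        {
            movss xmm0, x        // xmm0 = x (movss: x is a scalar, not 16-byte aligned)
            xorps xmm1, xmm1     // xmm1 = 0
            cmpeqss xmm1, xmm0   // xmm1 = all-ones mask if x == 0, else 0
            movss xmm2, one
            andps xmm1, xmm2     // turn the mask into 1.0f if x == 0, else 0.0f
            addss xmm0, xmm1     // xmm0 = x, or 1 when x == 0
            rsqrtss xmm0, xmm0   // xmm0 = approx. 1/sqrt(xmm0)
            mulss xmm0, x        // xmm0 = x * rsqrt
            movss x, xmm0
        }
    
        return x;
    } 
    
    (Could be mistakes here, I'm not on a PC with Intel docs and/or SSE/compiler to test it, so this is just off the top of my head.)
    Anyway, the basic idea is to add 1 to x if x == 0, so that you will not get 1/0, but 1/1. The compare produces an all-ones mask rather than the value 1, so the mask is ANDed with 1.0f before the add.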
    The mul afterwards will make sure that the result is still 0 in this case (0*1/1 == 0).
    It should be slightly more accurate than the 1/(1/sqrt), and speedwise it should be okay, since the rcp operation is relatively slow compared to cmp, add and mul. On some CPUs it may actually be faster, I don't know.
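    The masking idea can be modelled in plain C++ to check the logic. This is a rough sketch, not the SSE code itself: cmpeq_mask imitates the compare result, 1.0f / std::sqrt stands in for rsqrtss, and all helper names are made up for illustration. It assumes a 32-bit IEEE 754 float.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for moving between a float and its bit pattern.
inline std::uint32_t float_to_bits(float f)
{
    std::uint32_t b;
    std::memcpy(&b, &f, sizeof b);
    return b;
}

inline float bits_to_float(std::uint32_t b)
{
    float f;
    std::memcpy(&f, &b, sizeof f);
    return f;
}

// Scalar model of an SSE equality compare: all-ones mask when equal, else zero.
inline std::uint32_t cmpeq_mask(float a, float b)
{
    return a == b ? 0xFFFFFFFFu : 0u;
}

// sqrt(x) for x >= 0 as x * rsqrt(x), with x bumped to 1 when it is 0.
inline float sqrt_masked(float x)
{
    std::uint32_t mask = cmpeq_mask(x, 0.0f);
    float bump = bits_to_float(mask & float_to_bits(1.0f)); // 1.0f if x == 0, else 0.0f
    float safe = x + bump;                                  // never 0
    return x * (1.0f / std::sqrt(safe));                    // exact stand-in for rsqrtss
}
```

    Read as a float, the raw all-ones mask is a NaN, which is why it has to be ANDed with the bits of 1.0f before the add.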
     
  6. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    How about:
    Code:
    inline float fsqrt(float x) 
    {
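        static const int big = 0x7F7FFFFF;   // bit pattern of the largest finite float
    
        __asm 
        { 
            rsqrtss xmm0, x      // xmm0 = approx. 1/sqrt(x); +Inf when x == 0
            minss xmm0, [big]    // clamp +Inf down to the largest finite float
            mulss xmm0, x        // x * rsqrt; 0 * finite = 0, so no NaN
            movss x, xmm0 
        } 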
        } 
    
        return x; 
    } 
    
    if every bit of precision is welcome...
     
  7. Scali

    Regular

    Joined:
    Nov 19, 2003
    Messages:
    2,127
    Likes Received:
    0
    Ah, that's even better. I overlooked that instruction.
    This is definitely preferred over the rcp then. Probably more precise AND faster.
     