Origin of Quake3's Fast InvSqrt()

Discussion in 'Beyond3D Articles' started by Geo, Dec 1, 2006.

  1. TheodorikOBroin

    Newcomer

    Joined:
    Dec 4, 2006
    Messages:
    1
    Likes Received:
    0
    Location:
    Dublin, Ireland
    Brill Post

    Cheers for the post Rys,

    Was just surfing by and came across this little bundle of joy. I'm a true geek, got my binary watch 'n all. Just reading about the code, and its sheer power brought back memories of when John Carmack used to (and I'm sure he still does) give out big explanations on how he did such and such a feature in any of his games to date (with iD Software).

    Can't wait to find out who the original author is too.

    Unfortunately, I can't code, but I can script. But I also love these simple things and definitely would love to get my mitts on Assembler (still plenty of people out there using it).

    Kudos mate,

    :D

    db
     
  2. benweiss

    Newcomer

    Joined:
    Dec 4, 2006
    Messages:
    1
    Likes Received:
    0
    This Looks Familiar

    Whoops, I was out of the country for the past few days and missed the Slashdotfest... But for those still reading this thread, I independently derived this same function in the mid-nineties (~1995?) at MetaCreations (then MetaTools), using the constant 0x5F3997BB instead of 0x5F3759DF. I had run some experiments to determine the "best" value for this constant, but I believe I optimized for best average case instead of best worst case; I'm not completely sure how I derived that exact number.

    Anyway, there's a book called "Hacker's Delight" that's full of stuff like this, though it doesn't include some of the more fun pixel-munging hacks I've seen. (Blending two 16-bit RGB pixels with arbitrary alpha using a single multiply, for instance.) Cool to see this stuff pop up once in a while!

    Ben
     
  3. Magnum PI

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,054
    Likes Received:
    12
    http://www.jbenjamin.org/docs/asm-sqroot.html

    See the section:
    "A non-table based accuracy scalable floating point approach."

    Could the same idea be used here ?

    Seems to be an ASM version from 1997 from [FONT=Arial, Geneva] Vesa Karvonen using a constant calculated by [/FONT][FONT=Arial, Geneva]James Van Buskirk[/FONT][FONT=Arial, Geneva][/FONT]
    [FONT=Arial, Geneva][/FONT]
    see the usenet thread here:
    http://groups.google.com/group/comp.lang.asm.x86/browse_frm/thread/d8e56f01321777d6/edd6034d4417cb13
     
  4. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    FWIW, the first time I saw an implementation of an inverse square root (that used Newton-Rhapson) was in the (1990) manual for the Texas Instruments C30/31 DSP .

    We used the C31 (which could do up to 66MFLOPS!) in the PowerVR prototype as a T&L processor since x86 floating point wasn't very good back then.
     
  5. captainfreedom

    Newcomer

    Joined:
    Dec 3, 2006
    Messages:
    5
    Likes Received:
    0
    I develop games for mobile phones, which have no floating point processor. Can anyone tell me it it would be possible to develop a version of this algorithm using only 32-bit integers?
    Many thanks for any tips you can give me ;)
     
  6. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    If you follow the link provided by Magnum, you'll find a number of integer square root algorithms.

    UPDATE: Hmm. I think I might be thinking of a different link. I'm sure I've seen a site with a list of algorithms for integer square root which were quite "interesting".
     
  7. Rufus

    Newcomer

    Joined:
    Oct 25, 2006
    Messages:
    246
    Likes Received:
    60
    You almost certainly don't want to use FP if you don't have native float. Using FP would mean *every* single FP instruction gets software emulation. You really want to be using Fixed Point integers. A quick google will give you quite a few references, most of them for the 386 (which also lacked an FPU). It's amazing how everything old is new again :grin: .
     
  8. Magnum PI

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,054
    Likes Received:
    12
    if you use java:

    http://atoms.alife.co.uk/sqrt/index.html
     
  9. captainfreedom

    Newcomer

    Joined:
    Dec 3, 2006
    Messages:
    5
    Likes Received:
    0
    Thanks for the tips, but I was looking for a integer version of that function (ie. inverse square root, not square root), but it does not seem to be possible
     
  10. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Are you using integers to simulate fixed point? Otherwise you're not going to get an inverse square root to be non-zero.

    Tables and iterations seem to be the best way, though. It would help if you gave some specifics, like what range of numbers you have, how much accuracy you need, what the application is, etc.
     
  11. l_mammel

    Newcomer

    Joined:
    Dec 6, 2006
    Messages:
    5
    Likes Received:
    1
    Newton-Raphson vs. Taylor Series

    Because the Newton-Raphson step is not iterated, it can be interpreted as the first term in a Taylor series. Call the "magic" estimate a^ -1/2 and it is the correct value for an input a = 1/ ( a^ -1/2 ) ^ 2. Then with f(x) = x^ -1/2, the approximation f(x) = f(a) + (x-a) f'(a) yields the expression in InvSqrt() .

    Taking the next term, and letting y0 = a^ -1/2 = (float) 0x5f3759df -(i<<1)

    f(a) + (x-a)f'(a) + (x-a)^2/2 f''(a)

    yields

    Y = x*y0*y0

    value = y0*( 1.875 - ( 1.25 - 0.375 * Y ) * Y )

    This involves only one more mult than the original, noting we don't take 0.5*x . Here's two lines of output showing inputs of "2" and "3" and the hex floating point format of the input value, 1/sqrt(x), InvSqrt(x), Inv2Sqrt(x) and the unscaled error term for InvSqrt and Inv2Sqrt.

    $ invsqrt
    2
    40000000 3f3504f3 3f34f95e 3f350533 -0.0001767278 0.0000038147
    3
    40400000 3f13cd3a 3f13ac3c 3f13ce90 -0.0005034208 0.0000203848


    Well, it was fun!
     
  12. Rufus

    Newcomer

    Joined:
    Oct 25, 2006
    Messages:
    246
    Likes Received:
    60
    "Tomorrow"?
     
  13. Rys

    Rys PowerVR
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,156
    Likes Received:
    1,433
    Location:
    Beyond3D HQ
    Still waiting for some permission to post a couple of things related to the code, sorry for the delay :smile:
     
  14. captainfreedom

    Newcomer

    Joined:
    Dec 3, 2006
    Messages:
    5
    Likes Received:
    0
    I'm pretty sure it's not possible to implement it using only integers because it uses some trick to do with the floating point representation of the number
     
  15. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    It's not essential - the Newton Rhapson method does take advantage of the floating-point form's exponent to produce an initial guess, but the same can be done with fixed-point.
     
  16. captainfreedom

    Newcomer

    Joined:
    Dec 3, 2006
    Messages:
    5
    Likes Received:
    0
    Well, if you give me a hint where to start I will try it out. Unfortunately, the "Newton Rhapson" thingy goes way over my head
     
  17. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    The Newton-Rhapson method just takes one estimate of the result and uses that to produce a better estimate. Each iteration generally doubles the accuracy.

    In the case of 1/sqrt(x) for each iteration you just need to compute...
    Code:
    estimate := estimate * (1.5 - (x/2) * estimate * estimate);
    
    using your fixed-point mult/add functions.


    Now you need an initial estimate. The better this is, the fewer iterations you'll need. To get you started though, find the most significant bit in your fixed point number (relative to the fixed binary point). I.E. if your input X is about 2^2M, then set the estimate to 2^-M. (e.g if X = 300, then it's close to 2^8 so try estimate = 2^-4 = 1/16).

    If you really want to be lazy, you could probably just set estimate = 1 but it'll take a lot more iterations to become accurate!

    If you want to improve things, then use a small LUT that uses the first few significant bits of the number. It will cut down the number of iterations you need.
     
  18. l_mammel

    Newcomer

    Joined:
    Dec 6, 2006
    Messages:
    5
    Likes Received:
    1
    Of course, you only need to calculate over the interval [1,4) since every result is trivially 2^-M times a result in this interval. But then I thought, hmmm, why not reduce this to the interval [1,2) since results for inputs in [2,4) are given by f( x/2 ) / sqrt(2) . And well, try it one more time, we can find results in [sqrt(2),2) by f(x/sqrt(2))/ sqrt(sqrt(2)) and x/sqrt(2) falls in the interval [1,sqrt(2)) ... note the divides are just multiplications by constants.

    Of course, you could keep going with the interval reduction. I'm not sure where diminishing returns sets in. It's easy to implement as drop-through code which reduces the input and accumulates a single multiplier for the return value. I carried it up to [1,sqrt(sqrt(sqrt(2)))) and easily get 13 decimals with a single constant initial guess and 2 applications of the second order formula I posted above. Here is the "bc" code, which I use on cygwin.

    c(x) gives the error term, of course. Input to i(x) is assumed in the range [1,4)

    scale=20
    define i(x){
    f=1
    if( x >= 2 ){ f=1/sqrt(2); x=x/2 ; }
    if( x >= sqrt(2) ){ f=f/sqrt(sqrt(2)); x=x/sqrt(2); }
    if( x >= sqrt(sqrt(2)) ){ f=f/sqrt(sqrt(sqrt(2))); x=x/sqrt(sqrt(2)); }
    if( x >= sqrt(sqrt(sqrt(2))) ){
    f=f/sqrt(sqrt(sqrt(sqrt(2)))); x=x/sqrt(sqrt(sqrt(2))); }

    w = x/1.045

    y = ( (1.875 - (1.25 - 0.375*w)*w)/sqrt(1.045) )
    w = x*y*y
    return ( f * y*(1.875 - (1.25 - 0.375*w)*w) )
    }
    define c(x){
    return( i(x) - 1/sqrt(x) )
    }
     
  19. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    l_mammel:
    The reduction to the domain of the function to [1..4) is nice because the scaling to that domain and back to the final result only requires multiplication by powers of 2. You therefore don't lose any relative precision.

    If you start using a lot of arbitrary scaling factors (each of which has been rounded) you are going to increase the amount of error in both the input and output results <shrug>.

    Otherwise, it's a nice idea :)
     
  20. l_mammel

    Newcomer

    Joined:
    Dec 6, 2006
    Messages:
    5
    Likes Received:
    1
    Since the algorithm seemed to be good to 13 decimal places, I tried specifying the constants to 10 places and setting the bc scale to 10. Scanning through [1,4) I found +/- .0000000001 errors in about equal numbers with 0, and a sprinkling of +/- .0000000002

    Anyway, any such concern as this may raise is easily remedied by applying the scale after the first iteration, then running the second iteration ( and any subsequent iterations ) with the rescaled value and the original input, so the whole scaling thing can be regarded as a super-good estimator, if you like.

    So, I don't see a downside here.
     
Loading...

Share This Page

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.
Loading...