Crippled DP performance holds back gamers or not?

punchinthejunk · Sep 7, 2014

What are some ways crippled DP could have been bad for gamers?

Bonus question (examples would be helpful if applicable):
Can shader precision directly affect the accuracy of operations done by the ROPs and z/stencil units?

Thanks in advance

Alexko · Sep 8, 2014

Double precision is not used in games. I've heard it may be of use in flight simulators with very large environments, but in practice I don't think it's ever done, since most GPUs have little to no support for DP.

sebbbi · Sep 8, 2014

Game developers prefer maximum performance. 64 bit floats are 4x slower on GCN (and much slower on consumer NVIDIA cards). We often use 16 bit floats and 16/10/8 bit normalized values to store data as tightly as possible. 64 bit float processing isn't important with the current view distances (unless your game has interplanetary scale).

punchinthejunk · Sep 8, 2014

Good answers so far

More questions though (I don't expect all of them to be answered and it is fine if they aren't, but I really think games released over the next 3-5 years would and could be designed to look a lot better if DP performance were faster):

Wouldn't more games have interplanetary scale if DP were faster? Didn't some pre-DX9 games use interplanetary scale or something similar? If not, then why not? Is interplanetary scale the same thing as eye space linearity? If not, then what is the difference? Also, why not only use shaders for shadows instead of relying on the stencil buffer?

And how is 32 bit float a good substitute for 32 bit integer? I understand that approximation hacks and wrapping are okay, but then... Isn't coming up with some equation or hack generally more work for programmers than simply replicating?

I am not trying to sound rude, but... imagine and describe some physics effects that could really, really affect gameplay and benefit from double precision at the same time.

sebbbi · Sep 8, 2014

I don't know about NVIDIA and Intel, but most AMD GCN 32 bit integer operations are full rate. 32 bit integer multiply is 1/4 rate. 24 bit integer multiply is full rate (using 32 bit registers). That is often good enough (range from 0 to 16 million).

Current game physics engines running on CPU also use 32 bit float math. 64 bit numbers take twice as much memory and memory bandwidth. SIMD (AVX / SSE) also processes twice as many 32 bit lanes as 64 bit lanes per instruction (doubling the theoretical performance).

Some good 64 bit physics related discussion here (UE doesn't support it):
https://forums.unrealengine.com/showthread.php?111-64-bit-Physics

Psycho · Sep 8, 2014

punchinthejunk said:
Wouldn't more games have interplanetary scale if DP were faster? Didn't some pre-DX9 games use interplanetary scale or something similar? If not, then why not? Is interplanetary scale the same thing as eye space linearity? If not, then what is the difference? Also, why not only use shaders for shadows instead of relying on the stencil buffer?

We have a 3D world viewer as part of our rendering application/system. At that scale the normal SP pipeline wasn't enough: with the origin in Europe, the precision in California is about a meter. So you have to do your basic object/transformation setup, culling, etc. in DP and only at the end build SP matrices for rendering. And of course avoid global world space and make everything somewhat camera-local. It's cumbersome (and only implemented as far as was needed here) but doable.
It's hard to see where rendering from the camera really *needs* double precision, but it would surely make life easier.
I guess it would largely be the same with game physics - doing it in a somewhat local space - again more work, but doable.
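To put rough numbers on the point above, here is a minimal Python sketch. The 9,000 km camera distance and the 12.34 m offset are made-up values, and `to_f32` / `f32_spacing` are hypothetical helpers, not part of any engine. It shows the gap between adjacent single-precision values at that distance and what is gained by subtracting the camera position in double precision before rounding to single.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_spacing(x: float) -> float:
    """Gap between adjacent float32 values near x (24-bit significand)."""
    _, exp = math.frexp(abs(x))      # abs(x) = m * 2**exp with 0.5 <= m < 1
    return 2.0 ** (exp - 24)

# Assumed numbers: world origin in Europe, camera roughly 9,000 km away.
camera_x = 9_000_000.0               # metres from the world origin
object_x = camera_x + 12.34          # an object 12.34 m from the camera

# Naive: keep world-space positions in float32, then subtract.
naive = to_f32(to_f32(object_x) - to_f32(camera_x))

# Camera-relative: subtract in double, round only the small result to float32.
relative = to_f32(object_x - camera_x)

print(f32_spacing(camera_x))   # 1.0 (one-metre steps at this distance)
print(naive)                   # 12.0 (the 0.34 m is lost)
print(relative)                # close to 12.34
```

The double-precision work is confined to the setup step; everything after the subtraction fits comfortably in single precision.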

Grall · Sep 8, 2014

If the camera is set in Europe looking at California, wouldn't 1m precision be quite sufficient? If an object near the camera (in Europe) spans 1000 screen pixels across, in California the same object wouldn't span even a thousandth of a pixel in size... How the hell could you even tell if objects are Z-fighting if they don't even show up on the screen?

milk · Sep 8, 2014

Grall said:
If the camera is set in Europe looking at California, wouldn't 1m precision be quite sufficient? If an object near the camera (in Europe) spans 1000 screen pixels across, in California the same object wouldn't span even a thousandth of a pixel in size... How the hell could you even tell if objects are Z-fighting if they don't even show up on the screen?

Wondered the same thing. And at this scale, wouldn't your LOD system have made California as complex as half a dozen polys from such distance?

Psycho · Sep 8, 2014

No, not that way

(that's essentially what I mean when saying that SP should suffice for rendering - objects that far away don't need precision)
But when the origin is in Europe and the camera is moving around at building scale in L.A., it's pretty useless unless you take care and, for instance, avoid building your viewing matrices in SP, etc.

Rurouni · Sep 9, 2014

http://www.davenewson.com/dev/unity-notes-on-rendering-the-big-and-the-small

I'm not using Unity, but I do use Max for my work, and the further you are from the origin, the worse the accuracy. Most of my data comes from CAD, and it seems those CAD people don't care about the origin too much. When I import the CAD data, sometimes the location is so far from the origin that the import result is... funky.

sebbbi · Sep 9, 2014

Floating point values are not a good choice if you want to represent something big with uniform precision at every location. Fixed point is much better for this use case.

For example, 24.8 (32 bit) fixed point 3D coordinates are enough to represent the whole world with 1/256 meter (~0.4 centimeter) precision everywhere. 24 bits are enough to store ±8388 km distance from the origin (Earth's radius is 6378 km, so it fits nicely inside the coordinate space).
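The arithmetic behind the 24.8 format can be checked with a short Python sketch. The helper names (`to_fixed`, `to_metres`) and the choice of a signed 32-bit container are assumptions made for illustration.

```python
FRACTION_BITS = 8
SCALE = 1 << FRACTION_BITS                     # 256 steps per metre
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

def to_fixed(metres: float) -> int:
    """Encode metres as a signed 24.8 fixed-point value held in 32 bits."""
    raw = round(metres * SCALE)
    if not INT32_MIN <= raw <= INT32_MAX:
        raise OverflowError("coordinate outside the 24.8 range")
    return raw

def to_metres(raw: int) -> float:
    """Decode a 24.8 fixed-point value back to metres."""
    return raw / SCALE

step = to_metres(1)                            # resolution: 1/256 m
max_range_km = to_metres(INT32_MAX) / 1000.0   # largest positive coordinate

earth_radius = 6_378_000.123                   # metres; fractional part is arbitrary
round_trip_error = abs(to_metres(to_fixed(earth_radius)) - earth_radius)

print(step)                # 0.00390625
print(max_range_km)        # about 8388.6
print(round_trip_error)    # no more than half a step
```

Unlike a 32 bit float, the step size here is the same at the origin and at the edge of the range.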

Many game developers store their vertex data as normalized 16 bit integers (a fixed point representation) and store the object bounding box to expand vertex positions to a floating point representation. This keeps the precision uniform inside the object bounding box. The object surface is usually near the edges of the bounding box, and floating point would give the worst resolution there. The origin, on the other hand, is usually in the middle of the object (inside), so there are no vertices there. 16 bit integer (fixed point) thus results in much better quality than 16 bit float. 32 bit float provides slightly better quality for huge objects (and 32 bit normalized int even better). But 16 bit normalized is enough for the vast majority of objects.
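A small Python sketch of that comparison, under assumed numbers: a bounding box of -2 m to 2 m on one axis and a vertex at 1.9371 m, near the box edge. The function names are hypothetical, and 16 bit float rounding is emulated with the `struct` module's half-precision format.

```python
import struct

def to_f16(x: float) -> float:
    """Round a Python float to the nearest 16 bit (half precision) float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def quantize_unorm16(x: float, lo: float, hi: float) -> int:
    """Map x in [lo, hi] to a normalized 16 bit integer in 0..65535."""
    t = (x - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)        # clamp to the bounding box
    return round(t * 65535)

def dequantize_unorm16(q: int, lo: float, hi: float) -> float:
    """Expand a normalized 16 bit integer back to a position inside [lo, hi]."""
    return lo + (q / 65535) * (hi - lo)

# Assumed object: bounding box from -2 m to 2 m, vertex near the box edge.
box_min, box_max = -2.0, 2.0
x = 1.9371

q = quantize_unorm16(x, box_min, box_max)
err_unorm16 = abs(dequantize_unorm16(q, box_min, box_max) - x)
err_float16 = abs(to_f16(x) - x)

print(err_unorm16)   # a few hundredths of a millimetre
print(err_float16)   # a few tenths of a millimetre
```

For this vertex the normalized integer is roughly an order of magnitude more accurate than the 16 bit float, because the float spends its precision near the origin where there is no geometry.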

lanek · Sep 9, 2014

sebbbi said:
I don't know about NVIDIA and Intel, but most AMD GCN 32 bit integer operations are full rate. 32 bit integer multiply is 1/4 rate. 24 bit integer multiply is full rate (using 32 bit registers). That is often good enough (range from 0 to 16 million).

Current game physics engines running on CPU also use 32 bit float math. 64 bit numbers take twice as much memory and memory bandwidth. SIMD (AVX / SSE) also processes twice as many 32 bit lanes as 64 bit lanes per instruction (doubling the theoretical performance).

Some good 64 bit physics related discussion here (UE doesn't support it):
https://forums.unrealengine.com/showthread.php?111-64-bit-Physics

Interesting read, especially the response that 64-bit PhysX could come soon. That said, I don't really think it will come that soon, and even if it does, it won't be used.

How do you make your engine/game scale if you use 64-bit floating point between, let's say, a 780 Ti (which is already crippled in this sense) and a mid-range GPU (let's say a GTX 760), which is even more crippled? Or, if you prefer, how do you scale performance between a GPU with a 1:4 DP rate and a slower one (slower in graphics performance too) with a 1:24 DP rate?

Better to keep "SP" performance, which may be limiting in some cases, than to "automatically" cripple performance on mid-range and low-end cards more than it already is. In addition, looking at the larger context, does Nvidia have any reason to release mid-range gaming GPUs with a 1:2-1:4 DP rate while releasing low-end and mid-range workstation Quadros with crippled DP performance? Not really a good strategy (at least for the CUDA lineup in the prosumer/professional market).

punchinthejunk · Sep 9, 2014

@Sebbi: I agree that 32 bit fixed point calcs definitely make space more uniform than 32 bit float calcs, but 2 problems I can think of are:
1. most GK products marketed for gamers don't have any way to get fast full fixed point precision (that I know of).
2. DX doesn't use full fixed point precision for depth. In UT '99, for example, 32 bit float reverse z doesn't result in depth distribution that makes the game look like it did on the 3dfx Voodoo 2 I had back in '99. At least I remember it looking different but then maybe my hippocampus is screwed up in addition to me having poor executive function.

Nvidia has done a lot of good, but its reputation could go down if it continues to go out of its way to do things like cripple DP. It's hard (for me at least) to believe that it can make that much more profit in the long run having all these different classes of cards and then coming up with algorithms to cripple performance that is potentially useful to everyone. It keeps too many people in the dark and prevents many people from using better products for less money. Nvidia may have to hide exactly how they did it, store those secrets, and maybe even use NDAs to do it. But then maybe I am speculating too much.

Exophase · Sep 9, 2014

punchinthejunk said:
@Sebbi: I agree that 32 bit fixed point calcs definitely make space more uniform than 32 bit float calcs, but 2 problems I can think of are:
1. most GK products marketed for gamers don't have any way to get fast full fixed point precision (that I know of).
2. DX doesn't use full fixed point precision for depth. In UT '99, for example, 32 bit float reverse z doesn't result in depth distribution that makes the game look like it did on the 3dfx Voodoo 2 I had back in '99. At least I remember it looking different but then maybe my hippocampus is screwed up in addition to me having poor executive function.

I'm not a game developer, but I'm pretty sure that what sebbi described just requires biasing the 32-bit integer coordinates to camera space then converting to float with some scale factor in the vertex shader (or whatever first touches the screen geometry). Even hardware with lackluster integer performance won't suffer very much from this.

None of this has to do with rendering precision, where single-precision float is fine. It's only about enabling a more convenient uniform storage for a large world.
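A rough Python sketch of that idea, assuming the 24.8 format from earlier in the thread and made-up positions about 6,000 km from the origin. `camera_relative` is a hypothetical stand-in for what a vertex shader would do. The integer subtraction is exact, and only the small camera-relative difference is converted to single precision.

```python
import struct

SCALE = 256  # assumed 24.8 fixed point: 256 units per metre

def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def camera_relative(vertex_raw: int, camera_raw: int) -> float:
    """Bias to camera space in integers, then scale to float32 metres."""
    return to_f32((vertex_raw - camera_raw) / SCALE)

# Assumed positions: camera 6,000 km from the origin, vertex 3 m plus one
# fixed-point step (1/256 m) further along the same axis.
camera_raw = round(6_000_000.0 * SCALE)
vertex_raw = camera_raw + 3 * SCALE + 1

biased = camera_relative(vertex_raw, camera_raw)

# For contrast: convert the absolute positions to float32 first, then subtract.
absolute = to_f32(to_f32(vertex_raw / SCALE) - to_f32(camera_raw / SCALE))

print(biased)     # 3.00390625 (the 1/256 m step survives)
print(absolute)   # 3.0 (float32 steps are 0.5 m at this distance)
```

The only integer work per vertex is one subtraction per axis, which fits the remark that slow integer hardware would not suffer much.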

sebbbi said:
Many game developers store their vertex data as normalized 16 bit integers (a fixed point representation) and store the object bounding box to expand vertex positions to a floating point representation. This keeps the precision uniform inside the object bounding box.

Incidentally, Nintendo DS used a 16-bit fixed int object coordinate format. Although nothing ever got turned into floats internally, just higher precision fixed with some ugly dynamic range problems occasionally.

Andrew Lauritzen · Sep 10, 2014

punchinthejunk said:
1. most GK products marketed for gamers don't have any way to get fast full fixed point precision (that I know of).

"GK"? I'm not sure what you mean by that but fixed point math doesn't really require special hardware. Regular integer stuff is sufficient.

punchinthejunk said:
2. DX doesn't use full fixed point precision for depth. In UT '99, for example, 32 bit float reverse z doesn't result in depth distribution that makes the game look like it did on the 3dfx Voodoo 2 I had back in '99.

I can't speak for whatever it was doing in '99, but properly-implemented 32-bit float inverse z has plenty of precision... you'll likely run into precision issues with your vertices before you have depth issues. My bet is on your '99 version being wrong, or the two differing in other ways.
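As a rough illustration of why, here is a Python sketch that assumes "inverse z" means what is usually called reversed-Z (depth 1 at the near plane, 0 at the far plane). The near and far planes (0.1 m and 10 km) and the test distances are made up, and the depth is computed in double and then rounded to float32, which simplifies what real hardware does.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

NEAR, FAR = 0.1, 10_000.0   # assumed clip planes, in metres

def standard_depth(z: float) -> float:
    """Conventional mapping: 0 at the near plane, 1 at the far plane."""
    return FAR / (FAR - NEAR) * (1.0 - NEAR / z)

def reversed_depth(z: float) -> float:
    """Reversed mapping: 1 at the near plane, 0 at the far plane."""
    return NEAR / (FAR - NEAR) * (FAR / z - 1.0)

# Eleven surfaces one metre apart, 9 km from the camera.
distances = [9000.0 + i for i in range(11)]

standard = {to_f32(standard_depth(z)) for z in distances}
reversed_ = {to_f32(reversed_depth(z)) for z in distances}

print(len(standard))    # at most 2: distant surfaces collapse near 1.0
print(len(reversed_))   # 11: every surface keeps its own depth value
```

The reversed mapping puts distant surfaces near zero, where floats are densest, which offsets the 1/z crowding.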

There really is very little use for doubles in games. As sebbbi has said, in the vast majority of cases where people think they want them, they really should be using fixed-point instead. In fact I'd go as far as to say a lot of the stuff people currently do in floating point is simply for convenience (i.e. not worrying about scale/bounds), not because it's more appropriate than fixed point per se.

Frankly even the scientific computing community doesn't need doubles nearly as much as they think they do, and I say that as someone who has worked in that space. There are actually legitimate uses there at least, but there's even less understanding of float precision and how to manage it and far more "I just want it to work, search-replace float->double" (or even quad!) and buy new hardware in HPC...

3dcgi · Sep 10, 2014

Andrew Lauritzen said:
Frankly even the scientific computing community doesn't need doubles nearly as much as they think they do, and I say that as someone who has worked in that space. There are actually legitimate uses there at least, but there's even less understanding of float precision and how to manage it and far more "I just want it to work, search-replace float->double" (or even quad!) and buy new hardware in HPC...

I remember a conference panel a year or two ago where someone from a visual effects or animation company stated they were using doubles so they didn't need to figure out exactly how much precision they needed. So that's anecdotal confirmation of your comment. I think they were referring to a simulation rather than rendering.

punchinthejunk · Sep 10, 2014

@ Andrew: Thank you. You are a good man. By "GK" I meant the entire Kepler line-up. I don't really think that 32 bit float reverse z-buffering is always sufficient (like it isn't with PCSX2) but then infinite precision isn't possible and many programmable methods (as well as other hardware instructions) can be used to work around less machine precision. There will always be precision mismatches and it is generally better to focus on the present and future than the past. However, the DX specs have been terrible for backward compatibility while OpenGL is about to be a better balance (OpenGL was usually better for programmers who had a high tolerance of ambiguity and it would've generally been best or very competitive if it weren't for Microsoft's patents).

[rambling] Anyway, I wish that there was more free competition in graphics hardware and that Nvidia didn't prey on ignorance as much as I think they do to make profits. Nothing radically different has launched since G80 launched 8 years ago. Driver features are taken away more than added (and I am pretty sure that they have been using subtle lossy depth optimizations for all the drivers that I tried with my 780 and the 660 Ti I sold). I mean, PC gamers often avoid consoles because they like choices. I understand that I couldn't always tell the difference, but we could specifically charge more/differently for special drivers, and/or for support, not use NDA contracts, be more tolerant of anonymity (especially if it is accidental), not put such a retainer on people in the Focus group, etc., etc. But then maybe there would be too many choices for people to make. In any event, the intellectual monopoly system is way too powerful (and that is not the only way large corps are perpetuated, aided by the ruling class in the central aggression agency aka the U.S. government). Wealthy companies and individuals shouldn't have the privilege to legally sue poor ones. I also believe that Nvidia has been helped more by the presence of the patent system than hurt by it, as it ultimately sent a signal to 3dfx that helped end 3dfx (as did their contract suit against Sega).

Jen-Hsun Huang is a good man, but I think he is a very powerful and possibly not-so-democratic CEO. Perhaps someone like Michael Katz would be better (although Tom Kalinske was good, Sega's prices got high as hell even under him and while Bernie Stolar was good he set the U.S. launch price of the Dreamcast too low and including a 56k modem was the worst thing that could've been done for the future or for Sega's profits (perhaps he was trying to make Hiro Nakayama angry)... early adopters and especially my parents would've paid a lot more for the DC even though even I knew that the Dreamcast wasn't going to last long). I know I would be worse and even less democratic than Jen-Hsun Huang (his deductive reasoning is top notch while my inductive logic obscures the truth and my deductive logic doesn't as quickly yield truths like his does) and I ain't the only one.
[/rambling]
Finally (for now), I offer apologies for coming back as well as for the incoherence and/or other ambiguity in any of my posts, past, present, or future. I am thankful for this soapbox even if I am not welcome to it. I offer further apologies if I am not welcome.

Exophase · Sep 11, 2014

Bringing in PS2 emulation is a red herring. Just because games managed to rely on a certain depth format to work properly doesn't mean that they intrinsically needed it. You can just as well find emulated games that break by having too much precision.

punchinthejunk · Sep 11, 2014

@Exophase: That is true, and that is one reason why I think software rendering is better (no need to keep legacy formats for functionality) and lower level (as close to the metal as possible) is better. Hardware rendering just isn't very versatile vs full programmable rendering

Exophase · Sep 11, 2014

punchinthejunk said:
@Exophase: That is true, and that is one reason why I think software rendering is better (no need to keep legacy formats for functionality) and lower level (as close to the metal as possible) is better. Hardware rendering just isn't very versatile vs full programmable rendering

Software rendering (on CPU or GPU) could be a better choice if your requirements are to emulate some console very accurately. But better for native games? Not really, unless someone is trying something really unusual. For the way anyone does stuff now, the efficiency of the fixed-function parts of the GPU more than makes up for whatever flexibility they limit.
