When will we get double precision?

Cg definitely supports double. It is treated as float in current profiles, but that doesn't prevent a future profile from treating double with full precision.
Ah ha, that I didn't know. I was fairly sure there was no way to do it in GLSL or HLSL, but Cg makes sense.
 
Well, HLSL seems to support the datatype:
http://msdn2.microsoft.com/en-us/library/bb509646.aspx
http://msdn2.microsoft.com/en-us/library/bb509707.aspx

GLSL has double, dvec2, dvec3, and dvec4 reserved for future use, so all that is needed there is an update of the language specification from 1.20.
http://www.opengl.org/registry/doc/GLSLangSpec.Full.1.20.8.pdf
 
By the way, does your application require double precision for generating the SAT or for sampling? If it's just the former have you considered doing it on the CPU? SSE2 is really quite fast for double precision.
It also needs it for sampling, unfortunately. (As an aside, using INT32 works really well in practice, so the need for double precision is lessened, but I still feel that it would be useful to have available for research, etc.)
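To make the precision point concrete, here's a minimal sketch (made-up image size, not the actual SAT code) of why an FP32 running sum drifts when building a summed-area table over 8-bit data, while an INT32 (or FP64) accumulator stays exact:

Code:
// Hypothetical worst case: summing a fully white 2048x2048 8-bit image,
// as the last entry of a summed-area table would.
// The exact total is 255 * 2048 * 2048 = 1,069,547,520, which fits in a
// uint32 and is exactly representable in double, but float only has a
// 24-bit significand, so its running sum starts rounding past 2^24.
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const int N = 2048 * 2048;
    float    sum_f = 0.0f;   // FP32 accumulator: drifts
    double   sum_d = 0.0;    // FP64 accumulator: exact here
    uint32_t sum_i = 0;      // INT32 accumulator: exact for 8-bit input

    for (int i = 0; i < N; ++i) {
        uint8_t texel = 255;
        sum_f += (float)texel;
        sum_d += (double)texel;
        sum_i += texel;
    }

    printf("float : %.0f\n", sum_f);            // drifts from the exact total
    printf("double: %.0f\n", sum_d);            // 1069547520
    printf("uint32: %u\n", (unsigned)sum_i);    // 1069547520
    return 0;
}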

And yes, HLSL also exposes a "double" data type, although of course they make no guarantees on the actual precision that you get. However, Direct3D 9 and 10 do not, to my knowledge, support any double-precision *storage* types, which actually makes the HLSL support a bit useless for most applications.
 
Regardless, I believe there will be a high-end Quadro card that supports double precision. I wouldn't be surprised if this is only available through CUDA, though.
 
There's a lower limit to how far NV can "cripple" FP64 for GPGPU-HPC purposes before it becomes essentially irrelevant in the face of competition from plain multi-core CPUs, with their spectacular ease of use. They may well be in the 1 TFLOP SP arena, but if that yields <~100 MFLOP DP sustained, then given the programming challenge of GPGPU they'll have more than one CPU maker bearing down on them like a tonne of briquettes.
 
I agree 100%, but everyone that I've spoken to at NVIDIA seems to use the feature-differentiation and "consumer GPUs don't need it" arguments, which are certainly irritating to me as a researcher.

-----------------------------------------------------------------------------------
Questions: Could you explain what is meant by the feature-differentiation argument? I am doing a design project plus research on the G80 datapath in different chipsets and on CUDA. The quick follow-up is that they can't answer all the questions; I'm not sure if it's confidential, though.
 
Tesla, Quadro, and GeForce have all used the same chips. The upcoming generation will continue to use the same chips (G92 across product lines, we assume). NV sees double precision as a way to drive people to Tesla instead of just developing with standard GeForce cards, so it will somehow be disabled on the GeForce boards (similar to the differences between Quadro and GeForce that have existed for years). Maybe through drivers, maybe through BIOS stuff, nobody knows. That's what they mean by feature differentiation, though: take the same chip and create different products for different segments of the market from it.
 


-------------------------------------------------------------------------------------------------
No no no no no, Tim, I was talking about the interaction, and the cost of interaction, between the CPU and GPU on different platforms (AMD and Intel). I just put up another thread, and I am not sure whether you could answer the questions here or there.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

1. Nvidia claims that there is no direct mapping between video memory and main memory. So suppose you have the following simple block diagrams:

--> CPU --> L1 --> L2 --> MM (main memory) --> VirtualM (I/O) for the CPU datapath

--> GPU --> SPs w/ register files / load-store --> video memory --> PCIe.

Two questions:
(a) It's the consensus that textures were loaded directly from main memory on all previous generations of GPU. If that's the case, is it still the same for G80? We suppose that if the program is GPU-oriented (e.g. CUDA), then the texture can be moved from I/O into video memory for faster execution later. But if that's the case, does G80 take the data not via the CPU but through PCIe, and then through the I/O via the south bridge on all the chipsets? We think this question is very important because whether the texture size is smaller than 768MB (8800GTX) or larger than 768MB (which raises the issue of virtual memory), this could be one issue that leads to differences in performance.

(b) If there's no direct mapping between video memory and main memory, what's the cost of the interaction? (A CUDA sketch of such a transfer and its cost is given after these questions.)

2. Related to Q.4 as well. Now we have the AMD and Intel platforms (we haven't done much research on the datapath of the chipsets yet, but we will in the near future). We all know that the memory controller and PCIe go through the northbridge on Intel platforms (nForce4, 570i, 590i, 680i, and perhaps P35), but the memory controllers are integrated into AMD processors, as shown in

http://www.xbitlabs.com/articles/cpu...x_4.html#sect0
http://www.xbitlabs.com/articles/cpu...-e3-mem_2.html

And the below are for Intel chipsets:

http://www.pcper.com/article.php?type=expert&aid=320
http://www.extremetech.com/article2/...2134184,00.asp

How could we end up with similar bandwidth and performance between the AMD and Intel chipsets?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Thanks.
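Not an answer to the chipset-level part, but regarding 1(a)/(b): here is a minimal CUDA sketch (buffer size and variable names are made up) of how this looks from the programmer's side today. Data is explicitly staged from main memory into video memory over PCIe with cudaMemcpy, and the cost of that interaction can be timed directly:

Code:
// Hedged sketch: explicit host-to-device staging over PCIe and a simple
// measurement of what that transfer costs. The 64 MB size is illustrative.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t bytes = 64 << 20;                 // 64 MB test buffer

    float* h_data = (float*)malloc(bytes);         // main (system) memory
    float* d_data = 0;
    cudaMalloc((void**)&d_data, bytes);            // video (device) memory

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);  // over PCIe
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host->device copy: %.2f ms (%.2f GB/s)\n",
           ms, (bytes / (1024.0 * 1024.0 * 1024.0)) / (ms / 1000.0));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    free(h_data);
    return 0;
}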
 
So... does Double Precision finally allow higher quality Anisotropic Filtering (32xAF, or more?) without a huge performance hit? I really don't know much about these things, just wondering out loud here.
No.
Also, could someone explain in layman's terms what DP actually brings into games?
Nothing. Remember FP16 versus FP32 versus FP24? Well, now you'll have FP64.
 
FP64 is a different kind of beast altogether. While FP32 can give a visual benefit in some cases, FP64 often isn't even used in offline rendering.

Double support is being added to GPUs primarily for scientific GPGPU work. That's also why it's being introduced with Tesla first.
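For a sense of what that scientific use looks like, here is a minimal, hypothetical CUDA sketch (kernel and names invented for illustration): a dot-product kernel that reads FP32 data but accumulates in FP64, so that partial sums over millions of terms don't lose their low-order bits. On hardware without double support, the compiler demotes double to float (with a warning), which is exactly the product-differentiation question discussed above.

Code:
#include <stdio.h>
#include <cuda_runtime.h>

// Each thread walks the arrays with a grid-stride loop and keeps its
// partial dot product in an FP64 accumulator.
__global__ void dot_partial(const float* x, const float* y,
                            double* partial, int n)
{
    int tid    = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;

    double acc = 0.0;                       // FP64 accumulator
    for (int i = tid; i < n; i += stride)
        acc += (double)x[i] * (double)y[i];

    partial[tid] = acc;                     // reduced in a second pass or on the host
}

int main(void)
{
    const int n = 1 << 20;
    const int blocks = 128, threads = 256;

    float  *x, *y;
    double *partial;
    cudaMalloc((void**)&x, n * sizeof(float));
    cudaMalloc((void**)&y, n * sizeof(float));
    cudaMalloc((void**)&partial, blocks * threads * sizeof(double));
    // (Inputs left uninitialized; this only sketches the structure.)

    dot_partial<<<blocks, threads>>>(x, y, partial, n);
    cudaDeviceSynchronize();

    cudaFree(x); cudaFree(y); cudaFree(partial);
    return 0;
}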
 