Nvidia GT300 core: Speculation

That would be quite an accomplishment : )

Interesting - I don't know enough about the GPU they include, though.

But you may very well be correct, and it wouldn't be the first time that Charlie was wrong.

DK

I have a hard time finding where he said that GT216/18 were cancelled.
He was spot on with 212/214 being cancelled; the tape-outs of 216/18 seem to coincide with the production-week stamp on the "leaked" 216-350 package.
 
http://forums.nvidia.com/index.php?showtopic=103046&st=0

[attached image: fuST.png]
 

Your image doesn't show up since you're not linking to a picture but to an attachment; you can only see it when you're logged in on the NV site.

Simon Green nVidia Developer said:
I'm not sure who told you this, but it's not true, our double precision units are separate. ATI does implement their double precision using the SP units, apparently, maybe they were confused!

Anyone see this changing in GT300? It was obviously a bad choice before.
 
Anyone see this changing in GT300? It was obviously a bad choice before.

I'd say that from a GPGPU perspective it's anything but a bad choice; it's rather a bad choice for the mainstream consumer market where I as a buyer have dedicated transistors for something that I won't be able to use.
 
This doesn't sound all that formal to me ... more tests and assertions than usual in software design, but no greater reliance on formal verification (the equivalence testing below RTL doesn't really count IMO).
 
I'd say that from a GPGPU perspective it's anything but a bad choice; it's rather a bad choice for the mainstream consumer market where I as a buyer have dedicated transistors for something that I won't be able to use.

Yep, which makes it all the more mysterious. How are they going to accommodate the HPC crowd (which they seem intent on doing) while limiting the number of transistors dedicated to them? Based on the material I've been able to find so far, it seems that doing it AMD's way will be very expensive if they provide the same level of functionality as GT200's DP unit (AFAIK AMD's version right now doesn't provide FMACs etc.). So they might actually stick with the dedicated units.
 
Yep, which makes it all the more mysterious. How are they going to accommodate the HPC crowd (which they seem intent on doing) while limiting the number of transistors dedicated to them? Based on the material I've been able to find so far, it seems that doing it AMD's way will be very expensive if they provide the same level of functionality as GT200's DP unit (AFAIK AMD's version right now doesn't provide FMACs etc.). So they might actually stick with the dedicated units.

A dedicated HPC chip would be one of those very funky theories but it doesn't make sense to me from many perspectives. It would simply cost them way more than anything else.
 
It has nothing to do with compilers. You lose precision if you round the result of the multiplication before the addition. In an FMAC the rounding is only done after the ADD.
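A tiny host-side sketch of that point (hedged: plain C99 fma() from <math.h> standing in for whatever the GPU's DP unit actually does), showing the bit that gets thrown away when the product is rounded on its own:

Code:
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* a*a is not exactly representable: the true product is
       1 + 2^-29 + 2^-60, and the 2^-60 bit falls off the 53-bit significand. */
    double a = 1.0 + ldexp(1.0, -30);

    double p   = a * a;          /* multiply rounded on its own              */
    double err = fma(a, a, -p);  /* exact product minus p, one rounding only */

    printf("bit lost by rounding the product first: %g\n", err); /* ~8.67e-19 */
    return 0;
}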

Every FP operation involves loss of precision. The point is how many codes can actually take advantage of the FMA/FMAC instructions. The most likely candidates are n-body, FIR, and matrix multiplication. Raw n-body is pretty rare; you use FMM in a practical problem. For FIR filters, convolution is often preferred. As for matrix multiplication, by itself it is a very small part of the HPC problem space.
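For what it's worth, the inner loop those codes share maps onto FMA directly; a hedged host-C sketch of a dot product (the kernel that matrix multiplication and an FIR tap loop both reduce to), written so each step is one fused op:

Code:
#include <math.h>
#include <stdio.h>

/* Dot product with one fused multiply-add per element. */
static double dot(const double *a, const double *b, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; ++i)
        acc = fma(a[i], b[i], acc);   /* acc += a[i]*b[i], rounded once per step */
    return acc;
}

int main(void)
{
    double a[] = { 1.0, 2.0, 3.0 };
    double b[] = { 4.0, 5.0, 6.0 };
    printf("%g\n", dot(a, b, 3));     /* prints 32 */
    return 0;
}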
 
Every FP operation involves loss of precision. The point is how many codes can actually take advantage of the FMA/FMAC instructions. The most likely candidates are n-body, FIR, and matrix multiplication. Raw n-body is pretty rare; you use FMM in a practical problem. For FIR filters, convolution is often preferred. As for matrix multiplication, by itself it is a very small part of the HPC problem space.

Yeah but what's your point though? Obviously Nvidia added the capability because they intend to peddle Tesla to people who care about it. And the CPU guys are gonna have it soon. In any case, does anyone have detailed info on AMD's DP support? Besides a few cursory mentions in some RV770 reviews I can't find anything along the lines of this table or examples of its usage.

[attached table image: 5.png]


 
R700 Family ISA said:
MULADD_64

Floating-point 64-bit multiply-add. Multiplies the double-precision value in src0.YX by the double-precision value in src1.YX, adds the lower 64 bits of the result to a double-precision value in src2.YX, and places this result in dst.YX and dst.WZ.

dst = src0 * src1 + src2;
Sounds like it's attempting to describe a fused MAD...

Jawed
 
"adds the lower 64 bits of the result" implies rounding to me.

Where'd you find that by the way? It's a nightmare trying to find info on AMD's stuff.
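If anyone wants to settle the fused-or-not question empirically, one hedged approach (the kernel/readback plumbing is left out, and "hw" is just whatever MULADD_64 returned for the same operands) is to compare the hardware result against both interpretations computed on the host:

Code:
#include <math.h>
#include <stdio.h>

/* 'hw' is the value MULADD_64 produced for (a, b, c) on the GPU. */
static void classify_muladd(double a, double b, double c, double hw)
{
    double p       = a * b;          /* product rounded to double          */
    double rounded = p + c;          /* then the add: two roundings total  */
    double fused   = fma(a, b, c);   /* exact product, one rounding total  */

    if (hw == fused && fused != rounded)
        printf("looks like a fused MAD\n");
    else if (hw == rounded && fused != rounded)
        printf("rounds the product before the add\n");
    else
        printf("inconclusive: pick a, b, c so that a*b is inexact\n");
}

int main(void)
{
    double a = 1.0 + ldexp(1.0, -30), b = a, c = -1.0;
    /* stand-in: pretend the hardware returned the fused result */
    classify_muladd(a, b, c, fma(a, b, c));
    return 0;
}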
 
Yeah but what's your point though? Obviously Nvidia added the capability because they intend to peddle Tesla to people who care about it. And the CPU guys are gonna have it soon.

My point is that as long as the chip kicks ass, nobody cares if it does FMADD/FMAC or not. Of course, FMA/FMAC improves perf, but it is not a must have. I am all for having FMA capability, but I won't take a GPU which has FMA but lower perf/W/$ over one which does not have FMA but higher perf/W/$.
 
My point is that as long as the chip kicks ass, nobody cares if it does FMADD/FMAC or not. Of course, FMA/FMAC improves perf, but it is not a must have. I am all for having FMA capability, but I won't take a GPU which has FMA but lower perf/W/$ over one which does not have FMA but higher perf/W/$.

In case you missed it, we were discussing DP for GPGPU applications. I'm not sure how the chip "kicking ass" is relevant. Why so sensitive about it anyway? Nvidia has their priorities and you're free to ignore them if they aren't aligned with yours. But I'm sure the target audience finds this functionality useful, else it wouldn't have been implemented.
 
I am not sure why the disconnect is there. When people look for chips for an upgrade or a new facility, they look at perf/W/$, not whether or not chip X does FMA. The demand is for perf/programmability/driver stability, etc. A product that does better on those counts will win in the HPC (GPGPU) marketplace even if it doesn't do FMA. Of course, all modern chips do FMA, but it is not a marketing bullet point.
 
That's why I asked for examples of people using AMD's DP capabilities. You're implying that their approach is perfectly fine for HPC applications, but I haven't come across anything demonstrating its usage or documenting its pros and cons. Denorm support, for example, is a big deal (there's a quick sketch after this post), but I can't find any info on how denormals are handled by AMD's hardware, or at what performance.

If you read a few posts above we're contrasting AMD's iterative approach with Nvidia's dedicated hardware. The question is whether you can provide the same level of functionality using the former. Perf/W/$ doesn't mean anything if the hardware isn't sufficiently functional for your needs.
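On the denorm point raised above, a small host-C illustration (hedged: it only shows what gradual underflow buys in general; how AMD's DP hardware actually behaves is exactly the open question) of the guarantee a flush-to-zero unit gives up:

Code:
#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* With gradual underflow, x != y implies x - y != 0 even at the very
       bottom of the range; flush-to-zero hardware loses that guarantee. */
    double x = DBL_MIN;                    /* smallest normal double          */
    double y = nextafter(DBL_MIN, 0.0);    /* largest subnormal, just below x */

    printf("x - y      = %g\n", x - y);    /* ~4.94e-324 here, 0 if flushed   */
    printf("subnormal?   %s\n",
           fpclassify(x - y) == FP_SUBNORMAL ? "yes" : "no");
    return 0;
}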
 
I know at one point AMD couldn't do certain transcendental functions with DP, but I believe that was a software limitation. I don't know if that's still true or not (or if it really was a software limitation).
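For what it's worth, that reading seems plausible: a DP transcendental is mostly argument reduction plus a polynomial, and the polynomial half is just a chain of the DP muladds the hardware already has. A toy host-C sketch (truncated Taylor series, only decent for small x; real libm kernels use better polynomials and proper range reduction):

Code:
#include <math.h>
#include <stdio.h>

/* Toy exp(x) for small |x| via a truncated Taylor series in Horner form;
   each step is a single DP fused multiply-add. */
static double exp_small(double x)
{
    double r = 1.0 / 5040.0;          /* 1/7! */
    r = fma(r, x, 1.0 / 720.0);
    r = fma(r, x, 1.0 / 120.0);
    r = fma(r, x, 1.0 / 24.0);
    r = fma(r, x, 1.0 / 6.0);
    r = fma(r, x, 0.5);
    r = fma(r, x, 1.0);
    r = fma(r, x, 1.0);
    return r;
}

int main(void)
{
    double x = 0.1;
    printf("toy: %.17g  libm: %.17g\n", exp_small(x), exp(x));
    return 0;
}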
 