DavidGraham
Veteran
Sounds like HPC Ampere is a 128-CU part; another chip just showed up with 7,936 CUDA cores (124 CUs at 64 per CU).
445.01 & 445.35 are both insider drivers using WDDM 2.7 for Windows 10 20H1
The current public driver is 442.50, with CUDA 10.2.
Yeah, because that makes sense. No other factors are involved, of course.
So going from 118 to 124 resulted in an increase of 31%!
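For what it's worth, the split between CU-count growth and everything else in that 31% is easy to put numbers on. A back-of-envelope sketch (the 118, 124, and 31% figures come from the thread; the rest is plain arithmetic):

```python
# Back-of-envelope: how much of the quoted 31% score jump can the CU count
# alone explain? The 118 / 124 compute-unit counts and the 31% figure are
# from the thread; everything below is arithmetic.
sm_old, sm_new = 118, 124
score_gain = 1.31

sm_gain = sm_new / sm_old              # ~1.05x from ~5% more SMs
residual_gain = score_gain / sm_gain   # ~1.25x left over for clocks/arch/memory

print(f"Gain from SM count alone: {sm_gain:.3f}x")
print(f"Residual gain (clocks, arch, memory, drivers): {residual_gain:.3f}x")
```

So only about a twentieth of the jump is explained by the extra units; the rest has to come from somewhere else, which is exactly the point being argued.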
Yea, I know, it must hurt so much! Oooh Nvidiaaaaaaaa!
Could be just as easily due to memory bandwidth changes.
I think it's interesting that if the clocks are not being misrepresented, those benchmark results are pretty consistent with previous rumors/leaks.
First, the higher-than-expected performance per SM is consistent with the proposed setup of 16xFP32 + 16xINT or 16xFP32 + 16xFP32, and even by roughly the expected amount. Turing effectively issues ~1.4 warps every 2 cycles (the "36 INT per 100 FP" figure), while Ampere would be able to issue the full 2 warps. That's a ~40% per-SM increase, which is roughly what we're seeing in these benches.
Second, roughly 50% higher performance is what we are shown, and with such low clocks on 7nm plus the aforementioned SM setup, half the power consumption is pretty realistic; very low power draw is almost a given.
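The ~40% figure in the "First" point can be sanity-checked with the issue-slot model the post describes. A small sketch, assuming the ~36-INT-per-100-FP instruction mix NVIDIA quoted for Turing and the rumored dual-FP32 SM (the model is mine, simplified from the post):

```python
# Issue-slot model from the post, as arithmetic. Turing pairs an FP32 pipe
# with an INT32 pipe; the stated mix is roughly 36 INT instructions per
# 100 FP, so the INT slot is only ~36% occupied. A rumored Ampere SM that
# can issue FP32 on both pipes would keep both slots full.
fp_instr, int_instr = 100, 36

turing_warps = (fp_instr + int_instr) / fp_instr  # ~1.36 warps per 2 issue slots
ampere_warps = 2.0                                # both slots always usable

uplift = ampere_warps / turing_warps
print(f"Turing effective warps per 2 issue slots: {turing_warps:.2f}")
print(f"Theoretical per-SM uplift: {uplift:.2f}x")
```

That lands in the 1.4-1.5x range, consistent with the ~40% per-SM gain claimed above.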
I'm just saying that it's hard to tell from these results. L2 is 5X+ larger, and memory size varies from 24 to 32 to 48 GB, so it's hard to say if it's even four stacks and not six now, for example.
You think? It's the same memory setup as Volta except for clocks, and perf on Volta vs Turing is more consistent with TFLOPS than with Volta's 50% higher memory BW. In this particular benchmark, anyway.
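That Volta-vs-Turing comparison is checkable from public spec sheets. A quick sketch using the usual boost-clock numbers for a V100 SXM2 and an RTX 2080 Ti (my choice of representative cards, not the poster's):

```python
# Public spec-sheet ballparks: V100 SXM2 (~15.7 FP32 TFLOPS, 900 GB/s HBM2)
# vs RTX 2080 Ti (~13.4 FP32 TFLOPS, 616 GB/s GDDR6).
v100_tflops, v100_bw = 15.7, 900.0
ti_tflops, ti_bw = 13.4, 616.0

bw_ratio = v100_bw / ti_bw              # ~1.46x: the "50% higher memory BW"
tflops_ratio = v100_tflops / ti_tflops  # ~1.17x

print(f"V100 bandwidth advantage: {bw_ratio:.2f}x")
print(f"V100 TFLOPS advantage:    {tflops_ratio:.2f}x")
```

If GB5 compute scores tracked the ~1.46x bandwidth gap rather than the ~1.17x TFLOPS gap, the two cards' results would look very different than they do.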
Also how does Geekbench count the number of SPs? Shouldn't it detect the proper number if it's 2X per SM now? Or does it just use some fixed number per SM and multiply it by SM count?
Since it uses OpenCL and CUDA, it can read it directly from what the driver reports.
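For contrast, tools that don't take a core count from the driver often derive it from a compute-capability table, in the style of the `_ConvertSMVer2Cores` helper in NVIDIA's CUDA samples. A hypothetical Python sketch of that approach (the Pascal/Volta/Turing entries match the samples; the Ampere line is the rumor under discussion, not a released spec):

```python
# Hypothetical sketch of the "fixed number per SM" approach, modeled on the
# _ConvertSMVer2Cores helper from NVIDIA's CUDA samples. The driver reports
# the SM count directly; FP32 cores per SM depend on the architecture, so a
# stale table would misreport a doubled-FP32 SM.
CORES_PER_SM = {
    (6, 0): 64,   # Pascal GP100
    (6, 1): 128,  # Pascal GP10x
    (7, 0): 64,   # Volta
    (7, 5): 64,   # Turing
    # a rumored dual-FP32 Ampere SM would need something like (8, 0): 128
}

def cuda_core_count(major: int, minor: int, sm_count: int) -> int:
    """Total FP32 cores from compute capability plus driver-reported SM count."""
    return CORES_PER_SM[(major, minor)] * sm_count

print(cuda_core_count(7, 5, 68))  # RTX 2080 Ti (TU102, 68 SMs) -> 4352
```

Which is why an unknown compute capability on a new part is exactly where reported core counts get unreliable.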
FWIW, a 2080 Ti @1100 MHz is around 106k in GB5.
hmmm...
NAVI10 is 251mm2 on 7nm for 10.3 billion transistors
TU106 is 445mm2 on 16/12nm for 10.8 billion transistors
The transistor counts are in the same ballpark (within 5%), and TDP is also very close despite AMD's full-node advantage, but RDNA lacks VRS, ray-tracing acceleration, tensor cores, DLSS, a good video encoder, and INT4/INT8 support for fast inference. It's very clear which one has the upper hand.
So IMHO, RDNA is far from Turing. AMD can only compete because of their node advantage. They already made their big move with RDNA, and RDNA2 will be a small architectural evolution (mostly bringing VRS and RT) on a refined node, whereas Ampere is a totally new architecture with a full node shrink. Like everybody, I want close competition for the sake of reasonable prices. But let's be pragmatic: even with only a node shrink and zero uarch improvement (and it's more than that), it will be a bloodbath for AMD...
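The ballpark claims in that Navi 10 / TU106 comparison are easy to verify from the figures quoted above:

```python
# The die-size/transistor figures quoted in the thread, as arithmetic.
navi10_mm2, navi10_xtors = 251.0, 10.3e9   # 7nm
tu106_mm2, tu106_xtors = 445.0, 10.8e9     # 16/12nm

xtor_delta = tu106_xtors / navi10_xtors - 1       # ~4.9%, i.e. "within 5%"
navi10_density = navi10_xtors / navi10_mm2 / 1e6  # MTr/mm^2
tu106_density = tu106_xtors / tu106_mm2 / 1e6

print(f"Transistor count delta: {xtor_delta:.1%}")
print(f"Density: Navi10 {navi10_density:.1f} vs TU106 {tu106_density:.1f} MTr/mm^2")
```

The ~1.7x density gap is essentially the 7nm-vs-12nm node difference, which is the crux of both sides of the argument that follows.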
Navi isn’t faster in games, it’s roughly on par with Turing while having less features. Also, please don’t peddle this “Navi/RDNA is a hybrid” that came from the lowest of the low tech publications...
AND...? (is it not obvious?)
Navi10 is faster in games, not equal. That is why nVidia released SUPER (which still can't compete in some games). Understand?
Secondly, Navi 10 uses a hybrid design, RDNA(1), and the long-held secret of RDNA2 (AMD's fully new gaming architecture) has yet to be seen. But we know it is not weighed down by GCN, or by what limited that design; it is free from that.
If TU-106 equalized (for MHz, transistors, etc.) can't beat hybrid Navi, then how will it compete with RDNA2's gaming efficiency? We are talking about a uarch that AMD has been working on (in secret) for three years, and once it was shown to clients years ago, they jumped on board. Both the new Xbox and PlayStation will be using RDNA2, not to mention we've already seen some of the specs. The Xbox might have the gaming performance of an RTX 2080, using RDNA2.
You might want to stop and let that sink in.
Thirdly, Ampere is not new; its architecture is 100% based on Turing, just further refined. You are fabricating. And yes, it is on a full node shrink, which is totally new to nVidia. They will have growing pains.
But you still have not refuted the fact that RDNA(1) is more powerful (at gaming) than the Turing architecture. And more of nVidia's design (Turing 2.0?) is not going to help in games, because (again) it's based on an antiquated design.
I don't want a bigger 2080 shrunk down, I want a revolutionary one. And this Ampere, as we know it, is only nVidia's next Volta: a business-sector dGPU.
RDNA2 will be a solid improvement but nothing revolutionary. Full stop.
I just proved my case above, there is no argument here.
If Navi 10 and TU-106 are the same size and Navi 10 is on average +15% faster, how is it not more efficient? More performance ("IPC") using fewer transistors? No matter how you scale it, RDNA(1) comes out on top for frequency vs. output.
And I am not peddling anything. RDNA2 is different, period!
Not really. You're just repeating unfounded statements over and over. That doesn't amount to proof.
How did you calculate the transistor count? How many transistors did you allocate to tensors and RT?
Instead of randomly guessing, why don't you compare the 5700 XT and the 2070 SUPER? They literally have the same specs.