So, why did Nvidia decide to double texturing power with GK104 (compared to GF114/110)? TMUs aren't exactly small (cheap) units. :smile:
You could even extend that question and ask why GK110 doesn't have just 8 instead of 16 TMUs per SMX.
Extrapolating GK110 desktop performance from sterile unit counts relative to GK104 is somewhat nonsensical, since it would mean there isn't a single difference between those two chips that could affect 3D performance. And if you hit a corner case of 3D with a pinch of compute added to the mix, things could get even more colourful.
I wonder where that GK110-flavoured magic sauce is supposed to be? Apart from the larger L2 cache, I really cannot see it.
There's:
• Hyper-Q - requires multiple concurrent threads on the host system to feed the GPU. Not exploitable from DX, where a single queue is built before dispatching to the driver.
• Dynamic Parallelism - you need CUDA code tailored to this feature to use it (see the sketch after this list)
• Load-path through texture cache
• 255 regs/thread - mainly useful for DGEMM
• Atomic Ops - not sure if applicable to gaming at all.
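Since the first two items only pay off with code written specifically for them, here's a minimal CUDA sketch of what that looks like. The kernel names are made up for illustration; it assumes a GK110-class part (sm_35) and compiles with nvcc -arch=sm_35 -rdc=true -lcudadevrt. The device-side launch in parentKernel is Dynamic Parallelism; the multiple streams in main are the kind of concurrent work Hyper-Q can keep from serializing into a single hardware queue.

```cuda
#include <cstdio>

// Child kernel, launched from the GPU itself (Dynamic Parallelism, sm_35+).
__global__ void childKernel(int parentBlock)
{
    printf("child spawned by parent block %d\n", parentBlock);
}

// Parent kernel: on GK110 a kernel may launch further kernels from device
// code, without a round trip to the host -- this is the "CUDA code tailored
// to this feature" referred to above.
__global__ void parentKernel()
{
    if (threadIdx.x == 0)
        childKernel<<<1, 1>>>(blockIdx.x);
}

int main()
{
    // Hyper-Q angle: GK110 exposes 32 hardware queues, so independent
    // streams (or MPI processes) like these can actually run concurrently
    // instead of being funneled into one queue as on GK104.
    cudaStream_t s[4];
    for (int i = 0; i < 4; ++i) cudaStreamCreate(&s[i]);
    for (int i = 0; i < 4; ++i) parentKernel<<<2, 32, 0, s[i]>>>();
    cudaDeviceSynchronize();
    for (int i = 0; i < 4; ++i) cudaStreamDestroy(s[i]);
    return 0;
}
```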
Apart from that, there's higher pixel and triangle throughput, yes.
But that's more or less in balance with higher throughput in other parts of the chip, nothing really "accelerating" GK110 beyond measure.
What did I miss?
More bandwidth and probably slightly more usable bandwidth due to larger caches.
Errr, no. Compared to ALU or texel throughput, for instance, the pixel and triangle throughput differences are way smaller. Assuming Titan is clocked north of 800MHz, that's a 4-6% difference in triangle throughput, which is a moot point in any case because Lord knows how much geometry throughput will be strangled artificially in order to justify Quadro sales.
Alexko missed bandwidth, or to be more precise the fillrate-to-bandwidth ratio, in his earlier estimates. Even if GK110 contained something that could theoretically accelerate it further beyond the GK10x SKUs, would it really be a worthwhile hardware investment, given the already overblown transistor budget and the fact that the majority of bottlenecks in today's games lie exactly there, i.e. at bandwidth?
I didn't mention this because the GTX 680 is not particularly bandwidth-constrained, and because I suspect a GK110-based GeForce is likely to have slower memory, so perhaps a ~40% bandwidth improvement overall, which is pretty much in line with shader power. I think the fillrate should follow the same trend, but maybe I'm missing something.
This is not meant to be an estimate of actual performance, of course, just an upper bound. I don't expect the additional cache to have a significant impact on games, but I could be wrong about that.
Ailuros,
More bandwidth, but only in total; relative to other throughput measures, it should not move much with GK110. The only real difference I see is the doubled L2. Pixel throughput could be as high as 60 ppc, raster rate could be 40 ppc, and triangles could be at 7.5 tpc, scaling with the number of SMXs.
I don't really see where clock speed comes into play. That's only important when we're comparing SKUs rather than architectures.
You were talking about "more bandwidth", Ail. That has to be put in relation to the other resources, which increase as well. That's what I was talking about.
Clock rate is important for a SKU level comparison for sure. But we were talking about the magic sauce in GK110.
BTW - I like your 850 MHz number ;-)
So to sum it up, you're expecting only a fairly small frequency increase over a K20X, with even less bandwidth than the latter has. Don't tell me you also expect an $899 MSRP for a solution like that.
The K20X has only 30% more bandwidth than the GTX 680 (250GB/s vs. 192.2GB/s). So yes, I believe something like a 40% improvement for a GK110-based GeForce is reasonable.
http://www.nvidia.com/object/tesla-servers.html
http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-680/specifications
Damn, and I recalled it having 288GB/s for whatever reason.
Maybe because that's the bandwidth of the 7970 GHz Edition, lol?
I never said I expect or believe there's any magic sauce in GK110.
Let me do some more dumb math:
GF110 vs. GF114:
GFLOPs = +25%
GTexels = -6%
MTris = +88%
GPixels = +31%
GB/s = +50%
(the real average performance difference is ~42%, and call me bold here, but I wouldn't be surprised if the difference had been closer to 50% had GF110 had quite a bit more fillrate)
GK110@850/1500MHz (theoretical) vs. GK104:
GFLOPs = +58%
GTexels = +58%
MTris = +6% [it'd be +58% if it scaled with SMX count]
GPixels = +27%
GB/s = +50%
Yes, those are naive peak numbers, but I still have quite a hard time believing that, under those conditions, the difference between the latter two will be only 30%. In fact, if the real difference ends up at 40-50%, that's no real stunt for upcoming comparisons either; most likely it'll just mean that history repeats itself against the competition's top single-chip SKU.
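For what it's worth, the "dumb math" is easy to reproduce. Below is a back-of-the-envelope sketch (host-side C++, no GPU required, compiles with nvcc or any C++ compiler). The GTX 680 figures are its stock specifications, the GK110 clocks are the hypothetical 850/1500MHz from above, and the one-triangle-per-clock-per-GPC rate is my own assumption, which happens to land on the +6% figure.

```cuda
#include <cstdio>

// Peak-rate model of a GPU SKU. GDDR5 transfers 4 bits per pin per
// command clock, hence the factor of 4 in gbps().
struct Gpu {
    const char* name;
    int alus, tmus, rops, gpcs, busBits;
    double coreMHz, memMHz;
    double gflops()  const { return alus * 2.0 * coreMHz / 1000.0; } // FMA = 2 flops
    double gtexels() const { return tmus * coreMHz / 1000.0; }
    double gpixels() const { return rops * coreMHz / 1000.0; }
    double gtris()   const { return gpcs * coreMHz / 1000.0; }      // assumes 1 tri/clk/GPC
    double gbps()    const { return busBits / 8.0 * memMHz * 4.0 / 1000.0; }
};

static void delta(const char* what, double a, double b)
{
    printf("%-8s: %+.0f%%\n", what, 100.0 * (b / a - 1.0));
}

int main()
{
    Gpu gk104 = { "GTX 680",          1536, 128, 32, 4, 256, 1006.0, 1502.0 };
    Gpu gk110 = { "GK110 @ 850/1500", 2880, 240, 48, 5, 384,  850.0, 1500.0 };

    delta("GFLOPs",  gk104.gflops(),  gk110.gflops());   // +58%
    delta("GTexels", gk104.gtexels(), gk110.gtexels());  // +58%
    delta("GTris",   gk104.gtris(),   gk110.gtris());    // +6%
    delta("GPixels", gk104.gpixels(), gk110.gpixels());  // +27%
    delta("GB/s",    gk104.gbps(),    gk110.gbps());     // +50%
    return 0;
}
```

Running it reproduces the +58/+58/+6/+27/+50 deltas listed above, including the 192.2GB/s vs. 288GB/s bandwidth figures.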
CarstenS said: "I firmly believe that quite a bit of GF110's higher performance compared to GF114 is coming from this and not all is attributable to higher bandwidth."
I would wager it is mostly the bandwidth.
A bit off-topic, but here's a little more evidence of Kepler's weakness in general compute: CUDA-accelerated raytracing in Adobe After Effects.
http://www.legitreviews.com/article/2127/1/
Kepler's consumer variants maybe. GK110 is an entirely different matter.