NVIDIA Kepler speculation thread

Look at the die shot. It matches with 4 schedulers as well.


Instruction issue is done at the scheduler/dispatcher level. There's no requirement that all four schedulers dual-issue to the SIMDs every clock for the arch to be considered dual-issue. There can be SFU, L/S ops intermixed there too.
Dual issue means executing twice as many instructions. SFU/L/S ops, in all probability, cannot be issued on top of the 192 ALUs present. If single issue only feeds 128 ALUs, then what are the 192 ALUs for?
 
I'm not sure what you're asking. The number of ALUs alone doesn't dictate the maximum number of instructions issued per clock. Each of the four schedulers can issue 2 instructions per clock for a total of 64 across the chip. Granted it can't do this on every cycle because some execution units can't accept a new instruction every clock (SFU, L/S) and there aren't enough SIMDs to go around. There might not be enough instructions to go around either.
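
To put rough numbers on that, here's a minimal sketch of the issue-rate arithmetic, assuming the commonly quoted GK104 layout (8 SMXs, 4 schedulers per SMX, dual dispatch, 192 ALUs per SMX, 32-wide warps); the exact figures are my assumptions, not confirmed specs.

Code:
# Peak-issue arithmetic for an assumed GK104-like configuration.
SMX_COUNT        = 8     # assumed number of SMXs
SCHEDULERS       = 4     # warp schedulers per SMX
DISPATCH_PER_SCH = 2     # dual issue per scheduler
ALUS_PER_SMX     = 192
WARP_WIDTH       = 32

issue_slots_per_smx = SCHEDULERS * DISPATCH_PER_SCH        # 8 instructions/clock
issue_slots_chip    = issue_slots_per_smx * SMX_COUNT      # 64 instructions/clock
alu_warps_per_smx   = ALUS_PER_SMX // WARP_WIDTH           # 6 warp-wide ALU ops/clock

print(issue_slots_per_smx, issue_slots_chip, alu_warps_per_smx)
# The ALUs only soak up 6 of the 8 issue slots per SMX; the rest can go to
# SFU/LS/etc., which is why ALU count alone doesn't fix the issue rate.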
 
I'm starting to wonder if there is a "big Kepler" for the desktop or if this is just a Tesla/compute-oriented modification of the current Kepler. With the 680 at $500 it's not like they can really charge much more for a single-GPU card... Would they even bother making a bigger GPU for a card that might sell 100k units or less? I suspect a 690 dual card is the new planned high end and GK110 is a Tesla-specific part.
 
Selling products for more than $500 has never been a problem for them in the past. And it's possible that the 680 won't be positioned where it is now when GK110 is ready.
 
I'm starting to wonder if there is a "big Kepler" for the desktop or if this is just a Tesla/compute-oriented modification of the current Kepler. With the 680 at $500 it's not like they can really charge much more for a single-GPU card... Would they even bother making a bigger GPU for a card that might sell 100k units or less? I suspect a 690 dual card is the new planned high end and GK110 is a Tesla-specific part.
Current prices will come down as the manufacturing process matures.
No matter how good the 680 is, the market for it will be limited so long as the price remains at US$500.
It is better for Nvidia if they can sell 200,000 GK104s at an average of $350 each rather than 40,000 GK104s at $500 each (using completely hypothetical numbers).
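
Multiplying out those (completely hypothetical) numbers, just to make the comparison explicit:

Code:
# Hypothetical revenue comparison from the post above.
high_volume = 200_000 * 350   # $70,000,000
low_volume  =  40_000 * 500   # $20,000,000
print(high_volume, low_volume)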
 

Only if you can get enough wafers to make 200,000 GK104s while honoring the OEM deals for the smaller chips as well. And before the end of the year, it's unlikely that TSMC will be able to meet the demand for 28nm wafers from AMD, Qualcomm and NV.
 
What if GK110 will be marketed as the 780, and GK114 as the 760?

Yes, and 780's performance might be something like this:

[attached image: alleged GTX 780 performance slide]


Only if you believe the competition hasn't learned anything from their first effort. Or do you believe that Kepler is the best possible implementation on 28nm? I find that highly unlikely, and at the very least disputable right now.

Well, if the performance improvements presented in the previous slide are close to real (even if the slide is fake, that doesn't mean Nvidia can't launch such a monster), then AMD can do almost nothing to escape the humiliation (let's use that word). ;)
 
I'm starting to wonder if there is a "big Kepler" for the desktop or if this is just a Tesla/compute-oriented modification of the current Kepler. With the 680 at $500 it's not like they can really charge much more for a single-GPU card... Would they even bother making a bigger GPU for a card that might sell 100k units or less? I suspect a 690 dual card is the new planned high end and GK110 is a Tesla-specific part.

GK110 will be the GTX 780, ready by the end of this year (probably around November), and GK104 will be bumped down to the GTX 760 Ti.
 
I haven't tracked down all the benchmarks, and that helpful watermark blots out some pretty important settings. However, from the few I've compared, it looks like the big chip is 30-50% better in several of those games, which seems to fit the likely bump in die size and TDP.
 
I wonder how it does with my volume fluid demos, as the GTX 580 was extremely poor at them. If someone could post a screenshot of ComputeMark, that would be appreciated too.
Did some tests.
GTX 480, GTX 680 in fps

Fluid3D_2
11, 40
------------

MandelDX11
iterations = 2048
Vector: 178, 372
Scalar: 195, 404
Double: 18, 15
------------

Julia4D
DX11 compute shader
full detail
Without shadows: 146, 300
With shadows: 99, 210
------------

Seems like doubles took a hit, but otherwise there seems to be some progress.
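
The ratios hiding in those numbers (GTX 680 over GTX 480), computed from the posted results only:

Code:
# Speedup ratios, GTX 680 vs GTX 480, from the figures above.
results = {
    "Fluid3D_2":           (11, 40),
    "MandelDX11 vector":   (178, 372),
    "MandelDX11 scalar":   (195, 404),
    "MandelDX11 double":   (18, 15),
    "Julia4D, no shadows": (146, 300),
    "Julia4D, shadows":    (99, 210),
}
for name, (gtx480, gtx680) in results.items():
    print(f"{name:20s} {gtx680 / gtx480:4.2f}x")
# Everything lands around 2x or better except the double-precision case (~0.83x).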
 
Did some tests.
GTX 480, GTX 680 in fps
Seems like doubles took a hit, but otherwise there seems to be some progress.
There's progress in singles as long as the compiler has been optimized for it / for the things the program does. If it hasn't been, LuxMark happens.
 
jlippo said:
Seems like doubles took a hit, but otherwise there seems to be some progress.
Interesting. Are those CUDA, OCL, or DirectCompute? (edit: never mind, it's DirectCompute.)
The numbers here are more or less what you'd expect given the clocks and numbers of ALUs.

Is there something particularly taxing on LuxMark that's not present in these tests?
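
For a rough sanity check of "what you'd expect", here's a back-of-the-envelope sketch; the shader clocks and the GeForce DP-rate caps (1/8 for GTX 480, 1/24 for GK104) are my assumptions, not anything taken from these tests.

Code:
# Expected throughput ratios from ALU counts and clocks (assumed figures).
gtx480_alus, gtx480_mhz = 480, 1401    # hot clock
gtx680_alus, gtx680_mhz = 1536, 1006   # base clock, ignoring boost

sp_ratio = (gtx680_alus * gtx680_mhz) / (gtx480_alus * gtx480_mhz)
dp_ratio = (gtx680_alus / 24 * gtx680_mhz) / (gtx480_alus / 8 * gtx480_mhz)

print(f"expected SP ratio: {sp_ratio:.2f}x")  # ~2.3x vs ~2.1x measured above
print(f"expected DP ratio: {dp_ratio:.2f}x")  # ~0.77x vs 15/18 = 0.83x measured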
 
Well, if the performance improvements presented in the previous slide are close to real (even if the slide is fake, that doesn't mean nvidia can't launch such a monster), then AMD can do almost nothing to escape the humiliation. ;)
Tahiti is about 70% bigger than Pitcairn - and about 30% faster @2560x1600 in real gaming benchmarks (while running at a mere 925MHz).

Some people consider that a complete and utter fail.

Now GK110 is rumored to be about 80% bigger than GK104 - and there's an alleged slide that says it's about 40% faster than GK104 @2560x1600 in some marketing-picked benchmarks.

The same people that consider Tahiti an utter fail now proclaim that GK110 is a chip of many wonders - and predict it's about to totally humiliate AMD.

EDIT: Did I mention that Tahiti was 3 months early (compared to Pitcairn) - and GK110 will probably be 6-9 months late?
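
Put side by side, the two scaling claims work out to almost the same perf/area efficiency (using only the rough figures quoted above, the GK110 ones being rumor):

Code:
# Relative performance vs relative die size, per the numbers in the post.
tahiti_vs_pitcairn = {"size": 1.70, "perf": 1.30}
gk110_vs_gk104     = {"size": 1.80, "perf": 1.40}   # rumored

for name, d in (("Tahiti vs Pitcairn", tahiti_vs_pitcairn),
                ("GK110 vs GK104 (rumor)", gk110_vs_gk104)):
    print(f"{name:24s} perf per area scaling: {d['perf'] / d['size']:.2f}")
# Both come out around 0.76-0.78, i.e. very similar scaling efficiency.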
 
Interesting. Are those CUDA, OCL, or DirectCompute? (edit: never mind, it's DirectCompute.)
The numbers here are more or less what you'd expect given the clocks and numbers of ALUs.

Is there something particularly taxing on LuxMark that's not present in these tests?

I believe that ray tracing, generally speaking, is heavily reliant on cache, which GK104 has little of. But given the extent to which the GTX 680 tanks in LuxMark, there might be other factors at play.
 
I believe that ray tracing, generally speaking, is heavily reliant on cache, which GK104 has little of. But given the extent to which the GTX 680 tanks in LuxMark, there might be other factors at play.
The bandwidth to the L2 in Kepler has been doubled, though. The larger RF keeps more threads in flight, while the unchanged L1 data cache size won't make spilling any more graceful.
Sadly, with the poor state of NV's OpenCL 1.1 run-time, there's no easy way to gauge how the new architecture handles more complex compute tasks using this API. And I think there's still a pending update of CUDA to cover the new Compute Capability 3.0 for Kepler.
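
As a rough illustration of the "more threads in flight" point, here's a small sketch of how a register budget caps resident threads; the RF sizes and hardware thread caps are the commonly quoted Fermi SM / Kepler SMX figures and should be treated as assumptions.

Code:
# Threads in flight vs per-thread register usage (assumed Fermi/Kepler figures).
def threads_in_flight(rf_regs, regs_per_thread, hw_thread_cap):
    return min(rf_regs // regs_per_thread, hw_thread_cap)

fermi_sm   = {"rf_regs": 32768, "cap": 1536}   # assumed
kepler_smx = {"rf_regs": 65536, "cap": 2048}   # assumed

for regs in (32, 48, 63):
    f = threads_in_flight(fermi_sm["rf_regs"], regs, fermi_sm["cap"])
    k = threads_in_flight(kepler_smx["rf_regs"], regs, kepler_smx["cap"])
    print(f"{regs} regs/thread: Fermi SM {f:4d}, Kepler SMX {k:4d} threads")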
 
There's progress in singles as long as the compiler has been optimized for it / for the things the program does. If it hasn't been, LuxMark happens.

This is incorrect, IMHO. You're jumping to a rather strong conclusion based on the simple fact that their CL stack is somewhat bugged and not quite a priority. Whilst this looks bad for LuxMark and OpenCLBench, its relevance in the real world is pretty tame, and I wouldn't look to it for actual insight into how the constraints on their compiler efforts have shifted.
 
I'm still not getting how "inferred power consumption based on chip utilization" is a whizzbang advanced approach. Sitting on the outside, it simply looks like "guessing" versus the direct power-consumption readings nVidia is doing. I understand the differences, I'm just not getting the "more advanced" part...

That was how I read it too, which is confusing. I would rather have actual data than inferred data any day of the week.
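
My reading of the two approaches, as a toy sketch: one side estimates board power from unit-utilization counters plus a per-unit cost model, the other reads it from on-board power-monitoring circuitry. The unit list and weights below are invented purely for illustration and aren't either vendor's actual algorithm.

Code:
# Toy contrast: inferred (model-based) vs measured (sensor-based) board power.
IDLE_WATTS = 35.0
UNIT_COST_WATTS = {"alu": 90.0, "tex": 25.0, "rop": 20.0, "mem": 45.0}  # hypothetical

def inferred_power(utilization):
    """Estimate power from per-unit activity (0..1) and a fixed cost model."""
    return IDLE_WATTS + sum(UNIT_COST_WATTS[u] * utilization.get(u, 0.0)
                            for u in UNIT_COST_WATTS)

def measured_power(sensor_reading_watts):
    """Just trust the on-board current/voltage sensors."""
    return sensor_reading_watts

print(inferred_power({"alu": 0.9, "tex": 0.5, "rop": 0.4, "mem": 0.7}))  # estimate
print(measured_power(178.5))                                             # reading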
 