NVIDIA Kepler speculation thread

Blender/Cycles (a CUDA path tracer) users are reporting the same kind of results shown by LuxMark: the 580 is faster than the 680. So it doesn't look like an OpenCL-specific problem.
Maybe the driver still needs tuning for Kepler, since the scheduling now must be JIT-ed?
 
Or... memory access isn't very coherent, and the smaller cache size per unit of compute results in more misses. Also, long kernels should have high register usage, which might limit the number of warps per SMX and accordingly hinder latency-hiding ability, perhaps more than on the 580. (I could of course be wrong, but the static scheduling thing doesn't sound so incredibly complicated compared to e.g. scheduling in AMD's VLIW4/5 regime.)
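To put rough numbers on the register-pressure side of that, here's a back-of-the-envelope sketch; the 64 registers/thread figure is just an assumption for a long path-tracing kernel, not a measured number.

Code:
#include <cstdio>
#include <algorithm>

int main() {
    // GK104 SMX: 65,536 32-bit registers, up to 64 resident warps of 32 threads.
    const int regfile_per_smx = 65536;
    const int regs_per_thread = 64;     // assumed for a long ray-tracing kernel
    const int warp_size       = 32;
    const int hw_warp_limit   = 64;

    int warps_by_regs = regfile_per_smx / (regs_per_thread * warp_size);
    int resident      = std::min(warps_by_regs, hw_warp_limit);
    printf("Resident warps per SMX: %d of %d possible\n", resident, hw_warp_limit);
    // With 64 regs/thread only 32 warps fit, so half the latency-hiding
    // headroom is gone before cache misses even enter the picture.
    return 0;
}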
 
Maybe the driver still needs tuning for Kepler, since the scheduling now must be JIT-ed?

I think the compiler is being hyped way beyond what it's actually responsible for. From what NV has described, it doesn't appear to be anything more than what GCN does.

It's simple dual issue, for crying out loud. It's pretty straightforward with a half-decent ISA, assuming it's possible given the workload. Which it should be, since AMD used VLIW5 not too long ago.

Far bigger is the castration of latency hiding. Vis a vis GF104, there's 4x more compute and only 2x more registers. Shared/L1$ hasn't increased at all. It's not surprising that in benchmarks like LuxMark, which are pretty memory-intensive, it's getting hammered.
This is incorrect IMHO. You're jumping to a rather strong conclusion based on the simple fact that their CL stack is somewhat bugged and not quite a priority. Whilst this looks bad for LuxMark and OpenCLBench, its relevance in the real world is pretty tame, and I wouldn't look at it for actual insight into how the constraints on their compiler efforts have shifted.
Their CL stack must be equally borked for both 580 and 680, since it generates PTX.
 
It's all about expectations. I've met women who were floored because I opened a door for them or asked if they got home OK, because their expectation is that all men are douchebags (as if opening doors changes that fact :LOL: ).

People expect nVidia to make big, power hungry compute focused chips with relatively low gaming efficiency because nVidia has made its priorities blatantly clear. This is why Kepler was a positive surprise - it broke that expectation. Pitcairn is equally impressive yet got nowhere near the same reaction because it "only" met expectations.

Yeah, but does Pitcairn suck at compute? Because the 680 does.

I mean, what I gather is that the 680 is so efficient because Nvidia decided to finally separate "game-focused GPU" and "compute-focused GPU", with the 680 being the former.

The 7970 still seems to follow the "compute+game together" model, therefore its gaming efficiency is lower.

But my question is, does Pitcairn suck relatively at compute? If not, then it's definitely pretty impressive, as it would appear to have 680-class gaming efficiency while retaining strong compute ability.

Overall, I think what Nvidia did splitting compute/game was smart, and it will be best for AMD to follow suit (especially if process shrinks are stalling, it will be necessary), and hey, it's time for AMD to copy Nvidia for a change :p

It also might sadly be a necessity for AMD to unveil its own boost-type system, not because it's really better, but because it gives you a benchmark edge for reviews, which, as SB has been pointing out, is a big deal even if it's only 5%. But on that front, I think it's best to see how enthusiasts cotton to boost. So far it doesn't seem to be hurting them; the desire to own the card with the longer bars in all those review graphs far outweighs any user trepidation at the loss of traditional overclocking.


I still don't think the 7970 is too bad at all; it's pretty close to the 680 despite the compute burden, and again, AMD really needs to kick the default clock to 1GHz, which would help close the perceived gap.

My guess is that with its compute part already close to the 680, a game-centric AMD part would once again trounce Nvidia in perf/mm. It doesn't appear AMD engineering has lost its superiority.

Edit: well, I could have googled it before asking, but it does appear Pitcairn's compute is just fine. Impressive indeed, then: http://images.anandtech.com/graphs/graph5699/45164.png
 
We can't really say yet whether the GTX 680 sucks at consumer-focused compute, as there are scenarios where it does quite well.

It's too early to determine whether its tragic failings in other situations are due to shortcomings/compromises in the architecture, or whether Nvidia just didn't care enough or didn't have the time to work on the driver frontend for it.

Obviously HPC/professional level of compute is certainly compromised, but that has little impact on the consumer market.

Regards,
SB
 
Well I mean, this was posted on GAF, and they really did consciously take compute out of 680. And kudos, it was smart.

http://www.embedded.com/electronics...s-Kepler-to-get-compute-cousin--says-analyst-

David Kanter of Real World Technologies said the newly launched Kepler platform was specialized for high-end graphics, whereas more general purpose workloads would necessitate a differentiated version of the chip in order for Nvidia to remain competitive in the HPC space.

“It will be a derivative of Kepler, to re-use as much of the engineering effort as possible, but with several significant changes,” he said, hinting that a Kepler cousin could be announced as soon as May.

When it comes to different workload requirements, Kanter said it was clear that a graphics centric chip would not require much in the way of cross-system communication, whereas scientific computing did need it for algorithms to be efficient.

“For purely graphical use, the pixel that’s on the bottom left corner of your screen doesn’t care what the pixel in the middle of the screen is doing at all,” he explained, noting this wasn’t the case when trying to accelerate flow calculations or other more complex HPC data.

Thus, said Kanter, the hardware for each specific purpose -- gaming or scientific-- would have to be slightly different.

“Fermi had great resources for communicating between different parts of the application, but Kepler doesn’t have nearly as much capability to communicate between various levels of the system,” said Kanter, explaining that this would require a two-fold approach from Nvidia.


The approach is somewhat similar to what Intel did with its server and client version of SandyBridge, though the firm effectively used the same core for both but simply added a lot more cache and memory bandwidth to the server version, which also had twice as many cores, more PCI express and QPI. “That’s the ideal thing to do,” said Kanter, though he said Nvidia’s plan would be to build similar cores but not quite the same.

“Nvidia has to scale it up to do the compute side, so it will probably be a much bigger chip,” he said.
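To illustrate the distinction Kanter is drawing, here's a toy CUDA sketch of my own (not from the article): a pixel-style kernel where every thread only touches its own output, next to a reduction where every block has to contribute to a single global value and therefore needs cross-chip communication.

Code:
#include <cstdio>

// Graphics-style: each thread writes its own pixel, no communication needed.
__global__ void shade(float* pixels, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) pixels[i] = (i % 255) / 255.0f;   // placeholder "shading"
}

// HPC-style: every thread contributes to one global value, so blocks must
// communicate (here via atomics; real codes also lean on shared memory,
// barriers and multi-kernel passes).
__global__ void sum(const float* data, float* result, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(result, data[i]);
}

int main() {
    const int n = 1 << 20;
    float *pixels, *result;
    cudaMalloc(&pixels, n * sizeof(float));
    cudaMalloc(&result, sizeof(float));
    cudaMemset(result, 0, sizeof(float));

    shade<<<(n + 255) / 256, 256>>>(pixels, n);
    sum<<<(n + 255) / 256, 256>>>(pixels, result, n);

    float host_result = 0.0f;
    cudaMemcpy(&host_result, result, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f\n", host_result);

    cudaFree(pixels);
    cudaFree(result);
    return 0;
}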
 
Yes, we already know that GTX 680 isn't appropriate for the HPC/professional market when it comes to compute. That link just reinforces that.

What we don't know is whether any of that impacts the consumer compute workloads. Or if those corner cases are just due to lack of attention by Nvidia within the driver frontend. Or if it's a direct result of those compromises in making a more efficient consumer oriented GPU.

Hence why Pitcairn is a more appropriate architectural comparison despite the size and marketplace discrepancy. And as such, in those areas where Pitcairn doesn't fall off a cliff like the GTX 680 does, we don't know whether it's purely architecture or insufficient work on the driver/software frontend of GK104 that is causing those pitfalls.

Regards,
SB
 
Is NVidia measuring current and volts or just current and assuming volts? And how does current off-die tell you about the heating effects of that current on die?

Thermal management and power throttling are two different issues. So better to not combine them into a single discussion.

Current measurement helps with staying within the bounds of EDP/TDP. Thermal is altogether another story.
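Something like this toy control loop is all that current-based limiting buys you (purely hypothetical numbers and sensor stubs, not NVIDIA's actual algorithm); note that die temperature never enters into it, which is why thermal management has to be its own loop.

Code:
#include <cstdio>

// Stand-ins for board-level sensor reads. On real hardware the current comes
// from VRM/inductor sensing; the voltage may be measured or just assumed to be
// the programmed VID, which is exactly the question raised above.
static double read_current_amps()  { return 160.0; }
static double read_voltage_volts() { return 1.175; }

int main() {
    const double power_limit_watts = 195.0;  // board power target
    double clock_mhz = 1006.0;               // base clock

    for (int tick = 0; tick < 5; ++tick) {
        double power = read_current_amps() * read_voltage_volts();  // P = I * V
        if (power > power_limit_watts)
            clock_mhz -= 13.0;   // back off one bin
        else
            clock_mhz += 13.0;   // opportunistic boost
        printf("tick %d: %.1f W -> %.0f MHz\n", tick, power, clock_mhz);
    }
    // Nothing here knows the die temperature; thermal throttling is a
    // separate control loop, which is the point being made above.
    return 0;
}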
 
18 months ago, all we heard was how NVIDIA was abandoning graphics and was going to build HPC-only parts (or somesuch silliness). Today, I hear the opposite. Times sure change.
 
David Kanter of Real World Technologies said the newly launched Kepler platform was specialized for high-end graphics, whereas more general purpose workloads would necessitate a differentiated version of the chip in order for Nvidia to remain competitive in the HPC space.

“It will be a derivative of Kepler, to re-use as much of the engineering effort as possible, but with several significant changes,” he said, hinting that a Kepler cousin could be announced as soon as May.

This sounds vaguely familiar...

I'm starting to wonder if there is a "big Kepler" for the desktop, or if this is just a Tesla/compute-oriented modification of current Kepler. With the 680 at $500, it's not like they can really charge more for a single-GPU card... Would they even bother making a bigger GPU for a card that might sell 100k units or less? I suspect a 690 dual card is the new planned high end and GK110 is a Tesla-specific part.

:p
 
Well, nVIDIA does have a history of selling single-GPU video cards above the $499~$599 mark, à la the 8800 Ultra ($799?), 7800 GTX 512MB ($999?), 6800 Ultra EE, etc.
 
ATI needs to release a 7970 XTX / 7980 clocked at 1.1GHz+, since at that speed it beats the stock-clocked GTX 680 in roughly 95% of benchmarks.
http://www.xbitlabs.com/articles/graphics/display/nvidia-geforce-gtx-680.html

-OR-

Maybe a Sapphire Atomic or Vapor-X HD 7970 at 1.1GHz+.

http://www.fudzilla.com/home/item/26531-powercolors-hd-7970-vortex-ii-comes-soon
 
rpg.314 said:
Far bigger is the castration of latency hiding. Vis a vis GF104, there's 4x more compute and only 2x more registers.
Don't forget GF104 had its ALUs running at 2x frequency.
48 ALUs * 2 ops/clock = 96 ALU ops per clock (GF104)
versus
192 ALUs * 1 op/clock = 192 ALU ops per clock (GK104)

So GK104 doubled the compute and also doubled the registers.
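A quick sanity check of those per-SM ratios, using the published register-file sizes (just a sketch of the arithmetic):

Code:
#include <cstdio>

int main() {
    // GF104: 48 ALUs per SM at the 2x hot clock, 32K 32-bit registers per SM.
    const double gf104_ops  = 48 * 2;
    const double gf104_regs = 32768;
    // GK104: 192 ALUs per SMX at the base clock, 64K 32-bit registers per SMX.
    const double gk104_ops  = 192 * 1;
    const double gk104_regs = 65536;

    printf("GF104: %.1f regs per ALU op/clock\n", gf104_regs / gf104_ops);  // ~341
    printf("GK104: %.1f regs per ALU op/clock\n", gk104_regs / gk104_ops);  // ~341
    // The ratio is unchanged: per SM(X), compute and registers both doubled.
    return 0;
}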
 
Yeah, but does Pitcairn suck at compute? Because the 680 does.

[...]

Edit: well, I could have googled it before asking, but it does appear Pitcairn's compute is just fine. Impressive indeed, then: http://images.anandtech.com/graphs/graph5699/45164.png

Pitcairn is essentially the same as Tahiti: it has the same Compute Units with the same scheduler and the same memory hierarchy (scaled, of course), but it simply lacks ECC and has slower DP support. In SP workloads it should perform just like Tahiti (proportionately, obviously).
 
My guess is that with its compute part already close to the 680, a game-centric AMD part would once again trounce Nvidia in perf/mm. It doesn't appear AMD engineering has lost its superiority.
An AMD gaming-centric part would require going back to the VLIW5 architecture, and we are not sure that design can properly utilize resources beyond a certain limit, given that the ALU count is constantly on the rise while other aspects, like software, are not.
 
Just saw this...(Nvidia future roadmap) http://forums.overclockers.co.uk/showpost.php?p=21532597&postcount=29

Week old post but new to me.

Not very exciting if true. Maybe I'll pull the trigger on that 7850 then.

Wish it had projected USA prices, though, instead of just UK ones, as I'm not sure what exactly those translate to. I'm assuming a roughly straight 1:1 conversion to dollars though.

Edit: appears USA dollar prices are somewhat higher, so it's even worse. >> http://videocardz.com/31551/geforce-600-roadmap-partially-exposed-gtx-670-ti-coming-in-may
 