NVIDIA Kepler speculation thread

Who knows, but if Charlie, Fuad and Kyle are all saying very similar things then it's probably not very far from the truth. Either that or Nvidia has pulled off the mother of all smokescreens.

I can only believe so much of what I want to believe. The logic is telling me that Nvidia is going to win this round by a large margin unfortunately. It might not be R300 in absolute performance terms, but in terms of the swing from AMD to Nvidia it might even surpass that.

If you're an AMD fanboy (like me) it might be time to start baking that humble pie. :p

Read again what I wrote. I believe the data. I believe GK104 will end up equal to or faster than the HD7970. :p
I just find it "wrong" to say that Kyle is benching the cards, when that's not really what he is saying. It's like me telling a friend that a mutual friend could be cheating with his girlfriend, just because I saw the two of them having coffee together. :LOL:

As far as fanboyisms are concerned, you've got the wrong guy. If anything I'm an NVIDIA fanboy, since I do like PhysX and preferred a GTS450 over an HD5770 because of it ;) Batman AA sold me :D
 
I don't see how AMD can fail to win in the bandwidth-limited case (e.g. at least some compute tasks). I guess NV could still win in other areas through better utilization, but I dunno - I get the feeling it's going to be a tight race.

Yep, that's probably why it's winning in synthetics that are really testing bandwidth.

I wouldn't be surprised to see a reversal of fortunes in many games however. Tahiti might beat out GK104 in Dirt 3 and BF3, for example, while Metro and AvP throw up the complete opposite of what we've been seeing over the past 18 months to 2 years.

We might see a few cases where the 680 is uncomfortably close to the 580 as well because of this.
 
Read again what I wrote. I believe the data. I believe GK104 will end up equal to or faster than the HD7970. :p
I just find it "wrong" to say that Kyle is benching the cards, when that's not really what he is saying. It's like me telling a friend that a mutual friend could be cheating with his girlfriend, just because I saw the two of them having coffee together. :LOL:

Yeah, I must have got confused, because I said the same thing: that Kyle is NOT benchmarking the card.
 
Gipsel said:
Yes, exactly.
Let me guess: you used to correct everyone who celebrated 2000 instead of 2001 as the new millennium.

The thing is: we all know here that GPUs are SIMD machines at their core. Really, we do. No need to be condescending about it.

But the fact of the matter is that, within those constraints, there is a lot of variability of architectures that can quite nicely be explained with CPU terminology.

IMHO, it is perfectly justifiable to use scalar and VLIW terms when looking at it from the instruction set point of view: Nvidia instructions would map straight onto a canonical scalar CPU. Pre-Tahiti AMD instructions would do the same for VLIW machines.

And, yes, that doesn't take into account those instructions that operate across a warp, as you most certainly will feel compelled to helpfully point out.
 
Fermi actually does issue instruction(s) from 2 threads (warps) simultaneously to one core (aka SM). So it does simultaneous multithreading (what Intel calls Hyper-Threading). AMD's VLIW architectures do fine-grained temporal multithreading of two threads (wavefronts switch every 4 cycles, i.e. after each instruction).

Right, it depends on how you want to slice the analogy. Within an SM you do have two independent schedulers, but they can't arbitrarily issue to all execution units like a hyperthreaded CPU can. Even in GF100 the schedulers shared execution units (L/S, TMUs, etc.) and could therefore be considered hyperthreaded by that definition.

@Arun, yes of course I recalled GF104 but didn't dare mention the "s word" that would incur Gipsel's wrath. Looks like it was unleashed anyway :LOL:
 
The logic is telling me that Nvidia is going to win this round by a large margin unfortunately.
Why is that unfortunate?

I would think that GPU users would rejoice at the performance increase per $$$ spent.

With AMD currently gouging buyers, they will have to reduce pricing, so every new buyer wins.

As for being an admitted AMD fanboi, do you really like paying the current high prices that AMD is charging?
 
The thing is: we all know here that GPUs are SIMD machines at their core. Really, we do. No need to be condescending about it.
Actually, I wasn't being condescending. I don't know where you got that from. I was just giving a reason why it is bad to call a GPU scalar when it is in fact a SIMD machine operating on vectors, as we all know. There is no need to do that, and it completely misrepresents the underlying hardware. Appears counter-intuitive to me. ;)
But the fact of the matter is that, within those constraints, there is a lot of variability of architectures that can quite nicely be explained with CPU terminology.
Yeah, but one should simply apply it consistently. And calling SIMD units scalar is not using CPU terms, it is wrong.
[edit] And it confuses some newbies to GPU programming like hell. They sometimes wonder why something is extremely slow, and after pointing out some basic stuff like warp divergence that they didn't take into account, one can often hear something like "but my nV GPU is a scalar architecture and each thread is executed independently" or something along these lines. That's the reason for my "crusade". ;)
[/edit]
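To make that confusion concrete, here's a minimal CUDA sketch (kernel and helper names are made up for illustration): a branch that depends on the lane index diverges within a warp, and the SIMD hardware then walks both paths with the inactive lanes masked off, which the "each thread is independent" mental model completely hides.

Code:
// Made-up example: odd and even lanes of a warp take different branches.
// A per-thread "scalar" mental model suggests both paths run in parallel;
// on the real SIMD hardware each warp executes BOTH paths one after the
// other with inactive lanes masked, roughly doubling the runtime.
__device__ float path_a(float x) { for (int k = 0; k < 64; ++k) x = x * 1.0001f + 0.5f; return x; }
__device__ float path_b(float x) { for (int k = 0; k < 64; ++k) x = x * 0.9999f - 0.5f; return x; }

__global__ void divergent_kernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Condition depends on the lane index -> divergence inside every warp.
    out[i] = (threadIdx.x & 1) ? path_a(in[i]) : path_b(in[i]);
}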
IMHO, it is perfectly justifiable to use scalar and VLIW terms when looking at it from the instruction set point of view:
AMD's pre-GCN GPUs are indeed VLIW, but as a combination of VLIW and SIMD; those two concepts are orthogonal. Scalar isn't orthogonal to vector, it's basically the opposite. That's the whole reason it doesn't make sense. In my opinion, there is no need to invent a new and contradictory scheme of designations just for GPUs when one can simply use the established terminology that has been in use for decades already.
Nvidia instructions would map straight onto a canonical scalar CPU. Pre-Tahiti AMD instructions would do the same for VLIW machines.

And, yes, that doesn't take into account those instructions that operate across a warp, as you most certainly will feel compelled to helpfully point out.
That's actually an inherent property of all SIMD or vector instruction sets. One just has to loop over the vector (aka warp/wavefront). That's no argument.
And did I sense a little bit of condescension in your last sentence? :LOL:
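And to make the "loop over the vector" point concrete, a trivial sketch (plain C++, all names made up): a horizontal operation across a warp/wavefront is nothing more than a loop over the vector elements, something any SIMD or vector instruction set can express.

Code:
// A cross-lane ("horizontal") operation is just a loop over the vector.
// WARP_SIZE and the function name are made up for illustration.
const int WARP_SIZE = 32;

float warp_sum(const float lane[WARP_SIZE])
{
    float acc = 0.0f;
    for (int l = 0; l < WARP_SIZE; ++l)   // walk the warp/wavefront, element by element
        acc += lane[l];
    return acc;
}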


Right, it depends on how you want to slice the analogy. Within an SM you do have two independent schedulers, but they can't arbitrarily issue to all execution units like a hyperthreaded CPU can.
No? I wouldn't be so sure. How does the issue to the three vALUs in a GF104-style SM work? Or for DP instructions?
 
Why is that unfortunate?

I would think that GPU users would rejoice at the performance increase per $$$ spent.

With AMD currently gouging buyers, they will have to reduce pricing, so every new buyer wins.

As for being an admitted AMD fanboi, do you really like paying the current high prices that AMD is charging?

It's unfortunate because there is absolutely no reason to believe that Nvidia will not charge even more. Believing otherwise goes completely against the history of the company, and giving the card the 680 moniker should be proof enough of how it's going to be priced. Just as a reminder, the GTX 580 is still ~$500.
 
IMHO, as long as performance is increasing at a good clip and there are multiple competitors, I'm very, very happy.
 
It's unfortunate because there is absolutely no reason to believe that Nvidia will not charge even more. Believing otherwise goes completely against the history of the company, and giving the card the 680 moniker should be proof enough of how it's going to be priced. Just as a reminder, the GTX 580 is still ~$500.

So if it was the 585 you'd be happier? Marketing 101: "new and improved".
The 7970's pricing isn't gouging by any objective measure. It's $600 and substantially faster than the current $500 part.
Sure, there's a possibility that the 680 comes out faster than the 7970 at $700, but then AMD can drop to $500 and slaughter the 680 if it's <=1.2x the 7970.
 
Actually, I wasn't being condescending. I don't know where you got that from. I was just giving a reason why it is bad to call a GPU scalar when it is in fact a SIMD machine operating on vectors, as we all know. There is no need to do that, and it completely misrepresents the underlying hardware. Appears counter-intuitive to me. ;)
Yeah, but one should simply apply it consistently. And calling SIMD units scalar is not using CPU terms, it is wrong.
[edit] And it confuses some newbies to GPU programming like hell. They sometimes wonder why something is extremely slow, and after pointing out some basic stuff like warp divergence that they didn't take into account, one can often hear something like "but my nV GPU is a scalar architecture and each thread is executed independently" or something along these lines. That's the reason for my "crusade". ;)
[/edit]
AMD's pre-GCN GPUs are indeed VLIW, but as a combination of VLIW and SIMD; those two concepts are orthogonal. Scalar isn't orthogonal to vector, it's basically the opposite. That's the whole reason it doesn't make sense. In my opinion, there is no need to invent a new and contradictory scheme of designations just for GPUs when one can simply use the established terminology that has been in use for decades already.
That's actually an inherent property of all SIMD or vector instruction sets. One just has to loop over the vector (aka warp/wavefront). That's no argument.
And did I sense a little bit of condescension in your last sentence? :LOL:


No? I wouldn't be so sure. How does the issue to the three vALUs in a GF104-style SM work? Or for DP instructions?

AMD was vector horizontally and VLIW vertically and is now vector horizontally and scalar vertically, ok?
 
Appears counter-intuitive to me. ;)

I think you're overestimating how useful such a pedantic stance is to the average person. Do you have a better suggestion for a simple way of describing the difference between nVidia's and AMD's instruction issue since G80/R600? SIMD vs VLIW SIMD isn't nearly as illustrative IMO.

No? I wouldn't be so sure. How does the issue to the three vALUs in a GF104-style SM work? Or for DP instructions?

The ability to issue to a shared set of execution units from multiple warps isn't unique to GF104. GF100 does the same thing. The SFU, TMU, Load/Store units all count as execution units. The extra dispatch units in GF104 introduce ILP within a warp, not hyperthreading between warps.

It's unfortunate because there is absolutely no reason to believe that Nvidia will not charge even more. Believing otherwise goes completely against the history of the company, and giving the card the 680 moniker should be proof enough of how it's going to be priced. Just as a reminder, the GTX 580 is still ~$500.

Yeah, current 580 pricing doesn't bode well at all for a cheap 680 in two weeks. If it was named 660 Ti or something I would feel differently but it looks like GK104 is going to wear the flagship hat and that means big bucks.
 
I think you're overestimating how useful such a pedantic stance is to the average person. Do you have a better suggestion for a simple way of describing the difference between nVidia's and AMD's instruction issue since G80/R600? SIMD vs VLIW SIMD isn't nearly as illustrative IMO.
I think you may underestimate the degree of confusion for an average person. The average person actually does not know that GPUs are SIMD machines. So "SIMD" is way more illustrative than "scalar" to start with.
Nvidia uses (dynamically scheduled) single- or dual-issue SIMD architectures; AMD used (statically scheduled) VLIW4 or VLIW5 SIMD architectures. Both are massively multithreaded on multiple levels (and using slightly different methods). In my opinion it doesn't get much more descriptive and illustrative without describing it in detail.
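As a rough illustration of where the two issue models differ (the kernel below is made up; the scheduling remarks live in the comments): a VLIW4/VLIW5 compiler has to find independent operations within one thread to fill the slots of each bundle, while a scalar-ISA SIMD design issues one operation per thread per instruction, so a dependent chain only costs latency that other warps can hide.

Code:
// Made-up kernel with two per-thread code shapes.
// VLIW4 (pre-GCN AMD): the compiler can pack the four independent MADs
// below into a single 4-slot bundle, but a dependent chain fills only
// one slot per bundle, leaving most of the VLIW lanes idle.
// Scalar-ISA SIMD (Nvidia): each instruction is one op per thread in
// either case; the dependent chain just adds latency to be hidden by
// other warps.
__global__ void ilp_example(const float4 *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float4 v = in[i];

    // Four independent MADs -> good VLIW slot utilization.
    float a = v.x * 2.0f + 1.0f;
    float b = v.y * 2.0f + 1.0f;
    float c = v.z * 2.0f + 1.0f;
    float d = v.w * 2.0f + 1.0f;

    // Dependent chain -> at most one useful VLIW slot per bundle.
    float s = a;
    s = s * b + c;
    s = s * d + a;

    out[i] = s;
}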

IIRC, there was already a discussion about the terminology almost a year ago or so. I don't see many new arguments from either side. So I guess this will be my last post to this (off)topic. Just to condense my point of view to a short sentence: SIMD can't be scalar because it operates on vectors by definition.


The ability to issue to a shared set of execution units from multiple warps isn't unique to GF104. GF100 does the same thing. The SFU, TMU, Load/Store units all count as execution units. The extra dispatch units in GF104 introduce ILP within a warp, not hyperthreading between warps.
I know. GF104 with its 3 vALUs was just one example (DP being another, also applicable to GF100) that each scheduler may be able to issue instructions to all attached functional units (some arbitration has to take place to avoid collisions of course, but that doesn't matter for the point; in a CPU the scheduler also does this arbitration), not just a subset as you implied.
 
I think you may underestimate the degree of confusion for an average person. The average person actually does not know that GPUs are SIMD machines. So "SIMD" is way more illustrative than "scalar" to start with.
Nvidia uses (dynamically scheduled) single- or dual-issue SIMD architectures; AMD used (statically scheduled) VLIW4 or VLIW5 SIMD architectures. Both are massively multithreaded on multiple levels (and using slightly different methods). In my opinion it doesn't get much more descriptive and illustrative without describing it in detail.

It's a matter of where you start from. The average person doesn't know SIMD at all. If people know anything at all about architecture, it's how a plain vanilla Hennessy & Patterson scalar RISC CPU works and the kind of instructions it uses.

If you're lucky, they know how it differs from a VLIW machine like Itanium.

The quickest way to explain a GPU shader core is to start with a CPU and glue 16 of them together so that they all execute the same instruction in lockstep. Tadaaa: you've explained SIMD. And since they already know the difference between the elemental RISC and VLIW instructions, you get the difference between Nvidia and AMD GPUs added for free.

And then you can add the caveats about thread divergence etc.
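Something like this toy loop (plain C++, purely illustrative, everything made up) is what that mental model boils down to: one instruction stream, 16 lanes in lockstep, and a per-lane mask to handle the divergence caveat.

Code:
// Toy model of "16 scalar CPUs glued together in lockstep": a single
// instruction stream, 16 lanes, and an execution mask for branches.
#include <cstdio>

const int LANES = 16;

int main()
{
    float x[LANES], y[LANES];
    bool  mask[LANES];

    for (int l = 0; l < LANES; ++l) x[l] = float(l);

    // Every lane evaluates the branch condition "x < 8" in lockstep...
    for (int l = 0; l < LANES; ++l) mask[l] = (x[l] < 8.0f);

    // ...then both sides of the branch execute, each under its mask.
    for (int l = 0; l < LANES; ++l) if (mask[l])  y[l] = x[l] * 2.0f;    // "then" path
    for (int l = 0; l < LANES; ++l) if (!mask[l]) y[l] = x[l] + 100.0f;  // "else" path

    for (int l = 0; l < LANES; ++l) printf("%g ", y[l]);
    printf("\n");
    return 0;
}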

Just to condense my point of view to a short sentence: SIMD can't be scalar because it operates on vectors by definition.
Sure. But if you want to discuss the architectural differences between the Nvidia and former AMD GPUs, you can discuss major parts of it by forgetting about the SIMD concept entirely and treating each SIMD lane as an individual CPU. A scalar instruction set vs a VLIW instruction set applied to each thread. And since it's a drag to add 'instruction set' as a qualifier, you just drop it because the context is obvious.
And thus you get: scalar SIMD and VLIW SIMD. Equally condensed. And everybody is just as wise.
 
As you mention Hennessy and Patterson: later editions added a chapter about GPUs. So if that is our reference, let's just agree on the terms used there.
 
So if it was the 585 you'd be happier? Marketing 101: "new and improved".
The 7970's pricing isn't gouging by any objective measure. It's $600 and substantially faster than the current $500 part.
Sure, there's a possibility that the 680 comes out faster than the 7970 at $700, but then AMD can drop to $500 and slaughter the 680 if it's <=1.2x the 7970.

Not really, if the rumors of the die size are true. Nvidia has always had better margins than AMD, and that's with bigger chips. Similarly sized chips would only widen that gap, and with AMD's CPU division not exactly lighting the world on fire, the last thing they want to do is get into a price war with Nvidia and kill their GPU margins.
 
Yeah, current 580 pricing doesn't bode well at all for a cheap 680 in two weeks. If it was named 660 Ti or something I would feel differently but it looks like GK104 is going to wear the flagship hat and that means big bucks.

Is it completely out of the question to have a very well-priced GTX 680, if Nvidia is planning to release the GK110 hardware under a GTX 7xx name?

If GK104 can really trade blows with Tahiti, even if it loses, but comes at a more reasonable price (the 1GB less GDDR5 should also help with that), Nvidia could play the good cop for this round, while still keeping the option of bringing GK110 as a 7xx part with a heftier premium. Everyone in the IT industry is hasty to bump up their product numbers anyway.
 