"Is that your opinion, or is it DOUBLE CONFIRMED?"
Nah, it'll be sooner than May.
"You're on the defensive already?"
Is that your opinion, or is it DOUBLE CONFIRMED?
A few days ago, 3dcenter published some info about the Kepler family, based on a rumoured specification.
I don't get the connection to the instruction latency.

Maybe I'm just daft, but I'm still not seeing the difference between operand fetch from the register file and other memory operations. The whole point of buffering is to decouple operand fetch from the execution pipeline.
"For a given problem, the issue gets more and more critical the wider the GPU gets, i.e. the fewer threads one has per vecALU."
It seems like a small issue compared to all the other latency hiding GPUs do anyway.
"I don't get the connection to the instruction latency."
The operand buffering allows other instructions to overtake there (in-order issue, out-of-order dispatch to the ALUs), i.e. one instruction does not stall another independent one (but it does stall dependent ones until the result is written back to the register file to resolve RAW hazards). It is still in the critical loop that defines the latency.
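To make the dependency pattern concrete, here is a minimal CUDA sketch (my own toy example, not from the thread; the kernel name and the latency behaviour in the comments are illustrative): `b` has a read-after-write dependency on `a` and has to wait out the ALU latency, while `c` is independent and can be dispatched in the meantime.

```cuda
// Toy kernel, not from the thread. Illustrates the issue/dispatch behaviour
// described above; actual scheduling is up to the compiler and hardware.
__global__ void raw_hazard_demo(const float *in, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float a = in[i] * 2.0f;   // result available only after the ALU latency
    float b = a + 1.0f;       // RAW hazard: stalls until 'a' is written back
    float c = in[i] - 3.0f;   // independent: may overtake 'b' at dispatch
    out[i] = b * c;
}
```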
"For a given problem, the issue gets more and more critical the wider the GPU gets, i.e. the fewer threads one has per vecALU."
And some constructs (like fine-grained control structures) prefer low ALU latencies, because one needs far fewer threads to fill the bubbles. Just saying one can always throw more threads at the problem isn't exactly true, as this would necessitate growing the register files, especially for heavier threads.
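As a rough illustration of that register-file pressure (my own sketch, hypothetical kernel name; `__launch_bounds__` is the real CUDA qualifier): Fermi has 32768 32-bit registers per SM, so keeping its maximum of 1536 threads resident leaves only about 21 registers per thread, which heavier threads simply won't fit into.

```cuda
// Sketch only: asking the compiler to keep many threads resident forces a
// tiny per-thread register budget. On Fermi (32768 regs/SM), 6 blocks of
// 256 threads = 1536 resident threads -> ~21 registers per thread; anything
// heavier either spills to local memory or lowers occupancy instead.
__global__ void __launch_bounds__(256, 6) many_threads_kernel(float *x)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    x[i] = x[i] * x[i] + 1.0f;  // trivial body; "heavy" threads need far more registers
}
```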
"It is quite simple: with scoreboarding, the "read operands" part of the pipeline is solely responsible for resolving RAW hazards."
Meh, guess I just won't get it. I see the RAW hazard and operand buffering as two separate operations, with only the former being strictly part of the ALU pipeline.
"Did you follow what Intel touted as one major advantage of Knights Corner over Fermi? Getting high performance also for smaller problems. In the case of smaller matrices in a matrix multiply, the most important thing is not only a lower-latency memory system. A lower-latency launch of kernels and a lower number of threads (executed with lower latency) go along the same lines."
That theoretical problem would have to be short on both TLP and ILP for this to be a problem even on Fermi. So rather unsuitable for GPUs anyway.
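For what it's worth, here is the kind of case being argued about, as a hedged CUDA sketch (my own, hypothetical kernel name; assumes n is even and a 2D launch covering the matrix): with a small n there are few threads, so little TLP, but each thread can still expose some ILP through independent accumulators.

```cuda
// Illustrative small matrix multiply, C = A * B for n x n matrices (n even).
// Launch with a 2D grid so that one thread computes one element of C.
__global__ void small_matmul(const float *A, const float *B, float *C, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;

    float acc0 = 0.0f, acc1 = 0.0f;  // independent accumulators expose ILP
    for (int k = 0; k < n; k += 2) {
        acc0 += A[row * n + k]     * B[k * n + col];
        acc1 += A[row * n + k + 1] * B[(k + 1) * n + col];
    }
    C[row * n + col] = acc0 + acc1;  // combine only at the end
}
```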
So much for my big bang theory; I would suggest the physics/rendering section.
What's the latest speculation on the new NV GPU?
1024 shaders or something? 512-bit bus ... 4GB @
You do realize that with such specs nV would just have another Fermi on their hands, requiring a "2nd gen" to actually get fully working chips out too?
Kaotik said:
"You do realize that with such specs nV would just have another Fermi on their hands, requiring a "2nd gen" to actually get fully working chips out too?"
Repeating something doesn't make it true.
Actually, that's true on the Internet.
"1024 shaders and 512-bit" say nothing about how Fermi-like Kepler will be. Numbers are meaningless without context.
You're right - that was assuming an architecture similar to Fermi's, where the biggest differences are made for DP purposes and gaming-wise it would stay more or less the same - with counts of course scaled up due to the process shrink.
But it's quite clear that when one speculates "1024 shaders", he's not going to think nV will suddenly have, say, 1024 VLIW4 shaders similar to those in Cayman or something, but that it will continue on the same scalar route as before, where the single "CUDA core" stays quite similar to that of Fermi.
The only thing I've seen suggested so far as a major change is removing the hotclock, which to my understanding would allow nVidia to build smaller chips (with the same CUDA core counts), but at the same time the units would be slower too.
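To put rough numbers on that trade-off (illustrative, using known GTX 580 figures): GF110 runs 512 cores at a 1544 MHz hotclock, for 512 x 2 FLOP x 1.544 GHz ≈ 1.58 TFLOPS peak. Dropping the hotclock and running the shaders at the ~772 MHz base clock would need roughly twice the cores (~1024) for the same peak throughput - which is exactly why "1024 shaders" without a hotclock isn't automatically a bigger chip, if each core can be made simpler and denser.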