#476
Senior Member
#477
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
If there's anything CPUs can't implement that might widen the gap again, I haven't heard about it yet, and I'm open to learning all about it. Fair enough, but Haswell still won't be up against a 4x faster GCN.
#478
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
Sandy Bridge-E will feature a quad-channel memory controller. And DDR4 guarantees bandwidth to scale for years to come.
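For a sense of what channel count buys, here is a back-of-the-envelope sketch of theoretical peak DRAM bandwidth. It assumes 64-bit channels and DDR3-1600 purely for illustration (the post names no speed grade), and the function name is hypothetical; sustained bandwidth in practice is lower than this peak.

```python
def peak_bandwidth_gbs(channels: int, bus_width_bits: int, rate_mt_s: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second).

    channels       -- number of independent memory channels
    bus_width_bits -- data width of one channel (64 bits for a DDR3/DDR4 channel)
    rate_mt_s      -- data rate in megatransfers per second (e.g. 1600 for DDR3-1600)
    """
    bytes_per_transfer = bus_width_bits / 8
    return channels * bytes_per_transfer * rate_mt_s * 1e6 / 1e9

# Illustrative assumption: DDR3-1600 on 64-bit channels.
dual = peak_bandwidth_gbs(2, 64, 1600)
quad = peak_bandwidth_gbs(4, 64, 1600)
print(f"dual-channel: {dual:.1f} GB/s")
print(f"quad-channel: {quad:.1f} GB/s")
```

Peak bandwidth scales linearly with both channel count and data rate, so moving from two to four channels doubles it at the same memory speed, and faster DRAM generations multiply on top of that.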
#479
Senior Member
#480
Senior Member
#481
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
I know, but running complex code requires a substantial stack per thread, which means the thread count has to be kept low, and that in turn can only be achieved with out-of-order execution.
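The stack-size vs. thread-count trade-off is simple division. A minimal sketch, assuming per-thread state must fit in a fixed on-chip budget: the 128 KiB register file and 1536 resident threads are Fermi-class (GF100) per-SM figures used only for illustration, and treating a stack as if it lived in that budget is a simplification, since real GPUs can spill to cached off-chip memory. Function names are hypothetical.

```python
def bytes_per_thread(on_chip_bytes: int, resident_threads: int) -> float:
    """On-chip storage each thread gets if the budget is split evenly."""
    return on_chip_bytes / resident_threads

def max_resident_threads(on_chip_bytes: int, stack_bytes_per_thread: int) -> int:
    """How many threads fit if each one needs stack_bytes_per_thread."""
    return on_chip_bytes // stack_bytes_per_thread

# Illustrative per-SM figures (assumption): 32K 32-bit registers, 1536 threads.
REGISTER_FILE_BYTES = 32768 * 4
RESIDENT_THREADS = 1536

per_thread = bytes_per_thread(REGISTER_FILE_BYTES, RESIDENT_THREADS)
print(f"{per_thread:.1f} bytes (~{per_thread / 4:.0f} registers) per thread")

# A modest 1 KiB stack per thread cuts the resident thread count drastically.
print(max_resident_threads(REGISTER_FILE_BYTES, 1024), "threads with 1 KiB stacks")
```

At full occupancy each thread gets only a few dozen bytes, so any sizeable per-thread stack forces far fewer threads in flight, and with fewer threads there is less parallelism available to hide latency.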
#482
Senior Member
Like I said, all the advantages of OoO are already there.
#483
Meh
Join Date: Mar 2004
Location: New York
Posts: 9,809
GPUs have a lot of fixed-function (FF) silicon, but much of it is active in parallel and not bottlenecking anything. CPUs are more flexible but also slower; it's a simple tradeoff. You say that CPUs don't need to burn a lot of flops emulating FF hardware, so where are the software renderers that prove it?
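To make "emulating FF hardware" concrete, here is a minimal sketch of a single bilinear texture fetch done in software, one of the jobs GPU texture units handle in fixed-function hardware. It is deliberately simplified (one channel, clamp addressing, no mipmapping or format conversion, all of which would add work), and the function name and list-of-rows texture layout are hypothetical.

```python
import math

def bilinear_sample(texture, u, v):
    """Bilinearly filter a 2-D single-channel texture (list of rows) at texel coords (u, v).

    Coordinates are clamped to the texture edges. A GPU's texture unit does all of
    this in fixed-function hardware; in software every step costs ALU work.
    """
    h, w = len(texture), len(texture[0])
    # Clamp so the 2x2 footprint stays inside the texture.
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = u - x0, v - y0
    # Four texel fetches and three linear interpolations for one channel.
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

tex = [[0.0, 10.0],
       [20.0, 30.0]]
print(bilinear_sample(tex, 0.5, 0.5))
```

For an RGBA texel the interpolation work is repeated per channel, and that is per sample, per pixel, before any shading math.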
In general, you're being a little too optimistic about CPUs and too dismissive of GPU performance. All current evidence points in the opposite direction. Today's fastest and most expensive CPUs can't render 10-year-old games, yet we are to believe they will catch up within a year or two with acceptable performance in modern games? Sorry, it's just not going to happen.
__________________
What the deuce!?
#484
Meh
Join Date: Mar 2004
Location: New York
Posts: 9,809
For OoO execution, nVidia is already most of the way there. I wouldn't be surprised to learn that arithmetic instructions can issue out of order in a narrow window. The scoreboarding patent didn't put any restrictions on instruction ordering.
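A toy model of what "out of order in a narrow window" could mean: a scoreboard records when each register's value becomes ready, and each cycle the issue stage may pick the oldest of the first few unissued instructions whose operands are ready and that doesn't create a hazard with an older, not-yet-issued instruction. This is a generic textbook-style sketch, not NVIDIA's design or the patent's; instruction names, latencies and the window size are made up.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    dst: str       # destination register
    srcs: tuple    # source registers
    latency: int   # cycles until dst is ready after issue

def issue_order(program, window=1):
    """Single-issue simulation with a register scoreboard.

    ready[r] is the cycle at which register r's value becomes available.
    Each cycle, the oldest hazard-free instruction among the first `window`
    unissued ones issues; window=1 is strict in-order issue.
    Returns a list of (issue_cycle, name).
    """
    ready = {}
    pending = list(program)
    issued = []
    cycle = 0
    while pending:
        for i, ins in enumerate(pending[:window]):
            older = pending[:i]
            operands_ready = all(ready.get(r, 0) <= cycle for r in ins.srcs)
            dst_free = ready.get(ins.dst, 0) <= cycle  # no in-flight write to dst
            # Don't overtake an older instruction that produces one of our sources,
            # or one that reads/writes the register we are about to overwrite.
            no_raw = all(o.dst not in ins.srcs for o in older)
            no_war_waw = all(ins.dst != o.dst and ins.dst not in o.srcs for o in older)
            if operands_ready and dst_free and no_raw and no_war_waw:
                ready[ins.dst] = cycle + ins.latency
                issued.append((cycle, ins.name))
                pending.pop(i)
                break
        cycle += 1
    return issued

program = [
    Instr("load r1", "r1", (), 4),       # long-latency load
    Instr("add r2", "r2", ("r1",), 1),   # needs the load result
    Instr("mul r3", "r3", ("r4",), 1),   # independent of the load
    Instr("mul r5", "r5", ("r4",), 1),   # independent of the load
]
print(issue_order(program, window=1))
print(issue_order(program, window=3))
```

With a window of 1 the two independent multiplies wait behind the stalled add; with a window of 3 they issue during the load's latency, which is the effect a narrow out-of-order window buys.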
__________________
What the deuce!?
#485
Senior Member
#486
Senior Member
You don't need a scoreboard for that. Just a better compiler will do.
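"Just a better compiler" here means static instruction scheduling. Below is a minimal greedy list scheduler for a single-issue in-order core: each cycle it emits the earliest (in source order) instruction whose producers' results are ready, so independent work fills the gap behind a long-latency load. It assumes latencies are known at compile time and that registers have already been renamed so there are no WAR/WAW conflicts; the function name and instruction tuples are hypothetical.

```python
def list_schedule(program):
    """Greedy compile-time list scheduling for a single-issue in-order machine.

    program: list of (name, dst, srcs, latency) in source order, assumed free of
    WAR/WAW conflicts. Registers not written by the program are inputs, ready at
    cycle 0. Returns [(cycle, name)]: the static order and each issue cycle.
    """
    producer = {dst: i for i, (_, dst, _, _) in enumerate(program)}
    ready_at = {}  # register -> cycle its value is available
    done = set()
    remaining = list(range(len(program)))
    order = []
    cycle = 0
    while remaining:
        for i in remaining:  # earliest in source order first
            name, dst, srcs, lat = program[i]
            deps = [producer[r] for r in srcs if r in producer and producer[r] < i]
            if all(d in done and ready_at[program[d][1]] <= cycle for d in deps):
                ready_at[dst] = cycle + lat
                done.add(i)
                remaining.remove(i)  # safe: we break out of the loop right away
                order.append((cycle, name))
                break
        cycle += 1  # if nothing was ready, this cycle is a stall
    return order

prog = [
    ("ld a", "a", (), 3),
    ("add b", "b", ("a", "x"), 1),
    ("mul c", "c", ("x",), 1),
    ("mul d", "d", ("y",), 1),
    ("add e", "e", ("b", "c"), 1),
]
print(list_schedule(prog))
```

The resulting static order hoists the two multiplies above the dependent add. How well this works in practice hinges on how accurately the compiler can predict latencies, especially for memory loads.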
#487
Senior Member
#488
Member
Join Date: Aug 2004
Posts: 244
#489
Senior Member
One question up front: does that image depict double-precision or single-precision numbers? Edit: Ah, SGEMM gave it away; single precision it is!
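Why the routine name settles it: in BLAS (and LAPACK) naming, the first letter encodes precision and element type, so SGEMM is the single-precision real matrix multiply and DGEMM the double-precision one. A tiny decoder as a sketch; the dictionary and function names are hypothetical.

```python
# First letter of a BLAS routine name -> (precision, element type)
BLAS_PREFIX = {
    "S": ("single", "real"),
    "D": ("double", "real"),
    "C": ("single", "complex"),
    "Z": ("double", "complex"),
}

def blas_precision(routine: str) -> tuple:
    """Decode the precision prefix of a BLAS routine name such as 'SGEMM'."""
    return BLAS_PREFIX[routine[0].upper()]

for name in ("SGEMM", "DGEMM", "ZGEMM"):
    precision, kind = blas_precision(name)
    print(f"{name}: {precision}-precision {kind}")
```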
I fail to see the point.
__________________
English is not my native tongue. Before flaming, please consider the possibility that I did not mean to say what you might have read from my posts. Warning! This posting may contain unhealthy doses of gross humor, sarcastic remarks and exaggeration!
#490
Senior Member
Join Date: Jun 2003
Posts: 2,571
And BTW, most of the things you just listed have nothing to do with GPUs as such, but with simple fixed-function hardware...
__________________
Aaron Spink speaking for myself inc.
#491
Senior Member
Join Date: Jun 2003
Posts: 2,571
__________________
Aaron Spink speaking for myself inc.
#492
Senior Member
Join Date: Jun 2003
Posts: 2,571
__________________
Aaron Spink speaking for myself inc.
#493
Senior Member
Join Date: Jun 2003
Posts: 2,571
__________________
Aaron Spink speaking for myself inc.
#494
Senior Member
Join Date: Jun 2003
Posts: 2,571
And from a software perspective, GF104 is scalar.
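What "scalar from a software perspective" means in the SIMT model: the programmer writes code for one thread operating on scalars, and the hardware runs that same code across all 32 threads of a warp, whatever it does underneath to find extra parallelism. A toy sketch; the executor and kernel names are hypothetical and the per-lane loop only stands in for lockstep hardware execution.

```python
WARP_SIZE = 32  # NVIDIA GPUs group threads into warps of 32

def saxpy_thread(lane, a, x, y):
    """Scalar per-thread code: what the programmer writes for a single lane."""
    return a * x[lane] + y[lane]

def run_warp(kernel, *args):
    """Toy SIMT executor: the hardware, not the programmer, applies the scalar
    kernel to every lane of the warp."""
    return [kernel(lane, *args) for lane in range(WARP_SIZE)]

x = [float(i) for i in range(WARP_SIZE)]
y = [1.0] * WARP_SIZE
out = run_warp(saxpy_thread, 2.0, x, y)
print(out[:4])
```

Contrast this with explicitly vectorized (SIMD or VLIW-style) code, where the programmer or compiler must pack independent operations into wide instructions.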
__________________
Aaron Spink speaking for myself inc.
#495
Senior Member
Join Date: Jun 2003
Posts: 2,571
__________________
Aaron Spink speaking for myself inc.
#496
Senior Member
Join Date: Jun 2003
Posts: 2,571
Not really. A large part of it is due to programming-model interactions and issues. GPUs are very inflexible and not very dynamic.
__________________
Aaron Spink speaking for myself inc.
#497
Senior Member
Join Date: Jun 2003
Posts: 2,571
IBM doesn't use 4 channels. They use closer to "16" channels on Power7.
__________________
Aaron Spink speaking for myself inc.
#498
Senior Member
Join Date: Jun 2003
Posts: 2,571
Or, to put it more simply: CPU flops >> GPU flops.
__________________
Aaron Spink speaking for myself inc.
#499
Senior Member
Join Date: Jun 2003
Posts: 2,571
I would say that this is at best speculation. By the same reasoning, I could argue that CPUs maintain millions of active thread contexts.
__________________
Aaron Spink speaking for myself inc.
#500
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
No, they're not. Lots of workloads look something like this:
Code:
sequential code
repeat N times
{
independent iterations
}
sequential code
repeat M times
{
dependent iterations
}
...
So even in the best-case scenario for the GPU, with laboriously tuned code, out-of-order execution fares better. For sequential code or loops with dependencies, GPUs slow to a crawl, and no number of in-order cores can turn that around. Only out-of-order execution can.
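The argument about mixed sequential/parallel workloads can be quantified with a small extension of Amdahl's law (the framing is ours, not the poster's): device B runs the independent-iteration loops `parallel_gain` times faster than device A but the sequential parts and dependent loops `serial_slowdown` times slower. All numbers below are hypothetical, and the function name is made up.

```python
def relative_speedup(serial_fraction, parallel_gain, serial_slowdown):
    """Speedup of device B over device A for one workload.

    serial_fraction -- share of A's runtime spent in sequential code / dependent loops
    parallel_gain   -- how much faster B runs the independent-iteration loops
    serial_slowdown -- how much slower B runs the sequential parts (1.0 = same speed)
    With serial_slowdown == 1.0 this reduces to plain Amdahl's law.
    """
    time_b = serial_fraction * serial_slowdown + (1.0 - serial_fraction) / parallel_gain
    return 1.0 / time_b

# Hypothetical device B: 10x faster on parallel loops, 5x slower on serial code.
for f in (0.0, 0.05, 0.2, 0.5):
    print(f"serial {f:4.0%}: speedup {relative_speedup(f, 10.0, 5.0):.2f}x")
```

Under these assumed numbers, a 10x throughput advantage shrinks quickly as the serial share grows, and at a 20% serial share device B ends up slower overall.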
So if, in your opinion, GPUs should get a more flexible memory hierarchy like Larrabee's, it looks like compute density will go down.