Multiple GPU cores

zeckensack said:
olivier said:
Cell is powerful, don't forget!!!! So I think you are being a little hard on it... it should be at least as powerful as a Radeon 7000 :p and with the features of a Voodoo 1!!
Why should Cell be any more powerful than any other stream processor with the same transistor budget?

Because it has a cute name? Because IBM designers can ignore the laws of physics while others cannot?

Hmm, I was being sarcastic :p
I'm pretty bored of the Cell hype!
And I totally agree with you: the laws of physics should be the same even for IBM, Toshiba and Sony!!!!
 
zeckensack said:
I want to know why some people believe that a given amount of transistors can produce higher arithmetic throughput than would otherwise be imaginable just because these transistors form a design known as Cell.

Why do some people think that a given amount of transistors can produce a higher arithmetic throughput than would otherwise be imaginable just because these transistors form a design known as NV3x?

Why can a given amount of instructions produce a faster result than would otherwise be imaginable just because these instructions form an algorithm that is O(log n)?

Throughput is extremely dependent on architecture. To suggest that the number of transistors available is all that matters is the irrational statement here. The NV3x had more transistors than its competitors, but was much slower.

Sure, anyone can design a small CPU core and slap eight of them together on a single die. Do you think it's just that easy, and that the buses, cache logic, control logic, power distribution, pipeline design, et al., make no difference whatsoever?

It is the *arrangement* of those transistors that makes the difference, not their mere number. And there is a vast number of candidate solutions in the design space to search through, most of them non-optimal, which is why design is hard work and expensive.
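The O(log n) point above is easy to make concrete. A toy sketch (my own illustration, not from either poster): the same kind of instructions, arranged as a binary search instead of a linear scan, find an element in a handful of steps instead of a million.

```python
def linear_search(xs, target):
    """O(n): check every element in turn, counting comparisons."""
    steps = 0
    for i, x in enumerate(xs):
        steps += 1
        if x == target:
            return i, steps
    return -1, steps

def binary_search(xs, target):
    """O(log n): halve the sorted search space on every step."""
    lo, hi, steps = 0, len(xs) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if xs[mid] == target:
            return mid, steps
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

data = list(range(1_000_000))
_, linear_steps = linear_search(data, 999_999)
_, binary_steps = binary_search(data, 999_999)
print(linear_steps, binary_steps)  # prints: 1000000 20
```

Same instruction "budget", wildly different work done: the arrangement is everything.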
 
olivier said:
Hmm, I was being sarcastic :p
I'm pretty bored of the Cell hype!
And I totally agree with you: the laws of physics should be the same even for IBM, Toshiba and Sony!!!!
Oh my, sorry then. I should have kept my mouth shut. Too late for that now :|

DemoCoder said:
zeckensack said:
I want to know why some people believe that a given amount of transistors can produce higher arithmetic throughput than would otherwise be imaginable just because these transistors form a design known as Cell.

Why do some people think that a given amount of transistors can produce a higher arithmetic throughput than would otherwise be imaginable just because these transistors form a design known as NV3x?
Nobody believes that :D
DemoCoder said:
Why can a given amount of instructions produce a faster result than would otherwise be imaginable just because these instructions form an algorithm that is O(log n)?
I've specifically used the fp multiplier example. I've specifically and consistently, even in the small snippet you just quoted, referred to arithmetic throughput. Which is exactly what the Cell hysteria is about, and what the FLOP/s figures -- the #1 official message regarding Cell -- express.

I don't think there's much room left for improvement in arithmetic circuits for a given number of bits. Not anywhere, not at IBM. This has long been researched to death and everyone knows and applies the same tricks. Which was my point.
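For what it's worth, the headline FLOP/s number is itself just peak arithmetic, not a measurement. A back-of-the-envelope sketch (the 8-SPE, 3.2 GHz, 4-wide FMA figures are the commonly quoted ones, filled in here as an assumption, not taken from this thread):

```python
# Peak single-precision throughput from the commonly quoted Cell
# figures (assumed for illustration, not stated in the thread).
spes = 8                 # synergistic processing elements
clock_hz = 3.2e9         # per-SPE clock frequency
simd_width = 4           # single-precision SIMD lanes per SPE
flops_per_fma = 2        # a fused multiply-add counts as two FLOPs

peak = spes * clock_hz * simd_width * flops_per_fma
print(f"{peak / 1e9:.1f} GFLOPS")  # prints: 204.8 GFLOPS
```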
DemoCoder said:
Throughput is extremely dependent on architecture. To suggest that all that matters is the number of transistors available is the irrational statement. The NV3x had more transistors than its competitors, but was much slower.
First, I didn't. I recognize there's a dependency between size and clock speed headroom; I wrote that. And I'd better add process technology and voltage if you're trying to be picky here.

Second, your example is pretty cool. It's not hard to waste transistors, everyone can do that. I actually have a few dozen discrete bipolar transistors on my shelf that do exactly nothing because they're not part of any circuit. See?

The hard part is doing it the other way 'round. I could think of an appropriate car analogy if it becomes necessary :)

DemoCoder said:
Sure, anyone can design a small CPU core, and slap 8 of them together on a single die. Do you think it's just that easy, and that the buses, cache logic, control logic, power distribution, pipeline design, et al, make no difference whatsoever?

it is the *arrangement* of those transistors that make the difference, not their mere number. And there are an uncountable number of design solutions in design-space that must be searched for, most of them non-optimal, which is why design is hard work and expensive.
Did I somewhere say that it was easy to design Cell?
 
Well, as far as I can see, the Cell architecture is nice, but it isn't really fundamentally different from the multi-core solutions that other companies are supposed to be releasing soon. There is some nice potential in a Cell-like architecture, though, in that you can fit many more "dumb" processors, that is, processors optimized for low transistor count instead of IPC. In doing this, your total peak processing power increases significantly, and therefore if software can take advantage of the large number of processing units, it could run many times faster than on an alternative single-threaded design.
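How much "many times faster" can actually be is bounded by the serial fraction of the work. A quick Amdahl's-law sketch (my framing, not the poster's):

```python
def speedup(parallel_fraction, n_cores):
    """Amdahl's law: the serial fraction limits overall speedup."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# Eight "dumb" cores only pay off if most of the work parallelizes.
for p in (0.5, 0.9, 0.99):
    print(f"parallel fraction {p}: {speedup(p, 8):.2f}x")
# prints: 1.78x, 4.71x, 7.48x respectively
```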

This is nice for a console because there's no need for legacy support (usually....). For the PC things will move a bit more slowly, but are heading in the same basic direction. I expect that in the mid-far future (say, 5-10 years), assuming we're still making use of silicon-based designs (by that time we'll either be migrating to new technologies, or there will be incredible pressure to do so), we'll have a sort of hybrid architecture, one that combines parallelism with high IPC processors. How would this be done? A simple way would be to have one "core" processor that is designed for high IPC and high clock speeds, as well as a number of periphery processors that are optimized for low transistor count per instruction executed.

Such an architecture would still be capable of running your single-threaded applications at high speeds, but would be capable of running multi-threaded apps many times faster than the single processor. It would also be amenable to entirely software-based parallelism (as would be desirable given the legacy of the x86 architecture and the Windows OS/software paradigm), as you'd simply give processes to the high-IPC core first, and to the periphery cores as much as possible. Future instruction hints may allow specialization of these periphery processors and tell the scheduler where tasks should be sent.
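The "big core first, spill to the periphery" policy described above can be sketched as a toy greedy scheduler (all names, numbers, and the backlog heuristic are invented for illustration; this is not any real OS scheduler):

```python
from dataclasses import dataclass, field

@dataclass
class Core:
    name: str
    ipc: float                    # relative instructions per cycle
    queue: list = field(default_factory=list)

def schedule(tasks, cores):
    """Greedy: send each task (given in work units) to the core with
    the smallest estimated backlog (queued work divided by IPC).
    With an empty system, ties go to the first core in the list, so
    the high-IPC core is tried first, as described above."""
    for t in tasks:
        best = min(cores, key=lambda c: sum(c.queue) / c.ipc)
        best.queue.append(t)
    return cores

big = Core("high-IPC core", ipc=4.0)
littles = [Core(f"periphery core {i}", ipc=1.0) for i in range(4)]
cores = schedule([10, 10, 10, 10, 10], [big] + littles)
for c in cores:
    print(c.name, c.queue)
```

With five equal tasks and five cores, the first task lands on the high-IPC core and the rest spill one each onto the periphery cores.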
 
Cell is the only processor of its kind designed by an IDM that designs high-performance processors; custom dynamic logic still makes a difference. There won't be anything like Cell produced in the same kind of volumes for quite some time; the only markets capable of pushing processors like this in volume are GPUs and consoles... and GPUs are sticking to far more limited programming models for now.

Cell is special.
 