Is there something that CELL can still do better than modern CPU/GPU

Xenus · Oct 25, 2010

Efficiency versus flexibility assuming you mean separate designs for cpu/gpu. if you meant separate chips it's all about manufacturability aka it is easier to produce 2 smaller chips then one large monolithic one.

Crossbar · Oct 25, 2010

Xenus said:
Efficiency versus flexibility assuming you mean separate designs for cpu/gpu. if you meant separate chips it's all about manufacturability aka it is easier to produce 2 smaller chips then one large monolithic one.

I am not sure what you mean by efficiency versus flexibility, all GPU designs seem to be heading towards more fexibility. Could you elaborate?

Concerning manufacturability, it is even easier to produce two identical chips instead of two separate designs, i.e. Sony could put two of them in the PS4 as I suggested earlier.

Shifty Geezer · Oct 25, 2010

Crossbar said:
I am not sure what you mean by efficiency versus flexibility, all GPU designs seem to be heading towards more fexibility. Could you elaborate?

I think Xenus means a single core is more flexible if it's a super-versatile chip, but performance will be less than the equivalent of two discrete, more specialised chips. eg. Larrabee was more flexible but slower than a discrete GPU designed just for graphics.

upnorthsox · Oct 25, 2010

Crossbar said:
I am not sure what you mean by efficiency versus flexibility, all GPU designs seem to be heading towards more fexibility. Could you elaborate?

Concerning manufacturability, it is even easier to produce two identical chips instead of two separate designs, i.e. Sony could put two of them in the PS4 as I suggested earlier.

IMO it's efficiency and flexibiity vs initial design costs and yields with yields being the greast negator. Poor yields are just hard to get around no matter what benefits you may derive from other factors.
The hardest part is you need to commit to the process before you even know what ballpark your yields will be in. Because of this, whatever knowledge you have in manufacturing these components is vital. You certainly wouldn't want to be wedding a cpu with a gpu on a process node where everything is new to you.

Crossbar · Oct 25, 2010

Shifty Geezer said:
I think Xenus means a single core is more flexible if it's a super-versatile chip, but performance will be less than the equivalent of two discrete, more specialised chips. eg. Larrabee was more flexible but slower than a discrete GPU designed just for graphics.

I agree to a certain degree, but these kind of comparisons are a bit slippery because we don´t know how well Larrabee could have been with some tailor made graphics routines optimised to its strength. Running some bog standard DX ?.? performance tests and compare the results to some GPU hardware that has been tailor made to support that DX version may not do Larrabee complete justice. Larrabee is just an example, I don´t know the exact reasons why it was postponed.

Then we also have the efficiency consideration that it is easier to have a more continous utlisation and a more evenly distributed load through general purpose hardware than some specialised hardware that can be used only for some special type of calculations.

Deciding what is most efficient isn´t always easy, that general purpose hardware have less ALUs idling may compensate that is not as fast as dedicated hardware for certain tasks.

Crossbar · Oct 25, 2010

upnorthsox said:
IMO it's efficiency and flexibiity vs initial design costs and yields with yields being the greast negator. Poor yields are just hard to get around no matter what benefits you may derive from other factors.
The hardest part is you need to commit to the process before you even know what ballpark your yields will be in. Because of this, whatever knowledge you have in manufacturing these components is vital. You certainly wouldn't want to be wedding a cpu with a gpu on a process node where everything is new to you.

Wedding a CPU with GPU can be done in different ways, We haven´t seen everything yet. Just adding a tiny ARM core to Fermi would make Fermi capable of booting Linux, you could add some texture units to Cell and it would become a more versatile graphics processor. They would not be perfect but they, but neither is paring a bog standard CPU with a bog standard GPU on the same die.

Just wanted to say that there are many ways to skin a cat.

Npl · Oct 25, 2010

Crossbar said:
Wedding a CPU with GPU can be done in different ways, We haven´t seen everything yet. Just adding a tiny ARM core to Fermi would make Fermi capable of booting Linux, you could add some texture units to Cell and it would become a more versatile graphics processor. They would not be perfect but they, but neither is paring a bog standard CPU with a bog standard GPU on the same die.

Just wanted to say that there are many ways to skin a cat.

If you want competitive performance, the GPU will have to be reasonably big. I could expect adding a few tiny CPU cores, but a beefed up Cell wouldnt be tiny. In terms of cost you`d rather have 2 chips than 1 obscenely big one.
The only question is if you want 1 CPU and 1 GPU or 2 hybrids, Im not sure if you meant that - it sounds like you would want a single chip. Separate GPU/CPU would have the advantage that you can have large cache and low-latency memory for the CPU and high ban width memory for the GPU, having memory thats both low-latency and high-bandwith aint feasible.

If a PS4 would come out the next year it would have separate CPU/GPU.... but there are a few things Im not certain of in the longer term.
There could be some killer-apps that would require a Hybrid and wont run comparable nice on a Cell(2), Im sure stuff like physics will benefit from hybrids but not to what extend. Which could mean 2 hybrids would be the better solution
Quite possible in a few years you will be more limited by powerdraw than die-size (in a small form-factor like console), which means you fill your powerbudget with rather small chips - at which point you just could use one hybrid. (But if power and thus efficiency is an issue you might aswell be better of with diverse specialized units instead of similar complex ones. rather have half of your chip idle all the time and the other half doing the best with the power)

Crossbar · Oct 25, 2010

Npl said:
If you want competitive performance, the GPU will have to be reasonably big. I could expect adding a few tiny CPU cores, but a beefed up Cell wouldnt be tiny. In terms of cost you`d rather have 2 chips than 1 obscenely big one.
The only question is if you want 1 CPU and 1 GPU or 2 hybrids, Im not sure if you meant that - it sounds like you would want a single chip. Separate GPU/CPU would have the advantage that you can have large cache and low-latency memory for the CPU and high ban width memory for the GPU, having memory thats both low-latency and high-bandwith aint feasible.

Yes I was thinking in terms of two hybrids, maybe I was unclear about that.

Npl said:
If a PS4 would come out the next year it would have separate CPU/GPU.... but there are a few things Im not certain of in the longer term.
There could be some killer-apps that would require a Hybrid and wont run comparable nice on a Cell(2), Im sure stuff like physics will benefit from hybrids but not to what extend. Which could mean 2 hybrids would be the better solution
Quite possible in a few years you will be more limited by powerdraw than die-size (in a small form-factor like console), which means you fill your powerbudget with rather small chips - at which point you just could use one hybrid. (But if power and thus efficiency is an issue you might aswell be better of with diverse specialized units instead of similar complex ones. rather have half of your chip idle all the time and the other half doing the best with the power)

That is an interesting point, do you have some certain reasons do believe this? Is the SemiConductor industry expecting that power draw will stop scaling down as they approach 20-22 nm litography?

liolio · Oct 25, 2010

IT's already the case, some members here gave various numbers on the matters, it's "gross" approximation but that gives an idea. One figure I remember was a shrink will allow for 50% more transistors per mm² but only a decrease 20% in power usage per transistors, so as you cram more and more transistors the power goes up. The trend is pretty clear and not new in the GPUs realm.

Crossbar · Oct 25, 2010

liolio said:
IT's already the case, some members here gave various numbers on the matters, it's "gross" approximation but that gives an idea. One figure I remember was a shrink will allow for 50% more transistors per mm² but only a decrease 20% in power usage per transistors, so as you cram more and more transistors the power goes up. The trend is pretty clear and not new in the GPUs realm.

Thanks, I have not been paying close attention to those discussions.

Let´s hope Graphene will change those trends.

Shifty Geezer · Oct 25, 2010

Crossbar said:
I agree to a certain degree, but these kind of comparisons are a bit slippery because we don´t know how well Larrabee could have been with some tailor made graphics routines optimised to its strength.

No arguments from me - that's my view!

ihamoitc2005 · Oct 27, 2010

God of War MLAA

Removing MSAA from GPU (RSX) saves 5ms.

Cost: 5 SPU @ 4ms + scheduling overhead

Benefit: 17.6% extra performance available by RSX for other rendering task at 30hz.

http://www.realtimerendering.com/blog/morphological-antialiasing-in-god-of-war-iii/

jonabbey · Dec 22, 2010

Sony buys back Toshiba's Cell plant for 50 billion yen, makes a killing and plans a CMOS fab -- from Engadget

Sony's bought back the Cell fab they sold to Toshiba, for less than Toshiba originally paid for it. Sony apparently told Nikkei Business Daily that they were looking to make imaging semiconductors with the plant.

patsu · Dec 23, 2010

Ha ha, since all major video streamers, WebKit, games, 3D Blu-ray, DTS, Dolby, DVR, DLNA client, RemotePlay, camera and sensor based tools run on the Cell chip now; may be they can sell Built-to-Order and upgradable HT gears (Everything done using Cell software) ? ^_^

corduroygt · Dec 23, 2010

A SPE is 7 million transistors without the LS. 8 of them makes 56M transistors, or about the same number of transistors intel uses for x86 decoding on a core processor. With all the sunk costs for SPE programming, maybe 8 SPU's could be Sony's version of x86, they'll put it in the chip to use their existing programming work and libraries and to still have all PS3 software compatible with PS4.

That's assuming they can make part of the cache (2M) reconfigurable as 8 local stores when the spu's are used, or have the option to use it as a part of a larger cache pool when the next gen enchanced PPU's are being predominantly used.

Silent_Buddha · Dec 23, 2010

patsu said:
Ha ha, since all major video streamers, WebKit, games, 3D Blu-ray, DTS, Dolby, DVR, DLNA client, RemotePlay, camera and sensor based tools run on the Cell chip now; may be they can sell Built-to-Order and upgradable HT gears (Everything done using Cell software) ? ^_^

They could, but it sounds like Sony plan on converting that Fab into CMOS chip production for HD camera's.

What other plants are there still making Cell chips?

Regards,
SB

patsu · Dec 23, 2010

Outsourced to TSMC and may be IBM.

jonabbey · Dec 23, 2010

corduroygt said:
A SPE is 7 million transistors without the LS. 8 of them makes 56M transistors, or about the same number of transistors intel uses for x86 decoding on a core processor. With all the sunk costs for SPE programming, maybe 8 SPU's could be Sony's version of x86, they'll put it in the chip to use their existing programming work and libraries and to still have all PS3 software compatible with PS4.

That's assuming they can make part of the cache (2M) reconfigurable as 8 local stores when the spu's are used, or have the option to use it as a part of a larger cache pool when the next gen enchanced PPU's are being predominantly used.

It would be interesting if that sort of thing was what IBM were looking at with their 'Cell-related work for next-gen consoles', but I don't know if they've got the ability to make cache behave identically to Cell's local store, given the extremely short cycle time the LS has.

Can't wait until we start hearing real details about the next gen.

patsu · Dec 23, 2010

Yap, should be hard to substitute Local Store for cache at the same performance level.

*If* Cell is involved nextgen, then IMHO, Sony has much to gain to spread its use as much as possible this gen.

corduroygt · Dec 23, 2010

LS latencies are right between L1 and L2 cache latencies of intel x86 processors. So it could be treated as a sort of L1 cache (that's huge) and L2 cache could be off chip in the form of edram.

Is there something that CELL can still do better than modern CPU/GPU

Xenus

Crossbar

Shifty Geezer

uber-Troll!

upnorthsox

Crossbar

Crossbar

Npl

Crossbar

liolio

Aquoiboniste

Crossbar

Shifty Geezer

uber-Troll!

ihamoitc2005

jonabbey

patsu

corduroygt

Silent_Buddha

patsu

jonabbey

patsu

corduroygt

Similar threads