Cell/CPU architectures as a GPU (Frostbite Spinoff)

It's only a head/face, one that looks and reacts like a Heavy Rain character. Of course, I think ND's recent characters look better than that video (not including the eye animations).

lol if you say so. I don't agree at all, especially when it comes to things like shader quality, animation, and how smooth the head looks overall.

I don't want to get into a discussion about heads and how well they are rendered; I was just giving an example of how Sony poured a lot of Kool-Aid before the PS3 launched.

According to IBM, those theoretical max numbers are correct. Only one of those was proven to get anywhere near its theoretical max number. And, I agree. Theoretical peak numbers are almost always irrelevant. In this case, Cell's number was proven relevant.

Yes according to IBM, which got the results from a very controlled environment. A situation or an environment I don't think developers experience often, if at all.

Peak is not the same as sustained performance, so I still don't put much faith in such numbers.

However if you wish to believe pointless tech demos and theoretical speeds just because they favor your console of choice, go nuts. I'd rather live in reality. :p
 
To clarify one thing, Cell is not a fundamentally new idea. There were heterogeneous processors with DSP-like coprocessors done years earlier, and they were abandoned years earlier.
Difficulty in programming and getting decent utilization out of the exotic design seemed to be a major factor.

What? Really? Try telling that to TI and all the other DSP producers out there:

http://focus.ti.com/dsp/docs/dsphome.tsp?sectionId=46&DCMP=TIHomeTracking&HQS=Other+OT+home_p_dsp

Full disclosure: I was a TMS320 DSP developer in 1993-5 working on cryptographic devices.

Sure, it's been a while... but I personally have friends who still program firmware for systems that include DSPs (hi Todd@Phillips: http://www.healthcare.philips.com/main/products/resuscitation/products/aeds/index.wpd)

I should add though, what we worked on were not heterogeneous processors... more like separate, bus-connected processors, e.g. an 80286 microprocessor sending commands/data to my module, which runs on the TMS320C26 DSP.
 
To clarify one thing, Cell is not a fundamentally new idea. There were heterogeneous processors with DSP-like coprocessors done years earlier, and they were abandoned years earlier.
Difficulty in programming and getting decent utilization out of the exotic design seemed to be a major factor.
A lot of good designs and ideas failed as well, but primarily because they had no time to prove themselves. Having a secured market is quite possibly the only way to push new designs and bring them mainstream enough that they're no longer "exotic".
There are enough examples: x86 (and the even worse x87 FPU) didn't make it through the '90s because it was easy to program or efficient, but because IBM-compatibles depended on it; multicore didn't raise any interest whatsoever until a couple of years ago, and I'm sure it would've been called a ridiculously bad idea earlier.
These were for a different market, however, and did not have a major driver that allowed them to persist.
Sony's largess, a stable of developers, and solid branding allowed at least some money to go into making Cell an ongoing concern, while the earlier attempts had no cash flow or backing to last for long.
I would argue that it also helped that it had RSX to prevent the PS3 from being discarded for all those years where the Cell processor's magical programmability was not sufficient.
Cell persists, but it didn't get any kind of development (architecture-side). Just because it has some flaws in its first incarnation doesn't invalidate the whole concept. OpenMP is just a recent effort to push such architectures into use, and on another note there still isn't a single portable general-purpose language that supports SMP directly (C++0x should be the first; I don't count emulated languages as portable).
I'm not sure, Sony's money aside, that the lesson learned is necessarily different.
Cell was a throwback to earlier failures when it was released, and while elements of the design may show up in the future, the heterogeneous model with those quirky SPEs has not shown itself to be ideal.
Quirky SPUs, OK... but are they fundamentally broken, such that you couldn't create a backwards-compatible SPU with better characteristics? And I hope you're writing this from a PC without a GPU, since you don't believe in heterogeneous models.
I had initially predicted that the PS3 would have the most technically advanced AAA 1st party games, and this might be arguably the case for a period of time. It is a few years later than I had thought.
How much of those gymnastics were done out of necessity to achieve acceptable results, I cannot say. I had expected more non-graphical advances with Cell, but with everyone spending 4-5 SPEs on graphics or stuck with the limits of console memory, that lead doesn't seem to be all that great.
None of which you can attribute to Cell; a better GPU and more memory would've helped a lot.
 
What? Really? Try telling that to TI and all the other DSP producers out there:

http://focus.ti.com/dsp/docs/dsphome.tsp?sectionId=46&DCMP=TIHomeTracking&HQS=Other+OT+home_p_dsp
Are those DSP coprocessors in the same memory space as the CPU?
I'll concede that my earlier point was mistaken. I wasn't thinking of SOC solutions in mobile.

I should add though, what we worked on were not heterogeneous processors... more like separate, bus-connected processors, e.g. an 80286 microprocessor sending commands/data to my module, which runs on the TMS320C26 DSP.
That may not be a fundamental difference, or at least no more than how a multi-core CPU is different from a multi-socket board, besides location.

Quirky SPUs, OK... but are they fundamentally broken, such that you couldn't create a backwards-compatible SPU with better characteristics? And I hope you're writing this from a PC without a GPU, since you don't believe in heterogeneous models.
Backwards compatible seems difficult.
The SPE model works on a private non-coherent space with instructions and data residing in the local store, with DMA to get data from the global memory in batches.
Even GPUs have instruction caches, and their local stores tend to be hundreds of MiB to GiB in size. In practice, these memories are not quite as isolated as they are with the SPE.
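To make that batch-DMA model concrete, here's a minimal SPU-side sketch of the pattern, assuming the standard spu_mfcio.h intrinsics from the Cell SDK; the chunk size, effective address and "work" are placeholders for illustration, not code from any real engine:

```c
/* Minimal sketch of the SPE batch-DMA model described above.
   Assumes the Cell SDK's spu_mfcio.h intrinsics; sizes and the
   effective address are illustrative placeholders only. */
#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK 16384  /* one DMA batch; 16 KiB is the per-transfer maximum */

static char buf[CHUNK] __attribute__((aligned(128)));  /* lives in local store */

void process_chunk(uint64_t ea)   /* ea = effective address in main RAM */
{
    const unsigned tag = 1;

    /* Pull a batch of data from global memory into the local store. */
    mfc_get(buf, ea, CHUNK, tag, 0, 0);
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();    /* block until that DMA completes */

    /* Work entirely out of the local store. */
    for (int i = 0; i < CHUNK; ++i)
        buf[i] ^= 0xFF;           /* stand-in for real work */

    /* Push the results back to main RAM and wait again. */
    mfc_put(buf, ea, CHUNK, tag, 0, 0);
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();
}
```

In practice you would double-buffer (fetch chunk N+1 while processing chunk N), which is exactly the kind of hand-scheduling that makes the model efficient but awkward compared to a cached, coherent memory space.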

None of which you can attribute to Cell; a better GPU and more memory would've helped a lot.
That's more of a point about the idea that programmability overcomes all.
In this case, by the time Cell's programmability begins to make a difference, the realities of the platform and the age of the design kneecap it.
 
That's more of a point about the idea that programmability overcomes all.
In this case, by the time Cell's programmability begins to make a difference, the realities of the platform and the age of the design kneecap it.

I don't quite agree about the age of the design kneecapping it at all. I'm not clear about what realities of the platform you are talking about?
 
Backwards compatible seems difficult.
The SPE model works on a private non-coherent space with instructions and data residing in the local store, with DMA to get data from the global memory in batches.
Increasing the LS should help with fitting code, and adding scatter/gather should allow random access to the whole system RAM. Quite possibly you could allow a configurable part of the LS to act as a cache; this can be done in software right now.
Scatter/gather could then look up (optionally) in the LS and then the normal cache hierarchy.
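For what it's worth, here is a toy direct-mapped version of that "part of the LS acts as a cache" idea, using the same spu_mfcio.h intrinsics as in the earlier sketch; the line size, line count and read-only behaviour are arbitrary choices for illustration, and the real Cell SDK software-cache library is considerably more elaborate:

```c
/* Toy direct-mapped, read-only software cache held in the local store.
   Line size, line count and the lack of write-back are illustrative
   simplifications; this is not the actual SDK software cache. */
#include <spu_mfcio.h>
#include <stdint.h>

#define LINE   128                      /* cache line size in bytes        */
#define NLINES 64                       /* 8 KiB of LS given over to cache */

static char     lines[NLINES][LINE] __attribute__((aligned(128)));
static uint64_t tags[NLINES];           /* effective address of each line  */
static int      valid[NLINES];

/* Return an LS pointer to the byte at effective address 'ea',
   DMA-ing the containing line in from main RAM on a miss. */
static void *sw_cache_read(uint64_t ea)
{
    uint64_t line_ea = ea & ~(uint64_t)(LINE - 1);
    unsigned idx     = (unsigned)(line_ea / LINE) % NLINES;

    if (!valid[idx] || tags[idx] != line_ea) {   /* miss: fetch the line */
        mfc_get(lines[idx], line_ea, LINE, 0, 0, 0);
        mfc_write_tag_mask(1 << 0);
        mfc_read_tag_status_all();
        tags[idx]  = line_ea;
        valid[idx] = 1;
    }
    return &lines[idx][ea - line_ea];            /* hit: plain LS access */
}
```

Hardware scatter/gather would essentially be doing that miss path per element without software having to stall for it, which is why it is attractive, and also why it is not free.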
Even GPUs have instruction caches, and their local stores tend to be hundreds of MiB to GiB in size. In practice, these memories are not quite as isolated as they are with the SPE.
Isn't the part of memory that they can access randomly limited to previously defined blocks (textures)?
Can they actually access their own code?
AFAIK no, and thus you are still limited to defined areas of memory which are explicitly set up by the host CPU.
SPUs (at least with the aforementioned extensions) have no disadvantage compared to that, as they can manage themselves. The only issue might be that you run out of space for code; then you either need to stream it in, and if that's not possible then maybe it's just something that doesn't fit well on the SPU. I doubt anyone argues that it's a good idea to run everything on SPUs, but if you'd fix accessing main RAM then they would be very versatile and efficient.
That's more of a point about the idea that programmability overcomes all.
In this case, by the time Cell's programmability begins to make a difference, the realities of the platform and the age of the design kneecap it.
Then that's just a point that you need specialized hardware for different workloads. A good point for Cell, don't you think?
 
How much of those gymnastics were done out of necessity to achieve acceptable results, I cannot say. I had expected more non-graphical advances with Cell, but with everyone spending 4-5 SPEs on graphics or stuck with the limits of console memory, that lead doesn't seem to be all that great.
I keep hearing how this and that were done out of necessity. I would like to know what isn't done out of necessity to improve things in the gaming world. I'm guessing, not much.

BTW, it's not like they are spending the complete time of 4 or 5 SPUs on graphics.

lol if you say so. I don't agree at all, especially when it comes to things like shader quality, animation, and how smooth the head looks overall.

I don't want to get into a discussion about heads and how well they are rendered; I was just giving an example of how Sony poured a lot of Kool-Aid before the PS3 launched.
Again, it's just a head/face. In other words, the system's resources are only focused on a head. You are saying this was/is not possible on the PS3. I'm saying you seem to be completely incorrect and the face/head rendering from the Heavy Rain crew proves that.

It seems Sony didn't pour Kool-Aid before the PS3 launched. It seems that it was acid. :)


Yes according to IBM, which got the results from a very controlled environment. A situation or an environment I don't think developers experience often, if at all.

Peak is not the same as sustained performance, so I still don't put much faith in such numbers.
Yes, it was a controlled environment for both CPUs. It's the same test from the same manufacturer on two products. It doesn't get more fairly comparable than that. Yes, sustained performance is another test. However, that points to a rather high number as well. I guess that means we should ignore that too.

However if you wish to believe pointless tech demos and theoretical speeds just because they favor your console of choice, go nuts. I'd rather live in reality. :p
Actually, I like as much data as possible. That's the best way to build a picture. I love reality. It's the denial in others that moves me to action. If you want to continue to ignore most of the numbers that don't favor your CPU of choice, go nuts. :p
 
I think the most telling aspect of all of this is how Cell itself changed the mindset of multiplat developers.

Could have sworn I read an article one time about a developer who was struggling with the PS3 to get parity with the 360 version of a game. They started to use the SPUs, and during that time they figured they could do the same thing with the 360 and in turn increased the performance of the entire game. Because they had to try and optimise the PS3 version, they found they could optimise the 360 version.


Bizarre Creations' Blur?

r.i.p. bizarre T_T
 
I keep hearing how this and that were done out of necessity. I would like to know what isn't done out of necessity to improve things in the gaming world. I'm guessing, not much.

BTW, it's not like they are spending the complete time of 4 or 5 SPUs on graphics.

Rewriting code so effects run on Cell instead of the GPU is out of necessity. Using Cell for graphics is necessary to keep pace with the competition which has a stronger GPU. That is the point he's trying to make that you're missing.

Again, it's just a head/face. In other words, the system's resources are only focused on a head. You are saying this was/is not possible on the PS3. I'm saying you seem to be completely incorrect and the face/head rendering from the Heavy Rain crew proves that.

I played Heavy Rain, loved the demo, but you're wrong. In terms of detail and animation, there is no game out today that has character models that look as good as that tech demo.

It seems Sony didn't pour Kool-Aid before the PS3 launched. It seems that it was acid. :)

This doesn't even make any sense. And you're delusional if you think Sony (and their competitors) didn't lay the hype on thick and come up short in the end. IMO Sony played the BS card more than anyone else.

Yes, it was a controlled environment for both CPUs. It's the same test from the same manufacturer on two products. It doesn't get more fairly comparable than that. Yes, sustained performance is another test. However, that points to a rather high number as well. I guess that means we should ignore that too.

Really? What's the rather high number for sustained performance?

I usually ignore such PR numbers because they usually don't directly apply to game applications. You can get performance stats on how fast Cell burns through process X but that stat could be entirely different than process Y that is used in the game engine. Meaning it's pointless, only used by console warriors who care to believe what they are fed.

Actually, I like as much data as possible. That's the best way to build a picture. I love reality. It's the denial in others that moves me to action. If you want to continue to ignore most of the numbers that don't favor your CPU of choice, go nuts. :p

That's the thing, I don't have a CPU of choice like you. I think it's silly and a waste of time. Try developing the comprehension to apply this data you hold so dear, maybe then your points will have some merit.
 
Since there seems to be an argument about whether Cell as a GPU is the right way to go (architecture-wise), JC's newest interview just popped in. He seems to think so... well, sort of :LOL:

"The PS3 is still far and away better than anything else that’s ever been made... except maybe the 360," Carmack told games. "It’s a great time to be a developer," he added. "It’s not like working with the Sega Saturn or the PS2, where these are really kind of quirky, cranky, architectures that are not, well, architected, I would say."

“Because I do favour the 360, it doesn’t mean I have anything all that negative to say about the PS3,” he said.

http://www.nowgamer.com/news/5336/carmack-ps3-better-than-anything-except-360
 
I don't quite agree about the age of the design kneecapping it at all. I'm not clear about what realities of the platform you are talking about?
The split RAM pool, the overall amount of RAM, the late implementation of the more advanced streaming engines, the slow improvement of OS RAM usage.
There's an asymmetry in the link bandwidth between RSX and Cell that I am curious about, as I've not seen an explanation as to why.

Then there is the reality of increasing costs and the demand for parity in multiplatform development. Any outsize advantages on a given platform tend to be whittled down.

Then there's the age. Technology has moved on from the point the consoles were set in stone.
There are latency numbers for the PC in the DICE presentation for certain operations that are orders of magnitude lower than either console at default.

Increasing the LS should help with fitting code, and adding scatter/gather should allow random access to the whole system RAM. Quite possibly you could allow a configurable part of the LS to act as a cache; this can be done in software right now.
Increased LS size would mean little for backwards compatibility, since old code wouldn't know to address the additional capacity.
Sticking cache coherence to the LS is easier said than done. It's a pretty fundamental part of the memory pipeline, and it would be noticeable from a latency standpoint.
Tightly scheduled SPE code may run on the next-gen SPE, but it would not be faster.
I would think one improvement would be to go to a Harvard architecture internally, but since the LS is explicit in software and freely mixes code and data, I'm not sure what complications may arise in doing so.

Isn't the part of memory that they can access randomly limited to previously defined blocks (textures)?
The texture units are capable of standard linear addressing.

Can they actually access their own code?
They can cache their own code freely. In standard usage, they don't write to their code, though I am not aware of a fundamental limitation on doing so.
 
Increased LS size would mean little for backwards compatibility, since old code wouldn't know to address the additional capacity.
Sticking cache coherence to the LS is easier said than done. It's a pretty fundamental part of the memory pipeline, and it would be noticeable from a latency standpoint.
Tightly scheduled SPE code may run on the next-gen SPE, but it would not be faster.
Old code wouldn't use the cache, so it wouldn't be a problem to run it at least as fast as before; I don't see why this is a problem for BC. Unless you expect code for the new SPU to run on the old ones as well??

Yeah, there would be drawbacks; adding scatter/gather would require interlocking every instruction.
The normal loads/stores would still be restricted to the LS, so as long as you have tasks with a largely private working set it should be fine and not have to deal with high latencies.
The texture units are capable of standard linear addressing.
Within their max resolution AFAIK (4k*4k), thus it needs setup from the host.
They can cache their own code freely. In standard usage, they don't write to their code, though I am not aware of a fundamental limitation on doing so.
Won't be used because of the lack of a defined ISA anyway.
 
The split RAM pool, the overall amount of RAM, the late implementation of the more advanced streaming engines, the slow improvement of OS RAM usage

Of these, I agree with the last one, but I don't feel the others quite so much. They are trade-offs that primarily influence 'ease of development', particularly for multi-platform development, and even more particularly, porting. While I do agree that the reality is that being friendly to developers and friendly to multi-platform development and ports has become more important (this is something that Sony grossly underestimated, and that they feel the pain is borne out strongly by their sea-change for the NGP), the overall amount of RAM is not different, the split RAM pool has bandwidth benefits in various areas, and the implementation of more advanced streaming engines wasn't so much late, as just not implemented because of cross-platform reasons more than anything else (though here too, Microsoft was smarter in providing an easy SDK solution that automatically used implicit streaming/caching to HDD).

I am slightly suspicious however that the way Sony has implemented its harddrive encryption has crippled its harddrive speed somewhat more than would have been desirable, though I haven't heard anyone talk about this in particular, and I don't know if anyone has done comparative speed-tests of the same harddrive in the PS3 and in the PC.

There's an asymmetry in the link bandwidth between RSX and Cell that I am curious about, as I've not seen an explanation as to why.

Which one are you referring to? Because this has been discussed quite extensively. There are only two standout asymmetrical parts: the 4 GB/s from Cell straight to the framebuffer (typically used by the Blu-ray player type software), and the 16 MB/s connection for Cell-initiated reading from GDDR.

Then there is the reality of increasing costs and the demand for parity in multiplatform development. Any outsize advantages on a given platform tend to be whittled down.

Fully agree with this one, as also mentioned partly above.

Then there's the age. Technology has moved on from the point the consoles were set in stone.
There are latency numbers for the PC in the DICE presentation for certain operations that are orders of magnitude lower than either console at default.

Correct, but latency needn't be that important either, as long as it is low enough; throughput matters more. In the rest of your story, you seem to be glossing relatively easily over the various very big issues DICE had with DirectX 11, which they have contacted Microsoft about and tried to solve?

As for SPE backward compatibility, software emulation may be easier than expected, because each SPE is a fairly independent unit with a limited set of features and very predictable behaviour. It may end up being quite less of a challenge than EDRAM posed for PS2/PS3 backward compatibility.
 
Rewriting code so effects run on Cell instead of the GPU is out of necessity. Using Cell for graphics is necessary to keep pace with the competition which has a stronger GPU. That is the point he's trying to make that you're missing.
I didn't miss it. Most things are only done out of necessity. "Necessity is the mother of invention." The illusion of necessity can be just as effective. nAo and DeanoC believe the GPUs are similar. When people come up with test cases of superior Xenos performance, DeanoC or nAo can usually tell them how to get similar performance from RSX (with sample code). There is real information to pull from that.

I played Heavy Rain, loved the demo, but you're wrong. In terms of detail and animation, there is no game out today that has character models that look as good as that tech demo.
What does the retail game (Heavy Rain) have to do with the devs' head/facial rendering on the PS3? This was done before the retail game released. It's on Youtube. I guess you just weren't aware of its existence. I will try to find it for you.

http://www.youtube.com/watch?v=HpJOjvXXZQY
This doesn't even make any sense. And you're delusional if you think Sony (and their competitors) didn't lay the hype on thick and come up short in the end. IMO Sony played the BS card more than anyone else.
It makes at least as much sense as your Kool-Aid statement. To break it down, some people are acting like they were burned/attacked by Sony's PS3 capability announcements. It seems like some people just couldn't wait to line up against it, after that.


Really? What's the rather high number for sustained performance?

I usually ignore such PR numbers because they usually don't directly apply to game applications. You can get performance stats on how fast Cell burns through process X but that stat could be entirely different than process Y that is used in the game engine. Meaning it's pointless, only used by console warriors who care to believe what they are fed.
So nothing besides what can't be told to us, numbers-wise, will do, huh? Interesting stance.

That's the thing, I don't have a CPU of choice like you. I think it's silly and a waste of time. Try developing the comprehension to apply this data you hold so dear, maybe then your points will have some merit.
You don't have a CPU of choice? I guess that means you have a CPU you wouldn't choose? Or, maybe you would prefer for a choice not to be made, no matter what.
 
There's an asymmetry in the link bandwidth between RSX and Cell that I am curious about, as I've not seen an explanation as to why.

One explanation I heard was that the RSX part was still based off an AGP desktop computer design. This makes sense given the limited time Nvidia had to customise the part. They didn't have native PCI-E designs with the G70/G71?

Anyway to other discussions:

It's one thing to look at the primacy of a CPU in a design set in stone in 2006 and compare it to other designs of that era. It is one thing to say that the Cell processor is relevant to that era and consoles from that era, but it is another thing entirely to say that the design is even competitive in a console designed with technology from 2012 or 2013. There's another strong architectural paradigm shift which has happened since the release of the PS3, and that's GPGPU processing. If Microsoft and the industry are true to form, they will release DX12 at the time Windows 8 releases in late 2012.

How much would the Cell concept have evolved by the time DX12 GPUs are released? This is the same development gulf that other architectures faced against the huge investments by Intel and the wider software industry in x86. It is extremely likely, if you compare the development put into a next-generation Cell and the GPU architectures of Nvidia and AMD, that the former will look like it has been standing still relative to the latter. Is anyone really willing to make the bet that spending 500M transistors on SPUs is more performance- and power-efficient than spending the same 500M transistors on a more powerful GPU, when modern GPUs simply don't have the weaknesses the RSX had, which the Cell processor seems to spend the majority of its runtime covering for?

Is the Cell processor just an architecture which had the right backing by the right people, with no future in a world where the competing architectures have 10-100x the development resources invested?
 
Is the Cell processor just an architecture which had the right backing by the right people, with no future in a world where the competing architectures have 10-100x the development resources invested?

Certainly possible, but wouldn't you agree that when you are looking at games exclusively, there are a lot more games on PS3 that use Cell extensively than there are games using GPGPU even remotely extensively on PC? Because it sure looks that way to me ...
 
Old code wouldn't use the cache, so it wouldn't be a problem to run it at least as fast as before; I don't see why this is a problem for BC. Unless you expect code for the new SPU to run on the old ones as well??
Is the new SPU using a hardware cache? If so, then it will not be as fast as the non-coherent local store. If it is using a software cache, it will be used as much as software caching is currently used.

Within their max resolution AFAIK (4k*4k), thus it needs setup from the host.
Modern GPU ISAs have general memory instructions that do not have a 2D layout and can access RAM in a more conventional fashion.

Won't be used because of the lack of a defined ISA anyway.
For Nvidia, the internal ISA is not documented publicly. AMD does document the ISA.

the overall amount of RAM is not different, the split RAM pool has bandwidth benefits in various areas, and the implementation of more advanced streaming engines wasn't so much late
The overall amount for both is very small by contemporary standards. Splitting it heightens capacity concerns. A modern console would have more memory, and so would relax this significantly.

I am slightly suspicious however that the way Sony has implemented its harddrive encryption has crippled its harddrive speed somewhat more than would have been desirable, though I haven't heard anyone talk about this in particular, and I don't know if anyone has done comparative speed-tests of the same harddrive in the PS3 and in the PC.
I'm not up on what Sony's implementation is, only that it's a significant bottleneck even when the drive is replaced with an SSD, outside of cases where there are a large number of tiny files.

Which one are you referring to? Because this has been discussed quite extensively. There are only two standout asymmetrical parts: the 4 GB/s from Cell straight to the framebuffer (typically used by the Blu-ray player type software), and the 16 MB/s connection for Cell-initiated reading from GDDR.
The 16 MB/s one.

Correct, but latency needn't be that important either, as long as it is low enough; throughput matters more. In the rest of your story, you seem to be glossing relatively easily over the various very big issues DICE had with DirectX 11, which they have contacted Microsoft about and tried to solve?
My concern was that technology has progressed so long in this overextended generation that modern hardware is able to best obsolete tech by orders of magnitude. If there are DX11 problems, it might not affect a console built with modern hardware, since access would probably be lower-level.

As for SPE backward compatibility, software emulation may be easier than expected, because each SPE is a fairly independent unit with a limited set of features and very predictable behaviour. It may end up being quite less of a challenge than EDRAM posed for PS2/PS3 backward compatibility.
Emulation may be easier than it was for the PS2, but decent software emulation has always required a very large jump in straight-line speed over the architecture being emulated. Modern hardware may not supply significantly higher clocks. It might be helped by hardware-assist.
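A plain interpreter loop shows why: every guest instruction costs a fetch, a decode and a dispatch on the host before any useful work happens. This is a generic sketch with an invented toy ISA, not a description of any actual SPU emulator:

```c
/* Generic interpreter skeleton, only to illustrate why software emulation
   wants a large straight-line speed advantage over the guest.
   The opcodes and encoding are invented for this example; they are not
   the SPU ISA. */
#include <stdint.h>

enum { OP_ADD, OP_LOAD, OP_STORE, OP_HALT };

typedef struct {
    uint32_t regs[128];
    uint8_t  ls[256 * 1024];   /* emulated local store */
    uint32_t pc;
} GuestState;

void run(GuestState *g, const uint32_t *code)
{
    for (;;) {
        uint32_t insn = code[g->pc++];                  /* fetch    */
        uint32_t op = insn >> 24;                       /* decode   */
        uint32_t a = (insn >> 16) & 0x7F;
        uint32_t b = (insn >> 8)  & 0x7F;
        uint32_t c =  insn        & 0x7F;

        switch (op) {                                   /* dispatch */
        case OP_ADD:   g->regs[a] = g->regs[b] + g->regs[c];             break;
        case OP_LOAD:  g->regs[a] = g->ls[g->regs[b] & 0x3FFFF];         break;
        case OP_STORE: g->ls[g->regs[b] & 0x3FFFF] = (uint8_t)g->regs[a]; break;
        case OP_HALT:  return;
        }
        /* A dozen or more host instructions per guest instruction is typical
           here, which is why interpreting a 3.2 GHz SPU in real time is hard
           without a JIT or hardware assist. */
    }
}
```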
 
Certainly possible, but wouldn't you agree that when you are looking at games exclusively, there are a lot more games on PS3 that use Cell extensively than there are games using GPGPU even remotely extensively on PC? Because it sure looks that way to me ...

Well, considering Cell is the CPU of the PS3, its use is about as common as a game using the x86 CPU on the PC. With respect to GPGPU, the hindrance is the latency between the CPU and GPU on the PC, not the capabilities of the GPU, with respect to real-time applications like games. When comparing potential console architectures this disadvantage evaporates.
 
Well, considering Cell is the CPU of the PS3, its use is about as common as a game using the x86 CPU on the PC.

Wouldn't you agree that's kind of a pointless remark?

With respect to GPGPU, the hindrance is the latency between the CPU and GPU on the PC, not the capabilities of the GPU, with respect to real-time applications like games.

Which is why GPGPU is used in a similar way to Cell + GPU. But the way Cell and the GPU integrate in the PS3 is more extensive as yet.

When comparing potential console architectures this disadvantage evaporates.

I am not so sure that is even the original point of this thread, which is how useful Cell is for GPU functions right now. If you want to discuss future applications/designs, I'm thinking we'll be talking more about if it makes sense to have SPE like cores in the future CPU/GPU hybrid, because that is what I would consider to be the most likely candidate. Then the question is how to divide fixed function, programmable simple shaders, programmable complex shaders (more like SPE) and what kind of memory interface would bind them.

But that's a topic for the next-gen thread. I am thinking that perhaps something not so dissimilar from the EIB could work, in that perhaps all different components could tap into the data-streams in similar ways, simple shaders in groups at the same time in parallel, and complex shaders in smaller numbers or individually, with the option to have the data circulate to make multiple passes, with the ideal case where you could do several full circles within 16ms.
 
Wouldn't you agree that's kind of a pointless remark?

Which is why GPGPU is used in a similar way to Cell + GPU. But the way Cell and the GPU integrate in the PS3 is more extensive as yet.

It's a question of circumstance, i.e. CPUs doing GPU work vs GPUs doing CPU work; the latter isn't as architecturally favourable in PCs and consoles as the former due to the overall hardware architecture, therefore we wait for more favourable circumstances to show off this potential, such as AMD Fusion for instance. Arguably you could almost say that if it weren't for the lack of certain fixed-function GPU hardware, the PS3 had the first CGPU processor. So it seems we step more into semantics than technical discussion.


I am not so sure that is even the original point of this thread, which is how useful Cell is for GPU functions right now. If you want to discuss future applications/designs, I'm thinking we'll be talking more about if it makes sense to have SPE like cores in the future CPU/GPU hybrid, because that is what I would consider to be the most likely candidate. Then the question is how to divide fixed function, programmable simple shaders, programmable complex shaders (more like SPE) and what kind of memory interface would bind them.

Forgive me if I'm wrong here, but what we really seem to need in general is some sort of primer on things like next-generation silicon performance and the general characteristics of silicon design, to make the best use of it in the context of a games console. As factors like static leakage become more important, what kind of design considerations is that going to force, for instance? At the moment I'm coming up with a blank on this. Perhaps static leakage shifts the balance towards higher utilization, and the balance between the SPU and GPU models favours higher utilization and the best power characteristics when performing the most common tasks. So even if the GPU model sucks totally at Cell tasks, so long as it is more efficient more of the time it may win out. However the opposite may be true as well... Argh! :devilish:
 