Predict: The Next Generation Console Tech

Status
Not open for further replies.
Compared to the whole platform, sure. But when comparing just the CPU part, and ignoring that half or more of the Cell was occupied doing RSX work, I'm not all that sure.

Exactly. People underestimate the power of the Cell processor, especially its SPUs, simply because they are doing GPU jobs. It was very disappointing for the PS3 to have a relatively weak GPU. Imagine Cell + Xenos :oops: that would have allowed developers to do crazy physics, AI and animations... maybe a next gen wouldn't even be needed by 2013...

I mean, it is crazy, but the Cell is single-handedly rendering every post-process effect in the majority of the first-party Sony games: motion blur, depth of field, color grading, anti-aliasing... plus doing the lighting, the particle effects, vertex shading and basic polygons... plus the physics and animations, all of this running on SPUs :oops:

For those saying Llano is better than Cell: try programming and running all of these things on Llano, good luck with that ;)

But just imagine if the Cell had been free from any GPU stuff :oops: what could have happened to game physics, AI, animations, interaction with the environment? I am very excited for next gen...
 
I don't see how Cell would be able to compete with 4 Piledriver cores running at 3.8 GHz (rumored Trinity specs) in all but the most extreme corner cases, which have little relevance to gaming CPU workloads. Especially given that Cell wouldn't be needed to help out with graphics tasks in this theoretical PS4, where its high throughput might actually come in handy.
Yes, Cell did need more "handiwork" to squeeze the performance out of, but if someone did it then it delivered. On x86 you can have random spaghetti code run significantly faster than on SPEs, sure, but peak performance isn't all that good compared to a 6-SPE Cell. It might be somewhat better, but nothing stellar.

Also, unless AMD fixes their cache architecture to not be horrible and attaches the APU to decent memory bandwidth, that will be more wasted potential.
I've got a nice little heater for the room here as well, all classic PC, RAID 0 SSD drives etc. etc... I still have just as much fun playing PS3 games, and I still think that some of them can hold their own vs PC games. Not in the details or the funky high-end shader stuff, but the complete package of some of the AAA titles really does stand its ground.
That's mostly because there haven't really been any half-decent AAA titles targeted towards PC and actually using its power in years, so there really isn't anything good to compare against.
 
I don't know if this has or hasn't been posted before; some Trinity vs Bulldozer figures: :D

[Image: AMD Trinity vs. Bulldozer comparison chart]



What would this mean for the PS4?
 
Yes, Cell did need more "handiwork" to squeeze the performance out of, but if someone did it then it delivered. On x86 you can have random spaghetti code run significantly faster than on SPEs, sure, but peak performance isn't all that good compared to a 6-SPE Cell. It might be somewhat better, but nothing stellar.

Also, unless AMD fixes their cache architecture to not be horrible and attaches the APU to decent memory bandwidth, that will be more wasted potential. That's mostly because there haven't really been any half-decent AAA titles targeted towards PC and actually using its power in years, so there really isn't anything good to compare against.

Do you have any examples of code optimized for x86 that would suggest this? Not saying you're wrong, I would just like proof.
 
You mean this is the PS4's main CPU/GPU?

[Image: AMD Trinity slide]


I would like to see the faces of Naughty Dog programmers trying to achieve some new physics and animation tricks on this CPU compared to the super-fast SPUs :LOL: they would be shocked, granted...

You do know there is a GPU in that APU, right?
 
Do you have any examples of code optimized for x86 that would suggest this? Not saying you're wrong, I would just like proof.
Wasn't there some FXAA implementation that ran significantly faster on Cell than on i7? I couldn't find the source for it unfortunately.

Also, having 128 registers with fast local memory vs 16 (plus how many rename registers?) and the absolutely horrible L1 in BD isn't too good either. Yes, manual memory management isn't easy to do, but when you spend enough resources on it you can squeeze out quite a bit of performance.
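The "manual memory management" here is the SPU habit of staging data through the local store with DMA, usually double-buffered so the next transfer overlaps computation on the current chunk. A minimal sketch of that control flow in plain Python (illustrative only, not real SPU code; `process_stream` and the squaring "kernel" are made up for the example):

```python
# Illustrative double-buffering pattern: two alternating buffers, where the
# "fetch" of the next chunk is issued before computing on the current one.
def process_stream(data, chunk=4):
    """Process `data` in `chunk`-sized pieces using two alternating buffers."""
    buffers = [None, None]
    results = []
    nchunks = (len(data) + chunk - 1) // chunk
    # Prologue: kick off the first "DMA" before entering the loop.
    buffers[0] = data[0:chunk]
    for i in range(nchunks):
        cur = i % 2
        nxt = (i + 1) % 2
        # Start "fetching" the next chunk while we work on the current one.
        if i + 1 < nchunks:
            buffers[nxt] = data[(i + 1) * chunk:(i + 2) * chunk]
        # Compute on the current buffer (here the kernel just squares elements).
        results.extend(x * x for x in buffers[cur])
    return results
```

On real hardware the prefetch line would be an asynchronous DMA issued to the MFC, which is where the overlap and the performance come from; plain Python can only show the buffer rotation.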
 
That's been my prediction as well. IMO, it makes sense to go with an APU-like solution as well.

On 32nm, a 4-module Bulldozer is 1.2B transistors and 315 mm². The 4-core Llano is 1.45B transistors and 228 mm². Two years after their introduction (2013), I think you could manufacture an SoC of around 300 mm² and 2B transistors on a pretty mature 32nm process.

I'm expecting some sort of customized GPU consuming about 50-60% of the SoC, with performance around a 777x, and the rest of the die dedicated to the CPU and eDRAM.
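The ~300 mm² / 2B-transistor guess can be sanity-checked against Llano's own density; a rough back-of-the-envelope in Python (the 1.45B / 228 mm² inputs are the figures quoted above, everything else is just arithmetic):

```python
# Project the transistor budget of a ~300 mm^2 SoC from Llano's 32 nm density.
llano_transistors = 1.45e9
llano_area_mm2 = 228.0
density = llano_transistors / llano_area_mm2  # transistors per mm^2, ~6.4M
soc_area_mm2 = 300.0
projected = density * soc_area_mm2            # ~1.9B transistors
```

So a 300 mm² die at Llano-like density lands just under the 2B figure, which makes the estimate plausible.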

Why an APU & HD 6670? A quad-core Athlon II and HD 7850 have similar power dissipation and die size with really much better graphical performance.
 
Why an APU & HD 6670? A quad-core Athlon II and HD 7850 have similar power dissipation and die size with really much better graphical performance.

If the rumors of the PS4 devkit are true, that it's a 3850 Llano and a 6670, I think it's only to approximate the final performance of the APU. I think using a current-gen APU in the dev kit will give the devs early exposure to utilizing the APU architecture and the shared memory between the CPU and GPU. Unless there's some weird multiple-SKU setup with a low-budget set-top box that only utilizes the APU, I don't think we'll have an APU + discrete GPU.


I agree that even an Athlon + 7850 would probably offer much better graphics performance. But I don't think all decisions are based on what gives the best performance.

I could be wrong and this is just my opinion, but I think that using an APU with one shared memory pool really simplifies your design and manufacturing process. Sure, yields might be worse, but on a mature process the difference might not be that great.

With an APU, you have only one chip to test, package, and solder onto the PCB, versus two for a CPU and GPU. The PCB design is simpler, especially if there is only one memory pool. If you have dedicated system and video memory, chances are good that they are different types, so you have more components (and probably more suppliers) to manage. The cooling design is probably simpler. With an overall simpler design, manufacturing is likely more reliable, quicker and thus cheaper.

Going forward, it's probably cheaper and easier to shrink one chip instead of two, especially if they're on different processes to begin with. MS had to work pretty hard to combine their CPU and GPU into the XCGPU, so it's easier to start that way.

This thread is about predictions, so I'm predicting an APU for both consoles in the future. I would love to see a dedicated CPU with 2500K-level performance and a 7850-level GPU, and I don't think it's out of the realm of possibility; I just don't think it will happen. I hope I'm wrong.
 
Why an APU & HD 6670? A quad-core Athlon II and HD 7850 have similar power dissipation and die size with really much better graphical performance.

If true, and it's a big if with these rumors, the answer would probably be something like "because Sony likes to be goofy and not do things the normal way" :LOL:
 
Wasn't there some FXAA implementation that ran significantly faster on Cell than on i7? I couldn't find the source for it unfortunately.

In this comparison, do you recall if the bandwidth and latency differences between Cell<->RSX and i7<->PCIe<->GPU were factored in?
 
That's been my prediction as well. IMO, it makes sense to go with an APU-like solution as well.

On 32nm, a 4-module Bulldozer is 1.2B transistors and 315 mm². The 4-core Llano is 1.45B transistors and 228 mm². Two years after their introduction (2013), I think you could manufacture an SoC of around 300 mm² and 2B transistors on a pretty mature 32nm process.

I'm expecting some sort of customized GPU consuming about 50-60% of the SoC, with performance around a 777x, and the rest of the die dedicated to the CPU and eDRAM.

I believe two modules are enough:
two Bulldozer 3.0 modules, i.e. Steamroller, with a rather high frequency.

Suitable, embarrassingly parallel calculations can be off-loaded to the GCN GPU.
An APU even solves the link between GPU and CPU; being on-die, it can be ridiculously fast.
 
Wasn't there some FXAA implementation that ran significantly faster on Cell than on i7? I couldn't find the source for it unfortunately.

Indeed, but FXAA and other GPU-type operations which Cell is relatively good at wouldn't be amongst the CPU's workload in a console with a modern GPU, which could handle such tasks far more efficiently than either Cell or the best x86s.
 
Indeed, but FXAA and other GPU-type operations which Cell is relatively good at wouldn't be amongst the CPU's workload in a console with a modern GPU, which could handle such tasks far more efficiently than either Cell or the best x86s.
Wasn't that same FXAA implementation also drastically faster on Cell than on GPUs? I mean, why did they even bother running it on an i7 when PCs are generally equipped with far faster GPUs than consoles?

On the whole I agree that the GPU should be left for stuff the GPU is good for, but it still shows that Cell has the potential to pull off some insane stuff. Its only problem is that it takes a ton of effort to implement. Though I kind of wonder how much easier it is to implement this stuff on a GPU vs Cell. I would imagine it might actually be quite hard to do on a GPU, but the GPU simply has a metric ton of raw power, so it won't matter much if you waste it and implement stuff somewhat less efficiently there than on Cell.
 
Fully agree, but maybe a quad-core Athlon II and HD 5850 (thinking of a shrink to 28nm) could be even more interesting.
Let's see how those Bulldozer v2 / Piledriver cores fare before discarding them :)
BD had good performance in highly threaded workloads, better than Athlon II if memory serves right.
BD also has better SIMD than Athlon II or the cores in Llano.

If Sony goes with two pretty much off-the-shelf products, they may indeed be better off with a standard CPU+GPU set-up.

I was thinking of an APU + GPU a while ago for the Xbox (based on the double-GPU rumors), but I was expecting a pretty "tiny" SoC (170 mm² max).

Llano and Trinity are both ~230 mm²: not tiny, not cheap.
The high-end SKU also consumes a lot (100 W of TDP; it could be more in the real world).


Overall, if I put the schedule issue aside (more a risk than a fact, but a significant one), a reworked Kaveri makes more sense than a Llano + Turks set-up. The weak part of Kaveri is the DDR3 memory controller, but I guess it would be pretty straightforward for AMD to replace it.

The issue is AMD delivering on schedule. They have experience with TSMC, so it's possible, but say they hit a new bug in the CPU somewhere or something like that... they won't have time for a workaround, or they will have delays.

WRT the GPU, in the case of a standard CPU+GPU set-up I would discard anything with a 256-bit bus. So a reasonable bet would be a part akin to Cape Verde or a bit better. Cape Verde may top out at 12 CUs; a replacement may go up to 16. I would expect a CU count between 10 and 16.

EDIT

The more time passes, the more I believe that KB-smoker may be right on that; the 360 came together only a few months before release.
If there is truth to the rumor, Sony may have added this 6670 GPU to the dev kits to make up for the lack of raw power and bandwidth of Llano.
Kaveri with 2 GB of GDDR5 (as more is problematic) should match or outperform the said solution.
FLOPS comparisons are apples to oranges between the previous VLIW5 architecture and the new GCN scalar architecture; you may want to remove 20% from the peak FLOPS figure.
A8 + 6670 is 480 + 768, so 1248 GFLOPS; minus 20%, that's ~998 GFLOPS. Even if Kaveri ends up south of that by a handful of GFLOPS or even a hundred, it should do the trick.

I've got to research those Steamroller cores more; I don't know if they're BD-based like Piledriver or based on an older architecture.
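The peak-FLOPS arithmetic above can be reproduced from shader counts and clocks; a rough sketch in Python (the 400 SP / 600 MHz and 480 SP / 800 MHz figures are the commonly cited specs for the A8-3850's integrated GPU and the HD 6670, treated here as assumptions):

```python
# Theoretical single-precision peak: each shader does one fused
# multiply-add (2 ops) per cycle.
def peak_gflops(shaders, clock_ghz):
    return shaders * 2 * clock_ghz

llano_gpu = peak_gflops(400, 0.600)  # A8-3850 IGP: ~480 GFLOPS
hd6670 = peak_gflops(480, 0.800)     # Radeon HD 6670: ~768 GFLOPS
combined = llano_gpu + hd6670        # ~1248 GFLOPS
# Knock ~20% off the VLIW5 peak when comparing against GCN, as argued above.
gcn_equivalent = combined * 0.8      # ~998 GFLOPS
```

Note the units are GFLOPS throughout, so a Kaveri-class APU would need roughly 1 TFLOPS of GCN compute to match the rumored dev-kit combo.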
 
Let's see how those Bulldozer v2 / Piledriver cores fare before discarding them :)
BD had good performance in highly threaded workloads, better than Athlon II if memory serves right.
BD also has better SIMD than Athlon II or the cores in Llano.

If Sony goes with two pretty much off-the-shelf products, they may indeed be better off with a standard CPU+GPU set-up.

I was thinking of an APU + GPU a while ago for the Xbox (based on the double-GPU rumors), but I was expecting a pretty "tiny" SoC (170 mm² max).

Llano and Trinity are both ~230 mm²: not tiny, not cheap.
The high-end SKU also consumes a lot (100 W of TDP; it could be more in the real world).


Overall, if I put the schedule issue aside (more a risk than a fact, but a significant one), a reworked Kaveri makes more sense than a Llano + Turks set-up. The weak part of Kaveri is the DDR3 memory controller, but I guess it would be pretty straightforward for AMD to replace it.

The issue is AMD delivering on schedule. They have experience with TSMC, so it's possible, but say they hit a new bug in the CPU somewhere or something like that... they won't have time for a workaround, or they will have delays.

WRT the GPU, in the case of a standard CPU+GPU set-up I would discard anything with a 256-bit bus. So a reasonable bet would be a part akin to Cape Verde or a bit better. Cape Verde may top out at 12 CUs; a replacement may go up to 16. I would expect a CU count between 10 and 16.

EDIT

The more time passes, the more I believe that KB-smoker may be right on that; the 360 came together only a few months before release.
If there is truth to the rumor, Sony may have added this 6670 GPU to the dev kits to make up for the lack of raw power and bandwidth of Llano.
Kaveri with 2 GB of GDDR5 (as more is problematic) should match or outperform the said solution.
FLOPS comparisons are apples to oranges between the previous VLIW5 architecture and the new GCN scalar architecture; you may want to remove 20% from the peak FLOPS figure.
A8 + 6670 is 480 + 768, so 1248 GFLOPS; minus 20%, that's ~998 GFLOPS. Even if Kaveri ends up south of that by a handful of GFLOPS or even a hundred, it should do the trick.

I've got to research those Steamroller cores more; I don't know if they're BD-based like Piledriver or based on an older architecture.


Your viewpoints are perfect and I fully agree, but sometimes I wonder if the APU will in fact succeed. The idea of marrying CPU + GPU + memory controller etc. on the same die is excellent (it's much more than that, I know...), and perhaps I'm wrong here, but I have the impression that today and even next year APUs are still very immature and too ambitious to be much more efficient than the current paradigm of separate CPUs and GPUs.


Another interesting point you touched on is the chance of bugs (similar to the FDIV bug in the Intel Pentium, or even worse, damaging memory accesses etc.) in these new processors coming from AMD... and imagine if something similar happened in next-gen consoles on the production line?

I personally would prefer they used something that had already been tested and approved; customized CPUs and GPUs (putting something extra in the SIMD, die-shrinking to 28nm, disabling or removing PC-specific things etc.) for next-gen consoles could be very interesting... my "dream console" is something like a quad-core Athlon II + Radeon HD 5850 (on paper... almost 2.1 TFLOPS).
 