Return of Cell for RayTracing? *split*

Posted June 20, 2018


Leaks from some developers working on the first round of PS5 games say their Sony-supplied PC dev kits have around 20 TF. Those are not true console dev kits, but mock PC dev kits made to give an early approximation of the real deal.
 

The same post makes some weird statements.

First, that these mock PC dev kits usually have >2x the performance of the consoles' final hardware. It's the first time I've heard of such a thing. Dev kits usually do have 2x the RAM because of debugging, so perhaps he mixed things up?

The second is that the XboneX PC dev kits had 16 cores and a 14 TFLOPs GPU. I highly doubt this, because such a mock PC dev kit would have come out in mid-2016 at best, and at that point AMD only had either a 5.8 TFLOPs Polaris 10 (RX 480) or an 8.6 TFLOPs Fiji XT (Fury X). On the CPU side, they'd need a Xeon E5 v3/v4 CPU, since Zen solutions weren't available at the time.
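As a sanity check on those numbers: peak FP32 throughput for a GCN card is just 2 FMA ops x shader count x clock. A throwaway calculation with the public shader counts and boost clocks of the parts mentioned (the last line only shows what a hypothetical single 14 TF GCN part would have needed at Fury X clocks; it is illustrative, not a real product):

Code:
#include <stdio.h>

/* Peak FP32 TFLOPS = 2 ops per FMA * shader count * clock (GHz) / 1000. */
static double tflops(int shaders, double ghz)
{
    return 2.0 * shaders * ghz / 1000.0;
}

int main(void)
{
    printf("Polaris 10 / RX 480 : %.1f TF\n", tflops(2304, 1.266)); /* ~5.8 TF */
    printf("Fiji XT / Fury X    : %.1f TF\n", tflops(4096, 1.050)); /* ~8.6 TF */
    /* Shader count a hypothetical 14 TF part would need at Fury X clocks: */
    printf("Shaders for 14 TF @ 1.05 GHz: %.0f\n", 14000.0 / (2.0 * 1.05)); /* ~6667 */
    return 0;
}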

Is that user a reliable source himself? He mentions dev rumors but provides no links for corroboration. That post is also over two months old.
 
First, that these mock PC dev kits usually have >2x the performance of the consoles' final hardware. It's the first time I've heard of such a thing. Dev kits usually do have 2x the RAM because of debugging, so perhaps he mixed things up?

There are sound reasons why early devkit hardware may be significantly more powerful, by certain metrics, than the target production hardware.

Your devkit needs to run early dev code, on early APIs, at the performance level of your target final product, and that final product is based on a chipset architecture that likely includes new technologies which could enhance bus performance, reduce bandwidth requirements and/or accelerate other features. To approximate that without the benefit of those technologies, you sometimes need to go crazy high.

This was quite common when I worked in the aerospace industry. I remember we used SPARC workstations to emulate target processors in a variety of aerospace applications.
 
Posted June 20, 2018


Leaks from some developers working on the first round of PS5 games say their Sony-supplied PC dev kits have around 20 TF. Those are not true console dev kits, but mock PC dev kits made to give an early approximation of the real deal.
So a Vega 56 Crossfire rig? Why not just a Vega 64 for better single-GPU scaling, since the "supposed" final spec is way less than 13.7 TF anyway?
An 8-10 TF PS5 is gonna be highly disappointing; I might skip it at launch if that's the case, although I doubt that leak's authenticity.
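For reference, here's where those figures come from, using the public shader counts and boost clocks for the Vega parts; the two-Vega-56 pairing is just the rumoured-devkit reading above, not a confirmed spec:

Code:
#include <stdio.h>

/* Peak FP32 TFLOPS = 2 ops per FMA * shader count * clock (GHz) / 1000. */
static double tflops(int shaders, double ghz) { return 2.0 * shaders * ghz / 1000.0; }

int main(void)
{
    double vega56    = tflops(3584, 1.471); /* Vega 56, reference boost    */
    double vega64    = tflops(4096, 1.546); /* Vega 64, air-cooled boost   */
    double vega64_lc = tflops(4096, 1.677); /* Vega 64 Liquid Cooled boost */

    printf("2x Vega 56 (Crossfire): %.1f TF\n", 2.0 * vega56); /* ~21 TF, i.e. the ~20 TF rumour */
    printf("1x Vega 64            : %.1f TF\n", vega64);       /* ~12.7 TF */
    printf("1x Vega 64 LC         : %.1f TF\n", vega64_lc);    /* ~13.7 TF */
    return 0;
}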
 
Your devkit needs to run early dev code, on early APIs, at the performance level of your target final product, and that final product is based on a chipset architecture that likely includes new technologies which could enhance bus performance, reduce bandwidth requirements and/or accelerate other features. To approximate that without the benefit of those technologies, you sometimes need to go crazy high.

There are sound reasons, but is that usual for high-performance console GPUs when the architecture is actually similar to the final one (Vega <-> Navi)?
I remember reports on Wii U PC devkits having HD4850 graphics cards, but that console ended up being a low-performance device.

There's also the fact that there's no AMD GPU capable of reaching 20 TFLOPs or anything near that, nor will there be until late 2018 with the 7nm Vega 20, so what could be in said devkit? Two Vega 56s that will scale terribly, when a liquid-cooled Vega 64 would have better performance?
 
I already did the math on that one. A 20 TF Cell ubercomputer would fit onto the same silicon as an RTX 2080 (going by transistor counts). Some still try to resist the One True Faith, but they'll come around when Cell2 (codenamed "Neutrophil") is used in PS5, the world's first real-time raytracing console, using techniques inspired by the world's greatest ever computer, the Amiga, and how it handled 3D in real time (postage-stamp-sized resolution...).

Edit: There are plenty of funny numbers to be had with this idea. 80 Cell BBEs would fit on the silicon. Assuming no trouble connecting them up, you'd have 80x the attained ~200 GB/s SPE data access across the EIB, so 16 TB/s of internal bandwidth. 160 MB of SRAM local storage on the SPEs and another 40 MB for the PPEs' caches.
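Checking the arithmetic, taking the 80-chip count above as given and the per-chip Cell BE figures (eight SPEs with 256 KB of local store each, roughly 200 GB/s sustained on the EIB, 512 KB of PPE L2), the totals do come out as stated:

Code:
#include <stdio.h>

int main(void)
{
    /* Per-chip Cell BE figures; the 80-chip count is the post's transistor-budget estimate. */
    const int    cells     = 80;
    const double eib_gb_s  = 200.0;              /* ~sustained EIB bandwidth per chip       */
    const double ls_mb     = 8 * 256.0 / 1024.0; /* 8 SPEs x 256 KB local store = 2 MB/chip */
    const double ppe_l2_mb = 512.0 / 1024.0;     /* 512 KB PPE L2 = 0.5 MB/chip             */

    printf("Aggregate EIB bandwidth: %.0f GB/s (~%.0f TB/s)\n",
           cells * eib_gb_s, cells * eib_gb_s / 1000.0);             /* 16000 GB/s, ~16 TB/s */
    printf("Total SPE local store  : %.0f MB\n", cells * ls_mb);     /* 160 MB */
    printf("Total PPE cache        : %.0f MB\n", cells * ppe_l2_mb); /*  40 MB */
    return 0;
}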
 
I already did the math on that one. A 20 TF Cell ubercomputer would fit onto the same silicon as an RTX 2080 (going by transistor counts). Some still try to resist the One True Faith, but they'll come around when Cell2 (codenamed "Neutrophil") is used in PS5, the world's first real-time raytracing console, using techniques inspired by the world's greatest ever computer, the Amiga, and how it handled 3D in real time (postage-stamp-sized resolution...).
I superlove this whole idea. I want to dream like a child again, I want to be truly excited about next-gen consoles, I want to feel these emotions.

:love: :D

You're a bad person. Don't do that to me ever again! I don't want to face our boring, sad, grey reality.
 
So a Vega 56 Crossfire rig? Why not just a Vega 64 for better single-GPU scaling, since the "supposed" final spec is way less than 13.7 TF anyway?
An 8-10 TF PS5 is gonna be highly disappointing; I might skip it at launch if that's the case, although I doubt that leak's authenticity.
I agree that anything under 10 TF would be underwhelming. That would influence my preferred platform, for sure.
 
There are sound reasons, but is that usual for high-performance console GPUs when the architecture is actually similar to the final one (Vega <-> Navi)? I remember reports on Wii U PC devkits having HD4850 graphics cards, but that console ended up being a low-performance device.

Yup. And early Xbox 360 devkits were built around Macs with G5 processors way more powerful than the CPU in the 360.

There's also the fact that there's no AMD GPU capable of reaching 20 TFLOPs or anything near that, nor will there be until late 2018 with the 7nm Vega 20, so what could be in said devkit? Two Vega 56s that will scale terribly, when a liquid-cooled Vega 64 would have better performance?

No single GPU on the market, and don't discount Crossfire.

I'm not swallowing the rumours wholesale, but they're not bizarroworld crazy, as in technically impossible. It does seem improbable, though: cost alone would make this prohibitive for widescale dissemination, and they'd still be sufficiently divorced, technically, from the final target hardware to be far from ideal.
 
I already did the math on that one. A 20 TF Cell ubercomputer would fit onto the same silicon as an RTX 2080 (going by transistor counts). Some still try to resist the One True Faith, but they'll come around when Cell2 (codenamed "Neutrophil") is used in PS5, the world's first real-time raytracing console, using techniques inspired by the world's greatest ever computer, the Amiga, and how it handled 3D in real time (postage-stamp-sized resolution...).

Edit: There are plenty of funny numbers to be had with this idea. 80 Cell BBEs would fit on the silicon. Assuming no trouble connecting them up, you'd have 80x the attained ~200 GB/s SPE data access across the EIB, so 16 TB/s of internal bandwidth. 160 MB of SRAM local storage on the SPEs and another 40 MB for the PPEs' caches.
Perhaps one day this could happen. Realistically, if programming for Cell were trivial for all game developers (which perhaps one day it will be), I think this would have been a potentially good timeline.
I think as we get further down the line with our current architectures, and generation-to-generation computational gains lessen and lessen, these types of architectures will start to make more sense.
 
I'm sure a real-world implementation of such a design would incur considerable overhead, but on paper it does look good in principle, and it's what the original vision was aiming towards, I suppose. Utilisation is everything though, and it'd probably take years of focus to develop algorithms and code-bases to make the most of such a multicore architecture. GPUs have had nearly 20 years of gradual evolution, with software shaping hardware shaping software. A complete paradigm shift to a Cell2 design would set software back ages.

That said, I keep looking at Cell and raytracing and keep saying, "Hmmmm," to myself. ;) Once GPUs have moved over to raytracing, moving rendering over to more general purpose cores should be a lot easier, and the overall improvement in silicon versatility should be very welcome.
 
Perhaps one day this could happen. Realistically, if programming for Cell were trivial for all game developers (which perhaps one day it will be), I think this would have been a potentially good timeline.

Cell was before its time. It hit consumer hardware before highly parallelised multi-threaded code was prevalent in gaming, and Cell relied on these techniques to shine. Had Sony stuck with Cell for PS4, it a) would still have been the wrong choice, but b) would have been a whole lot less of an issue, because parallelised code was something all devs had learnt during the seventh generation of consoles.
 
Cell was before its time. It hit consumer hardware before highly parallelised multi-threaded code was prevalent in gaming, and Cell relied on these techniques to shine. Had Sony stuck with Cell for PS4, it a) would still have been the wrong choice, but b) would have been a whole lot less of an issue, because parallelised code was something all devs had learnt during the seventh generation of consoles.
Heh, yeah to A, but B I had pondered about for some time. Had Sony kept with Cell, PS5 would be their third Cell generation, and could possibly have BC all the way back to PS3 titles to boot.

How dramatically different this generation's video games would be today in such a timeline.
 
Heh, yeah to A, but B I had pondered about for some time. Had Sony kept with Cell, PS5 would be their third Cell generation, and could possibly have BC all the way back to PS3 titles to boot.

Cell was 100% a good bet in 2004 when Sony committed to it for PS3, but equally was 100% a bad bet in 2010 when Sony likely committed to PS4 hardware. Everything is obvious with the benefit of hindsight. :yes:
 
Cell was 100% a good bet in 2004 when Sony committed to it for PS3, but equally was 100% a bad bet in 2010 when Sony likely committed to PS4 hardware. Everything is obvious with the benefit of hindsight. :yes:
There was a dramatic shift, it would appear, around that time. Was mobile such a large threat to the industry that MS went "TV TV" and PlayStation wanted to go with easier-to-develop hardware?

I wonder what types of future threats or considerations they must account for next gen. But it looks pretty positive IMO; the console and games space is growing well.
 
What do we need to get Cell's efficiency from a more traditional x86 architecture? I was thinking Cell was fast because of the LS having ridiculously low (and fixed!) latency. It was literally like old-school DSPs: 256 KB of registers.

I wonder if it would be possible to modify the Zen architecture to have a local store to emulate Cell, maybe repurposing half the L2 to behave like registers or something.

There is a good reason for Sony to work on that: if they are serious about expanding PS Now significantly, they'll either have to restart producing 40nm Cells and waste rack space, or get emulation working efficiently on hardware that can play all of their library. The missing link is obviously getting Cell code to work on x86.
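To make the local-store idea a bit more concrete: the SPE model is a small software-managed scratchpad that code streams data through with explicit, double-buffered DMA, instead of leaning on a hardware-managed cache. Below is a minimal sketch of that access pattern in plain C; memcpy stands in for the DMA engine, the 256 KB size mimics an SPE local store, and none of the names are real SPE or Zen APIs.

Code:
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* SPE-style streaming: a 256 KB "local store" split into two 128 KB halves so
 * the next chunk can be staged while the current one is worked on. On real
 * hardware the staging would be an asynchronous DMA overlapping with compute;
 * here memcpy is a synchronous stand-in that only shows the structure. */
#define LS_SIZE (256 * 1024)
#define CHUNK   (LS_SIZE / 2)

static unsigned char local_store[LS_SIZE];

static void process(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)  /* trivial stand-in for the real kernel */
        buf[i] ^= 0xFF;
}

static void stream_through_local_store(unsigned char *main_mem, size_t total)
{
    size_t done = 0;
    int cur = 0;

    /* Prime the first half of the local store. */
    size_t n = total < CHUNK ? total : CHUNK;
    memcpy(local_store, main_mem, n);

    while (done < total) {
        size_t next_off = done + n;
        size_t next_n = 0;

        /* "DMA in" the next chunk into the other half. */
        if (next_off < total) {
            next_n = total - next_off < CHUNK ? total - next_off : CHUNK;
            memcpy(local_store + (size_t)(cur ^ 1) * CHUNK, main_mem + next_off, next_n);
        }

        process(local_store + (size_t)cur * CHUNK, n);
        memcpy(main_mem + done, local_store + (size_t)cur * CHUNK, n); /* "DMA out" the result */

        done += n;
        n = next_n;
        cur ^= 1;
    }
}

int main(void)
{
    static unsigned char data[300 * 1024];          /* arbitrary test buffer, zero-initialised */
    stream_through_local_store(data, sizeof data);
    printf("first byte after processing: 0x%02X\n", data[0]); /* 0xFF */
    return 0;
}

On the real thing the point is that the staging transfer hides main-memory latency behind the compute, which is the property the post is asking whether a repurposed chunk of Zen L2 could reproduce.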
 
Cell was before its time. It hit consumer hardware before highly parallelised multi-threaded code was prevalent in gaming, and Cell relied on these techniques to shine. Had Sony stuck with Cell for PS4, it a) would still have been the wrong choice, but b) would have been a whole lot less of an issue, because parallelised code was something all devs had learnt during the seventh generation of consoles.

It also didn't help that, with the PS3 implementation of Cell at least, developers had to code for three distinct processing elements with very different programming models: the PPC cores, the SPEs, and the GPU. Without being able to dump one of those elements in the next design, this would likely continue to annoy developers.
 