Supplemental info from Japanese articles about the Cell

one

Unruly Member
Veteran
Hiroshige GOTO and Zenji NISHIKAWA are each running a series of articles about the Cell at PC Watch and ASCII24 after visiting ISSCC 2005, as you may have seen in the other thread, and in addition to the pretty pictures the articles contain comments from Cell project members. Since they contain a lot of new information, for the convenience of discussion I'm transcribing as many of those comments as possible here, along with condensed speculation from the respective writers.

In one of Nishikawa's articles about the Cell at ISSCC 2005 (he is a GPU-oriented journalist who usually writes about new technology from nVIDIA, ATi, and DirectX, while Goto often writes about CPUs and occasionally about GPU and game console tech), he picks up the Cell-based GPU (VS) which is found in the patent.

At ISSCC 2005 he asked Cell project members about the Cell-based GPU. They told him that the Cell-based GPU was actually in development but they gave it up for the PS3 eventually. They didn't give away the reason it was discontinued. The interesting thing is that all the Cell project members Nishikawa contacted told him that even they were surprised when they heard nVIDIA had been chosen as the partner (when they actually learned of the partnership is unknown, but Nishikawa assumes the very secretive 1-2 year Sony-nVIDIA partnership that nVIDIA has suggested).

Though the Cell-based GPU was not adopted for the PS3, according to the Cell project members a test program rendering basic 3D graphics is actually running in the lab on the Cell processor presented at ISSCC 2005. So Nishikawa concludes the current configuration (1 PPE + 8 SPEs) already has enough potential as a GPU, and that a cost-effective Cell-only system without a dedicated GPU is doable for applications like a car navigation system.

In the latest article in the series, Nishikawa speculates about the possible configuration of the PS3. He thinks it uses the SPEs in the CPU as programmable vertex shaders, while the nVIDIA GPU contains only pixel rendering pipelines and eDRAM. The memory configuration is UMA, with the eDRAM in the GPU acting as a cache, since to support VS 3.0+ the SPEs acting as vertex shaders have to be able to access texture memory. If all 8 SPEs are used as vertex shaders, that can reach over 6-8 billion vertices/sec at 3-4 GHz (not counting the overhead of the Cell's EIB), which is overkill compared to the expected performance of the GPU part, so programmers could use some of the SPEs as pipelines for other goodies such as a tessellator, geometry shading, or LOD processing.
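As a sanity check on that vertex figure, here is a rough back-of-the-envelope calculation. The FLOPs-per-vertex cost and the 8-FLOPs-per-cycle-per-SPE figure are my own assumptions for illustration, not numbers from the article.

Code:
#include <stdio.h>

/* Back-of-the-envelope check of the 6-8 billion vertices/sec figure.
   Assumptions (mine, not the article's): each SPE sustains one 4-wide
   fused multiply-add per cycle (8 FLOPs/cycle), and a basic vertex
   transform costs roughly 36 FLOPs.  EIB overhead is ignored, as in
   the article. */
int main(void)
{
    double clock_hz         = 4.0e9;  /* upper end of the 3-4 GHz range          */
    double flops_per_cycle  = 8.0;    /* assumed 4-wide SIMD madd per SPE        */
    int    num_spes         = 8;
    double flops_per_vertex = 36.0;   /* assumed 4x4 transform + perspective     */

    double flops    = clock_hz * flops_per_cycle * num_spes;  /* ~256 GFLOPS     */
    double verts_ps = flops / flops_per_vertex;               /* ~7 billion/sec  */

    printf("peak: %.0f GFLOPS -> ~%.1f billion vertices/sec\n",
           flops / 1e9, verts_ps / 1e9);
    return 0;
}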

Goto has a report about the SPE with some comments from the Cell architects. In the SPE, the 128 128-bit physical registers are mapped directly as logical registers.
Masakazu SUZUOKI said:
All 128 registers are visible (usable) from a program, too.
The reason it has that many registers is that the SPE can be assumed not to switch threads as often as the PPE, which runs an OS.
Jim Kahle said:
We have a lot of experience with registers. The 32-register set (of Power) has the advantage of reducing the saving (of registers to memory) at a context switch and keeping (thread switching) within a small time slice. But it's understood that (in the software model) the SPE runs to completion once started. So it has fewer context switches, and therefore it can have a larger register set.
The SPE, being a SIMD processor, also required a larger register set to make operations more parallel.
Jim Kahle said:
The larger register set is also for loop unrolling. Filling the pipelines without techniques such as register renaming requires loop unrolling. From our experience with other processors, we decided that about 100 registers are appropriate for loop unrolling.
In the PS2's EE this kind of optimization was done by hand, but in the Cell it's supposed to be compiler-based.
Masakazu SUZUOKI said:
The 128-register set is a kind of brute-force approach. But the SPE is an in-order machine and runs at 4 GHz, so the latencies are not small. Hence compiler-based scheduling becomes very important; loop unrolling is a good example. For advanced compiler-based scheduling, we thought we needed 128 registers.
Jim Kahle said:
With 128 registers, a compiler can do many optimizations. It truly pushes up resource usage efficiency. Until now, the 32-register set was the limiting factor for compiler optimization.
This means the architects assume compiler-based software scheduling for the Cell and, unlike with the Emotion Engine, are developing a powerful optimizing compiler for it.
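To put the loop-unrolling comments in concrete terms, here is a minimal sketch in plain C (not actual SPE intrinsics, just an illustration): each unrolled iteration keeps its own intermediate values live in separate registers, so an in-order machine without register renaming can overlap the independent multiply-adds and hide the latency Suzuoki mentions. Wider unrolls need proportionally more registers, which is what the 128-entry register file buys.

Code:
#include <stddef.h>

/* Transform an array of 4-component vertices by a 4x4 row-major matrix.
   Unrolled by two: the two iterations are independent, so their
   intermediates can live in separate registers and their multiply-adds
   can overlap in the pipeline.  (Odd tail element omitted for brevity.) */
void transform_unrolled(float *out, const float *in, const float m[16], size_t n)
{
    for (size_t i = 0; i + 1 < n; i += 2) {
        const float *a = &in[4 * i];
        const float *b = &in[4 * (i + 1)];
        for (int r = 0; r < 4; r++) {
            out[4 * i + r]       = m[4*r+0]*a[0] + m[4*r+1]*a[1]
                                 + m[4*r+2]*a[2] + m[4*r+3]*a[3];
            out[4 * (i + 1) + r] = m[4*r+0]*b[0] + m[4*r+1]*b[1]
                                 + m[4*r+2]*b[2] + m[4*r+3]*b[3];
        }
    }
}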

The instruction set of the SPE is totally different from the Power/PowerPC ISA of the PPE.
Jim Kahle said:
Though there's a certain base, it's a totally new ISA. We started from 32-bit instructions with a RISC-type organization, but it became a new instruction set in order to address 128 registers. We also tried to make it as simple as possible. Integer instructions, single-precision floating-point instructions, double-precision floating-point instructions, load-store instructions, branch instructions: all of them fit the 32-bit instruction format.
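A quick illustration of why 128 registers force a new encoding: naming 128 registers takes 7 bits per operand, so a three-operand instruction spends 21 of its 32 bits on register fields and has 11 bits left for the opcode. The field layout below is only a plausible sketch, not the published SPE format.

Code:
#include <stdint.h>

/* Hypothetical decode of a 32-bit, three-operand instruction word that
   can name 128 registers: 7 bits per register field (2^7 = 128),
   3 x 7 = 21 bits of register fields, 11 bits of opcode.
   Field positions are illustrative only. */
static inline unsigned opcode(uint32_t insn) { return insn >> 21;          }
static inline unsigned reg_b (uint32_t insn) { return (insn >> 14) & 0x7F; }
static inline unsigned reg_a (uint32_t insn) { return (insn >>  7) & 0x7F; }
static inline unsigned reg_t (uint32_t insn) { return  insn        & 0x7F; }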

In the latest article, Goto discusses the DRM feature of the Cell.
Masakazu SUZUOKI said:
The SPE has a built-in protection mechanism. It's called "isolated mode," and when an SPE enters this mode its Local Store is completely locked. It's protected from everything outside the SPE and from all other elements in the Cell; even the OS can't browse the contents of the LS. Of course this is also risky: for example, a program running on a protected SPE may hang due to a bug. In that case the PPE sends a reset signal, and the SPE clears the memory and then dies. It may look tricky, but reliable content protection is very important if we want content providers to supply us with great content.
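To make the flow Suzuoki describes a bit more concrete, here is a rough sketch. Every function name in it is hypothetical (the article does not expose any real Cell interface); the 256KB Local Store size matches the ISSCC disclosures.

Code:
#include <string.h>

#define LS_SIZE (256 * 1024)                 /* Local Store size per ISSCC 2005 */
static unsigned char local_store[LS_SIZE];

/* PPE side: it cannot browse an isolated SPE's Local Store, so when a
   protected program hangs, the only recovery action is a reset request.
   (Stand-in function; no real register interface is implied.) */
static void ppe_request_isolated_spe_reset(void)
{
    /* would poke a privileged reset control, not modeled here */
}

/* SPE side: on reset while in isolated mode, scrub the Local Store and
   stop ("the SPE clears the memory and then dies"), so protected content
   never becomes visible outside the SPE. */
static void spe_reset_while_isolated(void)
{
    memset(local_store, 0, sizeof local_store);
    /* halt: no further execution after this point */
}

int main(void)
{
    ppe_request_isolated_spe_reset();
    spe_reset_while_isolated();
    return 0;
}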
 
Great job one, I liked what you found: interesting bits and pieces that add to the great RWT article David Wang wrote (even though I wish he had talked a bit more about integer processing, which is not the second-class citizen it was on the EE's VUs).
 
Thanks one, great work. Some interesting stuff in there. The GPU comments are interesting. If the Cell people were "surprised" that Sony chose not to go with a Cell chip for the GPU, that suggests that performance wasn't the issue that some make it out to be in the decision to switch to NVidia. I'm thinking it was probably a performance per dollar issue. I also hope Zenji Nishikawa is correct in his GPU speculation ;)
 
Titanio said:
Thanks one, great work. Some interesting stuff in there. The GPU comments are interesting. If the Cell people were "surprised" that Sony chose not to go with a Cell chip for the GPU, that suggests that performance wasn't the issue that some make it out to be in the decision to switch to NVidia. I'm thinking it was probably a performance per dollar issue. I also hope Zenji Nishikawa is correct in his GPU speculation ;)

They were surprised that SCE was going with nVIDIA (over the other GPU partner... Toshiba). They were not surprised that they did not go with a CELL-based GPU ("They told him that the Cell-based GPU was actually in development but they gave it up for the PS3 eventually"), IMHO.
 
Panajev2001a said:
They were surprised that SCE was going with nVIDIA (over the other GPU partner... Toshiba). They were not surprised that they did not go with a CELL-based GPU ("They told him that the Cell-based GPU was actually in development but they gave it up for the PS3 eventually"), IMHO.

Indeed, you're correct, I think. Putting vertex ops on the CPU would also make the decision to go with someone like NVidia for the pixel shading side of things more understandable.
 
Well, the power of APUs in the Cell-based GPU may not be that bad, but it's probable that the Pixel Engine by Toshiba/Sony sucked instead.

The possible reasons are

1. nVIDIA's pixel shader performs better than Toshiba's, if the CPU does vertex processing.
2. nVIDIA's overall GPU performs better than Toshiba's, if the CPU doesn't do vertex processing.
3. nVIDIA's solution is cheaper than Toshiba's.
4. nVIDIA's solution carries more APIs, shader tech, know-how, game developers, middleware, and a whole lot of other assets than Toshiba's.
5. Sony just wanted to save the overall cost of the PS3 and killed expensive parts.
6. A synchronization issue with the Cell project. The tape-out of the 90nm Cell was in Jan 2004, so a Cell-based GPU completed after that might have missed the projected PS3 release schedule.
7. Sony loves the Xbox 1 hardware ;)
 
I'm betting the PS3 GPU still has GPU vertex shaders, and is not eDRAM based. My prediction is that the PS3 GPU is what would have been the NV5x GPU for PC desktops, tweaked for a FlexIO/CELL bus architecture. I do not think NVidia has had enough time to rip out shaders, add in eDRAM (for which they have zero architecture experience compared to their Lightspeed Memory stuff, and for which, the fabbing process has a different set of issues).

Whether or not the PS3 GPU vertex shader units actually get used is another issue. The major hurdle is triangle setup performance.
 
DemoCoder said:
add in eDRAM (for which they have zero architecture experience compared to their Lightspeed Memory stuff, and for which, the fabbing process has a different set of issues)

Sony does the fabbing for the GPU. For the GameCube, ArtX (ATi) designed the GPU with eDRAM and NEC fabbed it. Did ArtX have experience with eDRAM?
 
In the latest article in the series, Nishikawa speculates about the possible configuration of the PS3. He thinks it uses the SPEs in the CPU as programmable vertex shaders, while the nVIDIA GPU contains only pixel rendering pipelines and eDRAM. The memory configuration is UMA, with the eDRAM in the GPU acting as a cache, since to support VS 3.0+ the SPEs acting as vertex shaders have to be able to access texture memory. If all 8 SPEs are used as vertex shaders, that can reach over 6-8 billion vertices/sec at 3-4 GHz (not counting the overhead of the Cell's EIB), which is overkill compared to the expected performance of the GPU part, so programmers could use some of the SPEs as pipelines for other goodies such as a tessellator, geometry shading, or LOD processing.

That's really interesting speculation.
How much pixel shading processing power do you guys expect the Nvidia GPU to have?
What kind of FLOPS rating can the current top products from nVidia achieve?
 
I'm betting the PS3 GPU still has GPU vertex shaders, and is not eDRAM based. My prediction is that the PS3 GPU is what would have been the NV5x GPU for PC desktops, tweaked for a FlexIO/CELL bus architecture. I do not think NVidia has had enough time to rip out shaders, add in eDRAM (for which they have zero architecture experience compared to their Lightspeed Memory stuff, and for which, the fabbing process has a different set of issues).

Adding eDRAM is not trivial, but is converting their Lightspeed Memory architecture to work with XDR trivial? Without extra memory bandwidth, with 25 GB/s shared between the Cell and the GPU, the PS3 is going to be memory bandwidth starved.
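For a rough sense of why 25 GB/s shared looks tight, here is a back-of-the-envelope framebuffer-traffic estimate. The resolution, AA, and overdraw figures are my own assumptions for illustration, not anything announced.

Code:
#include <stdio.h>

/* Rough framebuffer traffic estimate to illustrate the "bandwidth starved"
   worry.  All parameters below are assumptions for illustration only. */
int main(void)
{
    double w = 1280, h = 720;        /* assumed render resolution             */
    double samples = 4;              /* assumed 4x multisampling              */
    double bytes_per_sample = 8;     /* 4 bytes color + 4 bytes Z             */
    double overdraw = 3;             /* assumed average overdraw              */
    double rw = 2;                   /* read-modify-write on color/Z          */
    double fps = 60;

    double bytes_per_frame = w * h * samples * bytes_per_sample * overdraw * rw;
    double gb_per_sec      = bytes_per_frame * fps / 1e9;

    printf("framebuffer alone: ~%.1f GB/s (before textures, CPU and vertex traffic)\n",
           gb_per_sec);
    return 0;
}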
 
Shinjisan said:
In the latest article in the series, Nishikawa speculates about the possible configuration of the PS3. He thinks it uses the SPEs in the CPU as programmable vertex shaders, while the nVIDIA GPU contains only pixel rendering pipelines and eDRAM. The memory configuration is UMA, with the eDRAM in the GPU acting as a cache, since to support VS 3.0+ the SPEs acting as vertex shaders have to be able to access texture memory. If all 8 SPEs are used as vertex shaders, that can reach over 6-8 billion vertices/sec at 3-4 GHz (not counting the overhead of the Cell's EIB), which is overkill compared to the expected performance of the GPU part, so programmers could use some of the SPEs as pipelines for other goodies such as a tessellator, geometry shading, or LOD processing.

That's really interesting speculation.
How much pixel shading processing power do you expect the Nvidia GPU to have?

If it can afford to use all its processing units only for pixel shading, it will be one hell of a powerful GPU in that regard.
Just think about an NV40 that doesn't need to do vertex shading, only pixel shading. Then consider that it will be one generation ahead of NV40...
 
london-boy said:
Shinjisan said:
In the latest article in the series, Nishikawa speculates about the possible configuration of the PS3. He thinks it uses the SPEs in the CPU as programmable vertex shaders, while the nVIDIA GPU contains only pixel rendering pipelines and eDRAM. The memory configuration is UMA, with the eDRAM in the GPU acting as a cache, since to support VS 3.0+ the SPEs acting as vertex shaders have to be able to access texture memory. If all 8 SPEs are used as vertex shaders, that can reach over 6-8 billion vertices/sec at 3-4 GHz (not counting the overhead of the Cell's EIB), which is overkill compared to the expected performance of the GPU part, so programmers could use some of the SPEs as pipelines for other goodies such as a tessellator, geometry shading, or LOD processing.

That's really interesting speculation.
How much pixel shading processing power do you expect the Nvidia GPU to have?

If it can afford to use all its processing units only for pixel shading, it will be one hell of a powerful GPU in that regard.
Just think about an NV40 that doesn't need to do vertex shading, only pixel shading. Then consider that it will be one generation ahead of NV40...

They don't need that many pixel shader units. They probably need more vertex shader units than pixel shader units. Pixel shader units are also a lot cheaper compared to vertex shader units.

Assuming they don't go with eDRAM, NV will probably stick with the NV40's number of pixel shaders and increase the vertex shaders, perhaps doubling or even tripling the number to 12-18 VS. That is, if NV50 doesn't feature unified hardware.
 
V3 said:
They don't need that many pixel shader units. They probably need more vertex shader units than pixel shader units. Pixel shader units are also a lot cheaper compared to vertex shader units.

Assuming they don't go with eDRAM, NV will probably stick with the NV40's number of pixel shaders and increase the vertex shaders, perhaps doubling or even tripling the number to 12-18 VS. That is, if NV50 doesn't feature unified hardware.


Oh I know, I just made the assumption that the vertex shading is going to be done on the CPU, like it's being rumoured.
Keeping that assumption, if they keep the same number of pixel shaders, they could spend the transistor budget on eDRAM instead of the vertex units that are present on current NVIDIA GPUs.
Just speculation :D
 
london-boy said:
Oh I know, I just made the assumption that the vertex shading is going to be done on the CPU, like it's being rumoured.
Keeping that assumption, if they keep the same number of pixel shaders, they could spend the transistor budget on eDRAM instead of the vertex units that are present on current NVIDIA GPUs.
Just speculation :D

Yeah, the CPU is going to take over some of the vertex shading work, like animation, but stuff like displacement mapping might still be done on the vertex shader, since it has that ability now. All NV has to do is speed up those kinds of operations.

If eDRAM is in the picture though, it's an entirely different story IMO.
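For reference, the operation being debated is simple at the vertex level: sample a height from the displacement map and push the vertex out along its normal. Here is a minimal CPU-side sketch in plain C with a hypothetical data layout; doing the same thing in the vertex shader instead relies on the vertex texture fetch that VS 3.0 hardware already provides.

Code:
#include <stddef.h>

/* Minimal displacement mapping sketch: offset each vertex along its normal
   by a height sampled from a displacement map.  The data layout and the
   nearest-sample lookup are hypothetical, for illustration only. */
typedef struct { float px, py, pz;  float nx, ny, nz;  float u, v; } Vertex;

static float sample_height(const float *map, int w, int h, float u, float v)
{
    /* assumes u,v already clamped to [0,1]; nearest sample, no filtering */
    int x = (int)(u * (w - 1) + 0.5f);
    int y = (int)(v * (h - 1) + 0.5f);
    return map[y * w + x];
}

void displace(Vertex *verts, size_t n, const float *map, int w, int h, float scale)
{
    for (size_t i = 0; i < n; i++) {
        float d = scale * sample_height(map, w, h, verts[i].u, verts[i].v);
        verts[i].px += d * verts[i].nx;
        verts[i].py += d * verts[i].ny;
        verts[i].pz += d * verts[i].nz;
    }
}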
 
V3 said:
london-boy said:
Oh I know, I just made the assumption that the vertex shading is going to be done on the CPU, like it's being rumoured.
Keeping that assumption, if they keep the same number of pixel shaders, they could spend the transistor budget on eDRAM instead of the vertex units that are present on current NVIDIA GPUs.
Just speculation :D

Yeah, the CPU is going to take over some of the vertex shading work, like animation, but stuff like displacement mapping might still be done on the vertex shader, since it has that ability now. All NV has to do is speed up those kinds of operations.

If eDRAM is in the picture though, it's an entirely different story IMO.

See, that's where my brain stops understanding; I was thinking about displacement mapping. With all this talk about the BE being able to process billions of polygons per second, if displacement maps were applied to those already numerous polygons at the rendering stage, how would the GPU cope?
It's either NO displacement maps, or the displacement maps are done on the CPU (which sounds fishy to me), because it's been made clear that the GPU will never be able to keep up with the polygons the CPU can process, and therefore it won't be able to add polygons to the geometry the CPU sends to it.
I'm sure my post wasn't too clear, and that's because it's all very foggy at the moment.
 
DemoCoder said:
add in eDRAM (for which they have zero architecture experience compared to their Lightspeed Memory stuff, and for which, the fabbing process has a different set of issues)

SCE and Toshiba both have quite a bit of experience with eDRAM and the manufacturing challenges related to it.
 
V3 said:
They don't need that many pixel shader units. They probably need more vertex shader units than pixel shader units. Pixel shader units are also a lot cheaper compared to vertex shader units.

You got both statements the other way around.

Just to give an example: which unit needs more context to withstand high latencies? Pixel Shader units.

Vertex Shaders are important, but the CELL-based CPU has that covered quite well even if there are no Vertex Shaders on the GPU: you need the GPU to pack a BIG Pixel Shading punch and have high triangle set-up rates.
 
london-boy said:
See, that's where my brain stops understanding; I was thinking about displacement mapping. With all this talk about the BE being able to process billions of polygons per second, if displacement maps were applied to those already numerous polygons at the rendering stage, how would the GPU cope?
Now what bothers me is all this talk of Cell doing the Vertex processing, billions of polys a second, and the GPU chugging out a gazillion pixels a second; thus far it sounds like PS3 will be a graphics monster but limited to the gameplay potential of a ZX Spectrum. Where's the CPU power to drive the rest of the game if it's all taken up rendering photo-real graphics? Thus far, XB2 sounds like it has a more balanced distribution of power. I guess until we hear about PS3 in the flesh next month, it's too early to tell. But with all this talk of Cell doing the graphics leg-work, I seriously wonder what'll be left...
 
Shifty Geezer said:
london-boy said:
See, that's where my brain stops understanding; I was thinking about displacement mapping. With all this talk about the BE being able to process billions of polygons per second, if displacement maps were applied to those already numerous polygons at the rendering stage, how would the GPU cope?
Now what bothers me is all this talk of Cell doing the Vertex processing, billions of polys a second, and the GPU chugging out a gazillion pixels a second; thus far it sounds like PS3 will be a graphics monster but limited to the gameplay potential of a ZX Spectrum. Where's the CPU power to drive the rest of the game if it's all taken up rendering photo-real graphics? Thus far, XB2 sounds like it has a more balanced distribution of power. I guess until we hear about PS3 in the flesh next month, it's too early to tell. But with all this talk of Cell doing the graphics leg-work, I seriously wonder what'll be left...

That's not really the point of the thread, but for what it's worth, it's obvious only part of the CPU will take care of graphics and the rest will be left for physics, animation (which is part of graphics anyway) and AI.
Also, it seems the architecture will be flexible enough to leave graphics tasks on the GPU's vertex shaders (if they're there), so the CPU can take care of other tasks.
We'll see; I don't think calling the PS3 an "all graphics, no brains" machine is the right thing to do at present.
 
one said:
DemoCoder said:
add in eDRAM (for which they have zero architecture experience compared to their Lightspeed Memory stuff, and for which, the fabbing process has a different set of issues)

Sony does the fabbing for the GPU. For the GameCube, ArtX (ATi) designed the GPU with eDRAM and NEC fabbed it. Did ArtX have experience with eDRAM?

good statement / question.
 