PS3 Linux Game - SPEs as unified shader GPU?

PS3 Linux has no RSX access, no? There have been many threads and posts on Cell performance for vertex shading and pixel shading.

So, with new information from new threads, I am curious what GPU performance (theoretical specs) can be had with all available SPEs (for PS3 Linux) dedicated to graphics only.

So PPE as CPU, and SPEs like unified shader GPU.

Any guesses?
 
RSX runs circles around Cell when it comes to rasterization, so you can't really use Cell for pixel shading.
 
The paper in that thread uses RSX for rasterization so that all Cell needs to do is march across the screen. It also does point sampling of all the shadow map samples.

If you want to do everything on Cell, there's a lot more work to be done. Rasterization and real texturing on Cell won't get anywhere near the speed of RSX. I doubt it could even match the original GeForce for a real scene.
 
Rasterize

The IBM Terrain Ray Casting Demo on a 3.2 GHz, 8-SPE Cell has texturing at 30 fps, 720p, with 4x MSAA, no?

Here is an article about Cell texture. I would like to know your ideas. Is it true?

http://gametomorrow.com/blog/index.php/2006/03/24/cell-cant-texture/

Also this article is interesting.

http://www.graphicshardware.org/previous/www_2005/presentations/damora-cell4graphicsandviz-gh05.pdf

IBM employee Barry Minor also says that Linux PS3 can raytrace this 720P 333,000 triangle car at "interactive frame-rate" (15fps?) Is this real or just "marketing"? What do you feel?

http://www.ibmcommunity.com/images/side.jpg

He also says Linux Cell can ray-trace a 69,000 triangle bunny at > 30fps/720P with secondary rays but G80 can only do <10fps.
 
Raycasting has no "overdraw"; it just takes 1 sample per pixel (or 4 on edges, if using 4x MSAA). We don't even know if those samples were bilinear filtered.
Should texture sampling be the bottleneck, then you have a performance of 30*720*1280 ~ 27.6 "megatexels" per second. Pretty abysmal, but it's unlikely that that's the case, and a simple rasterizer would likely be capable of more "megatexels" than that, just so you know that your example is a bad one.

edit: and raytracing is not much different ;) .
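
For anyone who wants to check the arithmetic, here is a minimal sketch of the sample-rate estimate. The 4-samples-everywhere case is only an assumed upper bound for 4x MSAA (the post says 4 samples are taken on edges only), and the function name is made up for illustration.

```python
# Back-of-the-envelope sample rate for a 720p, 30 fps ray caster.
WIDTH, HEIGHT, FPS = 1280, 720, 30

def samples_per_second(samples_per_pixel):
    """Texture samples per second if every pixel takes the same sample count."""
    return WIDTH * HEIGHT * FPS * samples_per_pixel

one_sample = samples_per_second(1)    # 1 sample per pixel
four_samples = samples_per_second(4)  # assumed worst case: 4 samples on every pixel

print(f"1 sample/pixel:  {one_sample / 1e6:.1f} Msamples/s")
print(f"4 samples/pixel: {four_samples / 1e6:.1f} Msamples/s")
```

Even the assumed worst case is only about 0.11 billion samples per second.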
 
Texture Filtering

http://www.graphicshardware.org/previous/www_2005/presentations/damora-cell4graphicsandviz-gh05.pdf

30+ frames per second with only one Cell processor
– No graphics adapter assist
– 1280x720 (HD 720P) resolution
Advanced SPE shader function
– Ray/Terrain intersection computation
– Texture Filtering
– Normal computation
– Bump map computation
– Diffuse + Ambient lighting model
– Perlin Noise based clouds
– Atmosphere computation (haze, sun, halo)
– Dynamic multi-sampling (4 – 16 samples per pixel)
– Image based input (16 bit height + 16 bit texture)
– 29 KB of SPE object code
– 224 KB of SPE local store data
M-JPEG compression via SPE
 
^^and your point is?

You are asking how the SPEs would compare to a GPU; the example above does more complex stuff than you would throw at a GPU. The SPEs can crunch numbers like crazy; what they can't do very well is texturing.

The only interesting part is:
* Image based input (16 bit height + 16 bit texture)
Which means we have either two 16-bit textures or one 32-bit one.
* Dynamic multi-sampling
4-16 samples per pixel.

Now, unlike a GPU, you don't paint triangles that overlap or occlude each other (called overdraw), but only calculate the pixels you need. You can't conclude anything about how an SPE would perform as a rasterizer from this.
 
SPE & overdraw

This is why I mention older threads my friend. SPE for occluding.

http://forum.beyond3d.com/showthread.php?t=44140

How does an SPE compare with the PS2 Graphics Synthesizer for drawing transparent triangles?
 
IBM Terrain Ray Casting Demo 3.2Ghz 8 SPE Cell they have texture, 30fps, 720P, 4xMSAA, no?

Also this article is interesting.

http://www.graphicshardware.org/previous/www_2005/presentations/damora-cell4graphicsandviz-gh05.pdf

I only see one or two textures there. A 7 year old GPU could handle that texturing load. That's a demonstration of raycasting rather than texturing.

Here is an article about Cell texture. I would like to know your ideas. Is it true?

http://gametomorrow.com/blog/index.php/2006/03/24/cell-cant-texture/

We had a discussion about that. Here is my post calculating texturing performance:

http://forum.beyond3d.com/showpost.php?p=727617&postcount=11

RSX is about 40x faster in pure texturing tests. The problem is that the fractal shader there is rife with dynamic branching, slowing RSX to a crawl.

IBM employee Barry Minor also says that Linux PS3 can raytrace this 720P 333,000 triangle car at "interactive frame-rate" (15fps?) Is this real or just "marketing"? What do you feel?

http://www.ibmcommunity.com/images/side.jpg

He also says Linux Cell can ray-trace a 69,000 triangle bunny at > 30fps/720P with secondary rays but G80 can only do <10fps.

That's still very slow compared to rasterization for the same models. Also, check this post discussing raytracing, rasterization, and G80 comparisons:
http://forum.beyond3d.com/showpost.php?p=1057810&postcount=21
 
Texturing

So SPE can be great shader machine but not great for putting texture? I think I understand. What about texture lookup makes it difficult for SPE?
 
Nothing, if your texture is very small and can fit within the local store.
In the case of bigger textures, you have to fetch samples through DMA without proper caching.
 
Crazy question

Is it impossible to have many small textures (even copies) rather than a few bigger textures? Or am I crazy?
 
I had this discussion with someone else on this board. DemoCoder? We were talking about deferred rendering: breaking up the textures and rendering objects per texture. That is, rather than test a pixel and fetch the texture for it, fetch the texture and then render the pixels that use it. What we had down in theory looked plausible. There's no way anyone can answer your initial question, because it'd take a different approach from the traditional GPU to get useful performance from Cell. The only answer is for people to actually create Cell renderers in all their different forms and then compare them to past hardware.
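
To make the "fetch the texture, then render the pixels that use it" idea concrete, here is a rough sketch. Everything in it is hypothetical: the pixel record layout `(x, y, texture_id, u, v)`, the `load_texture` callback (standing in for one DMA of a texture into local store), and point sampling instead of real filtering. It is not how any actual Cell renderer works, just the ordering idea.

```python
from collections import defaultdict

def bin_pixels_by_texture(visible_pixels):
    """Group (x, y, texture_id, u, v) records by texture_id."""
    bins = defaultdict(list)
    for x, y, tex_id, u, v in visible_pixels:
        bins[tex_id].append((x, y, u, v))
    return bins

def shade_deferred(visible_pixels, load_texture):
    """Load each texture once, then shade every pixel that uses it."""
    framebuffer = {}
    loads = 0
    for tex_id, pixels in bin_pixels_by_texture(visible_pixels).items():
        texture = load_texture(tex_id)  # hypothetical: one transfer per texture
        loads += 1
        height, width = len(texture), len(texture[0])
        for x, y, u, v in pixels:
            tx = min(int(u * width), width - 1)
            ty = min(int(v * height), height - 1)
            framebuffer[(x, y)] = texture[ty][tx]  # point sample, no filtering
    return framebuffer, loads

# Tiny usage example: four pixels alternating between two 2x2 textures.
textures = {0: [[10, 20], [30, 40]], 1: [[1, 2], [3, 4]]}
pixels = [(0, 0, 0, 0.0, 0.0), (1, 0, 1, 0.9, 0.9),
          (2, 0, 0, 0.9, 0.0), (3, 0, 1, 0.0, 0.9)]
fb, loads = shade_deferred(pixels, textures.__getitem__)
print(fb, loads)
```

Shading the pixels in screen order would switch textures four times here; binning first means each texture is loaded once.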
 
Is it impossible to have many small textures (even copies) rather than a few bigger textures? Or am I crazy?

Possible, yes; feasible, IMHO not really.
A single big texture could become something like 1024 small tiles for rendering, so I really do not see texture tiling as a solution to the problem.

In typical game scenes there may easily be four 1024x1024 textures on a single polygon, and fetching the right texture tiles in that case would be quite bad.
A software cache would most likely work much better, and even that really isn't the best way to use SPUs either.

One could always limit texture size to a nice 64x64 or 32x32 and let the artists deal with the problem. ;)
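
A quick sketch of where the "1024 small tiles" figure comes from, assuming uncompressed 32-bit texels (my assumption, real game textures are often compressed) and the 256 KB SPE local store, which also has to hold code and other data.

```python
LOCAL_STORE_BYTES = 256 * 1024  # SPE local store size
BYTES_PER_TEXEL = 4             # assumption: uncompressed 32-bit RGBA

def tile_stats(texture_size, tile_size):
    """Tile count and byte sizes for a square texture cut into square tiles."""
    tiles_per_side = texture_size // tile_size
    return {
        "tiles": tiles_per_side ** 2,
        "tile_bytes": tile_size * tile_size * BYTES_PER_TEXEL,
        "texture_bytes": texture_size * texture_size * BYTES_PER_TEXEL,
    }

stats32 = tile_stats(1024, 32)
stats64 = tile_stats(1024, 64)
for tile, s in ((32, stats32), (64, stats64)):
    print(f"{tile}x{tile} tiles: {s['tiles']} tiles of {s['tile_bytes']} bytes; "
          f"whole texture is {s['texture_bytes'] // LOCAL_STORE_BYTES}x the local store")
```

Under these assumptions a single 1024x1024 texture is 16 times the size of the whole local store, before counting the other three textures on the polygon.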
 
Coincidentally, this patent from SCE showed up recently, describing basically a texture cache for Cell. Not the first time someone has implemented one, but it might be interesting..

http://appft1.uspto.gov/netacgi/nph....&OS=AN/"sony+computer"&RS=AN/"sony+computer"

Texture unit for multi processor environment

[0011] Direct memory access (DMA) transfers of data into and out of the SPE local store are quite fast. A cell processor chip with SPUs may run at about 3 gigahertz. A graphics card, by contrast, may run at about 500 MHz, which is six times slower. However, a cell processor SPE usually has a limited amount of memory space (typically about 256 kilobytes) available for texture maps in its local store. Unfortunately, texture maps can be very large. For example, a texture covering 1900 pixels by 1024 pixels would require significantly more memory than is available in an SPE local store. Furthermore, DMA transfers of data into and out of the SPE can have a high latency.

[0012] Thus, there is a need in the art, for a method for performing texture mapping of pixel data that overcomes the above disadvantages.

Several SPUs performing Texture Unit operations could be comparable to dedicated graphics hardware for moderate performance. In testing, a range of 80-95% hit rate of texture already in cache was found minimizing the amount of loading of texture blocks from main memory.

Perhaps one novelty it may have over the IBM implementation is that they suggest double buffering the cache so that if there's a cache miss, it may be possible for processing to continue on other pixels using data from the other buffer. Not sure if the IBM implementation did that or not. There may be other interesting details at a lower level, I didn't really read it in much detail.
 
So SPE can be great shader machine but not great for putting texture? I think I understand. What about texture lookup makes it difficult for SPE?
There's just a lot of little calculations that can be implemented very cheaply with fixed function hardware in a GPU. This arithmetic logic has low precision and only one possible input/output source (compared to 32 bit FP or integer units in an SPE, each of which can be fed with data from any of 128 registers). A filtered texture fetch, of which RSX can do 12 billion per second, would probably take tens of cycles on an SPE from start to finish.
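
As a rough illustration of the "little calculations" involved, here is a plain bilinear filter written out step by step. It is only a sketch: single-channel float texels, clamp addressing, no mip selection, no format decoding, and no attempt at SPE-style SIMD. A real texture unit does more than this per fetch.

```python
import math

def bilinear_sample(texture, u, v):
    """Bilinear filter of a 2D list of floats; u, v in [0, 1], clamp addressing."""
    height, width = len(texture), len(texture[0])
    # Map normalized coordinates to texel space (texel centers sit at +0.5).
    x = u * width - 0.5
    y = v * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0  # fractional weights

    def texel(ix, iy):
        # Clamp the address, then read one texel.
        ix = min(max(ix, 0), width - 1)
        iy = min(max(iy, 0), height - 1)
        return texture[iy][ix]

    # Four fetches and three linear interpolations per filtered sample.
    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy

tex = [[0.0, 1.0],
       [2.0, 3.0]]
center = bilinear_sample(tex, 0.5, 0.5)
corner = bilinear_sample(tex, 0.0, 0.0)
print(center, corner)
```

Address scaling, floor, clamping, four reads, and the weighted blend all become separate instructions in software, per channel and per sample.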

Is it impossible to have many small textures (even copies) than few bigger textures? Or am I crazy?
The problem is that each pixel needs data from several textures. If you have to load all your small textures one by one for each group of pixels, you'll waste a lot of memory bandwidth. That's why doing a DMA transfer to directly fetch texels for each pixel, like in the Heirich paper, is a bad idea for real world situations. The only way you can get anywhere near GPU bandwidth efficiency is with a texture cache (see below).

Coincidentally, this patent from SCE showed up recently, describing basically a texture cache for Cell. Not the first time someone has implemented one, but it might be interesting..
A texture cache on the SPE may get nice hit rates, thus reducing texture bandwidth load by maybe a factor of 5-10 and getting near GPU efficiency; however, remember that it's a software cache. It consumes cycles with every memory access to see whether the data is in the local store or not, and if not then make a DMA transfer.

Texturing is exactly the kind of work load that the local store was not designed for, because:
A) The texture does not fit in the local store
B) The texels needed by pixels aren't fixed, and you need to calculate the texture addresses on the fly
C) There is a lot of re-use of texels between pixels.

If any one of these were false, you could probably figure out a way to avoid a software cache. Nonetheless, Cell would still have to worry about all the other math that RSX's texture units do per cycle.
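
Here is a toy sketch of what a software texture cache has to do on every access. The class name, tile size, line count, and hash are all made up for illustration, `_load_tile` only stands in for a DMA transfer from main memory, and the hit rate of this toy access pattern says nothing about real scenes.

```python
class SoftwareTextureCache:
    """Direct-mapped cache of square texel tiles, checked on every fetch."""

    def __init__(self, texture, tile_size=8, num_lines=64):
        self.texture = texture           # stands in for the texture in main memory
        self.tile_size = tile_size
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # which tile each cache line holds
        self.lines = [None] * num_lines  # cached tile data ("local store")
        self.hits = 0
        self.misses = 0

    def _load_tile(self, tile_x, tile_y):
        """Stand-in for a DMA transfer of one tile into local store."""
        t = self.tile_size
        return [row[tile_x * t:(tile_x + 1) * t]
                for row in self.texture[tile_y * t:(tile_y + 1) * t]]

    def fetch(self, x, y):
        t = self.tile_size
        tile_x, tile_y = x // t, y // t
        tag = (tile_x, tile_y)
        line = (tile_x + tile_y * 31) % self.num_lines  # simple made-up hash
        if self.tags[line] != tag:  # the tag check runs on every access, hit or miss
            self.misses += 1
            self.tags[line] = tag
            self.lines[line] = self._load_tile(tile_x, tile_y)
        else:
            self.hits += 1
        return self.lines[line][y % t][x % t]

# Usage: read a 16x16 block of texels in raster order from a 64x64 texture.
SIZE = 64
texture = [[y * SIZE + x for x in range(SIZE)] for y in range(SIZE)]
cache = SoftwareTextureCache(texture)
values = [cache.fetch(x, y) for y in range(16) for x in range(16)]
print(cache.hits, cache.misses)
```

The point of the sketch is the cost structure: the address math and tag compare are paid on every fetch, which a hardware cache does for free alongside the filtering.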
 
PS3 Linux wise, I think this patent is much more interesting (hopefully):

http://appft1.uspto.gov/netacgi/nph...uter"+AND+2007&RS=AN/"Sony+Computer"+AND+2007

:D.

(hint: every XDR memory access coming over the FlexIO bus from the GPU, RSX, is checked in real time to see whether it touches an allowed memory range or not)
 
Procedural texture

What does this mean, my friend? Also, what about procedural texture shaders? Are they possible on the SPEs?

Procedural texture image
http://download.profxengine.com/gallery/renders/ivy_door.jpg

Procedural Game 64kb only
http://en.wikipedia.org/wiki/.kkrieger

Or using small textures instead. What can work?
 