News about Rambus and the PS3

Yeah, teapots and bowling pins and an armadillo with like 1200 floating point ops per fragment.

Then reduce the number of FP ops :p

Deferred shading...

Don't need REYES for this.

But this will help the REYES-like renderer...


I think that a micro-polygon based renderer has a place in PlayStation 3's lifetime...

Well, if it can't operate per fragment, it needs to, to be competitive with Xbox2 and GC2.

It should operate per fragment too, nothing prohibits the APUs on the Visualizer from processing Pixel Programs while the APUs on the Broadband Engine do T&L and run Vertex Programs... I just do not see the Pixel Engines in the Visualizer being OVERLY complex, and the architecture seems generally designed to push tons of small polygons instead of bigger polygons with a high degree of multi-texturing required.


Tons of simple primitives ( single textured or flat shaded ) pushed to a streamlined Rasterizer ( the Pixel Engine part of the GPU ) in large quantities by a monster CPU with tons of local bandwidth...

Shaders will use a lot of textures, not just for color but for other things as well, to compute the final fragments either directly or from micropolygons. There isn't going to be a lot of single-textured stuff.

The Shaders can use as many textures as they prefer, I remember the REYES pipeline and what happens in the Shading stage ( texture input is one of those things )... whether they are procedurally generated ones or not...

What I was referring to was the configuration of the Rasterizer unit: like DeanoC was commenting in a post about REYES-like renderers a while ago, the Rasterizer doesn't need to be overly complex and doesn't need to do tons of texture layers each cycle...

Textures are sampled in the Shading phase, and when the sea of micro-polygons is sent to the rasterizer, what we will worry about is whether the Rasterizer can draw them on screen as fast as they come...

The micro-polygons, after the Shaders, do not arrive with tons of textures to be applied in layers... during the shading phase the textures were sampled and the color of the micro-polygons was processed... if we want to accelerate things and use the Shaders to process the micro-polygon until a single texture remains to be applied, we can... after all, the GPU will probably be supporting texturing as I do not think they can ask developers to move en masse to REYES-like processing overnight...
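Just to make the ordering concrete, here is a rough sketch of the classic REYES flow ( bound/split, dice, shade, sample ) as I understand it from the original paper; every type and function name below is a placeholder of mine, nothing from the patent or a real API:

Code:
// Rough sketch of the REYES ordering being discussed (bound/split -> dice ->
// shade -> sample). Every type and function here is a placeholder, not an
// actual API: the point is only that shade() is where all texture reads
// happen, so the rasterizer at the end just samples already-coloured quads.
#include <deque>
#include <vector>

struct Primitive   { bool big = false; };   // HOS patch, subdiv surface, ...
struct MicroGrid   { };                     // grid of ~1/4-pixel micro-polygons
struct Framebuffer { };

bool onScreen(const Primitive&)                { return true; }  // bound & cull (stub)
std::vector<Primitive> split(const Primitive&) { return { Primitive{}, Primitive{} }; }
MicroGrid dice(const Primitive&)               { return {}; }    // dice into micro-polygons
void shade(MicroGrid&)                         { }               // displacement + surface shaders
void sample(const MicroGrid&, Framebuffer&)    { }               // jittered sampling / z-test only

void render(std::deque<Primitive> work, Framebuffer& fb)
{
    while (!work.empty()) {
        Primitive p = work.front(); work.pop_front();
        if (!onScreen(p)) continue;                    // bound & cull
        if (p.big) {                                   // too big to dice: split & requeue
            for (auto h : split(p)) { h.big = false; work.push_back(h); }
            continue;
        }
        MicroGrid grid = dice(p);                      // dice
        shade(grid);                                   // colour is final after this point
        sample(grid, fb);                              // rasterizer never touches a texture
    }
}

int main() { Framebuffer fb; std::deque<Primitive> work(3); work[1].big = true; render(work, fb); }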

You will still have people using the regular OpenGL pipeline processing ( I think we should see OpenGL 2.0 ).


That Imagine processor is a lot like a single PE in the BE.

Except that we should have 4 PEs in the BE, we have much more local bandwidth thanks to the e-DRAM, and the PEs should also be clocked higher than 400 MHz...

Also, the BE would have a bit more local memory: the Imagine Stream Processor has 128 KB of SRF ( Stream Register File ) divided between the 8 SIMD clusters, while a single PE has 128 KB and thirty-two 128-bit GPRs per APU...

The BE should have the clock and resource advantage over the Imagine Processor...

I can see the influence of the Imagine on Cell... Sony supposedly has been actively collaborating with universities world-wide, and they could have brought into the Cell project some of the results they got...

Small polygons don't equate to micropolygons and REYES-style rendering. A simple polygon doesn't mean that you don't require a lot of textures to compute it.

Look at the Stanford paper comparing REYES and OpenGL... where does the REYES pipeline spend the most time?

In the Geometry phase ( slicing 'n dicing, Shaders, etc... ) and much less time on the Rasterizing phase...

This kind of gives you an idea of where the processing-bound part of the rendering time is... giving the Pixel Engines the capability of doing two/four textures per cycle and then offering loop-back would be wasted on a REYES-like renderer; what we need is a VERY fast CPU to do the Shading part, and when we have a 1 TFLOPS class CPU I think we have a nice candidate for the job...
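To put some very rough numbers on the "shading-bound" claim ( all my own assumptions: 1280x720, 1/4-pixel micro-polygons, 60 fps, no overdraw or off-screen dicing counted ):

Code:
// Back-of-the-envelope shading budget: 1280x720, 1/4-pixel micro-polygons,
// 60 fps, a "1 TFLOPS class" shading machine. Ignores overdraw and off-screen
// dicing, so treat the per-micro-polygon number as an upper bound.
#include <cstdio>

int main()
{
    const double pixels      = 1280.0 * 720.0;    // ~0.92 Mpixels
    const double mpPerFrame  = pixels * 4.0;      // 1/4-pixel micro-polygons
    const double mpPerSecond = mpPerFrame * 60.0; // ~221 million / s
    const double flops       = 1.0e12;            // assumed CPU budget
    std::printf("micro-polygons/frame  : %.1f M\n", mpPerFrame  / 1e6);
    std::printf("micro-polygons/second : %.1f M\n", mpPerSecond / 1e6);
    std::printf("FP ops per micro-polygon (upper bound): %.0f\n", flops / mpPerSecond);
}

Even as an upper bound, ~4500 FP ops per micro-polygon looks like a healthy shading budget, which is why I'd rather spend the transistors on FLOPS than on multi-texturing loop-back.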

The GPU of PlayStation 3, the Visualizer as described in the patent, contains its fair share of APUs that can assist the Pixel Engines by running Pixel Programs, or that can assist the overall rendering of the REYES-like renderer by helping the BE balance the Geometry processing load...

Distributing the processing load on a Cell system would be facilitated, as the architecture was designed to have the standard units of work, the Apulets, travel from Cell to Cell to find the APU that can process them ( in short: software Cells/Apulets can migrate if the host system is running at full capacity and another connected device [the GPU would be "connected"] has the ability to process the Apulet and return it back in time... it would be disadvantageous if it took more time for the Apulet to be sent, processed and received than to wait for a local APU to be free ).
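Something like this toy cost model is how I picture that migration decision; the comparison of a local wait against the round trip, and every name here, are mine — the patent only talks about absolute timers and sending software cells to connected devices with spare APUs:

Code:
// Toy model of the "ship the Apulet elsewhere?" decision: migrate only when
// the round trip plus remote execution beats waiting for a local APU to free
// up. The cost model and every name here are mine, not from the patent.
#include <cstdio>

struct ApuletCost {
    double localWait;   // time until a local APU is free (ms)
    double localExec;   // time to run it locally (ms)
    double sendTime;    // time to push code + data to the connected device (ms)
    double remoteExec;  // time to run it on the remote APU (ms)
    double returnTime;  // time to get the result back (ms)
};

bool shouldMigrate(const ApuletCost& c)
{
    double stayHere = c.localWait + c.localExec;
    double sendAway = c.sendTime + c.remoteExec + c.returnTime;
    return sendAway < stayHere;   // otherwise just queue for a local APU
}

int main()
{
    ApuletCost c{ 2.0, 1.0, 0.3, 1.0, 0.3 };   // made-up numbers
    std::printf("migrate: %s\n", shouldMigrate(c) ? "yes" : "no");
}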
 
Then reduce the number of FP ops

Procedural textures don't come cheap.


But this will help the REYES-like renderer...

Only if the scene is as complex as the one in the movie.


It should operate per fragment too, nothing prohibits the APUs on the Visualizer from processing Pixel Programs while the APUs on the Broadband Engine do T&L and run Vertex Programs... I just do not see the Pixel Engines in the Visualizer being OVERLY complex, and the architecture seems generally designed to push tons of small polygons instead of bigger polygons with a high degree of multi-texturing required.

Then it will be better to use per-fragment shading and the typical OGL pipe instead of REYES. You will only want to use REYES if the resultant image is significantly better than the one that can be done on the OGL pipe.


The Shaders can use as many textures as they prefer, I remember the REYES pipeline and what happens in the Shading stage ( texture input is one of those things )... whether they are procedurally generated ones or not...

What I was referring to was the configuration of the Rasterizer unit: like DeanoC was commenting in a post about REYES-like renderers a while ago, the Rasterizer doesn't need to be overly complex and doesn't need to do tons of texture layers each cycle...

Ohh, I was talking about textures with respect to the amount of memory. (We are on a memory thread after all :) ) But yes, we don't need a lot of texture units, with the availability of vertex and fragment shaders.

Textures are sampled in the Shading phase, and when the sea of micro-polygons is sent to the rasterizer, what we will worry about is whether the Rasterizer can draw them on screen as fast as they come...

I never thought the rasterizer would be the limiting factor. Why are you looking into the rasterizer?



Except that we should have 4 PEs in the BE, we have much more local bandwidth thanks to the e-DRAM, and the PEs should also be clocked higher than 400 MHz...

It's also scalable, similar to how the PE is scalable. They can put 64 of those processors together and get 1 TFLOPS.

Also, the BE would have a bit more local memory: the Imagine Stream Processor has 128 KB of SRF ( Stream Register File ) divided between the 8 SIMD clusters, while a single PE has 128 KB and thirty-two 128-bit GPRs per APU...

The BE should have the clock and resource advantage over the Imagine Processor...

Of course it will. That Imagine processor is only, what, 20+ million transistors.

I can see the influence of the Imagine on Cell... Sony supposedly has been actively collaborating with universities world-wide, and they could have brought into the Cell project some of the results they got...

The results are published; if we can get them, so could Sony.


Look at the Stanford paper comparing REYES and OpenGL... where does the REYES pipeline spend the most time?

In the Geometry phase ( slicing 'n dicing, Shaders, etc... ) and much less time on the Rasterizing phase...

This kind of gives you an idea of where the processing-bound part of the rendering time is... giving the Pixel Engines the capability of doing two/four textures per cycle and then offering loop-back would be wasted on a REYES-like renderer; what we need is a VERY fast CPU to do the Shading part, and when we have a 1 TFLOPS class CPU I think we have a nice candidate for the job...

I am not arguing about the rasterizing bit. What I am saying is that using per-fragment shading and polygons larger than micropolygons, i.e. the maximum image quality the OGL pipe can give while still working efficiently, will give better performance with similar image quality compared to its REYES-like counterpart. Like I said before, we don't want REYES with quality compromised; that would be against the goal of that algo.
 
V3 said:
Then reduce the number of FP ops

Procedural textures don't come cheap.


But this will help the REYES-like renderer...

Only if the scene is as complex as the one in the movie.

1. we have FLOPS ;)

2. Deferred Shading reduces the Shading load... unless you are telling me that sorting the HOS/Subdivision surfaces will take more rendering time than the time we save by reducing the Shading load?


It should operate per fragment too, nothing prohibits the APUs on the Visualizer from processing Pixel Programs while the APUs on the Broadband Engine do T&L and run Vertex Programs... I just do not see the Pixel Engines in the Visualizer being OVERLY complex, and the architecture seems generally designed to push tons of small polygons instead of bigger polygons with a high degree of multi-texturing required.

Then it will be better to use per-fragment shading and the typical OGL pipe instead of REYES. You will only want to use REYES if the resultant image is significantly better than the one that can be done on the OGL pipe.

There are advantages... using micro-polygons the size of 1/2 or 1/4th of a pixel will help with things like nice displacement mapping ( we need to work on a sub-pixel level for proper displacement mapping anyways... ).



Textures are sampled in the Shading phase, and when the sea of micro-polygons is sent to the rasterizer, what we will worry about is whether the Rasterizer can draw them on screen as fast as they come...

I never thought the rasterizer would be the limiting factor. Why are you looking into the rasterizer?

Fill-rate and Set-up engine concerns mainly...


I am not arguing about the rasterizing bit. What I am saying is that using per-fragment shading and polygons larger than micropolygons, i.e. the maximum image quality the OGL pipe can give while still working efficiently, will give better performance with similar image quality compared to its REYES-like counterpart. Like I said before, we don't want REYES with quality compromised; that would be against the goal of that algo.

I do not want shitty quality for a REYES-like renderer either...

What I would like is to bring in a uniform approach: everything gets diced into micro-polygons, and micro-polygons are the basic unit that gets Shaded...

No Vertex and Pixel Shading... only micro-polygon Shading :)

I understand REYES was conceived for Quality rendering, and I am not saying I would want it to be slow either...

Let's use, as the REYES paper says, micro-polygons the size of 1/4th of a pixel... that will generate more micro-polygons; we can reduce the Shading load by doing Hidden Surface Removal before Shading, except we would have to displace the transformed micro-polygons before culling unseen ones...

Transform, depth sort the HOS control points, slice 'n dice only visible patches plus visible and invisible ones that are using displacement mapping ( you can find ways to embed this info )... you will still have a Z-buffer to rely on...

Instead of 16 samples use 4 or 8 for stochastic AA ( it is the randomly jittered pattern that matters and 4 samples is not that low )...
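For reference, the jittered pattern is just a stratified grid with one random sample per cell; 2x2 gives the 4-sample case and 4x4 the 16-sample case from the REYES paper ( 8 samples would need a different stratification ). A minimal sketch, nothing PS3-specific:

Code:
// Stratified ("jittered") sample positions for one pixel: an n x n grid of
// cells, one random sample per cell. n = 2 gives the 4-sample case, n = 4 the
// 16-sample case from the REYES paper. Nothing PS3-specific here.
#include <cstdio>
#include <random>
#include <vector>

struct Sample { float x, y; };   // offsets inside the pixel, in [0,1)

std::vector<Sample> jitteredSamples(int n, std::mt19937& rng)
{
    std::uniform_real_distribution<float> u(0.0f, 1.0f);
    std::vector<Sample> s;
    s.reserve(n * n);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            s.push_back({ (i + u(rng)) / n,      // random point inside cell (i, j)
                          (j + u(rng)) / n });
    return s;
}

int main()
{
    std::mt19937 rng(42);
    for (const Sample& s : jitteredSamples(2, rng))   // the 4-sample case
        std::printf("(%.3f, %.3f)\n", s.x, s.y);
}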

There are advantages in using a REYES-like model that are unrelated to Image Quality... ( ease of programming could be one, the REYES pipeline is quite logical, uniform and neat to follow )

*Much easier to Vectorize and distribute all the shading operations across all available APUs... Shading done at a single stage in the pipeline and on uniform objects ( micro-polygons in a grid, eye-space )

*No need for perspective correction of textures ( speeds up the Geometric Transform )... on PSOne titles, where perspective correction was not available, they reduced the texture warping by subdividing geometry more finely... it would be interesting to take a game like BG: DA or another highly detailed next-generation game and find a way to disable perspective correction for textures... the texture warping would not be as bad on average as it was on PSOne and Saturn... ( see the interpolation sketch after this list )

*Higher texture locality: a great deal of texture thrashing is avoided as Shading is done in object order...

*easier clipping
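On the perspective-correction point above, here is what gets skipped: the OGL-style path interpolates u/w, v/w and 1/w and needs a divide per sample, while plain affine interpolation does not. With micro-polygons smaller than a pixel the two answers are practically identical, so the divide can be dropped. Purely illustrative numbers below:

Code:
// What "no perspective correction" skips: the OGL-style path interpolates
// u/w, v/w and 1/w and needs a divide per sample; affine interpolation just
// lerps u and v. With micro-polygons smaller than a pixel the endpoints have
// nearly equal w, so the two give practically the same answer.
#include <cstdio>

struct Vtx { float u, v, w; };   // texture coords + homogeneous w at a vertex

void perspectiveUV(const Vtx& a, const Vtx& b, float t, float& u, float& v)
{
    float invW   = (1 - t) * (1.0f / a.w) + t * (1.0f / b.w);
    float uOverW = (1 - t) * (a.u / a.w)  + t * (b.u / b.w);
    float vOverW = (1 - t) * (a.v / a.w)  + t * (b.v / b.w);
    u = uOverW / invW;   // one divide (or reciprocal) per sample
    v = vOverW / invW;
}

void affineUV(const Vtx& a, const Vtx& b, float t, float& u, float& v)
{
    u = (1 - t) * a.u + t * b.u;   // no divide
    v = (1 - t) * a.v + t * b.v;
}

int main()
{
    Vtx a{0.0f, 0.0f, 1.0f}, b{1.0f, 0.0f, 4.0f};   // a big edge spanning lots of depth
    float up, vp, ua, va;
    perspectiveUV(a, b, 0.5f, up, vp);
    affineUV(a, b, 0.5f, ua, va);
    std::printf("perspective u = %.3f, affine u = %.3f\n", up, ua);   // 0.200 vs 0.500
}

With a.w = 1 and b.w = 4 ( a big primitive spanning a lot of depth ) the two differ a lot; shrink the primitive until a.w ~ b.w and the difference vanishes, which is the micro-polygon case.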
 
I am wondering about that article from Mercury News about the PS3 CELL.

it said 72 processors. (8 PPCs and 64 APUs)

that would mean 8 PEs instead of 4, for the Broadband Engine/ PS3 CPU.

do you think they made a mistake?
 
I think IQ-wise the GSCube can afford bigger textures and much higher AA thanks to the fact it has more memory available, but processing-power-wise the Cell set-up we have in that patent ( a 1 TFLOPS class machine ) surpasses the regular GSCube...


yeah, the regular GSCube has a floating point performance of 97.5 GFLOPS
( 6.2 GFLOPS x 16 ) - perhaps there is some overhead or wasted FP power, or the EEs are clocked slightly lower than PS2 EEs, because 6.2 GFLOPS x 16 = 99.2 GFLOPS. But whatever, 97.5 or 99.2 GFLOPS for GSCube vs the 1 TFLOPS CPU of PS3.

The GSCube has much more main memory (128MB x 16 = 2048 MB) and eDRAM video memory (32MB on GS x 16 = 512 MB eDRAM)

plus absolutely enormous raw fillrate. It's listed as 37.7 billion pixels/sec: 16 GS x 16 pixel pipes x clock (144~150 MHz).
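The arithmetic above, written out ( the 147 MHz GS clock is my assumption; anything in the quoted 144~150 MHz range lands near 37.7 Gpixels/s ):

Code:
// Re-deriving the GSCube figures quoted above. The 147 MHz GS clock is my
// assumption; anything in the 144~150 MHz range lands near 37.7 Gpixels/s.
#include <cstdio>

int main()
{
    const double flopsPerEE = 6.2e9;    // EE peak
    const double chips      = 16;
    const double gsPipes    = 16;       // pixel pipes per GS
    const double gsClockHz  = 147e6;    // assumed
    std::printf("FP       : %.1f GFLOPS\n", flopsPerEE * chips / 1e9);             // ~99.2
    std::printf("fillrate : %.1f Gpixels/s\n", chips * gsPipes * gsClockHz / 1e9); // ~37.6
}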


official GSCube specs:

GScube's memory bus bandwidth is 50.3 Gbytes/second, and it has a floating-point performance of 97.5 gigaflops and 3-D CG geometric transformation of 1.04 gigapolygons/s. It has 512 Mbytes of video RAM and a VRAM bandwidth of 755 Gbytes/s. The pixel fill rate is 37.7 Gbytes/s (no, that has to be GPixels!) and the polygon drawing rate is 1.2 gigapolygons/s.


that is the regular GSCube with 16 PS2 chipsets and more memory per chipset. The GSCube with 64 PS2 chipsets is 4x more in every area, but I don't know if that version ever made it out.



back to regular GSCube.... the main bandwidth is 50+ GB/sec - PS3 could match that if it has 4 channels of XDR. the GSCube's eDRAM bandwidth is 755 GB/sec. it'll be interesting to see what the eDRAM bandwidth is for PS3's CPU and GPU. hopefully in the hundreds of GB/sec.

PS3 should crush the GSCube in geometry processing. GSCube only does 1.2 billion raw polys/sec; PS3 should be well beyond that.
 
1. we have FLOPS

Then reduce the number of FP ops

;) 1-2 TFLOPS is a lot, but not plenty.

2. Deferred Shading reduces the Shading load... unless you are telling me that sorting the HOS/Subdivision surfaces will take more rendering time than the time we save by reducing the Shading load?

Yes, using occlusion culling will give you a speed-up.

There are advantages... using micro-polygons the size of 1/2 or 1/4th of a pixel will help with things like nice displacement mapping ( we need to work on a sub-pixel level for proper displacement mapping anyways... ).

Again, this is an image fidelity thing. If the PS3 can render those dinosaurs from a movie like Disney's Dinosaur, then displacement mapping of that quality is needed. Anything less than that, and OGL displacement mapping will most likely suffice.

I do not want shitty quality for a REYES-like renderer either...

What I would like is to bring in a uniform approach: everything gets diced into micro-polygons, and micro-polygons are the basic unit that gets Shaded...

No Vertex and Pixel Shading... only micro-polygon Shading

For the reasonable expectation of image quality that we are going to get, going with vertex and fragment shading should give better performance.


I understand REYES was conceived for Quality rendering, and I am not saying I would want it to be slow either...

Let's use, as the REYES paper says, micro-polygons the size of 1/4th of a pixel... that will generate more micro-polygons; we can reduce the Shading load by doing Hidden Surface Removal before Shading, except we would have to displace the transformed micro-polygons before culling unseen ones...

Transform, depth sort the HOS control points, slice 'n dice only visible patches plus visible and invisible ones that are using displacement mapping ( you can find ways to embed this info )... you will still have a Z-buffer to rely on...

Instead of 16 samples use 4 or 8 for stochastic AA ( it is the randomly jittered pattern that matters and 4 samples is not that low )...

For the PS3, with limited memory, it will probably have to use the bucket system if REYES is to have any chance of working there.
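For anyone unfamiliar, "bucketing" here means carving the screen into tiles and only keeping one tile's micro-polygons and sample buffer resident at a time, which is how a REYES renderer keeps its working set bounded. A rough sketch with made-up types:

Code:
// Rough idea of bucketed rendering: bin primitives (or diced grids) into
// screen tiles, then fully shade, sample and write out one tile at a time so
// only that tile's micro-polygons and samples live in memory. Types are mine.
#include <cstdio>
#include <vector>

struct GridBounds { int minX, minY, maxX, maxY; };   // screen-space bounds
struct Bucket     { std::vector<GridBounds> grids; };

std::vector<Bucket> binIntoBuckets(const std::vector<GridBounds>& grids,
                                   int screenW, int screenH, int tile)
{
    int bx = (screenW + tile - 1) / tile;
    int by = (screenH + tile - 1) / tile;
    std::vector<Bucket> buckets(bx * by);
    for (const GridBounds& g : grids)
        for (int ty = g.minY / tile; ty <= g.maxY / tile; ++ty)
            for (int tx = g.minX / tile; tx <= g.maxX / tile; ++tx)
                if (tx >= 0 && tx < bx && ty >= 0 && ty < by)
                    buckets[ty * bx + tx].grids.push_back(g);   // touches this tile
    return buckets;   // each bucket is then processed and freed before the next
}

int main()
{
    std::vector<GridBounds> grids = { {0, 0, 10, 10}, {630, 350, 639, 359} };
    std::printf("buckets: %zu\n", binIntoBuckets(grids, 640, 360, 32).size());
}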

There are advantages in using a REYES-like model that are unrelated to Image Quality... ( ease of programming could be one, the REYES pipeline is quite logical, uniform and neat to follow )

The OGL pipe is quite simple already. But using the REYES pipeline will give movie productions an advantage when they can reuse their assets from the movie to make a game based on it. But with PS3 at 1-2 TFLOPS and limited memory, most likely they will have some reworking to do.


*Much easier to Vectorize and distribute all the shading operations across all available APUs... Shading done at a single stage in the pipeline and on uniform objects ( micro-polygons in a grid, eye-space )

vertex and fragment shaders are vectorisable too.

*No need for perspective correction of textures ( speeds up the Geometric Transform )...

At the cost of slicing and dicing.

*Higher texture locality: a great deal of texture thrashing is avoided as Shading is done in object order...

Yes, that's desirable too.

*easier clipping

Clipping against the camera is more troublesome than typical OGL clipping.
 
The GSCube has much more main memory (128MB x 16 = 2048 MB) and eDRAM video memory (32MB on GS x 16 = 512 MB eDRAM)

Hmm, I thought the PS2 had 32MB main memory, which would be 512MB (32*16). Also it should be 4 * 16 for the video memory, which is 64MB, which is the same as the 512 Mbytes that is stated later in your post.


back to regular GSCube.... the main bandwidth is 50+ GB/sec - PS3 could match that if it has 4 channels of XDR. the GSCube's eDRAM bandwidth is 755 GB/sec. it'll be interesting to see what the eDRAM bandwidth is for PS3's CPU and GPU. hopefully in the hundreds of GB/sec.

Again you are trying to change bits into bytes; 755 gigabits is just under 100 GB/sec, which is very possible for the PS3 to achieve.
 
Hmm, I thought the PS2 had 32MB main memory, which would be 512MB (32*16). Also it should be 4 * 16 for the video memory, which is 64MB, which is the same as the 512 Mbytes that is stated later in your post.

He's talking about the GScube though, not a bunch of parallel PS2s...
 
I am not going to be on topic, but if the PS3 has 256 MB of main RAM it will be a big mistake. It's going to be like 32 MB in the year 2005/6. Sony should really think twice.
 
archie4oz said:
Hmm, I thought the PS2 had 32MB main memory, which would be 512MB (32*16). Also it should be 4 * 16 for the video memory, which is 64MB, which is the same as the 512 Mbytes that is stated later in your post.

He's talking about the GScube though, not a bunch of parallel PS2s...

Ah I see, I got confused when I saw MBytes written out; I just glanced and thought it said Mbits.

Ah, so the GSCube is not 16 PS2's in a box, it has much more memory per EE and GS?
 
V3,

Remember that Open Source REYES renderer you gave me the link of ?

Well, before putting it on PlayStation 2 Linux, I downloaded it on my Red Hat box at work ( EV56 400 MHz and 256 MB of RAM, Red Hat 7.2 )...

It was a bit of a pain to compile..

The author had set the home directory in all the makefiles as <user>/src/reyes, the bool.h file was missing, define statements were lacking, it could not link -lg++ ( I was able to "successfully" compile by skipping it ), <string.h> was typed as <String.h>...

What made it all worse is that the program segfaults, and the author's e-mail is not working anymore :(
 
Panajev2001a,

Yeah, that site is old. It's in my bookmarks; I was surprised it's still up.

There is another one, a real-time one using OGL. It's probably in my bookmarks, but my bookmarks are unorganised and have like several thousand entries, so I can't find it :oops:

Maybe if you google around, the site might still be up.

Tsmit42,

Ah, so the GSCube is not 16 PS2's in a box, it has much more memory per EE and GS?

Yeah, it's different. If you want parallel PS2s, there was another article on that posted on this board too; look several pages back.
 
Is Reyes style rendering on PS2 even a good idea? When the GS fills triangles in untextured mode, it does so in a 4 * 4 pattern. Wouldn't only filling one pixel or less mean that it only uses a fraction of its potential fillrate?
I seem to recall that the optimal size for a PS2 triangle is 32 pixels?

Is it even technically possible to use something near the full fillrate when rendering untextured or textured geometry on ps2?

Do other architectures like PC/Xbox or GameCube have similar problems with attaining their full fillrate, due to similar "niggles" in the actual filling process?
 
Squeak said:
Is Reyes style rendering on PS2 even a good idea? When the GS fills triangles in untextured mode, it does so in a 4 * 4 pattern. Wouldn't only filling one pixel or less mean that it only uses a fraction of its potential fillrate?
I seem to recall that the optimal size for a PS2 triangle is 32 pixels?

Is it even technically possible to use something near the full fillrate when rendering untextured or textured geometry on ps2?

Do other architectures like PC/Xbox or GameCube have similar problems with attaining their full fillrate, due to similar "niggles" in the actual filling process?

AFAIK most architectures would lose A LOT of fill rate if used for a Reyes-style renderer. They all use 2x2 or larger patterns, with any pixel not covered by the current triangle being wasted.
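Roughly what that waste looks like in the ideal case ( real hardware behaviour varies; these are just the block sizes mentioned above ):

Code:
// Idealized fill-rate utilization when the rasterizer always works in fixed
// blocks: a micro-polygon covering ~1 pixel lights up at most 1 pixel of the
// block, and the rest of that block's fill is wasted for the cycle.
#include <cstdio>

int main()
{
    const double covered  = 1.0;        // ~1-pixel micro-polygon
    const double block2x2 = 2.0 * 2.0;  // common quad pattern
    const double block4x4 = 4.0 * 4.0;  // GS untextured mode, as mentioned above
    std::printf("2x2 blocks: %.1f%% of peak fill\n", 100.0 * covered / block2x2);  // 25.0%
    std::printf("4x4 blocks: %.1f%% of peak fill\n", 100.0 * covered / block4x4);  //  6.3%
}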
 
PS2 has 32 MB main memory and 4 MB on GS

however

GSCube has 128 MB main memory per PS2 chipset and 32 MB eDRAM on each GS

so GSCube has:

128 MB x 16 = 2048 MegaBytes or 2 GigaBytes main memory
32 MB eDRAM x 16 = 512 MegaBytes eDRAM graphics memory
 
"Again you are trying to change bits into bytes, 755gigabits is just under 100GB/sec, which is very possible for ps3 to achieve."

but actually we are talking about the GSCube here, and it is 755 GigaBytes per second of video memory bandwidth, not 755 gigabits. It comes from roughly the PS2 GS's 48 GB/sec eDRAM bandwidth x 16 GS's on the GSCube.

The GSCube's GS's have roughly the same bandwidth as the PS2 GS, but each also has 32 MB eDRAM instead of 4 MB eDRAM.

PS3's eDRAM bandwidth could very well be in the 100s of GigaBytes per second if the main memory bandwidth is 25.6 GB/sec.

Look at PS2. Its main memory bandwidth is only 3.2 GB/sec but its video memory bandwidth is 48 GB/sec - that's 15x greater. If the same ratio applied to PS3, it would have 384 GB/sec of graphics memory bandwidth (25.6 x 15), and don't forget PS3 will have 2 pools of eDRAM, on the GPU and on the CPU. So PS3 will likely have two processors each with hundreds of GB/sec of eDRAM bandwidth.
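That ratio argument, written out ( the 15x figure is just carried over from PS2; nothing says PS3 has to keep the same ratio, and the 25.6 GB/s main bandwidth is itself a guess ):

Code:
// The "same ratio as PS2" extrapolation from the post above, nothing more.
#include <cstdio>

int main()
{
    const double ps2Main  = 3.2;                 // GB/s
    const double ps2Edram = 48.0;                // GB/s
    const double ratio    = ps2Edram / ps2Main;  // 15x
    const double ps3Main  = 25.6;                // GB/s, the assumed XDR figure
    std::printf("PS2 eDRAM/main ratio       : %.0fx\n", ratio);
    std::printf("PS3 eDRAM at the same ratio: %.0f GB/s\n", ps3Main * ratio);  // ~384
}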
 
megadrive0088 said:
"Again you are trying to change bits into bytes, 755gigabits is just under 100GB/sec, which is very possible for ps3 to achieve."

but actually we are talking about the GSCube here, and it is 755 GigaBytes per second of video memory bandwidth, not 755 gigabits. It comes from roughly the PS2 GS's 48 GB/sec eDRAM bandwidth x 16 GS's on the GSCube.

The GSCube's GS's have roughly the same bandwidth as the PS2 GS, but each also has 32 MB eDRAM instead of 4 MB eDRAM.

PS3's eDRAM bandwidth could very well be in the 100s of GigaBytes per second if the main memory bandwidth is 25.6 GB/sec.

Look at PS2. Its main memory bandwidth is only 3.2 GB/sec but its video memory bandwidth is 48 GB/sec - that's 15x greater. If the same ratio applied to PS3, it would have 384 GB/sec of graphics memory bandwidth (25.6 x 15), and don't forget PS3 will have 2 pools of eDRAM, on the GPU and on the CPU. So PS3 will likely have two processors each with hundreds of GB/sec of eDRAM bandwidth.


I understand, but you originally posted 755 gigabits before you made the edit to your post.
 
Here's what I think the specs will be for the PS3... the wording is kept as close as I can make it to how they would be listed in a real specs list:


theoretical polygon rate of 15 billion per second.....

ACTUAL draw rate will be about 10-25% of that figure, so we are looking at an actual rate of 1.5-4 billion polys per second... (which at 60fps gives you over 25 million on each and every frame!!)

pixel fill rate will be about 60 - 80 gigapixels (60 - 80,000,000,000 pixels a second)... the texel fill rate will be about the same as the pixel fill.

memory bandwidth... I think will be between about 75 - 150 gigabytes/sec (sorry for the range on that; it depends on whether it's a 512-bit or 1024-bit bus). Again, there is a formula to calculate that I can give if required.

the core processor will run at 4 GHz.... (totalling 1 teraflop of TOTAL throughput, broken down into around 300 GFLOPS of complex/spaghetti code, as it's known (supplied by the 4 main processor cores on the CELL), and 700 GFLOPS of simple floating point arithmetic (supplied by the FP co-processors alongside the four main cores, which are also on the CELL))... see the quick arithmetic sketch at the end of this post.

the graphics chip will run at about half of that, so 2 GHz.....
Will be built on 65nm (0.065 micron) fabrication technology....

I think the core processor will comprise around 250 million transistors....
the graphics chip will be about 400 million transistors....

I think the machine will have 512mb of memory.

the render precision will be at 128 bit

As for what the graphics quality and output will be..... well, all I can think is 'Toy Story to Warcraft 3 FMVs, maybe better, in real time', and that's not just marketing talk....
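The quick arithmetic mentioned above: the usual way people get to '~1 TFLOPS' from the patent is 4 PEs x 8 APUs x 8 FP ops per cycle at 4 GHz; the 8 ops/cycle ( 4-wide FMADD ) is the commonly assumed figure, not a confirmed spec, and how it would split between 'spaghetti' and streaming code is anyone's guess:

Code:
// The usual back-of-the-envelope for the patent's Broadband Engine:
// 4 PEs x 8 APUs x 8 FP ops/cycle x 4 GHz ~= 1 TFLOPS. The 8 ops/cycle
// (4-wide FMADD) is the commonly assumed figure, not a confirmed spec.
#include <cstdio>

int main()
{
    const double pes = 4, apusPerPE = 8, opsPerCycle = 8, clockHz = 4.0e9;
    std::printf("BE peak: %.3f TFLOPS\n", pes * apusPerPE * opsPerCycle * clockHz / 1e12);
}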



what do you guys think?
 
theoretical polygon rate of 15 billion per second.....

This was my prediction also..

pixel fill rate will be about 60 - 80 gigapixels (60 - 80,000,000,000 pixels a second)... the texel fill rate will be about the same as the pixel fill.

Way too high.

memory bandwidth... I think will be between about 75 - 150 gigabytes/sec

Internal or external memory? If external, I am unsure. Current predictions are 25 GB/s.

the core processor will run at 4 GHz

3-4 is my guess.

the graphics chip will run at about half of that, so 2 GHz.....

Way, way too high; expect 800 MHz - 1 GHz.

I think the machine will have 512mb of memory.

I too, think this will be the external memory.

As for what the graphics quality and output will be..... well, all I can think is 'Toy Story to Warcraft 3 FMVs, maybe better, in real time', and that's not just marketing talk....

I expect Final Fantasy the movie-type tech demos, with the actual graphics being that of Final Fantasy X quality CGI.
 
I expect Final Fantasy the movie-type tech demos, with the actual graphics being that of Final Fantasy X quality CGI.

Whoa, don't you think (FFTSW) tech demos is putting the bar in the stratosphere? Something resembling it, maybe.
 