Could..in THEORY Cell do FSAA.

jvd · Jul 17, 2005

Jaws, once again, the bandwidth needed for super sampling and hdr is not what you quoted. The rsx will have to access ram to store the information it's working on. Thus there are 2 buses to ram: a 25.something and a 22.something, which access the ram. However, the cell chip will also need to access one of the two pools of ram.

The 7800gtx, however, never has to share its dedicated ram bus.

The bandwidth between the rsx and cell doesn't matter unless you truly believe that tiling the framebuffer into the spe's small ram pool is going to be effective for housing hdr + ssaa data.

MechanizedDeath · Jul 17, 2005

jvd said:
Jaws said:

jvd said:

...
The fact of the matter is the rsx will have the same or less bandwidth as a 7800gtx .
...

Please stop posting misinformation. We've been through this on many occasions. The current fact in this context, is that the RSX has 22.4+35 ~ 57 GB/sec of bandwidth available to it, although with varying latencies.

Please stop posting misinformation. We've been through this before. The cell cpu will still need to access ram also.

Aside from that, to actually store a hdr + ssaa buffer and other relevant data, the only bandwidth numbers that matter are the bandwidth to ram, which would be

25.6 and 22.3 giving 47.9, but that is if the cell chip never has to use any of the ram bandwidth for itself.

35GB is from FlexIO to RSX. 25.6GB is from XDR to Cell. XDR access has no impact on FlexIO. That 35GB/s is steady, so *IF* RSX just works back and forth with the SPEs or PPE and never touches XDR, then that's a legit 35GB/s, as has been stated numerous times. Unless you can prove that RSX HAS to hit up XDR, then maybe you should drop this line of argument. PEACE.
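The two totals being argued over come from adding up different links. A minimal sketch of that arithmetic, using only the figures quoted in this thread (22.4 GB/s GDDR3, 25.6 GB/s XDR, 35 GB/s FlexIO). The dictionary keys and function names are illustrative, and `cell_xdr_share` is a hypothetical knob for how much XDR traffic Cell keeps for itself:

```python
# Link bandwidths in GB/s, as quoted in this thread (not measured values).
LINKS = {"gddr3": 22.4, "xdr": 25.6, "flexio": 35.0}

def rsx_total_via_flexio(links=LINKS):
    """Jaws's tally: RSX's local GDDR3 plus everything reachable over FlexIO."""
    return links["gddr3"] + links["flexio"]

def ram_only_total(links=LINKS, cell_xdr_share=0.0):
    """jvd's tally: only links that end in a RAM pool count.

    cell_xdr_share is a hypothetical fraction (0..1) of XDR bandwidth
    that Cell consumes for its own work and RSX therefore cannot use.
    """
    return links["gddr3"] + links["xdr"] * (1.0 - cell_xdr_share)

print(round(rsx_total_via_flexio(), 1))
print(round(ram_only_total(), 1))
print(round(ram_only_total(cell_xdr_share=0.5), 1))
```

With these inputs the first tally is about 57 GB/s and the second about 48 GB/s (the 47.9 quoted above comes from using 22.3 rather than 22.4 for GDDR3). The disagreement is over which links should count, not over the sums themselves.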

jvd · Jul 17, 2005

35GB is from FlexIO to RSX

Except there is no ram at the other end of this, except the cache and small ram in the cell processor. Thus it doesn't help with storing the hdr + ssaa data that needs to be worked on and read from and written to.

That 35GB/s is steady, so *IF* RSX just works back and forth with the SPEs or PPE and never touches XDR, then that's a legit 35GB/s, as has been stated numerous times

Except, as I've stated, there is nowhere to work on the hdr / ssaa data. So what are you doing with this bandwidth?

Can you provide the data on how exactly you're going to do hdr + ssaa in the cell's ram without hitting the xdr ram?

It simply won't.

Inane_Dork · Jul 17, 2005

jvd said:
Can you provide the data on how exactly you're going to do hdr + ssaa in the cell's ram without hitting the xdr ram?

It simply won't.

nAo, IIRC, was thinking of storing the framebuffer in the LS of various SPEs. It would perform quickly enough if the RSX is able to address LS, which, as far as I know, has not been shown to be feasible.

j^aws · Jul 17, 2005

jvd said:
Jaws, once again, the bandwidth needed for super sampling and hdr is not what you quoted. The rsx will have to access ram to store the information it's working on. Thus there are 2 buses to ram: a 25.something and a 22.something, which access the ram. However, the cell chip will also need to access one of the two pools of ram.

The 7800gtx, however, never has to share its dedicated ram bus.

The bandwidth between the rsx and cell doesn't matter unless you truly believe that tiling the framebuffer into the spe's small ram pool is going to be effective for housing hdr + ssaa data.

You still do not understand.

Code:

CELL + RSX Block:

................L2_Cache_[PPE]..................TurboCache_[RSX]
................|...............................|...............
................|...............................|...............
................+---<---+---<----FlexIO---->----+MMU............
................|EIB....|.......................|...............
................|4xRings|.......................|...............
....7x_LS-------+--->---+.......................GDDR3...........
....[SPEs]..............|.......................................
........................|.......................................
........................XDR.....................................

http://www.beyond3d.com/forum/viewtopic.php?t=24483

I suggest you do some more reading/understanding on the subject. I've provided some links above.

Look carefully at the bus structures. Bandwidth is data flow. The fact is that RSX has 57 GB/sec *available* to *RSX*.

It has that much *dataflow* whether CELL is using 1 GB/sec or 25 GB/sec. RSX *still* has 57 GB/sec available to *RSX*.

What you posted earlier is not fact but WRONG.

jvd said:
...
The fact of the matter is the rsx will have the same or less bandwidth as a 7800gtx .
...

No. This is wrong. It's 57 GB/sec with varying latencies.

jvd · Jul 17, 2005

Inane_Dork said:
jvd said:

Can you provide the data on how exactly you're going to do hdr + ssaa in the cell's ram without hitting the xdr ram?

It simply won't.

nAo, IIRC, was thinking of storing the framebuffer in the LS of various SPEs. It would perform quickly enough if the RSX is able to address LS, which, as far as I know, has not been shown to be feasible.

Yes, there have been ideas of what to do. However, the LS won't be big enough to fit all the data in. Thus you're going to have to tile it, and you're going to have to store the tiles somewhere while they aren't being worked on.

Fafalada · Jul 17, 2005

It would perform quickly enough if the RSX is able to address LS

LS resides in same address space as XDR - so either RSX can address both, or neither.

jvd, FB is not the only bandwidth that will ever be needed, in fact there's nothing saying it will even be the largest.

jvd · Jul 17, 2005

jvd, FB is not the only bandwidth that will ever be needed, in fact there's nothing saying it will even be the largest.

No, of course not. However, we are specifically talking about hdr + fsaa being done.

It's all good and dandy if the cell makes some textures and sends them to the rsx, along with other data that would be needed between the two.

But we are talking specifically about hdr + ssaa.

So as I said, the data will need to be stored while it's worked on, so where are you going to store it? There are only two areas large enough to store the data needed where it won't cripple or greatly reduce the efficiency of other functions, and that's the two ram pools.

The flexio bus between the two doesn't come into effect when talking about hdr + ssaa.

Titanio · Jul 17, 2005

Fafalada said:
It would perform quickly enough if the RSX is able to address LS

LS resides in same address space as XDR - so either RSX can address both, or neither.

I've a question you might be able to answer - the SPEs can snoop data off Flexio into LS, IIRC, but does that data have to pass on through the EIB and onto XDR, or if it's not needed in XDR can you prevent that? In other words, when the SPEs take data off Flexio, do they copy it off, or take it off completely?

Shifty Geezer · Jul 17, 2005

The FlexIO is direct to Cell internal storage and is separate from other RAM bandwidths. FlexIO access doesn't interfere with XDR or DDR. I think XDR access goes through FlexIO (Jaws' diagram shows this) so to access XDR, you have to eat into FlexIO BW.

Now onto the topic of backbuffers!

I'm with jvd here. We know backbuffer work is very consumptive of BW, to the point that MS went with a relatively low system BW and saved a lot with separate backbuffer space. HDR+SSAA needs a lot of BW to write data into a cohesive backbuffer. If RSX is to work on a single monolithic data structure, it's gonna need a large store and that means DDR. 22 GB/s isn't enough for all the fancy effects, especially at higher res. BW saving, such as sending data to RSX from Cell and thus not eating into RAM BW, means all of that may be available. But it's still not enough.

The only solution is a massively tiled renderer on Cell, using SPE LS to process tiddly little tiles. This adds an extra 35 GB/s but is it feasible and will it impact heavily on the performance of Cell? And is that much BW still going to be enough?

Titanio · Jul 17, 2005

Shifty Geezer said:
The only solution is a massively tiled renderer on Cell, using SPE LS to process tiddly little tiles. This adds an extra 35 GB/s but is it feasible and will it impact heavily on the performance of Cell? And is that much BW still going to be enough?

Well, once the tiles are in the SPEs, then the bandwidth intensive ops themselves would be eating SPE-to-LS bandwidth, no?

I honestly don't know how feasible it is, but as a concept, moving the frame into and out of SPE LS once per op (for AA, or HDR or whatever) shouldn't eat too much BW?
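To put a rough number on "once per op", here is a back-of-the-envelope sketch. Every input is an assumption chosen for illustration, not a PS3 spec: a 1280x720 target, 4 samples per pixel, 8 bytes per sample (FP16 RGBA), and 60 frames per second. The function name `pass_traffic_gb_per_s` is made up for this example.

```python
def pass_traffic_gb_per_s(width, height, samples, bytes_per_sample, fps, passes=1):
    """Bandwidth needed to stream a whole frame into LS and back out.

    Each pass reads every sample once and writes it once, so the
    traffic is 2 x frame size per pass. Returns GB/s (1 GB = 1e9 bytes).
    """
    frame_bytes = width * height * samples * bytes_per_sample
    return 2 * frame_bytes * passes * fps / 1e9

# Assumed example: 720p, 4x supersampled, FP16 RGBA, 60 fps.
one_pass = pass_traffic_gb_per_s(1280, 720, 4, 8, 60)
two_passes = pass_traffic_gb_per_s(1280, 720, 4, 8, 60, passes=2)
print(f"{one_pass:.2f} GB/s for one pass, {two_passes:.2f} GB/s for two")
```

Under those assumptions a single in-and-out pass costs a few GB/s, which is small next to the 35 GB/s FlexIO figure. The sketch ignores depth data, overdraw, and whatever else is using the link, so treat it as a lower bound on the idea rather than a prediction.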

Fafalada · Jul 18, 2005

jvd said:
But we are talking specifically about hdr + ssaa.

Fair enough - but I don't feel qualified to discuss this in detail until I know more internal details about RSX and the rest of the system. But for what it's worth, I can see ways to render this without running into bandwidth issues - it's the SSAA incurring extra shader costs that worries me really.
That and the more concerning question - could we do non-ordered grid SS?

Titanio said:
In other words, when the SPEs take data off Flexio, do they copy it off, or take it off completely?

Well if that data was sent into SPEs localstore to begin with, it'll only go that far from what I understand(and possibly up to L2 if it's flagged as cacheable). XDR doesn't contain shadow copies of SPE localstores.

Shifty Geezer · Jul 18, 2005

Titanio said:
Shifty Geezer said:

The only solution is a massively tiled renderer on Cell, using SPE LS to process tiddly little tiles. This adds an extra 35 GB/s but is it feasible and will it impact heavily on the performance of Cell? And is that much BW still going to be enough?

Well, once the tiles are in the SPEs, then the bandwidth intensive ops themselves would be eating SPE-to-LS bandwidth, no?

Of course. Silly me

Acert93 · Jul 18, 2005

Inane_Dork said:
jvd said:

Can you provide the data on how exactly you're going to do hdr + ssaa in the cell's ram without hitting the xdr ram?

It simply won't.

nAo, IIRC, was thinking of storing the framebuffer in the LS of various SPEs. It would perform quickly enough if the RSX is able to address LS, which, as far as I know, has not been shown to be feasible.

I am not sure how people can realistically talk about the performance hit Xenos will take with tiled rendering (e.g. 3 tiles for 720p FP10 4x MSAA with the 10MB eDRAM die... 6 tiles for FP16) and then turn around and honestly suggest doing the same process on non-dedicated silicone that has much, much smaller memory blocks.

We know Xenos has some hardware work arounds to make the process go smooth, but it STILL has to spend time fixing the tile seams. The more tiles, the larger hit to performance. ATI is talking about ~5% with 3 tiles.

With 256KB tile segments you are talking about 40 tiles with the PS3 instead of 3 tiles with Xenos (or worse, almost 80 tiles with FP16... and this is all at 720p. Moving on up to 1080p basically doubles the tiles to the 160 range... sorry, too lazy to do all the exact math right this moment).
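For anyone who wants to do the exact math, a small sketch of the tile-count arithmetic. The sample sizes are assumptions: 8 bytes per sample for FP10 (4 colour + 4 depth) and 16 bytes per sample for FP16, picked so that the eDRAM case reproduces the 3 and 6 tiles mentioned above. Treating a full 256 KB local store as one tile is also an assumption, and an optimistic one, since SPE code and other data share that space.

```python
import math

def tiles_needed(width, height, samples, bytes_per_sample, tile_bytes):
    """Number of tiles of tile_bytes needed to hold the whole multisampled buffer."""
    buffer_bytes = width * height * samples * bytes_per_sample
    return math.ceil(buffer_bytes / tile_bytes)

EDRAM = 10 * 1024 * 1024   # Xenos eDRAM, 10 MB
LS = 256 * 1024            # one SPE local store, 256 KB (upper bound on tile size)

# Assumed bytes per sample: 8 for FP10, 16 for FP16 (see the note above).
for name, bps in (("FP10", 8), ("FP16", 16)):
    print(name,
          tiles_needed(1280, 720, 4, bps, EDRAM),
          tiles_needed(1280, 720, 4, bps, LS))
```

Under these assumptions the eDRAM counts match the 3 and 6 tiles quoted for Xenos, while the 256 KB case comes out at well over a hundred tiles at 720p rather than 40. So the rough figures in the paragraph above look conservative, if anything. The real numbers depend on the actual sample layout and on how much of each local store is free.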

And then there is the issue of maxing out CPU performance for graphical tasks. Where is the game code going to run? On one PPE?

Is it possible? Almost anything is possible with processors.

Is it a realistic use and approach to the hardware? . . .

In the end PS3 developers will focus on, and exploit, the strengths of the system. Using up the SPEs' LS memory seems like a waste of silicone and effort. This obsession with trying to one-up the competition on every checkbox item is old. They each have strengths. End of story. An entire 3D console generation of HW (or 2!) did well without much AA or HDR. Look at the NV "Max" demo. Games can look breathtaking without using both features.

And I am sure consumers would rather see the SPEs used for stuff like intense geometry and vertex deformation (like realistic waves and destructible buildings!), accurate physics, extreme amounts of particle effects, advanced AI, interactive environments, etc... These are CELL's strengths and I am hoping it is used in these areas, not trying to "one up" the Xbox. They are different designs, there are going to be areas where one will be better in a specific task, benchmark, or metric. It is not the end of the world!

(Ps- I know Inane you are not suggesting this... your post seemed like a good launching point)

Tap In · Jul 18, 2005

good post Acert

PS3 has its strengths and X360 has theirs.

Trying to match it item for item by utilizing hardware better suited to other needs seems like a waste of energy.

I really wonder how many devs would even bother (even if it is possible).

These systems are different and they were designed that way on purpose.

I'm excited to see how they each will be used to their own strengths without trying to force an issue to match the other system.

aaaaa00 · Jul 18, 2005

Just a minor spelling nitpick:

silicone -

Any of a group of semi-inorganic polymers based on the structural unit R2SiO, where R is an organic group, characterized by wide-range thermal stability, high lubricity, extreme water repellence, and physiological inertness and used in adhesives, lubricants, protective coatings, paints, electrical insulation, synthetic rubber, and prosthetic replacements for body parts.

silicon -

A nonmetallic element occurring extensively in the earth's crust in silica and silicates, having both an amorphous and a crystalline allotrope, and used doped or in combination with other materials in glass, semiconducting devices, concrete, brick, refractories, pottery, and silicones. Atomic number 14; atomic weight 28.086; melting point 1,410°C; boiling point 2,355°C; specific gravity 2.33; valence 4.

People confuse the two all the time, even though it's really really hard to make an IC out of silicone.

Jawed · Jul 18, 2005

Xenos - 192 simple but dedicated processors running at 500MHz

Cell - 7 "general purpose" SPEs running at 3.2GHz

Jawed

BenSkywalker · Jul 18, 2005

To the general bandwidth issue: running HDR and rendering out duplicate frames, you are going to be GPU limited in almost any situation. Running MSAA consumes almost the same amount of bandwidth as SSAA; the problem with SSAA has been the brute force it takes in terms of fillrate (obviously now shader power). If you use a relatively decent shader load and HDR, then bandwidth is going to be far removed from your largest priority the overwhelming majority of the time. If we were talking about an MS solution then bandwidth would be the limiting factor, but not using SS.

Faf-

That and the more concerning question - could we do non-ordered grid SS?

Shouldn't you be able to use the VSs to set up a nigh redundant frame at a sub pixel offset of your choosing? Not sure, but I would assume that this level of functionality should be well within the limits of the RSX.

Fafalada · Jul 18, 2005

BenSkywalker said:
Shouldn't you be able to use the VSs to set up a nigh redundant frame at a sub pixel offset of your choosing?

With accumulation we might run into trouble with texture bandwidth... (or even vertex, depending on what you're doing).
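The idea being discussed (render the frame several times at different sub-pixel offsets and accumulate the results) can be sketched in a few lines. This is a toy CPU model, not RSX code: the offset values are one illustrative rotated-grid pattern, and `render`, `accumulate`, and `edge` are hypothetical names standing in for a real render pass, an accumulation buffer, and a scene.

```python
# Four sub-pixel offsets in a rotated-grid pattern (units: pixels).
# These particular values are an illustrative choice, not an RSX sample pattern.
ROTATED_GRID = [(-0.125, -0.375), (0.375, -0.125), (0.125, 0.375), (-0.375, 0.125)]

def render(width, height, offset, inside):
    """Toy stand-in for one full render pass, shifted by a sub-pixel offset.

    inside(x, y) is a hypothetical scene function returning True where
    geometry covers the point; each pixel is sampled once at centre + offset.
    """
    dx, dy = offset
    return [[1.0 if inside(x + 0.5 + dx, y + 0.5 + dy) else 0.0
             for x in range(width)]
            for y in range(height)]

def accumulate(width, height, offsets, inside):
    """Average one pass per offset, as an accumulation buffer would."""
    total = [[0.0] * width for _ in range(height)]
    for offset in offsets:
        frame = render(width, height, offset, inside)
        for y in range(height):
            for x in range(width):
                total[y][x] += frame[y][x] / len(offsets)
    return total

def edge(x, y):
    """A near-vertical edge: everything left of x = 1.5 + 0.1*y is covered."""
    return x < 1.5 + 0.1 * y

image = accumulate(4, 2, ROTATED_GRID, edge)
print(image[0])
print(image[1])
```

Pixels that straddle the edge end up with fractional coverage, which is the anti-aliasing effect. Each offset costs a complete pass over the scene, so textures and vertices are fetched again every time, which is where the texture and vertex bandwidth worry above comes from.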

blakjedi · Jul 18, 2005

Jawed said:
Xenos - 192 simple but dedicated processors running at 500MHz

Cell - 7 "general purpose" SPEs running at 3.2GHz

Jawed

My thoughts exactly...
