RSX: memory bottlenecks?

ihamoitc2005 said:
What I am curious about is the PS2 emulation method. Maybe a full PS2 is inside as a component? I don't know. If someone has this knowledge please make a post on it. Thank you.

Well I would assume that they'd emulate on Cell. The SPEs have more than enough horsepower to software-render the EE's effects, and so does RSX. I expect they'll use the "Brute Force" method of emulating :)
 
!eVo!-X Ant UK said:
Well I would assume that they'd emulate on Cell. The SPEs have more than enough horsepower to software-render the EE's effects, and so does RSX. I expect they'll use the "Brute Force" method of emulating :)

It's the 48GB/s of bandwidth inside the GS to its own eDRAM that's the problem to emulate on PS3, seeing as it doesn't have any eDRAM and both the main RAM bandwidth and the GDDR bandwidth are lower than that.
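
For reference, a quick sketch comparing the commonly quoted peak figures; these are theoretical peaks, not sustained rates:

```python
# Peak bandwidth figures as commonly quoted; sustained rates are lower.
GS_EDRAM_BW = 48.0   # GB/s: PS2's GS to its 4MB on-die eDRAM
XDR_BW      = 25.6   # GB/s: PS3's Cell to XDR main RAM
GDDR3_BW    = 22.4   # GB/s: PS3's RSX to its GDDR3 VRAM

for name, bw in [("XDR", XDR_BW), ("GDDR3", GDDR3_BW)]:
    print(f"{name}: {bw} GB/s = {bw / GS_EDRAM_BW:.0%} of the GS eDRAM peak")
```

Each pool on its own sits at roughly half the GS's internal figure, which is the crux of the emulation worry.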
 
Could the RSX have been reworked to have eDRAM, but only for BC purposes? 4MB of eDRAM wouldn't help that much in today's games, or would it?...
 
Platon said:
Could the RSX have been reworked to have eDRAM, but only for BC purposes? 4MB of eDRAM wouldn't help that much in today's games, or would it?...

It "could"... But then again, anything "could" happen... We'll just have to wait and see. Personally i'm not expecting any eDRAM. Besides, if it's there, why use it only for BC? Sure it's not as big as it should be, but it would definately turn out useful for some things.
 
I put my money on the "full PS2 inside PS3" approach, as it was with PS2 (the PSOne chip(s?) acted as an IO controller or somesuch).
The EE+GS don't need any active cooling in the PSTwo, and I guess they take up little enough board space and are cheap enough to manufacture to make them a viable option for PS3 backwards compatibility.
I don't really know how the Cell + RSX architecture would accept the EE + GS as an IO chip, and maybe even as a chip that could assist in next-gen EyeToy image processing/motion capture, sound processing... but somehow I just feel it would be easier and cheaper for Sony to go that way instead of software emulation, which must take a lot of work if they hope to get even 90% of games working.
Edit: Oh yes, the eDRAM of the EE+GS... I don't know whether it would have any use beyond BC in PS3.
 
london-boy said:
It's the 48GB/s of bandwidth inside the GS to its own eDRAM that's the problem to emulate on PS3, seeing as it doesn't have any eDRAM and both the main RAM bandwidth and the GDDR bandwidth are lower than that.
Was the 48GB/s ever fully saturated? I guess there was a lot of headroom, with the bottlenecks being elsewhere, so in reality the PS3 bandwidth could be enough.
All guesses on my part of course, as I have no deeper knowledge of the PS2's workings.
 
Kutaragi's commented a fair bit on BC. He's said they can use Cell for a software emu, but he wants hardware assistance to get things 'perfect' and accommodate developers doing things in a less-than-usual manner. He's also said there's no eDRAM in RSX, as you'd need loads of eDRAM to fit 1080p if you aren't tile rendering.

From this it's pretty safe to say, assuming things haven't changed, there's a degree of hardware BC but no eDRAM or full PS2 chipset; otherwise, why waste time with a software emu? Even if the PS2 chipset is only 5 quid, that'd be like 500 million pounds over the life of PS3. If they can save the cash by using PS3's hardware, it'd make sense to do so.

The 48 GB/s seems the limiting factor, but PS2 didn't support hardware compression. At an average 4x compression, the GDDR BW is plenty. I don't know what the difference in latency would be, though, between this and PS2's eDRAM. My guess is some GS emulation on RSX, which accounts in part for RSX's long development from just a G70, and software EE emulation on Cell.
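
A minimal sketch of the compression argument, taking the post's 4:1 average as an assumption rather than a measured figure:

```python
# If GS-style traffic compresses ~4:1 on average (assumed, not measured),
# the effective bandwidth needed falls well under RSX's GDDR3 peak.
GS_EDRAM_BW = 48.0   # GB/s: worst-case PS2 GS eDRAM traffic
COMPRESSION = 4.0    # assumed average compression ratio
GDDR3_BW    = 22.4   # GB/s: RSX's local VRAM peak

effective_need = GS_EDRAM_BW / COMPRESSION   # 12.0 GB/s
print(f"Effective need: {effective_need} GB/s vs {GDDR3_BW} GB/s available")
```

Latency is the open question the post raises: eDRAM is on-die, while a GDDR3 round trip is far longer, and no compression ratio fixes that.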
 
nAo said:
One might want to use that 'remaining part' to store all those kinds of data requiring a moderate amount of bandwidth.

What's a moderate amount of bandwidth? :p

I've wondered for a while what a reasonable amount of VRAM bandwidth to reserve for the framebuffer might be, in order to exploit the remaining VRAM bandwidth for texture/vertex reads etc. I've been guessing low-to-medium single digits, maybe 4-9GB/s? I know there are many factors affecting this, and it'd be very variable, but still, I'm wondering if you have any specific range in mind when you say "moderate"?
 
For texture bandwidth I guess you could calculate based on texture fetches. Assuming one surface per pixel (ignoring alpha-blended effects), at 5 textures per surface and 32 bits per texture, that'd be 2 million pixels (1080p) * 20 bytes = 40 megabytes a frame, 240 MB/s. That's uncompressed data too. If this back-of-an-envelope maths is valid, texture BW shouldn't be massive. :???:
 
london-boy said:
It's the 48GB/s of bandwidth inside the GS to its own eDRAM that's the problem to emulate on PS3, seeing as it doesn't have any eDRAM and both the main RAM bandwidth and the GDDR bandwidth are lower than that.

Couldn't you do it on Cell, and have the EIB act as the 48GB/s link and have the SPEs write the data into the VRAM? Or am I talking trash??? I am trying, though :)
 
!eVo!-X Ant UK said:
Couldn't you do it on Cell, and have the EIB act as the 48GB/s link and have the SPEs write the data into the VRAM? Or am I talking trash??? I am trying, though :)

The internal cache of Cell isn't 4MB.

Whatever way you look at it, without some more hardware somewhere, the PS2's eDRAM (either its bandwidth, or just its size relative to the PS3 chips' on-die storage) just won't fit on PS3.
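
A rough tally of Cell's on-die storage against the GS's 4MB, using the PS3 configuration (eight SPEs fabbed, one disabled for yield, one reserved for the OS):

```python
# Cell's on-die storage vs the GS's 4MB eDRAM (PS3 configuration).
SPE_LOCAL_STORE_KB = 256   # per-SPE local store
PPE_L2_KB          = 512   # the PPE's L2 cache
GAME_SPES          = 6     # SPEs available to a game

on_die_kb = GAME_SPES * SPE_LOCAL_STORE_KB + PPE_L2_KB   # 2048 KB
print(f"Usable on-die storage: {on_die_kb} KB vs 4096 KB of GS eDRAM")
```

And unlike the GS's eDRAM, that storage is scattered across seven separate pools that the EIB has to shuttle data between.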
 
Shifty Geezer said:
If this back-of-an-envelope maths is valid, texture BW shouldn't be massive. :???:

More like the back of a tiger trying to eat you.
I'd love to see that actually: trying to scribble numbers on a tiger's back while the beast tries to eat your appendages...
 
Shifty Geezer said:
For texture bandwidth I guess you could calculate based on texture fetches. Assuming one surface per pixel (ignoring alpha-blended effects), at 5 textures per surface and 32 bits per texture, that'd be 2 million pixels (1080p) * 20 bytes = 40 megabytes a frame, 240 MB/s. That's uncompressed data too. If this back-of-an-envelope maths is valid, texture BW shouldn't be massive. :???:

40 MB/frame -> 240 MB/s implies 6 frames per second! Hmmm, must be Killzone!

60 fps -> 2.4 GB/s...

x 4 texels per sample

x 24 TMUs

~ 230 GB/s!
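
Laying the corrected arithmetic out end to end, with the exchange's assumptions (5 layers, 32-bit texels, bilinear's 4 texels per sample) marked as such:

```python
# Back-of-envelope texture bandwidth, reproducing the exchange above.
pixels      = 2_000_000   # ~1080p
layers      = 5           # textures per surface (assumed)
bytes_texel = 4           # 32-bit texels, uncompressed (assumed)

per_frame = pixels * layers * bytes_texel        # 40,000,000 bytes = 40 MB
at_60fps  = per_frame * 60 / 1e9                 # 2.4 GB/s
# Worst case as constructed above: every TMU pulling 4 uncached texels.
worst     = at_60fps * 4 * 24                    # ~230 GB/s

print(f"{per_frame / 1e6:.0f} MB/frame, {at_60fps:.1f} GB/s at 60fps, "
      f"~{worst:.0f} GB/s worst case")
```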
 
So I was out by a factor of ten. Big deal. NASA has larger tolerances than that :p
That full texturing capacity figure, 230 GB/s to feed 24 TMUs, sounds like a waste of silicon to me. Are those TMUs ever going to be sharing data? Presumably that number is there to support texturing when the pipes aren't doing something else. A simultaneous 24-way grab at RAM will cause a good many stalls.
 
Shifty Geezer said:
That full texturing capacity figure, 230 GB/s to feed 24 TMUs, sounds like a waste of silicon to me. Are those TMUs ever going to be sharing data? Presumably that number is there to support texturing when the pipes aren't doing something else. A simultaneous 24-way grab at RAM will cause a good many stalls.

Batches of 1024 fragments (given G70) should be executed across each quad, and these fragments may be involved in dependent texturing, i.e. the result of one texture op is needed by another... but I'm not sure if that would remain within a batch or be needed across batches...

That 230 GB/s is a worst-case scenario from your description. I don't think 24 TMUs are a waste, as long as they can be used. Utilizing all 24 TMUs in one cycle is unlikely, but you can think of it as load balancing between peak texture ops and shader ops (bandwidth permitting)...

Texture compression and reducing the number of texture layers would reduce that number by an order of magnitude. Also, in certain cases, procedural texturing could be used instead to save BW...
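
A rough illustration of those mitigations; the compression ratio and layer count below are assumptions for illustration, not measurements:

```python
# How compression and fewer layers eat into the 230 GB/s worst case.
worst_case  = 230.0   # GB/s, from the earlier post
compression = 8.0     # e.g. DXT1 packs 32-bit texels roughly 8:1
layers_old  = 5
layers_new  = 3       # assumed reduced layer count

mitigated = worst_case / compression * (layers_new / layers_old)
print(f"~{mitigated:.0f} GB/s after compression and fewer layers")
```

230/8 x 3/5 comes to roughly 17 GB/s: about the order-of-magnitude reduction the post describes.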
 
All 24 TMUs in G70 are working in lockstep. Don't forget that each quad shares an L1 cache, and that the entire set shares an L2 cache.

A lot of TMU operations will re-use nearby texture data (due to a degree of texture "prefetch" which is a side effect of the tiled nature of textures across RAM chips).

Jawed
 
Jawed said:
All 24 TMUs in G70 are working in lockstep.

What do you mean by lockstep, that all 24 TMUs will always be used?

Each quad is an independent SIMD superscalar processor. It's quite conceivable that a fragment batch/thread being executed by a quad may not have any texture instructions, and therefore the TMUs within that quad may not be utilized. Or did you mean something else?
 
You're right, the quads in G70 are MIMD - I was thinking of NV40, where the quads are lockstep.

Jawed
 
Jawed said:
You're right, the quads in G70 are MIMD - I was thinking of NV40, where the quads are lockstep.

Jawed

Really? Do you have a source for that info? I thought G70 was SIMD across all quads.
 