Xbox 360 texture limitation?

version

Both GPUs read geometry, textures, and game data from RAM.
The Xbox 360 has 22 GB/s of RAM bandwidth; the PS3 has 22 + 25 GB/s.

If the game engine uses 10 GB/s for game data and 5 GB/s for geometry data, the bandwidth left for textures on the Xbox 360 will be very low, about 7 GB/s, while the PS3 would have about 20 GB/s.

 
You forget the framebuffer bandwidth. Calculate how much the RSX would use with 4xAA, 1280*720 color + Z, and of course with overdraw and additional rendertargets... and see how much of that extra 22 GB/s is left for anything :)
 
Laa-Yosh said:
You forget the framebuffer bandwidth. Calculate how much the RSX would use with 4xAA, 1280*720 color + Z, and of course with overdraw and additional rendertargets... and see how much of that extra 22 GB/s is left for anything :)

How much is the framebuffer bandwidth in the general case?
 
Culled from:

http://www.xbitlabs.com/news/multimedia/display/20040426094105.html
http://www.beyond3d.com/previews/nvidia/nv40/index.php?p=8

[xbox2_scheme_bg.gif - Xenon block diagram]


Leak said:
...
The Xenon GPU is a custom 500+ MHz graphics processor from ATI. The shader core has 48 Arithmetic Logic Units (ALUs) that can execute 64 simultaneous threads on groups of 64 vertices or pixels. ALUs are automatically and dynamically assigned to either pixel or vertex processing depending on load. The ALUs can each perform one vector and one scalar operation per clock cycle, for a total of 96 shader operations per clock cycle. Texture loads can be done in parallel to ALU operations. At peak performance, the GPU can issue 48 billion shader operations per second.

The GPU has a peak pixel fill rate of 4+ gigapixels/sec (16 gigasamples/sec with 4× antialiasing). The peak vertex rate is 500+ million vertices/sec. The peak triangle rate is 500+ million triangles/sec. The interesting point about all of these values is that they’re not just theoretical—they are attainable with nontrivial shaders.

Xenon is designed for high-definition output. Included directly on the GPU die is 10+ MB of fast embedded dynamic RAM (EDRAM). A 720p frame buffer fits very nicely here. Larger frame buffers are also possible because of hardware-accelerated partitioning and predicated rendering that has little cost other than additional vertex processing. Along with the extremely fast EDRAM, the GPU also includes hardware instructions for alpha blending, z-test, and antialiasing.
...
...
Eight pixels (where each pixel is color plus z = 8 bytes) can be sent to the EDRAM every GPU clock cycle, for an EDRAM write bandwidth of 32 GB/sec. Each of these pixels can be expanded through multisampling to 4 samples, for up to 32 multisampled pixel samples per clock cycle. With alpha blending, z-test, and z-write enabled, this is equivalent to having 256 GB/sec of effective bandwidth! The important thing is that frame buffer bandwidth will never slow down the Xenon GPU.
...
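As a rough sanity check on those EDRAM figures, here's a minimal sketch assuming the 500 MHz clock and the 8-byte colour + Z pixel described in the leak:

```python
# Back-of-envelope check of the leaked Xenon EDRAM numbers.
# Assumptions: 500 MHz GPU clock, 8 pixels/clock to EDRAM, 8 bytes per pixel (colour + Z).
GPU_CLOCK_HZ = 500e6
PIXELS_PER_CLOCK = 8
BYTES_PER_PIXEL = 8

write_bw = GPU_CLOCK_HZ * PIXELS_PER_CLOCK * BYTES_PER_PIXEL
print(f"EDRAM write bandwidth: {write_bw / 1e9:.0f} GB/s")     # ~32 GB/s

# With 4x multisampling each pixel expands to 4 samples (32 samples/clock).
# Alpha blend + z-test + z-write mean reading and writing each sample's
# colour and Z, i.e. 16 bytes of traffic per sample on a conventional bus.
SAMPLES_PER_CLOCK = PIXELS_PER_CLOCK * 4
BYTES_PER_SAMPLE = 16
effective_bw = GPU_CLOCK_HZ * SAMPLES_PER_CLOCK * BYTES_PER_SAMPLE
print(f"Effective bandwidth:   {effective_bw / 1e9:.0f} GB/s")  # ~256 GB/s
```

So the 256 GB/s figure is the read-modify-write traffic that the EDRAM logic absorbs internally, which is why the leak calls it "effective" bandwidth.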
I've lost track of the full text of the leak :(

Jawed
 
I expect that second-gen and later games for the Xbox 360 will make use of procedural textures, as the R500 has some improvements in that area.
 
version said:
Laa-Yosh said:
You forget the framebuffer bandwidth. Calculate how much the RSX would use with 4xAA, 1280*720 color + Z, and of course with overdraw and additional rendertargets... and see how much of that extra 22 GB/s is left for anything :)

How much is the framebuffer bandwidth in the general case?

I don't know what "the general case" is, but I figure a 1280*720 colour (32-bit) + Z buffer (16-bit - is that OK?) would consume ~1.6 GB/s of bandwidth at 60 frames per second with 5x overdraw. Someone might want to check my figures, though; I'm kind of just trying to figure this out myself. I don't know how AA would affect that figure (4x AA = 4x the bandwidth?) - and what's overdraw like in a modern game? Is 5x high, low, or middle? Also, I'm not sure how many rendertargets a game would typically be juggling, or whether they'd all be of the same precision (?)
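(A minimal sketch of that arithmetic, assuming one 32-bit colour write and one 16-bit Z write per covered pixel, no reads, no AA, no compression:)

```python
# Rough framebuffer traffic: 1280x720, 4 bytes colour + 2 bytes Z per pixel,
# 60 fps, 5x overdraw.
pixels = 1280 * 720
bytes_per_pixel = 4 + 2
fps = 60
overdraw = 5

bandwidth = pixels * bytes_per_pixel * fps * overdraw
print(f"{bandwidth / 1e9:.2f} GB/s")   # ~1.66 GB/s
```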

In terms of pulling things from XDR (forgetting about latency for a second and just looking at bandwidth), with the CPU-to-GPU bandwidth being 35 GB/s and XDR being 25 GB/s, can we treat the GPU's effective bandwidth to the XDR memory as being near enough to 25 GB/s, if it had that memory all to itself?

If that were the case, then surely the PS3's GPU, in terms of main system memory bandwidth, would have whatever the X360's GPU has (XDR bandwidth minus CPU usage actually leaves a little more) plus 22 GB/s? I don't think the framebuffer would really offset that entirely (?). If the CPU takes 10 GB/s, that leaves 37 GB/s for the PS3's GPU versus 12 GB/s for the X360's (this is assuming, of course, that the FlexIO interconnect lets the GPU take as much data as XDR can feed it, which might be a little too much of an assumption, so correct me if I'm wrong!).
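(Putting that accounting into a trivial sketch, treating the 10 GB/s CPU share and the quoted peak figures as given:)

```python
# Crude main-memory bandwidth accounting using the figures in this thread.
XDR_BW   = 25.0   # GB/s, PS3 system memory
GDDR3_BW = 22.0   # GB/s, PS3 graphics memory
X360_BW  = 22.0   # GB/s, X360 unified memory (EDRAM not counted here)
CPU_USE  = 10.0   # GB/s, assumed CPU share of main memory

ps3_gpu  = (XDR_BW - CPU_USE) + GDDR3_BW   # ~37 GB/s
x360_gpu = X360_BW - CPU_USE               # ~12 GB/s
print(ps3_gpu, x360_gpu)
```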

Might be a good time to look up NVidia's TurboCache technology... ;)
 
Cell -> RSX is 20GB/s write. In the other direction it's 15GB/s.

The asymmetry may well imply that it's full-duplex.

Jawed
 
Jawed said:
Cell -> RSX is 20GB/s write. In the other direction it's 15GB/s.

The asymmetry may well imply that it's full-duplex.

If it were full-duplex, one would expect the read and write bandwidth to be exactly the same.

But we already know that the read and write channels use different, uni-directional, lanes.

Cheers
Gubbi
 
Gubbi said:
If it were full-duplex, one would expect the read and write bandwidth to be exactly the same.
Only if it was a conventional bus.

But we already know that the read and write channels use different, uni-directional, lanes.
I didn't know that (I was inferring it, though), thanks.

Jawed
 
This actually gets into a question I have. With 10 MB of EDRAM and a throughput of 32 GB/s, you could (if you assume peak throughput and constant writes) refill the entire memory pool about 3,200 times per second. Even if you assume half-duplex reading/writing and some overhead, you can still read and write the memory pool far more than 60 times per second.
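(Quick check of that refill figure, assuming peak throughput:)

```python
# How often 32 GB/s could rewrite a 10 MB pool, at peak.
edram_mb = 10
write_bw_mb_per_s = 32 * 1000          # 32 GB/s expressed in MB/s
print(write_bw_mb_per_s / edram_mb)    # ~3,200 refills per second
```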

I guess my question is: does it even matter if the EDRAM runs faster than 32 GB/s for what it's designed to do? As Titanio asked, how does AA factor into this?

Nite_Hawk
 
Talking to one developer recently about the memory split on PS3, he said that they would generally budget around a 50/50 split between system and graphics memory anyway, so it's probably the exception rather than the norm that developers would place texture data outside of graphics RAM. There is 256 MB on the graphics side for a reason; it's not there not to be used.
 
DaveBaumann said:
Talking to one developer recently about the memory split on PS3, he said that they would generally budget around a 50/50 split between system and graphics memory anyway, so it's probably the exception rather than the norm that developers would place texture data outside of graphics RAM. There is 256 MB on the graphics side for a reason; it's not there not to be used.

Dave... X360 thread...? Any info on the X360 instead? ;)
 
So... is the eDRAM used to buffer often-used texture data, kinda like a cache? Using the 25 GB/s bandwidth, fill, say, half the eDRAM with data. The GPU can access this at 64 GB/s lots of times while the other half is filled. Then the other half can be used, and more data fetched?

If not, I don't understand how the eDRAM is any use (don't understand it on PS2 either :? ), as it looks like the GPU's a barrel, the eDRAM a large drain pipe, and the RAM's a thin hose - and no matter how big the drain pipe, the barrel ain't gonna fill faster than the hose can supply it.
 
london-boy said:
DaveBaumann said:
Talking to one developer recently about the memory split on PS3, he said that they would generally budget around a 50/50 split between system and graphics memory anyway, so it's probably the exception rather than the norm that developers would place texture data outside of graphics RAM. There is 256 MB on the graphics side for a reason; it's not there not to be used.
Dave... X360 thread...? Any info on the X360 instead? ;)
ZOMG, he just predicted that the PS3 will T0T4LLY 0WNZ the 360!!!one1!!

;)

I'll approach this the same way as the EDRAM thread: we don't know enough yet. At the moment, it's rather like comparing the "on paper" specs of the PS2 to the Dreamcast, then the Xbox to the PS2... It's certainly nice to have a lot more than we HAVE had so far, but we need to see the whole picture first.

...and having one of the systems actually out so its specs ARE finalized would be good, too. :)
 
Titanio said:
version said:
Laa-Yosh said:
You forget the framebuffer bandwidth. Calculate how much the RSX would use with 4xAA, 1280*720 color + Z, and of course with overdraw and additional rendertargets... and see how much of that extra 22 GB/s is left for anything :)

How much is the framebuffer bandwidth in the general case?

I don't know what "the general case" is, but I figure a 1280*720 colour (32-bit) + Z buffer (16-bit - is that OK?) would consume ~1.6 GB/s of bandwidth at 60 frames per second with 5x overdraw. Someone might want to check my figures, though; I'm kind of just trying to figure this out myself. I don't know how AA would affect that figure (4x AA = 4x the bandwidth?) - and what's overdraw like in a modern game? Is 5x high, low, or middle? Also, I'm not sure how many rendertargets a game would typically be juggling, or whether they'd all be of the same precision (?)

Try the following numbers:

1920x1080
32-bit color read/write - 8 bytes
32-bit Z read - 4 bytes
60 fps

60 x 12 x 2,000,000 (give or take) ≈ 1.4 GB/s per pass.

Now let's try that with the 128-bit color buffers NVidia mentioned at the conference.

We have:
128-bit color read/write
32-bit Z read

So 60 x 36 x 2,000,000 ≈ 4.3 GB/s per pass.

Using a 64-bit buffer for HDR, it's about 2 GB/s per pass.

1280x720 is about half that.

You can decide for yourself what reasonable overdraw is.

This obviously doesn't include the cost of AA, or the savings from early or compressed Z, or the fact that you won't actually get anywhere near the peak bandwidth numbers for the memory unless you're just filling the screen over and over.

So, to answer your question: who knows - it'll all depend on the app and what it really wants to achieve.
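If anyone wants to play with those numbers, here's the same arithmetic as a minimal sketch (same assumptions: roughly 2M pixels at 1920x1080, 60 fps, one color read + write and one Z read per pixel per pass, no AA, no compression):

```python
# Per-pass framebuffer traffic for a ~2M pixel (1920x1080) target at 60 fps.
PIXELS = 2_000_000   # 1920*1080 is ~2.07M; "2,000,000 give or take" as above
FPS = 60

def per_pass_gbs(bytes_per_pixel):
    """Bandwidth in GB/s for one full-screen pass."""
    return PIXELS * bytes_per_pixel * FPS / 1e9

print(per_pass_gbs(4 + 4 + 4))     # 32-bit color read+write + 32-bit Z read  -> ~1.4 GB/s
print(per_pass_gbs(16 + 16 + 4))   # 128-bit color read+write + 32-bit Z read -> ~4.3 GB/s
print(per_pass_gbs(8 + 8 + 4))     # 64-bit HDR color read+write + 32-bit Z   -> ~2.4 GB/s
```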
 
Gubbi said:
If it were full-duplex, one would expect the read and write bandwidth to be exactly the same.

But we already know that the read and write channels use different, uni-directional, lanes.

You have it backwards.

Half-duplex (bi-directional, but only one direction at a time) would imply same read/write bandwidth since the same lines are used for both directions, just time-multiplexed.

Full-duplex (bi-directional, both directions simultaneously) implies that dedicated lines be reserved for read, and others for write, and the allocation of lines to read vs write can lead to different read and write bandwidths.
 