Complete Details on Xenos from E3 private showing!

DemoCoder said:
Is this confirmed? It's quite an interesting bit of info. 32 gigazixels/s is quite impressive. But if a quadrupled-up ROP can write 4 AA samples per clock (each with its own Z), how do you arrive at 8 Z samples per clock when color writes are disabled? I assume the color write logic is somehow borrowed to write the extra Z, but why isn't it 128 Z per clock then?
With color, it's 8 * 4 * (32-bit color + 32-bit Z/stencil, read and write)
Without color, it's 16 * 4 * (32-bit Z/stencil, read and write)

Where would the bandwidth for 128 Z samples come from?
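The bandwidth arithmetic in the two lines above can be checked directly. This is a minimal sketch that assumes 4-byte color and 4-byte Z/stencil samples, each read and written once, plus an eDRAM logic clock of 500 MHz, which is an assumption not stated anywhere in this thread. It shows that both modes move the same number of bytes per clock, while 128 Z samples per clock would need twice as much.

```python
# Bytes moved per clock for the two ROP modes quoted above.
# Assumption: 32-bit color and 32-bit Z/stencil per sample, each read and written.
BYTES_COLOR = 4
BYTES_Z = 4
READ_AND_WRITE = 2

def bytes_per_clock(samples_per_clock: int, with_color: bool) -> int:
    per_sample = BYTES_Z + (BYTES_COLOR if with_color else 0)
    return samples_per_clock * per_sample * READ_AND_WRITE

color_mode = bytes_per_clock(8 * 4, with_color=True)      # 8 * 4 samples with color
z_only_mode = bytes_per_clock(16 * 4, with_color=False)   # 16 * 4 Z-only samples
hypothetical_128z = bytes_per_clock(128, with_color=False)

print(color_mode, z_only_mode, hypothetical_128z)  # 512 512 1024

# Assumed 500 MHz clock (hypothetical for this sketch, not from the thread).
CLOCK_HZ = 500e6
print(color_mode * CLOCK_HZ / 1e9, "GB/s")           # 256.0 GB/s
print(16 * 4 * CLOCK_HZ / 1e9, "Gzixels/s")          # 32.0 Gzixels/s
```

Under that clock assumption, the same 512 bytes per clock lines up with both the 256GB/s and the 32 gigazixels/s figures mentioned in this thread.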
 
Yes, it's confirmed. I just had the conference call (CC) with the architects. I now have about an hour of audio to go through and turn into something coherent.
 
Dave: sorry for the OT, but is there any chance we'll have something similar (more info!) about RSX? Or do we have to wait until G70 reviews appear on the net? :)
 
I've asked, but NVIDIA can't talk about it to any great extent yet, and even when you do hear about it, I get the impression that the G70 reviews will more accurately reflect the actual architecture.
 
DaveBaumann said:
Yes, it's confirmed. I just had the conference call (CC) with the architects. I now have about an hour of audio to go through and turn into something coherent.
When will we see what you've got, Dave? Can't wait. Thanks.
 
MS are not claiming 256GB/s from CPU->GPU. SONY claimed 48GB/s of eDRAM bandwidth for the GS, but the actual EE->GS bandwidth was PUNY. If SONY can claim 48GB/s of eDRAM bandwidth, then why can't MS claim 256GB/s of eDRAM bandwidth?

I don't see the relevance of talking about the bandwidth between the EE and GS. Sony could claim 48GB/s bandwidth for the GS's eDRAM because the bus between the GS's graphics core and its eDRAM was 48GB/s. The bus between the 360's graphics core and its eDRAM is 32GB/s. Certain parts of the 360's rendering pipeline (Z and stencil buffer operations) have a 256GB/s bus to eDRAM, but not the whole thing. So I agree with Mordecaii and Shifty: it's misleading to claim 256GB/s of bandwidth to eDRAM without mentioning that the main graphics core itself uses a 32GB/s bus. Not that it's a big deal, with all the other exaggeration and PR crap that goes on in this industry :)
 
Sony could claim 48GB/s bandwidth for the GS's eDRAM because the bus between the GS's graphics core and its eDRAM was 48GB/s.

Well, *technically* no: the GS "graphics core" got 48GB/s to the page buffer... The page buffer got around 150-188GB/s to eDRAM...
 
How is that different from the GS?
Read what archie wrote - the GS has internal components running at a much higher bandwidth in a similar manner (though the associated logic in the page buffers is pretty dumb :p).

By all means, Sony should have claimed the 150+GB/s number then.
 
Fafalada said:
How is that different from the GS?
Read what archie wrote - the GS has internal components running at a much higher bandwidth in a similar manner (though the associated logic in the page buffers is pretty dumb :p).

By all means, Sony should have claimed the 150+GB/s number then.

So instead of Smart Memory, it's Dumb Memory? :LOL:
 
Jawed said:
On the X850 XT PE, 4xAA costs up to 45% of the frame rate.

ATI is claiming that 4xAA on Xenos will cost up to 5% of the frame rate.

Jawed

Not in reality, because 4xAA will take up more eDRAM, so you have to draw more tiles.

Folks, I don't know how many times I have to say this: 720p + 2xAA DOES NOT FIT INTO 10MB of eDRAM. Period. End of story. The scene has to be tiled, i.e. executed multiple times. This is required, as the Xbox 360 minimum requirement is 720p + 2xAA. Going to 4xAA may require even more tiles.

AA is 'free' at the cost of having to render the scene multiple times due to the small amount of eDRAM. We'll see how it plays out on beta hardware.
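The tiling claim can be checked against raw framebuffer sizes. This is a sketch assuming no compression (32-bit color plus 32-bit Z/stencil stored for every sample) and reading "10MB" as 10 MiB; any compression scheme would change the numbers.

```python
import math

# Assumptions: "10MB" read as 10 MiB, and no compression, so every AA sample
# stores 32-bit color plus 32-bit Z/stencil.
EDRAM_BYTES = 10 * 1024 * 1024
BYTES_PER_SAMPLE = 4 + 4

def framebuffer_bytes(width: int, height: int, samples: int) -> int:
    """Uncompressed color + Z/stencil storage for every AA sample."""
    return width * height * samples * BYTES_PER_SAMPLE

def tiles_needed(width: int, height: int, samples: int) -> int:
    """Smallest number of equal eDRAM-sized passes that holds the whole frame."""
    return math.ceil(framebuffer_bytes(width, height, samples) / EDRAM_BYTES)

for w, h, s in [(1280, 720, 1), (1280, 720, 2), (1280, 720, 4), (640, 480, 4)]:
    mib = framebuffer_bytes(w, h, s) / 2**20
    print(f"{w}x{h} {s}xAA: {mib:.2f} MiB -> {tiles_needed(w, h, s)} tile(s)")
```

Under these assumptions 720p needs 1, 2 and 3 tiles at 1x, 2x and 4xAA respectively, while 640x480 at 4xAA (9.375 MiB) fits in a single tile.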
 
fresh said:
AA is 'free' at the cost of having to render the scene multiple times due to the small amount of eDRAM. We'll see how it plays out on beta hardware.

The only reason for the speed hit is the splitting of triangles lying on tile edges. Copying the back buffer into the frame buffer should not have a noticeable impact on the system's speed...
 
Laa-Yosh said:
fresh said:
AA is 'free' at the cost of having to render the scene multiple times due to the small amount of eDRAM. We'll see how it plays out on beta hardware.

The only reason for the speed hit is the splitting of triangles lying on tile edges. Copying the back buffer into the frame buffer should not have a noticeable impact on the system's speed...
Splitting triangles along tile edges will not be required; guard-band scissoring (it should be free...) will be enough.
If the application doesn't handle tiling itself, the driver will probably either force geometry to be resent and retransformed (this will take some ALU time) or cache and reuse it (this will take some extra memory).
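The guard-band scissoring idea can be illustrated in software. In this sketch, Python stands in for fixed-function hardware, and the target size, tile size and triangle are arbitrary choices. Each triangle is sent unsplit to every tile its bounding box touches, and a per-tile scissor rectangle discards pixels outside that tile. The per-tile results add up exactly to the untiled coverage, so no triangle has to be split.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Axis-aligned pixel rectangle; x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int

    def overlaps(self, other: "Rect") -> bool:
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)

def bounding_box(tri) -> Rect:
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return Rect(math.floor(min(xs)), math.floor(min(ys)),
                math.floor(max(xs)) + 1, math.floor(max(ys)) + 1)

def make_tiles(width, height, tile_w, tile_h):
    return [Rect(x, y, min(x + tile_w, width), min(y + tile_h, height))
            for y in range(0, height, tile_h) for x in range(0, width, tile_w)]

def covers(tri, px, py) -> bool:
    """Edge-function test at the pixel centre; accepts either winding order."""
    (ax, ay), (bx, by), (cx, cy) = tri
    x, y = px + 0.5, py + 0.5
    e0 = (bx - ax) * (y - ay) - (by - ay) * (x - ax)
    e1 = (cx - bx) * (y - by) - (cy - by) * (x - bx)
    e2 = (ax - cx) * (y - cy) - (ay - cy) * (x - cx)
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)

def rasterize_scissored(tri, scissor: Rect) -> set:
    """Pixels of the whole, unsplit triangle that fall inside the scissor rectangle."""
    box = bounding_box(tri)
    return {(x, y)
            for y in range(max(box.y0, scissor.y0), min(box.y1, scissor.y1))
            for x in range(max(box.x0, scissor.x0), min(box.x1, scissor.x1))
            if covers(tri, x, y)}

def tile_submissions(triangles, tiles):
    """Per tile: every triangle whose bounding box touches it, sent unchanged."""
    return {tile: [t for t in triangles if bounding_box(t).overlaps(tile)] for tile in tiles}

# Usage: a 64x48 target split into 2x2 tiles (sizes chosen arbitrarily).
W, H = 64, 48
tiles = make_tiles(W, H, 32, 24)
tri = ((10.0, 5.0), (50.0, 20.0), (20.0, 40.0))  # straddles all four tiles
subs = tile_submissions([tri], tiles)

full = rasterize_scissored(tri, Rect(0, 0, W, H))
per_tile = [rasterize_scissored(t, tile) for tile, tris in subs.items() for t in tris]
print(len(full), sum(len(p) for p in per_tile))  # identical counts: no pixel lost or doubled
```

The cost of this approach is that triangles near tile edges go through setup once per tile they touch, which is the resend/retransform versus cache trade-off described above.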
 
fresh said:
Not in reality, because 4xAA will take up more eDRAM, so you have to draw more tiles.

Folks, I don't know how many times I have to say this: 720p + 2xAA DOES NOT FIT INTO 10MB of eDRAM. Period. End of story. The scene has to be tiled, i.e. executed multiple times. This is required, as the Xbox 360 minimum requirement is 720p + 2xAA. Going to 4xAA may require even more tiles.

Most likely, you are wrong. 720p at 4xAA will most likely fit within 10MB. In the rare cases where a given pixel needs more space for samples than is provided within the eDRAM.

There are lots of alternatives. Even in the tiling case, the scene does not need to be executed multiple times.

Aaron Spink
speaking for myself inc.
 
Yes, I remember ERP posting that they can only fit 640x480 at 4xAA into it.
But does anyone know how much it costs to do tile rendering? ATI seems quite confident that their method does not cause such a big speed hit.
 
But does anyone know how much it costs to do tile rendering? ATI seems quite confident that their method does not cause such a big speed hit.
It depends.
If you let the driver handle tiling for you, it will cost more, because it would have to either cache transformed geometry or read the same display list N times and transform the geometry N times per frame.
If one has some spare bandwidth and some free memory (not much...), the first option should be almost painless.
More advanced titles would probably handle tiling themselves, rendering N tiles and sending each tile only the geometry that belongs to it (no caching of transformed vertices, and no reading and transforming the same full-frame display list N times). This would not be different from what a lot of multiplayer split-screen titles do now.
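The three options can be compared as a rough count of vertex transforms per frame. Everything in this sketch is hypothetical: the meshes, their vertex counts and their screen-space bounds are invented, and the three 1280x240 strips come from the 32-bit tiling mentioned later in this post.

```python
from dataclasses import dataclass

# Hypothetical scene: mesh names, vertex counts and screen bounds are made up.
@dataclass(frozen=True)
class Mesh:
    name: str
    vertex_count: int
    screen_box: tuple  # (x0, y0, x1, y1) in pixels, x1/y1 exclusive

def overlaps(box, tile) -> bool:
    return box[0] < tile[2] and tile[0] < box[2] and box[1] < tile[3] and tile[1] < box[3]

def driver_replay_transforms(meshes, tiles) -> int:
    """Driver replays the full display list once per tile, retransforming everything."""
    return len(tiles) * sum(m.vertex_count for m in meshes)

def driver_cached_transforms(meshes) -> int:
    """Driver transforms once and caches the results (costs extra memory instead)."""
    return sum(m.vertex_count for m in meshes)

def app_binned_transforms(meshes, tiles) -> int:
    """Application sends each tile only the meshes whose bounds overlap it."""
    return sum(m.vertex_count for tile in tiles for m in meshes
               if overlaps(m.screen_box, tile))

# Three 1280x240 strips covering a 720p frame.
tiles = [(0, y, 1280, y + 240) for y in range(0, 720, 240)]
meshes = [
    Mesh("sky", 1_000, (0, 0, 1280, 720)),
    Mesh("character", 20_000, (500, 300, 700, 460)),
    Mesh("terrain", 50_000, (0, 360, 1280, 720)),
    Mesh("hud", 500, (0, 650, 1280, 720)),
]
print(driver_replay_transforms(meshes, tiles))  # 214500
print(driver_cached_transforms(meshes))         # 71500
print(app_binned_transforms(meshes, tiles))     # 123500
```

Meshes spanning several tiles are still transformed once per tile they touch, so the saving from binning depends on how much of the geometry stays local to a single tile.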

A 640x360 4xAA FP16 + Z buffer takes 10.54 MB, so if it fits in R500's eDRAM (I don't know the exact amount of eDRAM!), 4 tiles would be enough to render a full 720p 64-bit 4xAA frame!
This way the tiles would be so big that a non-managed solution would suffer an absolutely negligible hit, while a managed solution would take more overhead.

With a 32-bit render target, a 3-tile solution (1280x240 per tile) would fit nicely in R500's eDRAM too.
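Both tile sizes can be rechecked with a few lines of arithmetic. This sketch assumes 4 bytes of Z/stencil per sample, no compression, and reads "MB" as MiB. Under those assumptions the FP16 tile comes to 10.546875 MiB, matching the 10.54 figure; that would slightly exceed the 10MB quoted earlier in the thread, while the 1280x240 32-bit tile fits.

```python
MIB = 1024 * 1024
EDRAM_BYTES = 10 * MIB  # the "10MB" quoted earlier in the thread, read as MiB

def tile_bytes(width: int, height: int, samples: int, color_bytes: int, z_bytes: int = 4) -> int:
    """Uncompressed color + 32-bit Z/stencil for every AA sample in one tile."""
    return width * height * samples * (color_bytes + z_bytes)

fp16_tile = tile_bytes(640, 360, 4, color_bytes=8)    # FP16 RGBA = 64-bit color
rgba8_tile = tile_bytes(1280, 240, 4, color_bytes=4)  # 32-bit render target

# How many tiles of each size it takes to cover a 1280x720 frame.
fp16_tiles = (1280 // 640) * (720 // 360)
rgba8_tiles = (1280 // 1280) * (720 // 240)

print(fp16_tiles, fp16_tile / MIB, fp16_tile <= EDRAM_BYTES)     # 4 10.546875 False
print(rgba8_tiles, rgba8_tile / MIB, rgba8_tile <= EDRAM_BYTES)  # 3 9.375 True
```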

The more I think about R500, the more I like it :)
 