Complete Details on Xenos from E3 private showing!

FP16 will probably not be used; there's a 10/10/10/2 FP format instead. This means that developers can work with 32-bit data, so your 3-tile calculation is probably correct for this case.
 
nAo said:
Laa-Yosh said:
fresh said:
AA is 'free' at the cost of having to render scene multiple times due to low amounts of edram. We'll see how it plays out on beta hardware.

The only reason for the speed hit is the splitting of triangles lying on tile edges. Copying the backbuffer into the framebuffer should not have a noticeable impact on the system's speed...
Triangle splitting along tile edges will not be required; guardband scissoring (it should be free) will be enough.
If the application doesn't cope with tiling itself, the driver will probably force geometry to be resent/retransformed (this will take some ALU time) or cached and reused (this will take some extra memory).

There is logic on the chip to avoid some of the unnecessary transform work when tiling.

I don't know if Dave got details of this in his conversation.
 
nAo said:
A 640x360x4AAxFP16+Z buffer takes 10.54 MB, so if it fits in R500 edram (I don't know the exact amount of edram!) 4 tiles would be enough to render a full 720p 64-bit 4x AA frame!
This way tiles would be so big that a non-managed solution would suffer an absolutely negligible hit, while a managed solution would take more overhead.

With a 32-bit render target, a 3-tile (1280x240 per tile) solution would nicely fit in R500 edram too.
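
For anyone who wants to check the arithmetic behind those two tile sizes, here is a rough sketch. It assumes 4 bytes of Z/stencil per sample (that is what makes the FP16 figure come out at roughly 10.5 MB) and counts a MB as 1024*1024 bytes. The function name `tile_bytes` is made up for illustration, and the eDRAM size is deliberately left out since it isn't known here.

```python
MIB = 1024 * 1024  # assumption: "MB" in the post means 2**20 bytes


def tile_bytes(width, height, samples, color_bytes, depth_bytes=4):
    """Bytes for one multisampled tile: colour plus Z stored per sample.

    depth_bytes=4 is an assumption (32-bit Z/stencil per sample).
    """
    return width * height * samples * (color_bytes + depth_bytes)


# Quarter of a 720p frame, 4x AA, FP16 colour (8 bytes per sample) + Z
fp16_tile = tile_bytes(640, 360, 4, color_bytes=8)

# Third of a 720p frame, 4x AA, 32-bit colour (4 bytes per sample) + Z
rgba32_tile = tile_bytes(1280, 240, 4, color_bytes=4)

print(f"FP16 tile:   {fp16_tile} bytes ({fp16_tile / MIB:.2f} MB)")
print(f"32-bit tile: {rgba32_tile} bytes ({rgba32_tile / MIB:.2f} MB)")
```

Under these assumptions the FP16 quarter-frame tile is about 10.5 MB and the 32-bit third-of-a-frame tile is a bit under 9.4 MB, so whether the 4-tile FP16 case fits depends on the exact eDRAM size.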

The more I think about R500, the more I like it :)

So FP10 can fit into 3 tiles and FP16 can fit into 4?

It would seem, based on what ATI has said (have not seen anything running, mind you!), that if the system can do 3 tiles quickly enough with no problem, 4 would be plausible. Even if they were cutting it close, and 60fps with 3 tiles per frame was the max (180 tiles a second), that would mean FP16, if it took 4 tiles, would run at 45fps.
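
Spelled out, that back-of-the-envelope estimate looks like the sketch below. It assumes frame cost scales linearly with the number of tiles, which is only the premise of the estimate and not a measured property of the hardware. The name `naive_fps` is hypothetical.

```python
def naive_fps(tiles_per_second, tiles_per_frame):
    """Frame rate if every tile costs the same fixed amount of time.

    This linear model is an assumption made purely for illustration.
    """
    return tiles_per_second / tiles_per_frame


# Worst case from the post: 60 fps at 3 tiles per frame is the ceiling
tile_budget = 60 * 3  # 180 tiles per second

print(naive_fps(tile_budget, 3))  # 3 tiles per frame
print(naive_fps(tile_budget, 4))  # 4 tiles per frame
```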

Of course I would hope they would not be cutting it that close... I guess there is the extra processing power to deal with FP16 too, but it will be interesting to see what kind of performance drop there is from FP10 to FP16. Is this something anyone can comment on, or something we will find out in Dave's article?
 
ERP said:
There is logic on the chip to avoid some of the unnecessary transform work when tiling.
nice ;)

Do you believe a non managed solution might be even more efficient? (less overhead, almost no extra bw or extra mem required)
 
Acert93 said:
It would seem, based on what ATI has said (have not seen anything running, mind you!), that if the system can do 3 tiles quickly enough with no problem, 4 would be plausible. Even if they were cutting it close, and 60fps with 3 tiles per frame was the max (180 tiles a second), that would mean FP16, if it took 4 tiles, would run at 45fps.

Umh..no :)
Things are not that simple; performance is not so tightly tied to the tile count.
Moreover, I don't know how tone mapping is handled on R500.
Tone mapping may need to 'see' the whole image at once, so after all tiles are rendered (and downscaled) to external dram it might be necessary to reload them into edram (obviously there is enough space to hold a 720p 64-bit frame buffer) in order to perform some kind of tone mapping. After that, a final frame buffer download to external dram would be required.
Of course I would hope they would not be cutting it that close... I guess there is the extra processing power to deal with FP16 too, but it will be interesting to see what kind of performance drop there is from FP10 to FP16. Is this something anyone can comment on, or something we will find out in Dave's article?
Rendering to an FP16 render target cuts the fill rate in half (4 pixels per clock cycle).
 
Ugh, I really don't like this FP10 format. I'm just coming off headaches from blending artifacts on the 5:6:5 format, and now I'll be subject to 7-bit artifacts. If next-gen XB360 games are the FB-blend-happy games I think they will be, I'll be subject to even more of them.
 
nAo said:
Tone mapping may need to 'see' the whole image at once, so after all tiles are rendered (and downscaled) to external dram it might be necessary to reload them into edram (obviously there is enough space to hold a 720p 64-bit frame buffer) in order to perform some kind of tone mapping. After that, a final frame buffer download to external dram would be required.
I would guess tone mapping is done like on today's cards - using the (downsampled) rendertarget as a texture. No need to reload it into eDRAM.
 
Xmas said:
I would guess tone mapping is done like on today's cards - using the (downsampled) rendertarget as a texture. No need to reload it into eDRAM.
For a moment I thought R500 could sample textures from its edram, but I think this is not the case, and I don't even know if this would make for more efficient/faster tone mapping pass(es).
So yes, downsampled tiles would be sampled as textures ;)
 