XDR2 needs to be over twice as fast to give an advantage in peak bandwidth per data-pin.
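A quick back-of-the-envelope on why (assuming, as I understand it, that GDDR5 data pins are single-ended while an XDR2 data lane is a differential pair, i.e. two pins):

$$ \mathrm{BW}_{\text{per-pin}} = \frac{\text{lane data rate}}{\text{pins per lane}}, \qquad \frac{r_{\mathrm{XDR2}}}{2} > \frac{r_{\mathrm{GDDR5}}}{1} \iff r_{\mathrm{XDR2}} > 2\,r_{\mathrm{GDDR5}} $$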
True, but RAMBUS would let AMD and nVIDIA ease into the transition to such a radically different (for them) system, which would otherwise be painful and need a lot of resources.
GDDR5 only uses differential signalling for the clock signals? Care to elaborate? I feel like I'm missing something... (is it due to micro-threading?)
Edit: I thought GDDR5 used fully differential signalling for the data too.
Z isn't compressed. It is fully allocated and static in size for a given render target size.

I just have to bring this up: what about using eDRAM? Sure, putting full depth buffers in there isn't really feasible, but Z buffers are compressed nowadays, so if you put only the parts in there which are fully compressed (8:1 ratio?), for instance, you'd "only" need 16MB for 2560x1600 with 8xAA, which doesn't look unreasonable. You'd still need high-bandwidth memory (to fetch the other parts of the Z buffer, color buffers, textures, etc.), but surely this should help.
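Sanity-checking that figure (my arithmetic; 4 bytes per Z sample is an assumption):

$$ 2560 \times 1600 \times 8\,\text{samples} \times 4\,\mathrm{B} = 131{,}072{,}000\,\mathrm{B} \approx 125\,\mathrm{MiB}, \qquad 125\,\mathrm{MiB} / 8 \approx 15.6\,\mathrm{MiB} \approx \text{"16MB"} $$

So the fully-compressed slice would fit, but the full surface is the whole ~125 MiB.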
Z *is* compressed, or else you would see much slower results for single-sample Z fillrate. You still need to allocate the full memory, however, because you can't guarantee a consistent level of compression.
Z isn't compressed. It is fully allocated and static in size for a given render target size.
The data format for Z is optimised to minimise the count of locations that are accessed, e.g. a 4x4 tile of pixels where no edge falls can have its Z samples written to a single tile in a burst of only 16 samples, leaving the remaining 48 samples (at 4xMSAA) unwritten. On-chip tile tag tables are used to keep track of the status of tiles.
Jawed
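To make the tile-tag mechanism concrete, here is a minimal sketch of the bookkeeping as I read it (the names, sizes and encodings are my own illustrative assumptions, not any actual GPU's state):

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative sketch of a per-tile Z tag table. Names, sizes and
 * encodings are assumptions for exposition, not real hardware state. */

enum z_tile_state {
    ZTILE_FAST,      /* no edge in tile: 1 sample per pixel written (16) */
    ZTILE_EXPANDED,  /* an edge crosses the tile: all 64 samples valid   */
};

#define TILE_W 4
#define TILE_H 4
#define MSAA   4

struct z_tile_tag {
    uint8_t state;   /* enum z_tile_state */
};

/* How many Z samples must actually be read/written for this tile. */
static unsigned z_tile_sample_count(const struct z_tile_tag *tag)
{
    return (tag->state == ZTILE_FAST)
        ? TILE_W * TILE_H            /* 16 samples, one burst  */
        : TILE_W * TILE_H * MSAA;    /* 64 samples, full tile  */
}

/* A tile stays in the fast representation only while every pixel in it
 * is fully covered; the first partially-covered pixel expands it. */
static void z_tile_on_write(struct z_tile_tag *tag, bool pixel_fully_covered)
{
    if (!pixel_fully_covered)
        tag->state = ZTILE_EXPANDED;
}
```

Note that the full 64-sample backing store still has to be allocated up front, since any tile can expand at any time; that is the "fully allocated and static in size" point.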
In the end if it's inside the engine? Very little, AFAICS. I expect some devs to experiment with approximate tile rendering on Fermi (i.e. you tile, but you don't clip to tile boundaries and only render triangles which straddle boundaries once).

Having a group of threads reading from/writing to another group of threads' tile(s) is not a good idea. Apart from the obvious synchronization issues, good luck maintaining the proper primitive submission order without leaving a lot of performance on the table.
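For what it's worth, the "approximate" part could look something like this sketch; it is entirely my own illustration of the idea, with made-up types and an arbitrary owner-tile rule:

```c
/* Sketch of "approximate" tiling as I read the idea above: each triangle
 * is binned to exactly one owner tile (here, arbitrarily, the tile that
 * contains the top-left corner of its bounding box) and is never clipped,
 * so a straddling triangle is rasterised once and simply spills into its
 * neighbours' tiles. TILE_SIZE and the owner rule are illustrative
 * assumptions; screen coordinates are assumed non-negative. */

#define TILE_SIZE 64          /* pixels per tile edge (illustrative) */

struct tri { float x[3], y[3]; };

struct bin {
    const struct tri **tris;  /* triangles this tile must submit */
    unsigned count, cap;
};

static void bin_triangle(struct bin *bins, unsigned tiles_x,
                         const struct tri *t)
{
    float minx = t->x[0], miny = t->y[0];
    for (int i = 1; i < 3; i++) {
        if (t->x[i] < minx) minx = t->x[i];
        if (t->y[i] < miny) miny = t->y[i];
    }

    /* One owner tile per triangle, so each triangle is submitted once,
     * with no clipping to the tile boundary. */
    unsigned tx = (unsigned)(minx / TILE_SIZE);
    unsigned ty = (unsigned)(miny / TILE_SIZE);
    struct bin *b = &bins[ty * tiles_x + tx];
    if (b->count < b->cap)
        b->tris[b->count++] = t;
}
```

And this is exactly where the objection above bites: the owner tile's rasteriser writes into neighbouring tiles, so you either synchronize those writes or give up per-tile submission-order guarantees.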
Jawed, that's just nitpicking, which invites the meta-nitpick that the individual tile is compressed. Anyway... the term "compression" for this is now so well entrenched that your personal reservations about whether it's applicable are quite irrelevant.

It's not a nitpick to say that 'you'd "only" need 16MB for 2560x1600 with 8xAA' is wrong. The entire surface needs to exist, which is more than 16MB. Anyway, we've moved past that. I think we've even discussed the partially-within-eDRAM type of solution before...
And how much eDRAM would you need for running Eyefinity with 6 monitors?
It's never enough, no matter what the targeted resolutions!

Thus, there is a point where you have to go to tiling, or render the whole image with some sort of irregular or multi-resolution method.
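To put rough numbers on it, reusing the assumptions from the earlier arithmetic (4-byte Z samples, best-case 8:1 compression), six 2560x1600 heads at 8xAA need:

$$ 6 \times 15.6\,\mathrm{MiB} \approx 94\,\mathrm{MiB} \text{ (best-case compressed Z alone)}, \qquad 6 \times 125\,\mathrm{MiB} = 750\,\mathrm{MiB} \text{ (fully allocated)} $$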
The good thing about GF100 is that you can take a break for a week or so, drool over your newborn like there's no tomorrow, and there still won't be anything worthwhile about it on the net.
Sorry for the OT; as you were.