...Or something like that.
I'm no great expert on this sort of thing, but I think your expertise covers it well enough.
So basically, GPUs take framebuffer bandwidth penalties when a 2x2 pixel block doesn't have full coverage, or when four 2x2 blocks can't be dispatched to four different memory chips in a single clock. If I'm not mistaken, that's the same interleaving approach a single DDR chip already uses internally across its banks, right?
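To make the interleaving idea concrete, here's a minimal C sketch of my own (the quad size, chip count, and address mapping are all made-up illustration values, not any real GPU's scheme): framebuffer addresses are striped across four chips at quad granularity, so four consecutive 2x2 quads land on four different chips and can be written in the same clock.

/* Toy address interleaving across four memory chips. Constants and
 * mapping are assumptions for illustration only. */
#include <stdint.h>
#include <stdio.h>

#define NUM_CHIPS  4
#define QUAD_BYTES 16  /* 2x2 pixels * 4 bytes (RGBA8) per quad */

/* Which chip a given framebuffer byte address maps to. */
static unsigned chip_for_address(uint32_t addr)
{
    return (unsigned)((addr / QUAD_BYTES) % NUM_CHIPS);
}

int main(void)
{
    /* Consecutive quads rotate through the chips, so any run of
     * four adjacent quads can be written in parallel. */
    for (uint32_t quad = 0; quad < 8; ++quad) {
        uint32_t addr = quad * QUAD_BYTES;
        printf("quad %u -> chip %u\n",
               (unsigned)quad, chip_for_address(addr));
    }
    return 0;
}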
The write gathering you mentioned seems like a good solution, and it avoids multiple small writes to a single chip. At first I thought it would be quite expensive (high-speed scheduling and all), but setting it up as a lazy write-back cache with a write buffer on top of that might already help.
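Something like this toy write-combining buffer is what I have in mind (purely a sketch; the line size and the flush-on-new-line policy are made up): scattered byte writes to the same line get gathered and go out as one burst.

/* Toy write-combining buffer: gather byte writes that fall in the
 * same 32-byte line, flush them to "memory" as one burst. */
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES 32

struct wc_buffer {
    uint32_t line_addr;          /* base address of the gathered line */
    uint8_t  data[LINE_BYTES];
    uint32_t valid_mask;         /* one bit per byte written */
};

static void flush(struct wc_buffer *b)
{
    if (b->valid_mask)
        printf("burst write of line 0x%08x (mask 0x%08x)\n",
               (unsigned)b->line_addr, (unsigned)b->valid_mask);
    b->valid_mask = 0;
}

static void wc_write(struct wc_buffer *b, uint32_t addr, uint8_t byte)
{
    uint32_t line = addr & ~(uint32_t)(LINE_BYTES - 1);
    if (b->valid_mask && line != b->line_addr)
        flush(b);                /* new line: drain the old one first */
    b->line_addr = line;
    b->data[addr - line] = byte;
    b->valid_mask |= 1u << (addr - line);
}

int main(void)
{
    struct wc_buffer b = {0};
    for (uint32_t i = 0; i < 40; ++i)
        wc_write(&b, i, (uint8_t)i);  /* 40 writes become 2 bursts */
    flush(&b);
    return 0;
}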
Another question that bugs me: how do the shading units interact? Is it that all of the units render as many pixels as possible, performing a single operation per pixel per clock? Or is it more like the shaders are divided equally among the ROPs and set up as a pipeline, so that there's a lot of latency but a complete shading program is retired every cycle? In the latter case the fillrate wouldn't be influenced by the shading operations. Which is plausible? (BTW, sorry for being off-topic)
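To make the question concrete, here's a toy C comparison of the two models (entirely my own framing with made-up numbers, not a claim about how any real GPU works):

/* 4 units, a 4-op shader, 16 pixels, under two execution models. */
#include <stdio.h>

#define UNITS  4
#define OPS    4
#define PIXELS 16

int main(void)
{
    /* Model A (SIMD): all UNITS units execute the same op on
     * different pixels each clock. A batch of UNITS pixels takes
     * OPS clocks, then the next batch starts. */
    int clocks_simd = (PIXELS / UNITS) * OPS;

    /* Model B (pipeline): unit k always executes op k; a pixel
     * moves one stage per clock. After OPS clocks of fill latency,
     * one fully shaded pixel comes out every clock. */
    int clocks_pipe = OPS + PIXELS - 1;

    printf("SIMD:     %d clocks (4 pixels finish every 4 clocks)\n",
           clocks_simd);                              /* 16 */
    printf("Pipeline: %d clocks (1 pixel per clock after a %d-clock "
           "fill)\n", clocks_pipe, OPS);              /* 19 */
    return 0;
}

In this toy both settle at one pixel per clock; the difference is mostly latency. But note that in the SIMD model a longer shader directly cuts pixels per clock, while a deep enough pipeline keeps retiring one per clock regardless, which is what I meant about fillrate.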
You have to look at 5xxx-series mobile binned parts on 40nm to find 400-shader GPUs that might fit into the Wii U's power envelope. And even with 8 ROPs they could outperform the 360.
IMO AMD would be perfectly capable of showing Nintendo how to build a GPU that is affordable and effective in 2012, so god knows what they chose in the end. If they want to attract hardcore gamers, they'd better have; if I were an Xbox fanboy who might be seduced by a next-gen Wii, I'd certainly wait another two years for a 720 with the kind of specs being speculated about! I agree on the ROP thing, but given the small clock-speed increase, the additional performance (if it's there at all) should come mainly from faster shading in order to achieve higher fillrates.
Chances are that either the GPU is bandwidth-limited in some way, or it doesn't actually have a lot of extra grunt. Or both.
Well, we shouldn't be disappointed if that were the case.
ERP said:
because tiling it reduces the efficiency of the texture cache which is optimized for the swizzled case.
Mmm, good point. I'd say this is a non-issue for GC and Wii, since the data is converted when it's exported to main memory. Don't know if the same applies to the 360? Then again, the bandwidth required for the transfer might be the larger cost. Agree with the alignment thing.
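For anyone unfamiliar with the swizzling ERP means, here's a sketch of the usual Morton/Z-order scheme (a common choice; I don't know the exact layout these GPUs use): interleaving the bits of x and y puts 2D-local texels at adjacent addresses, which is what the texture cache is tuned for, and why linearly tiled data thrashes it.

/* Morton/Z-order swizzle: interleave x and y bits so a 2x2 block
 * of texels maps to 4 consecutive addresses. */
#include <stdint.h>
#include <stdio.h>

/* Spread the low 16 bits of v into the even bit positions. */
static uint32_t spread_bits(uint32_t v)
{
    v &= 0x0000FFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

/* x bits in even positions, y bits in odd positions. */
static uint32_t swizzle(uint32_t x, uint32_t y)
{
    return spread_bits(x) | (spread_bits(y) << 1);
}

int main(void)
{
    printf("%u %u %u %u\n",
           (unsigned)swizzle(0, 0), (unsigned)swizzle(1, 0),
           (unsigned)swizzle(0, 1), (unsigned)swizzle(1, 1));
    return 0;  /* prints: 0 1 2 3 */
}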
jlippo said:
Also mipmaps are needed to get any decent performance from cache.
From a TMU's perspective it is: each mip level halves the resolution, so there's less data to read. But in the case of anisotropic filtering, or when the LOD falls between two mip levels, it results in additional external bandwidth requirements when both mip levels aren't in cache yet (per pixel you always need to read 1 byte from the finer level, and theoretically 0.25 byte from the next LOD, since that level has a quarter of the texels).
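Back-of-envelope check of that 1 + 0.25 figure (assuming a roughly 1:1 texel-to-pixel ratio at the finer mip and 1 byte per texel; both are simplifying assumptions just to keep the units easy):

#include <stdio.h>

int main(void)
{
    double finer   = 1.0;        /* new texel data per pixel, mip N  */
    double coarser = 1.0 / 4.0;  /* mip N+1 is half res in x and y,  */
                                 /* so a quarter of the texels       */
    printf("one mip level : %.2f bytes/pixel\n", finer);
    printf("two mip levels: %.2f bytes/pixel\n", finer + coarser);
    return 0;
}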
Just to clarify, they aren't paying to have it X-rayed; they just pooled some money to purchase the images from Chipworks. The money was collected (in just a matter of hours), and they're just waiting for verification of funds before the purchase is made. Should be interesting to hear what they find.
Interesting! Makes me kind of wonder why Chipworks made X-ray pics in the first place... I assume it's not because there's a large market for GPU circuitry posters!