This is simply a wonderful read, thank you.
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=127466&enterthread=y
Your point that it isn't an L2 is well taken, but that doesn't prove read accesses from the memory controller can't be sourced from the cache to allow it to be used as a general R/W cache for UAVs. All I see from the diagram is that the R/W cache connects to the memory controllers, which allows the option that reads can bypass memory altogether (making it a R/W cache for UAVs).
Very nice article, but I'm not sure what can be inferred from it. If Cypress was in fact cut down and stuff like Sideport was removed, was it just a simple cut (like the number of SIMDs) or did they completely change what the chip was going to be?
In any case, if Northern Islands is another full lineup on 40nm, I don't see how it can be anything other than an architectural overhaul. Otherwise, what would be the point? Say that back when it was decided Cypress was going to be a bit smaller than first planned, they also decided to refresh it with a bigger chip on the same architecture. That would make some sense, but it wouldn't make any sense at all for the downmarket derivatives. So I'm going to bet on significant overhauls somewhere, maybe in the geometry pipeline.
I don't expect anything other than a boring refresh of Evergreen this year. I don't know what its name is.
I don't know if NI is meant to be a refresh of Evergreen (e.g. as minor as RV790 or as major as R520->R580 or RV730->RV740) or if NI is meant to be a substantial change (RV670->RV770, RV770->Evergreen) or if NI is meant to be an architectural re-boot (R580->R600).
Evergreen is less late than I thought it was (I thought it was about 1 quarter late) and logic would indicate that AMD plans a substantial change (RV670->RV770, RV770->Evergreen) for summer/autumn 2010 based on the pattern for RV770 and Evergreen. But I think process complications and the GF factor will all conspire against anything other than a boring refresh this year.
Jawed
Is GF 32nm going to be production ready this year? If so AMD's current process advantage could turn into a slaughter.
Are there any die photos of RV870/Cypress yet?
I always type before I think. Anyway, so indeed "presently" all UAV accesses go either through texture cache or uncached :/ I wonder what the performance is if you try to abuse atomics to use it as a R/W cache anyway (atomic OR 0 with return to read, for instance).
Ah, your ninja edits are amusing at times.
Supposedly. GF seems to have been quite bullish about 32nm, and there were 32nm wafers shown off by GF a couple of months back that were supposed to be more than just SRAM. If 32nm at GF goes fairly well without any of the delays and problems we've come to expect from TSMC, it could be looking good for CPU/GPU production towards the latter half of the year.
It'd be fun and it is 128KB...
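The trick under discussion (reading through an atomic OR with 0 and using the returned value, and writing through an atomic exchange whose return value is thrown away) can be sketched on the CPU side. This is only an analogy in C++ with `std::atomic`: the names `Dword`, `read_via_atomic_or` and `write_via_atomic_xchg` are made up for illustration, and nothing here models the Evergreen R/W cache, its bandwidth or its serialisation behaviour.

```cpp
#include <atomic>
#include <cstdint>

// CPU-side analogy only: std::atomic stands in for one DWord in a UAV.
// Nothing here models the GPU's R/W cache or how it serialises atomics.
using Dword = std::atomic<std::uint32_t>;

// "Read" by issuing an atomic OR with 0: the stored value is left
// unchanged and the value held before the OR is returned.
std::uint32_t read_via_atomic_or(Dword& cell) {
    return cell.fetch_or(0u);
}

// "Write" by issuing an atomic exchange and discarding the old value
// that the exchange returns.
void write_via_atomic_xchg(Dword& cell, std::uint32_t value) {
    (void)cell.exchange(value);
}
```

Both operations move a single 32-bit value per call, which is the same granularity limit raised in the thread for DWord-sized atomic writes.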
Unfortunately the ISA document is in "preview" condition with large chunks missing... This may be the abuse that you mentioned in the prior paragraph. It seems you would use EXPORT_RAT_INST_XCHG_RTN to write using the atomic functionality, and then ignore the return value. The bandwidth is low, because writes are DWords with no option to write 128 bits. No idea what kind of serialisation mechanics are in play.
BTW... the presentation did suggest something else, not so much the diagram but the actual text:
"Unordered shared consistent loads/stores/atomics via R/W Cache"
Since 28nm is only 3 months behind 32nm and is a bulk process, I wouldn't expect AMD to bother with 32nm for GPUs... if it weren't for Llano.
Since they have to make a GPU on 32nm SOI anyway, why not make use of the experience they'll gain, and release more GPUs on 32nm? I wonder if it makes sense to do so...
Historically, AMD has been 1 year behind Intel on process technology. I see no reason why this has changed. GF was formed in early 2009, which is probably too late to have much impact on development of 32nm.
My guess is that GF has 32nm parts at the very end of the year (perhaps Llano), but not in high volume. I could very well be wrong, but that's my guess.
Of course, the real issue is not how GF compares to Intel, it's how GF compares to TSMC...and that's trickier to guess.
David
Playing a bit with GSA, I noticed that UAV reads are performed via texture fetches. The interesting thing is that, since texture caches are read-only and not coherent, they seem to issue a cache line eviction for each UAV read, followed by a memory barrier to wait for the eviction to complete (just before the texture fetch). I wonder why they don't simply use uncached reads.
UAV reads can either go through the texture cache hierarchy or they can be uncached reads from global memory.
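To make the evict-then-fetch pattern concrete, here is a toy, single-threaded C++ model of a read-only cache that is never told about writes. Everything in it (`ToyReadOnlyCache`, one DWord per "line", the method names) is a hypothetical illustration and not a description of the actual texture cache. The memory barrier from the post is not modelled, since nothing here runs concurrently.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy model, not the real hardware: a read-only cache in front of a flat
// memory array. Writes go straight to `memory` and never update the cache.
struct ToyReadOnlyCache {
    std::vector<std::uint32_t>& memory;
    std::unordered_map<std::size_t, std::uint32_t> lines;  // one DWord per "line"

    explicit ToyReadOnlyCache(std::vector<std::uint32_t>& mem) : memory(mem) {}

    // Cached read: fill the line on a miss, return the cached copy on a hit.
    std::uint32_t cached_read(std::size_t addr) {
        auto it = lines.find(addr);
        if (it == lines.end()) {
            it = lines.emplace(addr, memory[addr]).first;
        }
        return it->second;
    }

    // Drop one line so the next cached read has to go back to memory.
    void evict(std::size_t addr) { lines.erase(addr); }

    // Evict, then fetch through the cache: the pattern described in the post.
    std::uint32_t evict_then_read(std::size_t addr) {
        evict(addr);
        return cached_read(addr);
    }

    // The alternative raised in the post: bypass the cache entirely.
    std::uint32_t uncached_read(std::size_t addr) const { return memory[addr]; }
};
```

In this model a plain `cached_read` after a write to `memory` returns the stale value, while `evict_then_read` and `uncached_read` both return the fresh one. The only difference left is that the evicting path refills the line, which may or may not matter on the real chip.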
They use it to handle UAV global counters and counters for append/consume buffers (randomly noticed while playing with GSA).
Global Shared Memory, additionally, provides a RW surface - but it's 64KB.
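For readers unfamiliar with how a counter drives an append/consume buffer, here is a minimal CPU-side sketch in C++. It assumes the usual scheme where an atomic increment hands out the next free slot and an atomic decrement hands back the last filled one. The class `ToyAppendConsumeBuffer` is hypothetical, and where the counter physically lives on the GPU is not modelled at all.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an append/consume buffer: fixed storage plus
// one hidden atomic counter that tracks how many slots are filled.
class ToyAppendConsumeBuffer {
public:
    explicit ToyAppendConsumeBuffer(std::size_t capacity)
        : slots_(capacity), count_(0u) {}

    // Append: atomically reserve the next slot index, then store into it.
    // Returns false (and undoes the reservation) when the buffer is full.
    bool append(std::uint32_t value) {
        const std::uint32_t idx = count_.fetch_add(1u);
        if (idx >= slots_.size()) {
            count_.fetch_sub(1u);
            return false;
        }
        slots_[idx] = value;
        return true;
    }

    // Consume: atomically release the last filled slot and read it.
    // Returns false (and undoes the decrement) when the buffer is empty.
    bool consume(std::uint32_t& out) {
        const std::uint32_t old = count_.fetch_sub(1u);
        if (old == 0u) {
            count_.fetch_add(1u);
            return false;
        }
        out = slots_[old - 1u];
        return true;
    }

    std::uint32_t size() const { return count_.load(); }

private:
    std::vector<std::uint32_t> slots_;
    std::atomic<std::uint32_t> count_;
};
```

The sketch is only meant for phases that either append or consume; interleaving both from different threads would need more care than a single counter provides.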