AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to the upcoming ATI-RV870 lineup of GPUs?

  • Within 1 or 2 weeks

    Votes: 1 0.6%
  • Within a month

    Votes: 5 3.2%
  • Within couple months

    Votes: 28 18.1%
  • Very late this year

    Votes: 52 33.5%
  • Not until next year

    Votes: 69 44.5%

  • Total voters
    155
  • Poll closed.
There is no general RW cache, that's the point. The only way general purpose reads are cached is via the texture system and as you can see from the diagrams the texture system's cache hierarchy has no connection to any of the write caches.

Jawed
 
Your point that it isn't an L2 is well taken, but that doesn't prove read accesses from the memory controller can't be sourced from the cache to allow it to be used as a general R/W cache for UAVs. All I see from the diagram is that the R/W cache connects to the memory controllers, which allows the option that reads can bypass memory altogether (making it an R/W cache for UAVs).
 
Very nice article, but I'm not sure what can be inferred from it. If Cypress was in fact cut down and stuff like Sideport was removed, was it just a simple cut (like the number of SIMDs) or did they completely change what the chip was going to be?

In any case, if Northern Islands is another full lineup on 40nm, I don't see how it can be anything other than an architectural overhaul; otherwise, what would be the point? Say back when it was decided that Cypress was going to be a bit smaller than first planned, they also decided to refresh it with a bigger chip on the same architecture. That would make some sense, but it wouldn't make any sense at all for the downmarket derivatives. Yeah, so I'm gonna bet on significant overhauls somewhere, maybe in the geometry pipeline. :)
 
Ah, your ninja edits are amusing at times.

Your point that it isn't an L2 is well taken, but that doesn't prove read accesses from the memory controller can't be sourced from the cache to allow it to be used as a general R/W cache for UAVs. All I see from the diagram is that the R/W cache connects to the memory controllers, which allows the option that reads can bypass memory altogether (making it an R/W cache for UAVs).
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=127466&enterthread=y
 
Very nice article, but I'm not sure what can be inferred from it. If Cypress was in fact cut down and stuff like Sideport was removed, was it just a simple cut (like the number of SIMDs) or did they completely change what the chip was going to be?

In any case, if Northern Islands is another full lineup on 40nm, I don't see how it can be anything other than an architectural overhaul; otherwise, what would be the point? Say back when it was decided that Cypress was going to be a bit smaller than first planned, they also decided to refresh it with a bigger chip on the same architecture. That would make some sense, but it wouldn't make any sense at all for the downmarket derivatives. Yeah, so I'm gonna bet on significant overhauls somewhere, maybe in the geometry pipeline. :)

I agree, whatever the replacement for 2010 is called, it has to be either

a) an architectural overhaul,
b) a process shrink.

It would be downright BORING if it was just a port to GF 40nm without any changes.

However, if you look at GF's HKMG brochure, you'll find that they are saying that some customers will announce products in Q1 on their HKMG process. They don't have HKMG on anything 40nm or bigger. And since CPUs and GPUs are typically the first chips to migrate to new processes, chances are high that it will be an AMD GPU, if only a pipecleaning part like RV740.
 
I don't expect anything other than a boring refresh of Evergreen this year. I don't know what its name is.

I don't know if NI is meant to be a refresh of Evergreen (e.g. as minor as RV790 or as major as R520->R580 or RV730->RV740) or if NI is meant to be a substantial change (RV670->RV770, RV770->Evergreen) or if NI is meant to be an architectural re-boot (R580->R600).

:???:

Evergreen is less late than I thought it was (I thought it was about 1 quarter late) and logic would indicate that AMD plans a substantial change (RV670->RV770, RV770->Evergreen) for summer/autumn 2010 based on the pattern for RV770 and Evergreen. But I think process complications and the GF factor will all conspire against anything other than a boring refresh this year.

Jawed

I don't see anything boring in a refresh. If they could sell a 1+ GHz 5870 for the price of a 5770, people wouldn't care too much about architecture. If a GF 32nm/28nm refresh gave them much better yields and higher clocks, then it could still beat the GT400 cards on price (as with GT200 vs RV770).
 
Is GF 32nm going to be production ready this year? If so AMD's current process advantage could turn into a slaughter.
 
Is GF 32nm going to be production ready this year? If so AMD's current process advantage could turn into a slaughter.

Supposedly. GF seems to have been quite bullish about 32nm, and there were 32nm wafers shown off by GF a couple of months back that were supposed to be more than just SRAM. If 32nm at GF goes fairly well without any of the delays and problems we've come to expect from TSMC, it could be looking good for CPU/GPU production towards the latter half of the year.
 
Ah, your ninja edits are amusing at times.
I always type before I think. Anyway, so indeed "presently" all UAV accesses go either through texture cache or uncached :/ I wonder what the performance is if you try to abuse atomics to use it as a R/W cache anyway (atomic OR 0 with return to read for instance).

BTW ... the presentation did suggest something else, not so much the diagram but the actual text :
"Unordered shared consistent loads/stores/atomics via R/W Cache"
 
Supposedly. GF seems to have been quite bullish about 32nm, and there were 32nm wafers shown off by GF a couple of months back that were supposed to be more than just SRAM. If 32nm at GF goes fairly well without any of the delays and problems we've come to expect from TSMC, it could be looking good for CPU/GPU production towards the latter half of the year.

Since 28nm is only 3 months behind 32nm and is a bulk process, I wouldn't expect AMD to bother with 32nm for GPUs... if it weren't for Llano.

Since they have to make a GPU on 32nm SOI anyway, why not make use of the experience they'll gain, and release more GPUs on 32nm? I wonder if it makes sense to do so...
 
I always type before I think. Anyway, so indeed "presently" all UAV accesses go either through texture cache or uncached :/ I wonder what the performance is if you try to abuse atomics to use it as a R/W cache anyway (atomic OR 0 with return to read for instance).
It'd be fun and it is 128KB...

BTW ... the presentation did suggest something else, not so much the diagram but the actual text :
"Unordered shared consistent loads/stores/atomics via R/W Cache"
Unfortunately the ISA document is in "preview" condition with large chunks missing... This may be the abuse that you mentioned in the prior paragraph :LOL: It seems you would use EXPORT_RAT_INST_XCHG_RTN to write using the atomic functionality and then ignore the return value. The bandwidth is low, because writes are DWords with no option to write 128 bits. No idea what kind of serialisation mechanics are in play.
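
For anyone who wants the idea in concrete form, here is a minimal CPU-side sketch in C11 of the two tricks discussed above: reading by OR-ing with 0 and keeping the returned value, and writing with an exchange whose return value is thrown away. This is only an analogy built on `<stdatomic.h>`. It is not Evergreen ISA, the helper names `atomic_read_via_or` and `atomic_write_via_xchg` are made up for illustration, and it says nothing about how the R/W cache actually behaves or performs.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: read a DWord through the atomic path.
 * OR-ing with 0 leaves the stored value unchanged, and the
 * fetch-or returns the value that was stored before the operation. */
static uint32_t atomic_read_via_or(_Atomic uint32_t *p) {
    return atomic_fetch_or(p, 0u);
}

/* Hypothetical helper: write a DWord through the atomic path.
 * The exchange stores the new value; the previous value it
 * returns is deliberately discarded. */
static void atomic_write_via_xchg(_Atomic uint32_t *p, uint32_t v) {
    (void)atomic_exchange(p, v);
}
```

Every access here is a single 32-bit read-modify-write, which mirrors the DWord-only limitation mentioned above; whether the hardware serialises such accesses is exactly the open question.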

Jawed
 
Since 28nm is only 3 months behind 32nm and is a bulk process, I wouldn't expect AMD to bother with 32nm for GPUs... if it weren't for Llano.

Since they have to make a GPU on 32nm SOI anyway, why not make use of the experience they'll gain, and release more GPUs on 32nm? I wonder if it makes sense to do so...


Plus the benefit of power gating on 32nm SOI from GloFo ...
Quite a feature for a high-end GPU, not so much for the low end, where there is little to shut down.
 
Historically, AMD has been 1 year behind Intel on process technology. I see no reason why this has changed. GF was formed in early 2009, which is probably too late to have much impact on development of 32nm.

My guess is that GF has 32nm parts at the very end of the year (perhaps Llano), but not in high volume. I could very well be wrong, but that's my guess.

Of course, the real issue is not how GF compares to Intel, it's how GF compares to TSMC...and that's trickier to guess.

David
 
Historically, AMD has been 1 year behind Intel on process technology. I see no reason why this has changed. GF was formed in early 2009, which is probably too late to have much impact on development of 32nm.

My guess is that GF has 32nm parts at the very end of the year (perhaps Llano), but not in high volume. I could very well be wrong, but that's my guess.

Of course, the real issue is not how GF compares to Intel, it's how GF compares to TSMC...and that's trickier to guess.

David

GF's roadmap says that risk production for 32nm is scheduled for mid-2010, which does seem a little tight if AMD wanted to refresh the entire Evergreen line-up in 2010, but for just Cypress it seems very doable.

Also, while GF was formed in 2009, it was planned much earlier than that, and no doubt they anticipated it in some ways. For instance, there's a 40nm LP bulk process planned for risk production in mid-2010 and I doubt GF waited for its effective independence to start working on it.

That said, AMD seems more likely to play it safe and just stick to TSMC's 40nm process for 2010. By now, they know it well... I just don't think we can completely rule out GF's 32nm.
 
UAV reads can either go through the texture cache hierarchy or they can be uncached reads from global memory.
Playing a bit with GSA, I noticed that UAV reads are performed via texture fetches. The interesting thing is that, since texture caches are read-only and not coherent, they seem to issue a cache line eviction for each UAV read, followed by a memory barrier to wait for the eviction to complete (just before the texture fetch). I wonder why they don't simply use uncached reads.

Global Shared Memory, additionally, provides a RW surface - but it's 64KB.
They use it to handle UAV global counters and counters for append/consume buffers (noticed by chance while playing with GSA).
 
If AMD is going to transition a chip onto the GF 32nm process quickly, I would suggest it's likely the Xbox 360 GPU. It makes the most sense for three reasons. It's directly applicable to getting Fusion up and running. It's secretive, in that Nvidia probably won't find out they've done it until far later than with a separate GPU SKU, as Microsoft wouldn't even start selling them until the end of the year. And it works in with their old Global Foundries agreement, in that it nets a new client for GloFo as well as giving them an assured payout with little risk to themselves.

As an aside, I wonder if in the future they will deliberately sell chips they can use for both computers and consoles for economies of scale. They can divide the R+D costs over a larger number of chips and they can lower the overall cost per chip for themselves and the console manufacturer by taking the valuable high bin chips and salvaging the lower bin chips so there would be less wastage and overall better margins all around.
 