AMD: R7xx Speculation

Status
Not open for further replies.
Interesting:

rv770yk3.jpg


powerplay2ic1.jpg
 
One characteristic of the RV670 and very possibly 770 is that voltage doesn't matter much relative to clockspeed (I think the guys on XS concluded this. :D ) for overall consumption. Which isn't vaguely close to ideal, I know.
That's not just far from ideal, it doesn't make sense at all. Whatever data XS had, I bet their conclusions are wrong. It would make a nice article though: someone varying clock and voltage and measuring power draw on an RV670 (and RV770).
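For reference, the usual first-order model for CMOS switching power is P = a * C * V^2 * f, so voltage enters squared while clock enters linearly. Here is a minimal sketch of that model. The capacitance, activity factor, baseline voltage and baseline clock are made-up placeholders, not measured RV670 numbers, and leakage is ignored entirely:

```python
def dynamic_power(voltage, freq_hz, capacitance=1.0e-9, activity=1.0):
    """First-order CMOS switching power: P = activity * C * V^2 * f.

    capacitance and activity are hypothetical placeholders; only the
    ratios between two operating points are meaningful here.
    """
    return activity * capacitance * voltage ** 2 * freq_hz


# Hypothetical baseline operating point (not a measured chip setting).
BASE_V = 1.20
BASE_F = 750e6

base = dynamic_power(BASE_V, BASE_F)
clock_up = dynamic_power(BASE_V, BASE_F * 1.10)   # +10% clock, same voltage
volt_up = dynamic_power(BASE_V * 1.10, BASE_F)    # +10% voltage, same clock

print(f"+10% clock:   x{clock_up / base:.2f}")
print(f"+10% voltage: x{volt_up / base:.2f}")
```

Under this model a 10% voltage bump costs about 21% more switching power, against 10% for the same relative clock bump, which is why the XS conclusion looks backwards. Real measurements would also include leakage and board-level losses, which this sketch leaves out.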
 
One characteristic of the RV670 and very possibly 770 is that voltage doesn't matter much relative to clockspeed (I think the guys on XS concluded this. :D ) for overall consumption. Which isn't vaguely close to ideal, I know.

Could you restate what that means?

Are you trying to say:
voltage bumping doesn't raise power consumption
voltage bumping doesn't raise the clock ceiling
upping the clock doesn't raise power consumption
upping the voltage and clock doesn't raise power consumption
 
No more ringbus + fixed performance (by way of lots more units while keeping transistor count fairly low) seems to indicate that the ringbus naysayers were right & the ringbus was a waste of transistors?

Do we know which bit of ATI/AMD designed the RV770?
Team A: R300 -> Xenos -> RV770?
Team B: R420 -> R520 -> R600

It was the ArtX team, the creators of the R300.

:LOL:
Just kidding.


http://www.beyond3d.com/content/interviews/8/3

B3D: With respect to engineering resources it’s been suggested to us that the “West Coast Team” (Santa Clara - Silicon Valley) has become the main focus for all the PC parts coming from ATI and that now even R500, which we initially understood to be an “East Coast Team” (Marlborough) product, is being designed at Santa Clara. Is it the case that Santa Clara will mainly produce the PC parts now, while Marlborough will be active with “special projects” such as the next X-Box technologies?

We had this concept of the “ping-pong” development between the west and east coast design centres. On paper this looked great, but in practice it didn’t work very well. It doesn’t work well for a variety of reasons, but one of them is that the PC architecture, at the graphics level, has targeted innovation and clean sheet innovation and whenever you have separate development teams you are going to, by nature, have a clean sheet development on every generation of product. For one, we can’t afford that and it’s not clear that it’s the right thing to do for our customers from a stability standpoint. It’s also the case that there’s no leverage from what the other development team has done, so in some cases you are actually taking a step backwards instead of forwards.

What we are now moving towards is actually a unified design team of both east and west coast, that will develop our next generations of platforms, from R300 to R400 to R500 to R600 to R700, instead of a ping-pong ball between them both. Within that one organisation we need to think about where do we architecturally innovate and where do we not in order to hit the right development cycles to keep the leadership, but it will be one organisation.

If you dissect in, for example, to the R600 product, which is our next, next generation, that development team is all three sites - Orlando, Silicon Valley, Marlborough – but the architectural centre team is in the Valley, as you point out, but all three are part of that organisation.

B3D: Would I be correct in suggesting that mainly Marlborough and Orlando would be the R&D centres – with the design of various algorithms for new 3D parts – while the Santa Clara team would be primarily responsible for implementing them in silicon?

No, because the architecture of the R300 and R500 is all coming from the Valley, but we’ve got great architects in all three sites.

Bob Drebin in the Valley is in charge of the architecture team and so he’s in charge of the development of all the subsequent architectures but he goes out to the other teams’ key leaders and that forms the basis of the unified architectural team. At an implementation level, you’re right – Marlborough is mainly focused on the “special projects” and that will probably be another 18 to 24 months for them. So the R600 family will mainly be centred primarily in the Valley and Orlando with a little bit from Marlborough, and then the R800 would be more unified.
 
No more ringbus + fixed performance (by way of lots more units while keeping transistor count fairly low) seems to indicate that the ringbus naysayers were right & the ringbus was a waste of transistors?
Jawed said something that sort of echoed my sentiments. The MC is distributed around the edges of the die, and you're going to tell me the ring bus is dead? Maybe technically it's not a ring bus, but the idea is the same.

It's sort of "inside-out" from the R580 die shot that I remember. You still need to get data from one MC to the other end of the chip. The hub and the ring are sort of the same thing, except the layout of the former doesn't look like a ring anymore.

In any case, it looks like ATI took a few pages out of NVidia's book by tying the ROPs to the memory channels.
 
It looks like there's a 1:1 relationship between TUs and L1s.
Perhaps each TU can have short-cycle access to the nearest TU, but it would seem sensible to assume that there are frequent cases where one L1's data set would be in the same locality as another's.
To keep replication from gutting the effectiveness of the caches, maybe each TU can have delayed access to other L1s, or the global data share picks up on shared lines and saves a copy.
There could be some kind of relationship with the local data share per SIMD, and I wish I had some clarity as to its use.
Synchronization within and between SIMDs would be facilitated by the data shares, but they could also be used to house temp copies of L1 lines, or even contexts for pending clauses from other batches to keep more in flight.

Alternatively the TUs can fetch from either vertex cache or global data share. Is it reasonable to presume that at any one time only one TU can fetch from either GDS or VC? If all the TUs could concurrently fetch from VC, say, you'd have a completely stupid crossbar.
If that were the case, vertex work and synchronization operations would be bottlenecked at one access per cycle for the entire chip, assuming the data shares enable synchronization primitives.
The GDS would become a global serialization structure, something that could have been handled with a few "mass halt" signal lines instead.
I'm thinking those caches might be banked or multiported. Maybe not 10-way, but definitely more than single-ported.
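To put rough numbers on the single-ported versus banked argument, here is a toy sketch that counts how many cycles it takes to serve one batch of requests when each bank can serve one request per cycle. Everything in it is an assumption for illustration: ten requesters, a 64-byte line, and banks selected by line address modulo the bank count. None of it is known RV770 behaviour.

```python
from collections import Counter


def cycles_to_serve(addresses, num_banks, line_bytes=64):
    """Cycles needed to serve all requests if each bank handles one per cycle.

    Bank selection by (line index % num_banks) is a hypothetical mapping,
    and requests to the same line are not merged or broadcast.
    """
    if not addresses:
        return 0
    per_bank = Counter((addr // line_bytes) % num_banks for addr in addresses)
    # The most heavily loaded bank determines the total time.
    return max(per_bank.values())


# Ten hypothetical TUs, each asking for a different consecutive line.
requests = [i * 64 for i in range(10)]

single_port = cycles_to_serve(requests, num_banks=1)
four_banks = cycles_to_serve(requests, num_banks=4)

print(single_port)  # 10: fully serialized
print(four_banks)   # 3: lines spread 3/3/2/2 across the banks
```

Even a modest bank count cuts the worst case a lot for spread-out addresses, while requests that all hit one bank still serialize, which is the case a synchronization structure would have to worry about.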

So, is local data share an analogue of parallel data cache in G80?
Possibly. I can imagine it could be used to emulate CUDA functionality, if AMD chooses to expose it. I'm not sure if it's a good long-term idea, but I dunno. It may be safer to abstract it behind ops that implicitly handle the local storage.
More complex data sharing could be done with the data share between SIMD lanes, though this would inject new forms of dependences within a SIMD that used to assume complete data independence.

I'm not sure what the local data share is for, if the SIMDs are still in lockstep and the assumption is that there is minimal intrabatch dependence.
 
... which is apples to oranges. Doesn't make sense to compare those.
And it does make sense to patch it for Radeon testing? So that their performance is artificially crippled, even if there are no graphic bugs in the unpatched game?

=>w0mbat: The RV770 doesn't have a ring-bus? You sure of it?
 