AMD: R7xx Speculation

1 billion transistors is not very many, if R600 is ~700M.

R300 ~107M
R420 ~160M
R520 ~320M
R600 ~700M
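
Just to put rough numbers on that scaling, here's a quick Python sketch using only the approximate counts listed above (nothing official):

# Rough generational scaling, using only the approximate counts quoted above.
counts = {"R300": 107e6, "R420": 160e6, "R520": 320e6, "R600": 700e6}

names = list(counts)
for prev, cur in zip(names, names[1:]):
    print(f"{prev} -> {cur}: {counts[cur] / counts[prev]:.2f}x")

# A hypothetical 1-billion-transistor part next to R600:
print(f"R600 -> 1B part: {1e9 / counts['R600']:.2f}x")

So each generation has roughly doubled, and a 1-billion-transistor part would only be about a 1.4x step over R600, before counting any multi-chip overhead.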

If the multi-chip rumour is true, then wouldn't that add overhead transistors?

Jawed
 
I don't expect R700 to be a totally new architecture from R600. I'm thinking R700 will be somewhat like what R420 was to R300.

The next major architecture is probably R800.

ATI has been doing a semi-split development team thing for a while. They said they were going to stop doing that around the R420 days, but who knows if they did or not.
 
Well, considering both ATI and Intel are investigating multi-chip/multi-core scalable graphics chips, I'd imagine there's something there that makes it worth it.

It may just be something as simple as this: at some point you aren't going to be able to keep shrinking the process, and thus you'll run into a wall for how many transistors you can put on a single chip.

1 billion transistors on a single chip is going to make for a monster of a chip even at a smaller process.

As such, is it possible to move the ring bus off chip, such that it could serve multiple chips on a single substrate? Could this be one of the reasons that ATI has invested so heavily in that memory architecture?

And if so, would the complexity of such a solution be offset by using more chips of a simpler, less transistor-heavy design? I'd imagine this would improve overall yields, no?
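
Here's a rough sketch of that yield intuition, using a simple Poisson defect model; the defect density and die areas below are made-up numbers purely for illustration:

import math

# Simple Poisson yield model: yield = exp(-defect_density * die_area).
# Defect density and die areas are purely illustrative assumptions.
defects_per_cm2 = 0.5

def die_yield(area_cm2):
    return math.exp(-defects_per_cm2 * area_cm2)

big_die = 4.0    # one hypothetical 4 cm^2 monolithic chip
small_die = 1.0  # four hypothetical 1 cm^2 chips instead

print(f"1 x {big_die} cm^2 die : {die_yield(big_die):.1%} yield")
print(f"4 x {small_die} cm^2 dies: {die_yield(small_die):.1%} yield each")

Packaging, test and inter-chip interconnect costs would eat into that advantage, of course.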

It'd be interesting to see if Nvidia was also investigating a possible move to multi-chip/multi-core. Actually, I think we may already be seeing the signs of this with the NVIO chip.

Regards,
SB
 
ATI has been doing a semi-split development team thing for a while. They said they were going to stop doing that around the R420 days, but who knows if they did or not.

We talked about it in another thread, and yes, although there are separate teams (because of distance), much of the resources and "production" are shared nowadays. There is no such thing as an "R600 team" anymore.

Although multi-chip design is not a performance enhancer per se, it does open roads to future improvements far beyond what is possible with the current limitations.
With R600 already being built with a parallel design in mind, I can't see performance degrading when going to a multi-core design.

As Jawed pointed out, how are you going to feed such a beast when it's a group of mid-range processors stuck together?
My guess is that this design will incorporate clock domains from ATI, grouping four RV610s together clocked at insane speeds, with the whole design made or broken by the interaction of the ring bus controller.
 
Although multi-chip design is not a performance enhancer per se, it does open roads to future improvements far beyond what is possible with the current limitations.
In the future, going multi-chip may be the only way to get past a certain amount of performance, simply because a single chip could run into area/power constraints, but given the same amount of aggregate shaders/texture/MC BW, it can never be as efficient: it is just too easy on a single chip to add extremely high-BW buses. The cost of external buses is very high.
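
To put some made-up numbers on that (illustrative widths and clocks only, not real R600 or board figures):

# Bandwidth = width_bits * clock / 8, in bytes per second.
# All widths and clocks below are illustrative assumptions.
def bandwidth_gb_s(width_bits, clock_mhz):
    return width_bits * clock_mhz * 1e6 / 8 / 1e9

# A very wide on-chip bus at core clock vs. a narrow off-chip link that
# has to run much faster per pin just to stay in the same ballpark.
print("on-chip, 1024-bit @  750 MHz:", bandwidth_gb_s(1024, 750), "GB/s")
print("off-chip,  64-bit @ 2000 MHz:", bandwidth_gb_s(64, 2000), "GB/s")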

With R600 already being built with a parallel design in mind, I can't see performance degrading when going to a multi-core design.
Exactly how is R600 more built with parallel design in mind than other GPUs?
 
In the future, going multi-chip may be the only way to get past a certain amount of performance, simply because a single chip could run into area/power constraints, but given the same amount of aggregate shaders/texture/MC BW, it can never be as efficient: it is just too easy on a single chip to add extremely high-BW buses. The cost of external buses is very high.

That's what I was trying to say, yeah :(

Exactly how is R600 more built with parallel design in mind than other GPUs?

I meant that parallel operations on R600 seem to perform much better than on previous hardware, and I can see work for a dispatcher that will resolve bottlenecks in some situations. But then again, other bottlenecks arise with this kind of design.
 
Exactly how is R600 more built with parallel design in mind than other GPUs?

Well, referring to this diagram from the B3D piece...

http://www.beyond3d.com/images/reviews/r600-arch/r600-big.png

It appears that the SPUs and ROPs (RBEs) are set up as four distinct and separate groups of processing clusters, and that there's an overall command/setup(?) structure that controls the whole thing. Presumably it all communicates over the ring bus.

On the surface at least, it would seem possible that you could (see the toy sketch after this list):

1. Move the ringbus off chip to maintain the same type of communication.
2. Have a central "command" processor chip.
3. Have multiple dedicated processing chips.
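
Here's a toy sketch of that split, with invented names and a naive round-robin dispatch, purely to make the idea concrete:

# Toy model of a hypothetical split: one "command" chip hands batches of
# work to several processing chips hanging off a shared external ring.
# All names and the round-robin policy are invented for illustration.
class ProcessingChip:
    def __init__(self, chip_id):
        self.chip_id = chip_id

    def process(self, batch):
        return f"chip {self.chip_id} handled {batch}"

class CommandChip:
    def __init__(self, ring_clients):
        self.ring_clients = ring_clients

    def dispatch(self, batches):
        # Naive round-robin; a real design would have to worry about load
        # balance, latency and memory locality across the external bus.
        return [self.ring_clients[i % len(self.ring_clients)].process(b)
                for i, b in enumerate(batches)]

clients = [ProcessingChip(i) for i in range(3)]
print(CommandChip(clients).dispatch(["tile0", "tile1", "tile2", "tile3"]))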

I realize this is a gross over-simplification of what is probably going on. But it wouldn't take much imagination to think that R600 was possibly just a stepping stone on the way to a multi-chip/multi-core architecture, R700 perhaps?

Regards,
SB

[Edit] Which makes me wonder if RV610 and RV630 are ways for them to experiment with different TMU/SPU ratios to find out which one works best for future multi-whatever chips.
 
I'm not sure where the motivation for splitting up R700 will come from. Is it really going to be that complex of a chip where upcoming 65nm and 55nm processes will result in excessively large dies? Isn't 65nm something like a 50% reduction compared to 90nm?
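
FWIW, the ideal shrink arithmetic does land right around 50%; it's a best case, since real layouts rarely scale perfectly:

# Ideal area scaling between process nodes: area goes with the square of
# the feature-size ratio. Real designs usually shrink less than this.
for old, new in [(90, 65), (90, 55), (65, 55)]:
    ratio = (new / old) ** 2
    print(f"{old}nm -> {new}nm: ~{ratio:.0%} of the area ({1 - ratio:.0%} smaller)")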
 
I'm not sure where the motivation for splitting up R700 will come from. Is it really going to be that complex of a chip where upcoming 65nm and 55nm processes will result in excessively large dies? Isn't 65nm something like a 50% reduction compared to 90nm?
It's not so much an issue of die size as it is tape-out costs, or so Arun has convinced me.
 
On the surface at least, it would seem possible that you could

1. Move the ringbus off chip to maintain the same type of communication.

No way could you move the ring bus off chip; that's a rather ridiculous idea. It would slow everything down by an order of magnitude.
 
An on-chip ring bus + "off-chip" clients is one solution.

I don't know how much die space each part of R600 takes, but if the texturing units consume a lot of it, that's one possible application.

By removing the ROPs (hey, doesn't that look like an extension of R600's "custom filters"?), only two parts remain, looking quite similar to the good old Voodoo2 design: TMU chips and one SP array + memory controller die.
 
I realize this is a gross over-simplification of what is probably going on. But it wouldn't take much imagination to think that R600 was possibly just a stepping stone on the way to a multi-chip/multi-core architecture, R700 perhaps?
Even if the ring bus could be used as a basis for inter-chip communication, hard as it is, it's probably one of the least difficult problems to solve. All it does, after all, is just transport data...

The overall architecture of how to partition a GPU into multiple dies and do it efficiently is much harder: what kind of data will travel between the dies? What will the memory architecture look like (more or less mirrored, like CF/SLI, or distributed and shared)? Will it duplicate setup engines, or will there be a master/slave configuration? Etc.

That's why I don't really see how the organization of the major blocks in R600 is significantly different from R580 or G80: there are no obvious indications of doing things significantly differently in a way that would make separation easier.
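
A crude way to see what's at stake in the memory question above, with arbitrary example chip counts and capacities:

# Crude comparison of usable memory on a multi-die board. The chip count
# and capacity are arbitrary example values, not real configurations.
chips = 4
mem_per_chip_mb = 256

# CF/SLI-style mirroring: every chip keeps its own full copy of textures
# and render targets, so usable capacity doesn't grow with chip count.
mirrored_usable = mem_per_chip_mb

# Distributed/shared: capacity adds up, but any chip may now have to
# fetch from another chip's pool, so inter-die bandwidth and latency
# become the limiting factor instead.
shared_usable = chips * mem_per_chip_mb

print("mirrored usable:", mirrored_usable, "MB")
print("shared usable  :", shared_usable, "MB")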
 
But isn't it already separated to an extent? At least the diagram would imply that the SPUs and ROPs are four separate entities.

Regards,
SB
 
I don't think it would be mirrored, since if 2-4 medium-sized dice are used to make a high-end board, I would expect the cost of equivalent amounts of usable memory (compared to a single-chip design) to be significant. On the other hand, I'd also think that full-speed texturing from non-local memory would require significantly more latency tolerance than would otherwise be the case, so that would adversely affect performance/mm2.

silent_guy - is chip-to-chip latency significantly reduced by placing multiple chips into a single package (like Clovertown)? Also, another possibly crazy question: is it possible to build a compute tile (where in this case a compute tile is an entire GPU) where each tile connects to the top/bottom/left/right tile on the same wafer? That way, maybe one could actually cut different-sized dies out of a wafer - you'd cut, say, 2x2 tiles for high-end dies, 1x1 for low end, 1x2 for midrange. I imagine each tile would be connected to its neighbours with some fairly wide/high-speed bus, and each tile would be designed to handle the fact that the bus leading to any one of the 4 neighbours might go nowhere...
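
Here's a toy sketch of that tiling idea; cut sizes, link handling and all names below are invented purely for illustration:

# Toy model of the tiling idea: identical GPU tiles on a wafer, each with
# N/S/E/W links that may simply lead nowhere once the wafer is cut.
def build_part(rows, cols):
    """Cut a rows x cols block of tiles; return each tile's live links."""
    part = {}
    for r in range(rows):
        for c in range(cols):
            neighbours = {"N": (r - 1, c), "S": (r + 1, c),
                          "W": (r, c - 1), "E": (r, c + 1)}
            # A link only counts if the neighbour is inside the cut;
            # otherwise the tile has to tolerate a dangling bus.
            part[(r, c)] = {d: n for d, n in neighbours.items()
                            if 0 <= n[0] < rows and 0 <= n[1] < cols}
    return part

for name, shape in [("low-end 1x1", (1, 1)),
                    ("mid-range 1x2", (1, 2)),
                    ("high-end 2x2", (2, 2))]:
    part = build_part(*shape)
    live = sum(len(links) for links in part.values())
    print(f"{name}: {len(part)} tiles, {live} live inter-tile links")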
 
But isn't it already separated to an extent? At least the diagram would imply that the SPUs and ROPs are four separate entities.

Regards,
SB

If you look at the Z-buffer you'll see it's connected both to the "scheduler" part and to the "output" part.

psurge> Clovertown doesn't have a die-to-die connection; they rely on the "Netburst" FSB for that.
 
I'm not sure where the motivation for splitting up R700 will come from. Is it really going to be that complex of a chip where upcoming 65nm and 55nm processes will result in excessively large dies? Isn't 65nm something like a 50% reduction compared to 90nm?

One of the truisms/mantras of engineering is something like "optimize for the common case, not the corner case". Maybe that idea is being applied here as well. Especially when that tiny percentage of high-end buyers has proven that cost isn't a major concern for them, you can start building into your models the idea that you can stick them with the extra memory costs associated with SLI/CF types of implementations.

But some of you old timers know that I've been expecting this kind of thing that Inq is suggesting re R700 to become common for two years or more.

The fly in that ointment to some degree, however, is the experience with two GX2s, which didn't seem too promising, frankly.
 
But some of you old timers know that I've been expecting this kind of thing that Inq is suggesting re R700 to become common for two years or more.

512-bit multichip boards all the way geo
 
...
The fly in that ointment to some degree, however, is the experience with two GX2s, which didn't seem too promising, frankly.

I think the reason two GX2s didn't seem too promising is that the GPU wasn't designed with multiple chips in mind (architecture-wise). The baseline for the G70 architecture would be dual-chip only. Anyway, more than 2 chips would work (theoretically), but it might not be at its sweet spot compared to dual chips in an SLI/Crossfire setup. The same analogy can be seen in the CPU area: adding more processors doesn't necessarily mean you get more performance.

It would be more interesting to see whether G80 and R600 could do well with more than 2 chips in SLI/Crossfire.

Regards,

Edit: typo as usual...
 