AMD: R7xx Speculation

Radeon HD 4850 may have 1GB version

We learned from the CHIPHELL forum that AMD/ATI are preparing to introduce a Radeon HD 4850 graphics card with 1GB of video memory.

In accordance with established practice, ATI designs the card and a single AIB manufacturer produces it, shipping it in bulk to the other AIB manufacturers, who sell it under their respective logos; this is the so-called ATI reference version.

Under the original plan, AMD/ATI would have let AIB manufacturers design their own Radeon HD 4850 1GB graphics cards. We believe that with this new decision AMD/ATI will raise the performance of the RV770 Pro, helping the Radeon HD 4850 achieve better scores in the reviews published next month alongside the RV770 series announcement.

We also think AMD/ATI have obtained leaked scores of rival products, sent by moles, and concluded that the RV770 Pro would stand less chance of winning with only 512MB of GDDR3 video memory, especially in 3DMark Vantage.

In addition, according to AIB manufacturers, Radeon HD 4850 512MB GDDR3 graphics cards will use a red, 8-layer PCB.

http://www.pczilla.net/en/post/20.html
 
It doesn't have to. Smaller parts are cheaper to manufacture and, generally, have higher yields.

Smaller parts individually are cheaper to manufacture and have higher yields than larger parts, but this is not really an apples-to-apples comparison, because we are comparing multiple smaller parts against a single larger part.
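
To make the yield argument concrete, here's a back-of-the-envelope sketch using a simple Poisson yield model. All the numbers (wafer cost, defect density, die areas) are made up purely for illustration, not actual TSMC or AMD figures.

```python
import math

WAFER_DIAMETER_MM = 300.0        # standard 300 mm wafer
DEFECT_DENSITY = 0.0005          # defects per mm^2 (hypothetical)
WAFER_COST = 5000.0              # dollars per wafer (hypothetical)

def dies_per_wafer(die_area_mm2):
    # Crude estimate: 90% of the wafer area is usable after edge losses.
    wafer_area = math.pi * (WAFER_DIAMETER_MM / 2) ** 2
    return int(0.9 * wafer_area / die_area_mm2)

def poisson_yield(die_area_mm2):
    # Simple Poisson yield model: Y = exp(-D * A).
    return math.exp(-DEFECT_DENSITY * die_area_mm2)

def cost_per_good_die(die_area_mm2):
    good = dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2)
    return WAFER_COST / good

small, large = 250.0, 550.0      # hypothetical small-die vs. big-die areas (mm^2)
print(f"one small die:  ${cost_per_good_die(small):.0f}")
print(f"one large die:  ${cost_per_good_die(large):.0f}")
print(f"two small dies: ${2 * cost_per_good_die(small):.0f}")
```

With these made-up numbers two ~250mm^2 dies still come out cheaper than one ~550mm^2 die, but the gap narrows quickly as defect density falls, and the comparison still ignores the extra memory, packaging and board cost of a dual-die card, which is exactly the apples-to-oranges point above.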

My point was that your claim that a multi-chip board has to be faster than a single-chip board is wrong: the multi-chip board just needs to be competitive. Competitive does not mean slower, it means competitive: it could be faster, it could be slower, or it could be exactly the same speed.

If a multi-die solution is only approximately as fast as a single-die solution, then most consumers will be better off going the single-die route, due to fewer software-related glitches and issues. After all, putting two smaller dies on one PCB doesn't just magically erase all the issues and limitations that people face when using SLI and CrossFire systems.

Except that you may orphan the single, large-die product by having a multi-chip solution of a smaller part with higher performance. And that would hurt sales/margins on the large-die product.

As long as a multi-die solution is the top-performing card, it can be priced relatively high without forcing the company to dramatically slash the price of lower-performance single-die products.

One other benefit of developing "monolithic" GPUs is that, even though the R&D cost is significant, the technology for the high-end GPU can sometimes trickle down to lower-cost variants, as long as the architecture is flexible and scalable.
 
The big thing with the 4870X2 is whether AMD truly has some major improvement in getting two chips to work together up its sleeve, as speculated (I lean towards doubting that it does).

If not, the multi-chip solution is probably a loser.

The problems with multi-chip solutions:
- You have to load geometry, textures, shaders, etc. into memory for each GPU.
- You have to share constants and variables between GPUs.
- Either one GPU computes the geometry, or it is computed separately for two different frames.

So if the rumors are correct and both GPUs use a shared memory interface, all of the above problems disappear. For example, you should be able to split geometry processing between both GPUs and share textures, geometry, shaders, etc.

Another advantage would be cost, because you can implement the board without doubling the memory.

In a way, multi-core GPUs make more sense than multi-core CPUs. Multi-core CPUs are idle most of the time, while GPUs can be kept highly saturated.
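
To put rough numbers on the duplication problem, here is a small sketch comparing today's replicate-everything CrossFire/AFR model with the rumored shared-memory arrangement. The asset sizes, and the assumption that only render targets remain per-GPU, are invented purely for illustration.

```python
# Hypothetical per-frame working set, in MB (invented numbers).
assets = {"textures": 320, "geometry": 96, "shaders": 4, "render_targets": 64}

def replicated_mb(num_gpus):
    # Current AFR/CrossFire model: every GPU holds a full copy of everything,
    # so a "1GB" dual-GPU board only offers ~512MB of unique capacity.
    return num_gpus * sum(assets.values())

def shared_mb(num_gpus):
    # Rumored shared-memory model: one copy of textures/geometry/shaders,
    # with only per-GPU render targets duplicated (an assumption).
    shared = sum(size for name, size in assets.items() if name != "render_targets")
    return shared + num_gpus * assets["render_targets"]

print("replicated footprint:", replicated_mb(2), "MB")   # 968 MB
print("shared footprint:    ", shared_mb(2), "MB")       # 548 MB
```

That difference is essentially the cost argument above: with a shared interface the board doesn't need to carry double the memory just to present the same effective capacity.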
 
The problems with multi-chip solutions:
- You have to load geometry, textures, shaders, etc. into memory for each GPU.
- You have to share constants and variables between GPUs.
- Either one GPU computes the geometry, or it is computed separately for two different frames.

So if the rumors are correct and both GPUs use a shared memory interface, all of the above problems disappear. For example, you should be able to split geometry processing between both GPUs and share textures, geometry, shaders, etc.

Another advantage would be cost, because you can implement the board without doubling the memory.

In a way, multi-core GPUs make more sense than multi-core CPUs. Multi-core CPUs are idle most of the time, while GPUs can be kept highly saturated.

How do you split geometry processing?
 
Morgoth, the same way you do it now inside the chip. Any parallelism which can work on chip can work with multiple chips; it's just that on-chip bandwidth is much cheaper. This type of parallelism would essentially mean looping the ring bus through an external connection ... a lot of chip real estate which would go to waste on single-chip setups. They would also have to engineer a chip-to-chip interconnect operating at very high frequencies ...
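
As a toy illustration of "the same way you do it inside the chip": hand fixed-size batches of triangles to the two GPUs exactly as a single chip hands them to its internal geometry engines, with the external link standing in for on-chip wiring. The batch size and round-robin policy below are made up for the example, not anything ATI has described.

```python
# Toy example: distribute a frame's triangle stream across two GPUs the same
# way a single chip distributes it across internal units (round-robin batches).

def split_batches(triangles, num_gpus=2, batch_size=64):
    """Assign fixed-size batches of triangles to GPUs in round-robin order."""
    queues = [[] for _ in range(num_gpus)]
    for batch_index, start in enumerate(range(0, len(triangles), batch_size)):
        batch = triangles[start:start + batch_size]
        queues[batch_index % num_gpus].append(batch)
    return queues

triangles = list(range(1000))        # stand-in for one frame's triangles
for gpu, queue in enumerate(split_batches(triangles)):
    total = sum(len(batch) for batch in queue)
    print(f"GPU{gpu}: {total} triangles in {len(queue)} batches")
```

The split itself is trivial; the hard part, as noted, is moving the results and any shared state between chips at anything approaching on-chip bandwidth.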

Seems easier to just design a low-end and a high-end chip (like Nvidia seems to be doing). In theory a truly scalable design would be nice; in practice the present high end is already too niche ... being able to efficiently scale past two chips wouldn't do much for the bottom line.
 
Morgoth, the same way you do it now inside the chip. Any parallelism which can work on chip can work with multiple chips; it's just that on-chip bandwidth is much cheaper. This type of parallelism would essentially mean looping the ring bus through an external connection ... a lot of chip real estate which would go to waste on single-chip setups. They would also have to engineer a chip-to-chip interconnect operating at very high frequencies ...

Seems easier to just design a low-end and a high-end chip (like Nvidia seems to be doing).

Well, yeah, if you end up nearly fusing two discrete dies it might work (as in having a very high-speed interconnect and the logic you speak of), but then the question arises of whether or not it would have been better to have a shot at a single big chip in the first place, because once you get an interconnect that's fast enough in place, coupled with the logic required for shared memory access, it kind of starts looking like a big-arsed chip, doesn't it?
 
This type of parallelism would essentially mean looping the ring bus through an external connection ... a lot of chip real estate which would go to waste on single-chip setups.
A bit like CrossFire connectivity. Though I think it's reasonable to assume there'll be rather more bandwidth afforded to the inter-die connection.

And, presumably, there'll need to be a separate CrossFire connection, in addition, for connecting a pair of R700 boards.

They would also have to engineer a chip-to-chip interconnect operating at very high frequencies ...
According to this:

Memory controller with ring bus for interconnecting memory clients to memory devices

[25] ... In one embodiment, the data width of each ring is 256-bits running at system clock (e.g., 500 MHz). This is generally sufficient to support eight 1.2 GHz 32-bit memory channels.
So two rings running at 500MHz are enough for 76.8GB/s.

To support ~128GB/s of memory, a 512-bit ring bus is going to need to run at ~833MHz. Does that indicate HD4870's approximate clock?
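
For what it's worth, here's the arithmetic behind those two numbers. It only works out to 76.8GB/s if the patent's "1.2 GHz 32-bit memory channels" are read as double data rate (2.4Gbps per pin), and it assumes the supported bandwidth scales linearly with ring clock; both are my assumptions, not statements from the patent.

```python
# Sanity check of the figures above (assumptions noted in the text).
channels = 8           # memory channels in the patent's example
width_bits = 32        # per-channel width
data_rate_gbps = 2.4   # 1.2 GHz read as DDR -> 2.4 Gbps per pin (assumption)

bw_at_500 = channels * width_bits * data_rate_gbps / 8   # divide by 8: bits -> bytes
print(f"bandwidth supported at a 500 MHz ring clock: {bw_at_500:.1f} GB/s")

target = 128.0                                            # rumored HD4870 GB/s
ring_mhz = 500 * target / bw_at_500                       # linear-scaling assumption
print(f"ring clock needed for ~{target:.0f} GB/s: {ring_mhz:.0f} MHz")
```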

How much bandwidth do non-memory transfers take? Do any non-memory transfers run on the ring bus? (I'm assuming that all transfers across the PCI Express bus also count as memory transfers for the sake of this question.)

Seems easier to just design a low-end and a high-end chip (like Nvidia seems to be doing). In theory a truly scalable design would be nice; in practice the present high end is already too niche ... being able to efficiently scale past two chips wouldn't do much for the bottom line.
NVidia apparently designs 4 chips to make up a family, though over the last couple of years fortune hasn't smiled upon the bottom chip.

Jawed
 
Well, yeah, if you end up nearly fusing two discrete dies it might work (as in having a very high-speed interconnect and the logic you speak of), but then the question arises of whether or not it would have been better to have a shot at a single big chip in the first place, because once you get an interconnect that's fast enough in place, coupled with the logic required for shared memory access, it kind of starts looking like a big-arsed chip, doesn't it?

But what if you didn't have the resources to design and debug a really big chip, without the possibility of massive delays, in the first place?
 
But what if you didn't have the resources to design and debug a really big chip, without the possibility of massive delays, in the first place?

Then you probably also don't have the resources required to make the fusing of two smaller chips work in the way that was described, because getting a fast enough interconnect and an adjusted ring bus is not as easy as pie, and it opens you up to other dangers, which is something AMD hardly needs. As a consequence, I'm very curious just how adventurous they ended up being with the supposed X2 part.
 
Then you probably also don't have the resources required to make the fusing of two smaller chips work in the way that was described, because getting a fast enough interconnect and an adjusted ring bus is not as easy as pie, and it opens you up to other dangers, which is something AMD hardly needs. As a consequence, I'm very curious just how adventurous they ended up being with the supposed X2 part.

I disagree. Building two smaller chips and then only having to focus on the interconnect part to build an X2 seems simpler, less risky, and less delay-prone than building one monolithic chip.

Of course I don't know anything about this stuff compared to you guys, but :p
 
Cooling solutions of R(V)7xx / HD4000 series:
http://diybbs.pconline.com.cn/topic.jsp?tid=8641624
Very interesting.

A few possible scenarios just from looking at the cooling solutions:
- R700 uses two RV770 Pro chips for thermal management, RV770 XT runs really fast and needs that R600 slot cooler, and RV770 has a TDP of ~100W;
- R700 uses two RV770 XT chips, RV770 XT's cooler is overkill, and RV770 Pro is roughly the same as RV670 (TDP-wise).

EDIT: Is it just me, or do the die footprints seem larger on the R700 cooler compared to the RV770 XT/Pro cooler?
 
I disagree. Building two smaller chips and then only having to focus on the interconnect part to build an X2 seems simpler, less risky, and less delay-prone than building one monolithic chip.

Of course I don't know anything about this stuff compared to you guys, but :p

Of course, that assumes that building and integrating such an interconnect is:

a) Simple: there's no indication of this being the case.

b) Efficient: if you end up wasting a lot of transistors in a single chip just to make it multi-chip ready, what's the benefit?

c) A one-time thing: namely, you do the research once and can then simply plug it into your chips going forward with moderate adjustments; if you have to rework the implementation for each new generation, you're getting into more trouble than it's worth.

It's not necessarily undoable, but as soon as you start looking at implementation details you start seeing that you're not gaining all that much over simply making a single big chip by taking the "proper" multi-chip route. I'd love to be wrong on this, but my hunch is that AFR will be with us for quite some time, with the multi-chip approach simply getting some tweaks, but nothing really paradigm-shifting like MfA was suggesting.
 