AMD: Southern Islands (7*** series) Speculation/Rumour Thread

Even then, a ~30% vs ~15% delta seems like inefficient design.
As I said, it depends on the workload. Barts only has 75% of the compute power of Cypress, so if you are interested in compute, then Cypress wins. Not to mention Cypress has more memory bandwidth, which can be helpful for compute as well. If you want to use doubles or FMA, then Barts isn't even an option.
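The ~75% figure can be sanity-checked from peak single-precision throughput, computed as stream processors × core clock × 2 (one multiply-add counted as two flops per clock). This is a rough sketch that assumes the reference HD 5870 (full Cypress: 1600 SPs at 850 MHz) and HD 6870 (full Barts: 1120 SPs at 900 MHz) configurations. Other SKUs, and real workloads, will give different ratios.

```python
# Rough peak-FP32 comparison, assuming reference HD 5870 (Cypress XT)
# and HD 6870 (Barts XT) shader counts and core clocks.

def peak_gflops(shaders: int, clock_mhz: float) -> float:
    """Peak single-precision GFLOPS: each shader retires one multiply-add
    (counted as 2 flops) per clock."""
    return shaders * clock_mhz * 2 / 1000.0

cypress = peak_gflops(1600, 850)  # HD 5870
barts = peak_gflops(1120, 900)    # HD 6870

print(f"Cypress: {cypress:.0f} GFLOPS")        # Cypress: 2720 GFLOPS
print(f"Barts:   {barts:.0f} GFLOPS")          # Barts:   2016 GFLOPS
print(f"Barts/Cypress: {barts / cypress:.0%}")  # Barts/Cypress: 74%
```

Under these assumptions the ratio comes out at about 74%, in line with the figure quoted above. Peak numbers ignore VLIW packing efficiency, so achieved throughput can differ.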

If you want to claim that having double-precision support in GPUs for games is a waste, then you're right, but so what? Do you want double-precision support in GPUs or not? What about CPUs? Do doubles get used in games? Probably not, so it's a waste there as well.
 
Can things be clock-gated on a cycle-by-cycle basis? I.e. one could imagine clock-gating the DP units (if they are separate from the other ALUs) most of the time, or clock-gating SIMD lanes that have been predicated off.

Are these things already commonly done?
 
There might be small optimizations to be made, but in most designs in the industry the low-hanging fruit has already been picked with regard to fine-grained clock gating.
 
I think a bigger problem might be that bulk power-gate transistors for gating several watts aren't going to be easy in the first place. Only Intel does that, and Intel is, well, Intel.
I had actually not considered that. But obviously, you would have to disable CUs in multiples of 4.

Bobcat does power gating, and I remember reading that the new tablet SoC (Hondo) is meant to have more power gating, i.e. per core, not just the L2.
 
So most probably:

HD77xx and up: GCN architecture
HD75xx and/or HD76xx: VLIW4 architecture
HD74xx: probably reserved for iGPUs in lower-end Trinity APUs: VLIW4 architecture
HD73xx and down: probably rebadged Caicos discrete cards, along with Krishna/Wichita APUs (which may very well bring an integrated Caicos): VLIW5 architecture (160 SPs, 8 TMUs, 4 ROPs)

I doubt they'll go with 3 different architectures.
High-end GCN
Midrange & down VLIW4, possibly fewer models than before, but I think the time for cutting models isn't here yet; it might be next gen, though.
 
I don't see any reason to keep VLIW5 around. In fact, it's probably best to do away with VLIW4 as well.
 
I don't think you understood my point with the VLIW5 in the "family".

Basically, you both think AMD will "bother" to develop a low-end VLIW4 GPU.

I think they won't, and they'll just rebrand the existing Caicos from HD64xx to HD73xx.
Just like they rebranded Cedar from HD54xx to HD63xx.


In fact, it's probably best to do away with VLIW4 as well.
Not in SI.
A VLIW4 GPU in discrete cards for the mid-low end is pretty much confirmed right now, because of Hybrid Crossfire compatibility with Trinity.
 
What about the mid-range, i.e. the replacement for Turks?


I don't know, would it really be much more difficult to have Hybrid Crossfire with a VLIW4 APU and a GCN discrete GPU? And wouldn't prolonging the lifespan of VLIW4 hurt AMD's efforts to promote GPGPU?
 
They are not if it isn't a GT200.

Do you think this might make sense again soon? In particular, I'm wondering whether there is a power penalty for using a more flexible SIMD that can do N SP FMAs and N/2 or N/4 DP FMAs versus having dedicated DP/SP units. (The area cost need not lead to a leakage cost if the DP units are power-gated, as they could be in games.)

I guess it's a very small-scale version of the general-purpose homogeneous vs. more specialized heterogeneous question.
 
I don't think that makes sense. If you have more units, you also need to get signals there, which isn't free either. Maybe you could clock-gate parts of your multiplier, though (if you go for a half-rate DP mul); the rest of the ALU should mostly be used by both SP and DP.
 
I don't know, would it really be much more difficult to have Hybrid Crossfire with a VLIW4 APU and a GCN discrete GPU? And wouldn't prolonging the lifespan of VLIW4 hurt AMD's efforts to promote GPGPU?
As far as Hybrid CF with GCN/VLIW4 goes, I have no idea, but in the past the two GPUs have had to be very similar architectures for it to work. For example, why doesn't Llano CF with anything higher than a 6670?

I don't think AMD is too focused on GPGPU in the $100 segment. Sure, it is a great feature to offer and can be put to use, but it isn't a huge selling point. As far as GPGPU goes, VLIW4 is a step in the right direction, at least over VLIW5.
 
A VLIW4 GPU in discrete cards for the mid-low end is pretty much confirmed right now, because of Hybrid Crossfire compatibility with Trinity.
Crossfire was rearchitected some time ago (in fact, the pretty cool scaling you see now on high-end GPUs owes a lot to that rearchitecture):

http://www.pcper.com/reviews/Graphi...Preview-Changes-multi-GPU-multi-display-users

http://news.softpedia.com/news/AMD-Catalyst-10-2-and-10-3-Preview-135289.shtml

Another critical update that will be enabled in the Catalyst 10.2 is the rearchitecture of CrossFire. According to the chip maker, the new driver will see some of the CrossFire code moved from the 3D driver to a separate driver component, in an attempt to prepare the Catalyst Software Suite for future AMD products, namely the much-anticipated Fusion products, such as the 2011-bound 'Llano' APU. In addition, the separate driver will also enable users to mix and match ATI graphics cards from different generations.
 
As far as Hybrid CF with GCN/VLIW4 goes, I have no idea, but in the past the two GPUs have had to be very similar architectures for it to work. For example, why doesn't Llano CF with anything higher than a 6670?
That is a matter of scaling. If one GPU is too imbalanced relative to the other, you start getting more cases of negative scaling.
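A toy model shows why imbalance leads to negative scaling. It assumes strict alternate-frame rendering (the two GPUs render every other frame) and no driver or synchronization overhead; under those assumptions the slower GPU caps the pair at twice its own frame rate, so once it is less than half as fast as the faster GPU, the pair is slower than the faster GPU alone. The frame rates are made up for illustration, and real Hybrid Crossfire behaviour will differ.

```python
# Toy model of two-GPU alternate-frame rendering (AFR). Assumptions: frames
# strictly alternate between the GPUs and there is no driver/sync overhead.

def afr_fps(fps_a: float, fps_b: float) -> float:
    """Combined frame rate under strict AFR: each GPU must finish every other
    frame, so the slower GPU caps the pair at twice its own rate."""
    return 2 * min(fps_a, fps_b)

def scaling(fps_fast: float, fps_slow: float) -> float:
    """Speed-up of the pair relative to the faster GPU running alone."""
    return afr_fps(fps_fast, fps_slow) / fps_fast

# Hypothetical frame rates: a discrete card paired with progressively slower iGPUs.
for igpu_fps in (40.0, 30.0, 20.0, 15.0):
    s = scaling(40.0, igpu_fps)
    verdict = "positive" if s > 1 else "negative" if s < 1 else "none"
    print(f"dGPU 40 fps + iGPU {igpu_fps:.0f} fps -> {s:.2f}x ({verdict} scaling)")
```

In this model a 40 fps card paired with a 15 fps iGPU drops to 30 fps, which is the negative-scaling case described above.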
 
Maybe they would buy it because they need a discrete card? Or they don't want an APU? Or they have an older system?

If they need a discrete card, surely they can pay $70-80 for a mid-range card instead of $40 for a low-end card, right? The mid-range card typically gets you far higher performance as well (e.g. Redwood vs. Cedar, Turks vs. Caicos). Same goes if they have an older system. As for not wanting an APU, with essentially all of Intel's desktop processors set to be APUs, and most of AMD's as well, it appears they would have no choice anyway.

It will be 4 chips again. They still need a ~100 mm² 8-12 SIMD VLIW4 discrete card to buddy up with Trinity and take care of the <$100 and HTPC market.
I would consider it more of a low-to-mid-range chip. I think we have seen the last of the 64-bit (memory bus) cards from AMD, though.

The chips as I see them:
High-end: GCN, 32 CUs
Performance: GCN, 24 CUs (matching the GTX 580)
Midrange: GCN, 16 CUs (somewhere between a 6950 and a 6870, maybe a 5870?)
Low-end: VLIW4, 8-12 SIMDs (around a 6790/6770)

A 100 mm² GPU would not perform that much better than a comparable APU (the GPU in Trinity looks set to be at least 100 mm²). And I've answered the part about a VLIW4 card to buddy up with Trinity below. HTPC users can go for an APU as well; if they want discrete, they can go for the $70-80 midrange GPU, as I said above.

Bobcat does power gating, and I remember reading that the new tablet SoC (Hondo) is meant to have more power gating, i.e. per core, not just the L2.

Are you sure about that? AFAIK TSMC does not offer power gating on 40 nm.

I doubt they'll go with 3 different architectures.
High-end GCN
Midrange & down VLIW4, possibly fewer models than before, but I think the time for cutting models isn't here yet; it might be next gen, though.

I see the high end and upper midrange as GCN (essentially Cayman and Barts replacements) and the lower midrange (the Juniper/Turks replacement) as VLIW4, if they decide to go with it.

I don't think you understood my point with the VLIW5 in the "family".

Basically, you both think AMD will "bother" to develop a low-end VLIW4 GPU.

I think they won't, and they'll just rebrand the existing Caicos from HD64xx to HD73xx.
Just like they rebranded Cedar from HD54xx to HD63xx.

Not in SI.
A VLIW4 GPU in discrete cards for the mid-low end is pretty much confirmed right now, because of Hybrid Crossfire compatibility with Trinity.

Why is keeping VLIW4 around for Hybrid Crossfire with Trinity such a big deal? How big is the market for gaming laptops? I'd venture to say it's less than 5% of total laptop sales. Most people buy a laptop just for general use. Developing a graphics chip just to address that 5% of the market is not worth it IMHO. You'd be better off offering a higher-performance discrete GPU in that segment anyway.

I'm not saying that they won't have a VLIW4 architecture. It may be that VLIW4 is more efficient at graphics than GCN, which is more compute-oriented. Compute is not extremely important in the midrange segment, and if VLIW4 gets them a smaller die, that may be the deciding factor in choosing it instead.
 
They are gating 2W, not 20W.

So that's shifting the goalposts; the statement was:

I think a bigger problem might be that bulk power-gate transistors for gating several watts aren't going to be easy in the first place. Only Intel does that, and Intel is, well, Intel.

Also, under load a Bobcat core can consume way more than 2 W, more like around 5 W per core.

Dumb question: what does peak power usage have to do with power gating? It's not like it dissipates the energy.
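On the question of what peak power has to do with power gating: the gate switch sits in series with the block's supply, so while the block is on, its full current flows through the switch, and the switch's on-resistance has to keep the voltage drop small even at peak current. Since on-resistance falls roughly in proportion to switch width, a block drawing more current needs a roughly proportionally larger switch. The sketch below uses assumed, illustrative numbers (a 1.0 V rail and a 2% drop budget), not figures for any real AMD or Intel design.

```python
# Illustrative sizing of a power-gate (sleep) switch. Assumed numbers only:
# a 1.0 V rail and a 2% IR-drop budget across the switch when the block is on.

def max_switch_resistance_mohm(block_power_w: float,
                               vdd: float = 1.0,
                               drop_fraction: float = 0.02) -> float:
    """Largest on-resistance (milliohms) the gate switch may have so that,
    at the block's peak current, the drop across it stays within budget."""
    peak_current_a = block_power_w / vdd             # I = P / V
    allowed_drop_v = vdd * drop_fraction             # e.g. 20 mV at 1.0 V
    return allowed_drop_v / peak_current_a * 1000.0  # R = V / I, in mOhm

for watts in (2.0, 5.0, 20.0):
    print(f"{watts:>4.0f} W block -> switch must be <= "
          f"{max_switch_resistance_mohm(watts):.1f} mOhm")
```

With these assumptions a 2 W block tolerates about 10 mOhm, a 5 W core about 4 mOhm, and a 20 W block about 1 mOhm, so the switch for the larger block must be roughly ten times wider than for the 2 W one.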
 