AMD: Southern Islands (7*** series) Speculation/ Rumour Thread

AMD's ROPs do 4x as many z/stencil ops as color ops, and have done so for a long time now, presumably by using the 4 channels. So that much seems unlikely to change, which leaves the Z rate a function of the ROP count.
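A quick way to see what that implies: if each ROP handles 4 z/stencil samples per clock against 1 color pixel, the peak Z rate is just ROP count × 4 × clock. This is a back-of-envelope sketch. The 32-ROP, 900 MHz configuration below is a hypothetical example, not a confirmed spec, and the 1 pixel per ROP per clock color rate is an assumption.

```python
def color_rate(rops: int, core_clock_mhz: float) -> float:
    """Peak color pixels/s, assuming 1 pixel per ROP per clock."""
    return rops * core_clock_mhz * 1e6


def z_stencil_rate(rops: int, core_clock_mhz: float,
                   z_per_rop_per_clock: int = 4) -> float:
    """Peak z/stencil samples/s, using the 4x-per-ROP figure from the post."""
    return rops * z_per_rop_per_clock * core_clock_mhz * 1e6


if __name__ == "__main__":
    # Hypothetical configuration for illustration only.
    rops, clock_mhz = 32, 900.0
    print(f"color:     {color_rate(rops, clock_mhz) / 1e9:.1f} Gpixels/s")
    print(f"z/stencil: {z_stencil_rate(rops, clock_mhz) / 1e9:.1f} Gsamples/s")
```

If the 4x multiplier stays fixed, the only levers left are ROP count and clock, which is the point being made.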
 
We should wait until we see the die size and TDP confirmed.

The target die size is lower and the stock power limits probably cap performance in a power-dominated situation.
 
At least some of the transistors went there. Some power-saving techniques, such as clock or power gating, can also consume more die area than their transistor count would suggest, since they involve physically larger gates and circuits.

I am curious if the < 3W idle power means there is power gating involved.
 
Maybe those transistors went into power-saving features and GPGPU features: the caches, ECC, etc.

My impression was that AMD was decidedly less focused on the GPGPU side and that GCN was supposed to be more optimized for gaming workloads compared to their 4+1 scheme of yore.

I am curious if the < 3W idle power means there is power gating involved.

Wouldn't it HAVE to be? With that many transistors, I'd expect leakage alone to exceed that without mitigation techniques like power gating.

edit: Quick calc. It's safe to assume at least 10% of power is lost to leakage. On a 300W card, that's 30W. Even if you downclock to 1/10 of the frequency and leakage tracks with frequency, you're still looking at 3W, and you haven't budgeted anything but leakage in that 3W. I just don't see how it's possible without power gating.
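The arithmetic above as a small sketch, using the post's own assumptions (10% leakage on a 300W card, clock cut to 1/10). One caveat: treating leakage as tracking frequency is generous, since static leakage depends mainly on voltage and temperature rather than clock, so the pessimistic case keeps the full leakage figure. The `gated_fraction` parameter is a hypothetical knob for how much of the die is power-gated at idle.

```python
def idle_leakage_w(board_power_w: float, leakage_fraction: float,
                   clock_scale: float, leakage_tracks_clock: bool = True,
                   gated_fraction: float = 0.0) -> float:
    """Back-of-envelope idle leakage in watts.

    board_power_w * leakage_fraction is the leakage at load. If
    leakage_tracks_clock is True it is scaled by clock_scale (the post's
    generous assumption); otherwise it stays at the full value.
    gated_fraction is the hypothetical share of the die that is
    power-gated and assumed to leak nothing.
    """
    if not 0.0 <= gated_fraction <= 1.0:
        raise ValueError("gated_fraction must be in [0, 1]")
    leakage = board_power_w * leakage_fraction
    if leakage_tracks_clock:
        leakage *= clock_scale
    return leakage * (1.0 - gated_fraction)


if __name__ == "__main__":
    print(f"{idle_leakage_w(300, 0.10, 0.10):.1f} W")  # post's generous case
    print(f"{idle_leakage_w(300, 0.10, 0.10, leakage_tracks_clock=False):.1f} W")
    print(f"{idle_leakage_w(300, 0.10, 0.10, False, gated_fraction=0.9):.1f} W")
```

Even the generous case spends the whole 3W on leakage alone; in the pessimistic case, only cutting power to most of the die gets back under the target.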
 
4+1 was a better match to game shaders, since it devoted the +1 to a lot of specialized functionality that either did not get used in scientific computing or lacked the precision to be used for serious computation.
 

Interesting. I'd love to see two cards as closely matched as can be (ALU count, core and memory clocks etc.) run the same benchmarks and see how GCN fares.
 
A lot of other features of the design have changed besides the ALU composition, so I doubt GCN would lose unless the 4+1 design also inherited the new scheduling hardware and memory pipeline.

When comparing the 69xx to the 68xx series, there were some games where the 4+1 (VLIW5) architecture's slightly higher peak performance helped it match or beat the VLIW4 parts.
In terms of specialized functions, synthetic benchmarks at least showed that the FP throughput penalty from formerly T-slot instructions occupying the regular ALUs in VLIW4 was measurable.
The problem was worse for some instructions than others, as some, like sin and cos, required setup code that took up slots anyway.
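A rough way to model that penalty is to count the minimum number of bundles a mix of ordinary FP ops and transcendentals needs on each layout, assuming perfect packing and no dependencies. The number of VLIW4 slots a transcendental occupies is a parameter here (`trans_slots`, defaulting to an assumed 3 of 4), not a confirmed figure, and the sin/cos setup code mentioned above is not modelled.

```python
import math


def vliw5_bundles(n_fp: int, n_trans: int) -> int:
    """Minimum VLIW5 (4+1) bundles: transcendentals need the single T slot,
    while ordinary FP ops may use any of the 5 slots."""
    return max(n_trans, math.ceil((n_fp + n_trans) / 5))


def vliw4_bundles(n_fp: int, n_trans: int, trans_slots: int = 3) -> int:
    """Minimum VLIW4 bundles when each transcendental ties up trans_slots
    of the 4 slots (assumed value, 1 <= trans_slots <= 4)."""
    if not 1 <= trans_slots <= 4:
        raise ValueError("trans_slots must be between 1 and 4")
    per_bundle = 4 // trans_slots  # transcendentals that fit in one bundle
    return max(math.ceil(n_trans / per_bundle),
               math.ceil((n_fp + trans_slots * n_trans) / 4))


if __name__ == "__main__":
    for n_fp, n_trans in [(20, 0), (16, 4)]:
        b5, b4 = vliw5_bundles(n_fp, n_trans), vliw4_bundles(n_fp, n_trans)
        # Lane-cycles (bundles x width) compare equal ALU counts fairly.
        print(n_fp, n_trans, "VLIW5:", b5 * 5, "VLIW4:", b4 * 4, "lane-cycles")
```

With no transcendentals the two layouts tie at 20 lane-cycles; with 4 transcendentals among 20 ops, VLIW4 needs 28 against 20 under these assumptions.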
 
The VLIW5 construct was a compromise for both pixel and vertex workloads in a unified shader architecture. It matched the dominant co-issue patterns in graphics, like vec3+1, vec4 and vec4+1, pretty well within a single instruction bundle.
Abstract compute workloads have much more variable combinations.
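To make the fit concrete, here is a small count of how those co-issue groups map onto 5-wide versus 4-wide bundles. It assumes each group is issued on its own (no packing with neighbouring instructions) and that all of its ops are independent. It also ignores which slot can run which op, so treat it as illustration only.

```python
import math

# Independent scalar ops in each common graphics co-issue group.
PATTERNS = {"vec3+1": 4, "vec4": 4, "vec4+1": 5}


def bundles_needed(ops: int, width: int) -> int:
    """Bundles required to issue `ops` independent ops on a `width`-wide VLIW."""
    return math.ceil(ops / width)


def utilization(ops: int, width: int) -> float:
    """Fraction of issued slots doing useful work."""
    return ops / (bundles_needed(ops, width) * width)


if __name__ == "__main__":
    for name, ops in PATTERNS.items():
        for width in (5, 4):
            print(f"{name:7s} width {width}: {bundles_needed(ops, width)} bundle(s), "
                  f"{utilization(ops, width):.1%} of slots used")
```

Every group fits in a single VLIW5 bundle; vec4+1 is the one that spills into a second bundle on a 4-wide layout. Compute code with irregular op counts per group does not line up this neatly.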
 
From what I can recall about those different architectures, VLIW5 was definitely best for gaming workloads, while VLIW4 was better suited to GPGPU without sacrificing much gaming speed.
GCN is a bit of a mystery. It was definitely designed with GPGPU in mind, but should be great for games too. We can see from nVidia products that a 1D design can at least work well for both games and GPGPU, but GCN isn't quite the same even if it is 1D.
 
With "only" 65% more transistors than the 6970, being around 50% faster still doesn't sound that great, especially considering Cayman didn't have the best perf/area ratio (still, it's not too bad). The speculated die size (similar to Cayman's) is indeed rather large, though. I might be missing some details, but in theory you could fit twice as many transistors in the same area on 28nm as on 40nm.
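Putting those numbers into ratios: the ~50% speedup and 65% transistor increase are the figures quoted above, and an area ratio of 1.0 stands in for "similar to Cayman". The 2x ideal density gain is the theoretical 40nm to 28nm figure from the post, so the results are only as good as those inputs.

```python
def perf_per_transistor(perf_gain: float, transistor_gain: float) -> float:
    """New perf/transistor relative to the old chip (1.0 = unchanged)."""
    return (1 + perf_gain) / (1 + transistor_gain)


def density_scaling_realized(transistor_gain: float, area_ratio: float,
                             ideal_density_gain: float = 2.0) -> float:
    """Share of the ideal node-to-node density gain actually used."""
    achieved_density_gain = (1 + transistor_gain) / area_ratio
    return achieved_density_gain / ideal_density_gain


if __name__ == "__main__":
    print(f"perf/transistor: {perf_per_transistor(0.50, 0.65):.2f}x")
    print(f"density scaling realized: {density_scaling_realized(0.65, 1.0):.1%}")
```

Under these inputs the new chip delivers roughly 9% less performance per transistor, while using a bit over 80% of the theoretical density gain.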
 

In theory it could be almost 2x the transistors, but they weren't going to get half the power consumption per transistor, which they would need to actually use them all within the same power budget.
Density would also have been impacted by the wider interface and memory controller, since those don't contribute as much to the transistor count but do consume area.
Power gating, if in use, also takes up more physical area than it contributes in transistors.
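The power side of that argument fits in one line: if total power scales with transistor count times per-transistor power (assuming similar activity per transistor), the transistor multiple you can afford is the power budget ratio divided by the per-transistor power ratio. The 0.6 figure below is purely hypothetical, chosen to show the shape of the trade-off.

```python
def affordable_transistor_multiple(power_per_transistor_ratio: float,
                                   power_budget_ratio: float = 1.0) -> float:
    """Transistor-count multiple that fits the budget, assuming
    total power ~ transistor count x per-transistor power."""
    if power_per_transistor_ratio <= 0:
        raise ValueError("power_per_transistor_ratio must be positive")
    return power_budget_ratio / power_per_transistor_ratio


if __name__ == "__main__":
    # Doubling transistors at the same power needs half the per-transistor power.
    print(affordable_transistor_multiple(0.5))
    # Hypothetical 0.6x per-transistor power: roughly 1.67x transistors.
    print(f"{affordable_transistor_multiple(0.6):.2f}")
```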
 

The bandwidth is 50% higher, so that's an upper bound right there.
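One way to make "upper bound" precise is a simple frame-time model where each frame takes the longer of its compute time and its memory time. Under that model a fully bandwidth-bound workload cannot speed up by more than the bandwidth ratio, while a compute-bound one is capped by the compute ratio instead. The model and the 2x compute figure in the example are illustrative assumptions, not measurements.

```python
def speedup(compute_time: float, memory_time: float,
            compute_scale: float, bandwidth_scale: float) -> float:
    """Speedup under a max(compute, memory) frame-time model.

    compute_scale and bandwidth_scale are new/old throughput ratios.
    For positive times the result lies between the smaller and the
    larger of the two scales.
    """
    old = max(compute_time, memory_time)
    new = max(compute_time / compute_scale, memory_time / bandwidth_scale)
    return old / new


if __name__ == "__main__":
    # Hypothetical: compute grows 2x, bandwidth 1.5x.
    print(speedup(4.0, 10.0, 2.0, 1.5))  # memory-bound frame
    print(speedup(10.0, 4.0, 2.0, 1.5))  # compute-bound frame
```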
 
rpg.314 said:
The bandwidth is 50% higher, so that's an upper bound right there.
Does anyone, by any chance, have a pointer to sites with benchmark shmoos for shader and memory clocks? Especially ones covering wide ranges, not just minor overclocks.

It'd be interesting to see how big of a factor external BW really is.
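If such shmoo data turns up, one simple way to summarise it is the log-log slope of performance against memory clock at a fixed core clock: near 1.0 means performance tracks bandwidth fully, near 0.0 means bandwidth barely matters. Applying the same function to a core-clock sweep at fixed memory clock gives the shader side. This is a sketch with a hypothetical helper name, and the demo feeds it synthetic points from an exact power law with exponent 0.4 rather than real benchmark results.

```python
import math


def log_log_slope(points: list[tuple[float, float]]) -> float:
    """Least-squares slope of ln(perf) versus ln(clock) for (clock, perf) pairs.

    This is the elasticity of performance with respect to that clock."""
    xs = [math.log(clock) for clock, _ in points]
    ys = [math.log(perf) for _, perf in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct clock values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic sweep: perf = 60 * (mem_clock / 1000) ** 0.4, not real data.
    sweep = [(mhz, 60 * (mhz / 1000) ** 0.4) for mhz in range(800, 1601, 200)]
    print(f"bandwidth sensitivity: {log_log_slope(sweep):.2f}")
```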
 