AMD: Southern Islands (7*** series) Speculation/ Rumour Thread

fellix · Dec 16, 2011

rpg.314 said:
32 ROPs are surprisingly low. For a 50% bw increase, if they can get ~40% more performance, then I'd say a job very well done.

I'm more disappointed by the lack of Z-rate improvements, given the steady bump in memory throughput.

rpg.314 · Dec 16, 2011

The ROPs on AMD do 4x more z/stencil ops, and this has been there for a long time now, presumably by using the 4 channels. So that much seems unlikely to change. Which leaves the Z rate a function of ROPs.
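To put a number on "Z rate is a function of ROPs", here is a rough sketch. The 4 z/stencil ops per ROP per clock and the 32 ROPs come from the posts above; the function names and the 1000 MHz clock are placeholders for illustration, not a leaked spec.

```python
# Sketch only: peak Z rate as a simple product of ROP count and clock.
Z_OPS_PER_ROP_PER_CLOCK = 4  # from the post: ROPs do 4x z/stencil ops


def z_samples_per_clock(rops, z_per_rop=Z_OPS_PER_ROP_PER_CLOCK):
    """Peak z/stencil samples the ROPs can process in one clock."""
    return rops * z_per_rop


def peak_z_rate_gsamples(rops, core_clock_mhz, z_per_rop=Z_OPS_PER_ROP_PER_CLOCK):
    """Peak z/stencil rate in Gsamples/s for a given core clock in MHz."""
    return z_samples_per_clock(rops, z_per_rop) * core_clock_mhz / 1000.0


# 32 ROPs as rumoured; the 1000 MHz clock is a hypothetical placeholder.
print(z_samples_per_clock(32))
print(peak_z_rate_gsamples(32, 1000))
```

With the ROP count unchanged, the only lever left in this model is core clock, which is why more memory bandwidth alone does not raise the Z rate.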

Kaotik · Dec 16, 2011

rpg.314 said:
7950 will have less mem channels then.

Can't be that. We already saw the board with 2x 6-pin connectors, which means 7950, and it had 12 memory chips.

anexanhume · Dec 16, 2011

Man from Atlantis said:
43% more transistors than GF110
65% more than Cayman

And with a supposedly better architecture in GCN, it's hard to swallow that it's only 20% better than the 580.

3dilettante · Dec 16, 2011

We should wait until we see the die size and TDP confirmed.

The target die size is lower and the stock power limits probably cap performance in a power-dominated situation.

DarthShader · Dec 16, 2011

Maybe those transistors went into power saving features and GPGPU features: the caches, ECC, etc.

3dilettante · Dec 16, 2011

At least some of the transistors went there. Some power-saving techniques such as clock or power gating can also consume more die area than transistors, since they involve physically larger gates and circuits.

I am curious if the < 3W idle power means there is power gating involved.

anexanhume · Dec 16, 2011

DarthShader said:
Maybe those transistors went into power saving features and GPGPU features: the caches, ECC, etc.

My impression was that AMD was decidedly less focused on the GPGPU side and that GCN was supposed to be more optimized for gaming workloads compared to their 4+1 scheme of yore.

3dilettante said:
I am curious if the < 3W idle power means there is power gating involved.

Wouldn't it HAVE to be? That many transistors means your leakage would exceed that without mitigation techniques like power gating in my mind.

edit: Quick calc. It's safe to assume at least 10% power loss due to leakage. On a 300W card, that's 30W. Even if you downclock to 1/10 of the frequency and leakage tracks with frequency, you're still looking at 3W, and you've not allowed for anything but leakage in that 3W. I just don't see how it's possible without power gating.
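The same back-of-envelope estimate as a short sketch. The 10% leakage share, the 300W board power and leakage scaling linearly with clock are the assumptions from the quick calc, not measured figures; the function name is made up for illustration.

```python
def idle_leakage_watts(board_power_w, leakage_fraction, clock_scale):
    """Leakage power left at idle, assuming it scales linearly with clock."""
    leakage_at_load = board_power_w * leakage_fraction
    return leakage_at_load * clock_scale


# Assumed inputs from the post: 300 W card, 10% leakage, clock cut to 1/10.
idle = idle_leakage_watts(300.0, 0.10, 0.10)
idle_budget_w = 3.0  # the rumoured "< 3W" idle figure

print(f"leakage alone at idle: {idle:.1f} W (budget: {idle_budget_w:.1f} W)")
```

Under these assumptions leakage by itself already uses up the whole idle budget, which is the argument for power gating.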

3dilettante · Dec 16, 2011

4+1 was a better match to game shaders, since it devoted the +1 to a lot of specialized functionality that either did not get used in scientific computing or lacked the precision to be used for serious computation.

anexanhume · Dec 16, 2011

3dilettante said:
4+1 was a better match to game shaders, since it devoted the +1 to a lot of specialized functionality that either did not get used in scientific computing or lacked the precision to be used for serious computation.

Interesting. I'd love to see two cards as closely matched as can be (ALU count, core and memory clocks etc.) run the same benchmarks and see how GCN fares.

3dilettante · Dec 16, 2011

A lot of other features of the design have changed outside of the ALU composition, so I would doubt GCN would lose unless the 4+1 design also inherited the new scheduling hardware and memory pipeline.

When comparing the 69xx to the 68xx series there were some games where the 4+1 (VLIW5) architecture's slightly higher peak performance helped it match or beat the VLIW4.
In terms of specialized functions, at least synthetic benchmarks showed that the penalty to FP throughput due to formerly T-slot instructions occupying the ALUs in VLIW4 was measurable.
The problem was worse for some instructions than others, as some like sin and cos required setup code that took up slots anyway.

fellix · Dec 16, 2011

The VLIW-5 construct was a compromise for both pixel and vertex workloads in a unified shader architecture. It matched the dominant co-issue types in graphics pretty well, like vec3+1, vec4 and vec4+1, in a single instruction bundle.
Abstract compute workloads have much more variable combinations.

Kaotik · Dec 16, 2011

From what I can recall about those different architectures, VLIW5 was definitely best for gaming workloads, while VLIW4 was better suited to GPGPU without sacrificing much gaming speed.
GCN is a bit of a mystery. It was definitely made with GPGPU in mind, but should be great for games too. We can see from nVidia products that 1D at least can work great with games and with GPGPU, but GCN isn't quite the same even if it is 1D.

Man from Atlantis · Dec 16, 2011

Radeon HD 7970 is around 30 percent faster

mczak · Dec 16, 2011

3dilettante said:
We should wait until we see the die size and TDP confirmed.

The target die size is lower and the stock power limits probably cap performance in a power-dominated situation.

With "only" 65% more transistors than 6970, something like ~50% faster still doesn't sound that great (especially considering Cayman didn't have the best perf/area ratio, still it's not too bad). The speculated die size (similar to Cayman) is indeed rather large though, I might miss some details but theoretically you could fit twice as many transistors on 28nm compared to 40nm within the same area.
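Rough numbers for the two ratios in this post, as a sketch. The ideal density gain assumes both dimensions shrink in proportion to the node name, which real processes do not guarantee; the 65% and ~50% figures are the rumoured ones quoted above, and the function names are made up for illustration.

```python
def ideal_density_gain(old_node_nm, new_node_nm):
    """Ideal transistor-density gain if both dimensions scale with the node name."""
    return (old_node_nm / new_node_nm) ** 2


def perf_per_transistor(perf_gain, transistor_gain):
    """Relative performance per transistor versus the older chip."""
    return perf_gain / transistor_gain


density = ideal_density_gain(40, 28)      # 40nm -> 28nm, ideal case
ratio = perf_per_transistor(1.50, 1.65)   # ~50% faster with 65% more transistors

print(f"ideal density gain: {density:.2f}x")
print(f"perf per transistor vs 6970: {ratio:.2f}x")
```

The ideal shrink works out to roughly twice the transistors in the same area, while the rumoured figures put performance per transistor at about 0.9x the 6970.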

Kaotik · Dec 16, 2011

Man from Atlantis said:
Radeon HD 7970 is around 30 percent faster

Wait what?
"around 30% faster than 6970, should put it somewhere around 6990"

Unlike OBR, Fudzilla says it's better in gaming though.

edit:
For clarification, the 6990 is around 60% faster than the 6970 even at a mere 1920x1200.
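A quick sketch of why those two claims don't line up. It normalises the 6970 to 1.0 and uses only the rumoured "30% faster" and the roughly 60% lead of the 6990; the function name is made up for illustration.

```python
def relative_to(card_vs_baseline, reference_vs_baseline):
    """Performance of one card as a fraction of a reference card,
    both given relative to the same baseline (here the 6970 = 1.0)."""
    return card_vs_baseline / reference_vs_baseline


hd7970_vs_6970 = 1.30  # rumoured: around 30% faster than the 6970
hd6990_vs_6970 = 1.60  # around 60% faster than the 6970 at 1920x1200

share = relative_to(hd7970_vs_6970, hd6990_vs_6970)
print(f"7970 would land at about {share:.0%} of a 6990")
```

On these figures the 7970 would sit well short of a 6990 rather than "somewhere around" it.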

3dilettante · Dec 16, 2011

mczak said:
With "only" 65% more transistors than 6970, something like ~50% faster still doesn't sound that great (especially considering Cayman didn't have the best perf/area ratio, still it's not too bad). The speculated die size (similar to Cayman) is indeed rather large though, I might miss some details but theoretically you could fit twice as many transistors on 28nm compared to 40nm within the same area.

In theory it could be almost 2x the transistors. They weren't going to get 1/2 the power consumption per transistor to be able to use them.
Density would also have been impacted by the wider interface and memory controller, since those don't contribute as much to the transistor count, but do consume area.
Power gating, if in use, also takes up more physical area than it contributes in transistors.

air_ii · Dec 16, 2011

mczak said:
With "only" 65% more transistors than 6970...

Unless AMD miscalculated the transistor count by some 1 billion (as with Bulldozer).

rpg.314 · Dec 16, 2011

mczak said:
With "only" 65% more transistors than 6970, something like ~50% faster still doesn't sound that great (especially considering Cayman didn't have the best perf/area ratio, still it's not too bad). The speculated die size (similar to Cayman) is indeed rather large though, I might miss some details but theoretically you could fit twice as many transistors on 28nm compared to 40nm within the same area.

The bandwidth is 50% higher, so that's an upper bound right there.
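A sketch of where a 50% bandwidth figure could come from. The 12 memory chips are from the board sighting mentioned earlier in the thread; the 32-bit interface per GDDR5 chip, the 8 chips on the 6970 and the unchanged 5.5 Gbps data rate are assumptions for illustration, as is the function name.

```python
BITS_PER_GDDR5_CHIP = 32  # assumed: each chip on its own 32-bit channel


def bandwidth_gb_s(num_chips, data_rate_gbps, bits_per_chip=BITS_PER_GDDR5_CHIP):
    """Peak memory bandwidth in GB/s from chip count and per-pin data rate."""
    bus_width_bits = num_chips * bits_per_chip
    return bus_width_bits / 8 * data_rate_gbps


# Assumed: same 5.5 Gbps data rate on both boards, 8 chips on the 6970.
old_bw = bandwidth_gb_s(8, 5.5)    # 256-bit bus
new_bw = bandwidth_gb_s(12, 5.5)   # 384-bit bus, 12 chips as seen on the board

print(f"{old_bw:.0f} GB/s -> {new_bw:.0f} GB/s ({new_bw / old_bw:.2f}x)")
```

If the memory clock stays the same, going from 8 to 12 chips gives the 1.5x by itself; in a purely bandwidth-bound case that would also be the ceiling on the speedup.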

silent_guy · Dec 16, 2011

rpg.314 said:
The bandwidth is 50% higher, so that's an upper bound right there.

Does anyone, by any chance, have a pointer to sites with benchmark shmoos for shader and memory clocks? Especially where wide ranges are used, not just minor overclocks.

It'd be interesting to see how big of a factor external BW really is.
