AMD: Southern Islands (7*** series) Speculation/Rumour Thread

Since when does Dell (or any OEM, for that matter) advertise the amount of memory bandwidth on the graphics card?
 
Anybody else think that Cayman's ALU:TEX ratio is a bit on the low side? The other thing is that Cayman's texturing capacity is obviously way over-specified. I can't see them doubling the number of texture units again, so the ALU:TEX ratio could/should increase in SI.

Option 1: 32 wide SIMDs
Requires a doubling of register file bandwidth and wavefronts would execute over two cycles instead of four. Doesn't seem impossible.

Option 2: Multiple SIMDs share a quad-TMU
Not sure how feasible this is with AMD's predetermined execution latencies for each clause.

Option 3: ????
 
Anybody else think that Cayman's ALU:TEX ratio is a bit on the low side?
I don't. AFAIK, it has 16 ALUs per texture sampler, that's:
32 flops per bilinear sample (same as GF110, 24:1 for GF114)
64 flops per trilinear sample (same as GF110, 48:1 for GF114)
up to 1024 flops per sample with 16xAF (same as GF110, 768:1 for GF114)
up to 2048 flops per sample with 64-bit textures (GF110 is 1024:1, GF114 is 768:1)
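
For anyone who wants to check the arithmetic behind those ratios, here is a rough sketch. The cost model is an assumption backed out of the figures above, not anything a vendor has confirmed: one multiply-add (2 flops) per ALU per clock, one bilinear sample per texture unit per clock, trilinear at 2 cycles, 16xAF at up to 32 cycles, and 64-bit textures at half rate on Cayman. The function and constant names are made up for illustration.

```python
# Assumed cost model (inferred from the numbers in the post, not vendor data):
#   - each ALU retires one multiply-add = 2 flops per clock
#   - a texture unit returns one bilinear sample per clock
#   - trilinear = 2 bilinear cycles, 16xAF = up to 16 trilinear samples
#   - 64-bit textures filter at half rate on Cayman
FLOPS_PER_ALU = 2

# Texture-unit cycles needed per filtered sample under the assumptions above.
BILINEAR = 1
TRILINEAR = 2
AF16 = 16 * TRILINEAR        # 32 cycles worst case
AF16_64BIT = 2 * AF16        # 64 cycles at half rate

CAYMAN_ALUS_PER_TMU = 16     # stated in the post


def flops_per_sample(alus_per_tmu, filter_cycles):
    """Peak ALU flops available while one texture unit produces one sample."""
    return alus_per_tmu * FLOPS_PER_ALU * filter_cycles


if __name__ == "__main__":
    for name, cycles in [("bilinear", BILINEAR), ("trilinear", TRILINEAR),
                         ("16xAF", AF16), ("16xAF, 64-bit", AF16_64BIT)]:
        print(f"{name:>14}: {flops_per_sample(CAYMAN_ALUS_PER_TMU, cycles)} flops/sample")
```

The GF114 figures fall out of the same function if you plug in 12 effective ALUs per texture unit, which is just the 24:1 bilinear number divided by 2 flops.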
 
Exactly, on paper it's the same as GF110, but in practice it's going to be lower due to lower ALU utilization. Unless texturing on Cayman is extremely inefficient, it just has too many units. The 6970 has twice the texturing capacity of a 570 and is only on par performance-wise. If they keep this ratio, then even more transistors will be wasted on texturing in SI. Note that the 570 also has lower numbers for bandwidth, fillrate and flops.
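
A quick back-of-the-envelope check of the "twice the texturing capacity" claim. The unit counts and core clocks below are not from this thread; they are the commonly published reference specs as I remember them (HD 6970: 96 texture units at 880 MHz, GTX 570: 60 texture units at 732 MHz), so treat them as assumptions. The helper name is made up.

```python
def texel_rate_gtexels(texture_units, core_mhz):
    """Peak bilinear texel rate in GTexels/s: one texel per unit per core clock."""
    return texture_units * core_mhz / 1000.0


# Assumed reference specs (from public spec sheets, not from this thread).
hd6970 = texel_rate_gtexels(96, 880)
gtx570 = texel_rate_gtexels(60, 732)

if __name__ == "__main__":
    print(f"HD 6970: {hd6970:.2f} GT/s")
    print(f"GTX 570: {gtx570:.2f} GT/s")
    print(f"ratio:   {hd6970 / gtx570:.2f}x")
```

With those inputs the ratio lands a little under 2x, which is close enough to "twice" for this argument.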

I wouldn't be surprised if full speed FP16 filtering makes its debut as well so that 64-bit ratio can potentially come down too.
 
Wow nice! It will be a lot of fun to see how they tackle those problems and see whether they can do it more efficiently than nVidia has managed to.

Wonder how that tidbit got out. Did some good wine loosen tongues at the dinner? :)
 
I personally find that very odd, so soon after moving from VLIW5 to VLIW4, which must have been quite time-consuming.
 
My inner voice tells me it is the 8000 series, not the 7000s.

Or could it just be that the 32nm cancellation, which delayed the 6000 series (IIRC a delay of at least 6 months was mentioned somewhere), made them scrap the "original 7000" and start rushing the "original 8000" series out as the "7000s"?
 
Wonder if they'll stick with the precompiled clause approach and avoid the scheduler and scoreboarding overhead. Could be the best of both worlds.

This makes complete sense to me. A 64-wide SIMD could run the same scalar instruction on 64 threads/pixels/vertices with the same bandwidth requirements as today's 16-wide VLIW4 and gain higher efficiency in the process. The challenge is branch granularity: they would need to process a wavefront in a single cycle instead of 4.

Or maybe each SIMD is only 16 wide with 4 of them executing 4 different wavefronts in parallel. Very similar to an nVidia SM but with potentially much lower control overhead if they don't do hardware instruction scheduling.
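
To put numbers on the width options being tossed around, here is a small sketch. It assumes the usual 64-thread wavefront and one instruction issue per lane per clock; the function names are made up, and this is only the arithmetic, not a claim about how SI is actually built.

```python
WAVEFRONT_SIZE = 64  # threads per wavefront (assumed, matches 16 lanes x 4 cycles today)


def cycles_per_wavefront(simd_width, wavefront_size=WAVEFRONT_SIZE):
    """Clocks needed to push one wavefront through a SIMD (ceiling division)."""
    return -(-wavefront_size // simd_width)


def alu_ops_per_clock(simd_width, slots_per_lane):
    """ALU operations issued per clock: lanes times VLIW slots per lane."""
    return simd_width * slots_per_lane


if __name__ == "__main__":
    for width in (16, 32, 64):
        print(f"{width}-wide SIMD: {cycles_per_wavefront(width)} cycle(s) per wavefront")
    # Today's 16-wide VLIW4 and a 64-wide scalar SIMD issue the same op count per clock.
    print(alu_ops_per_clock(16, 4), alu_ops_per_clock(64, 1))
```

So a 16-wide scalar SIMD keeps the current 4-cycle cadence, 32-wide halves it, and 64-wide gets it down to a single cycle, while 16-wide VLIW4 and 64-wide scalar both come out at 64 ALU ops per clock.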
 
Ah yes just saw that. So no more clauses. They seem to be embracing a lot of things nVidia has been preaching for years. Guess it'll come down to who has the best implementation.
 