AMD: Southern Islands (7*** series) Speculation/ Rumour Thread

Sinistar · Jun 11, 2011

Still listed here

Squilliam · Jun 12, 2011

Rename?

Alexko · Jun 12, 2011

Since when does Dell (or any OEM, for that matter) advertise the amount of memory bandwidth on the graphics card?

rpg.314 · Jun 12, 2011

Alexko said:
Since when does Dell (or any OEM, for that matter) advertise the amount of memory bandwidth on the graphics card?

They could at least tell us what GPU is in there.

Would you like to buy an Intel 2GB machine?

trinibwoy · Jun 13, 2011

Alexko said:
Since when does Dell (or any OEM, for that matter) advertise the amount of memory bandwidth on the graphics card?

Yeah that's really weird. Completely inconsistent with their normal GPU descriptions.

trinibwoy · Jun 13, 2011

Anybody else think that Cayman's ALU:TEX ratio is a bit on the low side? The other thing is that Cayman's texturing capacity is obviously way over-specified. I can't see them further doubling the number of texture units so the ALU:TEX could/should increase in SI.

Option 1: 32 wide SIMDs
Requires a doubling of register file bandwidth and wavefronts would execute over two cycles instead of four. Doesn't seem impossible.

Option 2: Multiple SIMDs share a quad-TMU
Not sure how feasible this is with AMD's predetermined execution latencies for each clause.

Option 3: ????

DarthShader · Jun 13, 2011

Option 2 was already speculated about a year ago by Jawed; he found some patents too.

Thowllly · Jun 13, 2011

trinibwoy said:
Anybody else think that Cayman's ALU:TEX ratio is a bit on the low side?

I don't. AFAIK, it has 16 ALUs per texture sampler, that's
32 flops per bilinear sample (same as GF110, 24:1 for GF114)
64 flops per trilinear sample (same as GF110, 48:1 for GF114)
up to 1024 flops per sample with 16xAF (same as GF110, 768:1 for GF114)
up to 2048 flops per sample with 64bit textures. (GF110 is 1024:1, GF114 is 768:1)
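
The arithmetic behind these ratios can be sketched roughly as follows. The multipliers are assumptions chosen to reproduce the figures above, not vendor specifications: 2 flops (one multiply-add) per ALU per clock, trilinear costing two bilinear cycles, 16xAF costing up to 16 trilinear probes, and 64-bit textures filtering at half rate on Cayman but at full rate on GF110 and GF114. The function name `flops_per_sample` is made up for illustration.

```python
def flops_per_sample(flops_per_bilinear, fp16_half_rate):
    """Return ALU flops available per texture sample for each filtering mode."""
    bilinear = flops_per_bilinear
    trilinear = 2 * bilinear      # assumed: trilinear takes two bilinear cycles
    af16 = 16 * trilinear         # assumed: 16xAF takes up to 16 trilinear probes
    # assumed: half-rate 64-bit filtering doubles the ALU time per sample
    fp16 = af16 * (2 if fp16_half_rate else 1)
    return {"bilinear": bilinear, "trilinear": trilinear,
            "16xAF": af16, "64-bit": fp16}

# Cayman: 16 ALUs per sampler, 2 flops per ALU per clock (assumed multiply-add)
cayman = flops_per_sample(16 * 2, fp16_half_rate=True)
# GF110 and GF114 bilinear ratios are taken directly from the post above
gf110 = flops_per_sample(32, fp16_half_rate=False)
gf114 = flops_per_sample(24, fp16_half_rate=False)

for name, ratios in (("Cayman", cayman), ("GF110", gf110), ("GF114", gf114)):
    print(name, ratios)
```

Under these assumptions the three chips only diverge where the bilinear ratio or the 64-bit filtering rate differs, which is the point of the comparison.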

trinibwoy · Jun 13, 2011

Thowllly said:
I don't. AFAIK, it has 16 ALUs per texture sampler, that's
32 flops per bilinear sample (same as GF110, 24:1 for GF114)
64 flops per trilinear sample (same as GF110, 48:1 for GF114)
up to 1024 flops per sample with 16xAF (same as GF110, 768:1 for GF114)
up to 2048 flops per sample with 64bit textures. (GF110 is 1024:1, GF114 is 768:1)

Exactly, on paper it's the same as GF110 but in practice it's going to be lower due to lower ALU utilization. Unless texturing on Cayman is extremely inefficient, it just has too many units. The 6970 has twice the texturing capacity of a 570 and is only on par performance-wise. If they keep this ratio then even more transistors will be wasted on texturing in SI. Note that the 570 also has lower numbers for bandwidth, fillrate and flops.

I wouldn't be surprised if full speed FP16 filtering makes its debut as well so that 64-bit ratio can potentially come down too.

Man from Atlantis · Jun 15, 2011

DKrwt David Kanter
Next AMD GPU is much more programmable. no VLIW, real L1 and L2 caches, better branching, etc.

Just saw this on Twitter.

trinibwoy · Jun 15, 2011

Wow nice! It will be a lot of fun to see how they tackle those problems and see whether they can do it more efficiently than nVidia has managed to.

Wonder how that tidbit got out. Did some good wine loosen tongues at the dinner?

Alexko · Jun 15, 2011

I personally find that very odd, so soon after moving from VLIW5 to VLIW4, which must have been quite time-consuming.

psurge · Jun 15, 2011

Somewhat more detail here: http://www.realworldtech.com/forums/index.cfm?action=detail&id=120411&threadid=120411&roomid=2

What exactly is out-of-order resource allocation?

Man from Atlantis · Jun 15, 2011

my inner voice tells me it is 8000 series not 7000s..

Kaotik · Jun 15, 2011

Man from Atlantis said:
my inner voice tells me it is 8000 series not 7000s..

Or could it just be that the 32nm cancellation, which caused the 6000-series delay (IIRC 6 months was mentioned somewhere), made them scrap the "original 7000" and start rushing the "original 8000" series as the "7000s"?

wishiknew · Jun 15, 2011

http://www.pcper.com/news/Editorial/AMD-Fusion-Developer-Summit-2011-Live-Blog is that it? From 6:50 and on?

trinibwoy · Jun 15, 2011

Wonder if they'll stick with the precompiled clause approach and avoid the scheduler and scoreboarding overhead. Could be the best of both worlds.

This makes complete sense to me. A 64-wide SIMD could run the same scalar instruction on 64 threads/pixels/vertices with the same bandwidth requirements as today's 16-wide VLIW4 and gain higher efficiency in the process. The challenge is branch granularity: they would need to process a wavefront in a single cycle instead of 4.

Or maybe each SIMD is only 16 wide with 4 of them executing 4 different wavefronts in parallel. Very similar to an nVidia SM but with potentially much lower control overhead if they don't do hardware instruction scheduling.
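
A quick sketch of the numbers behind those two layouts, assuming a 64-item wavefront and one operation per lane per clock (four per lane for VLIW4). The helper names are invented for illustration and nothing here reflects a confirmed hardware design.

```python
WAVEFRONT_SIZE = 64  # work-items per wavefront, as assumed in the discussion

def cycles_per_wavefront(simd_width, wavefront_size=WAVEFRONT_SIZE):
    """Clocks needed to push one wavefront through a SIMD of the given width."""
    return -(-wavefront_size // simd_width)  # ceiling division

def ops_per_clock(simd_width, ops_per_lane, simd_count=1):
    """Peak ALU operations issued per clock for a group of identical SIMDs."""
    return simd_width * ops_per_lane * simd_count

# (SIMD width, ops per lane per clock, number of SIMDs)
configs = {
    "16-wide VLIW4": (16, 4, 1),
    "32-wide VLIW4": (32, 4, 1),
    "64-wide scalar": (64, 1, 1),
    "4 x 16-wide scalar": (16, 1, 4),
}

for name, (width, ops, count) in configs.items():
    print(f"{name}: {cycles_per_wavefront(width)} cycles per wavefront, "
          f"{ops_per_clock(width, ops, count)} ops per clock")
```

The 16-wide VLIW4, 64-wide scalar and 4 x 16-wide scalar cases all come out at 64 ops per clock, which is why the operand bandwidth can stay the same. What changes is how many clocks a single wavefront occupies a SIMD.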

wishiknew · Jun 15, 2011

Still kinda amazed no one posted those slides from pcper.

trinibwoy · Jun 15, 2011

Ah yes just saw that. So no more clauses. They seem to be embracing a lot of things nVidia has been preaching for years. Guess it'll come down to who has the best implementation.

fellix · Jun 15, 2011

No VLIW, just multiple-issue SIMD
Branch, scalar, vector, vector memory, export units
4x 16-wide vector ALUs

So, AMD is finally getting rid of the static scheduling?
