AMD: Southern Islands (7*** series) Speculation/Rumour Thread

That is what I have found: :mrgreen:


Radeon HD 7990 Specifications
  • Dual graphics processing units
  • Compute Power: 12.16 TFLOPS (single-precision), 3.04 TFLOPS (double-precision)
  • Process Technology: First-generation "gate-first" 28 nm High Performance Process
  • Stream Processors: 6400
  • Texture Units: 256
  • Texture Fillrate: 243.2 GTexel/s
  • Color ROP Units: 96
  • Pixel Fillrate: 91.2 GPixel/s
  • Z/Stencil ROP Units: 384
  • Memory Bus Width: 384-bit
  • Memory Type: 6 GB GDDR5+
  • Memory Bandwidth: 576 GB/s
  • Bus Interface: PCI Express 3.0
  • Board power (Idle/Load): 30W/300W
  • Form Factor: Dual Slot
  • Cooling Solution: Fansinks with Vapor Chamber
  • Graphics API:
    • DirectX 11
    • OpenGL 4.2
    • OpenGL ES 2.0
  • Accelerated Parallel Processing:
    • DirectCompute 11
    • OpenCL 1.1
  • AMD EyeSpeed: Yes
  • AMD Eyefinity: Yes
  • AMD HD3D: Yes
  • Unified Video Decoder 4.0:
    • Multi-View Codec (MVC)
    • HEVC
    • MPEG-4 AVC/H.264
    • VC-1, VC-2
    • MPEG-2
    • DivX/XviD
    • WebM
    • Flash
  • AVIVO HD: Yes
  • Display Outputs:
    • Dual-link DVI
    • Mini DisplayPort
    • HDMI 1.4
  • AMD CrossFireX: Quad GPU Scaling
  • AMD PowerPlay: Yes
  • Ultra Low Power State: Yes


http://www.semiaccurate.com/forums/showthread.php?p=84917#post84917

I'd like to see your comments about it, and whether there is any information about the expected 28 nm pipe cleaner.


I have also found this:


High K Metal Gate (HKMG) Solutions for 28nm Technologies Introduction ==> http://www.youtube.com/watch?v=tQlZnhYrz5w

High K Metal Gate (HKMG) Performance, Cost, Die Size and Design Compatibility ==> http://www.youtube.com/watch?v=Syl2pEEEu6c

Ability to Ramp & Time-to-Volume and Manufacturability & Reliability ==> http://www.youtube.com/watch?v=49TVR_ktgjE

But no news about a 28nm GPU! :/ After Cayman, we want news about 28nm GPUs!!! :D


From the same thread:

http://www.semiaccurate.com/forums/showthread.php?t=3825&page=10
 
That is what I have found:

Aren't those just made up numbers? I don't think it even counts as speculation. The rates and an implied 950 MHz clock are consistent with the unit counts, but the unit counts are not consistent with each other within AMD's current batch and SIMD constraints.

Maybe if we're lucky, Fudzilla or its ilk will print it as a rumor next year.
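The point about the rates being consistent with the unit counts is easy to reproduce: multiply each count by a clock. A minimal sketch, assuming the 950 MHz clock implied above, 2 FLOPs per SP per clock (one multiply-add, the way AMD counted FLOPS for its VLIW parts) and a 1:4 DP:SP ratio; none of these are stated in the leak itself.

```python
# Sketch: reproduce the rumoured HD 7990 rates from the unit counts.
# Assumptions: a 950 MHz core clock (implied, not stated in the leak),
# 2 FLOPs per SP per clock (one multiply-add), and DP at 1/4 of SP.
CLOCK_GHZ = 0.95

specs = {
    "stream_processors": 6400,
    "texture_units": 256,
    "color_rops": 96,
}

def derived_rates(units, clock_ghz):
    """Return the rates implied by the unit counts at the given clock."""
    sp_tflops = units["stream_processors"] * 2 * clock_ghz / 1000
    return {
        "sp_tflops": sp_tflops,
        "dp_tflops": sp_tflops / 4,
        "gtexel_s": units["texture_units"] * clock_ghz,
        "gpixel_s": units["color_rops"] * clock_ghz,
    }

rates = derived_rates(specs, CLOCK_GHZ)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

Every derived figure matches the spec sheet, which is exactly why matching rates prove nothing: anyone can multiply a made-up unit count by a made-up clock.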
 
Ahem, is that supposed to be an X2 card???
The specs don't make sense, not even in theory. For a single card, you can just about forget the SP count imho, and for the dual card the 384-bit interface doesn't make sense (though the quoted memory bandwidth would probably indicate 2x384-bit).
I think it would make more sense speculating about the single chip solutions first.
For these I take 2 things for granted:
- PCIe 3.0 (according to AnandTech, this was already planned for Cayman, and in any case graphics cards have always adopted new PCIe specs quickly)
- VLIW-4 units. AMD wouldn't have gone from VLIW-5 to VLIW-4 just for Cayman if they didn't intend to keep the units for a little while longer. And hence the one-fourth DP to SP ratio should be true too (for MUL/FMA at least).
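The 2x384-bit remark about memory bandwidth can be checked with the usual relation: bandwidth in GB/s equals the bus width in bytes times the per-pin data rate in Gbps. A small sketch computing the per-pin rate the quoted 576 GB/s would need on one 384-bit bus versus two:

```python
# Sketch: per-pin data rate needed to reach a given memory bandwidth.
# bandwidth (GB/s) = bus_width_bits / 8 * data_rate (Gbps per pin)
def required_data_rate_gbps(bandwidth_gb_s, bus_width_bits):
    return bandwidth_gb_s / (bus_width_bits / 8)

QUOTED_BANDWIDTH = 576  # GB/s, from the leaked spec sheet

single_bus = required_data_rate_gbps(QUOTED_BANDWIDTH, 384)
dual_bus = required_data_rate_gbps(QUOTED_BANDWIDTH, 2 * 384)
print(f"one 384-bit bus:   {single_bus:.1f} Gbps per pin")
print(f"two 384-bit buses: {dual_bus:.1f} Gbps per pin")
```

A single 384-bit bus would need 12 Gbps per pin, far beyond what GDDR5 was designed for, while two 384-bit buses need 6 Gbps, which is plausible for fast GDDR5.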

If AMD is aiming for a small chip, my guess would be about this:
- 2 graphics engines (AMD notes these scale separately from the shader engines, so this should be possible, but maybe 4 is a better number to at least catch up with GF110...)
- 4 shader engines (dispatch processors) with 8-10 simds each - that would be 2048-2560 SPs
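The 2048-2560 SP range follows from Cayman's VLIW4 layout, where each SIMD holds 16 VLIW units of 4 lanes (64 SPs). A sketch of that arithmetic, with the engine and SIMD counts taken from the guess above and the 64 SPs per SIMD assumed to carry over unchanged:

```python
# Sketch: SP count for a hypothetical VLIW4 part.
# Assumption: 16 VLIW4 units per SIMD, 4 lanes each (64 SPs per SIMD), as on Cayman.
LANES_PER_VLIW = 4
VLIW_UNITS_PER_SIMD = 16
SPS_PER_SIMD = LANES_PER_VLIW * VLIW_UNITS_PER_SIMD  # 64

def stream_processors(shader_engines, simds_per_engine):
    return shader_engines * simds_per_engine * SPS_PER_SIMD

low = stream_processors(4, 8)
high = stream_processors(4, 10)
print(f"{low}-{high} SPs")  # 2048-2560 SPs
```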

On the ROP/memory interface, no idea. I think there's a problem scaling memory bandwidth much further with GDDR5 in that timeframe. OTOH a 384-bit memory interface really has no place on a small (< 300mm²) chip. Also, 32 ROPs offer "enough" color fill rate; if anything, maybe they could be beefed up to handle twice the z/stencil rate. So I'd guess AMD will stick to 256-bit (with the fastest GDDR5 memory they can get) and 32 ROPs, naturally with 2GB of memory.

Other than that, I'd expect it to scale better to higher simd counts (otherwise that increase would be useless), hence front-end or other bottlenecks (cache hierarchy, internal bandwidth, or whatever these are on current gen) to be addressed.
It could also have reworked SIMDs. I'd expect AMD to stick to VLIW-4, but the SIMDs could be grouped differently, so that 2 SIMDs share a TMU (as seen in the patents Jawed quoted). (Though if that's the case, I would expect the TMUs to have full-rate FP16 filtering. Also, for 32 SIMDs this would only give 64 TMUs, which might not be quite enough; it might make more sense with 40 SIMDs.)
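The TMU figures in that parenthetical follow if each TMU block has 4 texture units (as on Cayman, where every SIMD has its own block of 4) and two SIMDs share one block. A sketch under that assumption, comparing dedicated and shared blocks:

```python
# Sketch: texture unit count when SIMDs share TMU blocks.
# Assumption: 4 texture units per TMU block, as on Cayman.
TEX_UNITS_PER_BLOCK = 4

def texture_units(simds, simds_per_block):
    if simds % simds_per_block:
        raise ValueError("SIMD count must divide evenly into TMU blocks")
    return simds // simds_per_block * TEX_UNITS_PER_BLOCK

for simds in (32, 40):
    print(f"{simds} SIMDs: {texture_units(simds, 1)} TMUs dedicated, "
          f"{texture_units(simds, 2)} TMUs shared")
```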

This is of course pure speculation, which might be pretty far off. So with all the rumblings about 28nm delays, when exactly do we expect HD7xxx? Do we even expect it on 28nm at all?
 
My wish-list for the S.I. architecture:

* Keep the "sweet spot" strategy on 28 nm tech, i.e. a die size equal to or smaller than Cypress';
* 30~36 SIMD multiprocessors on VLIW4 format;
* Three SIMD blocks (10~12 MPs each, see above), each one with a dedicated front-end (geometry assembly, tessellator, HiZ, etc.), similar to Cayman;
* Cached global memory via L2 with coherent reads&writes (finally!);
* Double the Z/Stencil throughput;
* ...can I ask for XDR memory interface (d'oh!) :p
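For scale, the wish-list layout works out as follows. This is a sketch assuming Cayman-like SIMDs with 64 SPs and 4 texture units each; the three-block split is the wish-list's own hypothetical.

```python
# Sketch: totals for the wish-list layout of three SIMD blocks.
# Assumptions (Cayman-like, not from the post): 64 SPs and 4 texture units
# per VLIW4 SIMD.
SPS_PER_SIMD = 64
TEX_PER_SIMD = 4
BLOCKS = 3

def totals(simds_per_block, blocks=BLOCKS):
    """Return (SIMDs, SPs, texture units) for the given block layout."""
    simds = blocks * simds_per_block
    return simds, simds * SPS_PER_SIMD, simds * TEX_PER_SIMD

for per_block in (10, 12):
    simds, sps, tex = totals(per_block)
    print(f"{simds} SIMDs: {sps} SPs, {tex} texture units")
```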

But to be honest, this is as far as evolution could drive the R600 architectural legacy. It's still too graphics-centric, IMHO.
Parallel geometry processing and GPGPU are still not as "organic" a part of the architecture as in Fermi's approach. For this, I think, AMD probably must take a fresh direction in the future, not just pile and patch over and over.
 
Hey, how about Fusion? :)
1-2 CPU cores with a doubled Cayman. How big would it be?
Some driver workload could be offloaded to GPU.
 
That is what I have found: :mrgreen:


Radeon HD 7990 Specifications
  • Dual graphics processing units
  • Stream Processors: 6400
  • Texture Units: 256
Trivially fake: the SP and TMU counts don't match.

There should always be one TMU per 16 SPs on that architecture.

(One processor core has 16*4 = 64 SPs and 4 TMUs.)
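That rule of thumb turns into a quick sanity check: the texture unit count should be the SP count divided by 16. A sketch, treating the leaked figures either as board totals or as per-GPU halves of the dual-GPU card, with the HD 6970 (1536 SPs, 96 TMUs) included as a known-good control:

```python
# Sketch: check a spec sheet against the 1 TMU per 16 SPs ratio.
# Each VLIW4 SIMD core has 16 * 4 = 64 SPs and 4 TMUs, hence 16 SPs per TMU.
SPS_PER_TMU = 16

def expected_tmus(stream_processors):
    return stream_processors // SPS_PER_TMU

def is_consistent(stream_processors, texture_units):
    return (stream_processors % SPS_PER_TMU == 0
            and expected_tmus(stream_processors) == texture_units)

checks = {
    "HD 6970 (Cayman), control": (1536, 96),
    "rumoured HD 7990, whole board": (6400, 256),
    "rumoured HD 7990, per GPU": (3200, 128),
}
for label, (sps, tmus) in checks.items():
    verdict = "OK" if is_consistent(sps, tmus) else "mismatch"
    print(f"{label}: expects {expected_tmus(sps)} TMUs, lists {tmus} -> {verdict}")
```

Both readings of the leak fail: 6400 SPs would need 400 TMUs, and 3200 SPs per GPU would need 200.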
 
IMO a 28nm Barts would be the best bet for a 28nm pipe cleaner.
Should stay around 150mm² with a 128-bit bus. What's the over/under on 2H '11?

I wouldn't mind seeing a 28nm Barts as the 76x0 line and a 28nm Cayman at the $100+ price point.
 
IMO a 28nm Barts would be the best bet for a 28nm pipe cleaner.
Should stay around 150mm² with a 128-bit bus. What's the over/under on 2H '11?

Yeah, I would agree with that. Barts with a 128-bit memory controller (and maybe an increase to 1280 shaders) would be a good choice for a pipe cleaner. IMO we can even expect it in Q2 2011, as it is meant to be an early part, followed by the rest of the parts in Q3/Q4.

I wouldn't mind seeing a 28nm Barts as the 76x0 line and a 28nm Cayman at the $100+ price point.

Assuming 60% scaling, a 28 nm Cayman will be ~230 mm². They may be pad-limited, so they may increase the die size a bit like they did with RV770. It would probably be priced at around Cypress' launch prices, as 28nm wafers won't be cheap initially.
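The ~230 mm² figure is simply Cayman's area scaled by 0.6. A naive sketch, assuming the commonly cited 40 nm die sizes (Cayman 389 mm², Barts 255 mm²) and the same flat 60% factor for both; real shrinks rarely scale this cleanly, and a pad-limited design can end up larger:

```python
# Sketch: naive die-size estimate for a 40 nm -> 28 nm shrink.
# Assumptions: commonly cited 40 nm die sizes and a flat 60% area scaling.
DIE_40NM_MM2 = {"Cayman": 389, "Barts": 255}
AREA_SCALING = 0.60

def shrunk_area(area_mm2, scaling=AREA_SCALING):
    return area_mm2 * scaling

for chip, area in DIE_40NM_MM2.items():
    print(f"{chip}: {area} mm² -> ~{shrunk_area(area):.0f} mm²")
```

The Barts result lands close to the ~150 mm² pipe-cleaner guess earlier in the thread.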

The limiting factor for the next gen will probably be GDDR5 speeds, I think. According to AnandTech, we aren't going to see speeds greater than 6 Gbps even though the standard was designed to go to 7 Gbps.
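To put that GDDR5 ceiling in numbers: peak bandwidth is the bus width in bytes times the per-pin data rate. A sketch over the bus widths discussed in this thread, at the 6 and 7 Gbps speeds from the AnandTech remark:

```python
# Sketch: peak memory bandwidth for GDDR5 at a given per-pin data rate.
# bandwidth (GB/s) = bus_width_bits / 8 * data_rate_gbps
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    return bus_width_bits / 8 * data_rate_gbps

for bus in (128, 256, 384):
    row = ", ".join(f"{rate} Gbps: {bandwidth_gb_s(bus, rate):.0f} GB/s"
                    for rate in (6, 7))
    print(f"{bus}-bit -> {row}")
```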
 
What about the GDDR5+ that Charlie mentioned? What is it? When should we see anything faster, like GDDR6, or XDR, or whatever?
And also, although wafer prices may be high initially, they are only one part of the equation for the final street price. So I don't think that can serve as an excuse for the current overpriced products or future ones.
 
So, any word on whether they are likely to go with GlobalFoundries 28nm (I'm pretty sure they have a contract that requires them to source X quantity of their GPUs from that foundry), or whether they will likely stick with TSMC 28nm? It'll be interesting to find out which 28nm process is better and whether either of them can reach any sort of volume on that process in 2011.

P.S. Of course my fantasy is that we get a 28nm pipecleaner part in Q2 of next year. :)
 
I believe the part in bold is correct. Whether AMD decides to use GloFo's process for low-end or high-end parts is the big question…
 
Does the foundry where they intend to produce Fusion products count at all in this decision? Would you call Llano a mid-range or low-end product? If so, I would expect them to keep their CPU + GPU Fusion products and low-end GPUs at the same foundry. Also, TSMC has more experience producing relatively large-die GPUs on cutting-edge process nodes.
 
I don't think Fusion products count, at least that's not the impression I got from the way AMD (Dirk Meyer, if I recall correctly) presented things, but I haven't read the contract, which is probably confidential.

I guess TSMC does have more experience making big chips… then again, is 40nm experience really all that relevant to 28nm challenges? Plus, GloFo has been making Istanbuls (~350mm², I believe) with far higher yields than TSMC has been making GF100s, so… And GloFo is currently making 32nm HK/MG Llanos and Bulldozers for AMD, while TSMC has yet to demonstrate its ability to make anything below 40nm or with HK/MG.

Actually, I'd expect AMD to just give the high-end to the foundry with the best process, and the rest to the foundry with the cheapest one, unless of course there's a big performance or time-to-market difference.
 
I believe the part in bold is correct. Whether AMD decides to use GloFo's process for low-end or high-end parts is the big question…

GloFo has zero experience with GPU parts, and there's no proof that they can achieve the density that TSMC is getting.

So no, it's not really a big question. GloFo will be contracted with mid-low end parts or direct die shrink of current parts (a la 4770), probably both.

If they really need a pipe cleaner, I'd say a 12SIMD+/128-bit "Barts" (~150mm²) and/or a 20SIMD+/256-bit "Cayman" (~200mm²) sounds about right.
They are both large enough to be real pipe cleaners yet not too risky. They are both sensitive to cost and power, but the reduced R&D should be more than enough to make up for the risk involved.

I'd say that since Fusion/SB would already be on shelves, a sub-100mm² part (8 SIMDs) would have to be very competitive to survive; better to leave that to the king of cost-down. TSMC will also be contracted for a high-end part (>300mm²), and possibly another lower part (150~200mm²) if GloFo only gets one part.
 