AMD: Speculation, Rumors, and Discussion (Archive)

High-end 6-8 core CPUs already run at around 130W (as does my 4930K); add even a modest 150W GPU and you are already at 250-300W for an APU.

My CPU, overclocked, easily hits 250W (my board's BIOS defaults to a 1000W power limit for the CPU power delivery, and the next setting up is 10,000W, which is useless marketing, of course).
 
Platform power management has to be taken into account, though. Say the CPU has a different TDP depending on whether the GPU is idle or not.

Anyway, computing the SoC TDP from AMD's figures (28W for four HBM stacks), the SoC TDP of Fiji, allegedly with 64 CUs, would be around 140 watts. Assuming they fit a GPU with Fiji's spec, there is still plenty of room for the rest of the SoC. Fuad's rumour suggests a 16-core CPU.
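As a back-of-the-envelope check (a sketch only: the 300W ceiling is taken from the 250-300W estimate above, and the other inputs are the rumored/quoted figures, not confirmed specs):

```python
# Rough APU power-budget arithmetic from the figures in this thread.
package_tdp = 300   # assumed package ceiling, from the 250-300W estimate, W
hbm_power   = 28    # AMD's figure for four HBM stacks, W
gpu_tdp     = 140   # alleged 64-CU Fiji-class SoC estimate, W

cpu_budget = package_tdp - gpu_tdp - hbm_power
print(f"Left for CPU cores and uncore: {cpu_budget} W")  # 132 W
# That comfortably covers a 130W-class CPU, so a 16-core APU at this
# power level is at least arithmetically plausible.
```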
 
Yeah, I had forgotten that we were talking about HPC. For a server SoC, I can imagine they could be socketed into a blade (like current systems with 2-4 CPUs + 2-4 GPUs), with cooling handled by cold air/water.

It could be really powerful for HPC and workstation systems. Imagine two 16-core APUs, each carrying two GPUs on package, plus 2-4 discrete GPUs.

Let's call that a really versatile system.
 
Since they are introducing a new SoC interconnect, and given how long their die-stacking program has been running, I would guess they have been aware of the extensibility.
It would depend on whether AMD could treat the off-die links and the physical/electrical discontinuity introduced by the microbumps more like neighboring blocks on the same die, or more like chips on an MCM.
The power/bit cost of data movement on-die is still better than going off-die over an interposer. HBM is not 10x or more efficient than the DRAM it replaces, so the prospect of taking traffic that was once faster and more efficient and making it only as efficient as HBM may constrain the unspecified performance upside of doing this.
I'm not sure if going this route implies a step back from the long-predicted convergence of the architectures, as their differences would be enshrined by distance once again.
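To put rough numbers on that power/bit argument (a sketch only: the pJ/bit values are illustrative assumptions in the range commonly cited for on-die wires versus HBM-class interposer links, not AMD data):

```python
# Illustrative energy cost of moving data on-die vs. across an interposer.
PJ_TO_W_PER_BITPS = 1e-12          # 1 pJ/bit at 1 bit/s = 1e-12 W

on_die_pj_per_bit     = 0.5        # assumed: short on-die interconnect
interposer_pj_per_bit = 3.5        # assumed: HBM-class microbump link

traffic_bits_per_s = 512 * 8e9     # 512 GB/s of traffic, in bit/s

for name, pj in (("on-die", on_die_pj_per_bit),
                 ("interposer", interposer_pj_per_bit)):
    watts = traffic_bits_per_s * pj * PJ_TO_W_PER_BITPS
    print(f"{name:10s}: {watts:.1f} W for 512 GB/s")
# ~2 W on-die vs. ~14 W over the interposer with these assumptions: the
# penalty worried about above when formerly on-die traffic goes off-die.
```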

For GPUs, if the GPU can make use of HBM, the cost is always there and a "normal" non-interposer version is off the table. For CPUs, that's my question too: would they still give the single-die part an interposer, make a separate non-interposer version, or use flip-chip bumps directly (but still 2.5D, so mixed bump sizes on the interposer...)?

Perhaps if AMD actually doubled down on this, there would only be 2.5D chips with HBM links to a PHY driver component for off-interposer connectivity.

The interposer does present a set of coarse metal layers, something AMD is otherwise losing by moving to generic foundry processes. Perhaps if it optimized toward this reality, something could be done to mate the chips more closely than the comparatively coarse microbumps we have now, and maybe some of the passive components or non-digital silicon could be moved onto or into the interposer, freeing up the primary die for other things.
There have been presentations on alternative schemes; Tezzaron, I think, presented something like using tungsten vias, enabled by extreme die thinning. Whether that is better, or can be made economical for this, is a fair set of questions.

A GPU in that range is often a really huge die... Anyway, one bullet point of 2.5D is breaking down monolithic SoCs, and since the GPU is likely getting HBM anyway, a broken-up design seems a fairly natural move.
The 270X is a little below 200W, and Tahiti is 250W+.
So somewhere between 212mm² and 360mm² lies the minimum footprint at which an AMD chip can break 200W with a monolithic ASIC.
This assumes there is a physical barrier, such as insufficient pad area for the required number of power pins, and that the 270X is a reasonable example of AMD's minimum.
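Those bounds just come from dividing board power by die area (a sketch; the 190W figure for the 270X is my reading of "a little below 200W", not an official number):

```python
# Power density of the two cited data points.
chips = {
    "270X (Pitcairn)": {"area_mm2": 212, "board_w": 190},  # "a little below 200W"
    "Tahiti":          {"area_mm2": 360, "board_w": 250},  # "250W+"
}
for name, d in chips.items():
    print(f"{name}: {d['board_w'] / d['area_mm2']:.2f} W/mm^2")

# The 270X stays under 200W at 212mm^2 while Tahiti clears it at 360mm^2,
# so if pad/pin area is the limiter, the threshold die size falls
# somewhere in the 212-360mm^2 gap.
```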
 
This may be off-topic, but... how did Core 2 Quad work? It seems to have had two distinct dies, yet the number of pins on the socket stayed the same. Was it something similar to two dies connected with an interposer?
 
If I remember correctly, they were just connected by the FSB, through the package. Nothing particularly fancy, but it did the job just fine.
 
Back then, Intel CPUs still used an FSB. This was literally a bus, in the sense that in a multi-CPU system all the corresponding bus lines of all the CPUs were simply tied together, and the CPUs shared the bus through some arbitration mechanism.

This allowed the solution that Alexko mentioned: both dies were just attached to the same pins.

Modern systems moved from shared buses to point-to-point links because, as frequencies rose, the proportion of bus time lost to arbitration collisions rose, and the capacitance of the complex bus structure became more of a problem. BTW, the same shift is currently happening in RAM, as DDR4 moves from a shared bus attached to multiple modules to point-to-point links and switches.
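A toy model of the arbitration problem (purely illustrative: the formula is the slotted-ALOHA-style probability that exactly one of N agents requests the bus in a cycle, not a model of the real FSB arbiter):

```python
# Fraction of cycles doing useful work on a shared bus vs. agent count.
def shared_bus_utilisation(n_agents: int, p_request: float) -> float:
    # A cycle is useful only when exactly one agent drives the bus.
    return n_agents * p_request * (1 - p_request) ** (n_agents - 1)

for n in (1, 2, 4):
    print(f"{n} agents: useful fraction = {shared_bus_utilisation(n, 0.5):.2f}")
# 1 agent : 0.50 (every request served)
# 2 agents: 0.50 (collisions start eating cycles)
# 4 agents: 0.25 (most cycles contended)
# Point-to-point links sidestep this: each link has exactly one driver pair.
```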
 
So, with that 2x energy-efficiency claim for the Fury Nano over the 290X (http://i.imgur.com/WiNvXWn.png), which Jawed posted in the 300 series thread, I wonder which GPU AMD's claim of doubled perf/W from a month or so ago was based on (http://i.imgur.com/knqXeyx.jpg). 290X performance with no external power connector confirmed for Arctic Islands!?

If you want to get to no connector, that requires dropping below 75W.
It would take a bit more than the promised scaling to get the Nano's power down that far, and there may be some fuzzy math involved, since an 8-pin connector can take a card to 225W and AMD might be quoting typical board power rather than a 175W ceiling.
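The connector arithmetic behind those figures, per the PCIe spec limits:

```python
# PCIe power budget: slot plus auxiliary connectors.
SLOT_W   = 75    # W, from the PCIe slot itself
SIXPIN   = 75    # W, one 6-pin connector
EIGHTPIN = 150   # W, one 8-pin connector

print("no connector:", SLOT_W, "W")             # the sub-75W target
print("one 6-pin   :", SLOT_W + SIXPIN, "W")    # 150 W
print("one 8-pin   :", SLOT_W + EIGHTPIN, "W")  # 225 W, the Nano's ceiling
# Hence the fuzzy-math worry: a 175W "typical board power" still leaves
# 50W of connector headroom before the spec limit.
```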
 
I'm 90% sure most of the energy-efficiency gain the Nano has over the other Furys comes simply from down-clocking and under-volting a large die, riding that geometric curve toward better efficiency.

The price you pay for this is manufacturing cost, so if they do the same with 14nm chips, don't expect them to be cheap.
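The curve in question is roughly the dynamic-power relation P ≈ C·V²·f, with performance scaling about linearly in f (a sketch; the voltage/frequency points below are made-up illustrative values, not measured Fury numbers):

```python
# Why a modest down-clock plus under-volt buys an outsized perf/W gain.
def rel_power(v: float, f: float) -> float:
    return v ** 2 * f          # dynamic power ~ C * V^2 * f (C constant)

full = rel_power(1.20, 1.05)   # assumed Fury X-like operating point
nano = rel_power(1.00, 0.90)   # assumed Nano-like operating point

print(f"performance: {0.90 / 1.05:.0%} of full speed")
print(f"power      : {nano / full:.0%} of full power")
# ~86% of the performance for ~60% of the power with these assumptions;
# the cost is that you are still paying for the full-sized die.
```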
 
"AI-100"
~500-550mm² @ 14nm GloFo
~6000-8000 Shader Cores
4096-bit DDR HBM2
Hmm, that should be completely off.
One would expect a massive 25-billion-transistor Greenland GPU with 8,000-10,000 unified shaders.

Hell yeah. I thought the whole point of the ITRS roadmap was that each new process node scales linear dimensions by exactly sqrt(2)/2 ≈ 0.707 relative to the previous node, which gives either 2x the number of transistors on the same die area, or half the die area for the same number of transistors...

But even if the full 29% linear downscaling does not materialize, ~20% on each node transition still amounts to roughly 2.5x the transistor count on the same die area over two transitions, or the same number of transistors on about 40% of the die area.
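The arithmetic, as a sketch (0.707 is the ideal ITRS linear shrink; 0.80 is the pessimistic ~20% per-node figure):

```python
# Density gain per node and over two transitions (e.g. 28nm -> 14nm).
for name, linear_shrink in (("ideal sqrt(2)/2", 0.707),
                            ("pessimistic 20%", 0.80)):
    per_node = 1 / linear_shrink ** 2       # area scales as the square
    two_nodes = per_node ** 2
    print(f"{name}: {per_node:.2f}x per node, {two_nodes:.2f}x over two nodes")
# ideal sqrt(2)/2: 2.00x per node, 4.00x over two nodes
# pessimistic 20%: 1.56x per node, 2.44x over two nodes
```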
 
FinFET processes use the same back end of line as 20nm and are therefore barely denser than 20nm. With the same die size, expect roughly twice as many transistors as you'd get on 28nm.
 
The 25B expectation would be correct if the foundries were being a bit more truthful; the 14nm/16nm marketing names are probably there to suggest they're at around Intel's level.

http://www.tsmc.com/english/dedicatedFoundry/technology/16nm.htm

"TSMC's 16FF+ (FinFET Plus) technology can provide above 65 percent higher speed, around 2 times the density, or 70 percent less power than its 28HPM technology. Comparing with 20SoC technology, 16FF+ provides extra 40% higher speed and 60% power saving."

http://spectrum.ieee.org/semiconductors/devices/the-status-of-moores-law-its-complicated

"as chip foundries prepare to roll out 14-nm and 16-nm chips, custom-made for smartphone makers and other customers, that will be no denser than the previous 20-nm generation."

So there is little if any density improvement; otherwise TSMC would have said their 16nm node offers around 1.6x the density of 20SoC.
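Reading that out of TSMC's own numbers (the 28nm-to-20nm factor below is my assumption for a near-full-node shrink, not a TSMC figure):

```python
# Inferring 16FF+ vs. 20SoC density from the published 28nm comparison.
density_16ff_vs_28  = 2.0   # TSMC: "around 2 times the density" of 28HPM
density_20soc_vs_28 = 1.9   # assumed: 20SoC as a near-full-node shrink

print(f"16FF+ vs 20SoC: ~{density_16ff_vs_28 / density_20soc_vs_28:.2f}x")
# ~1.05x, i.e. barely denser, matching the IEEE Spectrum remark that the
# FinFET nodes are "no denser than the previous 20-nm generation".
```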
 
Well, I understand. Then let's lower our expectations from a 25-billion-transistor chip to 17-18 billion, which is not that bad either.

http://pc.watch.impress.co.jp/img/pcw/docs/670/675/html/6.jpg.html
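Scaling Fiji's published figures (8.9 billion transistors in 596mm² on 28nm) by the ~2x density discussed above gives a rough feel for the lowered estimate (the 550mm² area is from the rumored spec; everything else is published data):

```python
# Rough transistor estimate for a ~550mm^2 FinFET Greenland.
fiji_transistors_b = 8.9          # billion, published
fiji_area_mm2      = 596          # published
density_gain       = 2.0          # FinFET vs. 28nm, per the posts above
greenland_area_mm2 = 550          # from the rumored spec

estimate = fiji_transistors_b / fiji_area_mm2 * density_gain * greenland_area_mm2
print(f"~{estimate:.1f}B transistors")   # ~16.4B, near the 17-18B guess
```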
 
Advanced Micro Devices this week formally confirmed that it would not produce any of its chips using 20nm process technology at Taiwan Semiconductor Manufacturing Co. Instead, the company will focus on development of its chips to be made using various FinFET process technologies. As a result of the cancellation, the company took a $33 million charge.

http://www.kitguru.net/components/c...y-cancels-20nm-chips-takes-33-million-charge/

With no new GPU architecture on the horizon, I hope they get to market substantially sooner to make up for their losses.

A boost to clock speeds would be a godsend for GCN versus Maxwell.
 