AMD: Speculation, Rumors, and Discussion (Archive)

A single CU should be able to do 64 FMAs per clock, so it should be able to read 3 operands and write one result per ALU. That's 4×32 bits per ALU per clock, or 16 bytes.
Yes. Of course :).

I usually calculate my read and write bandwidths separately, so I just concentrated on the write (register update) side.
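
To make the arithmetic above concrete, here is a minimal sketch of the per-CU register bandwidths those figures imply. The 1 GHz clock is an assumed round number for illustration, not a quoted spec.

Code:
alus_per_cu    = 64       # one FMA per ALU per clock
operand_bytes  = 4        # 32-bit operands
reads_per_fma  = 3
writes_per_fma = 1
clock_hz       = 1.0e9    # assumed 1 GHz, purely for round numbers

read_bw  = alus_per_cu * reads_per_fma  * operand_bytes * clock_hz
write_bw = alus_per_cu * writes_per_fma * operand_bytes * clock_hz

print(f"register read:  {read_bw  / 1e9:.0f} GB/s per CU")  # 768 GB/s
print(f"register write: {write_bw / 1e9:.0f} GB/s per CU")  # 256 GB/s

At the assumed 1 GHz that works out to 768 GB/s of reads and 256 GB/s of writes per CU, which is why it makes sense to budget the two paths separately.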
Is this a comparison between code optimized for an in-order pipeline with a significant amount of loop unrolling, and one that assumes the hardware can do so via register renaming?
In terms of how register consumption appears for the same code sequence, register renaming can only increase consumption: at a minimum, the last non-speculative architected state is recorded in addition to any speculative registers. As far as the hardware goes, the fixed thread count and fixed register files leave plenty of slack.
Aside from the 8-16 architected registers of each type per thread in the x86 case, there can be tens to over a hundred physical registers in the register file that can only be used, or wasted, by 2 or possibly 4 contexts.
Yes, renaming can only increase consumption. However, with renaming, the (named) architectural register count can be kept much lower than would otherwise be needed. The extra registers can be used flexibly to hide latency (pipeline, memory, hyperthreading, rename loops, etc.) whenever needed.

You could combine the GPU idea of static shader register allocation with the architectural (named) register count: the shader compiler could determine the minimum number of registers needed, and the rest could be used for renaming (and/or for running more waves), as in the sketch below.
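
As a toy illustration of that split (entirely hypothetical, just to show the mechanism): a rename table maps the small compiler-visible register set onto a larger physical file, with the surplus feeding a free list that absorbs in-flight results.

Code:
ARCH_REGS = 16    # compiler-visible registers (the static minimum)
PHYS_REGS = 128   # physical file; the surplus hides latency

rename_table = {f"r{i}": i for i in range(ARCH_REGS)}   # arch -> phys
free_list = list(range(ARCH_REGS, PHYS_REGS))

def rename(dest, srcs):
    # Sources read their current mappings; the destination gets a
    # fresh physical register so in-flight readers of the old value
    # are undisturbed. Real hardware stalls when free_list is empty.
    src_phys = [rename_table[s] for s in srcs]
    new_phys = free_list.pop(0)
    rename_table[dest] = new_phys
    return new_phys, src_phys

print(rename("r1", ["r1", "r2"]))   # e.g. (16, [1, 2])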
 
Hmm, is this true:

Don't forget that Nvidia was working on HMC (Hybrid Memory Cube), and that turned out to be a complete failure, so Nvidia brushed it under the rug and pretended nothing ever happened.

A quote from the comments.
I have never heard this before, and I don't think there is any evidence to support this.

Intel has gone on record supporting HMC, but I don't think Nvidia has.
 
I don't know why Nvidia would have even considered HMC, since it looks like this tech is more geared to dedicated industrial implementations.
 
HMC is a more fully-featured and expandable memory type, in terms of where it can be added to a system, how it can be set up, and various RAS measures. At least for HPC, HMC's bandwidth, better density than HBM, ability to be mounted on an interposer or not, larger potential number of stacks, and more tractable interface--although that is putting more control in the hands of Micron--would seem to be a tempting target for Nvidia.

If one was paranoid about HBM and interposers being too constrained in area, capacity, or interference from AMD, those would be bonuses.
 
Do you think this is a valid concern?
I put that in there for completeness, given the rumor mill. There is nothing that makes shenanigans impossible, but it would be a limited-duration threat of a survivable inconvenience. There are a fair number of counter-pressures and ways to work around it.
Hynix and the partners involved in creating the interposers and packages want volume, so either AMD runs the risk of overextending its purchases of HBM, or HBM volume in general remains small enough that AMD can afford to significantly interfere with it. That seems self-correcting: production increases to meet demand, or AMD runs out of gas.
In isolation I wouldn't think it a deterrent; it would need other factors, like the ones above, to be significant.

That Nvidia is also going to HBM suggests those concerns may not be that big, or that there are concerns about the alternatives that are not immediately clear.
 
There are a fair number of counter-pressures and ways to work around it.

If AMD decides that it needs, and wants, 100% of SK Hynix's memory production capacity, the competition can do nothing but bite their fingernails.

AMD wouldn't be making a mistake in doing so, because obviously they would need those memory chips; even if not immediately, they can keep them waiting in warehouses. :D
 
If AMD decides that it needs, and wants, 100% of SK Hynix's memory production capacity, the competition can do nothing but bite their fingernails.
Let's assume you mean HBM capacity, and not the $3.6 billion per quarter Hynix has made with its general DRAM sales.
Is the assumption that HBM production is that limited, a year or more into manufacturing?
AMD would need to pay for the memory it's buying, with the upside that the more it pays, the more Hynix will be inclined to increase the manufacturing rate.
AMD would then need to buy even more the next production period.
How long is that going to keep up?

AMD wouldn't be making a mistake in doing so, because obviously they would need those memory chips; even if not immediately, they can keep them waiting in warehouses. :D
It's not how current logistics chains generally handle this sort of thing, but let's assume this is the case.
AMD would be buying more DRAM than it needs for the purpose of cornering the HBM market, at the point where HBM's prices are highest, even with a best-buddies discount from Hynix.
When AMD finally stops buying, the product will have matured further, and everyone else will have been excluded only from the high point of the price curve.

Nvidia's one announced HBM SKU might have a paper launch for a quarter, then its margins would be even better.
AMD's inventory of hundreds of thousands or millions of HBM stacks is now some percentage discounted.
We'll call it a one-time charge.
 
HMC is a more fully-featured and expandable memory type, in terms of where it can be added to a system, how it can be set up, and various RAS measures. At least for HPC, HMC's bandwidth, better density than HBM, ability to be mounted on an interposer or not, larger potential number of stacks, and more tractable interface--although that is putting more control in the hands of Micron--would seem to be a tempting target for Nvidia.

If one was paranoid about HBM and interposers being too constrained in area, capacity, or interference from AMD, those would be bonuses.
HMC has terrible pJ/bit, so it is not acceptable for a GPU.
 
HMC has terrible pJ/bit, so it is not acceptable for a GPU.

The numbers for standard HMC put it at 1/3 the pJ/bit of DDR4, which doesn't sound terrible.
Short-reach HMC, and even more so HMC gen2, is expected to be lower.
This is coupled with features like notably higher stack density versus HBM1, no interposer requirement, DRAM ECC and link error detection, link failover, daisy-chaining, and a path to optical. For graphics, this is probably too much, but for HPC, it's a start.

Knights Landing uses some kind of modified HMC on-package memory with 5x DDR4's power efficiency, 16 GB capacity, and no silicon interposer.
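
As a sanity check, interface power scales directly with pJ/bit. A minimal sketch: only the ratios quoted above (HMC at ~1/3 of DDR4, the Knights Landing memory at ~1/5) come from this discussion; the absolute DDR4 figure and the bandwidth are assumptions for illustration.

Code:
ddr4_pj_per_bit = 20.0   # assumed ballpark; only the ratios below are quoted
bandwidth_GBps  = 320    # assumed sustained bandwidth

tech = {
    "DDR4":                 ddr4_pj_per_bit,
    "HMC   (~1/3 of DDR4)": ddr4_pj_per_bit / 3,
    "KNL   (~1/5 of DDR4)": ddr4_pj_per_bit / 5,
}

for name, pj in tech.items():
    watts = bandwidth_GBps * 1e9 * 8 * pj * 1e-12   # bytes/s -> bits/s -> W
    print(f"{name}: {watts:5.1f} W at {bandwidth_GBps} GB/s")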
 
Err.... source?
A friend who designs GPUs told me 2 years back that HMC pJ/bit was unacceptable, which is the primary reason GPUs will use HBM.

I realize this is only hearsay, and maybe things have changed in the past couple years.

But so far the evidence supports this idea. Intel using HMC means very little, since their volumes are essentially 0.

Both AMD and Nvidia are going with HBM instead.
 
They're two different technologies with different benefits... I'm not sure comparing them is that easy.
 
AMD's CEO Lisa Su confirmed that the company's 2016 graphics products would move to a more advanced FinFET process.
They even said that they already have engineering samples of some products on this process. It remains to be seen whether that includes an Arctic Islands GPU or something else.

They also say that, for now, they are happy with yields.
 
AMD's CEO Lisa Su confirmed that the company's 2016 graphics products would move to a more advanced FinFET process.
They even said that they already have engineering samples of some products on this process. It remains to be seen whether that includes an Arctic Islands GPU or something else.

They also say that, for now, they are happy with yields.

Indeed, by next year FinFET should reach acceptable yields for larger dies; it's already in manufacturing (and even in released products, in the case of the Galaxy S6) for smaller mobile SoC dies. And judging by TSMC's claims of a "70% increase in power efficiency" for their FinFET and AMD's own claims of "doubling power efficiency" for Arctic Islands (see earlier in the thread for sauce), I'd say a quick check of the corporate speak gives Arctic Islands itself an 18% "increase in energy efficiency" beyond the node (arithmetic spelled out below).

This of course assumes AMD's doubling of power efficiency takes into account both the node power saving and whatever architecture improvements they've got, as well as the move to HBM, and assumes it means improvement over the Fury X. Not an unreasonable set of assumptions given previous corporate-speak records. AMD should probably survive until next year, though in what shape I don't know. I base this on the Fury X being entirely sold out everywhere last time I checked. Whatever their yields are on that chip, it's still a good sign for their Q3, the bad Q2 notwithstanding.
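
Spelling out that 18% figure (this is just the arithmetic implied by the two quoted claims, nothing more):

Code:
node_gain    = 1.70   # TSMC: "70% increase in power efficiency"
overall_gain = 2.00   # AMD: "doubling power efficiency"

arch_plus_hbm = overall_gain / node_gain - 1
print(f"{arch_plus_hbm:.1%}")   # ~17.6%, i.e. the ~18% cited above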
 
This of course assumes AMD's doubling of power efficiency takes into account both the node power saving and whatever architecture improvements they've got, as well as the move to HBM, and assumes it means improvement over the Fury X. Not an unreasonable set of assumptions given previous corporate-speak records. AMD should probably survive until next year, though in what shape I don't know. I base this on the Fury X being entirely sold out everywhere last time I checked. Whatever their yields are on that chip, it's still a good sign for their Q3, the bad Q2 notwithstanding.


They were asked directly about sales of Fury, and Lisa Su deflected the question, answering that HBM yields are progressing well. That doesn't sound like it's good for them from a sales perspective.
 
“Greenland” will be AMD’s first graphics processing unit based on the all-new micro-architecture, whose development began a little more than two years ago. While the architecture is currently known as another iteration of GCN, the new ISA [instruction set architecture] will be so considerably different from the existing GCN that it has every right to be called “post-GCN”, the source said. It is likely that “Greenland” will retain the layout of contemporary AMD Radeon graphics processing units, but there will be significant changes at a deeper level.


It is believed that AMD has already taped out its “Greenland” graphics processing unit and is about to get the first silicon in the coming weeks.

http://www.kitguru.net/components/g...gpus-for-2016-greenland-baffin-and-ellesmere/
 