AMD: Speculation, Rumors, and Discussion (Archive)

Status
Not open for further replies.
As interesting as watching slow motion crash test videos!
That's a possibility, but I wanted to be open to other interpretations.

I saw those numbers elsewhere, but I think they're for mobile versions. They can't possibly have decided to value power efficiency that much over absolute performance.

I suppose there's the fully-enabled X variant not in the rumor.
At the very least, what rumored TDP would one get if Hawaii were put on a 2x more efficient process and its memory bus cut in half?
Perhaps power isn't the only thing being economized in that scenario?
From an IP standpoint it would look like Tonga scaled to Hawaii's general unit dimensions, save perhaps the width of the bus.

It seems like we have at least one example of a modestly evolved architecture making the jump from 28nm to 14nm and finding room for a MHz.
Perhaps Polaris will be ~1 GHz in base clock terms? Although even then, it's not the improvement seen by Pascal in terms of base clock even with the 290X's early-on clock inconsistencies as a baseline.

Just what are those little GCN pipe stages doing, I wonder?
Is it safe to assume the FO4 per pipe stage is notably higher for that architecture? What would you have to do or not do when given maybe not amazing metal scaling even from 28nm to 20nm and the generally unambiguous massive improvement in leakage and performance of FinFET in this range, and not budge the clock needle?

On the other hand, AMD has claimed over and again that Polaris' scope is power efficiency. Vega may be optimized for higher clocks, whereas Polaris was made for keeping that power consumption as low as possible (and easier transition to laptops).

Polaris is also meant to lower the price floor of affordable VR, or at least one of the SKUs needs to match the 290X. That doesn't entirely align with the power efficiency goal, particularly if there's a "you need to spend a few Watts in a few non-ideal cases instead of throttling because people will vomit" condition. ALU-wise, Polaris 10 seems to be dropping right at the lower bound of acceptable VR AMD has introduced. With half-width bus and the allegedly similar (peak) clocks, how it consistently meets or exceeds a 290X so that we don't see a lower but nauseatingly uneven floor for affordable VR will be interesting. There are interesting architectural possibilities, but I can think of several PR games that could be played with the numbers so far.

It's nice to see a Hawaii-class performance level numbered as a *80-tier product; the last product generation didn't do that.
 
R9 490(X) 300mm2 Vega 11?

There's gonna be quite a price gap if they don't lower Fury(X) price to 390(X) level.
I'm still wondering if a full Polaris 10 with HBM1 could be a 490 with the X variant coming with Vega. Goes a long way to cover that price/performance gap and provides the bandwidth it seems to be lacking. That should easily bridge the gap until Vega can replace the high end parts along with providing a 490X.

With half-width bus and the allegedly similar (peak) clocks, how it consistently meets or exceeds a 290X so that we don't see a lower but nauseatingly uneven floor for affordable VR will be interesting.
AMD has said Polaris is compatible with GDDR5/X/HBM1/2, but we haven't seen the latter. Only that Vega is HBM2. If clocks scale well with FinFET, HBM1 should get it there. They are definitely keeping that one quiet if that's the case.
 
Did AMD really say that Polaris can have either GDDR5 or HBM? That'd be very surprising, TBH.
AMD said:
“AMD helped lead the development of HBM, was the first to bring HBM to market in GPUs, and plans to implement HBM/HBM2 in future graphics solutions.
At this time we have only publicly demonstrated a GDDR5 configuration of the Polaris architecture. It’s important to understand that HBM isn’t (currently) suitable for all GPU segments due to the current HBM cost structure. In the mainstream GPU segment, GDDR5 remains an extremely cost-effective, efficient and viable memory technology.”

http://wccftech.com/amd-future-gpus-named-stars-galaxies-company-bids-farewell-island-code-names/
Maybe the site got it wrong, but they explicitly said Polaris was compatible with all the modern memory standards. Only "publicly" demonstrating Polaris with GDDR5 seems odd wording from an official statement. Maybe GDDR5X was the alternative, but the context was about HBM.
 
I'm still wondering if a full Polaris 10 with HBM1 could be a 490 with the X variant coming with Vega.

AMD has said Polaris is compatible with GDDR5/X/HBM1/2, but we haven't seen the latter. Only that Vega is HBM2. If clocks scale well with FinFET, HBM1 should get it there.

Based on what I know of GDDR5 and HBM, there is basically no way of making a single controller that works for both. The only way to make a chip that can use both is to have extra controllers that are just turned off when not using that memory bus. I don't think that would make any sense.

On the other hand, GDDR5X was designed as something that would allow using a single controller design for both it and normal GDDR5. So instead of Polaris 10 being HBM/GDDR5, maybe it's gddr5x/gddr5.
 
In this context, it's important to separate controller from IO block. The former being the scheduler that orders the transactions for optimal efficiency, the latter being the IO interface itself (high speed sampling logic, training, impedance control, ...)

The controller could easily be used for both HBM and GDDR5, the difference would be in the IOs. So the extra area isn't as much as you'd initially think. But it's still a big waste to have both on the same die.
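
To make the controller/IO split concrete, here is a toy Python sketch. It is purely illustrative and says nothing about AMD's actual design: the class names (Request, Gddr5Phy, HbmPhy, Controller) are made up, and the "scheduler" is just a sort that groups requests by bank and row, standing in for whatever reordering real hardware does. The only point is that the scheduling logic doesn't care which IO block sits underneath it.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Request:
    """A memory request, reduced to the fields a scheduler might sort on."""
    bank: int
    row: int


class Phy(Protocol):
    """Hypothetical IO-block interface: the only memory-type-specific part."""
    name: str

    def transfer(self, req: Request) -> str: ...


class Gddr5Phy:
    name = "GDDR5"

    def transfer(self, req: Request) -> str:
        return f"{self.name}: bank {req.bank} row {req.row}"


class HbmPhy:
    name = "HBM"

    def transfer(self, req: Request) -> str:
        return f"{self.name}: bank {req.bank} row {req.row}"


class Controller:
    """Toy scheduler, shared unchanged between both IO blocks (an assumption)."""

    def __init__(self, phy: Phy) -> None:
        self.phy = phy

    def schedule(self, requests: List[Request]) -> List[Request]:
        # Group requests that hit the same bank and row so fewer rows
        # need to be opened and closed; a stand-in for real reordering.
        return sorted(requests, key=lambda r: (r.bank, r.row))

    def run(self, requests: List[Request]) -> List[str]:
        return [self.phy.transfer(r) for r in self.schedule(requests)]


reqs = [Request(1, 5), Request(0, 2), Request(1, 5), Request(0, 1)]
for phy in (Gddr5Phy(), HbmPhy()):
    print(Controller(phy).run(reqs))
```

Both runs issue the requests in the same order; only the IO block that carries them differs, which is where the extra die area would go.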
 
What if they're not technically on the die? Certain elements on the interposer instead to allow some configuration? It might be practical as a Zen MCM or bridge for multiple chips as well. That "publicly" part still seems odd to me, differentiating between GDDR5 and GDDR5X as they're very similar. You'd expect the latter to be easily supported as that's how they were designed.
 
Going back to the power gating discussion - I thought that FinFETs dramatically reduced leakage power relative to planar transistors. Is leakage still a large enough factor that doing this at a fine grain for just ALUs (assuming I've understood the proposal correctly) is worth a lot vs clock gating?
 
They can create the lower end Polaris for GDDR5 only, the higher one with GDDR5 and HBM support and Vega with HBM only support.
If a chip can (or is targeted to) support both mid and high end, then it wouldn't be strange for the chip to support both instead of making 2 types of the same chip.
 
The known Polaris SKUs are all serving the mid to low end of the market. An active interposer would make it a very costly product.
Low end probably doesn't need the interposer. There might be some applications for HBM at the low end though. That interposer might only be required for HBM configurations. As much compute as these chips are allegedly pushing, I can't imagine the currently available memory tech will provide enough bandwidth. Not without more than 128/256 bit busses respectively or some sort of memory compression for compute.

Going back to the power gating discussion - I thought that FinFETs dramatically reduced leakage power relative to planar transistors. Is leakage still a large enough factor that doing this at a fine grain for just ALUs (assuming I've understood the proposal correctly) is worth a lot vs clock gating?
They do, but if you were ramping voltage to drive up clocks as suggested in the patent it would probably be worth it. That might include per SIMD RF as well. AMD seems to be going crazy with the concept, so it must be doing something.
 
Low end probably doesn't need the interposer. There might be some applications for HBM at the low end though. That interposer might only be required for HBM configurations. As much compute as these chips are allegedly pushing, I can't imagine the currently available memory tech will provide enough bandwidth. Not without more than 128/256 bit busses respectively or some sort of memory compression for compute.
I thought the problem would be that the pitch and density of balls are different enough for TSV and normal package substrate. Unless you are willing to commit to the lowest common factor (could mean larger pads), and have foundries willing to make a customised flow for your special "mixed-use" GPUs.

TBH what's the problem of having Polaris solely on GDDR5/X anyway? HBM was expected to scale from the top at the beginning...
 
To say that an architecture can support both HBM and GDDR memory doesn't mean there's any particular actual silicon that does support both at once. Just that the base architecture can support both.

I wouldn't be surprised if Polaris could be tweaked to accept a punch card memory storage-based system. What an architecture entails is a fairly abstract term, after all. :)
 
What kind of computing modules could they put on substrate itself? So far they use it only as a dumb small "single layered" motherboard.

Could they put memory controller directly on a substrate?
 
They can create the lower end Polaris for GDDR5 only, the higher one with GDDR5 and HBM support and Vega with HBM only support.
If a chip can (or is targeted to) support both mid and high end, then it wouldn't be strange for the chip to support both instead of making 2 types of the same chip.


Does Polaris warrant the bandwidth that HBM will give for the performance it will have? I mean come on, if it's at most a midrange chip, it won't need 500 GB/sec or more, would it? If it doesn't need it, then making the same version of the chip for HBM makes no sense financially or functionally.

Will AMD make a lopsided chip that has huge amounts of bandwidth that isn't usable? We know they are making front end changes to improve throughput in their chips, and they don't want to get into a Fiji situation where HBM was wasted for the extra cost it incurred.
 
I agree. With GDDR5X on a 256-bit bus, you can still achieve around 400 GB/sec BW. 400 GB/sec of HBM isn't going to make the GPU any faster if it's not BW bound already. Even on a 384-bit bus, you can get over 600 GB/sec with GDDR5X, so I don't think you really need HBM until the GPU has enough compute that it's bound by ~600 GB/sec.

I think HBM will eventually become more economical to use in all product lines, and we'll have to, as performance demands it, but I don't see that in Polaris's lifetime.
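
For anyone who wants to check the bandwidth figures, peak bandwidth is just bus width times per-pin data rate divided by 8. A quick Python sketch follows; the per-pin rates (8, 12 and 14 Gbps) are assumed values picked for illustration, they are not given in the thread, and the function name is made up.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s = bus width (bits) * per-pin rate (Gb/s) / 8."""
    return bus_width_bits * data_rate_gbps / 8


# Per-pin data rates below are assumptions for illustration only.
configs = [
    ("GDDR5  256-bit @ 8 Gbps", 256, 8.0),
    ("GDDR5X 256-bit @ 12 Gbps", 256, 12.0),
    ("GDDR5X 384-bit @ 14 Gbps", 384, 14.0),
]

for name, width, rate in configs:
    print(f"{name}: {peak_bandwidth_gb_s(width, rate):.0f} GB/s")
```

Under those assumed rates, 256-bit GDDR5X lands at 384 GB/s and 384-bit at 672 GB/s, in line with the "around 400" and "over 600" ballpark.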
 
Does Polaris warrant the bandwidth that HBM will give for the performance it will have? I mean come on, if it's at most a midrange chip, it won't need 500 GB/sec or more, would it? If it doesn't need it, then making the same version of the chip for HBM makes no sense financially or functionally.

Will AMD make a lopsided chip that has huge amounts of bandwidth that isn't usable? We know they are making front end changes to improve throughput in their chips, and they don't want to get into a Fiji situation where HBM was wasted for the extra cost it incurred.
Especially for notebooks in very small form factors, even a single HBM stack could be sufficient, yielding 128 GB/s and saving massive amounts of space. There's no rule that you have to have 4 HBM modules attached to each and every GPU. So yes, especially if a Polaris variant was targeted for mobile, HBM could look very compelling - if they can get the pricing thing fixed.
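
The 128 GB/s per-stack number falls out of HBM1's 1024-bit interface per stack at 1 Gbps per pin. A small Python sketch of how it scales with stack count (the function and constant names are made up for illustration):

```python
HBM1_BUS_BITS_PER_STACK = 1024  # HBM1 interface width per stack
HBM1_PIN_RATE_GBPS = 1.0        # HBM1 per-pin data rate


def hbm_bandwidth_gb_s(stacks: int,
                       bus_bits: int = HBM1_BUS_BITS_PER_STACK,
                       pin_rate_gbps: float = HBM1_PIN_RATE_GBPS) -> float:
    """Peak bandwidth in GB/s for a given number of HBM stacks."""
    return stacks * bus_bits * pin_rate_gbps / 8


for stacks in (1, 2, 4):
    print(f"{stacks} stack(s): {hbm_bandwidth_gb_s(stacks):.0f} GB/s")
```

One stack gives 128 GB/s and four stacks give 512 GB/s, the Fiji configuration.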
 
That is a possibility, but the costs won't be good for that. Let's say a 2.5x performance-per-watt increase is what they are getting in all circumstances for Polaris. I see this card/GPU being slightly faster than Hawaii with a power consumption of 120 watts; that seems to fit in line with what they are going for, and they should be able to use GDDR5 for memory to hit that. So if they want to drop power consumption more they can go to HBM, but as of right now they have been saying 100-130 watts.

Now given the fact that nV will definitely be able to hit the same power envelope, possibly even lower, at the same performance using GDDR5, I think the cost in margins of going to HBM won't be a good idea for either of the companies in this category.
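
A rough sanity check on the power estimate, as a Python sketch. The Hawaii baseline board powers (250 W and 290 W) and the 10% performance uplift are assumptions chosen for illustration, not figures from the thread; only the 2.5x perf/W factor comes from the discussion above.

```python
def scaled_power_w(baseline_power_w: float,
                   perf_per_watt_gain: float,
                   relative_perf: float = 1.0) -> float:
    """Power needed to reach relative_perf times the baseline performance
    on a part that is perf_per_watt_gain times more efficient."""
    return baseline_power_w * relative_perf / perf_per_watt_gain


PERF_PER_WATT_GAIN = 2.5  # the 2.5x figure discussed in the thread

# Baseline board powers and the 1.1x uplift are assumed values.
for baseline_w in (250.0, 290.0):
    same_perf = scaled_power_w(baseline_w, PERF_PER_WATT_GAIN)
    faster = scaled_power_w(baseline_w, PERF_PER_WATT_GAIN, relative_perf=1.1)
    print(f"{baseline_w:.0f} W baseline: {same_perf:.0f} W at equal perf, "
          f"{faster:.0f} W at 1.1x perf")
```

With those assumed baselines the result sits roughly between 100 and 130 W.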
 