AMD: Speculation, Rumors, and Discussion (Archive)

Status
Not open for further replies.
The known Polaris SKUs are all serving the mid to low end of the market. An active interposer would make it a very costly product.

The problem I have with believing that Polaris chips will be for just the mid-low end of the market is that the shipping manifests show rather high prices for the supposed Polaris cards.

Baffin is most likely confirmed as the smaller Polaris chip and on the shipping manifests as well.

16-Feb-2016 84733030 PRINTED CIRCUIT BOARD ASSEMBLY (VIDEO / GRAPHIC CARD)C98101 BAFFIN XT G5 4GB CHANNEL P/N 102-C98101-00 (FOC)

Above is for 48k while the other version is for 40k.

For comparison, this Fury X card shipped at 81k on the same day that 40k Baffin version was shipped as well.

1-Dec-2015 84733030 C880 PRINTED CIRCUIT BOARD ASSEMBLY (VIDEO GRAPHIC CARD)ATTACHED WITH COOLER MASTER HEATSINK P/N 102-C88001-00 (FOC) NOS 3 242,721 80,907

1-Dec-2015 84733030 PRINTED CIRCUIT BOARD ASSEMBLY (VIDEO/GRAPHIC CARD)P/N 102-C98001-00 (FOC) NOS 7 283,408 40,487

A more recent shipment of Fury x2 at 129k.

23-Feb-2016 84733030 PRINTED CIRCUIT BOARD ASSEMBLY (VIDEO GRAPHIC CARD) ATTACHEDWITH COOLER MASTER HEATSINKTOBERMOR P/N 102-C88801-00 (FOC) NOS 2 257,804 128,902

Then we have the C993 which is a whopping 111k.

1-Feb-2016 84733030 PRINTED CIRCUIT BOARD ASSEMBLY (VIDEO GRAPHIC CARD)P/N 102-C99398-00 (FOC) NOS 2 221,206 110,603

These are from the shipments from Canada to India,

https://www.zauba.com/import-printe...ode-84733030/fp-canada/ip-INHYD4-hs-code.html

Shipments from Hong Kong are more numerous and also show another C9xx card at 62k,

9-Mar-2016 84733030 PRINTED CIRCUIT BOARD ASSEMBLY FOR PERSONAL COMPUTER(VIDEO/ GRAPHICS CARD) P/N .102-C94402-00 (FOC) NOS 1 61,880 61,880

and a mysterious D00001 at 31k.

14-Jan-2016 84733030 PRINTED CIRCUIT BOARD ASSEMBLY FOR PERSONAL COMPUTER(VIDEO/ GRAPHICS CARD) P/N .102-D00001-00 (FOC) NOS 15 460,852 30,723

https://www.zauba.com/import-printe...-84733030/fp-hong+kong/ip-INHYD4-hs-code.html

There are some other C9xx models which are <20k from 2015 and appear earlier than Baffin samples in the Hong Kong manifest but they seem to be irrelevant for now.

All in all, Polaris isn't looking cheap in either flavour.
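The "k" figures quoted above are just the last column of each manifest row, which appears to be the total value divided by the quantity. A minimal sketch of that check, assuming the row tail is always `NOS <quantity> <total> <per unit>` (the column meaning is inferred from the numbers, not documented here, and the helper names are made up for illustration):

```python
import re

# Assumed row tail: unit code, quantity, total value, value per unit.
TAIL = re.compile(r"NOS\s+(\d+)\s+([\d,]+)\s+([\d,]+)\s*$")


def parse_tail(row):
    """Return (quantity, total, listed_per_unit) from a manifest row."""
    match = TAIL.search(row)
    if match is None:
        raise ValueError(f"no 'NOS qty total unit' tail in: {row!r}")
    qty, total, unit = (int(g.replace(",", "")) for g in match.groups())
    return qty, total, unit


def per_unit(row):
    """Recompute the per-unit value as total / quantity, rounded."""
    qty, total, _ = parse_tail(row)
    return round(total / qty)


# Row tails copied from the manifests quoted above, keyed by part number.
ROWS = {
    "102-C88001-00": "NOS 3 242,721 80,907",
    "102-C98001-00": "NOS 7 283,408 40,487",
    "102-C88801-00": "NOS 2 257,804 128,902",
    "102-C99398-00": "NOS 2 221,206 110,603",
    "102-C94402-00": "NOS 1 61,880 61,880",
    "102-D00001-00": "NOS 15 460,852 30,723",
}

if __name__ == "__main__":
    for part, row in ROWS.items():
        qty, total, listed = parse_tail(row)
        print(f"{part}: {qty} x {per_unit(row):,} (listed {listed:,})")
```

For these six rows the recomputed value matches the listed one, so the per-card numbers are declared shipment values split evenly across the boards, nothing more.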
 
And do the prices on Zauba tell us anything meaningful? Insurance rates, rupee to dollar conversion on particular dates, and any taxes imposed on the packages?
 
If AMD's prices on those shipping manifests are anything like most other companies', they mean very little.
The prices for a board with the same chip and the same configuration can differ a lot based on what they are supposed to be used for: the final product reference board can be really cheap, say $15, while a functionally identical development board that has extra measurement connectors and mounting holes for a heating/cooling contraption costs $500 simply because they're very low volume one-off boards.

Polaris 10 is going to be a fine performer that may even reach Fury performance levels, but compared to other products of its generation it will occupy the same space as 28nm products that have a die size <250mm2: a mid-end high volume part for the cost conscious, in a segment with a lot of competition.
 
28-Dec-2015 84733030 PRINTED CIRCUIT BOARD ASSEMBLY (VIDEO GRAPHIC CARD)C88202-00 FIJI NANO P/N:102-C88202-00 (FOC) Canada Hyderabad Air Cargo NOS 10 266,805 26,680
This was kind of interesting. Late December they decided they needed 10 Fiji Nano boards without chips? At 27k I'm assuming that's just a board at least. Guess they could be for Vega testing.

Will AMD make a lopsided chip that has huge amounts of bandwidth that isn't usable? We know they are making front end changes to improve throughput in their chips, and they don't want to get into a Fiji situation where HBM was wasted for the extra cost it incurred.
Why do we think Fiji bandwidth was wasted? We're only now starting to see compute heavy games that would really start using it. Fiji was also a part where some corners had to be cut to make everything fit. A change that would likely have reduced bandwidth consumption and slowed things down in many cases.

Even on a 384-bit bus, you can get over 600GB/sec with GDDR5x, so I don't think you really need HBM until the GPU has enough compute needs to be bound by ~600GB/sec.
IF it has a 384 bit bus. All the leaks so far are saying 128/256bit. If Polaris 10 is 384 and we've only seen the cut down versions then I'd agree. AMD still seems to think Vega will need 1TB/s.
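For anyone who wants to check the bus-width numbers being thrown around: peak bandwidth is just bus width in bytes times the per-pin data rate. A rough sketch, assuming GDDR5X lands in the 10-14 Gbps per pin range and using Fiji's 4096-bit HBM at 1 Gbps per pin for comparison (the rates plugged in are assumptions for illustration, not leaked specs):

```python
def peak_bandwidth_gb_s(bus_width_bits, gbps_per_pin):
    """Theoretical peak bandwidth in GB/s: (bus width / 8) bytes per transfer
    times the per-pin data rate in Gbps."""
    return bus_width_bits / 8 * gbps_per_pin


def pin_rate_needed(bus_width_bits, target_gb_s):
    """Per-pin data rate (Gbps) needed to reach a target bandwidth."""
    return target_gb_s * 8 / bus_width_bits


if __name__ == "__main__":
    # (label, bus width in bits, assumed Gbps per pin)
    configs = [
        ("256-bit GDDR5 @ 8 Gbps", 256, 8.0),
        ("256-bit GDDR5X @ 10 Gbps", 256, 10.0),
        ("384-bit GDDR5X @ 13 Gbps", 384, 13.0),
        ("4096-bit HBM @ 1 Gbps (Fiji)", 4096, 1.0),
    ]
    for label, width, rate in configs:
        print(f"{label}: {peak_bandwidth_gb_s(width, rate):.0f} GB/s")
    print(f"384-bit needs {pin_rate_needed(384, 600):.1f} Gbps/pin for 600 GB/s")
```

So "over 600GB/sec on 384-bit" needs memory running above 12.5 Gbps per pin, while a 256-bit bus at the same speeds tops out well below that.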

Now given the fact nV will definitely be able to hit the same power envelope possibly even lower at the same performance using GDDR5, I think the cost in margins going to HBM won't be a good idea to either of the companies in this category.
Why is that a fact? We're also talking 16nm vs 14nm here in addition to what will likely be significant architecture changes as demonstrated in that patent. If the move to FINFET yields say 30% higher clocks and an architecture change allows them to boost say 50% higher like Nvidia already did the previous generation, that's potentially yielding a 95% increase in compute capability per shader. I'm going off the assumption nearly doubling compute capability will correspondingly increase bandwidth needs.
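The 95% figure comes from multiplying the two hypothetical gains rather than adding them. A quick sketch of that arithmetic, where the 30% and 50% inputs are the "say" numbers from the post above and not measured values:

```python
def combined_gain(*fractional_gains):
    """Compound independent fractional gains, e.g. 0.30 and 0.50,
    into one overall fractional gain."""
    factor = 1.0
    for gain in fractional_gains:
        factor *= 1.0 + gain
    return factor - 1.0


if __name__ == "__main__":
    process = 0.30       # assumed clock gain from the move to FinFET
    architecture = 0.50  # assumed boost gain from architecture changes
    total = combined_gain(process, architecture)
    print(f"combined per-shader gain: {total:.0%}")
```

Whether bandwidth needs scale one-to-one with that number is the assumption the rest of the argument rests on.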

Above is for 48k while the other version is for 40k.

For comparison, this Fury X card shipped at 81k on the same day that 40k Baffin version was shipped as well.
More interesting is Baffin being the smaller Polaris 11; that would be half the price of a chip 4x its size with HBM included. If those prices are even remotely close to realistic they are doing something really interesting there. Polaris 11 x2 with 4GB HBM MCM? Not sure why they wouldn't just use Polaris 10, which is suggested to be roughly twice the size. Would only make sense if the ratio of ROPs or TMUs was different and they had a strong interconnect.
 
Why do we think Fiji bandwidth was wasted? We're only now starting to see compute heavy games that would really start using it. Fiji was also a part where some corners had to be cut to make everything fit. A change that would likely have reduced bandwidth consumption and slowed things down in many cases.

Show me where Hawaii/ Maxwell 2 are ever bandwidth limited......

IF it has a 384 bit bus. All the leaks so far are saying 128/256bit. If Polaris 10 is 384 and we've only seen the cut down versions then I'd agree. AMD still seems to think Vega will need 1TB/s.

How many ALU's will Polaris 10 come with, that should tell ya right there, there has been rumors and leaks that are quite credible.

Why is that a fact? We're also talking 16nm vs 14nm here in addition to what will likely be significant architecture changes as demonstrated in that patent. If the move to FINFET yields say 30% higher clocks and an architecture change allows them to boost say 50% higher like Nvidia already did the previous generation, that's potentially yielding a 95% increase in compute capability per shader. I'm going off the assumption nearly doubling compute capability will correspondingly increase bandwidth needs.

If you are combining the transistor performance and power consumption drop and that will give you possible GPU performance, that is just wrong. Transistor performance is not GPU performance. And then you have to look at the smaller number of ALUs: if they start increasing clock speeds to make up for that, the power consumption goes up.

And if you want to see what transistor performance means, look at Intel chips: did they get a 60% power drop and a 30% increase in performance? Neither of those happened, definitely not the 30% increase in overall performance even in the best cases, and definitely not a 60% drop in power in the best cases either. That should tell ya transistor performance isn't the same as chip performance.
 
If you are combining the transistor performance and power consumption drop and that will give you possible GPU performance, that is just wrong. Transistor performance is not GPU performance.
How is that wrong? That's literally what the fab producing the chips and AMD have publicly stated... not to mention how transistors work. I have clearly stated those benefits are separate from any architecture changes. Final GPU performance would be a combination of both. It's really simple, at 60% less power consumption and half the area the transistors are 40% faster. Link AMD likely shifted the curve a bit with their implementation, but stated 20-35% faster with similar area and power savings. Link

How many ALU's will Polaris 10 come with, that should tell ya right there, there has been rumors and leaks that are quite credible.
Rumors suggest 2560 for the full Polaris 10. That's 40% less than Fiji. With ZERO architecture changes, the process change will cover part of that gap with higher clocks. Not sure what you are getting at here unless you don't believe AMD increased clocks or changed the architecture for Polaris?
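To put rough numbers on the shader gap: throughput scales with shader count times clock, so the clock ratio needed for parity is just the inverse of the shader ratio. A sketch assuming Fiji's 4096 shaders, the rumored 2560 for Polaris 10, and the same work per shader per clock (that last part is exactly what an architecture change would alter):

```python
FIJI_SHADERS = 4096        # full Fiji
POLARIS10_SHADERS = 2560   # rumored, not confirmed


def shader_deficit(new, old):
    """Fraction by which the new chip has fewer shaders than the old one."""
    return 1.0 - new / old


def clock_ratio_for_parity(new, old):
    """Clock multiplier the new chip needs to match the old chip's raw
    throughput, assuming equal work per shader per clock."""
    return old / new


def relative_throughput(new, old, clock_ratio):
    """Raw throughput of the new chip relative to the old at a clock ratio."""
    return new / old * clock_ratio


if __name__ == "__main__":
    print(f"deficit: {shader_deficit(POLARIS10_SHADERS, FIJI_SHADERS):.1%}")
    print(f"clock needed: "
          f"{clock_ratio_for_parity(POLARIS10_SHADERS, FIJI_SHADERS):.2f}x")
    for ratio in (1.30, 1.95):  # process only, process plus architecture
        rel = relative_throughput(POLARIS10_SHADERS, FIJI_SHADERS, ratio)
        print(f"at {ratio:.2f}x clock: {rel:.0%} of Fiji")
```

On those assumptions the gap is 37.5%, a 30% clock bump alone leaves it short of Fiji, and parity needs 1.6x the clock or an equivalent gain per shader.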

Show me where Hawaii/ Maxwell 2 are ever bandwidth limited......
Fiji and tonga are probably better examples. Yeah there are some architecture differences, but there are use cases for that bandwidth that likely haven't arisen yet. These still need to be forward thinking designs, not the designs of yesterday. Multi-chip being one consideration in addition to the page faulting setup AMD seems to be implementing. Heavier use of compute will play a part as well. A use that historically hasn't benefited as much from compression techniques.
 
The Samsung link means one or the other; the Forbes writer definitely misunderstood what was stated.

I am not suggesting anything. You tell me the math that hits a 2.5x perf/watt increase for Polaris as a best case, and then go from there. I already gave it more than that, saying that with the number of ALUs it will hit more than that, but not by much. So if AMD wants to use 2.5x perf/watt as their best case, and I'm already allowing that it might be higher than that but not by much, what do you make of it?

Fiji is not a good example, as it's the one we are talking about, and we know 100% it is never bandwidth limited.

A chip that has half the ALU throughput (probably not half, more like 30-40% less) is going to need 500GB/sec of bandwidth? What, you think our programming paradigms are going to change so drastically that bandwidth is now the only bottleneck?

PS: using a semicolon like Samsung did denotes equal importance to both parts; in other words, it means OR.

So what it means is either you can go for 40% increased performance and 50% smaller die or 60% drop in power consumption and 50% smaller die.
 
No, memory overclocking didn't give it linear scaling; it had to be both the GPU and memory if I'm not mistaken. Memory alone did at lower resolutions though, which I would expect, as you aren't really pushing shader performance much. But you have to figure: would a GPU with half the ALUs and that much bandwidth do the same?

Given that, I don't think it was memory throughput limited; some other limitation in the memory pipeline gave Fiji its improvement from memory OC. Trying to compare that to GDDR5 or GDDR5x is going to be nothing but guessing. Also factor in that there have to be other reasons for the terrible relative performance of Fiji.

Also, on my 290 memory OC does help a lot in my fav game of all time, Oblivion, with ~35 gigs of active mods and Oblivion Graphics Extender. A lack of any real culling makes more memory bandwidth awesome :), but that's an outlier.
 
Damn, that's a lot of mods, man!

When overclocking HBM, what parts of the GPU will overclock too? With GDDR, just the bus would overclock, but I'm not sure how HBM is set up, just wondering.
 
IF it has a 384 bit bus. All the leaks so far are saying 128/256bit. If Polaris 10 is 384 and we've only seen the cut down versions then I'd agree. AMD still seems to think Vega will need 1TB/s.

I know that Polaris is supposed to be 256-bit and I don't expect a 384-bit one. My point was that even a hypothetical followup, a mid tier GPU, could still use GDDR5x on a 384-bit bus and have sufficient BW.
 
So what it means is either you can go for 40% increased performance and 50% smaller die or 60% drop in power consumption and 50% smaller die.
I'm saying none of the above. The lowest SKU of Polaris 10 or 11 likely will hit that 2.5x number. The highest SKU may in fact be worse perf/watt, but excellent perf/mm2.

When overclocking HBM what parts of the GPU will over clock too? with GDDR, just the bus would over clock, but I'm not sure how HBM is set up just wondering.
I recall reading somewhere that it scaled really weird. You had to adjust the clock in intervals of 24MHz or something or it did nothing.

I know that Polaris is supposed to be 256-bit and I don't expect a 384-bit one. My point was that even a hypothetical followup, a mid tier GPU, could still use GDDR5x on a 384-bit bus and have sufficient BW.
The followup could just use faster ram available at the time. My current concern is that really pushing one will outpace available memory bandwidth. If bandwidth was a concern and HBM not an option we'd have seen that 384 bit bus already. I still can't help but think we're going to see HBM and/or a MCM.
 
The followup could just use faster ram available at the time. My current concern is that really pushing one will outpace available memory bandwidth. If bandwidth was a concern and HBM not an option we'd have seen that 384 bit bus already. I still can't help but think we're going to see HBM and/or a MCM.
The lack of progress on memory bandwidth requirements is certainly at least partially a symptom of AAA games being developed around the limited memory bandwidth of the consoles and their memory contention.
 
Question in regards to HBM... does the ultra-wide but slower-clocked design of HBM require an architectural rethink of some sort to get the same performance as higher-clocked memory? I don't know, more CUs or something?
 
I'm saying none of the above. The lowest SKU of Polaris 10 or 11 likely will hit that 2.5x number. The highest SKU may in fact be worse perf/watt, but excellent perf/mm2.
You know that isn't that great, right? Yeah, it's a hell of a lot better than what AMD has right now, but comparatively, 2.5 times the performance per watt means they didn't do much to advance it through architectural changes.

I think it's the highest SKU of Polaris that will hit that figure, not the lowest. There is a range, based on architecture and process, where they maximize performance per watt, and usually that target is mid range chips. Just look back at the GTX 970 and GTX 980; those were the best performance per watt chips in the Maxwell 2 line. The GTX 960 and 950 were worse, as were the 980 Ti and Titan. If it's anything other than the highest SKU, well, the best case goes down very very fast.

Now if you look at Hawaii, Tonga, and Fiji, which one was the one with the highest perf/watt? Fiji nano had! Interesting isn't it? It wasn't the traditional mid range that had the best perf/watt and definitely wasn't the low end.
 
Now if you look at Hawaii, Tonga, and Fiji, which one was the one with the highest perf/watt? Fiji nano had! Interesting isn't it? It wasn't the traditional mid range that had the best perf/watt and definitely wasn't the low end.
Strange, I thought the rest of the Fiji line was faster than the Nano. Didn't realize they were also using Fiji for mid range parts... Go figure, the parts with the lowest clocks and voltages win the perf/watt metric. A mark that would almost always be the minimum performance spec for a chip without a power constraint. I don't see why Polaris would be any different. Every chip except maybe some of the mobile parts has worked that way.
 