AMD: Speculation, Rumors, and Discussion (Archive)

This slide did confuse me a little bit lol, because the L2 cache for all GCN is globally shared, at least that is what I thought and was told. So where is this Global Data Share, are they talking about L1, and why is it the same grey as the L2 cache?

Maybe I'm just reading into it too much, but... marketing can make anything new lol.
GDS has always been a separate entity from L2 Cache, likely sitting on the Export Bus (since it shares the same counter with shader export). Like LDS it is a scratchpad but globally shared, and it also comes with a thing called Global Wave Sync for ordered count and barrier IIRC.
 
This slide did confuse me a little bit lol, because the L2 cache for all GCN is globally shared, at least that is what I thought and was told. So where is this Global Data Share, are they talking about L1, and why is it the same grey as the L2 cache?
GDS is still separate. This feature goes back to r600 (which didn't have coherent L2). I suppose you could do all the synchronization through L2 nowadays, but GDS is presumably faster.
But the L2 cache for GCN so far is not _quite_ globally shared. In particular, the render backends have their own color and depth caches (16kB and 4kB respectively per RBE, at least for GCN 1.0), which are non-coherent. That's why I think this picture represents a change there (not quite as revolutionary maybe, since Nvidia has done that since Fermi, but still imho a very significant difference from older chips).
 
Yeah whoever releases first will have the best product margin for a little while; just look at the NVIDIA 980, which in reality was priced way too high initially (could be argued still is) and even 970 was rather high.
I doubt either manufacturer will want to lose that early profit margin edge.
Cheers


The way I look at it is, if they release low-end to OEMs and system builders and high-end to AIB partners initially, that will be the best way to capitalize on the situation they are currently in. And this could be why they are splitting up the manufacturing between Samsung and GF.
 
Yeah whoever releases first will have the best product margin for a little while; just look at the NVIDIA 980, which in reality was priced way too high initially (could be argued still is) and even 970 was rather high.
I doubt either manufacturer will want to lose that early profit margin edge.
Cheers

If they aren't launching until mid 2016 then I certainly hope Nvidia are the first to launch the next-gen GPUs. I also think the halo product counts for a lot even if it doesn't sell much volume, and I don't think a dual-GPU card (in this case Fiji) really counts any more. Although VR may make them relevant again if it avoids the usual AFR compatibility issues.
 
The way I look at it is, if they release low-end to OEMs and system builders and high-end to AIB partners initially, that will be the best way to capitalize on the situation they are currently in. And this could be why they are splitting up the manufacturing between Samsung and GF.
And TSMC. (I'm 99.99% sure they've confirmed GPUs from TSMC too)
 
Then the numbers are even more impressive: if we take 40W off for the rest of the system, Polaris uses 60% less energy.

There's something wrong with the system description. It says "Core i7 4790k" with 4x4 DDR4 2600. That's just not possible as Haswell and the Z97 boards don't support DDR4.
Furthermore, the numbers don't add up. A supposedly lower-power system with the GTX 950 was found to consume close to 160W in Dragon Age Inquisition (same Frostbite 3 engine).

Regardless, an 84W consumption from that configuration would put this "low-end Polaris" within a 40-50W power envelope. Just as I predicted (imagine a hooded me saying this in Palpatine's voice, followed by evil laughter).
Though it saddens me to see how Ryan is convinced (or sure, within his NDAs) that the laptop-oriented GPU is not coming with HBM2 but GDDR5 instead. I thought it would come with GDDR5X at the very least.
Imagine how tiny the footprint of this <120mm^2 GPU would be, coupled with a single stack of HBM2 memory.

Then again, they're claiming console performance, which means the GPU will probably perform a lot closer to Pitcairn than to Tonga. I wouldn't be surprised if it in fact carried around 18 CUs at ~1GHz and, with higher efficiency from architectural improvements, claimed performance substantially better than Pitcairn's.
 
Regardless, an 84W consumption from that configuration would put this "low-end Polaris" within a 40-50W power envelope. Just as I predicted (imagine a hooded me saying this in Palpatine's voice, followed by evil laughter).
Though it saddens me to see how Ryan is convinced (or sure, within his NDAs) that the laptop-oriented GPU is not coming with HBM2 but GDDR5 instead. I thought it would come with GDDR5X at the very least.
Imagine how tiny the footprint of this <120mm^2 GPU would be, coupled with a single stack of HBM2 memory.

For mobile, there are cost and volume headwinds for HBM especially, and I think there would be some for GDDR5X as well.
Assuming AMD really want to make a dent in the mobile market, the known example of HBM manufacturing has volumes that are nowhere near what satisfying that market would require, even if the cost situation were significantly better (and that might be a faint hope for mid-2016).
Thermal density might be slightly different, perhaps. Mobile can have pretty compact memory and GPU packaging, but HBM takes it very close, and mobile might want to skimp on the space or cooling needed to handle a denser solution.

GDDR5X would be very new, and while it seems like it has aims to subsume the GDDR5 range at some point given its GDDR5-like slow mode, it would be early days for that tech in 2016. Its slow mode is also so much like GDDR5 that not as much would exist to differentiate it, although GDDR5X does shoot for lower voltage.


The primitive culling acceleration might be something that brings AMD's architecture closer to what Nvidia's does in various geometry benchmarks, where Nvidia can have a larger ratio of rejected to rendered geometry. Getting both types at least within arm's reach of the competition should help, and possibly better power efficiency based on whatever falls under that marketing point.
 
There's something wrong with the system description. It says "Core i7 4790k" with 4x4 DDR4 2600. That's just not possible as Haswell and the Z97 boards don't support DDR4.
Furthermore, the numbers don't add up. A supposedly lower-power system with the GTX 950 was found to consume close to 160W in Dragon Age Inquisition (same Frostbite 3 engine).
They were using vsync with the medium preset at 1080p, so the framerate was locked at 60 FPS (and if I remember correctly, 16nm FinFET alone provides 2x perf/watt gains over 28nm at iso performance). The GTX 950 should be capable of much more; even the 50W 750 Ti can handle Battlefront at those settings with a locked 60, especially in an empty X-wing training mission - http://media.gamersnexus.net/images...h/battlefront/battlefront-gpu-1080-medium.png
 
They were using vsync with the medium preset at 1080p, so the framerate was locked at 60 FPS (and if I remember correctly, 16nm FinFET alone provides 2x perf/watt gains over 28nm at iso performance). The GTX 950 should be capable of much more; even the 50W 750 Ti can handle Battlefront at those settings with a locked 60, especially in an empty X-wing training mission - http://media.gamersnexus.net/images...h/battlefront/battlefront-gpu-1080-medium.png


But Polaris Mini seems to be consuming less than 50W, and those numbers were apparently achieved with pre-production silicon.
And wouldn't the GTX 950 consume less when vsynced to 60 FPS? At least it's probably not going over its base clock.
 
And if the new chips are coming in mid-2016, I seriously doubt they're going to replace Fiji (by then, Gemini will be ~4 months old), so the other GPU should be something in a lower performance segment that replaces Hawaii.
Hawaii is a big chip with an expensive PCB and lots of memory chips, and it's selling at rather low margins to compete with the much cheaper GM204. It might very well be the chip bringing the lowest margins to AMD and their AIBs, hence the urgency in replacing it.
Do you expect a Hawaii replacement to use GDDR5, GDDR5X, or HBM2?
 
But Polaris Mini seems to be consuming less than 50W, and those numbers were apparently achieved with pre-production silicon.
And wouldn't the GTX 950 consume less when vsynced to 60 FPS? At least it's probably not going over its base clock.

Actually nV doesn't have the same power-saving tech with frame rate lock as AMD does...

We have seen AMD's frame rate lock deliver power savings of around 25% on its current crop of cards. That's not saying they did that in this demonstration, but it's what Ryan is hinting at as a possibility, and that's why the figures should be taken as possibly not reflecting the actual product at release.
 
OK, let the programming guide / manual hunting and leaking begin. Hopefully we will see at least volumetric sparse texture instruction support...
 
Do you expect a Hawaii replacement to use GDDR5, GDDR5X, or HBM2?

Not that guy, but purely on cost assumptions I would guess a 256-bit bus + GDDR5. HBM2 would be fantastic but I would assume it's a bit too expensive for now, and not GDDR5X because AMD have gone wider and slower with memory recently. Reasoning: looking at the compression in the 380 and 380X, from the 280/280X the memory bandwidth is down to roughly 75%/65% of what it was (please correct me if I'm wrong, thanks):

http://www.anandtech.com/show/9784/the-amd-radeon-r9-380x-review

From the 280X to the 380X, bandwidth goes from 288GB/s down to 182.4GB/s (about 63% of 288GB/s), and the 380 is 176GB/s, down from the 280's 240GB/s iirc, which is about 73%. Looking at the benchmarks, nothing suggests to me they're bandwidth limited, so AMD could drop down to a 384-bit bus just from that. However, this will supposedly be improved again for Polaris:

"Finally, GCN 4 will also include a newer generation of RTG’s memory compression technology"

http://www.anandtech.com/show/9886/amd-reveals-polaris-gpu-architecture/2

Maybe it'd even need only half the bandwidth, with a 20-25% improvement on top (0.65 x 0.8 = 0.52); that'd drop it down to a 256-bit bus with GDDR5. GDDR5X is up to 12Gbit/s, so a 128-bit bus would provide up to 192GB/s, which is half the 390X; assuming compression improvements, it's possible. A single stack of HBM2 is up to 256GB/s; along with the power and size benefits it'd be a dream, but I'm unsure how realistic it is because of costs.
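
A quick sketch of the arithmetic above, in case anyone wants to play with the numbers (the 6 GT/s GDDR5 rate and 12 GT/s GDDR5X rate are just the commonly quoted peaks; the extra ~20% compression is the assumption from the previous paragraph, not a confirmed figure):

```python
# Back-of-envelope bandwidth math for the speculation above.
# Peak bandwidth (GB/s) = bus width in bytes * transfer rate in GT/s.

def peak_bandwidth_gbps(bus_bits, gt_per_s):
    return bus_bits / 8 * gt_per_s

# How much raw bandwidth the Tonga cards kept versus Tahiti:
print(182.4 / 288.0)   # 380X vs 280X -> ~0.63
print(176.0 / 240.0)   # 380 vs 280   -> ~0.73

# If Polaris compression buys another ~20% on top of Tonga's ratio:
print(0.65 * 0.8)      # ~0.52, i.e. roughly half the raw bandwidth needed

# Candidate Hawaii-replacement configurations:
print(peak_bandwidth_gbps(256, 6.0))   # 256-bit GDDR5 @ 6 GT/s   -> 192.0 GB/s
print(peak_bandwidth_gbps(128, 12.0))  # 128-bit GDDR5X @ 12 GT/s -> 192.0 GB/s
print(peak_bandwidth_gbps(512, 6.0))   # 390X for reference       -> 384.0 GB/s
# A single HBM2 stack is commonly quoted at up to 256 GB/s.
```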
 
Here's something I saw at Tomshardware (slow news day, don't blame me) and it actually makes sense.
The GTX 950 system is pulling 140W. Take out the 950's rated 90W and you get 50W of rest-of-the-system power consumption.
Now take 50W from Polaris Mini's 86W and we get 36W for the card, which is... the rated TDP of a GeForce 940M using the GM108.
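
For what it's worth, here's that subtraction written out (the 140W/86W readings and the 90W rating are the figures quoted above, not anything I've measured):

```python
# Rest-of-system power estimated from the GTX 950 rig, then applied to the Polaris rig.
gtx950_system_w  = 140  # reported wall draw of the GTX 950 system
gtx950_rated_w   = 90   # GTX 950 rated board power
polaris_system_w = 86   # reported wall draw of the Polaris system

rest_of_system_w = gtx950_system_w - gtx950_rated_w     # 50 W
polaris_card_w   = polaris_system_w - rest_of_system_w  # 36 W -> GM108/940M territory
print(rest_of_system_w, polaris_card_w)
```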

That said, it doesn't seem like this card is targeting the GM107's power envelope. It's targeting the GM108's (GeForce 940M) power envelope at a performance close to the desktop GTX 950.
This would also mean that its performance is not between Pitcairn and Tonga, but rather between Bonaire and Pitcairn.
One of the slides says "Goal: Console-caliber performance in a thin-and-light notebook". That's why they compared it against a GTX 950: it's the nVidia card that performs closest to the PS4's Liverpool with its 18 CUs and 32 ROPs. Plus, the GM107, with a 55W minimum TDP, can't really fit inside a thin-and-light notebook (say, Macbooks and XPS 13).

Sooo... I must change my own speculation for the Polaris Mini.
I think this first new GPU will be a notebook-first part with a 25-35W TDP (expecting 2 or 3 SKUs from it). Then, the less-binned desktop part will come with higher clocks, but without a PCI-Express power connector. It'll be a tiny card oriented towards mini-ITX and/or low-profile systems.
My expectations are 12-14 CUs at 1GHz and 2/4GB of GDDR5 on a 128-bit bus at 5000MT/s (or maybe 5700MT/s to be optimistic). That 80-90GB/s of dedicated bandwidth plus architectural improvements for higher throughput per CU should put this low-power GPU between the XBone and the PS4, depending on the SKU.
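
(Quick math on that bandwidth figure; the 5000/5700MT/s rates are just the ones speculated above:)

```python
# Peak bandwidth for a hypothetical 128-bit GDDR5 setup at the speculated rates.
def peak_bandwidth_gbps(bus_bits, mt_per_s):
    return bus_bits / 8 * mt_per_s / 1000.0

print(peak_bandwidth_gbps(128, 5000))  # 80.0 GB/s
print(peak_bandwidth_gbps(128, 5700))  # 91.2 GB/s
```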


Coincidentally, the GM108 is the chip coming in the Surface Book's keyboard. I'm guessing it would be in Microsoft's best interest to refresh the SBook's dedicated GPU towards something that would match the XBone in performance, with little to no impact on battery life.


Do you expect a Hawaii replacement to use GDDR5, GDDR5X, or HBM2?

Well we've seen Raja Koduri saying in an interview that there would be two new GPUs in 2016. This latest Anandtech article implies that:
1 - The Polaris family will have both GDDR5 and HBM
2 - Lower-end Polaris cards will come with GDDR5 and higher-end Polaris cards will come with HBM

So if this notebook-oriented part is coming with GDDR5, then the other - if it's a Hawaii replacement - should come with HBM.
Unless... the second Polaris card is also coming with GDDR5 and 2016 won't see any higher-end card with the new architecture. I hope that's not the case.
But if there's a card coming this year with Hawaii's performance, I bet it'll come with HBM2.


Actually nV doesn't have the same power-saving tech with frame rate lock as AMD does...

They probably don't, but I don't think a Maxwell card that's giving solid 60FPS while vsynced to 60Hz, when it could be doing say 90FPS non-synced, is ever raising its clocks above the rated base values.
And the card at base clocks won't consume the same as a boosted one.
Regardless, this was a non-production sample and they stated that not all power-saving features were enabled yet. If we set that handicap for the Polaris Mini against the 950's lack of FRTC-esque power savings, maybe the comparison will even out in the end.
 
If they aren't launching until mid 2016 then I certainly hope Nvidia are the first to launch the next-gen GPUs. I also think the halo product counts for a lot even if it doesn't sell much volume, and I don't think a dual-GPU card (in this case Fiji) really counts any more. Although VR may make them relevant again if it avoids the usual AFR compatibility issues.
Yeah, a blow for consumers; I had my fingers crossed that both could release by end of Q1 (yeah, in my dreams :) ).
I guess AMD are holding their breath for now to see when Pascal actually starts to appear in shops and with what models.
I think NVIDIA hit the perfect consumer sales model by releasing the 980 followed by the 970, as this gave them eye-watering margins (good for business but sucks for us), and then following up with other models below and also their flagship.
AMD really needs to be in the picture this time round at similar release times.
But the sales approach may change from NVIDIA this time as they are also releasing comparable business/professional models.

Before anyone criticises NVIDIA for their margins, let's all be honest: any manufacturer will want to achieve the best margins they can; that margin and the extra sales help their share price and critically drive R&D (a lot of cost is eaten here by NVIDIA and also AMD, in relative terms more than Intel spends as a cost-to-revenue ratio).
What a consumer needs is for the competition to be there at the same time.

Cheers
 
http://www.pcper.com/reviews/Graphi...hnologies-Group-Previews-Polaris-Architecture

AMD’s Joe Macri stated, during our talks, that they expect this FinFET technology will bring a 50-60% power reduction at the same performance level OR a 25-30% performance increase at the same power. In theory then, if AMD decided to release a GPU with the same power consumption as the current Fury X, we might see a 25-30% performance advantage.

No wonder 20nm was a waste of time - FinFET is pretty underwhelming. Oh well, fingers-crossed that optical computation gets here in the next 10 years.
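
Rough translation of those two claims into perf/W multipliers (taking the quoted 50-60% and 25-30% figures at face value; the voltage/frequency explanation for the gap is the usual reasoning, not something from the article):

```python
# Implied perf/W gain from the two claims in the PCPer quote.
# Same performance at 50-60% less power:
iso_perf_gain  = [1 / (1 - r) for r in (0.50, 0.60)]  # ~2.0x to 2.5x perf/W
# Same power with 25-30% more performance:
iso_power_gain = [1 + r for r in (0.25, 0.30)]        # 1.25x to 1.30x perf/W
print(iso_perf_gain, iso_power_gain)
# The gap is why this looks underwhelming for enthusiast parts: pushing clocks
# back up the voltage/frequency curve gives much of the process gain back, so
# iso-power performance only rises ~25-30%.
```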


At the end of this video you can see how the benchmark comparison was done, and right at the very end is a description of the tested systems: CPU set to 80% power limit and the AMD GPU running at 850MHz.

850MHz. Does the dial even go that low? :rolleyes:

I get a very strong sense that AMD is chasing laptop/tablet power budgets exclusively. Enthusiasts are gonna get very little from FinFET, at least from AMD. We're looking at Hawaii versus Tahiti boost, at best, or perhaps as bad as Fiji versus Hawaii.
 
Assuming that's a low-end part, I don't see why chasing after laptops is a bad thing. You expect low-wattage parts to be used in laptops, no?
 
I'm talking about the implications for enthusiast discrete: utterly miserable. 30% more performance from the node change, before architectural improvements, after 5 years is just horrible.
 