AMD: Speculation, Rumors, and Discussion (Archive)

I completely agree; there are too many variables for the tests that were done to be anything conclusive.

TSMC cancelled 16FF, which was the node positioned against Samsung's 14LPE, so what we actually have in those rather boring comparisons of the two A9 variants is 16FF+ vs. 14LPE. I don't need a ton of synthetic results to guess that 14LPP is roughly on the same level as 16FF+.
 
Agreed, but I thought the compression was largely controlled by creating the resource as shared in DX11? DX12/Vulkan I'd have to check.

HyperZ is the only compression I recall hearing about in the open source drivers, besides texture compression. It would have to be something added for Tonga/Fiji/Polaris. I can't imagine they missed something like that.

Found it; it's not in there yet: drm/amd/dal: Add framebuffer compression HW programming
https://lists.freedesktop.org/archives/dri-devel/2016-February/100524.html
Delta Color Compression (DCC) was a new feature added to GCN 1.2 (Tonga, Fiji, Carrizo). HyperZ, fast clear and MSAA color compression are older technologies (all introduced before GCN).

DX11/DX12 drivers use resource flags to determine whether a render target can be a shader resource (sample / load). There is no API for the game developer to state whether they want inferior compression or a decompression step (for RTs that are also shader resources). The driver likely defaults to inferior compression, but (driver) application profiles of course could be used to override this. If we had GCN 1.2 on a console, the game developer would benchmark each option and choose the fastest for their use case.
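For anyone curious, here is roughly what the driver has to go on — a minimal D3D11 sketch, not any vendor's actual heuristic; the dimensions and format are just placeholders:

```cpp
// Minimal D3D11 sketch: the only hint the driver gets about how a render
// target will be used is the set of bind flags on the resource description.
// Dimensions/format are placeholders; device creation and error handling omitted.
#include <d3d11.h>

D3D11_TEXTURE2D_DESC MakeRenderTargetDesc(bool alsoSampled)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width            = 1920;
    desc.Height           = 1080;
    desc.MipLevels        = 1;
    desc.ArraySize        = 1;
    desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.Usage            = D3D11_USAGE_DEFAULT;

    // Render target only: the driver knows the RT will never be sampled,
    // so it is free to keep it in its best compressed layout.
    desc.BindFlags = D3D11_BIND_RENDER_TARGET;

    if (alsoSampled)
    {
        // RT that is also a shader resource: the driver must either pick a
        // shader-readable (typically inferior) compression mode up front,
        // or insert a decompression step before the texture is sampled.
        // There is no flag for the application to choose between the two.
        desc.BindFlags |= D3D11_BIND_SHADER_RESOURCE;
    }
    return desc;
}
```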
 
HyperZ is the only compression I recall hearing about in the open source drivers, besides texture compression. It would have to be something added for Tonga/Fiji/Polaris. I can't imagine they missed something like that.
DCC is all there in the driver - but I only saw anything different for color, not depth.

Found it; it's not in there yet: drm/amd/dal: Add framebuffer compression HW programming
https://lists.freedesktop.org/archives/dri-devel/2016-February/100524.html
This is something different: a power-saving feature for scan-out (so less memory has to be read when refreshing the display). I don't think it's related to DCC at all (though I don't know what the benefits are vs. simply being able to read DCC-compressed surfaces in the display controller). Intel has had something like that in their Linux drivers for years, though I'm not sure if it's enabled for any chips yet (lots of problems...).
 
Delta Color Compression (DCC) was a new feature added to GCN 1.2 (Tonga, Fiji, Carrizo). HyperZ, fast clear and MSAA color compression are older technologies (all introduced before GCN).

DX11/DX12 drivers use resource flags to determine whether a render target can be a shader resource (sample / load). There is no API for the game developer to state whether they want inferior compression or a decompression step (for RTs that are also shader resources). The driver likely defaults to inferior compression, but (driver) application profiles of course could be used to override this. If we had GCN 1.2 on a console, the game developer would benchmark each option and choose the fastest for their use case.
DCC I'm familiar with, but I just don't recall hearing much talk about it in the open source Linux drivers. HyperZ only got fully enabled in the last year, if I remember correctly; it worked, but there were some issues on some hardware. Link Looks like DCC support got added in the 4.4 kernel. The AMDGPU version might give a better idea of what they are doing, as it should be derived from Catalyst.

I figured the compression was chosen by flagging it as shared: use the best compression unless something needs to read it. I guess that does still rule out the combination of no reuse and the lower compression, though.

This is something different: a power-saving feature for scan-out (so less memory has to be read when refreshing the display). I don't think it's related to DCC at all (though I don't know what the benefits are vs. simply being able to read DCC-compressed surfaces in the display controller).
I thought they were the same, as they're using the same resource and doing the same thing. Both should be lowering bandwidth usage, so why have different implementations if one is superior?
 
I thought they were the same, as they're using the same resource and doing the same thing. Both should be lowering bandwidth usage, so why have different implementations if one is superior?
I don't really know. I think FBC is really centered around static content, though; from a quick look there might also be some things it does to avoid hitting too many memory channels or something.
In any case, you wouldn't really see much code for the display controller handling DCC; there'd just be an enable bit somewhere (and maybe it already does that, I didn't look for it).
 
Roy Taylor, Foot in Mouth Marketing Man Extraordinaire:
“The reason Polaris is a big deal, is because I believe we will be able to grow that TAM significantly. I don’t think Nvidia is going to do anything to increase the TAM, because according to everything we’ve seen around Pascal, it’s a high-end part.
...
I don’t know what the price is gonna be, but let’s say it’s as low as £500/$600 and as high as £800/$1000.
He can't possibly truly believe that, can he? That Nvidia will just leave a huge segment for AMD to take?

But I think it's clear that they realize that they're about to lose the high performance/high margin markets as soon as GP104 hits the market.
 
One interesting data point is the 7.5M figure for the installed base of 290/970 and up.
If we use an average selling price of $450, that gives $3.4B, or about $1.5B per year, depending on what kind of timespan you use.
It's often claimed that the real money is in the mid-range, but $1.5B a year is nothing to sneeze at (it's probably larger than AMD's total GPU revenue), and it's probably much higher margin as well, making it a profit driver.
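Spelling the arithmetic out (the ~2.25-year window is just my assumption to make the per-year number come out; the 290 launched in late 2013):

```cpp
// Back-of-the-envelope check of the installed-base numbers above.
// The ~2.25-year sales window is an assumption, not a known figure.
#include <cstdio>

int main()
{
    const double units = 7.5e6;   // claimed installed base of 290/970 and up
    const double asp   = 450.0;   // assumed average selling price, $
    const double years = 2.25;    // assumed sales window since the 290 launch

    const double revenue = units * asp;                      // ~$3.4B total
    std::printf("total:    $%.2fB\n", revenue / 1e9);
    std::printf("per year: $%.2fB\n", revenue / 1e9 / years); // ~$1.5B/year
    return 0;
}
```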
 
I usually use the enablement of Bluster Mode to calibrate the placement of AMD's rhetorical petard.
Until AMD gets Polaris out in a manner that decisively expands the reach of affordable and effective VR, and Nvidia is actually absent from that space, this looks to me like tempting fate.

Giving AMD the benefit of the doubt in that there is a method to the madness, it could be a case of making a claim bold enough to elicit some kind of rebuttal that would shed light on what Nvidia might be planning. I cannot rule out the possibility that AMD knows something that gives them such confidence, nor the possibility that Taylor overstepped.
 
But I think it's clear that they realize that they're about to lose the high performance/high margin and low volume markets as soon as GP104 hits the market.
Fixed that for what Roy Taylor was trying to say.
 
See my follow up.

I saw your follow-up and it's flawed on too many levels.
First, you're talking about a generation of graphics cards that has probably been the most stagnant the PC market has seen since the S3 ViRGE, and which is now going to be revamped thanks to a new process becoming available, new cards coming out, emerging markets (VR, AR, etc.) demanding it, and game console ports finally starting to push the specs of PC releases.
Second, last I checked the 970 has been selling for $300 for ages after a not much higher initial price of $330, and the 290 sold for a lot less than that for almost a full year. According to Steam's hardware survey, the 970 alone accounts for 5% whereas the 980 and 980 Ti together account for less than 2%.
Assuming the ASP of that 7.5M-unit pie is $450 just because it's halfway between $300 and $600, as if all these cards sold in similar volumes, doesn't make sense.
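To put a rough number on that objection, here's a toy weighted-ASP calculation; the unit shares and prices are purely illustrative guesses (loosely motivated by the Steam ratio above), not real sales data:

```cpp
// Toy weighted-ASP calculation. The tier shares and prices below are
// illustrative assumptions only (roughly inspired by the Steam survey
// showing ~5% for the 970 vs <2% for 980 + 980 Ti), NOT real sales data.
#include <cstdio>

int main()
{
    struct Tier { const char* name; double share; double price; };
    const Tier tiers[] = {
        { "290/970 class",     0.75, 320.0 },  // assumed bulk of the volume
        { "390/980 class",     0.18, 500.0 },
        { "980 Ti/Fury class", 0.07, 650.0 },
    };

    double asp = 0.0;
    for (const Tier& t : tiers)
    {
        asp += t.share * t.price;
        std::printf("%-18s share %2.0f%% @ $%.0f\n", t.name, t.share * 100.0, t.price);
    }
    std::printf("weighted ASP: $%.0f\n", asp);  // ~$375 with these made-up numbers
    return 0;
}
```

With any mix that is volume-heavy on 290/970-class cards, the weighted ASP lands well under $450.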
 
So, erm anyway, RV770 versus GT200 all over again?

The transistor count ratio looks like it's going to be in the same ballpark, something like 1.4x more transistors in NVidia's chip...
 
So, erm anyway, RV770 versus GT200 all over again?

The transistor count ratio looks like it's going to be in the same ballpark, something like 1.4x more transistors in NVidia's chip...
AMD will likely be denser than Nvidia. I'm not sure how you came to 1.4x either. I expect the transistor counts to be pretty close even if Nvidia's chip is bigger.
 
GP104 is reckoned to be about 316mm² and Polaris 10 about 232mm², I believe. So that's about 1.36x.

Assuming that a 256-bit interface takes up 36mm² of die on both chips (that's the size from RV770), that means the ratio for the non-PHY parts of the die is something like 1.43x. With faster GDDR5 (or GDDR5X) this ratio will climb, since the PHY will be much larger...

I agree that AMD has historically had higher transistor density, and that 14nm could result in more density than 16nm. I still believe 1.4x is a reasonable estimate at this time...
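For reference, the arithmetic behind those two ratios, using the rumored die sizes quoted above and the RV770-era PHY estimate (all of these are estimates, not confirmed numbers):

```cpp
// Die-area ratio estimate from the figures quoted above, all rumored/estimated:
// GP104 ~316 mm^2, Polaris 10 ~232 mm^2, ~36 mm^2 for a 256-bit GDDR5 PHY
// (RV770-era figure) assumed on each chip.
#include <cstdio>

int main()
{
    const double gp104     = 316.0;  // mm^2, estimated
    const double polaris10 = 232.0;  // mm^2, estimated
    const double phy       = 36.0;   // mm^2, 256-bit PHY estimate from RV770

    std::printf("raw die ratio:     %.2f\n", gp104 / polaris10);                  // ~1.36
    std::printf("non-PHY die ratio: %.2f\n", (gp104 - phy) / (polaris10 - phy));  // ~1.43
    return 0;
}
```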
 