AMD: Speculation, Rumors, and Discussion (Archive)

fellix · Aug 8, 2016

Gipsel said:
Pretty good for a private effort. More (and higher resolution [16Mpixels]) pictures can be found here (or here a direct link to a re-upload of the full res die shot with slightly increased compression).

Good Lord, what a treasure!!!

Kudos for all the work.

3dilettante · Aug 8, 2016

The die shots of Polaris show it was striving heavily to keep a lid on die size. That might have something to do with TrueAudio 2.0 opting to put functions on the CUs, and the ROP count.
The FinFET transition seems to have gobbled up most of the attention for Polaris 10/11 (and apparently at least one 16nm TSMC APU with the Xbox One S and a PS4 Neo of indeterminate node and architecture).
Hopefully some of what was learned from the teething pains with this pipe cleaner can carry over to the rumored architectural change with Vega. Polaris 10 brings density, but the nifty voltage tweaks are applied to cards whose voltages seem pretty stuck at higher levels.

One interesting difference from the first Polaris die shot and the more polished one is that the little bright areas are less visible in the newer shots. It looks like those might be interconnect links?

fellix · Aug 8, 2016

3dilettante said:
One interesting difference from the first Polaris die shot and the more polished one is that the little bright areas are less visible in the newer shots. It looks like those might be interconnect links?

Those could be part of the clock distribution tree.

3dilettante · Aug 8, 2016

fellix said:
Those could be part of the clock distribution tree.

That could be. It looks like it would be something with more localized metal to show up after grinding down to that level.
It seemed like there was some correlation with portions of the chip that also would be sending/receiving a fair amount of data, but those would weigh on the clock network at the same time.

Deleted member 13524 · Aug 9, 2016

New APU roadmaps leaked on the SemiAccurate forums:
http://wccftech.com/amd-roadmap-2016-2017-leaked-zen/

Big news here are the Raven Ridge specs.
4-core/8-thread that starts at 4W on mobile and an iGPU with up to 12 CUs / 768 SPs.
I can't see anything regarding memory configuration, so it's probably just 128-bit DDR4, which will probably strangle the GPU a lot.
I would love to see Raven Ridge carrying at least a single HBM2 stack, but when AMD's roadmaps are omitting stuff, I've learned to expect the least interesting stuff.

I can't wait to see what those 4W-15W Raven Ridge APUs can do, compared to Intel's Y and U lines. I imagine not all CUs will be enabled on the lower power variants, but AMD has the potential to beat every GT2 implementation from Intel.
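
To put the bandwidth worry into rough numbers, here is a back-of-the-envelope sketch. The DDR4-2400 speed is an assumption (the leaked roadmap gives no memory speed), the HBM2 figure assumes a single stack with its 1024-bit interface at 2 Gbps per pin, and the RX 480 line uses the 8 GB card's 256-bit GDDR5 at 8 Gbps as a reference point. These are theoretical peaks, and an APU also shares its bandwidth with the CPU cores.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfer_rate_mt_s / 1000


SPS_PER_CU = 64  # GCN: 4 SIMDs x 16 lanes per compute unit

raven_ridge_cus = 12
raven_ridge_sps = raven_ridge_cus * SPS_PER_CU      # matches the 768 in the roadmap

ddr4_128bit = peak_bandwidth_gb_s(128, 2400)        # assumed DDR4-2400, dual channel
hbm2_stack = peak_bandwidth_gb_s(1024, 2000)        # one HBM2 stack at 2 Gbps per pin
rx480_gddr5 = peak_bandwidth_gb_s(256, 8000)        # RX 480 8 GB, 36 CUs

print(f"Raven Ridge SPs:          {raven_ridge_sps}")
print(f"128-bit DDR4-2400:        {ddr4_128bit:.1f} GB/s "
      f"({ddr4_128bit / raven_ridge_cus:.1f} GB/s per CU)")
print(f"Single HBM2 stack:        {hbm2_stack:.1f} GB/s "
      f"({hbm2_stack / raven_ridge_cus:.1f} GB/s per CU)")
print(f"RX 480 (256-bit GDDR5):   {rx480_gddr5:.1f} GB/s "
      f"({rx480_gddr5 / 36:.1f} GB/s per CU)")
```

Under these assumptions the 12 CU iGPU would get less than half the per-CU bandwidth of an RX 480 on DDR4, while a single HBM2 stack would leave it with bandwidth to spare.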

Kaotik · Aug 9, 2016

ToTTenTranz said:
New APU roadmaps leaked on the SemiAccurate forums:
http://wccftech.com/amd-roadmap-2016-2017-leaked-zen/

Big news here are the Raven Ridge specs.
4-core/8-thread that starts at 4W on mobile and an iGPU with up to 12 CUs / 768 SPs.
I can't see anything regarding memory configuration, so it's probably just 128-bit DDR4, which will probably strangle the GPU a lot.
I would love to see Raven Ridge carrying at least a single HBM2 stack, but when AMD's roadmaps are omitting stuff, I've learned to expect the least interesting stuff.

I can't wait to see what those 4W-15W Raven Ridge APUs can do, compared to Intel's Y and U lines. I imagine not all CUs will be enabled on the lower power variants, but AMD has the potential to beat every GT2 implementation from Intel.

That's what the HPC-APU is for

Deleted member 13524 · Aug 9, 2016

Kaotik said:
That's what the HPC-APU is for

I know, but Raven Ridge would be a lot more interesting with even just a single HBM stack. Hynix seemingly only makes 4-Hi stacks at the moment but I think even a 2-Hi stack with 2GB would be more than enough for that iGPU to blast through all Intel offerings with equivalent TDP.

Maybe the mobile variant will support LPDDR4. Do any of the current AMD APUs support LPDDR3?

gamervivek · Aug 9, 2016

DrYesterday said:
AMD/Raja touted numerous architectural improvements with Polaris. Polaris also benefits from higher clock speeds and faster memory. So why is there so little improvement over the 3xx lineup? The 460 is basically even with a 370X. The 480 comes between the 390 and 390X. Was this really just a "dumb shrink"? Or are the drivers holding back performance?

370X = 2.8B transistors
460 = 3.0B transistors

390 = 6.2B transistors
480 = 5.7B transistors

The only worthwhile improvement AMD have shown is getting similar performance with fewer ROPs. Polaris has half the ROPs of comparably performing cards, and that bodes well for Vega if you believe that ROPs held back Fury X performance.

homerdog · Aug 9, 2016

ToTTenTranz said:
I know, but Raven Ridge would be a lot more interesting with even just a single HBM stack. Hynix seemingly only makes 4-Hi stacks at the moment but I think even a 2-Hi stack with 2GB would be more than enough for that iGPU to blast through all Intel offerings with equivalent TDP.

Maybe the mobile variant will support LPDDR4. Do any of the current AMD APUs support LPDDR3?

AMD could make a kickass "steambox" type APU next year with Zen + GCN + HBM. If the price is right it could really sell in Asian markets I think. I would definitely be interested in such a thing for the living room and OEMs would have a ball with it.

no-X · Aug 9, 2016

DrYesterday said:
370X = 2.8B transistors
460 = 3.0B transistors

The RX 460 doesn't have a fully enabled GPU; two CUs are disabled, so the comparison isn't correct.

Razor1 · Aug 9, 2016

gamervivek said:
The only worthwhile improvement AMD have shown is getting similar performance with fewer ROPs. Polaris has half the ROPs of comparably performing cards, and that bodes well for Vega if you believe that ROPs held back Fury X performance.

I don't think the Fury X was ROP bound... I don't remember any time we saw that, and this is why it tended to perform better at higher resolutions, since shader needs were greater than ROP needs.

Anarchist4000 · Aug 10, 2016

homerdog said:
AMD could make a kickass "steambox" type APU next year with Zen + GCN + HBM. If the price is right it could really sell in Asian markets I think. I would definitely be interested in such a thing for the living room and OEMs would have a ball with it.

Given all the resources IHVs appear to be dumping into Linux drivers lately, I really wouldn't be surprised if something like this is in the works. The only caveat is that a bunch of games would need to be looking at Vulkan for that to really be practical. Sure, OpenGL works, but porting DX12 to Vulkan or using it natively makes far more sense than a DX11 to OpenGL port.

lanek · Aug 10, 2016

Anarchist4000 said:
Given all the resources IHVs appear to be dumping into Linux drivers lately, I really wouldn't be surprised if something like this is in the works. The only caveat is that a bunch of games would need to be looking at Vulkan for that to really be practical. Sure, OpenGL works, but porting DX12 to Vulkan or using it natively makes far more sense than a DX11 to OpenGL port.

Well, it is more a question of timing. Intuitively, when the choice comes up at the start of a project (do we port it to Vulkan or OpenGL x.x?), I think the answer will be obvious. The good thing is that AMD has issued new drivers on Linux that seem pretty good. And when developers develop with OpenGL and AMD in mind, the results seem far better than what we are used to seeing from the old OpenGL games and their software implementations.

sebbbi · Aug 10, 2016

Razor1 said:
I don't think the Fury X was ROP bound... I don't remember any time we saw that, and this is why it tended to perform better at higher resolutions, since shader needs were greater than ROP needs.

Fury X has both high bandwidth and high compute performance. I would guess that Fury X compute units are most of the time underutilized in games. Geometry pipeline is likely a big bottleneck for it.

I suggest reading this GDC presentation by Graham Wihlidal:
http://www.frostbite.com/2016/03/optimizing-the-graphics-pipeline-with-compute/

Page 12 shows a GCN occupancy graph. As soon as you start pushing more (smaller) triangles and more draws, GCN cannot keep the occupancy up. The geometry pipeline (including fixed function units and the vertex shader) is the bottleneck. As a result, not enough pixel shader waves are spawned to fill the GPU. This graph is from a console GPU with a reduced CU count and reduced bandwidth compared to bigger PC parts (but the same geometry throughput). This problem should be more severe on PC (especially on Fury X). Unfortunately there are no PC tools that record a runtime occupancy graph of the whole GPU. You can't see the geometry pipeline problems by doing static analysis of shaders.

Fortunately Polaris improved the geometry pipeline a lot. It is able to quickly reject triangles that don't contribute to the image (strip degenerates, sub-pixel sized, etc). This results in higher vertex shader occupancy, which leads to higher pixel shader occupancy. Polaris also added instruction prefetch. Prefetch should reduce the stall when a new vertex shader starts execution (important when there are lots of small draws, as the stall cascades through the whole GPU).

Fury X most likely was occasionally also ROP bound (big triangles close to the camera create bursts of occupancy). It has quite a low ROP : compute ratio. It has around 30% more bandwidth and compute than R9 390, but the same number of ROPs. And it has DCC as well (= Fury X is practically never memory bound). Hopefully Vega doubles the ROP count and further improves the geometry pipeline from Polaris. AMD would also benefit from more efficient rasterization. Nvidia added tiled rasterization in Maxwell, and got a big efficiency boost.
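
The kind of triangle rejection described above (degenerates and triangles too small to touch a sample) can be sketched in software. This is only a minimal illustration of the idea, not a description of what the Polaris hardware does. It assumes screen-space vertex positions in pixels, one sample per pixel at the pixel centre (x + 0.5, y + 0.5), and no MSAA; the function names and return strings are made up for the example.

```python
import math
from typing import Optional, Sequence, Tuple

Vec2 = Tuple[float, float]


def has_repeated_index(indices: Sequence[int]) -> bool:
    """Strip degenerates: a triangle that reuses a vertex index has zero area."""
    return len(set(indices)) < 3


def doubled_signed_area(a: Vec2, b: Vec2, c: Vec2) -> float:
    """Twice the signed area of the triangle (2D cross product of two edges)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def span_has_pixel_centre(lo: float, hi: float) -> bool:
    """True if some k + 0.5 (k integer) lies in the closed interval [lo, hi]."""
    return math.ceil(lo - 0.5) <= math.floor(hi - 0.5)


def cull_reason(indices: Sequence[int], a: Vec2, b: Vec2, c: Vec2) -> Optional[str]:
    """Return why the triangle can be dropped, or None if it must be rasterized."""
    if has_repeated_index(indices) or doubled_signed_area(a, b, c) == 0.0:
        return "degenerate"
    xs = (a[0], b[0], c[0])
    ys = (a[1], b[1], c[1])
    # If the bounding box misses every pixel centre on either axis,
    # the triangle cannot cover any sample.
    if not span_has_pixel_centre(min(xs), max(xs)) or \
       not span_has_pixel_centre(min(ys), max(ys)):
        return "misses_pixel_centres"
    return None


print(cull_reason((0, 1, 1), (0.0, 0.0), (4.0, 0.0), (4.0, 0.0)))
print(cull_reason((0, 1, 2), (0.6, 0.6), (0.9, 0.6), (0.6, 0.9)))
print(cull_reason((0, 1, 2), (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)))
```

The bounding-box test is conservative: it never rejects a triangle that covers a sample, but it lets through some small triangles that happen to straddle a pixel centre without covering it.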

AnomalousEntity · Aug 10, 2016

sebbbi said:
Fortunately Polaris improved the geometry pipeline a lot. It is able to quickly reject triangles that don't contribute to the image (strip degenerates, sub-pixel sized, etc). This results in higher vertex shader occupancy, which leads to higher pixel shader occupancy.

Don't you run the vertex shader before a triangle can be culled in the primitive discard stage? This shouldn't affect VS occupancy, although it should improve PS occupancy.

Ext3h · Aug 10, 2016

AnomalousEntity said:
Don't you run the vertex shader before a triangle can be culled in the primitive discard stage? This shouldn't affect VS occupancy, although it should improve PS occupancy.

Effective culling before the fixed function part becomes the bottleneck prevents both the stall on the VS and the starvation on the PS, effectively increasing throughput and occupancy on both.

xEx · Aug 11, 2016

My biggest hope is that with the Vega launch I will be able to buy a 480/470... I'm actually tempted by a 1060 now, since it is in stock on Amazon, unlike AMD's cards. I don't know why AMD chose to position the 470 so close to the 480; it's basically the same card (I'm talking about where it sits in the market). It looks ridiculous to me. Anyway, I've been patient, but they keep announcing and reviewing cards and then there is almost no stock. I'm still waiting for them to get stock on Amazon, but for some reason AMD and its partners don't want to be in the biggest store in the world.

Michellstar · Aug 11, 2016

xEx said:
My biggest hope is that with the Vega launch I will be able to buy a 480/470... I'm actually tempted by a 1060 now, since it is in stock on Amazon, unlike AMD's cards. I don't know why AMD chose to position the 470 so close to the 480; it's basically the same card (I'm talking about where it sits in the market). It looks ridiculous to me. Anyway, I've been patient, but they keep announcing and reviewing cards and then there is almost no stock. I'm still waiting for them to get stock on Amazon, but for some reason AMD and its partners don't want to be in the biggest store in the world.

Since the 470 is a 480 with disabled CUs (so the same die), they want everybody to buy the 480 in this price segment.

xEx · Aug 11, 2016

Michellstar said:
Since the 470 is a 480 with disabled CUs (so the same die), they want everybody to buy the 480 in this price segment.

The price is practically the same...

RecessionCone · Aug 11, 2016

Was looking at the GCN instructions and had a question about DS_PERMUTE_B32: what happens if two lanes write to the same address? The ISA documentation doesn't mention a defined behavior, so I assume it's just a race condition?
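
To make the question concrete, here is a small sketch that finds which destination lanes receive more than one write in a forward ("push") permute. It assumes the addressing as I understand it from public descriptions of the instruction: each lane supplies a byte address, which is divided by 4 and wrapped to the 64-lane wavefront to pick the destination lane. It deliberately does not pick a winner, since which write survives is exactly the open question; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

WAVE_SIZE = 64  # lanes per GCN wavefront


def permute_writers(byte_addresses: Sequence[int]) -> Dict[int, List[int]]:
    """Map each destination lane to the list of source lanes that write to it.

    Assumption: destination lane = (byte address // 4) % WAVE_SIZE.
    """
    writers: Dict[int, List[int]] = defaultdict(list)
    for src_lane, addr in enumerate(byte_addresses):
        dst_lane = (addr // 4) % WAVE_SIZE
        writers[dst_lane].append(src_lane)
    return dict(writers)


def colliding_lanes(byte_addresses: Sequence[int]) -> Dict[int, List[int]]:
    """Destination lanes that are written by two or more source lanes."""
    return {dst: srcs
            for dst, srcs in permute_writers(byte_addresses).items()
            if len(srcs) > 1}


# Rotate by one lane: every destination has exactly one writer, so no ambiguity.
rotate = [((lane + 1) % WAVE_SIZE) * 4 for lane in range(WAVE_SIZE)]
print(colliding_lanes(rotate))

# Pairs of neighbouring lanes target the same destination: every write collides.
pairs = [(lane // 2) * 4 for lane in range(WAVE_SIZE)]
print(len(colliding_lanes(pairs)))
```

Any index pattern for which `colliding_lanes` returns a non-empty result relies on behavior the documentation does not spell out, so it is safer to use the pull-style variant or guarantee unique destinations.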
