AMD: Speculation, Rumors, and Discussion (Archive)

Razor1 · Jul 14, 2016

ToTTenTranz said:
The lack of perspective is notorious.

If we were complaining about tessellation or excess geometry being underutilized back in 2011, I wonder what the argument would be.
Main difference being that nvidia hardwires sub-pixel triangles into hairworks and other effects in gameworks, which serves only to cripple performance on AMD hardware and their own kepler/fermi cards, so we're not even talking about IQ-enhancing features anymore.

Now we have async compute that brings unprecedented performance boosts to the overwhelming majority of the console+PC market, has been acclaimed as such by pretty much all high profile developers (except for Epic because Tim and Jen are bros4life?), it's bringing major performance boosts to 4 year-old cards, but it's somehow a bad thing.

Yet you bring up an article that is well known to be incorrect in its assumptions about tessellation in Crysis?
I would rather have better mesh quality right now, by the way, and no, I don't mean through tessellation, I mean actual mesh detail. Never liked the pixel stretching of having one pixel of a texture stretched over multiple pixels of a monitor; it just makes normal maps and lighting ugly.

Never stated it was bad, it's just not the end-all be-all of performance! How hard is that to grasp? You think Pascal can't do it, you think nV is hiding something. It's the same two guys here in this thread talking about async and the glories of it, and then saying it can't be done on Pascal in this https://forum.beyond3d.com/threads/no-dx12-software-is-suitable-for-benchmarking-spawn.58013/page-8 thread... Wow, come on, can you be any more transparent? And yet we are 100% sure Pascal can do it without the performance penalty that Maxwell had. MDolenc's test has shown this, the same test that we used when this whole thing about async was blown up into a ridiculous marketing endeavor.

Grall · Jul 14, 2016

lanek said:
Little round up of custom 480 and a little bit more.(korean Polaris launch event )

This link returns 403: Forbidden for me...

Has anyone squirreled away the pics and can re-post? Thx!

lanek · Jul 14, 2016

I hope the images are not blocked the way the site is.

DavidGraham · Jul 14, 2016

Ike Turner said:
So Vulkan/DX12/Metal are now "Mantle in disguise wreaking havoc on non AMD HW". Gotcha...can't wait to see those moving goalposts once Nvidia gets its shit together in the near future and the disguised Mantle suddenly becomes cool. It's like being on Neogaf in here lately..

Yes it's a pretty straight forward logical equation. When high level optimizations beat low level optimizations in a certain hardware, it means something is wrong with the low level optimizations or it's just a facade. Do you have another explanation for this? Please enlighten us.

And Yes causing havoc meaning degrading performance without increasing image quality in the slightest. DX10 was famous for this, it ended up despised, ignored and was quickly replaced with DX10.1 and then DX11.

Kaotik · Jul 14, 2016

sir doris said:
Love the ASUS info sheet in front of their RX480 with the wrong values against the fields for 'Base Clock' and 'Boost Clock', also the use of the term CUDA cores

EDIT: What's a semi-passive fan?

What? You mean I won't get a card with baseclocks @ PCI Express 3.0 and boostclocks @ OpenGL 4.5?
That's it, I'm cancelling my Strix RX 480 preorder! :runaway:

lanek · Jul 14, 2016

Kaotik said:
What? You mean I won't get a card with baseclocks @ PCI Express 3.0 and boostclocks @ OpenGL 4.5?
That's it, I'm cancelling my Strix RX 480 preorder!

If we trust Asus, you will also have a new G-Sync capable GPU soon... Looks like they have decided to economise on the marketing material and use the same sheet for all brands and GPUs.

It's a little bit cheap of Asus.

Kaotik · Jul 14, 2016

DavidGraham said:
Yes it's a pretty straight forward logical equation. When high level optimizations beat low level optimizations in a certain hardware, it means something is wrong with the low level optimizations or it's just a facade. Do you have another explanation for this? Please enlighten us.

And Yes causing havoc meaning degrading performance without increasing image quality in the slightest. DX10 was famous for this, it ended up despised, ignored and was quickly replaced with DX10.1 and then DX11.

It's quite simple really, AMD aimed at Mantle & co since dawn of GCN, and hoped it will be good enough for DX11 & co.
NVIDIA meanwhile did everything they could to get DX11 & co perfect, and in the process forgot to think forward, and what Mantle & co could bring to the table.
Changing to (partially) software scheduling brought great power savings for NVIDIA, but it also lost some flexibility compared to Fermi & GCN to my limited understanding, and there's probably other elements stacking on that, too.

DavidGraham · Jul 14, 2016

Kaotik said:
It's quite simple really, AMD aimed at Mantle & co since dawn of GCN, and hoped it will be good enough for DX11 & co.
NVIDIA meanwhile did everything they could to get DX11 & co perfect, and in the process forgot to think forward, and what Mantle & co could bring to the table.

That's all good and beautiful in a fairy tale story, where the good deeds are rewarded and the bad ones are punished. It doesn't work like that in the tech world. There is no thinking forward when low level access yields worse performance than high level. Low level access means adapting to the hardware at a deeper level, circumventing weaknesses and exploiting strengths; it should yield better fps. High level on the other hand shouldn't do that, and thus should be the bearer of worse fps. We have the situation reversed here. If an alien looked at this, he would think DX11 and OGL are the low level APIs.

Changing to (partially) software scheduling brought great power savings for NVIDIA, but it also lost some flexibility compared to Fermi & GCN to my limited understanding, and there's probably other elements stacking on that, too.

Losing flexibility means a higher reliance on hand made low level code, not the other way around. NVIDIA having better performance with the abstracted, automated code means they have higher flexibility than you think.

Deleted member 13524 · Jul 14, 2016

Ike Turner said:
So Vulkan/DX12/Metal are now "Mantle in disguise wreaking havoc on non AMD HW". Gotcha...can't wait to see those moving goalposts once Nvidia gets its shit together in the near future and the disguised Mantle suddenly becomes cool. It's like being on Neogaf in here lately..

Yes, the goalpost movers in this forum have been hard at work since the RX 480 came out, and this has significantly worsened since all these async compute implementations started coming up. It's like they're feeling threatened somehow.
Just the fact that there's a thread created with the sole intent to badmouth something AMD claimed over a year ago about driver optimization towards VRAM savings, when there was already a thread dedicated to that in the second page of this forum, tells you just as much.

DavidGraham said:
Yes it's a pretty straight forward logical equation. When high level optimizations beat low level optimizations in a certain hardware, it means something is wrong with the low level optimizations or it's just a facade. Do you have another explanation for this?

Your IHV has been a lot more successful by convincing developers to use proprietary high-level tools that intentionally cripple the performance on the competitor's hardware (and their own hardware from previous generations), than it could ever be by simply sticking to standards.
That's all the explanation needed.

CSI PC · Jul 14, 2016

Bringing the conversation back to AMD 480.
It is great that it has such a performance boost with Vulkan in Doom without notably increasing the power consumption, which has a nice effect on the perf/watt calculations (the caveat being that the context is limited to this API).
It would be nice to see figures in comparison with previous models such as the 380/390/390X/Fury, but so far I have only read one analysis looking at power draw between OpenGL and Vulkan, and that was just for the 480.

Cheers

MDolenc · Jul 14, 2016

Kaotik said:
Changing to (partially) software scheduling brought great power savings for NVIDIA, but it also lost some flexibility compared to Fermi & GCN to my limited understanding, and there's probably other elements stacking on that, too.

What does scheduling instructions within an SM have to do with scheduling of Draw/Dispatch commands (which I presume is what you mean by aiming in the async compute direction)? We have been over this: just because it contains the word "schedule" doesn't mean the two are connected in any way.

AlNom · Jul 14, 2016

Are the X-AMD flop cards performing closer to the X-nV flop cards in the Vulkan/Doom comparison?

e.g. is a 6TF AMD GPU performing what you'd see a 6TF nV GPU do?

DavidGraham · Jul 14, 2016

ToTTenTranz said:
Your IHV has been a lot more successful by convincing developers to use proprietary high-level tools that intentionally cripple the performance on the competitor's hardware (and their own hardware from previous generations), than it could ever be by simply sticking to standards.
That's all the explanation needed.

If you are capable of having an adult technical discussion instead of being an oversensitive bully that skirts around facts, you'd know that proprietary effects are just that, effects that increase visual quality while also costing performance on both vendors hardware, they never claimed otherwise. They never claimed to be standards that improve performance while offering that in selective manner. Or worse yet, causing fps degradation without an increase in visual quality and then claim they are doing so with close to the metal access!

Deleted member 13524 · Jul 14, 2016

AlNets said:
Are the X-AMD flop cards performing closer to the X-nV flop cards in the Vulkan/Doom comparison?

e.g. is a 6TF AMD GPU performing what you'd see a 6TF nV GPU do?

Depends on which nvidia architecture you're comparing to.
Performance scales very close to linear if you compare compute capabilities in pretty much all GCN cards (except for multi-GPU solutions which are still not supported in the game for either vendor).
As for nvidia Pascal is well above the perf/flop, Maxwell 2 is above but within ~10% and Kepler is a complete crapfest where a GTX 780 Ti dips below a cutdown Pitcairn.

Razor1 · Jul 14, 2016

http://videocardz.com/62250/amd-vega10-and-vega11-gpus-spotted-in-opencl-driver

More sightings of Vega

The SI, CI, VI and GFX stand for GPU generations. The latest, yet unreleased architecture is GFX9, which includes Greenland, Raven1X, Vega10 and Vega 11. For quite some time Greenland was rumored to be just another codename for Vega10, but since it’s listed separately, we should assume that Greenland is something else, probably an integrated graphics chip.

Ike Turner · Jul 14, 2016

ToTTenTranz said:
Depends on which nvidia architecture you're comparing to.
Performance scales very close to linear if you compare compute capabilities in pretty much all GCN cards (except for multi-GPU solutions which are still not supported in the game for either vendor).
As for nvidia Pascal is well above the perf/flop, Maxwell 2 is above but within ~10% and Kepler is a complete crapfest where a GTX 780 Ti dips below a cutdown Pitcairn.

Benchmark above was done with Async Off (smaa was enabled).
http://gamegpu.com/action-/-fps-/-tps/doom-api-vulkan-test-gpu

Deleted member 13524 · Jul 14, 2016

SK Hynix will have HBM2 available in Q3 2016, and only in 4-Hi stacks:
http://www.smartredirect.de/redir/c...rg/vbulletin/showthread.php?t=562438&page=479

Ike Turner said:
Benchmark above was done with Async Off (smaa was enabled).

Oh, wow.. that changes things even further then..

3dilettante · Jul 14, 2016

Kaotik said:
It's quite simple really, AMD aimed at Mantle & co since dawn of GCN, and hoped it will be good enough for DX11 & co.
NVIDIA meanwhile did everything they could to get DX11 & co perfect, and in the process forgot to think forward, and what Mantle & co could bring to the table.

The process for putting together an industry standard API can be a protracted one, with differing logjams and compromises that generally do not get commented upon by the stakeholders after the fact.

One possible interpretation of the Mantle situation is that whatever ongoing efforts into the successor APIs there were had come to an impasse, and Mantle served as a way to break the impasse by putting an actual lower-level API into the market and drawing in developers.

Nvidia may have seen that a lower-level API would be useful in the future, however it was also coming from a different place where it also had some notable positions of advantage with what was already in place. The trend was that Nvidia was leading in driver resources and devrel, so it could get more of those benefits without ripping out its investments.
It's also possible that what Nvidia wanted, if it wanted a lower-level API, differed more than the other major stakeholders. Mantle or something resembling it would not have been the only way of going about things.

At least some of the performance increases with Vulkan versus OpenGL for AMD look to be examples where AMD was notably underperforming with similarly positioned Nvidia GPUs, so Vulkan's benefit is at least in part that it's bypassing a decent chunk of the traditional AMD driver performance tax, or at least inflicting some of the software immaturity and weak optimization penalties on competing silicon that had gained an insurmountable lead on the old APIs.
AMD's situation is such that even if Nvidia does eventually rectify these problems (there's enough money, talent, and inertia to figure something out), it could still be a win if the costs of keeping up on the thick APIs were becoming too high for the weaker competitor. At least if AMD becomes second-best at Vulkan and the like, the reduced load on AMD might make that affordable.

Changing to (partially) software scheduling brought great power savings for NVIDIA, but it also lost some flexibility compared to Fermi & GCN to my limited understanding, and there's probably other elements stacking on that, too.

The primary scheduling change that comes to mind for Nvidia is the shift of dependence checking for ALU instructions when going from Fermi to Kepler, taking out an extra layer of hardware monitoring and encoding it in the instruction stream.
That's below the level of the APIs, and not relevant for asynchronous compute or special operations exposed with intrinsics.
Poorly optimized instruction streams for this can affect the number of stall cycles due to dependences, leading to less effective use of ALU resources.
However, the general case should not present an insurmountable challenge, and there is little evidence that Nvidia is suffering in terms of getting performance per hardware FLOP. It's not like Fermi is even showing up for this particular fight, and Kepler to Maxwell to Pascal shows that they have stayed with this without showing a plateau effect due to this particular architectural feature.
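
To make the stall-cycle point concrete, here is a toy sketch in Python. It assumes a strictly in-order machine that issues one instruction per cycle with a fixed 4-cycle result latency. The model, the latency value and the function name count_stalls are all made up for illustration and do not describe how any actual Nvidia or AMD hardware or compiler works. It only shows why the ordering of an instruction stream changes the number of cycles spent waiting on dependences.

```python
def count_stalls(program, latency=4):
    """Count stall cycles for an in-order, one-issue-per-cycle toy machine.

    program: list of (dest_register, [source_registers]) tuples.
    latency: assumed cycles until a result can be consumed (hypothetical).
    """
    ready_at = {}  # register -> cycle at which its value becomes available
    cycle = 0
    stalls = 0
    for dest, sources in program:
        # An instruction cannot issue before all of its sources are ready.
        earliest = max((ready_at.get(s, 0) for s in sources), default=0)
        if earliest > cycle:
            stalls += earliest - cycle
            cycle = earliest
        ready_at[dest] = cycle + latency
        cycle += 1
    return stalls


# Two independent dependency chains: x -> a -> b and y -> c -> d.
back_to_back = [("a", ["x"]), ("b", ["a"]), ("c", ["y"]), ("d", ["c"])]
interleaved = [("a", ["x"]), ("c", ["y"]), ("b", ["a"]), ("d", ["c"])]

print(count_stalls(back_to_back))  # dependent pairs are adjacent
print(count_stalls(interleaved))   # independent work fills the gaps
```

In this toy model the interleaved ordering stalls less than the back-to-back one, which is the kind of difference a well or poorly ordered instruction stream would produce under these assumptions.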

It's not like AMD's instruction scheduling is actually flexible beyond constraining its execution loop so that dependences must resolve prior to the next issue cycle. The impact that might have in anything else that needs to fit into that execution loop or what changes can be made to it might be evidenced by what just happened to the RX 480's power ceiling. Hardware interlocks or decent static dependence checking are not the biggest problems out there.
AMD's choice is also not a panacea, as this forum is rife with examples and commentary on how poor AMD's instruction generation is, particularly in the PC space, and on its non-presence in a lot of the compute space. If Nvidia is experiencing problems with just its GPUs needing hints on what instructions depend on others in a handful of cycles, other pitfalls apparently can make up for it on the other side.

smw · Jul 14, 2016

AlNets said:
Are the X-AMD flop cards performing closer to the X-nV flop cards in the Vulkan/Doom comparison?

e.g. is a 6TF AMD GPU performing what you'd see a 6TF nV GPU do?

If the numbers here are to be believed https://www.computerbase.de/2016-07...md-nvidia/#diagramm-doom-mit-vulkan-2560-1440 then the 980 Ti, which has 1.1 times the TFLOPS of a 390, is about 1.13 times faster at 1080p and 1.1 times faster at 1440p. However, the Fury X is only 1.43 times as fast as a 390 and 1.3 times as fast as a 980 Ti, rather than the 1.68x and 1.52x its TFLOPS would suggest.
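
For anyone wanting to reproduce the ratios above, here is a rough Python sketch. The shader counts and clocks are reference specs as I recall them (assumptions, not taken from the linked review), the 980 Ti is taken at base clock, and 2 FLOPs per shader per clock is the usual FMA convention. The helper names are made up for this example.

```python
# Assumed reference specs (not from the linked review): (shader count, clock in MHz).
CARDS = {
    "R9 390": (2560, 1000),
    "GTX 980 Ti": (2816, 1000),  # base clock; boost would raise this figure
    "Fury X": (4096, 1050),
}


def tflops(shaders, mhz):
    # Peak FP32 throughput, counting one fused multiply-add as 2 FLOPs per clock.
    return 2 * shaders * mhz / 1e6


def flop_ratio(a, b):
    # How many times more peak TFLOPS card a has than card b.
    return tflops(*CARDS[a]) / tflops(*CARDS[b])


def perf_per_flop(measured_speedup, a, b):
    # Measured speedup of a over b, relative to what the FLOP ratio alone predicts.
    return measured_speedup / flop_ratio(a, b)


print(round(flop_ratio("GTX 980 Ti", "R9 390"), 2))
print(round(flop_ratio("Fury X", "R9 390"), 2))
print(round(flop_ratio("Fury X", "GTX 980 Ti"), 2))
print(round(perf_per_flop(1.43, "Fury X", "R9 390"), 2))
```

With these assumed specs the Fury X comes out at roughly 85% of the speedup that its FLOP advantage over the 390 alone would predict, using the 1.43x figure quoted above.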

Kaotik · Jul 14, 2016

Razor1 said:
http://videocardz.com/62250/amd-vega10-and-vega11-gpus-spotted-in-opencl-driver

More sightings of Vega

Greenland is most likely, just like Fudzilla leaked ages and ages ago, the iGFX in the fabled HPC MCM APU
