AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Bondrewd · Oct 1, 2017

Pressure said:
Yeah, we can revisit the discussion when and if they get enabled.

As the months are passing I am not holding my breath but I would love to be proven wrong. We need a healthy, competitive market.

Silence is usually not the best indicator though.

Silence is actually the best indicator possible.
The likes of nVidia have yet to talk about their tiling implementation.

Samwell · Oct 1, 2017

Hasn't AMD published a major driver update around November in recent years? I would give AMD until that driver for the features to get activated. If they don't manage even that, then my hopes are lost.

Bondrewd · Oct 1, 2017

Samwell said:
Hasn't AMD published a major driver update around November in recent years? I would give AMD until that driver for the features to get activated. If they don't manage even that, then my hopes are lost.

Crimson ReLive was December afaik.
Actually they've already presented some vague poly throughput numbers with NGG Fast Path in the whitepaper (and claimed it was tested on the 17.320 internal testing branch).
We can only wait now.

Anarchist4000 · Oct 1, 2017

Pressure said:
Yeah, we can revisit the discussion when and if they get enabled.

Isn't that what's being discussed with Forza 7? Vega looks to be running rather efficiently in those benchmarks. Frame times are near perfect, it's faster than a 1080 Ti, and it's scaling in line with theoretical expectations. I'm guessing RPM here; however, the results manifest differently than some expect.

Samwell said:
Hasn't AMD published a major driver update around November in recent years? I would give AMD until that driver for the features to get activated. If they don't manage even that, then my hopes are lost.

It might help to step back and look at what some features were designed to do. Primitive shaders were largely intended for culling and for feeding consolidated geometry to the front end efficiently. That can also be done with compute shaders, as some recent papers have shown, or with streams containing only valid triangles; roughly the GPU-driven approach as well. That may very well be the case here. RPM, on the other hand, can reduce power: twice the math, or the same math in half the time, leaving hardware idle. Idle hardware obviously uses less power, causes less contention, and in turn allows higher clocks. Other features, HBCC for example, aren't always practical for obvious reasons.
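To make the culling idea concrete, here is a minimal sketch of the kind of filtering a compute pass could do before geometry reaches the front end: drop back-facing and zero-area triangles and emit a compacted index stream. It is written in plain Python purely for illustration. The function names, the 2D screen-space input, and the counter-clockwise front-face convention are all assumptions of this sketch, and nothing here reflects how AMD's primitive shaders or drivers actually work.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]
Triangle = Tuple[int, int, int]


def signed_area2(a: Vec2, b: Vec2, c: Vec2) -> float:
    """Twice the signed screen-space area; positive means counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def cull_triangles(positions: List[Vec2], indices: List[Triangle]) -> List[Triangle]:
    """Return a compacted index stream holding only triangles worth rasterizing.

    Assumption: counter-clockwise winding is front-facing. Triangles with
    negative area (back-facing) or zero area (degenerate) are dropped.
    """
    kept: List[Triangle] = []
    for i0, i1, i2 in indices:
        if signed_area2(positions[i0], positions[i1], positions[i2]) > 0.0:
            kept.append((i0, i1, i2))
    return kept


# Hypothetical input: four vertices and four triangles.
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
indices = [
    (0, 1, 2),  # counter-clockwise, kept
    (0, 2, 1),  # clockwise (back-facing), culled
    (0, 1, 1),  # degenerate (two identical vertices), culled
    (1, 3, 2),  # counter-clockwise, kept
]
visible = cull_triangles(positions, indices)
```

The point of compacting the stream is that the fixed-function front end then only spends cycles on triangles that can produce pixels. A real implementation would also handle frustum and small-primitive culling, which this sketch leaves out.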

w0lfram · Oct 1, 2017

Bondrewd said:
The likes of nVidia is yet to talk about their tiling implementation.

Explain, plz.

Digidi · Oct 1, 2017

What I'm wondering about: AMD is really quiet about primitive shaders, which makes me nervous that something is wrong. Maybe the chip is broken?

I hope not, because I like the work and ideas of AMD.

entity279 · Oct 2, 2017

I think primitive shaders were explained pretty well here... In that either games should be (1) coded with PS or maybe (2) drivers could be made smart enough to do shader replacement with PSs. It's not something that will be announced overnight.

But granted, AMD could have had something prepared on Vega launch day.

Bondrewd · Oct 2, 2017

entity279 said:
I think primitive shaders were explained pretty well here... In that either games should be (1) coded with PS or maybe (2) drivers could be made smart enough to do shader replacement with PSs. It's not something that will be announced overnight.

But granted, AMD could have had something prepared on Vega launch day.

Rys said that they are not going to expose primitive shaders any time soon. It's up to the driver to handle them.

Digidi said:
What I'm wondering about: AMD is really quiet about primitive shaders, which makes me nervous that something is wrong. Maybe the chip is broken?

I hope not, because I like the work and ideas of AMD.

AMD is pretty silent about the most interesting features of their products nowadays.
They never talked about SEV and SME until the very day of the EPYC launch.

itsmydamnation · Oct 2, 2017

Just spitballing here.

Could tile-based raster + high geometry throughput be far more game/driver specific than we think? Could Kepler's perceived drop-off in performance be due in part to Maxwell not using tiles very often initially, but using them more and more as driver development moved forward?

Maxwell also had more ALU:TMU:ROP and clock than Kepler to help hide this; Vega just has clock.

Might AMD only be targeting DX12 as well for these kinds of features?

It will be interesting to see Vega vs Fury clock for clock over the next 12 months, in DX11 and DX12.

3dilettante · Oct 2, 2017

Bondrewd said:
AMD is pretty silent about the most interesting features of their products nowadays.
They never talked about SEV and SME until the very day of the EPYC launch.

A search for AMD and memory encryption turned up references going as far back as an April 2016 whitepaper from AMD discussing those features.
Advance disclosure would be important for these since they plug into system and security infrastructure with long lead times, and they would need to be enabled by outside parties.

Unlike the CPU features, primitive shaders and the rasterizer primarily have only AMD as the responsible party for enabling them, and AMD has multiple pseudo-launch events and marketing salvos touting them.
It cannot un-blurt itself back into always being silent about them in the same manner as Nvidia when it went several product generations without marketing anything about its rasterization change.

roybotnik · Oct 2, 2017

Achieving 4k@60fps on the X1X seems like a good motivator for building well-optimized DirectX 12 game engines... If Forza is any indicator, it seems like the console could be an important 'fix' for Vega.

Deleted member 13524 · Oct 2, 2017

Just a little info tidbit of anecdotal experience:

During compute tasks, my reference air-cooled Vega 64 clocks at a nearly constant 1624MHz even if I'm using the Power Saving profile.
Here's my GPU-Z screenshot in the middle of a 3-hour run of using Radeon ProRender for a Solidworks animation:

This GPU is pumping out a constant 13.3 TFLOPs at ~200W (with a wall power meter, my system goes from 110W idle to 315 when I start the run).
AFAICT, the NCUs are very power efficient, so if there's anything wrong with Vega it might be in the back-end (also thinking back to @Ryan Smith's post where he guesses the hot spot in Vega 10 might be in the ROPs).

That said, I see no reason why the Radeon Instinct MI25 would run at any less than its 1.5GHz "peak values", and unless those 8-Hi stacks are using an enormous amount of power in comparison to the 4-Hi stacks, I seriously doubt those cards will be using anywhere near 300W.
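For anyone wanting to check the arithmetic, the 13.3 TFLOPs figure matches the usual peak-rate formula (shaders x 2 FLOPs per fused multiply-add x clock) at the observed 1624MHz. The short Python sketch below reproduces it. The 4096 shader count for Vega 64 is not stated in this post and is filled in as an assumption, the function name is made up for illustration, and the result is a theoretical peak derived from clock speed, not a measured rate.

```python
def peak_fp32_tflops(shaders: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 rate, counting one FMA as 2 FLOPs per shader per clock."""
    return shaders * 2 * clock_mhz * 1e6 / 1e12


# Assumption: a full Vega 10 with 4096 stream processors (not stated in the post).
VEGA64_SHADERS = 4096

# Sustained clock reported in the post.
tflops = peak_fp32_tflops(VEGA64_SHADERS, 1624)

# Wall-meter readings from the post: 110 W idle, 315 W during the run.
wall_delta_w = 315 - 110

# Rough figure only: the wall delta also includes PSU losses and any extra
# CPU/system load, so the GPU's own draw is somewhat lower than this delta.
gflops_per_wall_watt = tflops * 1000 / wall_delta_w

print(f"{tflops:.1f} TFLOPs peak, {wall_delta_w} W wall delta")
```

The 205W wall delta lines up with the ~200W estimate once PSU efficiency is taken into account, but it remains a whole-system number.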

Grall · Oct 2, 2017

ToTTenTranz said:
Just a little info tidbit of anecdotal experience:

Interesting.

I'm also curious why GPU/mem clocks stick at their top values when GPU activity drops periodically. With AMD's advanced clock and voltage controls you'd expect clocks to follow load down as well as up...

Bondrewd · Oct 2, 2017

3dilettante said:
A search for AMD and memory encryption turned up references going as far back as an April 2016 whitepaper from AMD discussing those features.

I was referring to the very specific implementation in Zeppelin though.

3dilettante said:
It cannot un-blurt itself back into always being silent about them in the same manner as Nvidia when it went several product generations without marketing anything about its rasterization change.

They actually gave some vague poly throughput data (using NGG Fast Path, that is) in the whitepaper. Just no clear details. At all.

ToTTenTranz said:
This GPU is pumping out a constant 13.3 TFLOPs at ~200W (with a wall power meter, my system goes from 110W idle to 315 when I start the run).
AFAICT, the NCUs are very power efficient, so if there's anything wrong with Vega it might be in the back-end (also thinking back to @Ryan Smith's post where he guesses the hot spot in Vega 10 might be in the ROPs).

It's either ROPs or front-ends.
I'm betting on front-ends.

3dilettante · Oct 2, 2017

ToTTenTranz said:
AFAICT, the NCUs are very power efficient, so if there's anything wrong with Vega it might be in the back-end (also thinking back to @Ryan Smith's post where he guesses the hot spot in Vega 10 might be in the ROPs).

The compute-specific products versus gaming seem to give a quarter to a third of their power budget to graphics and the more aggressive clocking.
If the ROPs aren't being exercised in a compute load, there's also a significant chunk of the CU texturing block, shader engine front ends, and additional CU activity related to those unexercised front ends.

One would think that if the L2 was serving the ROP clients decently enough, their power cost might have improved, given the alternative is to directly spill/fill over the DRAM bus. If not, perhaps there is some confounding factor at higher power/clock settings, like fewer stalls, a difference in the L2's power contribution, or the ROPs not getting the same amount of physical optimization as the NCUs did.

Bondrewd said:
I was referring to the very specific implementation in Zeppelin though.

How specific does it need to get?
SME and SEV were cited specifically over a year ago, as was the structuring of their methods and implementation in the dedicated Security Processor.

Bondrewd said:
They actually gave some vague poly throughput data (using NGG Fast Path, that is) in the whitepaper. Just no clear details. At all.

The nature of the draw stream binning rasterizer and primitive shaders got multiple marketing slides and various internal figures based on internal estimates, and then a whitepaper and various statements/interviews by AMD engineers and staff over the course of six months.

If the contention is that we are to read AMD's current retreat into silence as a positive, Nvidia flat-out said nothing at all about the roughly equivalent feature change with Maxwell, and didn't say much for some time after outside testing outed the change.
To a more limited extent, Nvidia held off discussing delta compression for at least one product iteration, as part of the announcement was retroactive.

Deleted member 13524 · Oct 2, 2017

Grall said:
Interesting.

I'm also curious why GPU/mem clocks stick at their top values when GPU activity drops periodically. With AMD's advanced clock and voltage controls you'd expect clocks to follow load down as well as up...

This is rendering a scene and then encoding the frame into a video. The high clock during low GPU utilization might be because it's using the video encoder during those times.
Or the plugin could just be forcing those high clocks.

Digidi · Oct 2, 2017

The situation with the deactivated features is really curious. Why do they advertise them when they aren't working? If these features still aren't working in a year, you can return your card, because they made a promise which they can't keep.

CarstenS · Oct 2, 2017

ToTTenTranz said:
This GPU is pumping out a constant 13.3 TFLOPs at ~200W (with a wall power meter, my system goes from 110W idle to 315 when I start the run).

Is it? I mean, does Blender/ProRender actually give you that number, or did you calculate it from the clock speed it runs at? My Vega 56 runs at 1430 MHz for XMR mining with 100% load, under 90 W GPU-only power draw, and a 95 W delta before and after starting mining (it's not at idle clocks without mining due to "mining settings").

3dilettante · Oct 2, 2017

Digidi said:
The situation with the deactivated features is really curious. Why do they advertise them when they aren't working? If these features still aren't working in a year, you can return your card, because they made a promise which they can't keep.

The whole process of developing and launching a product is a pipeline of sorts. Each element, from design, silicon, manufacturing, software, and documentation to marketing, can have variable timelines, and these would add up significantly if various stages weren't allowed to run in parallel.
I'm presuming they expected that the features would be hammered into shape by various dates, and for various reasons they weren't in a state that made them broadly acceptable.
The DSBR is starting to peek out in some Linux patches and is cited specifically as being tuned for Raven Ridge, so the Windows effort is potentially ahead and may show a combination of difficulty and triage for the mobile market.

There are signs of piecemeal enablement of some of the features in specific scenarios, so it may be a matter of these optimizations not always being profitable to enable, or problems in making them work correctly when faced with a wider set of software than their preliminary tests would have covered.
Since they are mostly internal to the cards and by design shouldn't affect the output, it may be hard to argue that there's a specific thing a user would be missing since the reviewed performance figures are reflective of what the cards could do.

Other elements have been marketed and left off, from most Jaguar SoCs not having turbo to various AVFS elements being allegedly off across multiple APU and GPU products, or not showing any clear benefit if they were on.

Anarchist4000 · Oct 2, 2017

Digidi said:
The situation with the deactivated features is really curious. Why do they advertise them when they aren't working? If these features still aren't working in a year, you can return your card, because they made a promise which they can't keep.

They may be "working", but in need of significant tuning. So only enabled for very specific cases, but those cases are growing in number. So the features work, just not widely as mentioned above. And yeah, I'd say they hit some snags on the software side which happens. Doesn't mean the features won't show up, and some like primitive shaders were expected to take some work. They are more or less dev toys for experimenting.
