AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Ethatron · Sep 2, 2017

I made a diff in Photoshop of the screenshots, and I see no Edge AA. Regardless, even if we find 1 clear undisputable MSAA edge, I would say the result totally not deserves the feature be called "MSAA on".

Malo · Sep 3, 2017

The next step would be to compare it on Nvidia, or even a 480.

DavidGraham · Sep 3, 2017

Malo said:
The next step would be to compare it on Nvidia, or even a 480.

I will do a comparison tomorrow on my 1070, after I get back home.

dirtyb1t · Sep 4, 2017

My first post ^_^.
Considering that vega is a compute first platform and judging it based on this, where are the full run of OpenCL, Rendering, and Deep Learning benchmarks?
I would also like to dig far more deeper into what exactly HBCC is and what its capabilities are. Is anyone aware if HBCC is exposed to compute APIs? What makes it different from pinned memory that gets DMA'd back and forth in systems memory which is available on Nvidia consumer cards 1050-1080? Going through the various flavors of cards on Radeon's website, you have the Pro, Instinct, and the RX Vega. To highlight a feature that they have gimped on the RX Vegas, only the Pro/Instinct cards list RDMA functionality. It is not listed for the $1,000 Vega Frontier Edition or the RX Vega 56/64. So, for compute, what makes RX Vega (based on a compute first architecture) a standout card vs. a 1070/1080/1080ti ? I don't see it but would love to dive into the technical details if someone does.

I feel like Vega as an architecture is being advertised w.r.t their top of the line Pro card's features whereas these features don't translate across their product line.
I mean, will RX Vega even have AMD DirectGMA (allowing two cards within the same chassis to communicate and coordinate)? Correct me if I'm wrong but even Consumer line GTX cards have this functionality...
Radeon is marketing Vega as a compute first platform but I keep seeing feature parity of less features when I compare RX Vega (56/64) to Nvidia's GTX cards.

Hope this is a good first post and hope I can contribute something to this community in the times ahead.

DavidGraham · Sep 4, 2017

Ok ran the game on my 1070, used the internal benchmark at 1080p & 1440p, DX11, max settings with Temporal AA:

Performance at 1080p:
0X MSAA 60 fps
2X MSAA 45 fps
4X MSAA 32 fps
8X MSAA 16fps

Performance at 1440p:
0X MSAA 42 fps
2X MSAA 30 fps
4X MSAA 20 fps
8X MSAA 9 fps

Sample @1440p 8XMSAA no TAA:

https://imgur.com/a/hGhD8

Sample @1440p 8X MSAA + TAA:

https://imgur.com/kyPlBK1

sebbbi · Sep 4, 2017

3dcgi said:
The rate at which a GPU performs coverage (depth) testing is limited by its rasterization rate though GPUs have different rates for depth than color. Depth rates are typically 2x or 4x faster than color.

Also it's worth noting that depth-only fill rate can be different than MSAA sample rate. For example Xbox 360 has 2x depth-only fill rate, but 4x MSAA sample rate. Thus on Xbox 360, you can enable 4xMSAA with no loss of fill rate. But you don't have 4x depth-only fill rate for shadow map rendering. You can however render shadow maps at 2x2 lower resolution with 4xMSAA, but that gives you jittered result (as MSAA samples aren't in ordered grid). Similar trick can be used on modern consoles, and is especially handy with custom MSAA sampling pattern (ordered grid). However rendering at 2x2 lower res with 4xMSAA makes pixel shading quads 2x2 larger. This is because each quad is constructed from the same sample of 2x2 pixel neighborhood (not 2x2 samples of one pixel). Thus quads become 2x2 larger in screen space. This results in more quad overshading because of small triangles. Also the memory layout with MSAA differs from 2x2 higher resolution image (this is both a negative and a positive thing, depending of your use case).

Vega lots of changes regarding to rasterization. If there's MSAA performance bottleneck, it is hard to know the exact reason. MSAA increases render target BW and footprint. It might be that the new tiling rasterizer isn't as efficient with MSAA (smaller tiles fit). It might be that the new ROP caches under L2 are a liability (trash L2 with high MSAA). It might be that the new DCC modes offer lower MSAA compression rate (and this cost overwhelms the advantages of being able to avoid decompress + sample compressed). Or it might be simply that the game tested needs to perform lots of other extra work when MSAA is enabled. MSAA doesn't only affect geometry rendering pass anymore. With deferred rendering and modern HDR post processing pipeline, MSAA is much more complex to support. Efficient MSAA is developers responsibility now, and not all games get it right. MSAA is often added as brute-force ultra PC setting for those people who own top tier GPUs. Do we have results from multiple games (both forward rendered and deferred), or is this result from a single game?

Arzachel · Sep 4, 2017

Jawed said:
This is prolly why some journalists decided simply to troll and hope it goes viral on various Vega-related topics: way less effort and it might just result in a meaningful response - and boy look at that advertising revenue. The entitled millennial generation has infected nearly everyone else, it seems.

GPU rumour/clickbait sites are as old as consumer GPUs. Instead of ranting about straw-milennials, maybe get a fidget spinner to calm your nerves

sebbbi · Sep 4, 2017

dirtyb1t said:
So, for compute, what makes RX Vega (based on a compute first architecture) a standout card vs. a 1070/1080/1080ti ? I don't see it but would love to dive into the technical details if someone does.

Vega has some advantages, but none of them are seen in current games or current software. It's hard to predict how much these new features will matter in the close future (when Vega is relevant).

For gaming, the biggest change in Vega is full DirectX feature level 12_1 support. Both Intel and Nvidia had already released two GPU generations with these features. These features allow implementing more efficient algorithms for many rendering problems. Now that AMD also supports these features, we can expect more game devs to start adding support to their games. AMD has slightly higher DX12 feature level support than Nvidia regarding to some aspects. Resource heap tier 2 (vs 1), conservative raster tier 3 (vs 2) and support for stencil ref from PS. My prediction is however that none of these slight advantages will have an actual impact during the Vega life time. But the common 12_1 features (that are supported by all IHVs) will certainly be used in some games, and that's when Vega will prove to be highly superior compared to previous AMD cards. There are also some other new gaming related features (such as primitive shaders) that are not yet exposed to developers. Whether these features will be used during Vega life time (on PC) remain to be seen. Evolution of PC graphics APIs hasn't been that fast lately. I don't expect big changes that only benefit one IHV. Consoles could of course be a good proving ground, but it will take some time until Vega base GPUs reach consoles.

For compute, Vega has double rate fp16 support. This is handy for some compute problems, such as deep learning. Nvidia's consumer cards don't have this feature (it's limited to professional P100 and V100 + mobile). Vega also has a brand new CPU<->GPU memory paging system. The GPU on-demand loads data from DDR4 to HBM2 at fine granularity, when a page is accessed. This allows 8 GB Vega to behave like a GPU with much larger memory pool, assuming that the memory accesses are sparse. This is a common assumption in games and also in many professional software. Usually games only access a tiny fraction of their total GPU memory per frame and the accessed data changes slowly (smooth camera movement and smooth animation). This could be a big game changer if games start to require more than 8 GB of memory soon. If a 8 GB Vega behaves like a 16 GB card with this tech, it could greatly extend the life time of the card. The same is of course true for many compute problems. However Nvidia Pascal also has a similar GPU virtual memory system for CUDA applications, but their tech doesn't work for games. CUDA has an additional page hint API to improve the efficiency of the automated paging system. Developer can tell the system to preload data that is going to be accessed soon. Whether or not the automated paging system will be relevant during Vega life time depends on software and games you are running. Xbox One X console with 12 GB memory (and a 24 GB devkit) is coming out very soon. Maybe we need more than 8 GB on PC soon. Let's get back to this topic then.

Currently the appeal of Vega is mostly forward looking. There's lots of new tech. AMD was lagging behind the competition before, but now they have a GPU that has lots of new goodies, but no software is using them yet. It remains to be seen whether AMDs low PC market share is a roadblock for developers to adapt these techniques, or whether AMDs presence in the consoles helps AMD to get high enough adoption for their new features.

dirtyb1t · Sep 4, 2017

Wow Sebbi, thank you for the detailed reply !

sebbbi said:
For compute, Vega has double rate fp16 support. This is handy for some compute problems, such as deep learning. Nvidia's consumer cards don't have this feature (it's limited to professional P100 and V100 + mobile).

Really looking forward to probing this feature on the compute side once it becomes more accessible in software. This really is a big deal so I hope Radeon does it justice and allows full raw access and I hope the performance numbers match !

sebbbi said:
Vega also has a brand new CPU<->GPU memory paging system. The GPU on-demand loads data from DDR4 to HBM2 at fine granularity, when a page is accessed. This allows 8 GB Vega to behave like a GPU with much larger memory pool, assuming that the memory accesses are sparse. This is a common assumption in games and also in many professional software. Usually games only access a tiny fraction of their total GPU memory per frame and the accessed data changes slowly (smooth camera movement and smooth animation). This could be a big game changer if games start to require more than 8 GB of memory soon. If a 8 GB Vega behaves like a 16 GB card with this tech, it could greatly extend the life time of the card. The same is of course true for many compute problems. However Nvidia Pascal also has a similar GPU virtual memory system for CUDA applications, but their tech doesn't work for games. CUDA has an additional page hint API to improve the efficiency of the automated paging system. Developer can tell the system to preload data that is going to be accessed soon. Whether or not the automated paging system will be relevant during Vega life time depends on software and games you are running. Xbox One X console with 12 GB memory (and a 24 GB devkit) is coming out very soon. Maybe we need more than 8 GB on PC soon. Let's get back to this topic then.

So, is the big difference between Vega and Nvidia that Radeon allows for this paging system to work in games? Otherwise, what is this technically? regular pinned memory w/ paging/page management? Or is this more like the DirectGMA feature that was found exclusively on FirePro cards? I hope this becomes more detailed on Sept 13/14th when the ProSSG/Instinc cards launch. I'm really at a loss for why these heavily marketed features haven't gotten a proper technical walk through by Radeon to make clear what this hardware is and is capable of.

sebbbi said:
Currently the appeal of Vega is mostly forward looking. There's lots of new tech. AMD was lagging behind the competition before, but now they have a GPU that has lots of new goodies, but no software is using them yet. It remains to be seen whether AMDs low PC market share is a roadblock for developers to adapt these techniques, or whether AMDs presence in the consoles helps AMD to get high enough adoption for their new features.

Yeah, I'm on the development side so all I need is proper drivers/api's/access. That being said, as you state, a good amount of software/drivers/tools need to be available before this can begin.
Please provide any detail about HBCC that you can. I really want to understand what's behind the marketing hype asap !

sebbbi · Sep 4, 2017

dirtyb1t said:
Yeah, I'm on the development side so all I need is proper drivers/api's/access. That being said, as you state, a good amount of software/drivers/tools need to be available before this can begin.
Please provide any detail about HBCC that you can. I really want to understand what's behind the marketing hype asap !

I am an ex-Ubisoft senior rendering lead, and I have been working with virtual texturing systems since Xbox 360. This is one of the key areas of my interest. We almost beat id-software's Rage to be the first fully virtual textured game. I am really interested about these new hardware virtual memory paging systems. It is mind blowing how quickly the GPU data sets have increased recently. One additional texture mip level increases memory footprint by 4x. But the highest mip level is only accessed by the GPU when the texture is very close to the screen, and usually only a small part of the texture is accessed at highest level. Paging this data on-demand will be even more important in the future. GPU memory needs to get faster to scale up with the computational increases, but in order for the memory to be fast enough, it needs to be close to the chip, meaning that the memory size can't scale up infinitely. In order to support larger data sets, you need to have a multi-tier memory system. Big chunk of DDR for storage and fast small HBM pool is a perfect solution for games, as long as you can page data from DDR to HBM on demand at low latency. Intel has already done this with their MCDRAM based cache on their Xeon Phi processors and Nvidia has P100 and V100 with hardware virtual memory paging. I am thrilled that this is actually happening in consumer space so quickly. Nvidia's solution is CUDA centric, and geared towards professional usage. I don't know whether their hardware could support automated paging on OpenGL, DirectX and Vulkan or whether it only supports the CUDA memory model (which doesn't use resource descriptors to abstract resources). AMD has a product right now in the consumer space that works with existing software, so that's really good news. This kind of tech seems to work in practice also for gaming.

We still don't have good benchmarks to analyze how well AMDs HBCC works in games. Mostly that's because there's still no games that would need more than 8 GB of GPU memory. Maybe there will be some 4 GB Vega 11 models (it's AMDs lower end Vega chip) released soon, allowing testing in more memory constrained scenarios before games before demanding enough to overcommit the 8 GB Vega.

itsmydamnation · Sep 4, 2017

I am not at home right now but in heaven 4.0 with vega I tested msaa and in it 8x only sees about a 30% drop vs no aa. It also seems like it isn't smashed by tessellation like older amd products. Forcing tessellation factor to 2x only makes a very small different vs application preference.

Rootax · Sep 4, 2017

I'm so f... weak. I told myself "My Fury X is good enough, even if some games I love are not very happy with only 4gb"... I had a very good deal on a Vega FE air cooled (approx. same price as RX56, around 450euros, brand new, never opened) so I took it and ordered a EK waterblock... New driver for FE are supposed to be out on september 13, with the ability to run RX drivers too (The presentation is not very clear if it's only for pro cards or not https://pro.radeon.com/en-us/announ...tion-vega-based-radeon-professional-graphics/) , so, I guess I'm in... I'll have to drain my loop but... Oh and I hope the RX drivers will see the 16gb of vram.

Rasterizer · Sep 4, 2017

sebbbi said:
Vega lots of changes regarding to rasterization. If there's MSAA performance bottleneck, it is hard to know the exact reason.

Thanks for the comments, though they are still a fair bit beyond my level of understanding of this stuff. I understand that there can be lots of reasons MSAA performance could be bottlenecked, but I was in a sense asking the opposite question: IF a GPU has a rasterization rate bottleneck, would such a bottleneck in turn be likely tank MSAA performance?

itsmydamnation said:
I am not at home right now but in heaven 4.0 with vega I tested msaa and in it 8x only sees about a 30% drop vs no aa. It also seems like it isn't smashed by tessellation like older amd products. Forcing tessellation factor to 2x only makes a very small different vs application preference.

I would be very eager to see whatever results you can contribute for Vega in Unigine Heaven 4.0 at differing levels of MSAA and tessellation, whenever you have time. Here are the results I've already seen, which appear to correlate with Vega's underwhelming performance in known tessellation heavy games like GTA V and Watch Dogs 2, as well as Vega's MSAA issues:

Anarchist4000 · Sep 4, 2017

dirtyb1t said:
Yeah, I'm on the development side so all I need is proper drivers/api's/access. That being said, as you state, a good amount of software/drivers/tools need to be available before this can begin.
Please provide any detail about HBCC that you can. I really want to understand what's behind the marketing hype asap !

There isn't much to it from a programming perspective. Think of it like programming for L3 on a CPU. It should occur transparently, just a matter of system configuration and perhaps some cache priming.

dirtyb1t said:
So, is the big difference between Vega and Nvidia that Radeon allows for this paging system to work in games? Otherwise, what is this technically? regular pinned memory w/ paging/page management? Or is this more like the DirectGMA feature that was found exclusively on FirePro cards? I hope this becomes more detailed on Sept 13/14th when the ProSSG/Instinc cards launch. I'm really at a loss for why these heavily marketed features haven't gotten a proper technical walk through by Radeon to make clear what this hardware is and is capable of.

Pinned memory with improved granularity and page management. Being able to intelligently evict pages is significant. It works in games as it's transparent and more efficient depending on how it ties in with MMUs for moving data. It should be able to request a page from the memory controller without involving a process. The controller would see it as another CPU accessing data, so less latency and overhead.

DirectGMA was a direct transfer between devices. With linked adapters and some configuration it should still be there. How it interacts with HBCC will depend on implementation as HBCC VRAM probably shouldn't be mapped directly. Resources should be backed by system memory/SSG/SAN, but VRAM partitioned so part of it may be mapped. Think 2GB(framebuffer, shared, etc) VRAM with a 6GB HBCC victim cache.

More documentation would be helpful, but I think AMD is still tweaking the implementation based on Linux commits. At the very least there is added overhead with smaller page sizes as huge pages yielded significant(~10%) improvements on Linux.

Deleted member 13524 · Sep 4, 2017

Rootax said:
I had a very good deal on a Vega FE air cooled (approx. same price as RX56, around 450euros, brand new, never opened)

And here I thought I had made a good deal on my RX V64...

DavidGraham · Sep 4, 2017

Vega still shows signs of early CPU limitation:

One potential fly in the ointment concerns AMD's DirectX 11 driver - we swapped out GTX 1080 Ti for a Vega 64 and found that the Pentium G4560's lock on 60fps could drop to the mid-40s in the heaviest scenes, a scenario you're likely to encounter with Core i3 chips too. In this case, the CPU requirement for 60fps Destiny 2 increases to the Core i5 level when using an AMD card.

http://www.eurogamer.net/articles/digitalfoundry-2017-what-does-it-take-to-run-destiny-2-at-1080p60

gamervivek · Sep 4, 2017

GTA V isn't that tessellation heavy, afaik it only works on some trees. Deus Ex doesn't look like a 2016 game, I thought the screenshots were from an old game only put in for comparison purposes.

DavidGraham said:
Vega still shows signs of early CPU limitation:

http://www.eurogamer.net/articles/digitalfoundry-2017-what-does-it-take-to-run-destiny-2-at-1080p60

Should've done some tests with ryzen as well, what differences could arise with various cpu/gpu combinations.

Malo · Sep 4, 2017

DavidGraham said:
Vega still shows signs of early CPU limitation

Well I doubt AMD have done any serious work on their DX11 driver just for Vega. I don't see any reason why this would have changed.

gamervivek · Sep 4, 2017

Vega does do better than what you'd expect with a CPU bottleneck, TPU has Fury X inching closer to Vega cards as the resolution increases rather than the other way round.

roybotnik · Sep 5, 2017

Rootax said:
I had a very good deal on a Vega FE air cooled (approx. same price as RX56, around 450euros, brand new, never opened) so I took it and ordered a EK waterblock... New driver for FE are supposed to be out on september 13

That's a steal! Very nice

. Long-term, it shouldn't be any different from having a RX card, just with the option of using pro drivers.

Those new pro drivers can't come soon enough. I bought my FE at launch, and after 2 months the only options other than launch day drivers are the buggy beta drivers, which are missing the 'gaming mode' toggle... Which is crucial since wattman is pretty much a necessity at the moment.

AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Ethatron

Malo

Yak Mechanicum

DavidGraham

dirtyb1t

DavidGraham

sebbbi

Arzachel

sebbbi

dirtyb1t

sebbbi

itsmydamnation

Rootax

Rasterizer

Anarchist4000

Deleted member 13524

Guest

DavidGraham

gamervivek

Malo

Yak Mechanicum

gamervivek

roybotnik