AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

That's not really the way Vega operates; it's more about how you "ask" it to consume power. Its "average" clock is very dynamic, but the difference in clock rate between ~175 W and ~275 W normally isn't many MHz, and that can occur at any clock range depending on game/location.
Sure, using only frequency was a simplification; the whole picture is much more complicated. The question is: when you tune Vega 10 to roughly Fiji's power level, how big is the resulting performance delta between the two?
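To put a rough number on that, here's a back-of-the-envelope sketch of why a big power cut doesn't cost many MHz. The DPM table and the 0.64 budget factor are invented purely for illustration, and dynamic power is approximated as V²·f; nothing here is measured data.

```python
# Toy model only: dynamic power taken as proportional to V^2 * f, with a
# made-up voltage/frequency table standing in for Vega 10's DPM states.
VEGA10_DPM = [          # (clock_mhz, voltage_v), hypothetical values
    (1269, 0.95),
    (1348, 1.00),
    (1440, 1.05),
    (1536, 1.10),
    (1630, 1.20),
]

def rel_dynamic_power(clock_mhz, voltage_v):
    """Relative dynamic power, ~ V^2 * f with capacitance folded into the constant."""
    return voltage_v ** 2 * clock_mhz

def fastest_state_within(power_budget):
    """Fastest hypothetical DPM state whose relative power fits the budget."""
    fitting = [s for s in VEGA10_DPM if rel_dynamic_power(*s) <= power_budget]
    return max(fitting, key=lambda s: s[0]) if fitting else VEGA10_DPM[0]

if __name__ == "__main__":
    top = VEGA10_DPM[-1]
    budget = 0.64 * rel_dynamic_power(*top)   # roughly the ~175 W vs ~275 W ratio
    clock, volt = fastest_state_within(budget)
    print(f"~{clock} MHz at {volt} V fits ~64% of the top state's power")
```

With these made-up states, cutting ~36% of the power only drops the clock from 1630 MHz to 1348 MHz; most of the savings come out of voltage, which is the gist of the argument above.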
 
Well, if the software side still isn't fully hashed out, I doubt anyone would reasonably expect to measure huge gains in actual testing, given that those measurements were taken in 2016, I believe.
OTOH, why would anyone reasonably expect a secondary optimization to have more impact than the ones in the slides?

Perf and perf/W improvements of 10% for some cases are already quite impressive. GPUs have long passed the point where a single feature has an outsized impact.
 
OTOH, why would anyone reasonably expect a secondary optimization to have more impact than the ones in the slides?
Because if many of the mechanisms it relied upon weren't implemented (and, last I checked, still haven't been), then it would be sub-optimal at best. The same could be said for async compute as originally implemented with Hawaii: the hardware was there, but implementations were largely non-existent until years later. Those original DSBR numbers would more than likely just be the intelligent workgroup distributor being enabled. The fetch-and-shade-once behaviour may not even be a part of that. What bandwidth/power gains were achieved could easily be the result of higher cache utilization from the tiling. Out-of-order rasterization across tiles, essentially.

Perf and perf/W improvements of 10% for some cases are already quite impressive. GPUs have long passed the point where a single feature has an outsized impact.
That doesn't mean a dev can't code a path slightly differently and break a feature, or, in the case of the FP16 mishap, completely disable the path until the compiler gets updated. TBDR has the potential to have a large impact if fully implemented, especially on mobile platforms where it is more common. Mobile Vega seems rather popular and the Intel deal is also interesting, so Intel may know more than we do. That move is either Intel defending against Ryzen APUs or going on the offensive against ARM mobile devices. The former seems odd with AMD involved, as Intel doesn't stand to sell many more CPUs that way or to benefit from GPU sales, and the current products don't appear to be in direct competition either: AMD at the low end with Raven, Intel at low-mid with Kaby G, and the mid tiers still a bit of an unknown, though I wouldn't be surprised if AMD had one for SIMD workloads with 4/8 memory channels feeding it. The offensive against ARM makes more sense, unless Intel just wants to stick it to Nvidia. So far Intel seems to want ultra-thin mobiles with higher graphics performance, a segment where neither AMD nor Nvidia really competes. Back to the original point, that would require a bit more efficiency to keep power draw down. A more efficient DSBR would do that, along with packed math, as FP16 is more common on mobile.
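To make the packed-math point concrete, here's a toy Python illustration of the 2xFP16 idea; this is my own sketch, nothing to do with AMD's actual ISA, and numpy is only used to show the bit-level packing.

```python
import numpy as np

def pack_half2(a: float, b: float) -> int:
    """Pack two FP16 values into one 32-bit word (low and high halves)."""
    return int(np.array([a, b], dtype=np.float16).view(np.uint32)[0])

def unpack_half2(word: int) -> np.ndarray:
    """Recover the two FP16 lanes from a packed 32-bit word."""
    return np.array([word], dtype=np.uint32).view(np.float16)

def packed_add(x: int, y: int) -> int:
    """One 'instruction' worth of work: both FP16 lanes are added at once."""
    lanes = unpack_half2(x) + unpack_half2(y)   # element-wise FP16 add
    return int(lanes.view(np.uint32)[0])

if __name__ == "__main__":
    a = pack_half2(1.5, 2.25)
    b = pack_half2(0.5, 0.75)
    print(unpack_half2(packed_add(a, b)))       # -> [2. 3.]
```

The point being that one 32-bit register and one operation carry two FP16 results, which is where the doubled throughput and the mobile appeal come from.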

Then there is the VGPR Indexing that last I checked was still disabled and would likely be associated with that virtual vector register file patent that was recently linked. The one with Koduri and Mantor listed as inventors is likely significant to one of the architectures; having already been published would suggest Vega, or perhaps a software/compiler implementation? We haven't seen any mGPU implementations either, which would effectively multiply the apparent cache for the purpose of binning.
 
Mobile Vega seems rather popular and the Intel deal is also interesting, so Intel may know more than we do. That move is either Intel defending against Ryzen APUs or going on the offensive against ARM mobile devices.
Yes, Intel may know more than we do. There were multiple hints that the GFX IP inside Intel's chip is not GFX9.x but the good old GFX8 plus various tweaks. Was there any confirmation of a "Vega feature" besides HBCC? I mean NGG, DSBR, NCU, etc.?
 
Those original DSBR numbers would more than likely just be the intelligent workgroup distributor being enabled.
That would be unfortunate since AMD markets the work distributor as part of the new geometry path rather than the pixel engine that the DSBR is under.

TBDR has the potential to have a large impact if fully implemented, especially on mobile platforms where it is more common.
Is this back to equating the Draw Stream Binning Rasterizer to actual Tile-Based Deferred Rendering implementations?

Mobile Vega seems rather popular and the Intel deal is also interesting, so Intel may know more than we do. That move is either Intel defending against Ryzen APUs or going on the offensive against ARM mobile devices.
Given the alleged roadmap of Intel's going to an EMIB-based Gen 12 and 13, it's also possible that AMD's custom chip is meant to temporarily fill a gap in Intel's product line due to the 10nm blowup sinking Intel's internally-sourced graphics efforts for 1-2 major product cycles.
Outside competition would factor into this, but also Intel's need to get something out even versus itself.

Then there is the VGPR Indexing that last I checked was still disabled and would likely be associated with that virtual vector register file patent that was recently linked.
That would seem to run counter to the "virtual" component of the register file patent, and the timing appears off for it being applicable. VGPR indexing is software-visible, as it is used by the shader code--whose view of the register file is being spoofed by the virtual register file scheme.

The initial filing for what appears to have become the DSBR was 2013. There's a lot of games that could be played with when disclosures are filed, but there's a multi-year gap that does seem consistent with these two techniques being part of different designs.
 
Nowhere in there could I find an explanation for Raja saying Infinity Fabric is larger than necessary for gaming devices, though. The implication would be that it more robustly connects something that only shows up in server or compute: SSG, the Infinity mesh for MI25 with Epyc, etc., as shown in that presentation.
Try this interpretation:
It's overbuilt because, with only so few clients, IF is not yet pushed to its limits.
 
As an aside, it appears that the PS4 Pro is next up. The current shot is pretty blurry, although presumably a more straight-on shot is on its way. It seems like it was more difficult to etch down cleanly with a chip on this process (comments indicating GV100 is a goal?).

Perhaps a clearer shot would be of interest in the console forums. It's an interesting example of the layout games that can be played. There are at least three CU variations from what I can see so far. There's been a bit of speculation as to the ROP count for the Pro, although I don't know if this shot would be sufficient to settle it, given the willingness to vary blocks for area or other reasons, and some custom hardware. There are some borrowed elements from Polaris and Vega somewhere in there.
The chip.
Fritzchens Fritz did a follow up with clear PS4 Pro shots:

There is the open question of whether the PS4 Pro has 64 ROPs, because it has four Shader Engines and deactivates two for backwards compatibility.

----

But speaking of pure Vega, according to Marc Sauter (y33H@) from the German IT site Golem, AMD said in a breakout session shortly before CES 2018 that the implicit driver path for primitive shaders was cancelled, and it will only be up to developers with explicit control to make use of it.
Now the waiting game shifts to when AMD will provide direct control for developers and after that, when a game will actually utilize it.
https://www.forum-3dcenter.org/vbulletin/showthread.php?p=11610696#post11610696
https://www.forum-3dcenter.org/vbulletin/showthread.php?p=11611522#post11611522
 
Too complicated IMO... So, with Primitive Shaders out, what's left to enable? Is FP16 working in compute and pixel shaders? DSBR? NGG fast path? Or nothing, and Vega is performing like an OC Fiji and that's it?
 
Large as in 10%? I consider that large. And that’s what AMD sees in some cases.

I think your performance improvement expectations of some individual features are wildly optimistic.
Really depends on the game, with more potential in less optimized or immediate-mode titles. The improvements are based on how much overdraw currently exists in various titles, not to mention the bandwidth savings Nvidia likes to tout with their compression, or more appropriately, their better culling implementation. Maxwell got a bit more than 10% there.

Until FP16/RPM is more widely functional and primitive shaders become documented, I wouldn't consider the features fully functional, just working at a limited capacity and not necessarily synergizing. RPM should improve primitive shaders, PS should improve culling, culling should improve binning with less clutter, and binning should limit overdraw and fragment dispatch. It just seems there is a lot of software work still to be done.

Is this back to equating the Draw Stream Binning Rasterizer to actual Tile-Based Deferred Rendering implementations?
Not equating, but moving much of the culling/overdraw savings from fragment shaders into the primitive pipeline, acting like earlydepthstencil even with less-than-optimal rendering orders. "Deferred Draw Stream Binning Rasterizer with Tile Based Rendering" might make more sense. With async compute, having an execution gap between the primitive pipeline and the fragment shaders is less of an issue; the only catch is needing async compute, which is still somewhat limited. It's possible the effects only work well with DX12/Vulkan, but my thinking is a new toolchain is required, and that's the huge rewrite that is still occurring. At least the public commits aren't what I would call stable, with major functionality still being added. A couple of months ago even FP16 wasn't working across the entire product stack.
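As a rough illustration of the overdraw argument, here's a toy model of the general binning-plus-deferred-shading idea; it's not a claim about how the DSBR actually behaves, just the accounting.

```python
def immediate_shades(fragments):
    """Immediate mode: every (pixel, depth) fragment is shaded as it arrives."""
    return len(fragments)

def binned_deferred_shades(fragments):
    """Keep only the nearest fragment per pixel, shade once when the bin closes."""
    nearest = {}
    for pixel, depth in fragments:
        if pixel not in nearest or depth < nearest[pixel]:
            nearest[pixel] = depth
    return len(nearest)

if __name__ == "__main__":
    # Three opaque triangles covering the same 4 pixels, submitted back-to-front.
    bin_fragments = [(p, d) for d in (0.9, 0.5, 0.1) for p in range(4)]
    print(immediate_shades(bin_fragments))        # 12 shader invocations
    print(binned_deferred_shades(bin_fragments))  # 4 shader invocations
```

Real hardware has to cope with blending, transparency, and depth-writing shaders, which is presumably part of why the measured gains are so title-dependent.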

Given the alleged roadmap of Intel's going to an EMIB-based Gen 12 and 13, it's also possible that AMD's custom chip is meant to temporarily fill a gap in Intel's product line due to the 10nm blowup sinking Intel's internally-sourced graphics efforts for 1-2 major product cycles.
Outside competition would factor into this, but also Intel's need to get something out even versus itself.
That's possible, but it wouldn't necessarily explain AMD not attaching a similar chip to Ryzen. AMD appears to have made APUs smaller, and possibly larger, but not in direct competition. An 8-core Ryzen with a 32 CU Vega, HBM2, and a big 120 mm cooler would dominate right now, in part because discrete parts have become scarce.

That would seem to run counter to the "virtual" component of the register file patent, and the timing appears off for it being applicable. VGPR indexing is software-visible, as it is used by the shader code--whose view of the register file is being spoofed by the virtual register file scheme.

The initial filing for what appears to have become the DSBR was 2013. There's a lot of games that could be played with when disclosures are filed, but there's a multi-year gap that does seem consistent with these two techniques being part of different designs.
Not counter as much as attacking the problem from different angles. VGPR spilling technically allows the larger register file size, just with unacceptable performance in most cases. The virtual RF would address that with a renaming and paging mechanism that should be transparent to the shader or to the DSBR model. It would be transparent to the original design, as it would be on par with simply providing a larger cache or register file and relaxing the bin size requirements: only begin rasterizing a bin when hitting a context limit, running out of geometry, or getting a hint from a prior frame that all geometry is present. Actual bin size would be more complex to model, as register pressure could vary significantly based on the shader.
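For what it's worth, here's how I picture the renaming/paging bookkeeping; this is purely my guess at the mechanism described in the patent, not anything AMD has documented.

```python
from collections import OrderedDict

class VirtualRegFile:
    """Toy model: a large logical VGPR space backed by a small 'physical' file."""
    def __init__(self, physical_pages=8, page_size=16):
        self.page_size = page_size
        self.physical_pages = physical_pages
        self.resident = OrderedDict()   # logical page -> register values (LRU order)
        self.backing = {}               # spilled pages (cache/LDS in real hardware)
        self.hits = self.misses = 0

    def _touch(self, page):
        if page in self.resident:
            self.resident.move_to_end(page)
            self.hits += 1
            return
        self.misses += 1
        if len(self.resident) >= self.physical_pages:     # evict least recently used
            victim, values = self.resident.popitem(last=False)
            self.backing[victim] = values
        self.resident[page] = self.backing.pop(page, [0.0] * self.page_size)

    def read(self, logical_index):
        """Shader-visible access by logical index; the paging stays transparent."""
        page, offset = divmod(logical_index, self.page_size)
        self._touch(page)
        return self.resident[page][offset]

if __name__ == "__main__":
    vrf = VirtualRegFile()
    for i in range(0, 512, 3):          # a strided, dynamically indexed pattern
        vrf.read(i)
    print(vrf.hits, vrf.misses)
```

The shader still just asks for "logical register N", which is why this could sit underneath VGPR indexing or the DSBR without either needing to know about it.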

Try this interpretation:
It's overbuilt because, with only so few clients, IF is not yet pushed to its limits.
That would imply IF is a fixed configuration and there are nodes on the network that don't attach to anything or are only active in server/pro scenarios. That could be the case if physically dividing the CUs into separate virtual hardware devices, but that runs counter to what AMD has been advertising, where the ACEs allow load-balancing many clients in a secure fashion. That's why I think the network was enlarged to accommodate additional IO for Vega 10 in server/pro parts: extra space in the form of larger/additional PHYs, with internal routing for growing the network like Epyc. 32 PCIe lanes on a gaming part would be largely wasted, but practical on an SSG, duo, or APU if using the same part.
 
Yes, Intel may know more than we do. There were multiple hints that the GFX IP inside Intel's chip is not GFX9.x but the good old GFX8 plus various tweaks. Was there any confirmation of a "Vega feature" besides HBCC? I mean NGG, DSBR, NCU, etc.?
All the literature says "Vega", so GFX9.x seems more likely. The overview mentions "Vega Pixel Engine" along with HBCC, so it should be full Vega. Pixel Engine in the whitepaper covers DSBR at least, so more than likely it's all Vega.
 
That would imply IF is a fixed configuration and there are nodes on the network that don't attach to anything or are only active in server/pro scenarios. That could be the case if physically dividing the CUs into separate virtual hardware devices, but that runs counter to what AMD has been advertising, where the ACEs allow load-balancing many clients in a secure fashion. That's why I think the network was enlarged to accommodate additional IO for Vega 10 in server/pro parts: extra space in the form of larger/additional PHYs, with internal routing for growing the network like Epyc. 32 PCIe lanes on a gaming part would be largely wasted, but practical on an SSG, duo, or APU if using the same part.
No, it would only imply that IF carries a certain amount of overhead that only starts to amortize beyond the current requirements and client count.
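That amortization argument is easy to show with toy numbers (entirely made up, just to illustrate the shape of the curve):

```python
def per_client_cost(fixed_fabric_cost, incremental_cost, clients):
    """Total fabric cost spread across however many clients actually attach."""
    return (fixed_fabric_cost + incremental_cost * clients) / clients

if __name__ == "__main__":
    for n in (2, 4, 8, 16):   # hypothetical client counts, gaming vs server-ish
        print(n, round(per_client_cost(8.0, 1.0, n), 2))
        # -> 5.0, 3.0, 2.0, 1.5 : fixed overhead shrinks per client as n grows
```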
 
Really depends on the game, with more potential in less optimized or immediate-mode titles. The improvements are based on how much overdraw currently exists in various titles, not to mention the bandwidth savings Nvidia likes to tout with their compression, or more appropriately, their better culling implementation. Maxwell got a bit more than 10% there.
AMD has said on multiple occasions that the DSBR is only useful for SKUs with limited resources, and now their own testing confirms it. At Ultra settings, most games gain about 5% or less, and likely in very specific scenarios too. And now, with the cancellation of driver-side primitive shaders, I think it's time you abandoned your theory of 30% more performance than a TitanXP through unicorn drivers. It was never a good theory to begin with; the writing was on the wall that Vega was missing several things when the features were never enabled at launch, or even a few months after.
 
Wut?
So no primitive shaders support then.

Why?
How do you read it like that? The support is (will be) there if the dev decides to build the game using them; it's just not the originally advertised automatic conversion from vertex+geometry or whatever.
 
But speaking of pure Vega, according to Marc Sauter (y33H@) from the German IT site Golem, AMD said in a breakout session shortly before CES 2018 that the implicit driver path for primitive shaders was cancelled, and it will only be up to developers with explicit control to make use of it.
Now the waiting game shifts to when AMD will provide direct control for developers and after that, when a game will actually utilize it.
https://www.forum-3dcenter.org/vbulletin/showthread.php?p=11610696#post11610696
https://www.forum-3dcenter.org/vbulletin/showthread.php?p=11611522#post11611522

This is the first time I've ever used this. (╯°□°)╯︵ ┻━┻ This seems like gross mismanagement of RTG by Raja: PS stated as a dev tool (Raja), then PS as driver magic (Raja), now back to the original plan of PS being a dev tool (Su).
 
Raja is whatever, but why did @Rys say it was driver magic?
Because it was supposed to be...

I guess it doesn't make much sense to spend resources on that now, because few gamers have Vega cards and hardly any gamers are buying AMD cards to play games, much less Vega chips.

It does bother me that driver development for games is seemingly decelerating, but high-end PC gaming as a whole is actually dying, and it might die really fast.
I wonder what will happen to PC game sales after a year of severe drought of performance graphics cards on the shelves.


The IHVs need to come up with a solution fast. AMD needs to move up the launch of high-performance gaming APUs with HBM as much as they can.
 