With async compute, having an execution gap between the primitive pipeline and fragment shaders is less of an issue.
My impression was there's still a direct feed from the primitive pipeline into the pixel shader stage. I'm not sure where the compute shader would interject.
That's possible, but wouldn't necessarily explain AMD not attaching a similar chip to Ryzen.
AMD's offering a discrete mobile Vega that appears roughly in the same range as the Intel custom chip.
Intel's EMIB implementation is riding a curve of manufacturing grunt, volume, and profitability that AMD is not positioned to match.
An 8-core Ryzen with a 32 CU Vega, HBM2, and a big 120mm cooler would dominate right now, in part because discrete parts have become scarce.
It would be a niche product, which AMD seems to be de-prioritizing in favor of lower-hanging fruit: the existing 8-core Ryzen and MCM CPU products. An APU would take on a significant portion of the up-front cost of a new package format, while tying the GPU to an 8-core CPU makes it less desirable for most markets either the GPU or the CPU would otherwise be sold into.
Commercial customers would be less likely to care for the overkill GPU, while the GPU is rather low-tier for rigs that might want an 8-core. The larger amount of silicon would compromise its power characteristics for mobile.
The niche it might fit in is dominated by Intel, and with the custom product AMD is making money in that niche with Intel taking on most of the hassle.
Not counter as much as attacking the problem from different angles. VGPR spilling technically allows a larger effective register file, just with unacceptable performance in most cases. The virtual RF would address that with a renaming and paging mechanism that should be transparent to the shader or the DSBR model.
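To make the renaming-and-paging idea concrete, here is a toy sketch (not AMD's actual mechanism; all names and sizes are illustrative). Shader code addresses virtual registers, a rename table maps them onto a small physical file, and cold values are paged out to backing storage on demand, so the consumer of the registers never sees the spilling:

```python
class VirtualRegisterFile:
    """Toy model of a virtual register file: virtual register names are
    renamed onto a small physical file, with cold values paged out to
    backing storage on demand, transparently to the code using them."""

    def __init__(self, physical_slots=4):
        self.free = list(range(physical_slots))  # unused physical slots
        self.rename = {}     # virtual reg -> physical slot
        self.physical = {}   # physical slot -> value
        self.backing = {}    # paged-out virtual reg -> value
        self.lru = []        # resident virtual regs, coldest first

    def _slot_for(self, vreg):
        if vreg in self.rename:                  # hit: already resident
            self.lru.remove(vreg)
        else:
            if not self.free:                    # miss: page out coldest
                victim = self.lru.pop(0)
                slot = self.rename.pop(victim)
                self.backing[victim] = self.physical.pop(slot)
                self.free.append(slot)
            self.rename[vreg] = self.free.pop()
            if vreg in self.backing:             # page spilled value back in
                self.physical[self.rename[vreg]] = self.backing.pop(vreg)
        self.lru.append(vreg)
        return self.rename[vreg]

    def write(self, vreg, value):
        self.physical[self._slot_for(vreg)] = value

    def read(self, vreg):
        return self.physical[self._slot_for(vreg)]
```

The point of the sketch is the transparency: a reader of `v0` below gets the right value back even though it was silently paged out when `v4` and `v5` arrived, which is the property that would let bin sizing and shader code ignore the physical file size.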
Indexing doesn't spill registers, and the overall register file is not really growing much or possibly shrinking if the virtual register scheme is implemented.
The DSBR doesn't have visibility on register addressing or spilling anyway; it's not in that part of the chip.
It would be transparent to the original design as it would be on par with simply providing a larger cache or register file and relaxing the bin size requirements.
Not if the sizing is dominated by other resource limits and fixed-function pipeline granularity, which the various patches indicate is the case.
Register usage is a CU occupancy constraint, which varies strongly with the primitive count, surface format, and sample count that the binning logic seems concerned with. Whether a bin is larger or smaller doesn't strongly correlate with the CU occupancy level, all else being equal. One bin needing X wavefronts would, if subdivided into N, generally give N bins each needing X/N wavefronts--barring potentially redundant work for items spanning bins.
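The arithmetic behind that claim can be checked with a toy calculation (the pixels-per-wavefront figure and bin sizes are illustrative, not hardware values): splitting one bin into N smaller bins redistributes the same wavefront demand rather than changing it, which is why occupancy doesn't track bin size.

```python
# Toy check: subdividing a bin conserves total wavefront demand
# (ignoring duplicated work for primitives spanning the new bin edges).

def wavefront_demand(pixels, pixels_per_wavefront=64):
    """Wavefronts needed to shade a bin, via ceiling division."""
    return -(-pixels // pixels_per_wavefront)

one_bin = wavefront_demand(4096)                 # X wavefronts, one big bin
sub_bins = [wavefront_demand(4096 // 4)] * 4     # N = 4 smaller bins

assert one_bin == sum(sub_bins)  # same total CU occupancy pressure
```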
Extra space in the form of larger/additional PHYs with internal routing for growing the network like Epyc. 32 PCIe lanes on a gaming part would be largely wasted, but practical on SSG, duo, or APU if using the same part.
We have pictures of Vega, which don't seem to show extra IO (edit: besides some SATA, probably). PHYs don't do internal networking.
How do you read it like that? The support is (or will be) there if the dev decides to build the game using them; it's just not the automatic conversion from vertex+geometry or whatever that was first advertised.
Perhaps AMD would be publishing documentation on them, as well as what it would take to expose them.
Part of the earlier discussion of the internal driver path was that AMD hadn't figured out how it could give devs the chance to use them. It's ominous if the company that best knows how to wrangle the architecture reversed its position after its own engineers indicated it was a serious pain to use (for a driver that was supposedly almost always good at generating primitive shaders???).