> Some of Pascal's architectural features like the 49-bit address space and an automatic facility for page migration seem to be leading features where Vega is playing catch-up.

In order to support larger data sets, you need a multi-tier memory system. A big chunk of DDR for capacity plus a small, fast HBM pool is a perfect solution for games, as long as you can page data from DDR to HBM on demand at low latency. Intel has already done this with the MCDRAM-based cache on their Xeon Phi processors, and Nvidia's P100 and V100 have hardware virtual memory paging. I am thrilled that this is actually happening in the consumer space so quickly. Nvidia's solution is CUDA-centric and geared towards professional usage. I don't know whether their hardware could support automated paging for OpenGL, DirectX, and Vulkan, or whether it only supports the CUDA memory model (which doesn't use resource descriptors to abstract resources).
Looking at the difference between Pascal/Volta and Vega, is the following statement correct?
Pascal/Volta need OS support for each page swap into the GPU and Vega does not.
Some of Pascal's architectural features like the 49-bit address space and an automatic facility for page migration seem to be leading features where Vega is playing catch-up.
https://www.pcper.com/reviews/Graph...ecture-Preview-Redesigned-Memory-Architecture

With a total addressable memory space of 512 TB (a 49-bit address space), this new system is comparable to the 48-bit (256 TB) x86-64 address space. That leaves a lot of room for growth in the GPU memory area, even when you start to get into massive network storage configurations.
Polaris (edit: I think) added a 49-bit address space:
https://www.pcper.com/reviews/Graph...ecture-Preview-Redesigned-Memory-Architecture
I'm reading that as being in the context of HBCC and the 512TB address space. I haven't had much luck finding a Polaris-specific reference for this.
The ATC bit goes back further, as it's mentioned for Sea Islands in Table 8.5.
> Looking at the difference between Pascal/Volta and Vega, is the following statement correct? Pascal/Volta need OS support for each page swap into the GPU and Vega does not.

OS support, and a process or service to handle the transaction. Vega would only need minimal OS support or awareness for certain capabilities where the two interact: security, virtualization, shared pointers, configuration/layout, etc., which are usually handled by the driver. Arbitrary reads of system memory aren't necessarily desirable, and are more a programming issue in any case, but would be possible.
So this might sound like a bit of a strange question, but I just want to make sure my understanding is correct, as I'm having a discussion about Vega on another forum where this has come up. With respect to the next-generation geometry engines in Vega, my understanding is that the shaders within the geometry engines themselves (which were all fixed-function in Fiji) have been (mostly) replaced with programmable non-compute shaders, which can be reconfigured to act as primitive shaders rather than their default behavior. It is not the case that primitive shaders work by bypassing the geometry engines entirely and using the compute units to process geometry. Is that right?
The ATC bit goes back further, as it's mentioned for Sea Islands in Table 8.5.
http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture.pdf
I'm not clear if the 48-bit base and ATC bit correspond to the same situation as the 48-bit base and mapped tile bit. Potentially, the HBCC doesn't need a bit in the shader's context to manage the resource's placement in the overall memory space.
Having an explicit ATC bit as part of a resource's descriptor seems to be related to the inability of prior generations to autonomously manage memory straddling CPU and GPU pools, since the resource needs to know which side it's on. Rather than a 49-bit space managed in a common handler, it's two disparate 48-bit spaces.
As far as the hardware's capability goes, what the shader sees may not be representative of the virtualized resource.
I'm not sure if the primitive shaders (a shader is a program, not a physical hardware component) are running in the ALUs / shader processors that reside in the NCUs or if they're completely new units that go into the front-end, but I think they're using the NCUs. That being the case, what's new in Vega compared to older GCN are the bridges (bus and caches) created between the shader processors and the front-end (geometry processors).
My understanding is that the primary processing grunt of the shader engines is the CUs. Primitive shaders, as they're presented, do not replace the fixed-function elements of the geometry pipeline. They exist in addition to the standard pipeline, which already has some capability to be fed via compute.
They aren't talking about processing being done within the geometry engines as shown on the Vega block diagram? I guess that means I should ask which stages of the rendering pipeline are normally done within the geometry engines?

> To highlight one of the innovations in the new geometry engine, primitive shaders are a key element in its ability to achieve much higher polygon throughput per transistor. Previous hardware mapped quite closely to the standard Direct3D rendering pipeline, with several stages including input assembly, vertex shading, hull shading, tessellation, domain shading, and geometry shading. Given the wide variety of rendering technologies now being implemented by developers, however, including all of these stages isn't always the most efficient way of doing things. Each stage has various restrictions on inputs and outputs that may have been necessary for earlier GPU designs, but such restrictions aren't always needed on today's more flexible hardware.
> I also don't expect the HBCC controller being invoked on each and every address or descriptor presented to the GPU.

The HBCC wouldn't generally be involved without a page fault. If it's operating in a shared mode, it would need to track which ranges might fault. Hardware-generated offsets, like some of the wave-level base pointers, may have implicit restrictions where the GPU will use its known-local address range.
> CPU address space is 48 bits anyway, so having a true unified address pool of 49 bits is useless if the GPU can't potentially address more than 48 bits itself, which I don't think is the case.

The motivation the vendors state for 49-bit addressing is unified memory addressing, where the GPU can access the full CPU range in addition to what it can address independently. If the GPU wants to be generally capable of accessing the host's 48-bit range, it would exhaust the address space of its own paging system without the extra room.
> There is the possibility that ATC is/was for Hyper-V or GPU virtualization, and it's in reality N virtual address spaces (sequentially accessible, not simultaneously), N being the number of virtualized instances.

ATC is the implementation of an IOMMUv2 feature for heterogeneous memory access. It allows the GPU to interface with the host's page tables and cache translations. For protection, everything the GPU does with unified memory treats it as a virtual guest.
> Okay, now I'm definitely confused. The Vega whitepaper talks about "Next Generation Geometry Engines" and says:

The only truly fixed-function elements in the diagram are the solid dark gray blocks. The various VS/DS and GS elements are actually running on the CUs. Internally, some of those are decomposed into different variants depending on whether tessellation and geometry shaders were invoked.
Very unlikely; they won't use more processes than they absolutely have to (cost, you know), and their main producer will always be GloFo. Vega 20 will most likely be their first 7LP product from GloFo, somewhere around Q4 2018. It could also be mostly (or only) a Pro product, because it may have 4 stacks of HBM2.
> Well if GF's solutions keep increasing the efficiency gap to TSMC's and Samsung's equivalent processes, then AMD had better find a way to avoid being dragged down by them. At least for their halo products (which Vega 20 should be).

7LP, despite the name sounding like it, isn't a low-power process like the 14LPx nodes are; it's "7nm Leading Performance", aimed at high-performance products (read: GPUs, big x86 CPUs).