Wut!
While the "Nx" comparisons could be against multiple GPUs, the "bandwidth per pin" and "capacity/stack" comparisons are specifically against Fiji*. So perhaps the other comparisons are also against Fiji.Hmm, 4× the power efficiency is good, but compared to what?
Unfortunately, the words I wanted to see, namely FL12_1 or FL12_2, were missing. Intel leading the pack here, "and for some time", is really just downright confusing.
What's Vega NCU? Is it the same as "Next Generation Computer Engine"? Also is that a typo "computer engine"?

Also a draw stream binning rasterizer sounds interesting alongside "Next Generation Pixel Engine". Maybe conservative rasterization?

I hope so on CR.
https://forum.beyond3d.com/threads/amd-marketing-tactics-spin.59834/

Can we stop derailing this thread now?
What's Vega NCU? Is it the same as "Next Generation Computer Engine"? Also is that a typo "computer engine"?

Also a draw stream binning rasterizer sounds interesting alongside "Next Generation Pixel Engine". Maybe conservative rasterization?

Some guess NCU stands for Next Compute Unit. A binning rasteriser sounds like a match to Nvidia's tiled rasteriser, with "draw stream" probably meaning it isn't TBDR (?).
Any guesses as to why we have a bullet point for the cache and cache controller?

The simplest guess without too much fanciness is a new cache hierarchy, which might complement AMD's claim (reported by EETimes) of Vega utilising the same data fabric (NoC) as Zen SoCs do, said to be scalable from SoC uses (<40 GB/s) to beyond 512 GB/s (two HBM2 stacks).
Yes, same reason why nVidia supports this with Pascal:

GP100 extends GPU addressing capabilities to enable 49-bit virtual addressing. This is large enough to cover the 48-bit virtual address spaces of modern CPUs, as well as the GPU's own memory. This allows GP100 Unified Memory programs to access the full address spaces of all CPUs and GPUs in the system as a single virtual address space, unlimited by the physical memory size of any one processor.

AMD kinda did this in GCN already though. There is an ATC bit in various descriptors that specifies whether the address is in GPUVM or in ATC (host address space through IOMMU).
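To illustrate that ATC-bit idea, here is a rough sketch. The struct layout and names are made up for clarity (this is not the actual GCN descriptor format): a single flag in a resource descriptor selects whether an address is translated through the GPU's own page tables or handed off to the host IOMMU.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical, simplified resource descriptor: not the real GCN layout. */
typedef struct {
    uint64_t base_va;   /* virtual address of the buffer                   */
    bool     atc;       /* false = translate via GPUVM page tables,
                           true  = translate via host IOMMU (ATC path)     */
} buffer_descriptor_t;

/* Conceptual translation-path selection, as described in the post above. */
uint64_t translate(const buffer_descriptor_t *d, uint64_t offset,
                   uint64_t (*gpuvm_walk)(uint64_t),
                   uint64_t (*iommu_walk)(uint64_t))
{
    uint64_t va = d->base_va + offset;
    return d->atc ? iommu_walk(va)   /* host process address space */
                  : gpuvm_walk(va);  /* GPU-local virtual memory    */
}
```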
Guess they wouldn't want Compute Unit NexT.

As opposed to Graphics Core Next? I'm still unsure on what exactly NCU is in reference to. It might be their command processor or something.
The simplest guess without too much fanciness is a new cache hierarchy, which might complement AMD's claim (reported by EETimes) of Vega utilising the same data fabric (NoC) as Zen SoCs do, said to be scalable from SoC uses (<40 GB/s) to beyond 512 GB/s (two HBM2 stacks).

Using the same fabric was extremely likely, as separate Naples and Zen dice would have to coexist on the same package. The cache controller might also be related to ESRAM, partitioning of cache and units, or off-package storage like SSG.
512TB virtual address space presumably means there is one extra address bit (49 bits) in GPU's own VM hierarchy over GCN3 (48 bits). No idea why they would just bump up one bit though... Are they going to map the entire host virtual address space into the GPUVM and unify the address translation hierarchies (ATC/GPUVM), heh?

They were already leaning on the CPU's IOMMU unit for addressing with the ATC. Considering the number of memory controllers in a GPU as opposed to a CPU, it could be a pin/pad issue requiring additional routing. Additional bits could possibly be used for additional addressing of virtual pools, encryption, CRC, etc. While not a lot, it could add up.
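For the arithmetic behind that "one extra bit" observation (plain back-of-the-envelope numbers, not anything from AMD): a 49-bit virtual address space is exactly twice a 48-bit one, so it can mirror a full 48-bit host address space and still leave an equally large range for device-local memory.

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t va48 = 1ULL << 48;  /* GCN3 GPUVM / x86-64 host: 256 TiB */
    uint64_t va49 = 1ULL << 49;  /* Vega / GP100-style GPUVM: 512 TiB */

    printf("48-bit space: %llu TiB\n", (unsigned long long)(va48 >> 40));
    printf("49-bit space: %llu TiB\n", (unsigned long long)(va49 >> 40));

    /* One extra bit doubles the space: enough to alias the whole 48-bit
       host address space alongside a 256 TiB device-local range.        */
    return 0;
}
```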
A system, method and a computer program product are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from a sequence of primitives. Initial bin intercepts are identified for primitives in the primitive batch. A bin for processing is identified. The bin corresponds to a region of a screen space. Pixels of the primitives intercepting the identified bin are processed. Next bin intercepts are identified while the primitives intercepting the identified bin are processed.
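Reading the abstract loosely, the flow would be something like the sketch below. This is only my paraphrase of the claim language into C (the types and helpers such as rasterize_in_bin and pick_next_bin are invented for illustration, and assume n <= BATCH_SIZE), not AMD's actual pipeline.

```c
#include <stddef.h>

#define BATCH_SIZE 64

typedef struct { int dummy; } prim_t;   /* screen-space primitive, details omitted */
typedef struct { int x, y; } bin_t;     /* coordinates of a screen-space tile      */

/* Invented helpers standing in for fixed-function hardware. */
extern bin_t first_bin_touched(const prim_t *p);
extern bin_t next_bin_touched(const prim_t *p, bin_t after);
extern int   touches_bin(const prim_t *p, bin_t b);
extern void  rasterize_in_bin(const prim_t *p, bin_t b);
extern bin_t pick_next_bin(const bin_t *intercepts, size_t n);

void process_batch(const prim_t *prims, size_t n)   /* assumes n <= BATCH_SIZE */
{
    bin_t intercept[BATCH_SIZE];

    /* 1. Identify initial bin intercepts for every primitive in the batch. */
    for (size_t i = 0; i < n; ++i)
        intercept[i] = first_bin_touched(&prims[i]);

    /* 2. Repeatedly pick a bin and process the primitives that touch it,
          identifying the next intercepts while this bin is in flight.      */
    for (size_t pass = 0; pass < n; ++pass) {          /* simplified termination */
        bin_t bin = pick_next_bin(intercept, n);
        for (size_t i = 0; i < n; ++i) {
            if (!touches_bin(&prims[i], bin))
                continue;
            rasterize_in_bin(&prims[i], bin);          /* process pixels in bin   */
            intercept[i] = next_bin_touched(&prims[i], bin);  /* overlapped work  */
        }
    }
}
```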
They were already leaning on the CPU's IOMMU unit for addressing with the ATC. Considering the number of memory controllers in a GPU as opposed to a CPU, it could be a pin/pad issue requiring additional routing. Additional bits could possibly be used for additional addressing of virtual pools, encryption, CRC, etc. While not a lot, it could add up.

Those possibilities you mentioned aren't likely covered by this one addressing bit, however. Take ESRAM: it is hard to imagine it not being virtualised behind the per-process virtual address space. For the LDS or scratch in the flat address space, they just need an aperture base pointer to remap from a full 64-bit flat address, and those addresses would not touch address translation at all (scratch memory would, but the address would be the resolved one). Encryption like Zen's would be a bit in the physical address, not the virtual memory address.
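A minimal sketch of that aperture idea, with made-up base/size values and names (this shows the flat-address-to-LDS remap concept, not actual GCN behaviour): addresses that fall inside a private aperture are turned into an offset within the on-chip memory by subtracting the aperture base, so they never reach the page-table walker.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical aperture window inside the 64-bit flat address space. */
#define LDS_APERTURE_BASE 0x0000100000000000ULL   /* made-up value */
#define LDS_APERTURE_SIZE (64ULL * 1024)          /* made-up value */

/* Returns true and writes the LDS byte offset if the flat address lands
   in the LDS aperture; such accesses bypass address translation entirely. */
static bool remap_flat_to_lds(uint64_t flat_addr, uint32_t *lds_offset)
{
    if (flat_addr >= LDS_APERTURE_BASE &&
        flat_addr <  LDS_APERTURE_BASE + LDS_APERTURE_SIZE) {
        *lds_offset = (uint32_t)(flat_addr - LDS_APERTURE_BASE);
        return true;   /* go to LDS, no GPUVM/ATC walk needed */
    }
    return false;      /* fall through to normal virtual-memory translation */
}
```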
Those possibilities you mentioned aren't likely covered by this one addressing bit, however. Take ESRAM: it is hard to imagine it not being virtualised behind the per-process virtual address space. For the LDS or scratch in the flat address space, they just need an aperture base pointer to remap from a full 64-bit flat address, and those addresses would not touch address translation at all (scratch memory would, but the address would be the resolved one). Encryption like Zen's would be a bit in the physical address, not the virtual memory address.

Not by the one bit, but they could have influenced the ability to add more bits. While 512TB is a lot of space, that's not a whole lot for exascale systems. It might be something they could change for customers actually needing more than 512TB on a GPU in a single pool. The HMM would definitely be a possibility, along the same lines as that compute wave save/restore.
Perhaps it is a sign of supporting Linux HMM in its ideal form — flipping ATC/GPUVM at page granularity to support flexible hot migration of pages to the GPU local memory.
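To make the page-granularity idea concrete (purely a conceptual sketch with invented types; real HMM lives in the Linux kernel and the driver): each GPU page-table entry would carry a flag saying whether the page currently resolves through the host's address space or through GPU-local memory, and migrating a hot page just copies it and flips the flag.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical per-page state: which translation domain owns the page. */
typedef struct {
    bool     in_vram;        /* false = host memory via ATC/IOMMU,
                                true  = GPU-local memory via GPUVM    */
    uint8_t *host_copy;      /* backing storage in system memory       */
    uint8_t *vram_copy;      /* backing storage in GPU memory          */
} page_entry_t;

/* "Hot migration": copy the page into VRAM and flip the domain flag,
   so subsequent GPU accesses stay local instead of crossing the bus.  */
static void migrate_to_vram(page_entry_t *pte)
{
    if (!pte->in_vram) {
        memcpy(pte->vram_copy, pte->host_copy, PAGE_SIZE);
        pte->in_vram = true;   /* flip ATC -> GPUVM for this page only */
    }
}
```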