AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Hmm, 4× the power efficiency is good, but compared to what?
While the "Nx" comparisons could be against multiple GPUs, the "bandwidth per pin" and "capacity/stack" comparisons are specifically against Fiji*. So perhaps the other comparisons are also against Fiji.

I think this assumption is consistent with existing rumors and information:
  • "2x Peak Throughput per Clock": makes sense from the double rate FP16.
  • "4x Power Efficiency": The Radeon Instinct MI25 has 25 FP16 TFLOPS and a < 300 W TDP, which gives > 2.7x compared to the Fury X. I've seen a rumored 230 W TDP for some Vega 10 part, which gives 3.5x using the same TFLOPS value, so the 4x number could make sense.
Exactly 4x the FP16 GFLOPS/W of Fiji in ~230 W would give ~29 TFLOPS. It's possible that consumer Vega has higher clock speeds than the Instinct MI25. (The 4x number could also have been rounded up.)

* Or a non-AMD GPU, presumably, but I think Fiji makes the most sense.
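The arithmetic behind those ratios can be sketched quickly. Note the Fury X baseline here (~8.6 FP16 TFLOPS at a 275 W TDP) is an assumption, since Fiji runs FP16 at the FP32 rate:

```python
# Rough FP16 efficiency comparison, assuming Fury X: 8.6 FP16 TFLOPS
# @ 275 W (Fiji has no double-rate FP16) and MI25: 25 FP16 TFLOPS.
fury_x = 8.6e3 / 275             # FP16 GFLOPS per watt
mi25_at_300w = 25e3 / 300
mi25_at_230w = 25e3 / 230

print(round(mi25_at_300w / fury_x, 2))   # 2.66 -- a TDP a bit under 300 W gives ~2.7x
print(round(mi25_at_230w / fury_x, 2))   # 3.48 -- the ~3.5x above

# FP16 TFLOPS needed for exactly 4x Fiji's efficiency in a 230 W TDP:
print(round(4 * fury_x * 230 / 1e3, 1))  # 28.8 -- the ~29 TFLOPS above
```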
 
Could just be the power efficiency of HBM2 compared to GDDR5 per unit of bandwidth. They gave HBM 3x the efficiency, so HBM2 could easily be 4x.
 
What's Vega NCU? Is it the same as "Next Generation Computer Engine"? Also is that a typo "computer engine"?
Also a draw stream binning rasterizer sounds interesting alongside "Next Generation Pixel Engine". Maybe conservative rasterization?
 
What's Vega NCU? Is it the same as "Next Generation Computer Engine"? Also is that a typo "computer engine"?
Also a draw stream binning rasterizer sounds interesting alongside "Next Generation Pixel Engine". Maybe conservative rasterization?
I hope so on CR.
 
Hmm, 4× the power efficiency is good, but compared to what?

I'll guess the previous top-end solution, meaning Fiji in Fury X form.

Though it could/should be closer to 3x Hawaii's efficiency in actual performance.
 
Seem like a new cache hierarchy and a new rasteriser are coming.
What's Vega NCU? Is it the same as "Next Generation Computer Engine"? Also is that a typo "computer engine"?
Also a draw stream binning rasterizer sounds interesting alongside "Next Generation Pixel Engine". Maybe conservative rasterization?
Some guess NCU stands for Next Compute Unit. A binning rasteriser sounds like a match to Nvidia's tiled rasteriser, with "draw stream" probably meaning it isn't TBDR (?).
 
Any guesses as to why we have a bullet point for the cache and cache controller?

Perhaps the HBM sits in between the GPU and a traditional GDDR pool?
 
Any guesses as to why we have a bullet point for the cache and cache controller?
The simplest guess without too much fanciness is a new cache hierarchy, which might be a complement with AMD's claim (reported by EETimes) of Vega utilising the same data fabric (NoC) as Zen SoCs do, which is said to be scalable from SoC uses (<40 GB/s) to beyond 512 GB/s (two HBM2 stacks).
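That "beyond 512 GB/s" figure lines up with two standard HBM2 stacks; a quick check, assuming the per-pin rate is the HBM2 spec maximum of 2 Gb/s:

```python
# Two HBM2 stacks, each with a 1024-bit interface at 2 Gb/s per pin.
stacks, bus_bits, gbps_per_pin = 2, 1024, 2.0
print(stacks * bus_bits * gbps_per_pin / 8)   # 512.0 GB/s
```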
 
512TB virtual address space presumably means there is one extra address bit (49 bits) in GPU's own VM hierarchy over GCN3 (48 bits). No idea why they would just bump up one bit though... Are they going to map the entire host virtual address space into the GPUVM and unify the address translation hierarchies (ATC/GPUVM), heh?
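The 512 TB figure does follow directly from that one extra bit:

```python
# Virtual address space sizes in TiB: GCN3 and modern x86-64 use
# 48 bits; Vega's quoted 512 TB corresponds to 49 bits.
print(2**48 // 2**40)   # 256 -- 48-bit virtual address space
print(2**49 // 2**40)   # 512 -- one extra bit doubles it
```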
 
Yes, same reason why nVidia supports this with Pascal:
GP100 extends GPU addressing capabilities to enable 49-bit virtual addressing. This is large enough to cover the 48-bit virtual address spaces of modern CPUs, as well as the GPU's own memory.
This allows GP100 Unified Memory programs to access the full address spaces of all CPUs and GPUs in the system as a single virtual address space, unlimited by the physical memory size of any one processor.
 
Yes, same reason why nVidia supports this with Pascal:
AMD kinda did this in GCN already though. There is an ATC bit in various descriptors that specifies whether the address is in GPUVM or in ATC (host address space through IOMMU).
 
Guess they wouldn't want Compute Unit NexT. ;)
As opposed to Graphics Core Next? I'm still unsure on what exactly NCU is in reference to. It might be their command processor or something.

The simplest guess without too much fanciness is a new cache hierarchy, which might be a complement with AMD's claim (reported by EETimes) of Vega utilising the same data fabric (NoC) as Zen SoCs do, which is said to be scalable from SoC uses (<40 GB/s) to beyond 512 GB/s (two HBM2 stacks).
Using the same fabric was extremely likely as separate Naples and Zen dice would have to coexist on the same package. Cache controller might also be related to ESRAM, partitioning of cache and units, or off package storage like SSG.

512TB virtual address space presumably means there is one extra address bit (49 bits) in GPU's own VM hierarchy over GCN3 (48 bits). No idea why they would just bump up one bit though... Are they going to map the entire host virtual address space into the GPUVM and unify the address translation hierarchies (ATC/GPUVM), heh?
They were already leaning on the CPU's IOMMU unit for addressing with the ATC. Considering the number of memory controllers in a GPU as opposed to a CPU, it could be a pin/pad issue requiring additional routing. Additional bits could be used for addressing virtual pools, encryption, CRC, etc. While not a lot, it could add up.
 
So, I have to log in to link this patent application:

http://www.freepatentsonline.com/y2016/0371873.html

HYBRID RENDER WITH PREFERRED PRIMITIVE BATCH BINNING AND SORTING

A system, method and a computer program product are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from a sequence of primitives. Initial bin intercepts are identified for primitives in the primitive batch. A bin for processing is identified. The bin corresponds to a region of a screen space. Pixels of the primitives intercepting the identified bin are processed. Next bin intercepts are identified while the primitives intercepting the identified bin are processed.

Don't understand it as yet...
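One loose reading of the abstract, sketched in software with entirely invented names: buffer a batch of primitives, work out which screen-space bins each one touches, then process the batch one bin at a time.

```python
# Hypothetical sketch of deferred primitive batch binning. The bin
# size and all names are invented for illustration.
BIN = 32  # bin size in pixels (assumed)

def bins_intercepted(prim):
    """Bins overlapped by a primitive's screen-space bounding box."""
    (x0, y0), (x1, y1) = prim["bbox"]
    return {(bx, by)
            for bx in range(x0 // BIN, x1 // BIN + 1)
            for by in range(y0 // BIN, y1 // BIN + 1)}

def render_batch(batch):
    # "Initial bin intercepts are identified for primitives in the batch."
    intercepts = {p["name"]: bins_intercepted(p) for p in batch}
    # "A bin for processing is identified" -- walk the touched bins in
    # order; per bin, process only the primitives that intercept it.
    for b in sorted(set().union(*intercepts.values())):
        for p in batch:
            if b in intercepts[p["name"]]:
                print(f"shade {p['name']} in bin {b}")

render_batch([
    {"name": "tri0", "bbox": ((0, 0), (40, 40))},
    {"name": "tri1", "bbox": ((30, 30), (70, 70))},
])
```

The "next bin intercepts are identified while ... processed" part presumably means the intercept search is pipelined with the shading of the current bin, which a serial sketch like this can't show.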
 
They were already leaning on the CPU's IOMMU unit for addressing with the ATC. Considering the number of memory controllers in a GPU as opposed to a CPU, it could be a pin/pad issue requiring additional routing. Additional bits could be used for addressing virtual pools, encryption, CRC, etc. While not a lot, it could add up.
Those possibilities you mentioned aren't likely covered by this one addressing bit, however. Let's say ESRAM, it is hard to imagine it not being virtualised behind the per-process virtual address space. For the LDS or scratch in the flat address space, they just need an aperture base pointer to remap from a full 64-bit flat address, and those addresses would not touch address translation at all (scratch memory would, but the address would be the resolved one). Encryption like Zen would be a bit in the physical address, not the virtual memory address.

Perhaps it is a sign of supporting Linux HMM in its ideal form — flipping ATC/GPUVM at page granularity to support flexible hot migration of pages to the GPU local memory.
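The aperture remapping described above amounts to a range check on the 64-bit flat address; a minimal sketch, with the base and size values invented for illustration:

```python
# Hypothetical flat-address decode: LDS sits behind an aperture base
# pointer carved out of the 64-bit flat space, so hits need no
# page-table walk. Base/size are made-up values.
LDS_APERTURE_BASE = 0x0000_7F00_0000_0000
LDS_APERTURE_SIZE = 64 * 1024

def decode_flat(addr):
    if LDS_APERTURE_BASE <= addr < LDS_APERTURE_BASE + LDS_APERTURE_SIZE:
        return ("lds", addr - LDS_APERTURE_BASE)   # remapped, untranslated
    return ("global", addr)                        # goes through GPUVM/ATC

print(decode_flat(LDS_APERTURE_BASE + 0x40))   # ('lds', 64)
print(decode_flat(0x1000))                     # ('global', 4096)
```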
 
Those possibilities you mentioned aren't likely covered by this one addressing bit, however. Let's say ESRAM, it is hard to imagine it not being virtualised behind the per-process virtual address space. For the LDS or scratch in the flat address space, they just need an aperture base pointer to remap from a full 64-bit flat address, and those addresses would not touch address translation at all (scratch memory would, but the address would be the resolved one). Encryption like Zen would be a bit in the physical address, not the virtual memory address.

Perhaps it is a sign of supporting Linux HMM in its ideal form — flipping ATC/GPUVM at page granularity to support flexible hot migration of pages to the GPU local memory.
Not by the one bit, but they could have influenced the ability to add more bits. While 512TB is a lot of space, that's not a whole lot for the exascale systems. Might be something they could change for customers actually needing more than 512TB on a GPU in a single pool. The HMM would definitely be a possibility along the same lines as that compute wave save/restore.
 