AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Koduri stated that Vega's fabric is optimized for servers, but I'm not sure what would be limiting it other than perhaps some additional overhead for items like generally unused error correction or expanded addressing. In fact, I'm not sure what "server-optimized" really adds if all the fabric is doing is sitting between memory, GPU, and standard IO.
There's the flash controller and I think the IO for that, though its impact should be modest.
I was wondering the same thing, and the only thing I can think of is that Infinity Fabric is also crossed over the PCIe lanes with Epyc. Having more than 16 lanes would make sense for certain "server" parts that may link components via Infinity Fabric, mostly related to SSG-hosted controllers for NVMe or SAN. An APU/x2 could be another possibility, as could a larger PCIe slot. A 32-lane PCIe slot wouldn't be unreasonable with unified memory on a server. Without serving as a means to connect chips, I'm at a loss for how it could be server optimized.
 
While GF100b also enabled full-rate FP16, yes, it's eerie. Especially when you take into account that Fermi was Nvidia's first try at fully distributed geometry, and Vega is AMD's first chip where geometry can be shared across the shader engines (and I am not talking about that very slim line indicating load balancing in former quad-engine Radeons).
I feel like this has been discussed before, but I'm not sure what people think Radeons were lacking with respect to the ability to distribute geometry prior to Vega.
 
Vertex shaders default to highp precision. Fragment shaders don't have a default precision, so you must specify highp/mediump/lowp either via a precision statement (precision highp float) or by declaring each variable in the shader with the required precision (mediump vec4 sum).
Yes. However, there's no guarantee of highp support in fragment shaders. You would get a compile error on such a GPU if you haven't programmed a mediump code path for your fragment shaders. Also, many mobile GPUs incur a significant performance hit when highp is used, so it is good to use mediump even when you aren't targeting the absolute low end.
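For reference, a minimal sketch of how that fallback is usually handled: GLSL ES defines GL_FRAGMENT_PRECISION_HIGH when the fragment stage supports highp, so the shader can pick its default precision at compile time. The shader source below is embedded as a C string, and the identifiers (u_tex, v_uv) are purely illustrative.

```c
/* Minimal sketch: an OpenGL ES 2.0 fragment shader that falls back to mediump
 * when the device has no highp support in the fragment stage. The predefined
 * GL_FRAGMENT_PRECISION_HIGH macro is part of GLSL ES 1.00; the identifiers
 * (u_tex, v_uv) are made up for the example. */
static const char *fragment_shader_src =
    "#ifdef GL_FRAGMENT_PRECISION_HIGH\n"
    "precision highp float;\n"      /* full precision where available        */
    "#else\n"
    "precision mediump float;\n"    /* fallback path for low-end mobile GPUs */
    "#endif\n"
    "uniform sampler2D u_tex;\n"
    "varying vec2 v_uv;\n"
    "void main() {\n"
    "    mediump vec4 sum = texture2D(u_tex, v_uv);\n"
    "    gl_FragColor = sum;\n"
    "}\n";
```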

The OpenGL ES 2.0 spec only guarantees >= 16 bits of mantissa precision for highp. This translates to a minimum of 24-bit floating point (with the exponent + sign bits). Thus you always need to ensure that 24-bit float precision is enough for your vertex shader work (similar to ATI DX9 GPUs). The mediump mantissa is guaranteed to be >= 10 bits, so it is at least as good as the IEEE 16-bit float spec (10 bits mantissa, 1 bit sign, 5 bits exponent).

Guaranteed integer support in OpenGL ES 2.0 is also limited to 16 bits (the mantissa precision of the minimum highp format). Thus you need to ensure that integer code runs properly with 16-bit integers. This kind of code is easy to port to desktop DirectX (the min16int type is a perfect fit for that).
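If you'd rather not rely on those spec minimums, ES 2.0 lets you ask the driver what the device actually provides. A minimal sketch (assumes a current ES 2.0 context; context setup is omitted):

```c
/* Minimal sketch: query the real precision/range of highp and mediump on the
 * device via glGetShaderPrecisionFormat (OpenGL ES 2.0). If fragment highp is
 * unsupported, the query reports zero range and precision. */
#include <stdio.h>
#include <GLES2/gl2.h>

static void print_precision(const char *label, GLenum shader, GLenum type)
{
    GLint range[2];   /* log2 of the min/max representable magnitudes        */
    GLint precision;  /* log2 of the precision (mantissa bits for floats)    */
    glGetShaderPrecisionFormat(shader, type, range, &precision);
    printf("%s: range [-2^%d, 2^%d], %d bits of precision\n",
           label, range[0], range[1], precision);
}

int main(void)
{
    /* EGL context creation omitted for brevity. */
    print_precision("fragment highp float",   GL_FRAGMENT_SHADER, GL_HIGH_FLOAT);
    print_precision("fragment mediump float", GL_FRAGMENT_SHADER, GL_MEDIUM_FLOAT);
    print_precision("fragment highp int",     GL_FRAGMENT_SHADER, GL_HIGH_INT);
    return 0;
}
```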
 
I feel like this has been discussed before, but I'm not sure what people think Radeons were lacking with respect to the ability to distribute geometry prior to Vega.
AMD Vega marketing material tells us that Vega has improved geometry load balancing across multiple geometry engines. That already implies that something had to be improved :)
 
If you're referring to early DX9-era hardware: AMD went with FP24 for pixel shaders (vertex shaders were always FP32), while NVIDIA supported somewhat-less-abysmally-slow FP16 and abysmally slow FP32 for pixel shaders.
FP16 wasn't abysmally slow on the Nvidia 7000 series. Well-optimized PS3 FP16 pixel shader code fared pretty well against the Xbox 360's ATI GPU. There was a free FP16 normalize instruction :). Old Nvidia GPUs, however, sucked at code with branches, and the EDRAM gave the Xbox 360 a nice bandwidth boost.
 
FP16 wasn't abysmally slow on the Nvidia 7000 series. Well-optimized PS3 FP16 pixel shader code fared pretty well against the Xbox 360's ATI GPU. There was a free FP16 normalize instruction :). Old Nvidia GPUs, however, sucked at code with branches, and the EDRAM gave the Xbox 360 a nice bandwidth boost.
I assumed he was talking about the GeForce FX though... (given it was from the same era as R300, which was FP24)
 
I feel like this has been discussed before, but I'm not sure what people think Radeons were lacking with respect to the ability to distribute geometry prior to Vega.
Of course, I would love to learn more, but as far as I was told, geometry and the associated pixels stayed inside the shader engine to whose front end they were initially assigned (edit: with the caveat that there was a path for load balancing if required). In Tahiti (yes, an older example), wavefronts were generated from up to 16 primitives, but those had to reside in one screen tile (assigned to a shader engine). With Fermi, again AFAIK, geometry was (necessarily) redistributed after the tessellation stage, because each SM fetches vertices individually over a number of clock cycles.
 
I assumed he was talking about the GeForce FX though... (given it was from the same era as R300, which was FP24)
Agreed. The GeForce FX was awful. But Nvidia kept their FP16 + FP32 support for the 6000 and 7000 series as well; the 8000 series was their first FP32-only product. The PS3 used Nvidia's last gaming GPU with FP16 support.
 
AMD Vega marketing material tells us that Vega has improved geometry load balancing across multiple geometry engines. That already implies that something had to be improved :)
Maybe we'll see improved geometry load balancing when Primitive Shaders and the DSBR are on and working together. To me it looks like Primitive Shaders plus the DSBR are the optimized path for geometry load balancing, and right now we're running on unoptimized connections.
 
Of course, I would love to learn more, but as far as I was told, geometry and the associated pixels stayed inside the shader engine to whose front end they were initially assigned (edit: with the caveat that there was a path for load balancing if required). In Tahiti (yes, an older example), wavefronts were generated from up to 16 primitives, but those had to reside in one screen tile (assigned to a shader engine). With Fermi, again AFAIK, geometry was (necessarily) redistributed after the tessellation stage, because each SM fetches vertices individually over a number of clock cycles.
What you were told isn't how I would explain it. Ever since multiple shader engines have existed, geometry could always be processed on one shader engine and its output sent to another shader engine. This is a requirement, since when the original geometry is distributed to the shader engines, the triangle screen coverage isn't known yet. The load balancing you were told about happens more often than not with 4 shader engines. To make it clear to others, the wavefronts "generated from up to 16 primitives" are pixel waves.

Tonga was the first AMD chip that took tessellation factors into account when load balancing.
 
Well I can confirm that MSAA performance in Vega is broken at the moment.

I did some test runs in Deus Ex: Mankind Divided's internal benchmark @ 3440×1440, everything maxed out (really everything, just so there aren't any doubts), 105° FoV. Here are my average results using a Vega 64 in Power Save mode and a 10-core Ivy Bridge Xeon @ 2.9GHz:

No MSAA - 44.5 FPS
2x MSAA - 29.4 FPS
4x MSAA - 19.5 FPS
8x MSAA - 9.2 FPS

These performance hits are typical of SSAA or downsampling/VSR. Something's not working right.
Despite the game's constant warnings about VRAM usage with MSAA, it doesn't look like the game is topping the 8GB HBM2, as turning on HBCC with ~20GB allocated doesn't seem to make any difference.
I would avoid using MSAA with this card at the moment for anything but older/simpler games.


On the other hand, anisotropic filtering at 16x isn't taking any discernible performance hit over 8x, which is something that had been somewhat of a problem for prior GCN GPUs.
 
I may have missed it but did AMD directly comment on the MSAA issue and possible solution?
 
How is the image quality? Does it apply just MSAA or something more?

To be honest, in this game it's hard to tell. With everything cranked up to max, the post effects hide the aliasing very well and TAA eliminates the little that is left. The difference between MSAA off and 8x is pretty much negligible in those conditions.
I looked around for a bit in the game's last level, but I couldn't find any transparent texture by myself (which is where MSAA would be more noticeable).

This is the best that I could do by turning off TAA, DoF, bloom and lens flares:

No AA - [screenshot: JH9s9rP.jpg]

8x MSAA - [screenshot: d0TvBui.jpg]

I'll be happy to follow instructions on what to look for, or what graphics options to turn on/off to see this more clearly.
 
Looks like MSAA isn't applied at all. Does the game even support it?
Looking at this screen (t=190)
I would say it doesn't resolve the MSAA but simply pulls the first sample (look at the crane: it gets fainter and fainter the higher the sample count, because it essentially degrades to point sampling), rather than not supporting MSAA at all.
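For anyone wondering what "pulls the first sample" would look like in code, here's a minimal sketch (desktop GLSL 3.30 embedded as a C string, identifiers made up) contrasting a proper box-filter resolve with that degenerate behaviour:

```c
/* Minimal sketch of a custom MSAA resolve pass. A correct resolve averages
 * every sample of the multisampled texture; the commented-out line shows the
 * degenerate "take sample 0 only" behaviour described above, which throws the
 * extra samples away. Identifiers (u_msaa_tex, u_sample_count) are made up. */
static const char *resolve_fragment_src =
    "#version 330\n"
    "uniform sampler2DMS u_msaa_tex;\n"
    "uniform int u_sample_count;\n"
    "out vec4 o_color;\n"
    "void main() {\n"
    "    ivec2 p = ivec2(gl_FragCoord.xy);\n"
    "    vec4 sum = vec4(0.0);\n"
    "    for (int i = 0; i < u_sample_count; ++i)\n"
    "        sum += texelFetch(u_msaa_tex, p, i); // proper resolve: average all samples\n"
    "    o_color = sum / float(u_sample_count);\n"
    "    // Degenerate version that loses the AA (thin geometry like the crane\n"
    "    // then comes and goes depending on where sample 0 lands):\n"
    "    // o_color = texelFetch(u_msaa_tex, p, 0);\n"
    "}\n";
```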
 
Well I can confirm that MSAA performance in Vega is broken at the moment.

I did some test runs in Deus Ex: Mankind Divided's internal benchmark @ 3440×1440, everything maxed out (really everything, just so there aren't any doubts), 105° FoV. Here are my average results using a Vega 64 in Power Save mode and a 10-core Ivy Bridge Xeon @ 2.9GHz:

No MSAA - 44.5 FPS
2x MSAA - 29.4 FPS
4x MSAA - 19.5 FPS
8x MSAA - 9.2 FPS

I've been reading up on the basics of how MSAA works, and as far as I can understand this as a beginner, doesn't MSAA involve rasterizing at a higher resolution than the intended display resolution? In which case, wouldn't MSAA performance be limited by rasterization rate? Doesn't that mean that Vega's presently being limited to four triangles per clock would negatively impact MSAA performance?
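Out of curiosity, here's a rough back-of-the-envelope sketch using the numbers quoted above: at 3440×1440 the rasterizer has to produce coverage for pixels × samples each frame, so the measured FPS can be turned into an implied coverage-sample throughput. It deliberately ignores shading, bandwidth and depth/stencil costs, so treat it as an illustration of the question rather than an answer.

```c
/* Back-of-the-envelope sketch: coverage samples per second implied by the
 * Deus Ex benchmark figures quoted above at 3440x1440. This deliberately
 * ignores shading, bandwidth and depth/stencil work. */
#include <stdio.h>

int main(void)
{
    const double pixels  = 3440.0 * 1440.0;           /* ~4.95 Mpix per frame */
    const int samples[]  = { 1, 2, 4, 8 };            /* MSAA sample count    */
    const double fps[]   = { 44.5, 29.4, 19.5, 9.2 }; /* measured averages    */

    for (int i = 0; i < 4; ++i) {
        double gsamples_per_sec = pixels * samples[i] * fps[i] / 1e9;
        printf("%dx: %.2f Gsamples/s of coverage work\n",
               samples[i], gsamples_per_sec);
    }
    return 0;
}
```

The implied sample throughput rises from 1x to 4x and then roughly plateaus between 4x and 8x (~0.39 vs ~0.36 Gsamples/s), which would at least be consistent with some per-sample or per-triangle limit being hit, though this says nothing about which unit is actually the bottleneck.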
 