AMD Vega Hardware Reviews

So, Overclockers UK sold over a thousand Vega-based cards in under an hour (complete sell-out). WTF.

---

According to the white paper, the draw stream binning rasteriser puts its working set in L2 cache. That's the only useful thing I've learnt today.

Congratulations to Carsten (and Raffael) for producing useful-looking articles. Though there are rumours that major architectural features are still not activated in the driver, making me loath to pay much attention.
 
Does anyone know how much on-chip memory AMD's previous designs (Hawaii, Fiji and Ellesmere) have, to help put the 45 MB number in some perspective?

BTW, does Vega 10 have an Islands-themed codename, or is it truly the end of an era?
Vega is supposedly Greenland. Beyond that, though, I think this is it. Navi shouldn't have an Islands name.
 
Well, then that is where we differ in logic.
I think you are misunderstanding what I am saying. Like Malo and I posted earlier, FP16 will certainly make some algorithms go faster. I think blur filters were mentioned as a good candidate. But unless those algorithms were already taking up a bunch of render time in FP32, switching to FP16 cannot make much difference wrt the total frame render time.

In reality, most things in 3D rendering need ~FP32 for acceptable quality (FP24 would probably be okay for some things where FP16 is not, but it was decided long ago to make all the ALUs FP32 for the sake of simplicity and to allow pixel and vertex shaders to share the same hardware). Those things that can run at <FP32 with acceptable quality are in the minority. Remember, our GPUs used to have lower precision math units. They were removed/converted to FP32 because it didn't make sense to have them any more. FP16 only makes sense now because we've hit the power wall and it is fairly low cost to make an FP32 ALU that can run double rate FP16.
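For anyone who wants a concrete picture of what "double rate FP16" buys you: two IEEE-754 halfs fit in the same 32 bits as one single, which is the packing trick behind it. A minimal host-side numpy sketch of the idea (purely illustrative, not how the hardware actually wires its datapaths):

```python
import numpy as np

# Two IEEE-754 half floats occupy the same 32 bits as one single float.
# A suitably built FP32 ALU can therefore chew through FP16 values in
# pairs ("packed" / double-rate math). This is just a host-side picture
# of the packing, not real GPU code.
regs32 = np.zeros(4, dtype=np.uint32)        # four "32-bit registers"
halves = regs32.view(np.float16)             # ...seen as eight FP16 lanes

halves[:] = np.arange(8, dtype=np.float16)   # fill all eight FP16 lanes
halves *= np.float16(0.5)                    # one vector op touches 2x values

print(halves)                    # [0.  0.5 1.  1.5 2.  2.5 3.  3.5]
print(regs32.size, "x 32-bit words hold", halves.size, "FP16 values")
```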
 
FP16 offers enough accuracy for a lot of computations. Don't confuse FP16 with 16-bit color bitmaps; of course FP16 is not meant to be used for general-purpose color blending on render targets...
 
FP16 offers enough accuracy for a lot of computations. Don't confuse FP16 with 16-bit color bitmaps; of course FP16 is not meant to be used for general-purpose color blending on render targets...
The question is whether or not FP16-eligible computations are taking up a significant portion of render time in modern games. Remember, even if FP16 computations make up 20% of render time (this is quite generous IMO), a 2x speedup of FP16 operations would only increase FPS by 10%.
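A quick back-of-the-envelope check of that arithmetic, in Python for anyone who wants to plug in their own numbers (the 20% share is just the illustrative assumption above):

```python
# If a fraction p of the frame time is FP16-eligible work and that work
# gets an s-times speedup, the frame only shrinks by p * (1 - 1/s).
def frame_gain(p, s):
    new_frame = (1.0 - p) + p / s            # normalised frame time afterwards
    return 1.0 - new_frame, 1.0 / new_frame - 1.0   # time saved, FPS gain

saved, fps = frame_gain(p=0.20, s=2.0)
print(f"frame time saved: {saved:.0%}, FPS gain: {fps:.1%}")
# -> frame time saved: 10%, FPS gain: 11.1%
```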
 
I didn't see if this was already mentioned, but the Vega whitepaper upped the primitive rate to 17+ primitives per clock with primitive shaders.
 
Congratulations to Carsten (and Raffael) for producing useful-looking articles. Though there are rumours that major architectural features are still not activated in the driver, making me loath to pay much attention.
And not a peep from AMD as to whether any major architectural features were missing. Makes you kind of wonder whether there would be any difference in benchmarks, one way or the other.
 
The question is whether or not FP16-eligible computations are taking up a significant portion of render time in modern games.

We already know they are (look at some of sebbbi's and DICE's findings). A more important / interesting question is whether FP16 is worth supporting for PC developers in the immediate future (and unfortunately it's not just as simple as selecting FP16 for a variable in shader code, as some care is needed).
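To make the "some care is needed" point concrete, a small numpy illustration (not shader code) of the classic trap: half precision only carries about three decimal digits, so a naive FP16 accumulator drifts badly once the running total grows:

```python
import numpy as np

# Naively switching an accumulator to FP16: once the running total is large
# relative to the addend, FP16's coarse spacing swallows the additions.
samples = np.full(10_000, 0.1, dtype=np.float32)

acc32 = np.float32(0.0)
acc16 = np.float16(0.0)
for s in samples:
    acc32 = np.float32(acc32 + s)
    acc16 = np.float16(acc16 + np.float16(s))

print(acc32)   # ~1000, as expected
print(acc16)   # stalls at 256.0: FP16 spacing there (0.25) swallows each +0.1
```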
 
FP16 offers enough accuracy for a lot of computations. Don't confuse FP16 with 16-bit color bitmaps; of course FP16 is not meant to be used for general-purpose color blending on render targets...

What is "a lot" for you? Most computations need FP32 nowadays.
 
I didn't see if this was already mentioned, but the Vega whitepaper upped the primitive rate to 17+ primitives per clock with primitive shaders.
Yet we still have no idea how well that translates into actual usefulness in games. Since there is apparently no developer intervention involved for primitive shaders, their usefulness seems negligible when comparing Vega to Fiji.
 
In reality, most things in 3D rendering need ~FP32 for acceptable quality (FP24 would probably be okay for some things where FP16 is not, but it was decided long ago to make all the ALUs FP32 for the sake of simplicity and to allow pixel and vertex shaders to share the same hardware). Those things that can run at <FP32 with acceptable quality are in the minority. Remember, our GPUs used to have lower precision math units. They were removed/converted to FP32 because it didn't make sense to have them any more.

Yet we have sebbbi stating 70% of the pixel shaders in his games could be done using FP16, DICE claiming a 30% performance improvement from using it on Andromeda in the PS4 Pro version and VooFoo studios saying the dual-rate FP16 was one of the main drivers for them getting Mantis Burn to run at native 4K60 in the Pro.

It's obviously not the end-all for gaming scenarios and I guess most gamers out there would have preferred the area budget to have gone into e.g. more CUs/TMUs or even another shader engine, but you might be underestimating its potential.

I didn't see if this was already mentioned, but the Vega whitepaper upped the primitive rate to 17+ primitives per clock with primitive shaders.
From the original 11?
 
Specifically addressing FP16, I think there are far fewer use cases for it in graphics than you seem to believe. TINSTAAFL. And even in the cases where it is useful, unless it is an operation that is already taking a significant chunk of your frame time it wouldn't have a huge impact even if the speedup was ∞ (reduced time spent on that operation to 0).
HDR, physics, sound, shadows, etc. could all use it, so there is a significant amount of time involved. I'm not expecting 100% gains, but ~30% for an optimized game. Some of this stuff hasn't necessarily been heavily used in games yet, but these features are being actively discussed.

In the case of mobile, FP16 is used almost exclusively, so the feature will see use in multi-platform engines.

We should expect a minor boost from the use of FP16 math in games that end up supporting it on Vega, and be (happily) surprised if it turns out to be of great significance.
I'm envisioning using it for culling with primitive shaders. Either by a dev or automatically through a driver. Convert all positions and maybe normals to FP16 and store. Then run that math on future frames for the culling pass with twice as many vertices indexed and twice the math rate. DICE in a presentation stated around 30%, but that also included reduced register pressure.
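For what it's worth, the basic idea is easy to prototype on the CPU. A hedged numpy sketch (toy backface test, illustrative names, nothing from any real driver or engine): keep an FP16 copy of the per-triangle data for a conservative culling pass and leave the FP32 data for the actual shading, so FP16 rounding can never cull something visible:

```python
import numpy as np

# Toy conservative culling pass on FP16 data. The epsilon margin is far
# larger than FP16 rounding error on a unit-vector dot product, so the
# low-precision pass only culls triangles the FP32 test would also cull.
rng = np.random.default_rng(0)
normals = rng.normal(size=(100_000, 3)).astype(np.float32)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
normals_h = normals.astype(np.float16)          # half the storage/bandwidth

view = np.array([0.0, 0.0, 1.0], dtype=np.float32)
eps = 0.01                                      # margin >> FP16 rounding error

front32 = normals @ view < 0.0                  # exact front-facing test
keep16 = normals_h.astype(np.float32) @ view < eps   # conservative FP16 pass

print("culled by FP16 pass    :", np.count_nonzero(~keep16))
print("visible tris mis-culled:", np.count_nonzero(front32 & ~keep16))  # 0
```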

It is my understanding that AMD's competitor cannot execute it natively and uses FP16 emulation? So it is possible for "certain people" to dismiss the importance of technologies like Rapid Packed Math, while others instantly see how it can help in many future game titles and engines.
Not emulation so much as running it at FP32 at normal speed, possibly with FP16 inputs to save register space. Casting would be cheaper than true 1/16th (?) rate execution. So there is more to it than just the double-rate math.

Damn, what is happening in this topic with FP16... It's not a miracle, people; you won't double the compute performance. A lot of stuff needs FP32.
Not a miracle, but if it added 30% it would put Vega on par with the 1080 Ti. That's why it's of interest.
 
Yet we have sebbbi stating 70% of the pixel shaders in his games could be done using FP16, DICE claiming a 30% performance improvement from using it on Andromeda in the PS4 Pro version and VooFoo studios saying the dual-rate FP16 was one of the main drivers for them getting Mantis Burn to run at native 4K60 in the Pro.

It's obviously not the end-all for gaming scenarios and I guess most gamers out there would have preferred the area budget to have gone into e.g. more CUs/TMUs or even another shader engine, but you might be underestimating its potential.


From the original 11?
This is a good point as well. It's debatable whether FP16 acceleration would have made it into desktop GPUs if not for the whole GPGPU thing. That die space/engineering effort could have been used for other things that may have provided more benefit. But it's here now so I hope it is used as much as possible. I simply prefer to be pleasantly surprised rather than let down after the hype.
 
We already know they are (look at some of sebbbi and DICE's findings). A more important / interesting question is whether FP16 is worth supporting for PC developers in the immediate future (and it's not just as simple as selecting FP16 as a variable in shader code unfortunately as some care is needed).
I'm not saying sebbbi/DICE are wrong, but if FP16 is that big of a deal, then going to all-FP32 ALUs back in the DX10 era would have to be considered a colossal fuckup, right? And that decision was made 10+ years ago, when IQ was nowhere near what we have today.
 
The question is whether or not FP16-eligible computations are taking up a significant portion of render time in modern games. Remember, even if FP16 computations make up 20% of render time (this is quite generous IMO), a 2x speedup of FP16 operations would only reduce frame time by 10%.
fixed. Even "only 10%" is a lot of time.

What is "a lot" for you? Most computations need FP32 nowadays.
Everything that can be mapped with IEEE 754 half precision with enough accuracy into the [0,1] or [-1,1] ranges. And outside RGBA color mapping, there is a lot that produces values in those ranges: https://en.wikipedia.org/wiki/Half-...limitations_on_decimal_values_in_.5B0.2C_1.5D

Graphics engine developers are already fapping just thinking about the impact of FP16 on compute shaders.
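For a rough feel of what half precision actually gives you inside those ranges (numbers straight from numpy, nothing Vega-specific):

```python
import numpy as np

# IEEE-754 half precision: about 3 decimal digits near 1.0, finer toward 0.
# Plenty for normalised colors, normals and blend weights; not for large
# world-space coordinates or long accumulations.
h = np.finfo(np.float16)
print("smallest normal :", h.tiny)        # ~6.1e-05
print("epsilon at 1.0  :", h.eps)         # ~9.77e-04
print("decimal digits  :", h.precision)   # 3

for x in (0.001, 0.25, 0.5, 1.0):
    print(f"spacing near {x}: {np.spacing(np.float16(x))}")
```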
 
I'm not saying sebbbi/DICE are wrong, but if FP16 is that big of a deal, then going to all-FP32 ALUs back in the DX10 era would have to be considered a colossal fuckup, right? And that decision was made 10+ years ago, when IQ was nowhere near what we have today.

The original transition was to 32-bit position data and 24-bit pixel with optional half-precision, I think.
The realities of the state of the art and hardware have changed since then. There was no mandate that every ALU be 32 bits, but there were specific areas that would not tolerate reduced precision. This remains the case today.

What's changed is that there's more hardware slack to revisit adding more complex units that can handle more than one precision, or separate units while not compromising 32-bit throughput.
The more straightforward path of just adding more 32-bit hardware hit a wall with power, bandwidth, and density scaling faltering, and at the same time there's a deeper well of knowledge and accumulated improvements to software that allows for more complex management and analysis.

If GPUs were able to scale the 32-bit hardware as much as they wanted every generation without the consequences of today, it would be the path of least resistance.
 