It would be interesting to do a comparison when running both modes on Tonga, Fiji, and Polaris.
That would be the other interesting test: getting the difference between 16-bit processing (less pressure on registers, as I remember), 32-bit processing, and 2x16-bit.
So it is ~13% faster in a non-realistic benchmark.
So it is a nice to have but not a must have.
Throw in another few percent from using the ID buffer, and perhaps the 4.2 TF GPU ends up working like a 5 TF part of the same architecture. That closes the gap with the competition while being out earlier and costing less - all in all a sweet design win if so.
Did you mean 6 TF?
4.2 -> 6.0 is roughly a 40% lift; he's definitely not suggesting a 40% lift.
Oh thanks for the clarification. So 4.2TF performing like 5TF, and therefore 6TF performing like 5.2TF. That really does "closes the gap with the competition while being out earlier and costing less - all in all a sweet design win" as Shifty Geezer stated. A 0.2TF difference is minuscule compared to a gargantuan 0.7TF difference.
What, no. He implied a lift from 4.2 to 5, which is huge already: nearly a 20% lift. That's just a number in the sky as well. Because honestly, that's so massive that if two such features that are easy to implement can consistently provide that much lift, you're talking about nearly a shift in price/performance for a whole family of GPUs. Everyone would have it.
He did not imply that 6TF would drop to 5.2. Why would it do that? Why less?
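For anyone who wants to check the percentages being thrown around, here is a quick sketch. `percent_lift` is just an illustrative helper name, and the TF figures are the ones quoted in this thread, not measurements.

```python
def percent_lift(base_tf: float, effective_tf: float) -> float:
    """Relative gain, in percent, of effective_tf over base_tf."""
    return (effective_tf / base_tf - 1.0) * 100.0


# A 4.2 TF part "working like" a 5 TF part is a bit under a 20% lift.
print(f"4.2 -> 5.0: {percent_lift(4.2, 5.0):.1f}%")

# Reaching 6 TF-class results from 4.2 TF would need a lift of over 40%.
print(f"4.2 -> 6.0: {percent_lift(4.2, 6.0):.1f}%")
```

Note that the lift only applies to the part that has the extra features; the 6 TF figure for the other part stays 6 TF either way.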
Sorry, shouldn't you expect the same pie-in-the-sky deficit (definitely some degree of performance lost) doing it in software, opposing the pie-in-the-sky boost from doing it in hardware?
It doesn't work quite like that.
When we refer to FLOPS, 6TF means operating at 32-bit precision; this is the "default" unit for calculation. Rapid packed math fits two 16-bit values into one register and performs the same operation on both, so for those operations you're running at 2x. That's a gain in performance. It won't apply to everything, since a great deal of calculations require 32-bit. But for things that don't, if the ALU is the bottleneck, then rapid packed math would speed things up.
For the ID buffer, we have no way of knowing its performance benefit over software emulation. I've not seen the numbers. The performance improvement from the ID buffer will differ from title to title, depending on how the developer chooses to resolve reconstruction. So once again, unsure.
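To make the "fit two 16-bit values and do the same operation on both" idea concrete, here is a minimal CPU-side sketch in Python. It only emulates the data layout with the standard `struct` module (format code `e` is IEEE half precision); the names `pack_half2`, `unpack_half2` and `add_half2` are made up for illustration, and real hardware does the two-lane add in a single instruction rather than unpacking.

```python
import struct
from typing import Tuple


def pack_half2(lo: float, hi: float) -> int:
    """Pack two fp16 values into one 32-bit word (lo in bits 0-15)."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]


def unpack_half2(word: int) -> Tuple[float, float]:
    """Split a 32-bit word back into its two fp16 lanes."""
    return struct.unpack("<2e", struct.pack("<I", word))


def add_half2(a: int, b: int) -> int:
    """Emulate one packed add: the same operation applied to both lanes."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)


x = pack_half2(1.5, 2.0)
y = pack_half2(0.25, 4.0)
print(unpack_half2(add_half2(x, y)))  # two results from one 32-bit-wide operation
```

The point is just that one 32-bit register slot carries two values, so an ALU that can operate on both lanes at once gets twice the throughput for work that tolerates fp16.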
Why? Two products are identical. Then one adds a 20% boost. That doesn't cause the other to have a 20% penalty. I'm simply saying the double-pumped FP16 isn't massive in itself, but has the potential in cahoots with the ID buffer, another unknown quantity, to do a good job of enabling PS4Pro to punch above its weight. Which is interesting from a hardware and software engineering perspective, if a GPU can get a significant performance boost from a couple of 'free' extras.
I wouldn't say that "great deal of calculations require 32 bit". Some calculations require 32 bit, but 16 bit is fine for lots of graphics calculations. However, if a shader requires 32 bit in some places, it must add conversion instructions on both sides (convert f16->f32, calculate at full precision, convert f32->f16). These conversions add extra ALU cost. This reduces the potential ALU gains, and can even slow down the shader if 16 bit and 32 bit calculations aren't separated well enough.
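The conversion-overhead point can be illustrated with a deliberately crude cost model. Everything here is an assumption for the sake of the example: `alu_cycles` is a hypothetical helper, fp16 work is taken to run at exactly twice the fp32 rate, and each switch into and out of a 32-bit region is charged one conversion on each side. Real shader compilers and hardware will not match these numbers.

```python
def alu_cycles(fp16_ops: int, fp32_ops: int, fp32_regions: int,
               packed_rate: float = 2.0, convert_cost: float = 1.0) -> float:
    """Toy ALU cost: packed fp16 work, full-rate fp32 work, plus conversions.

    Each fp32 region pays one f16->f32 conversion going in and one
    f32->f16 conversion coming out (assumed cost: convert_cost each).
    """
    conversions = 2 * fp32_regions * convert_cost
    return fp16_ops / packed_rate + fp32_ops + conversions


baseline = 60 + 40  # the same 100 ops, all executed at fp32 rate

# fp32 work grouped into a single region: conversions are negligible.
print(alu_cycles(60, 40, fp32_regions=1), "vs", baseline)   # 72.0 vs 100

# fp32 work scattered across 20 small regions: conversions eat the gain.
print(alu_cycles(60, 40, fp32_regions=20), "vs", baseline)  # 110.0 vs 100
```

Under these made-up costs, well-separated code comes out ahead, while heavily interleaved 16-bit and 32-bit code ends up slower than just running everything at fp32.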
It may help on a console though, that the software (in theory) targets the hardware and not vice versa, as is the case on PC. You could have a PC card (the old X1950 XT) with too many math resources and they'll idle. On console, in theory, nothing needs to idle.
Yes, console software is able to take advantage of the hardware strengths and bottlenecks better, because you can target a single hardware configuration. But keep in mind that the base PS4 still exists and it doesn't have double rate fp16. If you start preferring extremely ALU heavy code that better suits a 2x fp16 machine, your performance could suffer on a 1x fp32 machine as a result. I wouldn't expect cross platform games to take full advantage of fp16 until all major PC IHVs support it. With Vega, the situation improves. Now both Intel and AMD support fp16 in their GPUs. If Volta brings double rate fp16 to desktop, we can expect even more fp16 support in games. This of course would also help PS4 Pro in cross platform games.
It will probably have more impact on next generation consoles.
Definitely. But obviously there are some 1st party studios who are going to make really good use of it on PS4 Pro. The latest Frostbite presentation also showed some gains, so fp16 adoption is definitely under way. The Vega launch will help adoption, since developers can now also optimize on their development workstations and see gains immediately.
"There are different ways to do checkerboarding as well," adds Giliam, who told us that they 'rolled their own' solution as opposed to using Sony's reference model. "You can have more information per pixel, or less information per pixel when rendering checkerboarding and depending on how much information you have, you can go for different checkerboard resolve techniques. We came up with one that doesn't need a lot of extra data at the per-pixel level and that gave us some performance boosts as well in the rendering of the whole geometry and the lighting pass."
Too bad they didn't ask a question about the ID buffer or FP16 and how they used them in the Pro version. I am very surprised they didn't, actually. It sounds like an odd omission and makes it an incomplete interview.
On HZD Checkerboarding: http://www.eurogamer.net/amp/digita...zero-dawn-the-making-of-ps4-pros-best-4k-game
Sounds like how Sebbbi would describe using his own VT system over Tiled Resources. I assume Sony's reference model includes the usage of ID Buffer.