Native FP16 support in GPU architectures

xpea

Regular
Supporter
We get that you're primarily on Beyond3D to promote Tegra but you don't need to mention them in every single post you make, especially in every single non-NV thread you participate in, beating the same point-free drum over and over again.
Hmmm, if it's "point-free", can you state in front of everybody here that img tech will never ever use a full fp32 pipeline in future mobile IP? And that you will keep fp16 forever?
 

Lazy8s

Veteran
For those shaders and operations that specify FP32 precision, the performance from PowerVR's FP32 ALUs puts them at the top of many benchmarks and gaming results.

The fact that ImgTec figured out how to increase performance in existing applications by adding significant FP16 resources as well just means they paid attention to the real demands of real applications and added extra optimizations for it.
 

ams

Regular
Look, we've had this discussion already. Desktop, notebook, and eventually console GPU architectures moved away from mixed pixel shader rendering precision many years ago (since G80 times) because they were architected in a way such that there were no performance advantages in using lower pixel shader rendering precision. Mobile GPU architectures could be architected the same way too, but in ImgTech's case they haven't yet made FP32 ALUs an equal-class citizen because they tend to include substantially more FP16 ALUs than FP32 ALUs even in top-of-the-line designs such as GX6650. Obviously they have their own reasons for doing that, and not everyone agrees with that approach (including Anandtech), but it is what it is.
 

Ailuros

Epsilon plus three
Legend
Supporter
Hmmm, if it's "point-free", can you state in front of everybody here that img tech will never ever use a full fp32 pipeline in future mobile IP? And that you will keep fp16 forever?

Would NVIDIA guarantee that they will in ALL future GPU generations keep dedicated FP64 units? Under normal conditions these things get analyzed on a per generation basis and the coin flips in that direction where the most benefits lie.

Did NV ever promise on the other hand that hotclocking ALUs is the "be all end all" solution from here to eternity? Kepler and onwards send their greetings.
 

Ailuros

Epsilon plus three
Legend
Supporter
Look, we've had this discussion already. Desktop, notebook, and eventually console GPU architectures moved away from mixed pixel shader rendering precision many years ago (since G80 times) because they were architected in a way such that there were no performance advantages in using lower pixel shader rendering precision.

ULP SoCs have absolutely NOTHING in common with any of those. None of them is limited to less than a handful of watts of peak power consumption.

Mobile GPU architectures could be architected the same way too, but in ImgTech's case they haven't yet made FP32 ALUs an equal-class citizen because they tend to include substantially more FP16 ALUs than FP32 ALUs even in top-of-the-line designs such as GX6650. Obviously they have their own reasons for doing that, and not everyone agrees with that approach (including Anandtech), but it is what it is.
As long as perf/mm2 is still in their favor I don't think it's even worth debating. Again, for you: an FP64 unit at 1GHz under 28LP costs 0.025mm2 for synthesis alone, and since FP16 SPs are obviously a LOT smaller, the total die area needed for a 6 cluster config with 768 such SPs is still a single-digit value. As for Anandtech, you're referring to one author's opinion, while mobile developers (and not just one) have stressed more than once how "unimportant" FP16 really is in the ULP world.
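The area argument above is plain arithmetic, so here is a rough sketch of it. The FP64 unit area and the SP count are taken from the post; the 10 mm2 reading of "single digit" and all variable names are assumptions made for illustration. The sketch deliberately uses the FP64 unit's area as a pessimistic stand-in for one FP16 SP, then works out how small an FP16 SP would have to be for the total to stay in single digits.

```python
# Figures quoted in the post.
FP64_UNIT_AREA_MM2 = 0.025   # one FP64 unit at 1 GHz on 28LP, synthesis only
FP16_SP_COUNT = 768          # FP16 SPs in a 6-cluster configuration

# Assumption for illustration: "single digit" means under 10 mm2.
SINGLE_DIGIT_LIMIT_MM2 = 10.0

# Pessimistic bound: pretend every FP16 SP is as large as an FP64 unit.
pessimistic_total_mm2 = FP16_SP_COUNT * FP64_UNIT_AREA_MM2

# Largest per-SP area that still keeps the total under the limit.
max_area_per_fp16_sp_mm2 = SINGLE_DIGIT_LIMIT_MM2 / FP16_SP_COUNT

# The same budget expressed as a fraction of the FP64 unit's area.
required_ratio = max_area_per_fp16_sp_mm2 / FP64_UNIT_AREA_MM2

print(f"pessimistic total: {pessimistic_total_mm2:.1f} mm2")
print(f"per-SP budget:     {max_area_per_fp16_sp_mm2:.4f} mm2")
print(f"fraction of FP64:  {required_ratio:.2f}")
```

Under these assumptions the claim only needs an FP16 SP to be about half the size of the quoted FP64 unit or smaller; the sketch says nothing about what the real area is.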

Dedicated FP16 ALUs guarantee them more perf/mm2 and perf/mW; otherwise it would obviously have been a flawed design decision.

Series5/SGX supported FP32 and FP16 from the same ALUs; now they're separate (in a relative sense, since you can't use them in parallel). Big deal. Yes, they have more FP16 than FP32 ALUs because they're dirt cheap.
 

xpea

Regular
Supporter
Would NVIDIA guarantee that they will in ALL future GPU generations keep dedicated FP64 units? Under normal conditions these things get analyzed on a per generation basis and the coin flips in that direction where the most benefits lie.

Did NV ever promise on the other hand that hotclocking ALUs is the "be all end all" solution from here to eternity? Kepler and onwards send their greetings.
Thanks for proving my point. GPU vendors tell us what is best for their business, nothing more, nothing less. For example, whatever NV said, they were guilty of using a low-precision fp pipeline in their mobile archs prior to K1. It was a flaw, and many people here said so, including myself.
So what img tech says about this issue is of no interest; fp32 has been the de facto standard in 3D rendering for many, many years. We all know why img tech uses fp16 (power consumption and silicon size), but the fact is that NV is now using fp32, and when we compare these 2 vendors we should never forget that it's not an apples vs apples comparison...
 

Ailuros

Epsilon plus three
Legend
Supporter
Thanks for proving my point. GPU vendors tell us what is best for their business, nothing more, nothing less.

No, it is NOT a business matter but an efficiency matter, since engineering at any IHV obviously doesn't have anything to do with marketing. Any design decision is deemed by engineers to be the best possible solution for each timeframe and the projected needs. That does NOT mean that all of them, at all times and from all IHVs, are the correct design decisions, but then again there's no perfection anywhere either.

If a GPU architecture were "perfect", they'd stop development for further efficiency increases.

Again since you think you're reading the same thing but in essence aren't: why did NVIDIA since Kepler use dedicated FP64 units?

For example, whatever NV said, they were guilty of using a low-precision fp pipeline in their mobile archs prior to K1. It was a flaw, and many people here said so, including myself.
It was neither a crime nor taboo, since they weren't alone in that design decision. In fact ARM limiting its first generation Mali GPU IP to just FP16 for pixel shaders was actually worse. The crime (if you're actually looking for one) was that NVIDIA, up to Tegra3, purposely used 16-bit-only Z precision in order to gain performance, which caused occasional artifacts such as Z aliasing.

So what img tech says about this issue is of no interest; fp32 has been the de facto standard in 3D rendering for many, many years (TNT anyone?). We all know why img tech uses fp16 (power consumption and silicon size), but the fact is that NV is now using fp32, and when we compare these 2 vendors we should never forget that it's not an apples vs apples comparison...
You still don't seem to want to understand the entire thing; there are FP16 and FP32 ALUs available on Rogues, only this time in dedicated units and not from the same ALUs as in former generations. NVIDIA also uses half floats, and not just a few, in just about any mobile application. It's only an integration issue, and it's not that NV uses FP32 everywhere. If you still can't understand it then I'm afraid I can't help you any further.
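To make the 16-bit Z point in this post concrete, here is a minimal sketch of how depth values collapse at low precision. It assumes a standard perspective depth mapping and an unsigned fixed-point depth buffer; the clip planes, the surface distances and the function names are hypothetical, and this is not a claim about how any particular Tegra part stored depth.

```python
def window_depth(z_eye, near, far):
    """Map an eye-space distance to [0, 1] depth under a perspective projection."""
    return (1.0 / near - 1.0 / z_eye) / (1.0 / near - 1.0 / far)


def quantize(depth, bits):
    """Store a [0, 1] depth as an unsigned fixed-point integer of the given width."""
    return round(depth * ((1 << bits) - 1))


NEAR, FAR = 0.1, 1000.0       # hypothetical clip planes
FRONT, BACK = 100.0, 100.5    # two surfaces half a unit apart

for bits in (16, 24):
    a = quantize(window_depth(FRONT, NEAR, FAR), bits)
    b = quantize(window_depth(BACK, NEAR, FAR), bits)
    verdict = "collide (Z fighting)" if a == b else "stay distinct"
    print(f"{bits}-bit depth: {a} vs {b} -> {verdict}")
```

With these numbers the two surfaces land on the same 16-bit depth code but on different 24-bit codes, which is the kind of Z aliasing the post describes. Pulling the near plane out or the far plane in changes where the collisions start.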
 

Lazy8s

Veteran
Tegra K1 does seem the most immediately relevant point of comparison to A8X, so I see nothing wrong with it continually being brought up. Having two products and their respective SoCs with so much competitive overlap launching at the same time, the Nexus 9 with Denver K1 versus the Air 2 with the A8X, is too rare and exciting of an opportunity to pass up for comparing architectures!

With such a confined thickness dimension (and the compromises to thermals as well as battery capacity that go with that), I can only imagine this will be one of the hottest running and shortest battery lived iPads so far when the real results come out, but I also imagine that bad by iPad standards will still outclass the kind of very hot and power hungry results Tegra K1 has been turning in so far (the A15 version, at least).

The argument against FP16 ALUs was lost from the start, however, as the improvement in game performance and benchmark results from a Series 6XT part (boosted in no small part by said FP16 ALUs) versus an otherwise similar Series 6 part with the same number of FP32 ALUs proved to be very real. FP32 ops performance obviously hasn't been much of a limiting factor.

If the argument is that GPU designers who support a more flexible range of precisions that more optimally target the workloads of real apps are somehow holding app designers and the industry back, then... well, that's a ridiculous argument. By that logic, nVidia's lower competitive performance, efficiency, whatever is holding app designers back. Their inability to provide unconditional color blending accuracy and floating point Z precision like PowerVR does with its tile buffer is holding the industry back. Their architecture's inability to cope with overdraw as well as PowerVR is holding things back, etc. That kind of argument is ridiculous.
 

Entropy

Veteran
Look, we've had this discussion already. Desktop, notebook, and eventually console GPU architectures moved away from mixed pixel shader rendering precision many years ago (since G80 times) because they were architected in a way such that there were no performance advantages in using lower pixel shader rendering precision. Mobile GPU architectures could be architected the same way too, but in ImgTech's case they haven't yet made FP32 ALUs an equal-class citizen because they tend to include substantially more FP16 ALUs than FP32 ALUs even in top-of-the-line designs such as GX6650. Obviously they have their own reasons for doing that, and not everyone agrees with that approach (including Anandtech), but it is what it is.

You have got it backwards. It is not a case of FP32 catching up to FP16, it is FP16 capabilities being enhanced. It is happening in PC graphics as well; witness AMD's Tonga and where Intel is going since Gen8.

The reasons have been given before. All else being equal, FP16 operations take less power and require less internal (and external) bandwidth, and the hardware takes much less die area for a given level of performance, which lowers cost and improves yield, which lowers cost again. Alternatively, for a given budget of die space and power draw, FP16 yields much better performance. Routinely using a compact numerical representation and only using larger formats when actually needed simply makes sense. Why waste limited resources?
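The bandwidth part of that argument is easy to put numbers on. The sketch below only counts the bytes needed to store one render target at each precision; the 1920x1080 RGBA target and the function name are assumptions for illustration, and it says nothing about ALU power or die area.

```python
def render_target_bytes(width, height, channels, bytes_per_channel):
    """Bytes needed to hold one uncompressed render target."""
    return width * height * channels * bytes_per_channel


# Hypothetical target: 1080p, four channels (RGBA).
WIDTH, HEIGHT, CHANNELS = 1920, 1080, 4

fp16_bytes = render_target_bytes(WIDTH, HEIGHT, CHANNELS, 2)  # 2 bytes per FP16 value
fp32_bytes = render_target_bytes(WIDTH, HEIGHT, CHANNELS, 4)  # 4 bytes per FP32 value

print(f"FP16 target: {fp16_bytes / 2**20:.1f} MiB")
print(f"FP32 target: {fp32_bytes / 2**20:.1f} MiB")
print(f"ratio: {fp32_bytes / fp16_bytes:.1f}x")
```

Every read or write of such a target moves half as many bytes in FP16, and the same halving applies to register file and on-chip buffer footprint for any value kept at the lower precision.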

I would contend, and recent developments in PC graphics space agree, that rather than mobile graphics slavishly following in the footsteps of designs targeting high-end desktop PC/HPC, PC graphics will actually be more influenced by mobile solutions. Personal computing is moving to higher pixel densities (making small errors perceptually irrelevant) and laptops are moving towards lighter designs with longer battery lives, increasing demands on power efficiency. So rather than mobile losing their constraints and being more enthusiast desktop like (SLI! Crossfire! 1200W PSUs!) which is a ridiculous notion, what is actually happening is that the bulk of personal computing is moving towards mobile constraints.
(Indeed, many who aren't emotionally rooted in PC space would contend that mobile is where the bulk of personal computing takes place these days. Windows PCs have become a (large) computing niche.)

If we project forward, these trends don't seem likely to turn around. New silicon tech is unlikely to make compromises unnecessary, rather the lithographic challenges going forward are increasing. If you want development to move forward, regardless of whether you are a tech hungry consumer, or a manufacturer who needs new stuff to sell, being ever more intelligent about how you use available resources seems like a very good idea.
 

xpea

Regular
Supporter
No, it is NOT a business matter but an efficiency matter, since engineering at any IHV obviously doesn't have anything to do with marketing. Any design decision is deemed by engineers to be the best possible solution for each timeframe and the projected needs. That does NOT mean that all of them, at all times and from all IHVs, are the correct design decisions, but then again there's no perfection anywhere either.
Have you ever worked for a GPU vendor? I did, and I can tell you as a fact that marketing is as important as (sometimes even more important than) engineering. New feature requests come from management, sales and marketing teams; then engineers are in charge of making them real, not the other way around...

Again since you think you're reading the same thing but in essence aren't: why did NVIDIA since Kepler use dedicated FP64 units?
Yeah, and why not fp128? Ooh, and what about fp256? :rolleyes:
Sarcasm aside, fp16 can cause artifacts, and that's the reason why the industry moved away from it years ago. And fp64 is useless in 3D rendering but mandatory for some HPC loads. It's all about the right precision for the right job.

You still don't seem to want to understand the entire thing; there are FP16 and FP32 ALUs available on Rogues, only this time in dedicated units and not from the same ALUs as in former generations. NVIDIA also uses half floats, and not just a few, in just about any mobile application. It's only an integration issue, and it's not that NV uses FP32 everywhere. If you still can't understand it then I'm afraid I can't help you any further.
What you can't understand is progress and competition. You can't ignore it. IHVs must push features/perf/consumption to stay alive. So what was good enough a few months ago may no longer be good enough now.
NV went to a full fp32 pipeline this generation because they use the same arch from mobile to HPC. This is a massive accomplishment, and the competition can't stand by watching without reacting. The fact is that one vendor now has a full fp32 pipeline and img tech is behind with its mixed implementation.
And finally, I'll bet whatever you want that img tech will also have a full fp32 pipeline soon (no later than 8 series). So, wanna bet?
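The "right precision for the right job" remark in this post can be illustrated with a small round-trip test. This sketch uses Python's `struct` half-float format to round values to IEEE 754 binary16; the texture sizes and helper names are hypothetical, and real shaders may apply precision differently from this simple store-and-reload model.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def surviving_texel_centres(width):
    """Count how many texel-centre coordinates stay distinct after FP16 rounding."""
    return len({to_half((i + 0.5) / width) for i in range(width)})


# Hypothetical texture widths: one small, one large.
for width in (256, 4096):
    kept = surviving_texel_centres(width)
    print(f"{width}-wide texture: {kept} of {width} texel centres stay distinct")

# FP16 has an 11-bit significand, so not every integer above 2048 is representable.
print("2049.0 stored as FP16 reads back as", to_half(2049.0))
```

For the 256-wide case every coordinate survives, while the 4096-wide case loses some, so whether FP16 produces artifacts depends on what the value is used for rather than on the format alone.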
 

xpea

Regular
Supporter
If the argument is that GPU designers who support a more flexible range of precisions that more optimally target the workloads of real apps are somehow holding app designers and the industry back, then... well, that's a ridiculous argument. By that logic, nVidia's lower competitive performance, efficiency, whatever is holding app designers back. Their inability to provide unconditional color blending accuracy and floating point Z precision like PowerVR does with its tile buffer is holding the industry back. Their architecture's inability to cope with overdraw as well as PowerVR is holding things back, etc. That kind of argument is ridiculous.
I agree with the first part of the post, but this quoted last part is highly biased.
I can also say PowerVR's lack of tessellation is holding the industry back, ditto for the lack of a geometry engine; the lack of OpenGL 4.4 support also holds the industry back, and so on :rolleyes:
 

xpea

Regular
Supporter
xpea, please.
This is not the PC graphics card market of years gone by.
But that's exactly what people don't understand!
img tech was very lucky to find refuge in the mobile market when they were kicked out of the desktop. They survived, and that's a good thing. But today the situation is different, with Intel, AMD and NV entering mobile with strong ambitions.
Developing a new arch comes with an exponential cost increase, and spreading this cost across different markets is the way to go. In other words, img tech can't survive with only mobile in mind. They are a small company compared to the others, with a fraction of the R&D resources. NV's strategy, i.e. using the same arch that can go to mobile/laptop/desktop/HPC, is what they will do from now on. It means that img tech will face competition that brings a much higher feature set than was traditionally required in this market (full fp32, tessellation, geometry engine, etc).
And I'm 200% sure that they know it, and hopefully they are prepared for this change of landscape. If not...
 

Ailuros

Epsilon plus three
Legend
Supporter
Have you ever worked for a GPU vendor? I did, and I can tell you as a fact that marketing is as important as (sometimes even more important than) engineering. New feature requests come from management, sales and marketing teams; then engineers are in charge of making them real, not the other way around...

For someone that claims to have worked for a GPU vendor you're quite shortsighted and tragically ill informed on the specific topic.

Yeah, and why not fp128? Ooh, and what about fp256? :rolleyes:
Sarcasm aside, fp16 can cause artifacts, and that's the reason why the industry moved away from it years ago. And fp64 is useless in 3D rendering but mandatory for some HPC loads. It's all about the right precision for the right job.
Did anyone state, imply or prove here that a Rogue is using FP16 ALUs for tasks that should actually be handled with FP32 or higher precision? Obviously not. Spare me the above nonsense if you're actually willing to be taken as someone that has even the slightest clue.


What you can't understand is progress and competition. You can't ignore it. IHVs must push features/perf/consumption to stay alive. So what was good enough a few months ago may no longer be good enough now.
That's what everyone does, but that doesn't change the fact that you still have the wrong picture here.

NV went to a full fp32 pipeline this generation because they use the same arch from mobile to HPC. This is a massive accomplishment, and the competition can't stand by watching without reacting. The fact is that one vendor now has a full fp32 pipeline and img tech is behind with its mixed implementation.
Oh sweet Jesus... NV simply used a readily available desktop design, that of Kepler, and made a few rough adjustments for GK20A. Them using dedicated FP64 units to save power at the cost of die area is a lesson learned from the ULP mobile space and not the other way around. Maxwell takes things one step closer still to the ULP philosophy.

Again, Rogues support both FP32 and FP16 values, just from different ALUs; there's no "behind" nor any half-assed pipeline, only your rather clueless interpretation of the hw itself.

And finally, I'll bet whatever you want that img tech will also have a full fp32 pipeline soon (no later than 8 series). So, wanna bet?
If it made sense we'd have FP64-only ALUs by now; we don't, because to each his own.

Other than that I'm not placing any bets you're likeliest to lose.
 

Ailuros

Epsilon plus three
Legend
Supporter
But that's exactly what people don't understand!
img tech was very lucky to find refuge in the mobile market when they were kicked out of the desktop. They survived, and that's a good thing. But today the situation is different, with Intel, AMD and NV entering mobile with strong ambitions.

Intellectual property has advantages in a market like this. I'll let you figure out why ARM is so successful in the specific market and Intel isn't for CPUs.

Developing a new arch comes with an exponential cost increase, and spreading this cost across different markets is the way to go. In other words, img tech can't survive with only mobile in mind. They are a small company compared to the others, with a fraction of the R&D resources. NV's strategy, i.e. using the same arch that can go to mobile/laptop/desktop/HPC, is what they will do from now on. It means that img tech will face competition that brings a much higher feature set than was traditionally required in this market (full fp32, tessellation, geometry engine, etc).
And I'm 200% sure that they know it, and hopefully they are prepared for this change of landscape. If not...
DX11 and higher is obviously on their immediate roadmap; this, however, has absolutely NOTHING to do with having dedicated ALUs for each precision type. They're full FP32 already.

Can you find the following extensions in NVIDIA's K1 drivers?

GL_EXT_color_buffer_half_float
GL_OES_texture_half_float
GL_OES_texture_half_float_linear
GL_OES_vertex_half_float

Now take a deep breath and show where and why exactly they're used.
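For anyone who wants to check the list above against a device, here is a small sketch of the comparison. It assumes you already have the driver's space-separated extension string (in OpenGL ES this is what `glGetString(GL_EXTENSIONS)` returns); the sample string below is made up for illustration and is NOT a real K1 extension list, and the function name is hypothetical.

```python
HALF_FLOAT_EXTENSIONS = [
    "GL_EXT_color_buffer_half_float",
    "GL_OES_texture_half_float",
    "GL_OES_texture_half_float_linear",
    "GL_OES_vertex_half_float",
]


def missing_extensions(extension_string, required):
    """Return the required extension names not advertised in the driver string."""
    advertised = set(extension_string.split())
    return [name for name in required if name not in advertised]


# Hypothetical driver output for illustration only, not taken from any real device.
sample_extension_string = (
    "GL_OES_texture_half_float GL_OES_vertex_half_float GL_OES_depth24"
)

for name in missing_extensions(sample_extension_string, HALF_FLOAT_EXTENSIONS):
    print("not advertised:", name)
```

Splitting on whitespace and comparing whole names avoids the classic substring mistake, where `GL_OES_texture_half_float` would falsely match inside `GL_OES_texture_half_float_linear`.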
 

Pottsey

Newcomer
But that's exactly what people don't understand!
img tech was very lucky to find refuge in the mobile market when they were kicked out of the desktop. They survived, and that's a good thing. But today the situation is different, with Intel, AMD and NV entering mobile with strong ambitions.
Developing a new arch comes with an exponential cost increase, and spreading this cost across different markets is the way to go. In other words, img tech can't survive with only mobile in mind. They are a small company compared to the others, with a fraction of the R&D resources. NV's strategy, i.e. using the same arch that can go to mobile/laptop/desktop/HPC, is what they will do from now on. It means that img tech will face competition that brings a much higher feature set than was traditionally required in this market (full fp32, tessellation, geometry engine, etc).
And I'm 200% sure that they know it, and hopefully they are prepared for this change of landscape. If not...
I wonder more whether Arm, Intel, AMD and NV are prepared for the change in landscape once mobile real-time ray tracing arrives. It's getting closer by the day: the Unreal 4.5 engine now supports it and Unity is adding support, so the only thing missing is the mobile GPU, and that is getting closer. Personally I believe Wizard will come out when 16nm arrives; then we might truly see a change in the landscape. It will be kind of crazy to have mobile games with better lights and shadows than desktop games on high-end GPUs.

IMG also has a lot of tech in the pipeline that is outside the target markets of Intel, AMD and NV. IMG should see substantial growth in the next 5 years. What happens if Apple does ray tracing on 16nm, making it a must-have feature? How will Arm, Intel, AMD and NV be placed in the mobile GPU market then?
 

xpea

Regular
Supporter
For someone that claims to have worked for a GPU vendor you're quite shortsighted and tragically ill informed on the specific topic.
LOL I stopped here. You have no clue of what you are talking about.
I will just keep this in a corner of my bookmarks and will bring it back a few months later :cool:
 

Ailuros

Epsilon plus three
Legend
Supporter
LOL I stopped here. You have no clue of what you are talking about.
I will just keep this in a corner of my bookmarks and will bring it back a few months later :cool:

You know folks can actually read and judge for themselves around here. Just because you stayed out of any arguments and haven't been able to even address ONE point so far, it doesn't necessarily mean that I'm really as clueless after all.
 

tangey

Veteran
At what point does an admin decide to move discussions that have strayed far from the title to another place?
 

Xmas

Porous
Veteran
Supporter
Look, we've had this discussion already. Desktop, notebook, and eventually console GPU architectures moved away from mixed pixel shader rendering precision many years ago (since G80 times) because they were architected in a way such that there were no performance advantages in using lower pixel shader rendering precision.
"They moved away from it because... they moved away from it."

Mobile GPU architectures could be architected the same way too, but in ImgTech's case they haven't yet made FP32 ALUs an equal-class citizen because they tend to include substantially more FP16 ALUs than FP32 ALUs even in top-of-the-line designs such as GX6650. Obviously they have their own reasons for doing that, and not everyone agrees with that approach (including Anandtech), but it is what it is.
Why "yet"? G6400/G6200 have no FP16 ALUs. All operations are available in FP32 precision, and I would bet that the FP16 ALUs still take less area than the FP32 ALUs for Series 6XT.

So what img tech says about this issue is of no interest; fp32 has been the de facto standard in 3D rendering for many, many years. We all know why img tech uses fp16 (power consumption and silicon size), but the fact is that NV is now using fp32, and when we compare these 2 vendors we should never forget that it's not an apples vs apples comparison...
If it's not an apples to apples comparison then the application you're running probably has a bug. It's output that matters. If FP16 is sufficient precision, then the comparison is absolutely apples to apples.
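The "it's output that matters" point can be checked for the simplest case, a final colour written to an 8-bit-per-channel framebuffer. This is a minimal sketch under that assumption: it rounds each colour value to IEEE 754 binary16 via Python's `struct` module and compares the 8-bit result with the full-precision path. The helper names are hypothetical, and it covers only a single final store, not long chains of intermediate FP16 math.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_unorm8(x):
    """Convert a [0, 1] colour value to an 8-bit framebuffer code."""
    return int(x * 255.0 + 0.5)


# Compare the 8-bit output of the FP16 path with the full-precision path
# for every colour level an 8-bit channel can hold.
mismatches = [
    level
    for level in range(256)
    if to_unorm8(to_half(level / 255.0)) != to_unorm8(level / 255.0)
]

print("levels that differ after FP16 rounding:", len(mismatches))
```

In this case the FP16 rounding error is far smaller than one 8-bit step, so both paths write the same framebuffer value; that is the sense in which the comparison is apples to apples when FP16 is sufficient.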
 