Nvidia Pascal Announcement

So you do not think they can tweak the fp32 core in the P100 (Tesla in general) back to being 1:1 for FP16 when released for consumers?
No, I think that's just what they did. The 1:1 ops still work as they used to. Only the new vector ops don't.
 
Not sure if it's been posted already down thread, but all other GP10x Pascals have 128 ALUs per SM, versus 64 in GP100.
Yeah, I mentioned that in the past, and the Tesla also has a larger GPU register file. It was a shame, and it will be interesting to see what happens with the GP102 die that is being shared across Tesla/Quadro/Titan/Ti.
Maybe this will be more comparable to what NVIDIA did with Kepler, in that these dies are the only ones sort of comparable to the upper Tesla range.
Cheers
 
You're mistaken on this. The FP16 instructions in Nvidia's implementation use dedicated FMA units that consume a serious amount of transistors and die area. Just like FP64: the big xx0 chip has full rate units, but all the xxy chips have just enough units to allow software to run.
That is very much at odds with what NVIDIA has told me. The whole fundamental premise is to use vec2 packing to get twice the FP16 rate out of the FP32 units.
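
As a rough illustration of that vec2 packing idea (a minimal sketch of my own, not anything NVIDIA has published; the kernel and names are hypothetical): two FP16 values ride in one 32-bit half2 register, and a single packed instruction operates on both lanes at once.

```cuda
#include <cuda_fp16.h>

// Minimal sketch: y = a*x + y over packed FP16 pairs. One __hfma2 performs
// two FP16 FMAs per instruction, which is where the "2x the FP32 rate" comes
// from on parts that implement it at full rate (GP100, Tegra X1).
__global__ void saxpy_half2(int n_pairs, half2 a, const half2* x, half2* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pairs)
        y[i] = __hfma2(a, x[i], y[i]);
}
```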
 
I've seen several posts here by sebbbi and whatnot claiming that FP16 would be beneficial to graphics tasks as well. The thing is not that they didn't make FP16 2x as fast as FP32, it's that they made it 64x slower than FP32. What in the name of fuck is that. Seriously.
Nobody is saying that FP16 isn't useful, we all know it is, but if discrete GPUs were able to survive for over a decade without it, it's hard to make the case that it was necessary. It's not as if adding this feature to today's consumer GPUs would add a lot of marketing benefit.

At least if FP16 had been the same rate as FP32, one could have benefited from data storage gains even if there were no performance gains, but this is ridiculous. We can only hope AMD rides to the rescue here.
Yes, it'd be a great addition, especially for deep learning. So let's make a separate SKU for that and use it to increase the bottom line. ;)

But if it doesn't improve the rate over FP32, then there is no reason to support FP16. So, FP16 support in the consumer version of Pascal is a joke. "Deep learning only / machine learning only" is just an excuse.
I don't think I'm following you here. FP16 support that doesn't improve on FP32 (but keeps it at the same rate) is most definitely a useful feature to have. In fact, NervanaSys added FP16 support to their deep learning library for Maxwell, despite the fact that there are FP16-FP32 conversion overheads.
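
For reference, the conversion-overhead approach mentioned there looks roughly like this (a hypothetical sketch, not NervanaSys's actual code): data is stored as FP16 to save bandwidth and memory, but converted to FP32 in registers for the math.

```cuda
#include <cuda_fp16.h>

// Hypothetical "storage-only" FP16 pattern: halves in memory, FP32 math in
// registers. The two conversions per element are the overhead referred to
// above; on Maxwell there is no native FP16 arithmetic, so this is the only
// option.
__global__ void scale_fp16_storage(int n, float a, const __half* x, __half* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float xf = __half2float(x[i]);   // FP16 -> FP32
        y[i] = __float2half(a * xf);     // FP32 multiply, then FP32 -> FP16
    }
}
```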

I am talking about architecture features.
Yes, me too. From what we've seen up to now, Pascal's architecture is a very logical evolution of Maxwell. I don't see the hilarious architectural quirks that you seem to see everywhere.

Yet IMG/PowerVR has had dedicated FP16 units in their mobile GPUs for years, and we've seen developers go on record saying that FP16 is enough for a number of shading effects, and that using the smaller variables when possible translates into lower bandwidth requirements, higher performance and lower power consumption. IIRC, GCN was initially praised for bringing dedicated instructions for FP16.

So which one is it?
You're seeing a contradiction that isn't there. Nobody is denying that 10-bit display support is a useful feature to have. Yet while AMD has it for its consumer GPUs, Nvidia only does so for its Quadro line. This is no different. FP16 support carries a silicon cost, and it adds value to a select set of customers who are willing to pay for it. Why give it away for free?
 
Why give it away for free?
What is this I don't even.

Nobody is "giving" ANYTHING away for "free". People buy these chips for money, you know. You could use the same bollocks argument about anything regarding GPU tech; why "give" blending, filtering, shading and so on away for "free"? Well, you're not really doing that, are you? Again, this is a product that costs money. Everything has a silicon cost, nothing is free in 3D and all that.

Again we see NV abusing their position and holding the market back with their ridiculous greed. For how many times now?! I'm kind of surprised NV hasn't found a way to attach a credit card reader to their geforce cards; to continue rendering, swipe your card, pay your fee to almighty NV. Jen Hsun could use a new leather jacket for his next stage presentation!
 
Nobody is saying that FP16 isn't useful, we all know it is, but if discrete GPUs were able to survive for over a decade without it, it's hard to make the case that it was necessary. It's not as if adding this feature to today's consumer GPUs would add a lot of marketing benefit.
It would, if developers could rely on double speed fp16 being present, and would utilize it where it had no or negligible impact on the actual gameplay result. Better performance, less pressure on data paths, lower power... there are a lot of benefits to be extracted.
So nVidia seems to be playing market segmentation games here. Hopefully AMD can muster credible alternatives without such limitations, and game code will check for card capabilities and use less wasteful paths if possible, giving an overall better experience. Hopefully.
 
What is this I don't even.
Do you only odd?

Nobody is "giving" ANYTHING away for "free". People buy these chips for money, you know.
I think that explains the disconnect perfectly: people don't buy chips. They buy a GPU board.

You could use the same bollocks argument about anything regarding GPU tech; why "give" blending, filtering, shading and so on away for "free"? Well, you're not really doing that, are you?
You're right: I don't 'do' that for the examples that you give, because the features that you pointed out are an essential requirement for the market for which a GTX 1080 is targeted: gaming. A gaming GPU that couldn't do blending would have rather limited appeal.

But I could 'do' it for a lot of other features: the aforementioned 10-bit monitors, anti-aliased lines, multiple clipping planes, ECC memory protection, and a shitload of other restricted features that you can copy over from the Nvidia website.

Again we see NV abusing their position and holding the market back with their ridiculous greed.
They're not abusing anything. When they sell gaming GPUs, they enable gaming features. When they sell compute GPUs, they enable compute features. There's nothing ridiculous about it: there are millions of products where you pay to enable additional features: all phone apps with in-app purchasing options, all moderately complicated engineering software etc.

Look at Nvidia as a software company that includes a very complicated hardware dongle. :LOL:

Nvidia is not gushing money like some of its Valley neighbors, but it's solidly profitable. With that comes a virtuous cycle: It allows them to hire and pay engineers who don't like to work in an environment where layoffs are always a conference call away (I've been in that situation, it's no fun). It allows them to spend more on marketing campaigns that, hopefully, will increase sales even more. It allows them to spend big on large conferences that accelerate demand for their professional products. It allows them to invest in new stuff that may or may not be successful eventually.

Yet if you took away the gross margins from their Tesla and Quadro lines and sold those features at consumer prices, you'd end up with significantly lower profit. And they'd have to cut back on that virtuous cycle.

One thing is sure: if GP104 were just like Maxwell without any hardware FP16 support at all, you'd have made much less noise than you do now. It would have been a minor footnote without any practical impact.
 
It would, if developers could rely on double speed fp16 being present, and would utilize it where it had no or negligible impact on the actual gameplay result. Better performance, less pressure on data paths, lower power... there are a lot of benefits to be extracted.
If Nvidia were to add double speed FP16 and heavily promote its usage, people would complain that they'd be pushing a feature that put AMD at a competitive disadvantage. ;)
 
... if GP104 were just like Maxwell without any hardware FP16 support at all, you'd have made much less noise than you do now

I agree!

Maxwell is a fantastic development platform, and Pascal appears to be all of that but faster, cheaper, more efficient and better, with new instructions (dpXa) and other as-yet-unknown improvements.

FP16x2 is a neat feature and it was sort of expected after the Tegra X1 (sm_53) and GP100 (sm_60) both appeared with full-rate implementations.

But after spending the last few weeks actually implementing a "half2" code path, IMHO it's neither an easy feature to integrate (you have to structure your problem into explicit 2-wide ops per thread) nor is it widely applicable (fp16 is narrow).
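
To illustrate that integration point (a minimal hypothetical sketch of my own, not the actual code path described above): a scalar FP32 kernel has to be rewritten so each thread explicitly owns a pair of elements, and odd-sized data needs its own tail handling.

```cuda
#include <cuda_fp16.h>

// Hypothetical 2-wide restructuring: each thread now handles a half2 pair and
// issues packed ops. A trailing odd element has to be padded or processed in
// a separate scalar path, which is part of why this isn't a drop-in change.
__global__ void mul_half2(int n_pairs, const half2* x, const half2* y, half2* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pairs)
        out[i] = __hmul2(x[i], y[i]);   // two FP16 multiplies per instruction
}
```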

Is plopping an extra 5120 FP16 units next to 2560 FP32 units a wise move if no consumer is using it today and the GP104 already has immense bandwidth and higher res FLOPS?

Maybe we can relight our torches if AMD ships fp16x2! :)
 
Now we know where another part of the unaccounted-for transistors went.
If Nvidia were to add double speed FP16 and heavily promote its usage, people would complain that they'd be pushing a feature that put AMD at a competitive disadvantage. ;)

Makes you wonder if all the GameWorks effects will be able to tap into FP16.
 
FP16 is great for tons of operations in graphics where FP16 is precise enough. But if it doesn't improve the rate over FP32, then there is no reason to support FP16. So, FP16 support in the consumer version of Pascal is a joke. "Deep learning only / machine learning only" is just an excuse.


I am talking about architecture features. I am not talking about performance. Everyone can affirm without any doubt that performance is great.
But didn't AMD only support FP16 at the rate of FP32 in all their recent GPUs?
In fact, Pascal is the first to do this type of multi-precision improvement, so I am not sure why some are being so strongly critical that the consumer version does not have this yet.
No idea if this has improved with Polaris, or how they will implement it across their whole product line, which could be a catalyst for Nvidia to change their decision, but it seems FP16 would be more of a longer-term solution for gamers IMO.

Cheers
 
For clarification: I for one only voiced my concerns about FP16 throughput in CUDA apparently being massively throttled in comparison to non-native implementations, for example on Maxwell.
 
If Nvidia were to add double speed FP16 and heavily promote its usage, people would complain that they'd be pushing a feature that put AMD at a competitive disadvantage. ;)

Yep, it's pretty much guaranteed that if nVidia started pushing fp16 in games the allegations of cheating would come fast and furious.

I'm not even sure it would work. Devs don't seem to be going out of their way to leverage nVidia's geometry performance advantage.
 
From oc.net:

980 Ti: [benchmark screenshot]

1080: [benchmark screenshot]
 
Something is in there, yes, and it seems to be slower than emulation through FP32.
Just for belt and braces, I would like to see someone who has both a P100 and a 1080 test this, in case there are also quirks in using it.
I agree; I really doubt FP16 is going to be fully supported for consumers, but I'm still surprised they did not go 1:1 with FP32.
Why? Because ScottGray finds that other instructions, such as dp4a, are working at full throughput:
Ok, just compiled 4 dp4a's in a row with no dependencies and the stall counts are all being set to 1. This means it's likely a true full-throughput instruction. This means the 1080 has 8228-8873 * 4 = 33-36 Tops of int8.
Or, I think Nvidia likes to call these DLops (deep learning ops).
https://devtalk.nvidia.com/default/...e-gtx-1080-amp-gtx-1070/post/4889730/#4889730
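
To illustrate what dp4a buys you (a hypothetical sketch of my own, not ScottGray's test code): each __dp4a instruction multiplies four packed int8 lane pairs and accumulates into a 32-bit integer, i.e. four byte-wide MACs per instruction, which is where the 33-36 Tops figure above comes from.

```cuda
// Hypothetical dp4a usage (requires compiling for sm_61+). Each 32-bit word
// holds four signed 8-bit values; __dp4a multiplies the four lane pairs and
// adds the results plus the accumulator in a single instruction. The 1080's
// ~8.2-8.9 TFLOPS of FP32, times 4, gives the quoted 33-36 Tops of int8.
__global__ void dot_int8(int n, const int* a, const int* b, int* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __dp4a(a[i], b[i], 0);
}
```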

Cheers
 