GP102/GP100 FP16 support vs. Performance in old DirectX 8 and DirectX 9

agent_x007 · Jul 23, 2016

Hello

Basically: reading the great (and late) AnandTech article about Pascal (LINK), I saw that the new GP102 core is/should be built from FP32/2xFP16-capable CUDA cores (GP104 is mostly FP32 only).
Then I remembered that the old DirectX versions (up to DX9.0) work, or can work, on half-precision numbers (FP16).

Does this mean that the new Titan X (or Titan XP*) is better suited to old DirectX 8.x and DirectX 9 games than the GPUs released from DX10 (2006) up to this point (i.e. GP104)?
I know this is pure theory (since GP102 hasn't launched yet), but I'd find it interesting if it could be better utilised in those games because of native FP16 support.

What do you guys think?
I know this is really irrelevant at this point (since you can probably get 999+ FPS in those programs/games by now), but still... I'm curious if my thinking is correct.
I don't program shaders, so I wanted someone to clarify this for me.

Thank you for your time.

PS. That old NV30 path (mixed FP16/FP32) should work great on Titan XP*

*"Titan XP" was first used on LinusTechTips' latest "WAN Show" video stream.

homerdog · Jul 23, 2016

Are you so sure GP102 will have the double rate FP16? I thought it was supposed to be a larger GP104 rather than a smaller GP100.

agent_x007 · Jul 23, 2016

GP100/GP102 = Big Pascal (Tesla P100/"Titan XP"),
GP104 = High End Pascal (GTX 1080/1070),
GP106 = Mid End Pascal (GTX 1060).

I think FP16 may be enabled as an option in GeForce GP10x-based cards as well, but it's too early to tell.

agent_x007 · Jul 23, 2016

There is also this quote:

As it turns out, when it comes to FP16 NVIDIA has made another significant divergence between the HPC-focused GP100, and the consumer-focused GP104. On GP100, these FP16x2 cores are used throughout the GPU as both the GPU’s primarily FP32 core and primary FP16 core. However on GP104, NVIDIA has retained the old FP32 cores. The FP32 core count as we know it is for these pure FP32 cores. What isn’t seen in NVIDIA’s published core counts is that the company has built in the FP16x2 cores separately.

Source: the AnandTech article about Pascal linked earlier.
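
As a rough illustration of what "FP16x2" refers to in the quote: two half-precision values fit in one 32-bit word, so a single 32-bit lane can carry a pair of FP16 operands. The Python sketch below only mimics that data layout with the standard `struct` module. The helper names (`pack_half2`, `unpack_half2`, `half2_add`) are made up for this example and say nothing about how the hardware is actually wired.

```python
import struct

def pack_half2(lo: float, hi: float) -> int:
    """Pack two floats, each rounded to FP16, into a single 32-bit word."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]

def unpack_half2(word: int) -> tuple:
    """Split a 32-bit word back into its two FP16 values."""
    return struct.unpack("<2e", struct.pack("<I", word))

def half2_add(x: int, y: int) -> int:
    """Add two packed pairs element-wise: one operation, two FP16 results."""
    x_lo, x_hi = unpack_half2(x)
    y_lo, y_hi = unpack_half2(y)
    return pack_half2(x_lo + y_lo, x_hi + y_hi)

a = pack_half2(1.0, 2.0)
b = pack_half2(0.5, 0.25)
print(hex(a), unpack_half2(half2_add(a, b)))
```

The point is only that the same 32 bits can hold either one FP32 value or two FP16 values, which is where the "double rate" idea comes from.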

PS. Is there a way to edit my earlier post, or do I need to write 10 posts in total to see that option?
Would love to add the "a" in "Pascl".

lanek · Jul 23, 2016

Well, GP102 is not GP100... All we know about GP102 is that it obviously has a different memory controller from GP100 (GDDR5X vs. HBM2 on GP100), and that it has full INT8 support and no FP64. As Nvidia hasn't put anything about the FP16 rate in their marketing material, only INT8, I can imagine there's no FP16 support either, similar to GP104 in this respect.

There's a 25% decrease in transistor count for GP102 vs. GP100.

http://www.anandtech.com/show/10510...-titan-x-video-card-1200-available-august-2nd

Nvidia is not completely crazy; I don't think they want to undercut their own Tesla P100 solution.
Deep learning research isn't meant to be run by common folk at home anyway, so maybe we could see full FP16 support (even if for now nothing indicates it, and I really don't think it's the case). You have university students who want to learn about it,
and that makes it a good opportunity for Nvidia to push its ecosystem of software and tools through a GPU like the Titan.

Honestly, at this point I think GP102 is much closer to a GP104 with the same SM count as GP100.

There are two possibilities I see.

1) Nvidia wanted from the start to completely separate the "gaming" SKUs from the P100, as stated by AnandTech, to finally split the professional SKUs from the consumer ones, and so prepared two variants from the start, one with GDDR5X and one with HBM2.

2) HBM2 became available later than expected, and Nvidia started early on moving their SKUs other than the P100 to standard GDDR5(X) memory so they could release their GPUs sooner (unlike AMD, who seem to have opted to wait for HBM2).

Malo · Jul 23, 2016

I'll finally be able to run Half Life at 8k and 1 million fps?

agent_x007 · Jul 23, 2016

True... not enough information. INT8 is interesting; I didn't catch that the first time I read that update.

But since it's a Titan (and FP16 sits inside the CUDA cores), let's assume for now that it does have full FP16 support.

This is still a purely theoretical question (assuming it has full FP16 support, and double the FLOPS of FP32):
Can the new Titan X be better suited to running old DirectX's INT8/FP16 modes (i.e. have better hardware utilisation) than hardware from late 2006 (DX10) up to this point?
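
For a sense of scale, the "double FLOPs" assumption can be written out as back-of-the-envelope arithmetic. The core count and boost clock below are taken from Nvidia's Titan X announcement and should be treated as assumptions here. The 2x FP16 rate is exactly the unconfirmed guess this thread is about, and `peak_tflops` is just a helper name for this sketch.

```python
def peak_tflops(cores: int, clock_ghz: float, rate: float = 1.0) -> float:
    """Theoretical peak: cores x 2 FLOPs per FMA x clock x precision rate."""
    return cores * 2 * clock_ghz * rate / 1000.0

CORES = 3584        # assumed: announced Titan X (Pascal) CUDA core count
BOOST_GHZ = 1.531   # assumed: announced boost clock in GHz

fp32 = peak_tflops(CORES, BOOST_GHZ)
fp16_if_double_rate = peak_tflops(CORES, BOOST_GHZ, rate=2.0)  # hypothetical 2x FP16

print(f"FP32: {fp32:.1f} TFLOPS")
print(f"FP16 at 2x rate (hypothetical): {fp16_if_double_rate:.1f} TFLOPS")
```

A peak number like this says nothing about whether a DX8/DX9 game would ever be limited by shader arithmetic in the first place.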

CSI PC · Jul 23, 2016

I do not think it is guaranteed to have the GP100 mixed-precision Cuda core. However, the GP100 misses out on some useful operations, and that would be a good reason to create another mixed-precision GPU below the flagship, so logically it would make sense for the GP102 to be mixed-precision FP32/FP16 and also to have the Int8/dp4a operation.
IMO it would still make sense for such a card to have somewhere between 1.2 TFLOPS and 1.8 TFLOPS of FP64 (disabled/reduced to maybe 1:8, 1:24 or 1:32 for the Pascal Titan), but then only so much can be crammed onto a smaller die.
This would make the GP102 viable across all segments and more cost-effective than the GP100, which is important because some reports citing Cray (who sell both Intel and Nvidia solutions) suggest Knights Landing is winning more large-scale projects than P100.

But then, in the Pascal Titan info so far there is no emphasis on mixed precision, just on Int8, which aligns more with the GP104.
This makes it a confusing product because from a Deep Learning perspective (and this is how they present the Pascal Titan) it is limited in the functions-operations it is capable of without the mixed-precision FP32/FP16 Cuda core, and from a cost/complexity perspective not many would want to have multiple different dedicated GPUs where some workloads can be shared on a device.

So IMO it is too early to say exactly what Cuda cores are in the GP102 until they release more information.
Cheers

lanek · Jul 23, 2016

CSI PC said:
This makes it a confusing product because from a Deep Learning perspective (and this is how they present the Pascal Titan) it is limited in the functions-operations it is capable of without the mixed-precision FP32/FP16 Cuda core, and from a cost/complexity perspective not many would want to have multiple different dedicated GPUs where some workloads can be shared on a device.

So IMO it is too early to say exactly what Cuda cores are in the GP102 until they release more information.
Cheers

With a bit of luck, we should have more information soon during SIGGRAPH, or in a few weeks if some sites get it. But to be honest, I'm not expecting much on the feature side (it's just good marketing).

High margin, low availability (only from the Nvidia shop). This seems to me a good way of just getting the GPUs "out there"...

Grall · Jul 23, 2016

Not a games programmer, but I could imagine that the shaders used in the NV30/DX9 era are so simple that today's GPUs would bottleneck elsewhere long before FP16 had a chance to make an impact performance-wise. All you'd see is your graphics shading getting coarser due to not using 32-bit precision for its calculations...
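
The precision side of this can be shown in a few lines of Python. A minimal sketch, assuming IEEE 754 half precision (the `e` format of the standard `struct` module) is a fair stand-in for shader FP16; `to_fp16` is a helper made up for this example.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a value to the nearest IEEE 754 half-precision number."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# FP16 stores 10 mantissa bits, so the gap between neighbouring
# representable values grows quickly with magnitude.
for value in (0.1, 1.001, 2049.0):
    print(value, "->", to_fp16(value))

# Above 2048 only even integers are representable, so a coordinate
# like 2049 on a wide render target cannot be stored exactly.
assert to_fp16(2049.0) == 2048.0
```

That loss of resolution at larger magnitudes is the kind of coarsening that would show up as banding or wobbling coordinates if everything were computed in FP16.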

DavidGraham · Jul 24, 2016

Why would that matter? A modern GPU can run the original F.E.A.R. (a very taxing DX9 game for its time) at more than 400fps @1080p! I bet a Pascal Titan X can pump out the same performance or more @4K!

agent_x007 · Jul 24, 2016

@Grall is probably right.
ROP/TMU/VRAM bandwidth (or CPU/RAM) would limit maximum performance sooner (but to the point where FP16 isn't relevant anymore... I don't know).

I think FP16 could have helped with GPU utilisation (and power usage), since it would make it possible to extract more performance from a smaller amount of resources.

Still, I don't know if FP16 in Pascal has anything to do with FP16 in the old Shader Models, i.e. is this even possible?

@up It doesn't matter, true, but wouldn't it "be cool" if it were true?

Razor1 · Jul 24, 2016

I think the CPU will be the bottleneck

Unless ya game on like three 4K monitors!
