Nvidia Pascal Announcement

Great! Don't make us wait! :-|

Does FP16x2 exist in GP104 or are we getting an old FP32-only architecture thrown at our feet and making us cry?
Truthfully I'm still waiting on an official comment from NVIDIA on this. I have my answer, but I'm kind of afraid you guys (or someone reading this) are going to go ape before I have a chance to write something enlightened on the matter as part of the full GTX 1080 review.
 
I have my answer, but I'm kind of afraid you guys (or someone reading this) are going to go ape

I'm still surprised SiSoft's benchmark worked, as I'm staring at FP16x2 code right now that won't compile arithmetic operations unless you explicitly target sm_53 or compute_53.
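For anyone curious what that looks like in practice, here's a minimal sketch (the kernel and the fallback path are illustrative, not the actual SiSoft code): the packed-FP16 intrinsics such as __hadd2 in cuda_fp16.h sit behind a __CUDA_ARCH__ >= 530 guard, so the arithmetic only compiles when you target sm_53/compute_53 or newer.

Code:
#include <cuda_fp16.h>

// Packed FP16x2 add: two half-precision adds per instruction on sm_53+.
// Building the __hadd2 path for an older arch (e.g. sm_52) fails to
// compile, because the intrinsic is only declared for __CUDA_ARCH__ >= 530.
__global__ void fp16x2_add(const __half2* a, const __half2* b,
                           __half2* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
#if __CUDA_ARCH__ >= 530
    c[i] = __hadd2(a[i], b[i]);              // one instruction, two FP16 adds
#else
    // Fallback: unpack to FP32, add, repack (no packed-FP16 hardware path).
    float2 fa = __half22float2(a[i]);
    float2 fb = __half22float2(b[i]);
    c[i] = __floats2half2_rn(fa.x + fb.x, fa.y + fb.y);
#endif
}
// Built with e.g.: nvcc -gencode arch=compute_53,code=sm_53 fp16x2.cu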

We await your writeup! :)
 
Damien's excellent review specifically says no FP16 except for P100. :(
FP16 support is specific to GPU computing and is consequently not present on Pascal G.

Hiroshige Goto specifically says yes FP16 for GP104. :)
The FP16 2-way SIMD (Single Instruction, Multiple Data) support introduced with Pascal will likely be carried over to GP104 as well.

The Pascal parallel_forall blog post has noncommittal "P100 ISA" word choices in the fp16 discussion as opposed to "Pascal".
The GP100 SM ISA provides new arithmetic operations that can perform two FP16 operations at once on a single-precision CUDA Core, and 32-bit GP100 registers can store two FP16 values.
 
If fp16x2 support is not in GP104 then I'm actually impressed.

Why? Because NVIDIA is demonstrating their tremendous engineering and marketing discipline.

They have the silicon and tool chains ready to go but decided against including the feature.

What would it have gained them with game playing consumers?

Yet GP100 and embedded customers are willing to pay for it for machine learning and vision performance.

I'm still hoping the feature is in GP104 but its absence from marketing materials all but guarantees it's not.

Can you imagine the near flawless NVIDIA marketing machine saying, "Oh shoot, we forgot to trumpet a feature that would've let us claim 16.4 TFLOPS of fp16x2 FMAs!"

I cannot.
 
Back to the "Simultaneous multi projection" and "VR" for a moment. Please correct me when I'm wrong at some point.

  1. Both AMD and Nvidia support viewport arrays and geometry shaders.
  2. I can apply the world space to screen space projection as late as in the geometry shader.
  3. A traditional viewport can only be rectangular and aligned in 90° steps, but may be offset and scaled arbitrarily inside a buffer.
  4. I'm not actually forced to subdivide the screen space the way Nvidia did. I can just as well subdivide it into rectangular viewports (see the sketch below).
So, as long as I can set it up such that the output viewports are all rectangular and the rectangles don't overlap, I don't actually need the new hardware support?
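To make point 4 concrete, here's roughly the setup I have in mind, as a sketch against plain GL 4.1 viewport arrays (the 2x2 split, tile sizes and shader variable names are made up for illustration): the host lays out axis-aligned rectangular viewports inside one buffer, and a geometry shader applies the projection as late as possible and routes each primitive to one of them via gl_ViewportIndex, with nothing Pascal-specific involved.

Code:
// Assumes a GL 4.1+ context (ARB_viewport_array is core there).
// Split a 2048x2048 eye buffer into four 1024x1024 rectangular tiles.
void setupRectangularTiles()
{
    const int   kGrid  = 2;
    const float kTileW = 1024.0f, kTileH = 1024.0f;
    for (int y = 0; y < kGrid; ++y)
        for (int x = 0; x < kGrid; ++x)
            glViewportIndexedf(y * kGrid + x,           // viewport index 0..3
                               x * kTileW, y * kTileH,  // offset inside buffer
                               kTileW, kTileH);         // axis-aligned rect
}

// Geometry shader: one invocation per tile, projection applied this late,
// output routed to the matching rectangular viewport.
const char* kGeomSrc = R"(
    #version 410 core
    layout(triangles, invocations = 4) in;
    layout(triangle_strip, max_vertices = 3) out;
    uniform mat4 u_viewProj[4];    // per-tile world -> clip projection
    in vec3 v_worldPos[];          // world-space position from the VS
    void main() {
        for (int i = 0; i < 3; ++i) {
            gl_ViewportIndex = gl_InvocationID;
            gl_Position = u_viewProj[gl_InvocationID] * vec4(v_worldPos[i], 1.0);
            EmitVertex();
        }
        EndPrimitive();
    }
)";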

I can achieve the same additional 30-40% speedup Nvidia got from eliminating oversampling just as well with rectangular viewports on legacy hardware.

There is some loss since the outer rectangles have to be partially masked, but as long as I don't traverse them during post-processing and the geometry was already properly culled beforehand, that's not a major issue.


OK, so if I were to develop a VR application, I would be pretty stupid to use the irregularly shaped viewports as Nvidia suggested in their demonstration, wouldn't I? To me it looks as if I were better off using the legacy-compatible option with rectangular viewports, accepting perhaps 10-15% overdraw during g-buffer creation if the geometry wasn't culled properly, but in general achieving the same savings without being dependent on an exclusive hardware feature.

Is there a mistake in my logic, or are the distorted viewports actually just something you would want to *avoid* in general, with regard to portability?
 
What you are describing is very similar to NVIDIA's multi-res shading tech in VRWorks, already supported on Maxwell: http://www.pcworld.com/article/2926...ding-tech-could-help-vr-reach-the-masses.html

It uses NVIDIA's fast geometry shader path, so it won't run very well on current AMD hardware. They also do per-viewport triangle culling in the geometry shader.

Lens matched shading is likely a better approximation to lens distortion than 9 or more viewports, so it might give you better perf and better image quality.
 
GTX 1080: What's not being discussed.

Some very valid points:

1 - Obvious BS with "faster than 980 SLI" general claims.

2 - Initial "9 TFLOPs" number ninja-edited to 8.2 TFLOPs after the presentation and before the reviews

3 - GTX 1080 results absent from AOTS benchmark database

4 - "Async Compute" claimed everywhere, but zero performance gain observed from the only game that uses it (maybe AOTS for nvidia is still using the old dedicated nvidia path without async enabled, so there's some benefit of the doubt in here IMO)

5 - Rise of the Tomb Raider being benched everywhere in an admittedly (by the devs themselves) broken DX12 mode. Is every reviewer out there so damn ignorant regarding this case?

6 - New SLI bridges are not compatible with the old ones, are not bundled with the new cards, cost $30, and are rigid. This means if you want to do SLI, you pay another $30. Change to a motherboard with different slot spacing, pay another $30.

7 - Where is Doom's Vulkan mode? It was shown in a live demo 2 weeks prior to the launch, yet it isn't available at launch?
I wonder what the performance upgrades between IHVs will be for an API whose origin is a fork of Mantle...

8 - This one is the funniest:
When the Fury and Fury X came out, every reviewer tested with the factory-overclocked (and some even manually overclocked) 980 and 980 Ti cards because that's what they had in their hands. Come the time to review the GTX 1080, magically everyone has stock-clocked 980 and 980 Ti cards to compare to.
 
Tottentranz, you might want to do some research about the guy who wrote that article; pretty sure he is on the Overclockers.net forums...
 