Nvidia Pascal Announcement

Razor1 · May 17, 2016

http://www.timeanddate.com/countdow...0001&p0=886&msg=NVIDIA+NDA+Lift&font=sanserif

trinibwoy · May 17, 2016

Since when do NDAs lift at midnight? Thought it was 6am PST.

Ext3h · May 17, 2016

Yep. The countdown just expired and nobody published anything yet.

Osamar · May 17, 2016

CSI PC said:
Would be interesting if someone could run this on a Fermi GPU, to see if there is a difference between the architectures.
Cheers

If someone can upload or point to an executable, I would try with my GTX 560.

Deleted member 13524 · May 17, 2016

Ext3h said:
Yep. The countdown just expired and nobody published anything yet.

The countdown is now a countup.

Ryan Smith · May 18, 2016

pixelio said:
Great! Don't make us wait!

Does FP16x2 exist in GP104 or are we getting an old FP32-only architecture thrown at our feet and making us cry?

Truthfully I'm still waiting on an official comment from NVIDIA on this. I have my answer, but I'm kind of afraid you guys (or someone reading this) are going to go ape before I have a chance to write something enlightened on the matter as part of the full GTX 1080 review.

pixelio · May 18, 2016

Ryan Smith said:
I have my answer, but I'm kind of afraid you guys (or someone reading this) are going to go ape

I'm still surprised SiSoft's benchmark worked as I'm staring at FP16x2 code right now that won't compile arithmetic operations unless you explicitly target sm_53 or compute_53.

We await your writeup!

spworley · May 18, 2016

Damien's excellent review specifically says no FP16 except for P100.

FP16 support is specific to GPU computing and is therefore not present on Pascal G.

Hiroshige Goto specifically says yes FP16 for GP104.

The FP16 2-way SIMD (Single Instruction, Multiple Data) design introduced with Pascal will probably be carried over to GP104 as well.

The Pascal parallel_forall blog post has noncommittal "P100 ISA" word choices in the fp16 discussion as opposed to "Pascal".
The GP100 SM ISA provides new arithmetic operations that can perform two FP16 operations at once on a single-precision CUDA Core, and 32-bit GP100 registers can store two FP16 values.
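
To make the "two FP16 values in one 32-bit register" part concrete, here is a minimal Python sketch of the storage layout only. It says nothing about throughput on any chip, the helper names are made up for illustration, and putting the first value in the low 16 bits is an assumption.

```python
import struct


def pack_half2(lo, hi):
    """Round two floats to IEEE 754 binary16 and pack them into one 32-bit word.

    Assumption for illustration: `lo` occupies the low 16 bits, `hi` the high 16 bits.
    """
    # 'e' is the struct format code for a 16-bit half-precision float.
    raw = struct.pack("<2e", lo, hi)
    return struct.unpack("<I", raw)[0]


def unpack_half2(word):
    """Split a 32-bit word back into its two half-precision values."""
    return struct.unpack("<2e", struct.pack("<I", word))


word = pack_half2(1.0, -2.0)
print(hex(word))
print(unpack_half2(word))
```

An FP16x2 instruction would operate on both halves of such a word at once, which is where the doubled rate on a single-precision core comes from.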

Razor1 · May 18, 2016

LOL in English! hehe love that line from Usual Suspects.

RecessionCone · May 18, 2016

My understanding is that only GP100 has 2x FP16 rate. All the other Pascals do not.

Razor1 · May 18, 2016

Pretty sure that is true

xpea · May 18, 2016

RecessionCone said:
My understanding is that only GP100 has 2x FP16 rate. All the other Pascals do not.

so GP100 has "dual speed" FP16
Tegra Pascal and GP106 also have it (Drive PX2 platform)
but still no words on GP104... odd...

spworley · May 18, 2016

xpea said:
so GP100 has "dual speed" FP16
Tegra Pascal and GP106 also have it (Drive PX2 platform).

And Maxwell based Tegra X1 also has it.

pixelio · May 18, 2016

If fp16x2 support is not in GP104 then I'm actually impressed.

Why? Because NVIDIA is demonstrating their tremendous engineering and marketing discipline.

They have the silicon and tool chains ready to go but decided against including the feature.

What would it have gained them with game playing consumers?

Yet GP100 and embedded customers are willing to pay for it for machine learning and vision performance.

I'm still hoping the feature is in GP104 but its absence from marketing materials all but guarantees it's not.

Can you imagine the near flawless NVIDIA marketing machine saying, "Oh shoot, we forgot to trumpet a feature that would've let us claim 16.4 TFLOPS of fp16x2 FMAs!"

I cannot.
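
For reference, the 16.4 figure is roughly double the 8.2 TFLOPS FP32 number. A rough check in Python, assuming the GTX 1080's published 2560 CUDA cores and 1607 MHz base clock (neither figure is stated in this thread) and counting one FMA as two floating-point operations:

```python
def peak_tflops(cores, clock_mhz, flops_per_core_per_clock=2):
    """Theoretical peak throughput in TFLOPS.

    An FMA counts as 2 FLOPs per core per clock; a hypothetical
    fp16x2 FMA would count as 4 (two FMAs per core per clock).
    """
    return cores * clock_mhz * 1e6 * flops_per_core_per_clock / 1e12


# Assumed GTX 1080 figures: 2560 CUDA cores, 1607 MHz base clock.
fp32 = peak_tflops(2560, 1607)
fp16x2 = peak_tflops(2560, 1607, flops_per_core_per_clock=4)

print(f"FP32:   {fp32:.2f} TFLOPS")
print(f"FP16x2: {fp16x2:.2f} TFLOPS")
```

Under those assumptions this works out to about 8.23 and 16.46 TFLOPS, in line with the roughly 2 x 8.2 quoted above.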

RecessionCone · May 18, 2016

xpea said:
so GP100 has "dual speed" FP16
Tegra Pascal and GP106 also have it (Drive PX2 platform)
but still no words on GP104... odd...

GP106 does not have dual speed FP16. The DL ops in Drive PX2 are not FP16.

Ext3h · May 18, 2016

Back to the "Simultaneous multi projection" and "VR" for a moment. Please correct me if I'm wrong at any point.

Both AMD and Nvidia support viewport arrays and geometry shaders.
I can apply the world space to screen space projection as late as in the geometry shader.
A traditional viewport can only be rectangular and aligned in 90° steps, but may be offset and scaled arbitrarily inside a buffer.
I'm not actually forced to subdivide the screen space like Nvidia did. I can as well just subdivide it into rectangular viewports.

So, as long as I can construct it such that the output viewports are all rectangular, and the rectangles don't overlap, I don't actually need the new hardware support?

I can achieve the same additional 30-40% speedup Nvidia achieved from eliminating oversampling perfectly well also with rectangular viewports and legacy hardware.

I have some loss as the outer rectangles have to be masked partially, but as long as I don't traverse them during post processing, and the geometry was already properly culled before, that's not a major issue.

OK, so if I were to develop a VR application, I would be pretty stupid to use the irregularly shaped viewports as Nvidia suggested in their demonstration, wouldn't I? To me it looks as if I were better off using the legacy compatible option with rectangular viewports, accepting perhaps a 10-15% overdraw during g-buffer creation if the geometry wasn't culled properly, but in general achieving the same savings without being dependent on an exclusive hardware feature.

Do I have any mistake in my logic, or are the distorted viewports actually just something you would want to *avoid* in general, with regard to portability?
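
A back-of-the-envelope sketch of the rectangular-viewport idea, in Python. It assumes a 3x3 grid where the center viewport keeps full resolution and the outer ones are rendered at a reduced scale. The function name, the 60% center size and the 0.5 outer scale are all made-up illustration values, not numbers from Nvidia.

```python
def shaded_fraction(center_frac=0.6, outer_scale=0.5):
    """Fraction of the original pixel count shaded with a 3x3 viewport grid.

    center_frac: share of each axis covered by the full-resolution center.
    outer_scale: per-axis resolution scale applied to the border viewports.
    Edge viewports are scaled on one axis, corner viewports on both.
    """
    border = (1.0 - center_frac) / 2.0
    # Per-axis extent after scaling: [border, center, border].
    axis = border * outer_scale + center_frac + border * outer_scale
    # The grid is separable, so the 2D fraction is the product of both axes.
    return axis * axis


fraction = shaded_fraction()
print(f"shaded pixels: {fraction:.0%} of full resolution")
print(f"pixel savings: {1 - fraction:.0%}")
```

With these made-up numbers the grid shades 64% of the pixels. That is a pixel-count reduction, not a measured speedup, and it ignores the masking loss and overdraw mentioned above.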

renderstate · May 18, 2016

What you are describing is very similar to NVIDIA's multi-resolution shading tech in VRWorks, which is already supported on Maxwell: http://www.pcworld.com/article/2926...ding-tech-could-help-vr-reach-the-masses.html

It uses NVIDIA's fast geometry shader path, so it won't run very well on current AMD HW. They also do per-viewport triangle culling in the geometry shader.

Lens matched shading is likely a better approximation to lens distortion than 9 or more viewports, so it might give you better perf and better image quality.

Deleted member 13524 · May 18, 2016

GTX 1080: What's not being discussed.

Some very valid points:

1 - Obvious BS with "faster than 980 SLI" general claims.

2 - Initial "9 TFLOPs" number ninja-edited to 8.2 TFLOPs after the presentation and before the reviews

3 - GTX 1080 results absent from AOTS benchmark database

4 - "Async Compute" claimed everywhere, but zero performance gain observed from the only game that uses it (maybe AOTS for nvidia is still using the old dedicated nvidia path without async enabled, so there's some benefit of the doubt in here IMO)

5 - Rise of the Tomb Raider being benched everywhere in an admittedly (by the devs themselves) broken DX12 mode. Is every reviewer out there so damn ignorant regarding this case?

6 - New SLI bridges are not compatible with the old ones and are not bundled with the new cards, cost $30 and are rigid. This means if you want to do SLI, pay another $30. Change motherboards with different spacing, pay another $30.

7 - Where is Doom's Vulkan mode? It was available for a live demo 2 weeks prior to the launch but it wasn't available for launch?
I wonder what the performance upgrades between IHVs will be for an API whose origin is a fork of Mantle...

8 - This one is the funniest:
When the Fury and Fury X came out, every reviewer tested with the factory-overclocked (and some even manually overclocked) 980 and 980 Ti cards because that's what they had in their hands. Come the time to review the GTX 1080, magically everyone has stock-clocked 980 and 980 Ti cards to compare to.

trinibwoy · May 18, 2016

Maybe folks aren't discussing those things because they're irrelevant / unimportant?

It's not like people bought cards and then reviews came out weeks after. Nobody is being hoodwinked, at least not yet.

Razor1 · May 18, 2016

Tottentranz, you might want to do some research on the guy who wrote that article; pretty sure he is on the Overclockers.net forums...
