Couldn't it be done like BFV's variable rate ray tracing, except based on the areas of the screen you are looking at? Or ... in some other way ... making the denoiser work extra hard and be more expensive in the focal area, and less expensive/less accurate in areas outside the viewer's focus?
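The idea above (spend samples where the viewer looks) could be sketched roughly like this. This is a hypothetical illustration, not any engine's actual API: the function name, the 10° acuity falloff, and the parameter set are all assumptions.

```cpp
#include <cmath>
#include <algorithm>

// Hypothetical sketch: allocate path-tracing samples per pixel by angular
// distance from the tracked gaze point. The exponential falloff constant
// (10 degrees) is an assumed stand-in for the eye's acuity curve.
int samplesPerPixel(float px, float py,     // pixel position
                    float gx, float gy,     // gaze position (from eye tracker)
                    float pixelsPerDegree,  // display angular density
                    int maxSpp, int minSpp)
{
    float distPx = std::hypot(px - gx, py - gy);
    float eccentricityDeg = distPx / pixelsPerDegree;
    // Weight falls off roughly exponentially with eccentricity (assumed).
    float w = std::exp(-eccentricityDeg / 10.0f);
    int spp = static_cast<int>(std::round(minSpp + w * (maxSpp - minSpp)));
    return std::clamp(spp, minSpp, maxSpp);
}
```

At the gaze point this yields the full budget; far into the periphery it decays to the minimum, which is where the denoising concerns discussed below come in.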
Sure, there are options, but you will not see a 10x speedup just from going down to 500x500 px.
I'm no expert here, but I see some interesting things that are not obvious at first thought:
When changing focus quickly to another section of the screen, it takes about 1/4 second until I see it sharply. This is great, because the previous low-res results from that area should still be good enough to get going from there. Also, human perception of motion in the focused area is 'laggy' in comparison to the peripheral border area (coming from the primal need to detect dangerous animals quickly, they say).
So we need high spatial quality in the center but temporally stable results at the borders, I guess. I assume laggy lighting is still acceptable everywhere, because lighting is likely not so important for detecting motion.
But the problem is that we have bad neighborhood information at the borders, because each pixel covers a large solid angle. This will break both denoising and TAA, which kind of defies the whole foveated rendering idea.
So this will not make high-quality path tracing cheap - you'd just need more samples per pixel than before.
A solution would be something like prefiltered voxels, for example. Here you could pick the voxel mip from the pixel solid angle, and there would be no aliasing or flickering (see Cyril Crassin's work before VCT - I'm not saying this is practical, but there are not many options to get prefiltered graphics).
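Picking the voxel mip from the pixel solid angle could look something like the following sketch, in the spirit of Crassin-style cone tracing. The function and parameter names are my own assumptions; the core idea is just that each mip doubles the voxel size, so the matching level is a log2 of the pixel's world-space footprint.

```cpp
#include <cmath>
#include <algorithm>

// Hypothetical sketch: choose the voxel LOD whose cell size matches the
// pixel's footprint at the sample distance, so peripheral (wide) pixels
// read coarser, prefiltered mips instead of aliasing.
float voxelMip(float pixelAngularSize,  // radians subtended by the pixel
               float distance,          // distance along the ray
               float baseVoxelSize,     // world-space size of a mip-0 voxel
               float maxMip)
{
    // World-space footprint of the pixel at this distance.
    float footprint = 2.0f * distance * std::tan(0.5f * pixelAngularSize);
    // Each mip level doubles the voxel size -> matching level is a log2.
    float mip = std::log2(std::max(footprint / baseVoxelSize, 1.0f));
    return std::min(mip, maxMip);
}
```

A fractional mip would then be fed to trilinear/quadrilinear filtering between voxel levels, which is what gives the flicker-free result.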
For the lighting, some world-space methods have similar properties that allow such good filtering, and I assume this works well here. Though, this would not benefit from the lower resolution.
Still, the win could be: expensive RT for high-frequency details like sharp reflections and hard shadows would only be necessary in the focused area at all, allowing for much higher quality in return. The requirement is that both lighting techniques are accurate enough to match each other, so they can be blended - VCT would fail here.
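The blending itself is the easy part, assuming the two techniques do match. A minimal sketch, where the 5°/15° transition band is an assumed tuning choice and all names are hypothetical:

```cpp
#include <cmath>
#include <algorithm>

struct Color { float r, g, b; };

// Hypothetical sketch: weight for the expensive ray-traced result as a
// function of eccentricity. A smoothstep over an assumed 5-15 degree band
// avoids a visible seam between the two lighting techniques.
float rtWeight(float eccentricityDeg,
               float innerDeg = 5.0f, float outerDeg = 15.0f)
{
    float t = std::clamp((eccentricityDeg - innerDeg) / (outerDeg - innerDeg),
                         0.0f, 1.0f);
    float s = t * t * (3.0f - 2.0f * t);  // smoothstep
    return 1.0f - s;                      // 1 in the fovea, 0 in the periphery
}

// Lerp between the ray-traced result and the cheap world-space fallback.
Color blend(Color rt, Color fallback, float w)
{
    return { rt.r * w + fallback.r * (1.0f - w),
             rt.g * w + fallback.g * (1.0f - w),
             rt.b * w + fallback.b * (1.0f - w) };
}
```

Any energy mismatch between the two techniques shows up as a ring in the transition band, which is exactly why something like VCT, with its systematic bias, would fail here.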
... far-fetched random thoughts, ofc
But we see similar dilemmas already now with DLSS: even if we could do RT at quarter resolution and upscale just that while the rasterization happens at full resolution, we would lose samples for denoising. So the current standard of upscaling the whole frame instead seems like a compromise between a lot of things.