IHV Preferences, Performance over Quality? Spawn

Locuza · Mar 14, 2017

sebbbi said:
[...]Doesn't really matter in practice since all FL 12_1 GPUs support all three features.

Currently every FL12_1 capable device does support TR3 (Maxwell v2/Pascal, Intel Gen 9) but hypothetically an IHV could only support FL12_1 without TR3 capability.

Too bad that stencil ref output is not supported by Nvidia. Both AMD and Intel support it.

If you have time could you explain to a layman what possible usage such feature has?

You can't use FL 11_1 features on Kepler. If you query for 11_1 feature support, you get nothing. Even those features that are actually supported by the hardware cannot be used. DX 12 seems to be different than previous DX versions in this regard. All the new features are marked "optional" for FL 12_0 and FL 11_1. Doesn't really matter in practice since there's no existing FL 12_0 or 11_1 hardware with support to new features introduced in FL 12_1. There's also the problem that DX 11.3 API is required for most of the new FL 11_1 and FL 11_0 optional features, and 11.3 API doesn't exist on Windows 7. So these new optional features can't be used on the most important gaming OS.

It's important to make a distinguish here, between DX 11.x/12 and its feature levels.
For example DX11.1 brought some updates in general like partial constant buffer updates which for example DICE uses in their Frostbite engine.
https://developer.nvidia.com/content/constant-buffers-without-constant-pain-0
That's something every DX11.1 capable device supports even without FL11_1.
DX11.2 brought Tiled Resources Tier 1 and Tier 2, these features where optional, under Kepler you would use DX11.2 with FL11_0 asking for Tiled Resources Tier 1.

DX12 made much more features optional.
I heard that the old DX9 days where full of cap-bits, DX10 had a very strict baseline everyone needed to support, now we got this more relaxed compromise.

In practise you could use ROVs under Haswell and Broadwell which only support FL11_1.
Forza Motosport: Apex uses FL11_0 with a strict requirement of Resource Binding Tier 2, these way it doesn't run on Haswell/Broadwell but on Kepler and of course GCN.

Sadly Vulkan doesn't support CR or ROVs currently and I believe there aren't even vendor extension for these features out there.

Dodecahedron · Mar 14, 2017

sebbbi said:
To explain me better: a pixel is a rectangle. The projection of the rectangle in the UV space forms a trapezoid (assuming the surface is planar). You want to take a sum (integral) of all the (sub-)texels inside this trapezoid and divide the result by the area.

I disagree. What you describe is anti-aliasing using a box filter but there are other filters that are more effective at removing aliasing. Those filters have a non-constant weight function which does not immediately go to zero at the pixel boundary.

We don't still know whether the AMD shimmering was a hardware bug or intentional optimization to improve performance.

Agreed.

Blurry output usually means lower resolution data is used = faster, but shimmering is harder to judge.

A blurry image is also a bit difficult to judge. An inadequately filtered image may appear "sharper" even if it is less detailed than the "blurry" one. If there are details missing in the blurry image, it is certainly reasonable to suspect using low resolution data.

MDolenc · Mar 14, 2017

sebbbi said:
To explain me better: a pixel is a rectangle. The projection of the rectangle in the UV space forms a trapezoid (assuming the surface is planar). You want to take a sum (integral) of all the (sub-)texels inside this trapezoid and divide the result by the area. IIRC hardware anisotropic filtering simply takes up to N trilinear samples along the UV gradient using up to -log2(N) mip bias. N is basically log2 difference between U and V gradient. This is not trapezoid, but a pretty good cheap approximation.

But a pixel is not a little square! Yes with bilinear filtering a rectangle comes out sort of naturally, but bilinear is not the most natural thing to start with. Especially with anisotropic filtering this really comes out. It's not trapezoid but ellipses.

sebbbi · Mar 15, 2017

Dodecahedron said:
I disagree. What you describe is anti-aliasing using a box filter but there are other filters that are more effective at removing aliasing. Those filters have a non-constant weight function which does not immediately go to zero at the pixel boundary.

MDolenc said:
But a pixel is not a little square! Yes with bilinear filtering a rectangle comes out sort of naturally, but bilinear is not the most natural thing to start with. Especially with anisotropic filtering this really comes out. It's not trapezoid but ellipses.

Anisotropic filtering is unfortunately not a well defined term. Since bilinear and trilinear are both linear filters (linear filter kernel), the programmer would assume that anisotropic is also a linear filter, but with added angle-dependency. The API doesn't imply anything else.

We all agree that linear ramp and box are not the best filter kernels for most purposes. Cubic filter is often better than linear. However, if an IHV used bicubic filter when I asked bilinear, I would consider it a driver bug (as I requested bilinear). Obviously the cubic would look better in most cases, but it is important that we know what is the target spec. The question here is: is the target spec anisotropic linear filtering or is anisotropic something else. It is hard to judge implementations if you don't know what this filter is trying to achieve.

There are uses for constant weight kernels. For example standard MSAA resolve (including per sample shading = supersamping) uses constant weight for all samples. When a pixel is in a slope (= anisotropic filtering), MSAA sampling pattern forms roughly a trapezoid shape in UV space. See 16x MSAA pattern here: https://msdn.microsoft.com/en-us/library/windows/desktop/ff476218(v=vs.85).aspx. AMD has added alternative MSAA resolve modes (such as narrow/wide tent filters) that reduce aliasing, but obviously the downside is added image softness. Wider filters with soft falloff are not without downsides. To reduce the softness, you can add negative lobes. But negative lobes cause ringing and halos. It is all about compromise.

sebbbi · Mar 16, 2017

Locuza said:
If you have time could you explain to a layman what possible usage such feature has?

It is nice when you want to write programmatical stencil patterns for [earlydepthstencil]. Many tricks need odd stencil patterns

Example (brand new GDC presentation):
http://gdcvault.com/play/1024476/Higher-Res-Without-Sacrificing-Quality

I haven't tested programmable stencil output when stencil culling is active at the same time. I don't believe this works, since the pixel shader needs to run before the stencil test (as pixel shader outputs the stencil value). However in some stencil modes, the GPU could cull before the pixel shader. For example greater test and stencil mode is add (non-wrapping). This would be useful for visibility percentage counting. We had a front-to-back particle renderer that counted overdraw to stencil buffer and culled all overdraw after some number. However this isn't perfect since stencil can only add a constant value. We could have wanted to add transparency percentage (read from the texture to properly estimate coverage). With programmable stencil you can do this. But as I said, I am not sure that the GPU actually stencil culls any pixels if programmable stencil output is enabled.

Deleted member 2197 · Apr 30, 2017

"Global Illumination is the Most Challenging Present"

Golem.de: What is currently the largest bottle neck in the game graphics?

Khan: Global Illumination is the big challenge we are still working on as an industry. Anything that has to do with the dynamic representation of light. But we still have to do a lot of basic research.

https://www.golem.de/news/id-softwa...die-groesste-herausforderung-1704-127490.html

Next id Tech relies on FP16 calculations

One of the most important features of the upcoming id tech is to be half-precision (FP16). Current graphics cards use almost exclusively simple accuracy (FP32), in the automotive and mobile sector as well as in deep learning, the focus is on FP16. If the precision is sufficient, the performance can be increased compared to FP32. The shader units are better utilized because less register memory is required. According to Billy Khan, the current id tech 6 is far more ALU-wide than bandwidth, which is why it promises great advantages from FP16.

https://www.golem.de/news/id-softwa...massiv-auf-fp16-berechnungen-1704-127494.html

swaaye · May 4, 2017

It is so bizarre reading excited news about FP16. Back with the olden times of NV30 that performed like shit unless you used fixed point or FP16, FP16 usage caused worries of reduced visual fidelity with shader replacements et al. And here we go with FP16 being the bright new future....

It seems strange to me that FP16 was unappreciated back then. Was it caused by ATI never supporting it? Their post-D3D8 parts were all FP24 or 32 right?

Lightman · May 5, 2017

swaaye said:
It is so bizarre reading excited news about FP16. Back with the olden times of NV30 that performed like shit unless you used fixed point or FP16, FP16 usage caused worries of reduced visual fidelity with shader replacements et al. And here we go with FP16 being the bright new future....

It seems strange to me that FP16 was unappreciated back then. Was it caused by ATI never supporting it? Their post-D3D8 parts were all FP24 or 32 right?

FP16 issue back in old days of NV30 was down to it being used throughout rendering pipeline. This obviously caused quite visible quality disparities compared to FP24 or FP32 rendering pipes. Modern GPU's and engines will try to use FP16 when suitable, avoiding quality issues and gaining performance.

Razor1 · May 5, 2017

Hmm I don't think they used FP16, actually it looked like Int8, cause after the 6 series came out, nV stated they finally fixed the driver bug to get FP 16 working right lol, and there was very little performance hit (2% difference) and the blocky lighting artifacts were gone too lol.

3dcgi · May 5, 2017

One reason dual precision makes more sense now than then is transistors are much cheaper now.

swaaye · May 5, 2017

Razor1 said:
Hmm I don't think they used FP16, actually it looked like Int8, cause after the 6 series came out, nV stated they finally fixed the driver bug to get FP 16 working right lol, and there was very little performance hit (2% difference) and the blocky lighting artifacts were gone too lol.

I'm not even sure which games used FP16. Most companies seemed to ignore it and ran D3D8 shaders if a NV3x card was in use because that was really the only way to make them perform anyway.

lanek · May 28, 2017

honestly you can play them with onlyl one 1070 .. sell the other one as it seems Nvidia have definittively forget SLI.. ( and i was part of the first "ALPHA" tester of SLI witth the 6600GT SLI editions ( before it become offcial ) ... period ...

Deleted member 13524 · Jun 1, 2017

swaaye said:
It is so bizarre reading excited news about FP16. Back with the olden times of NV30 that performed like shit unless you used fixed point or FP16, FP16 usage caused worries of reduced visual fidelity with shader replacements et al. And here we go with FP16 being the bright new future....

It seems strange to me that FP16 was unappreciated back then. Was it caused by ATI never supporting it? Their post-D3D8 parts were all FP24 or 32 right?

IIRC, DirectX 9 with Pixel Shader 2.0 only supported FP24 Pixel Shaders and R300 cards only supported that.
nvidia came later with the FX series that supported either FP16 or FP32 pixel shaders, through a patch on DX9 (DX9a). FP16 pixel shaders ran fast but FP32 ones were too bandwidth-heavy at the time, and existing FP24 pixel shaders had to be promoted to FP32.

Their problem was two-fold:
1 - At the time most high-profile DX9 games (namely Half Life 2) had been developed with DX9(non-a) with mind, using FP24 throughout the whole engine. This made the game run like shit on the FX series that had to run all pixel shaders promoted to FP32.

2 - Choosing between FP16 and FP32 for each pixel shader means more development time and costs. The big thing about HLSL/GLSL for developers at the time was getting pixel shader effects working without much effort, and having to do trial-and-error with each precision mode in each pixel shader was quite a bit away from that mindset (e.g. Doom 3 developers didn't follow that trend and the game ran great on FX cards).
Plus, ATi had already gained a very sizeable chunk of the DX9 card market regardless, so developing a nvidia-only DX9a version that needed more work than the baseline DX9 version would result in small-ish returns in terms of sales.

swaaye · Jun 1, 2017

ToTTenTranz said:
IIRC, DirectX 9 with Pixel Shader 2.0 only supported FP24 Pixel Shaders and R300 cards only supported that.
nvidia came later with the FX series that supported either FP16 or FP32 pixel shaders, through a patch on DX9 (DX9a). FP16 pixel shaders ran fast but FP32 ones were too bandwidth-heavy at the time, and existing FP24 pixel shaders had to be promoted to FP32.

Their problem was two-fold:
1 - At the time most high-profile DX9 games (namely Half Life 2) had been developed with DX9(non-a) with mind, using FP24 throughout the whole engine. This made the game run like shit on the FX series that had to run all pixel shaders promoted to FP32.

2 - Choosing between FP16 and FP32 for each pixel shader means more development time and costs. The big thing about HLSL/GLSL for developers at the time was getting pixel shader effects working without much effort, and having to do trial-and-error with each precision mode in each pixel shader was quite a bit away from that mindset (e.g. Doom 3 developers didn't follow that trend and the game ran great on FX cards).
Plus, ATi had already gained a very sizeable chunk of the DX9 card market regardless, so developing a nvidia-only DX9a version that needed more work than the baseline DX9 version would result in small-ish returns in terms of sales.

Yeah I recall much of that now... But even FP16 had poor performance on NV3x. I don't know if you remember it but Valve did a controversial presentation on how there was no way to optimize PS 2.0 shaders for NV3x and get useful results. Instead with Source engine games the FX cards run the DirectX 8.1 (PS 1.4) path. One can force D3D 9 but it has a horrendous speed impact.

Hey I actually made a video of it. Even with later drivers NV didn't do any shader replacement hacking on their own to make it run acceptably.

Doom3 has a focus on fillrate and texturing instead of computationally intensive shader programs. They use texture lookup tables for some effects. NV brought in enhanced stencil / Z capabilities with NV35+ for the game too. Seems like id essentially built the game for NV3x.

Deleted member 13524 · Jun 1, 2017

swaaye said:
Yeah I recall much of that now... But even FP16 had poor performance on NV3x. I don't know if you remember it but Valve did a controversial presentation on how there was no way to optimize PS 2.0 shaders for NV3x and get useful results. Instead with Source engine games the FX cards run the DirectX 8.1 (PS 1.4) path. One can force D3D 9 on them but it has a horrendous impact on performance.

Doom3 has a focus on fillrate and texturing instead of computationally intensive shader programs. They use texture lookup tables for some effects too. Humus here actually figured out a way to move some of that back into the shader ALUs and gained performance on ATI.

Well there were other factors that contributed to Geforce 5's flop, like nvidia being a first adopter of TSMC's 130nm and the process ended up being much worse than a matured 150nm at the time. Perhaps nvidia just made longer pipelines and expected to get much higher clocks.

swaaye · Jun 1, 2017

ToTTenTranz said:
Well there were other factors that contributed to Geforce 5's flop, like nvidia being a first adopter of TSMC's 130nm and the process ended up being much worse than a matured 150nm at the time. Perhaps nvidia just made longer pipelines and expected to get much higher clocks.

Well maybe. One does have to wonder why they didn't build a 8 pixel per clock chip at that point. 130nm did help them get their 4 pixel per clock chip to 500 MHz though and that was only reason it was somewhat competitive against 9700 Pro. That and brand new 500 MHz DDR2 instead of just going 256-bit with old DDR1. Strangely risky design decisions. I suppose they just underestimated ATI, a company that had been poorly executing until 9700.

They didn't mess around with NV40 and went to 16 pixels per clock there. And after that they just kept shooting the parallelism up with each generation and blew past ATI in that area.

Silent_Buddha · Jul 3, 2017

F-U NVidia and your crappy ass driver. I'm so F-ing mad right now. Lost hours worth of work because of the piece of shite. Also, F-U MS and your piece of crap Edge browser, at least Chrome has the decency to just say it's out of memory when the NV driver F's up (NV's driver takes down one Window of Edge when it craps out, but leaves the others up, WTF?) and IE just sits there and doesn't respond when it happens. Love that the NV driver is more than happy to take down Explorer as well. Yay!

OMG, I wish I didn't have to choose between a stable AMD driver and card but crappy game performance or a much faster NV card, but crappy unstable drivers (if you are you like me and don't reboot for weeks at a time). %@#$%@#%@#$

Maybe I am going to have to bite the bullet and make a new rig just for gaming and keep the AMD card in my work machine permanently. At least then the crappy NV driver won't cause me to lose hours worth of work. /sigh.

Once I cool down I'll probably regret posting this, so if a mod deletes it, I don't care. But Oh My God am I so mad right now.

Regards,
SB

Deleted member 2197 · Jul 20, 2022

Silent_Buddha said:
Maybe I should just make a new post on the idiocy of NVidia.

I just put the 1070 back in because a friend of mine is going to lend me his Oculus Rift for a little bit since it's just been gathering dust over at his house. And I'm immediately reminded of all the idiotic things about this card.

Like, for instance, what idiot at NVidia thought it would be a good idea to not show the bootup POST process on a DP monitor if another monitor is connected? My main monitor is hooked up through DP and my secondary monitor (in portrait mode) is hooked up through DVI. That makes it a bit of a pain if I ever need to go into the BIOS for any reason, which thankfully isn't that often.
Regards,
SB

Did you ever figure out how to get the DP monitor to show the bootup PRE/POST process? I encountered similar issue where entire boot process would show on HDMI port until reaching Windows.

I was fooling around with an GTX 780 that had similar setup (DP to monitor, HDMI to TV) and found if I went into bios changed CSM (Compatible Support Module) from Enabled to Auto the entire boot process would display on the DP monitor. Fixed the issue though would never have guessed the solution was related to UEFI device selection.

Silent_Buddha · Jul 20, 2022

pharma said:
Did you ever figure out how to get the DP monitor to show the bootup PRE/POST process? I encountered similar issue where entire boot process would show on HDMI port until reaching Windows.

I was fooling around with an GTX 780 that had similar setup (DP to monitor, HDMI to TV) and found if I went into bios changed CSM (Compatible Support Module) from Enabled to Auto the entire boot process would display on the DP monitor. Fixed the issue though would never have guessed the solution was related to UEFI device selection.

Interesting. I don't currently have a DP display connected to my main PC so I can't try that out to see if that would have fixed the issue. Right now with 2 HDMI display, it just depends on which HDMI port the main monitor is connected to.

Regards,
SB

IHV Preferences, Performance over Quality? Spawn

Locuza

Dodecahedron

MDolenc

sebbbi

sebbbi

Deleted member 2197

Guest

swaaye

Entirely Suboptimal

Lightman

Razor1

3dcgi

swaaye

Entirely Suboptimal

lanek

Deleted member 13524

Guest

swaaye

Entirely Suboptimal

Deleted member 13524

Guest

swaaye

Entirely Suboptimal

Silent_Buddha

Deleted member 2197

Guest

Silent_Buddha

Similar threads

IHV Preferences, Performance over Quality? *Spawn*

Deleted member 2197

Guest

Entirely Suboptimal

Entirely Suboptimal

Deleted member 13524

Guest

Entirely Suboptimal

Deleted member 13524

Guest

Entirely Suboptimal

Deleted member 2197

Guest

Similar threads

IHV Preferences, Performance over Quality? Spawn