And it's not going to have everything everyone wanted. But Nate is juggling this and benchmarking, so it's a whole lot of plates to spin at once.
Is there a video of him juggling those spinning plates?
Variable rate shading, mesh shading and texture space shading, if easily used, may become quite a nice combination.

Is Variable Rate Shading just Rapid Packed Math on steroids?

Isn't RPM just 2xFP16 squeezed into FP32? I don't see the link with VRS? It seems to be completely different.

It's a more flexible evolution of Multi-Res shading.

I see, I thought it worked through double FP16, which Turing supports in the consumer line now.

The concept in the slides at least seems to point to shading per a given grouping of pixels (I think).
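For what it's worth, here is a minimal Python sketch of the "shade a group of pixels with one invocation" idea the slides seem to describe; the 8x8 framebuffer, the 2x2 rate and the toy shader are all made up for illustration, and this has nothing to do with RPM's FP16 packing:

```python
# Toy illustration of coarse (VRS-style) shading: one shader invocation covers
# a 2x2 block of pixels instead of every pixel, so invocation count drops ~4x.
# Resolution, rate and the "shader" below are invented for illustration.
import numpy as np

WIDTH, HEIGHT = 8, 8          # toy framebuffer
RATE_X, RATE_Y = 2, 2         # 2x2 coarse shading rate

def shade(x, y):
    """Stand-in for an expensive pixel shader."""
    return (x * 31 + y * 17) % 255

# Per-pixel shading: one invocation per pixel.
fine = np.array([[shade(x, y) for x in range(WIDTH)] for y in range(HEIGHT)])
fine_invocations = WIDTH * HEIGHT

# Coarse shading: one invocation per 2x2 block, result broadcast to the block.
coarse = np.zeros((HEIGHT, WIDTH), dtype=int)
coarse_invocations = 0
for by in range(0, HEIGHT, RATE_Y):
    for bx in range(0, WIDTH, RATE_X):
        value = shade(bx, by)
        coarse_invocations += 1
        coarse[by:by + RATE_Y, bx:bx + RATE_X] = value

print(fine_invocations, "vs", coarse_invocations, "invocations")  # 64 vs 16
print("identical output:", np.array_equal(fine, coarse))          # False: blockier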
It offloads some of the CPU's load onto the GPU to increase the number of drawn objects on screen. It also has an LOD management system that works through automatic adaptive tessellation, and it can modify and manipulate geometry on the fly, as shown in the spherical cutaway example in the white paper, where the mesh shader culls and modifies geometry based on its position relative to the sphere.
So while Vega's primitive shaders are focused more on accelerating current geometry processing as a means to improve AMD's shortcomings in that area, Turing's mesh shaders build on NVIDIA's lead in geometry processing to enable more stuff on screen and are aimed more at enhancing its quality and flexibility.
Furthermore, traditional geometry pipelines discard primitives after vertex processing is completed, which can waste computing resources and create bottlenecks when storing a large batch of unnecessary attributes. Primitive shaders enable early culling to save those resources.
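To make the early-culling point concrete, here is a rough CPU-side Python sketch (not AMD's actual primitive shader, and with invented triangle data) of running cheap position-only tests before paying for the expensive per-vertex attribute work:

```python
# Conceptual illustration of early culling: test each triangle with cheap
# position-only math first, and only run the expensive attribute work for
# triangles that survive. A traditional pipeline shades everything and lets
# fixed-function culling throw most of it away afterwards.
import numpy as np

rng = np.random.default_rng(0)
# 1000 toy triangles: 3 vertices each, x/y in [-2, 2] (view cube is [-1, 1])
triangles = rng.uniform(-2.0, 2.0, size=(1000, 3, 2))

def is_backfacing_or_offscreen(tri):
    (x0, y0), (x1, y1), (x2, y2) = tri
    signed_area = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    offscreen = (tri[:, 0].max() < -1 or tri[:, 0].min() > 1 or
                 tri[:, 1].max() < -1 or tri[:, 1].min() > 1)
    return signed_area <= 0 or offscreen   # backfacing/degenerate or outside view

def expensive_attributes(tri):
    """Stand-in for interpolants, tangents, etc. computed per vertex."""
    return np.sin(tri).sum()

survivors = [t for t in triangles if not is_backfacing_or_offscreen(t)]
shaded = [expensive_attributes(t) for t in survivors]  # only survivors pay this cost
print(len(triangles), "submitted,", len(survivors), "shaded")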
But if you compare diagrams they look now the same.

There is a similarity in how the pipelines go from *fixed* → *programmable* → *programmable* → *fixed* → *programmable* → *programmable* → *fixed* to *fixed* → *programmable* → *fixed* → *programmable* → *fixed*.
And if you look at the Turing asteroid tech demo, they also talk about the huge amount of polygons.

The asteroid demo's change was having the front-end shader select a different variant of the model based on how much detail was actually necessary, not reading in and then culling out non-contributing triangles with extra shader code on top of the existing shaders. The primitive shader is taking orders from a standard draw call where the decision making was done earlier, not selecting different models on the fly.
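Here is a minimal Python sketch of that "pick the model variant by how much detail is actually needed" step; the LOD table, distances and thresholds are invented for illustration, and on Turing this decision would sit in the task/mesh shader stage rather than on the CPU:

```python
# Toy LOD selection: choose among pre-built detail levels of one asteroid
# model based on distance to the camera. All numbers below are made up.
LOD_TRIANGLES = [20000, 5000, 1200, 300]              # LOD0 (full) .. LOD3 (coarse)
LOD_MAX_DISTANCE = [10.0, 30.0, 80.0, float("inf")]   # first level that fits wins

def select_lod(distance_to_camera: float) -> int:
    for level, max_dist in enumerate(LOD_MAX_DISTANCE):
        if distance_to_camera <= max_dist:
            return level
    return len(LOD_MAX_DISTANCE) - 1

for d in [4.0, 25.0, 60.0, 500.0]:
    lod = select_lod(d)
    print(f"distance {d:6.1f} -> LOD{lod}, {LOD_TRIANGLES[lod]} triangles")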
Right now, considering talented programmers can get the same level of raytracing performance out of a 1080ti as Nvidia claims can come out of their new RTX cards, well, consider me unimpressed.

Which talented programmer? Who even made such a preposterous claim?

I guess he is talking about distance field RT, or some kind of voxel cone tracing, or some sphere tracing. Obviously, he doesn't realise there is a difference once we speak about polygon soup tracing, which is what current games are.
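For a rough sense of why polygon-soup tracing is a different beast from tracing an SDF or voxel proxy, here is some back-of-the-envelope Python with toy numbers (the triangle and ray counts are invented):

```python
# With raw triangles, every ray has to find a hit among millions of primitives;
# without an acceleration structure such as a BVH the test count explodes.
import math

triangles = 10_000_000          # a game-scale "polygon soup"
rays = 1920 * 1080              # one primary ray per 1080p pixel

brute_force_tests = rays * triangles
# An idealised BVH visits on the order of log2(N) nodes per ray; real traversal
# costs more than this, but the scaling is the point.
bvh_tests = rays * math.ceil(math.log2(triangles))

print(f"brute force: {brute_force_tests:.2e} ray-triangle tests")
print(f"with a BVH:  {bvh_tests:.2e} node/triangle tests (idealised)")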
That being said, there's some cleverness here. The restructured low-level cache seems like a good idea and a straight-up win. Depending on the separate INT cores' actual silicon area, though, it may not justify their stated max throughput improvement of 36%, but what size it is just isn't known.
[…]
I'd also question the placement of their tensor cores in the same SM as FP/INT compute. A huge amount of energy usage from inferencing comes from memory shuttling, which is why inferencing-specific chips have huge local caches, far bigger than those usually needed on other sorts of GPU tasks.
[…]
Assuming the linked leaks are real, performance per mm has, uhmm, gone down since Pascal.

I was entertaining the thought that dedicated INT32 cores might consume less energy doing their INT32 work than having to shove this through the FP32 pipe. Welcome to the world of energy over space. I might be wrong, though, but further idle thoughts came up, letting me consider the idea that Turing is a contingency plan for 7 nm not being ready in time / not available in a large enough volume / not living up to the expectations energy-wise. That would be supported both by the separation of cores not solely for performance increase but more for energy efficiency, as well as by the immense amount of chip area invested for consumer products. It would also explain why no one outside of Nvidia ever heard of Turing a couple of months ago. Maybe it was intended to be Ampere at 7 nm and with ~40% less die space.
In addition to the DLSS capability described above, which is the standard DLSS mode, we provide a second mode, called DLSS 2X. In this case, DLSS input is rendered at the final target resolution and then combined by a larger DLSS network to produce an output image that approaches the level of the 64x super sample rendering
Why isn't DLSS 2X the standard mode? Seems a little bizarre to push a lower resolution render that just matches TAA instead of the mode that actually improves IQ.

DLSS 2X doesn't give big increases in fps. It comes at the cost of some performance.
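A quick back-of-the-envelope in Python on why standard DLSS gains fps while DLSS 2X costs some; the internal render resolution assumed below for standard DLSS is purely illustrative, since the quoted passage doesn't state it:

```python
# Standard DLSS shades fewer pixels and lets the network upscale; DLSS 2X
# shades every output pixel and then adds the larger network pass on top.
target = (3840, 2160)              # final output resolution
assumed_dlss_input = (2560, 1440)  # ASSUMED internal render for standard DLSS

target_pixels = target[0] * target[1]
dlss_pixels = assumed_dlss_input[0] * assumed_dlss_input[1]

print(f"standard DLSS shades ~{dlss_pixels / target_pixels:.0%} of output pixels")
print("DLSS 2X shades 100% of output pixels, plus a larger network pass")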
I guess he is talking about distance field RT, or some kind of voxel cone tracing, or some sphere tracing. Obviously, he doesn't realise there is a difference once we speak about polygon soup tracing, which is what current games are.

Exactly.
Here is what Sebbi had to say about RTX (hardware BVH) and his cone tracing implementation:
Regardless, one of the biggest things is how big things are. For reference, a GTX 1080 (GP104) is a mere 341 mm^2; the 2080 is a massive 545 mm^2, and that's on the smaller 12nm process. Assuming the linked leaks are real, performance per mm has, uhmm, gone down since Pascal. To even equal Pascal from a 1080 to a 2080 would need roughly a 60% performance increase.

I don't think it's really a valid comparison with all the extra die space dedicated to Tensor and RT cores, since compared to Pascal you're referring to perf/mm^2 in standard rasterized games. If you discount the die space for the additional hardware, the uplift is probably the expected amount.
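Checking that 60% figure against the die sizes quoted above (the arithmetic is the only thing added here):

```python
# Quick check of the perf-per-mm^2 claim using the die sizes stated in the thread.
gp104_mm2 = 341.0
tu104_mm2 = 545.0

area_ratio = tu104_mm2 / gp104_mm2
print(f"TU104 is {area_ratio:.2f}x the area of GP104 "
      f"(~{(area_ratio - 1) * 100:.0f}% larger)")
# So a 2080 needs roughly a 60% performance lead over a 1080 just to keep
# performance per mm^2 level with Pascal, before discounting any area spent
# on RT and tensor cores.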