Next Generation Hardware Speculation with a Technical Spin [2018]

Just thinking about how the PS4 is set up, with the main SoC and the smaller secondary ARM SoC, I wonder if they'll expand on this for the PS5.

Having something like the MediaTek MT8695 (I believe Sony are going to use this in their TVs?) as the secondary processor, with its own 3GB of slow RAM, would free the main SoC and its RAM from the usual OS allocation.

16GB of GDDR6 would then mean roughly 3x the amount of RAM for developers compared to the current generation.
 
Those cards were brute forcing raytracing.
This is not correct. These real-time demos were running Unreal Engine with a DXR renderer, and Volta GV100 is hardware-compliant with D3D12_RAYTRACING_TIER_1 - i.e. it has dedicated hardware for ray-intersection search. That hardware traverses the entire scene geometry (subdivided into vendor-proprietary 'bounding volume hierarchy' acceleration structures) for each ray traced, and spawns raytracing-specific HLSL shaders (closest-hit, miss, any-hit) on intersecting triangles in order to perform lighting and texture mapping.
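For reference, here is a minimal sketch of what those DXR HLSL stages look like; the shader and payload names are made up for illustration and are not code from the demo:

struct RayPayload
{
    float3 color;
};

// Runs when the acceleration-structure traversal finds no intersection along a ray.
[shader("miss")]
void MissShader(inout RayPayload payload)
{
    payload.color = float3(0.0, 0.0, 0.0); // e.g. sample an environment map here
}

// Runs on the closest intersected triangle; this is where lighting and texture
// lookups happen. An optional [shader("anyhit")] stage can additionally accept or
// reject candidate hits along the way (e.g. for alpha-tested geometry).
[shader("closesthit")]
void ClosestHitShader(inout RayPayload payload,
                      in BuiltInTriangleIntersectionAttributes attribs)
{
    // attribs.barycentrics locates the hit point on the triangle and is used to
    // interpolate vertex attributes before shading; shown here as a debug color.
    payload.color = float3(attribs.barycentrics, 0.0);
}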

The original GDC 2018 demo from this March used Nvidia DGX-Station - a US$49,000 PC with 4 Tesla V100 cards interconnected through NVLink and a 20-core Xeon E5-2698 v4. The more recent SIGGRAPH 2018 demo ran on a Quadro RTX 6000, which is slightly better than GeForce RTX 2080 Ti.

A single 2080 Ti can outperform them in that type of workload.
This could be attributed to the preliminary API and drivers rather than the actual raytracing performance of Volta and Turing. AFAIK the specs for the ray-intersection search hardware are not disclosed by Nvidia, other than the somewhat vague '10 Giga-rays/sec' and '78T RTX-ops', and their compute power is about the same at 13-16 TFLOPs.
 
Consoles will simply have lower quality RT than PC.
There are no image quality settings that can affect performance other than the number of traced rays. Current DXR titles already struggle with only two rays per pixel, which results in quite a noisy picture (see this DXR code tutorial from Nvidia). That's on a gamer's GPU that retails for a hefty US$1200.
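To make 'two rays per pixel' concrete, a ray generation shader along the lines below spends the entire per-pixel budget on one reflection ray and one shadow ray. Resource names and the placeholder origins/directions are hypothetical, and it assumes closest-hit/miss shaders like those sketched earlier in the thread:

RaytracingAccelerationStructure SceneBVH : register(t0);
RWTexture2D<float4> Output               : register(u0);

struct RayPayload
{
    float3 color;
};

[shader("raygeneration")]
void RayGen()
{
    uint2 pixel = DispatchRaysIndex().xy;

    // Ray 1: reflection ray. In a real renderer the origin and direction would be
    // reconstructed from the camera or the rasterized G-buffer; placeholders here.
    RayDesc ray;
    ray.Origin    = float3(0.0, 0.0, 0.0);
    ray.Direction = float3(0.0, 0.0, 1.0);
    ray.TMin      = 0.001;
    ray.TMax      = 10000.0;

    RayPayload reflection;
    reflection.color = float3(0.0, 0.0, 0.0);
    TraceRay(SceneBVH, RAY_FLAG_NONE, 0xFF, 0, 1, 0, ray, reflection);

    // Ray 2: shadow ray from the (placeholder) hit point toward a light.
    RayDesc shadowRay;
    shadowRay.Origin    = float3(0.0, 0.0, 0.0);
    shadowRay.Direction = normalize(float3(0.5, 1.0, 0.25));
    shadowRay.TMin      = 0.001;
    shadowRay.TMax      = 10000.0;

    RayPayload shadow;
    shadow.color = float3(1.0, 1.0, 1.0);
    TraceRay(SceneBVH, RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH, 0xFF, 0, 1, 0,
             shadowRay, shadow);

    // One reflection sample and one shadow sample per pixel is the whole budget,
    // which is why the raw output is noisy and needs heavy denoising afterwards.
    Output[pixel] = float4(reflection.color * shadow.color, 1.0);
}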

The 2080 Ti version of the demo was downgraded, IIRC
Nvidia claims that the Quadro RTX 6000 / RTX 2080 Ti has better raytracing performance and that this was the same demo... careful wording?

Question 13: ...How does performance on the Quadro compare to the performance of the same demo on the DGX Station?

Answer [Ignacio Llamas, NVIDIA]: The framerate of the cinematic sequence on the Quadro RTX 6000 averages around 50 FPS at 1440p with DLSS. The original demo ran at 24 to 30 FPS at 1080p on a DGX Station.
 
Current DXR titles already struggle with only two rays per pixel, which results in quite a noisy picture - that's on a gamer's GPU that retails for a hefty US$1200.
From your article:
But the RTX hardware in the Nvidia GPU, including the RTX 2080 Ti, isn’t going to be fast enough to simply ray trace an entire AAA game. Even if it was, current game engines themselves are not designed for this. This point simply cannot be emphasized enough. There are no AAA game engines that deploy ray tracing as their primary rendering method. It’s going to take time to create them. At this stage, the goal of RTX and Microsoft DXR is to allow ray tracing to be deployed in certain areas of game engines where rasterization does poorly and ray tracing could offer better visual fidelity at substantially less performance cost.

Honestly, the whole discussion point is around this, IMO. No one should be under the illusion that RT will be the primary rendering method. This is exactly the point I'm debating: are we at a point where the bolded/underlined part below happens often enough in the future of games to justify the silicon to do it?

Bold/underline: where rasterization does poorly and ray tracing could offer better visual fidelity at substantially less performance cost.
 
The only difference is the use of DLSS on the RTX GPU, while the Volta system ran native 1080p.
So they actually ran the demo at 2x lower pixel resolution on a single TU102 card and got 1.5x better FPS?
It seems the GDC demo was running on a single GV100 card in that DGX-Station, as Unreal Engine 4 does not support explicit multi-GPU yet...

where rasterization does poorly and ray tracing could offer better visual fidelity at substantially less performance cost
This is exactly where DXR-enabled games are struggling on GeForce RTX - they only use ray-tracing for AO/GI and reflective surfaces.

https://www.pcgamesn.com/nvidia-rtx-2080-review-benchmarks
https://www.extremetech.com/gaming/279050-nvidia-rtx-ray-tracing-northlight-engine-demo
https://www.techspot.com/news/76073-shadow-tomb-raider-unable-maintain-60fps-geforce-rtx.html
https://www.pcgamesn.com/nvidia-rtx-2080-ti-hands-on
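As a rough sketch of that AO/GI-and-reflections-only usage (all names hypothetical), the ray generation shader in such a hybrid renderer reads the rasterized G-buffer and only spends rays on pixels that are reflective enough; everything else keeps its rasterized shading:

RaytracingAccelerationStructure SceneBVH : register(t0);
Texture2D<float4> GBufferNormalRoughness : register(t1); // written by the raster pass
RWTexture2D<float4> ReflectionOutput     : register(u0);

struct RayPayload
{
    float3 color;
};

[shader("raygeneration")]
void ReflectionRayGen()
{
    uint2 pixel = DispatchRaysIndex().xy;
    float4 nr = GBufferNormalRoughness[pixel]; // xyz = normal, w = roughness (assumed layout)

    // Rough or diffuse pixels keep their rasterized shading; no ray is traced for them.
    // This is the hybrid split: trace rays only where rasterization does poorly.
    if (nr.w > 0.3)
    {
        ReflectionOutput[pixel] = float4(0.0, 0.0, 0.0, 0.0);
        return;
    }

    RayDesc ray;
    ray.Origin    = float3(0.0, 0.0, 0.0);                  // would be reconstructed from depth
    ray.Direction = reflect(float3(0.0, 0.0, 1.0), nr.xyz); // placeholder view direction
    ray.TMin      = 0.001;
    ray.TMax      = 10000.0;

    RayPayload payload;
    payload.color = float3(0.0, 0.0, 0.0);
    TraceRay(SceneBVH, RAY_FLAG_NONE, 0xFF, 0, 1, 0, ray, payload);

    ReflectionOutput[pixel] = float4(payload.color, 1.0);
}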
 
This is exactly where current RTX-enabled games are struggling - they only use ray-tracing for AO/GI and reflective surfaces.
Those are unoptimized demos made specifically to show the technology, not how well RTX will perform in those games. Let's wait and give the developers a chance to showcase their final product.
 
These are largely unoptimized showings though. The DXR PIX viewer was only released 2 days ago. Developers have not yet had a real chance to figure out how to set up their pipeline for RT performance. They did showcase that it works, though. And that's just from the developer side; the drivers that work with DXR are still in their infancy.
An experimental API + experimental drivers for the demos we saw.

The final releases should be a more serious pass because they aren't expecting the API to change, and Nvidia should be putting out a more solidified driver to support it.

Please note that PIX can no longer be used to debug applications that use the experimental DXR API that was announced during GDC 2018. Developers still using the experimental DXR API should strongly consider moving their applications to the final DXR API. If necessary, version 1810-02 of PIX can be used to debug the experimental DXR API.
 
So they actually ran the demo at 2x lower pixel resolution on a single TU102 card and got 1.5x better FPS?
Nope, DLSS scales 1080p to 1440p. So in essence, they ran it at the same resolution.
It seems the GDC demo was running on a single GV100 card in that DGX-Station, as Unreal Engine 4 does not support explicit multi-GPU yet...
They used all four. The whole thing was tailor-made for the system. All DXR demos shown at the events actually used 2 Volta cards or more. The PICA PICA demo used 1 Volta GPU, but compared to Turing it was considerably slower.
https://www.slideshare.net/DICEStudio/siggraph-2018-pica-pica-and-nvidia-turing
 
The point is, there is not much room for optimization in the raytracing pipeline. You must keep all your geometry, textures, and shaders in memory at all times, you must traverse your entire geometry with every ray you emit, and you must trace from each surface hit to all light sources in the scene. There is no way to remove hidden geometry with rough view-frustum culling, and there are no developer controls over the raytracing hardware. There is little use in profiling rendering calls and memory usage. So do not expect miracles from optimizations.
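To put the 'no developer controls' point in perspective, the only per-ray knobs DXR exposes are the ray flags, the 8-bit instance mask, and the TMin/TMax interval passed to TraceRay, and all of them are coarse. A sketch with hypothetical names, using a shadow ray since that is the cheapest case:

RaytracingAccelerationStructure SceneBVH : register(t0);
RWTexture2D<float> ShadowMask            : register(u0);

struct ShadowPayload
{
    float visibility;
};

[shader("raygeneration")]
void ShadowRayGen()
{
    uint2 pixel = DispatchRaysIndex().xy;

    RayDesc ray;
    ray.Origin    = float3(0.0, 0.0, 0.0);              // hypothetical surface position
    ray.Direction = normalize(float3(0.5, 1.0, 0.25));  // hypothetical direction to the light
    ray.TMin      = 0.001;   // the TMin/TMax interval is one of the few per-ray controls:
    ray.TMax      = 100.0;   // it merely limits how far along the ray traversal goes

    ShadowPayload p;
    p.visibility = 0.0; // assume occluded; only the miss shader can prove otherwise
    TraceRay(SceneBVH,
             // Stop at the first hit and never run a closest-hit shader: occlusion is all we need.
             RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH | RAY_FLAG_SKIP_CLOSEST_HIT_SHADER,
             0xFF,            // 8-bit InstanceInclusionMask: coarse per-instance filtering only
             0, 1, 0,         // hit-group / miss-shader indexing
             ray, p);

    ShadowMask[pixel] = p.visibility;
}

[shader("miss")]
void ShadowMiss(inout ShadowPayload p)
{
    p.visibility = 1.0; // nothing between the surface and the light
}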
 
Yeah, I'm thinking the same. I'm not sure how efficiently the code could be structured. I guess it depends on the ray calls within shaders, maybe allocating workloads so raytraced shaders don't stall the pipeline? But I'd be surprised if these first-gen shaders aren't working pretty well anyway, because that'd be an obvious problem you'd consider when implementing the RT shaders. Within the raytracing itself, performance will be dependent on the RT hardware working through the constant workload.

I guess another area to work on would be compromising reflection quality: seeing how expensive some reflections are to resolve in their shaders and using a lower-LOD shader where possible. Profiling is probably most valuable for identifying where hacky solutions are better than true solutions, as that's where real-time graphics gets really clever and where there's no cross-over with other similar imaging fields.
 
DLSS scales 1080p to 1440p
Tom's Hardware profiled the Reflections demo, and they assume the DLSS mode uses a custom internal resolution somewhere between 720p and 1080p.

They used all four. The whole thing was tailor-made for the system. All DXR demos shown at the events actually used 2 Volta cards or more. The PICA PICA demo used 1 Volta GPU, but compared to Turing it was considerably slower.
Khmm... SEED PICA PICA slides do show Turing to be 3-6 times faster... which makes me wonder whether Volta really has dedicated raytracing hardware?

Cannot really find references to explicit multi-GPU for the other DXR demos though, besides vague images of a multi-card setup.
 
The point is, there is not much room for optimization in the raytracing pipeline.
But there is room to optimize many other things, which will further increase performance. This is what DICE had to say about their BFV implementation:

- DICE will make the quality of ray tracing scalable to work on different GPUs, and will also decouple the RT resolution from the rendering resolution.
- RT acceleration is currently lacking, as it only works after the G-buffer stage; DICE will change it to work asynchronously alongside rasterization, which will lead to a significant FPS boost.
- DICE will also merge drawn object instances into the same acceleration structure; they expect this to lead to an almost 30% increase in raytracing performance.
Khmm... SEED PICA PICA slides do show Turing to be 3-6 times faster than Volta... does the latter really have dedicated raytracing hardware?
Volta doesn't have any RT acceleration; it just accelerates denoising. Maybe it emulates RT at the software level.
 
Cannot really find references to explicit multi-GPU for the other DXR demos though, besides vague images of a multi-card setup.

Epic Star Wars:

Where the hell did the idea that our #ue4 ray tracing demo requires $150k worth of hardware to run? Seeing this all over the web. Runs great on 4 TitanVs which are 3k a piece. The fancy DGX Nvidia built box we ran on costs 50k but is probably overkill.


SEED:
We are doing a hybrid rendering pipeline in SEED with multiple raytracing components (reflections, shadows, materials, GI and AO) combined with rasterisation (gbuffer primarily) and compute.

Running on one or more TitanVs
https://forum.beyond3d.com/threads/directx-ray-tracing.60670/#post-2024400
 
But there is room to optimize many other things, which will further increase performance. This is what DICE had to say about their BFV implementation:

- DICE will make the quality of ray tracing scalable to work on different GPUs, and will also decouple the RT resolution from the rendering resolution.
- RT acceleration is currently lacking, as it only works after the G-buffer stage; DICE will change it to work asynchronously alongside rasterization, which will lead to a significant FPS boost.
- DICE will also merge drawn object instances into the same acceleration structure; they expect this to lead to an almost 30% increase in raytracing performance.
You can also accumulate rays over many frames or update the BVH at a lower framerate. I'm not sure, but maybe you could also use LODs for the BVH. Something like variable rate shading would be great too.
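A minimal sketch of the accumulate-over-frames idea, assuming the application supplies a frame index that is reset whenever the camera or scene changes: blend each frame's noisy raytraced result into a history buffer so the image converges over time.

Texture2D<float4>   CurrentFrame : register(t0); // this frame's noisy raytraced result
Texture2D<float4>   HistoryIn    : register(t1); // accumulation from previous frames
RWTexture2D<float4> HistoryOut   : register(u0); // updated accumulation (ping-ponged each frame)

cbuffer FrameConstants : register(b0)
{
    uint FrameIndex; // reset to 0 whenever the camera or scene changes abruptly
};

[numthreads(8, 8, 1)]
void AccumulateCS(uint3 id : SV_DispatchThreadID)
{
    float4 cur  = CurrentFrame[id.xy];
    float4 hist = HistoryIn[id.xy];

    // Running average over FrameIndex + 1 samples per pixel: the longer nothing
    // changes, the more the accumulated image converges.
    float weight = 1.0 / (FrameIndex + 1.0);
    HistoryOut[id.xy] = lerp(hist, cur, weight);
}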
 