Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

No, they are not useless. A 4090 has 4x the compute performance and gets >4x the performance with Raytracing+. So it works. Raytracing puts more work on a GPU.
Respectfully, you don't have a very in-depth understanding of how GPU architecture and bottlenecks work, as demonstrated pretty clearly by your posts here. This is not the level of discussion we expect at Beyond3D. I'll let you move a more targeted discussion to another thread if you want, but if you continue to engage here I expect it to be at a higher level of discourse.
 
I understand. It's just that 5.4 made some progress in CPU utilization, and even if it's not certain that it would have fixed the problem, maybe it would have reduced the amount of traversal stutter in the game, which you can even see in the demo.

I'm not that confident in the final version of this game. I'm expecting the usual suite of problems seen in Unreal Engine games, and 40 to 60 fps on console.
An NVRTX branch of UE 5.4 isn't available yet, so developers would have to wait for Nvidia to release it. They could always choose to rebase their game onto the latest release (5.4) of the main UE5 branch, but all of their work on their RTXDI integration would go out the window too should they make that decision ...
 
They originally developed their game with UE4 and have since moved to a custom branch (NVRTX) of UE5, ever since it became available just over 2 years ago. They can't afford potential regressions cropping up with their game releasing in less than a week from now ...

There is a difference between continually rebasing onto a later UE5 release after regression testing and building the final gold master on the latest HEAD of the UE5 development branch.
 
An NVRTX branch of UE 5.4 isn't available yet, so developers would have to wait for Nvidia to release it. They could always choose to rebase their game onto the latest release (5.4) of the main UE5 branch, but all of their work on their RTXDI integration would go out the window too should they make that decision ...
Ah, I didn't know that. Nvidia should do something about that. The game looks to have linear level design, so maybe the final version will perform without problems.
 
Is there a lot of manual work required to go from 5.0 to whatever the latest stable release is? 5.4?

Or do you effectively make your game in the version you chose to develop within?
 
The latest games still use UE 5.3; see HB2 and Avowed. 5.4 brought a significant performance improvement on consoles, especially on the Xbox Series. Games built that way will start arriving next year.
 
Gave it a quick run... first impressions so take with a grain of salt:

1) The geometric density is reasonable. Nothing earth shattering but not distractingly low either.
2) Performance in the non-RT mode is "ok" but not great. I suspect dropping some settings could help but I'm curious where most of it is going here.
3) Shadows in the non-RT mode are blurry and resolution transitions are obvious. I suspect this is just CSMs, which is pretty unfortunate from a quality perspective.
4) Shadow quality looks a lot better in the RT mode, but many of the foliage shadows do not animate in the RT mode.
5) In the RT path the denoising of foliage when it is initially revealed is pretty distracting even with a relatively slow flythrough. I'm worried how this will look with fast-paced camera movement.
6) Performance of the RT mode is not great, but kind of to be expected.

I'm not sure what they are actually using in terms of tech though, and presumably the final game will let us see that shown off in more environments. This demo flythrough is probably not the best example of any lighting or GI tech in particular, as it looks closer to the sort of thing you could accomplish last gen with baked lighting and good art.

Was hoping for a polished, big-budget release on UE5, but another ambitious, more indie title is still neat to see. I suspect the lack of Nanite in a lot of places is more to do with getting advertising $$$ from Nvidia, which means showing off things like RT shadows, and thus no budget for Nanite foliage/etc.

That being said, I suppose the focus for UE5 might be more on getting dynamic GI and the like to run at 60 on current consoles, and at 30 at all on Switch 2. That seems to be a really popular target, rather than pushing visuals at 30 on PS5 and Series X.

Regarding that: how much does rendering the VSMs cost, and how much does tracing the shadow map cost? A cheaper soft shadow map solution like the current CoD one might save a ms or two on lower-end hardware. But for rendering the VSMs themselves, all I can think of is that "constant time VSM" trick, where the VSM gets a fixed time budget and only renders the highest mips it can within that budget. Sure, the next frame could see a pop to higher res in some tiles, but it seemed to work well enough from what I saw of it.
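For what it's worth, here's a rough sketch of the budgeted-update idea I mean. This is my own illustration, not UE5 code; the names, numbers, and cost model are all made up:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a "constant time VSM" budget scheduler (not UE5 code).
// Each invalidated tile is re-rendered at the finest mip whose estimated cost
// still fits in the per-frame budget; anything left over is deferred.

struct VsmTile {
    uint32_t id = 0;
    float    priority = 0.0f;   // e.g. screen coverage / distance to camera
    int      desiredMip = 0;    // 0 = finest
    int      renderedMip = -1;  // -1 = not updated this frame
};

// Made-up cost model: each coarser mip quarters the texel count, so cost drops ~4x.
static float EstimateTileCostMs(int mip, float baseCostMs = 0.08f) {
    return baseCostMs / static_cast<float>(1 << (2 * mip));
}

// Spend at most budgetMs on shadow-map tile updates this frame.
void UpdateVsmTilesWithinBudget(std::vector<VsmTile>& dirtyTiles, float budgetMs) {
    std::sort(dirtyTiles.begin(), dirtyTiles.end(),
              [](const VsmTile& a, const VsmTile& b) { return a.priority > b.priority; });

    float spentMs = 0.0f;
    for (VsmTile& tile : dirtyTiles) {
        // Walk from the desired mip toward coarser mips until one fits the budget.
        for (int mip = tile.desiredMip; mip < 8; ++mip) {
            const float cost = EstimateTileCostMs(mip);
            if (spentMs + cost <= budgetMs) {
                tile.renderedMip = mip;  // the actual tile render would be kicked off here
                spentMs += cost;
                break;
            }
        }
        // Tiles left at renderedMip == -1 keep last frame's (possibly coarser) data,
        // which is where the occasional visible pop to higher res comes from.
    }
}
```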
 
Was hoping for a polished, big-budget release on UE5, but another ambitious, more indie title is still neat to see. I suspect the lack of Nanite in a lot of places is more to do with getting advertising $$$ from Nvidia, which means showing off things like RT shadows, and thus no budget for Nanite foliage/etc.

That being said, I suppose the focus for UE5 might be more on getting dynamic GI and the like to run at 60 on current consoles, and at 30 at all on Switch 2. That seems to be a really popular target, rather than pushing visuals at 30 on PS5 and Series X.

Regarding that: how much does rendering the VSMs cost, and how much does tracing the shadow map cost? A cheaper soft shadow map solution like the current CoD one might save a ms or two on lower-end hardware. But for rendering the VSMs themselves, all I can think of is that "constant time VSM" trick, where the VSM gets a fixed time budget and only renders the highest mips it can within that budget. Sure, the next frame could see a pop to higher res in some tiles, but it seemed to work well enough from what I saw of it.
In the latest iteration, a 60% performance improvement was achieved with 2x2 software VRS. On Series X, in one scenario, the normal rendering time of 4.92 ms was reduced to 3.05 ms with the new API (4.92 / 3.05 ≈ 1.6x, consistent with the quoted 60%).
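As a back-of-the-envelope check on where a number like that can come from (my own illustration, not from any official docs): with 2x2 coarse shading, coarsened pixels cost roughly a quarter as much, so the savings scale with how much of the screen can tolerate coarsening.

```cpp
#include <cstdio>

// Back-of-the-envelope model of pixel-shading savings from 2x2 software VRS.
// Illustrative only: assumes the measured pass is dominated by pixel shading.
// If a fraction f of the screen is shaded at 2x2 rate, those pixels cost ~1/4
// as much, so relative shading cost = (1 - f) + f / 4.
double RelativeShadingCost(double coarseFraction) {
    return (1.0 - coarseFraction) + coarseFraction / 4.0;
}

int main() {
    const double fractions[] = {0.25, 0.50, 0.80};
    for (double f : fractions) {
        std::printf("coarse fraction %.0f%% -> relative cost %.2f\n",
                    f * 100.0, RelativeShadingCost(f));
    }
    // At f ~= 0.5 the model gives ~0.63, in the same ballpark as the quoted
    // 3.05 ms / 4.92 ms ~= 0.62 for that Series X pass.
    return 0;
}
```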
 
In the latest iteration, a 60% performance improvement was achieved with 2x2 software VRS. On Series X, in one scenario, the normal rendering time of 4.92 ms was reduced to 3.05 ms with the new API.
Do you have a link to the documents? And the pages where they talk about performance testing? I would like to take a look at it. I'm curious if they talk about how they actually achieved this performance.
 
PlayStation released a new trailer for the Until Dawn remake which has a comparison between the old version on the Decima engine and the new version on UE5. In some previous comparisons it wasn't clear whether the changes were necessarily for the better, but I think this trailer does a good job of showing that it does look much better, and it puts my concerns to rest. They've improved character models, lighting, and RT shadows, and made some changes to the cinematography, as well as other aspects of the game, such as changing the fixed third-person camera to a player-controlled camera system. Can't wait to play this one again and go for a different ending!

 
Was hoping for a polished, big-budget release on UE5, but another ambitious, more indie title is still neat to see. I suspect the lack of Nanite in a lot of places is more to do with getting advertising $$$ from Nvidia, which means showing off things like RT shadows, and thus no budget for Nanite foliage/etc.
Lol, emphasizing just shadows is reminiscent of the kind of RT effect Nvidia's competitor prefers. Fortunately, I doubt this is another "Starfield" moment, as Black Myth's June release in China surpassed $137 million in the first 3 days.
 
In the latest iteration, a 60% performance improvement was achieved with 2x2 software VRS.
I think this applies only to the GPU portion of the improvements, but there are also massive CPU gains that were achieved with 5.4, about 50% more performance on PC as per Digital Foundry testing.


I suspect the lack of Nanite in a lot of places is more to do with getting advertising $$$ from Nvidia
I suspect it's due to the UE4 heritage of the game. The game was almost complete on UE4.
which means showing off things like RT shadows
It's full path tracing here: shadows, reflections, illumination, and caustics.
 
I think this applies only to the GPU portion of the improvements, but there are also massive CPU gains that were achieved with 5.4, about 50% more performance on PC as per Digital Foundry testing.

In the console City Sample test, there was a 25% GPU improvement and a 50% CPU improvement.

That's from the 5.4 presentation a couple of months back.
 
...because I don't think a single piece of software has come out that uses "fp32 flops" in a way that matches the teraflops of the GPUs that support it.
I think Doom Eternal was used a lot with the Ampere press data because it seemed to use the doubled FP32 changes really well. If memory serves, it was the only game that got close to the expected performance from the marketing specs. It kind of gives credence to the idea that if you have the talent, time, and money, things can/could be better.
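For reference, the marketing TFLOPS figure those Ampere reviews compared against is just FP32 lane count x boost clock x 2 (an FMA counts as two FLOPs). A quick sanity check with the commonly published RTX 3080 numbers:

```cpp
#include <cstdio>

// Theoretical FP32 throughput = FP32 lanes * boost clock * 2 FLOPs per FMA.
// The figures below are the commonly published RTX 3080 (10 GB) specs.
int main() {
    const double fp32Lanes   = 8704.0;  // "CUDA cores" under Ampere's doubled-FP32 counting
    const double boostGHz    = 1.71;    // advertised boost clock
    const double flopsPerClk = 2.0;     // one FMA = a multiply plus an add

    const double tflops = fp32Lanes * boostGHz * flopsPerClk / 1000.0;
    std::printf("RTX 3080 theoretical FP32: %.1f TFLOPS\n", tflops);  // ~29.8
    return 0;
}
```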
 
I think Doom Eternal was used a lot with the Ampere press data because it seemed to use the doubled FP32 changes really well. If memory serves, it was the only game that got close to the expected performance from the marketing specs. It kind of gives credence to the idea that if you have the talent, time, and money, things can/could be better.

4090 has similar performance differentials here as well. No game will ever get remotely close to "4x the compute" because it's practically impossible to do so. It would be interesting to see if it can even sustain its rated 80 TFLOPS in a simple shader test like we used to have in GPU reviews.
 

4090 has similar performance differentials here as well. No game will ever get remotely close to "4x the compute" because it's practically impossible to do so. It would be interesting to see if it can even sustain its rated 80 TFLOPS in a simple shader test like we used to have in GPU reviews.

Raw flops comparisons are terribly misleading since no GPU uses anywhere near its peak flops. Here's a trace from the Wukong benchmark. SM throughput averages less than 20%. This is Ampere so Ada probably does a bit better.

The highlighted raytracing call is the most expensive dispatch in the frame and is completely memory bound. The poor RT core doesn't even hit 10% utilization. There were a few other RT calls in this frame that hit ~80% RT core utilization and in those cases L2 hit rates were around 90% (woohoo!). Peak flops comparisons are mostly useless when everything is stalled waiting on VRAM anyway.


[Image: wukong-trace.png]
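A handy way to see why peak flops stop mattering for a dispatch like that highlighted one is the classic roofline model: attainable throughput is min(peak compute, DRAM bandwidth x arithmetic intensity), so at low FLOPs-per-byte the memory system sets the ceiling no matter how many TFLOPS the SMs could theoretically do. Quick sketch with the commonly quoted 4090 figures and made-up intensities (not values measured from the Wukong trace):

```cpp
#include <algorithm>
#include <cstdio>

// Roofline model: attainable GFLOP/s = min(peak compute, DRAM bandwidth * arithmetic intensity).
// Peak and bandwidth are the commonly quoted RTX 4090 figures; the arithmetic
// intensities below are illustrative examples only.
double AttainableGflops(double peakGflops, double bandwidthGBs, double flopsPerByte) {
    return std::min(peakGflops, bandwidthGBs * flopsPerByte);
}

int main() {
    const double peakGflops   = 82600.0;  // ~82.6 TFLOPS FP32
    const double bandwidthGBs = 1008.0;   // 384-bit GDDR6X at 21 Gbps

    const double intensities[] = {1.0, 8.0, 82.0, 200.0};  // FLOPs per byte of DRAM traffic
    for (double ai : intensities) {
        const double gflops = AttainableGflops(peakGflops, bandwidthGBs, ai);
        std::printf("AI %5.1f flop/byte -> %8.0f GFLOP/s (%5.1f%% of peak)\n",
                    ai, gflops, 100.0 * gflops / peakGflops);
    }
    return 0;
}
```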
 
Raw flops comparisons are terribly misleading since no GPU uses anywhere near its peak flops. Here's a trace from the Wukong benchmark. SM throughput averages less than 20%. This is Ampere so Ada probably does a bit better.

The highlighted raytracing call is the most expensive dispatch in the frame and is completely memory bound. The poor RT core doesn't even hit 10% utilization. There were a few other RT calls in this frame that hit ~80% RT core utilization and in those cases L2 hit rates were around 90% (woohoo!). Peak flops comparisons are mostly useless when everything is stalled waiting on VRAM anyway.
Where in the benchmark was that trace taken? Here it's much better on a 4080 Super, at the beginning of the benchmark.

[Image: Screenshot 2024-08-15 114729.png]
 
The highlighted raytracing call is the most expensive dispatch in the frame and is completely memory bound. The poor RT core doesn't even hit 10% utilization. There were a few other RT calls in this frame that hit ~80% RT core utilization and in those cases L2 hit rates were around 90% (woohoo!). Peak flops comparisons are mostly useless when everything is stalled waiting on VRAM anyway.
Ouch. That rumor of the RTX 5090 having a 512-bit memory bus and GDDR7 now sounds a lot more credible.
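The arithmetic behind that: bandwidth is bus width / 8 x per-pin data rate, so a 512-bit bus with, say, 28 Gbps GDDR7 (both still rumors at this point) would land around 1.8 TB/s versus the 4090's ~1 TB/s, which is exactly the lever a memory-bound frame like the one traced above needs. Trivial check:

```cpp
#include <cstdio>

// GB/s = (bus width in bits / 8) * per-pin data rate in Gbps.
// The 512-bit / 28 Gbps GDDR7 combination is the rumor under discussion, not a confirmed spec.
double BandwidthGBs(int busWidthBits, double gbpsPerPin) {
    return (busWidthBits / 8.0) * gbpsPerPin;
}

int main() {
    std::printf("RTX 4090 (384-bit, 21 Gbps GDDR6X):    %.0f GB/s\n", BandwidthGBs(384, 21.0));
    std::printf("Rumored 5090 (512-bit, 28 Gbps GDDR7): %.0f GB/s\n", BandwidthGBs(512, 28.0));
    return 0;
}
```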
 