Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

It’s 1 triangle per clock per primitive engine. There is 1 primitive engine per shader array, i.e. 2 per shader engine, so in total 4 triangles per clock and 64 pixels per clock (ROPs).

So you're doing at most 4 triangles per clock, but that assumes optimal triangle/pixel emission. The real number goes down as triangles shrink below those thresholds, and it gets exponentially worse once you get into sub-pixel sizes. This blog illustrates it well:

Using a simple triangle fillrate test: the tiles are the primitives, so each step divides them by 2, starting from 1 tile covering the full 1080p frame and shrinking down to 1x1 pixel tiles. You can see that by a tile size of (1,1), performance has completely fallen off a cliff.
[Attached chart: 0662-1.png — fillrate vs. shrinking tile size]
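As a rough back-of-envelope model of that test (reusing the 4 triangles/clock and 64 pixels/clock figures from above, not the blog's actual methodology):

```python
# Back-of-envelope model of the fillrate test above: halve the triangle (tile)
# size each step and see whether output is limited by the 64 pixels/clock ROPs
# or by the 4 triangles/clock primitive engines. Illustrative numbers only; it
# doesn't model the extra sub-pixel overheads, which make things even worse.
TRIS_PER_CLOCK = 4
PIXELS_PER_CLOCK = 64

def pixels_per_clock(pixels_per_triangle):
    return min(PIXELS_PER_CLOCK, TRIS_PER_CLOCK * pixels_per_triangle)

size = 1920 * 1080          # start: one triangle covering the whole 1080p frame
while size >= 1:
    print(f"{size:>8} px/tri -> {pixels_per_clock(size):>4} px/clock filled")
    size //= 2
```

The break-even point is 64 / 4 = 16 pixels per triangle; below that you are triangle-bound, down to 4 pixels/clock at 1 pixel per triangle, and the sub-pixel regime in the next chart makes it worse still.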


Sub-pixel performance turns into an exponential fall-off the smaller the triangle gets. With respect to what we saw, UE5 is at roughly 1 triangle per pixel.
[Attached chart: 0662-9.png — sub-pixel triangle performance]


So we used optimal numbers, which was wrong; it's simply not possible to get one-triangle-per-pixel performance out of the fixed-function pipeline, at least not in the traditional sense. They must have done something entirely different, and I suspect UE5 is a very heavily compute-based pipeline.
In the DF article, Epic stated that their solution relies on "hyper-optimized compute shaders". Obviously fixed-function hardware is still supported by the engine, since not everything will rely on or need Nanite (non-static geometry, playable characters etc.), but yeah, for this to work and not run into the issues you outlined, it needs to be heavily compute based. I think we are coming to the end of fixed-function geometry hardware in the GPU, as evidenced by this and by the advent of mesh shaders, which seek to bypass it entirely in favor of more compute-driven and therefore more flexible systems.
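Epic haven't published how their rasterizer actually works, but to illustrate the kind of compute-driven approach being discussed here (a made-up CPU sketch, not Nanite), a software rasterizer for pixel-sized triangles can skip the fixed-function quad/coverage machinery and just do a bounding-box walk with a depth-tested write per covered pixel:

```python
import numpy as np

# Minimal CPU sketch of a software rasterizer for pixel-sized triangles,
# in the spirit of a compute-shader path (NOT Epic's implementation).
# For tiny triangles the bounding box covers only 1-4 pixels, so a simple
# per-triangle loop is already close to "one depth-tested write per pixel".
W, H = 16, 16
depth = np.full((H, W), np.inf)
color = np.zeros((H, W), dtype=np.uint32)

def edge(ax, ay, bx, by, px, py):
    # Signed area of (a->b, a->p); the sign tells which side of the edge p is on.
    return (px - ax) * (by - ay) - (py - ay) * (bx - ax)

def raster_tiny_triangle(v0, v1, v2, z, col):
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    for y in range(max(0, int(min(ys))), min(H, int(max(ys)) + 1)):
        for x in range(max(0, int(min(xs))), min(W, int(max(xs)) + 1)):
            px, py = x + 0.5, y + 0.5            # sample at the pixel centre
            e0 = edge(*v0, *v1, px, py)
            e1 = edge(*v1, *v2, px, py)
            e2 = edge(*v2, *v0, px, py)
            # Inside if all edge tests agree in sign (either winding).
            if (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0):
                if z < depth[y, x]:              # on a GPU this would be an atomic min
                    depth[y, x] = z
                    color[y, x] = col

raster_tiny_triangle((3.0, 3.0), (4.2, 3.1), (3.4, 4.3), z=0.5, col=0xFF00FF00)
print(np.count_nonzero(color), "pixel(s) written")  # -> 1
```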
 
The technology behind this reminds me a lot of Naughty Dog's work on Uncharted 2, I think? Didn't they get close to culling down to one triangle per pixel, maybe even using the SPEs? Have to start googling ...
 
The technology behind this reminds me a lot of Naughty Dog's work on Uncharted 2, I think? Didn't they get close to culling down to one triangle per pixel, maybe even using the SPEs? Have to start googling ...
?
https://www.guerrilla-games.com/read/practical-occlusion-culling-in-killzone-3
https://www.ea.com/frostbite/news/culling-the-battlefield-data-oriented-design-in-practice
http://www.selfshadow.com/talks/rwc_gdc2010_v1.pdf

IIRC, Uncharted 2 did use tile-based deferred lighting. I don't remember their triangle-culling approach there, though.

hm...

http://www.jonolick.com/uploads/7/9/2/1/7921194/gdc_07_rsx_slides_final.pdf (page 70+)
 
The technology behind this reminds me a lot of Naughty Dog's work on Uncharted 2, I think? Didn't they get close to culling down to one triangle per pixel, maybe even using the SPEs? Have to start googling ...
Many PS3 games culled non-contributing polygons, i.e. triangles that didn't hit any sample points.
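As a rough illustration of that kind of test (invented code, not from any shipped PS3 title or SDK):

```python
import math

# Illustrative small-triangle cull (not any specific PS3 title's code): a
# triangle whose screen-space bounding box doesn't straddle a pixel centre
# can never hit a sample point, so it can be rejected before rasterization.
# (Conservative: a bbox that does cover a centre only means "don't cull".)
def may_hit_a_sample(v0, v1, v2):
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    # Pixel centres sit at k + 0.5; find the first centre at or after the bbox min.
    first_cx = math.ceil(min(xs) - 0.5) + 0.5
    first_cy = math.ceil(min(ys) - 0.5) + 0.5
    return first_cx <= max(xs) and first_cy <= max(ys)

print(may_hit_a_sample((3.6, 3.6), (4.4, 3.7), (4.0, 4.2)))   # False -> cull it
print(may_hit_a_sample((3.0, 3.0), (6.0, 3.2), (4.0, 5.5)))   # True  -> keep it
```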
 
They revealed that the demo runs with "pretty good performance" on an RTX 2070 Super and an NVMe drive. They also expect RTX to play nice with Lumen. No idea about actual performance numbers, but if I had to guess, the fact that they're hiding them means they're higher than on PS5.

Would this demo run on my PC with an RTX 2070 Super? Yes, according to Libreri, and I should get "pretty good" performance. For comparison, the PlayStation 5 GPU the demo video was captured on achieves 10.28 teraflops, while the RTX 2070 Super hits just over 9 teraflops.

Sony is pioneering here with the PlayStation 5 architecture. It's got a God-tier storage system which is pretty far ahead of PCs. On a high-end PC with an SSD and especially with NVMe, you get awesome performance too.
Also shown off in the tech demo: New lighting tech called Lumen, a neat particle system that can mimic the behavior of bat swarms and roaches, the Chaos physics system, and ambisonics rendering (360 degree surround sound). All of this will play nice with Nvidia's RTX ray tracing, I'm told, though there are no specific details on that or what kind of performance Nvidia's DLSS can potentially bring to UE5 games.
https://www.pcgamer.com/unreal-engine-5-tech-demo/
 
They revealed that the demo runs with "pretty good performance" on an RTX 2070 Super and an NVMe drive. They also expect RTX to play nice with Lumen. No idea about actual performance numbers, but if I had to guess, the fact that they're hiding them means they're higher than on PS5.

https://www.pcgamer.com/unreal-engine-5-tech-demo/

Totally expected. Now I want to see more of this UE5 on next-gen NV hardware.
 

It did. SPU culling was one of the main features in their toolkit, also provided to 3rd party devs. The other popular feature was SPU post-processing.

I hope this demo also implies more flexibility in incorporating user-generated objects/scenes in games like LittleBigPlanet. :)

I imagine their Audiokinetic SDK and Tempest run-time can fit under this UE5 umbrella too.
 
They revealed that the demo runs with "pretty good performance" on an RTX 2070 Super and an NVMe drive. They also expect RTX to play nice with Lumen. No idea about actual performance numbers, but if I had to guess, the fact that they're hiding them means they're higher than on PS5.

https://www.pcgamer.com/unreal-engine-5-tech-demo/

I'd guess that they'd run as rough approximations of one another. Nothing in that article suggests that UE5 runs the same or better on a 2070 Super, though. Presumably from a GPU standpoint they'd be similar, give or take a few pixels/frames...

...sounds like you need more than a GPU to run this though...
 
I'd guess that they'd run as rough approximations of one another. Nothing in that article suggests that UE5 runs the same or better on a 2070 Super, though. Presumably from a GPU standpoint they'd be similar, give or take a few pixels/frames...

...sounds like you need more than a GPU to run this though...

They'll need to match the PS5's SSD specs with "equivalent" third-party SSD units for similar performance. Not quite there yet, but close.

https://wccftech.com/playstation-5-...pansion-but-it-will-require-sonys-validation/

"Here's the catch, though: that commercial drive has to be at least as fast as ours. Games that rely on the speed of our SSD need to work flawlessly with any M.2 drive.

No PCIe 3.0 drive can hit the required speed of 5.5GB/s, as they are capped at 3.5GB/s. However, the first PCIe 4.0 M.2 drives have now hit the market, and we're seeing 4 to 5GB/s speeds. By year's end, I expect there will be drives hitting 7GB/s.

Having said that, we are comparing apples and oranges, because that commercial M.2 drive will have its own architecture, its own flash controller and so on. For example, the NVMe specification lays out a priority scheme for requests that the M.2 drives can use, and that scheme is pretty nice, but it only has two true priority levels. Our drive supports six. We can hook up a drive with only two priority levels, definitely, but our custom I/O unit has to arbitrate the extra priorities rather than the M.2 drive's flash controller, and so the M.2 drive needs a little extra speed to take care of issues arising from the different approach. That commercial drive also needs to physically fit inside the bay we created in the PlayStation 5 for M.2 drives."
 
Honestly, I think the culling performance in this UE5 solution is going to be different from anything we've seen before, because it sounds like you'll have very fine-grained control over loading the correct polygons from disk. The percentage of loaded polygons that end up culled should be much lower than what you'd typically see: you're not issuing a draw call for a full model and then culling the backfaces and all the parts of the model that don't fit within the scene, are occluded, and so on.

I may be wrong, but it sounds to me like you import the high-quality model from ZBrush and it gets converted into a format that fits their virtual geometry system. That format allows some type of visibility test cheap enough that you can selectively load (probably cache-sized) chunks of geometry off disk and into a geometry cache in RAM.

Detail in the distance is where I get a bit confused, because that's where you're less likely to be cache friendly: two adjacent pixels in the distance could come from two different models, whereas two adjacent pixels up close are spatially close and likely from the same chunk. So there must be something about the data structure that accommodates that case. Really curious to see how far off I am.

But there's no way the game keeps a billion polygons in the scene and culls them down to 3.6 million (1 per pixel at 1440p) in real time. Like virtual texturing, smart loading of only the needed pieces of data is where you gain most of the efficiency. It's likely that the texture data and the geometry are stored in the same structure, so texels and polygons are cached in RAM together with a 1:1 relationship.
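Purely speculative sketch of the chunk-selection idea described above; every structure, field and threshold here is invented for illustration and is not Epic's data format or algorithm:

```python
from dataclasses import dataclass

# Hypothetical sketch of picking cache-sized geometry chunks to stream,
# using a cheap bounding-sphere distance + projected-error test.
@dataclass
class Chunk:
    offset: int          # where the chunk lives on disk
    size: int            # roughly cache-sized blob of triangles + texels
    center: tuple        # bounding-sphere centre in world space
    radius: float        # bounding-sphere radius (would feed frustum/occlusion tests, unused below)
    lod_error: float     # geometric error (world units) this chunk's LOD introduces

def chunks_to_stream(chunks, cam_pos, screen_height_px, fov_tan, max_error_px=1.0):
    """Request a chunk when its geometric error would project to more than ~1 pixel."""
    wanted = []
    for c in chunks:
        dist = max(1e-3, sum((a - b) ** 2 for a, b in zip(c.center, cam_pos)) ** 0.5)
        # Rough projected size of the chunk's error at this distance, in pixels.
        projected_error = (c.lod_error / (dist * fov_tan)) * (screen_height_px * 0.5)
        if projected_error > max_error_px:
            wanted.append((c.offset, c.size))
    return wanted

chunks = [Chunk(0, 128 << 10, (0.0, 0.0, 10.0), 1.0, 0.02),    # close to the camera
          Chunk(1, 128 << 10, (0.0, 0.0, 500.0), 1.0, 0.02)]   # far away
print(chunks_to_stream(chunks, (0.0, 0.0, 0.0), 1440, 1.0))    # only the near chunk
```

A real system would also need frustum/occlusion tests against the bounding spheres and hierarchical LOD handling; this only shows the chunk-granularity streaming decision.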
 
The PS3 had very little memory. They already needed to partition the vertices efficiently in some way before culling on the SPUs.

Naturally, years later, today's method is necessarily more sophisticated.
 
I wonder, if the majority of triangles are rasterized with compute shaders, whether the regular fixed-function rasterizers are even used at much capacity, or whether they sit more idle than usual.
 
I wonder, if the majority of triangles are rasterized with compute shaders, whether the regular fixed-function rasterizers are even used at much capacity, or whether they sit more idle than usual.
Nearly all of them are, with some exception cases where FF rasterization is still used. Until FF hardware can solve the 1-triangle-per-pixel problem and also deal with sub-pixel triangle sizes, the FF pipeline is going to be dead after this generation. They also need to rework how that many triangles get culled.
 
I've seen the Digital Foundry video; it seems the new UE5 tech has a lot of latency and artifacts, and it's inefficient with foliage/hair (it can waste 3/4 of the pixel calculations when pixels are organized in groups of four or more).
 
I've seen the Digital Foundry video; it seems the new UE5 tech has a lot of latency and artifacts, and it's inefficient with foliage/hair (it can waste 3/4 of the pixel calculations when pixels are organized in groups of four or more).
@Dictator being a journalist. Looking for the flaws for all of us.
 
Another interesting point is that both Lumen and the geometry performance scale roughly linearly with the number of pixels on screen, so using a low internal resolution with DLSS (1080p reconstructed to a final 4K) could be a great combination. If this is true, UE5 should shine on RTX hardware.
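A quick sanity check on that scaling argument, assuming the cost really is linear in shaded pixels (that's the claim above, not a measured fact):

```python
# If Nanite/Lumen cost really does scale linearly with shaded pixels, rendering
# internally at 1080p and reconstructing to 4K cuts the pixel-proportional work
# by the ratio of the two pixel counts. Illustrative arithmetic only.
native_4k = 3840 * 2160        # 8,294,400 pixels
internal_1080p = 1920 * 1080   # 2,073,600 pixels
print(native_4k / internal_1080p)   # 4.0 -> roughly 4x less pixel-bound work
```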
 