Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

I'm not suggesting this is how it works or anything, but could it be just this simple?
Because the mesh data is so detailed, each poly could just be represented by a colour, so there'd be no need for a texture.
Effectively the mesh is flattened and stored like a texture somehow.
The texture is resized down to get the desired LOD and un-flattened back into the 3D mesh, with each poly drawn using its colour: as a polygon if it covers more than 2x2 pixels, or as a single pixel if less.
This would only apply to certain mesh types though, i.e. not the character.
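Just to make my hand-waving concrete, here's a minimal sketch of the sort of thing I mean (made-up names and layout, assuming the mesh has been flattened into a square "geometry image" where each texel stores a position plus a colour; each halving of the image would be one LOD step):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical "geometry image" texel: a flattened mesh sample storing both
// a position and a colour, so no separate texture is needed.
struct GeoTexel {
    float px, py, pz;   // vertex position
    uint8_t r, g, b;    // per-poly colour
};

// Halve the resolution of a square geometry image by averaging 2x2 blocks.
// Fewer texels -> fewer polys when the grid is un-flattened back into a mesh.
std::vector<GeoTexel> downsampleLod(const std::vector<GeoTexel>& src, int srcDim)
{
    const int dstDim = srcDim / 2;
    std::vector<GeoTexel> dst(dstDim * dstDim);
    for (int y = 0; y < dstDim; ++y) {
        for (int x = 0; x < dstDim; ++x) {
            const GeoTexel& a = src[(2 * y)     * srcDim + 2 * x];
            const GeoTexel& b = src[(2 * y)     * srcDim + 2 * x + 1];
            const GeoTexel& c = src[(2 * y + 1) * srcDim + 2 * x];
            const GeoTexel& d = src[(2 * y + 1) * srcDim + 2 * x + 1];
            GeoTexel& o = dst[y * dstDim + x];
            o.px = (a.px + b.px + c.px + d.px) * 0.25f;
            o.py = (a.py + b.py + c.py + d.py) * 0.25f;
            o.pz = (a.pz + b.pz + c.pz + d.pz) * 0.25f;
            o.r = uint8_t((a.r + b.r + c.r + d.r) / 4);
            o.g = uint8_t((a.g + b.g + c.g + d.g) / 4);
            o.b = uint8_t((a.b + b.b + c.b + d.b) / 4);
        }
    }
    return dst;
}
```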
 
No one here is talking about value. We're talking about the technology and how it scales.

I didn't watch the whole 132 minutes. Will probably do so over the coming weeks.

They are very happy with what they have achieved, and with the reaction from everywhere. They can't wait to get their existing customers on board.
The engineers are very grounded in cross-platform development and serving their existing customers.
It certainly feels like they will have different strategies and tactics to keep their customers happy (except mobile, where they offered no target commitment at the beginning of the live cast).

They are very proud of Nanite's ability to keep the LOD switch seamless (they said it's difficult multiple times).

The UE5 demo actually sets the stage nicely for more upcoming reveals. As they said, this demo was supposed to be done 2 months ago.

If people think this is the best there is, it probably isn't, since they all still have work to do.
 
I believe you are misunderstanding how this works. The output of primitive shaders feeds the rasterizer hardware. The UE5 engine software rasterizes and bypasses the raster hardware. The majority of the triangles are software rasterized. For the minority of triangles where the hardware rasterizer would be faster (probably cases where polygons are bigger than one pixel), the engine falls back to primitive shaders instead of the old vertex shader pipeline. For the majority of the scene the primitive/raster units are not used.
Yes, I get all that.

What isn't clear is what execution units constitute software and hardware rasterising, what AMD defines as a Primitive Shader, and what Epic are doing with regard to PS5 rasterising with Nanite. Especially this comment from Epic:

"On PlayStation 5 we use primitive shaders for that path which is considerably faster than using the old pipeline we had before with vertex shaders."

Prior to this statement, they were talking about software rasterising with CUs; then they switch to saying they are using Primitive Shaders for PS5 and compare how slow the old pipeline is with vertex shaders (old rasterisation with FF hardware). They seem to be implying Primitive Shaders are part of software rasterising when comparing to the old pipeline. If something is programmable, then I consider it software, so a Primitive Shader is software rasterising. But it is not clear if a Primitive Shader is using its own dedicated ALUs or general purpose CUs for execution. Looking at old GPU block diagrams, Primitive (geometry) blocks look to have their own ALUs separate from the CUs.

I'd like to know the capabilities of this Primitive Shader in RDNA2 and if there are any customisations made here in PS5. But this information is hard to come by.
 
@j^aws I think it's pretty clear when you include the context of that statement:

"The vast majority of triangles are software rasterised using hyper-optimised compute shaders specifically designed for the advantages we can exploit," explains Brian Karis. "As a result, we've been able to leave hardware rasterisers in the dust at this specific task. Software rasterisation is a core component of Nanite that allows it to achieve what it does. We can't beat hardware rasterisers in all cases though so we'll use hardware when we've determined it's the faster path. On PlayStation 5 we use primitive shaders for THAT PATH which is considerably faster than using the old pipeline we had before with vertex shaders."

* vast majority of the triangles are software rasterised using compute shaders
* compute shaders leave hardware rasterisers in the dust
* can't beat hardware rasterisers in all cases
* use the hardware rasteriser when it's the faster path
* primitive shaders are used for that path
* primitive shaders are faster than the old vertex-shader-based geometry pipeline.

Vast majority of the scene is software rasterised and the rest is handled with primitive shaders vs the old vertex shader pipeline.
 
UE5 should not have any edge aliasing, so 1440p UE5 should look much cleaner than a typical 1440p game. Also makes me wonder if DLAA would even work with it, or could be massively simplified. You've pretty much removed the AA part, and now you're just upscaling an image.
I’ve seen this mentioned before and I can’t get my head around it. Can you or anyone else explain why this would be the case?
 
So, XSX would maybe run this demo at higher res, but Nanite textures and Lumen lighting tech would be scaled down somehow?
No, it won't, they clearly state the demo doesn't rely on the raw 5.5GB/s speed of the PS5 SSD, and can run on a regular PC SSD.
* The guy mentioned they can run the demo in the editor at 40fps (not 40+), but did not specify the resolution.
Even with that, a laptop RTX 2080 is a low bar to clear; it's clocked lower and is thermally challenged, meaning its effective clocks are often significantly lower than a desktop 2080's, down to the performance level of a desktop 2070. It gets worse if this is a 2080 Max-Q variant, as that one is clocked way lower and can have its performance drop to desktop 2060 level.
 
I’ve seen this mentioned before and I can’t get my head around it. Can you or anyone else explain why this would be the case?

Aliasing is caused by rasterization when a triangle edge partially covers a pixel. Basically, if the triangle covers the center of the pixel, that pixel gets its colour from a texel in that polygon. Two polygons can overlap a pixel, but only one polygon "wins" and gets coverage for that pixel. MSAA adds multiple sample points per pixel and the polygon that covers the most points wins. Most games do not use MSAA anymore and instead do different tricks in post processing or temporally. In the case of UE5, you have one polygon that's pixel sized, so the coverage won't overlap multiple pixels and it'll have its own colour, so you won't get that sawtooth edge. At least that's how I understand it.

[Image: anti_aliasing_rasterization.png]

[Image: anti_aliasing_rasterization_filled.png]
Edit: Actually I'm dumb. With straight lines you'll still have the same issue if they're testing pixel centers. A straight line is a straight line. Because you have way more polygons, you're probably going to have a lot fewer straight lines in many scenes, but art style is going to have a lot to do with that.
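
To make that binary "pixel centre in or out" rule concrete, here's a minimal sketch of the test I'm describing (my own illustration, assuming a counter-clockwise-wound triangle; a real rasteriser also deals with fill rules, clipping, depth, etc.):

```cpp
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };

// Signed area of (a, b, p); >= 0 means p is on the inside of edge a->b
// for a counter-clockwise triangle.
static float edgeFunction(const Vec2& a, const Vec2& b, const Vec2& p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// A pixel belongs to the triangle only if its CENTRE is inside; there is no
// partial coverage, which is exactly where the sawtooth edges come from.
void rasterizeBinaryCoverage(const Vec2& v0, const Vec2& v1, const Vec2& v2,
                             uint32_t colour, int width, int height,
                             std::vector<uint32_t>& framebuffer)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const Vec2 centre{ x + 0.5f, y + 0.5f };   // one sample: the pixel centre
            const bool inside = edgeFunction(v0, v1, centre) >= 0.0f &&
                                edgeFunction(v1, v2, centre) >= 0.0f &&
                                edgeFunction(v2, v0, centre) >= 0.0f;
            if (inside)
                framebuffer[y * width + x] = colour;   // either fully this colour or untouched
        }
    }
}
```

MSAA essentially runs the same inside/outside test at several sample positions per pixel instead of just the centre.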
 
@j^aws I think it's pretty clear when you include the context of that statement:

"The vast majority of triangles are software rasterised using hyper-optimised compute shaders specifically designed for the advantages we can exploit," explains Brian Karis. "As a result, we've been able to leave hardware rasterisers in the dust at this specific task. Software rasterisation is a core component of Nanite that allows it to achieve what it does. We can't beat hardware rasterisers in all cases though so we'll use hardware when we've determined it's the faster path. On PlayStation 5 we use primitive shaders for THAT PATH which is considerably faster than using the old pipeline we had before with vertex shaders."

* vast majority of the triangles are software rasterised using compute shaders
* compute shaders leave hardware rasterisers in the dust
* can't beat hardware rasterisers in all cases
* use the hardware rasteriser when it's the faster path
* primitive shaders are used for that path
* primitive shaders are faster than the old vertex-shader-based geometry pipeline.

Vast majority of the scene is software rasterised and the rest is handled with primitive shaders vs the old vertex shader pipeline.
He is saying Primitive shaders are hardware and part of the fixed function pipeline, right? And this hardware path is faster than the other, older hardware path (the vertex shader pipeline). This is what is throwing me, as shaders are programmable, and that's what I consider software and not FF hardware. And it's not clear if Primitive shaders leverage the CUs or their own ALUs for programmability.
 
He is saying Primitive shaders are hardware and part of the fixed function pipeline, right? And this hardware path is faster than the other, older hardware path (the vertex shader pipeline). This is what is throwing me, as shaders are programmable, and that's what I consider software and not FF hardware. And it's not clear if Primitive shaders leverage the CUs or their own ALUs for programmability.

Primitive shaders are programmable and use the CUs, I'd imagine, but they output vertex data which is fed into the fixed raster hardware. The raster hardware assembles polygons from vertices, culls unneeded polygons and then outputs pixels from testing polygon coverage. So even though primitive shaders are programmable and use the general ALU of the GPU, they still require the raster hardware to assemble polygons and test coverage. In the case of UE5 they're using compute shaders in place of primitive shaders and the raster hardware. They're assembling and testing the polygons in a compute shader, instead of with the raster hardware.

For the case where the primitive shader path would be faster, they use it, which is most likely when they end up with a polygon that covers more than one pixel.

The old vertex shader path is dropped entirely because primitive shaders just do the same thing better.
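
To make that concrete, here's a rough CPU-side sketch of what "replacing the rasterizer in a compute shader" could look like for pixel-sized triangles. This is my own illustration with made-up names standing in for a GPU compute shader, not Epic's code; I'm assuming a 64-bit visibility-buffer-style atomic depth test and reverse-Z (larger depth value = closer):

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };                       // screen-space position, z = depth
struct TinyTriangle { Vec3 v0, v1, v2; uint32_t id; };

// Pack depth in the high bits and triangle id in the low bits so a single
// 64-bit atomic max keeps the closest triangle per pixel (reverse-Z assumed).
static uint64_t packDepthId(float depth, uint32_t id)
{
    const uint32_t d = uint32_t(std::clamp(depth, 0.0f, 1.0f) * 16777215.0f); // 24-bit depth
    return (uint64_t(d) << 32) | id;
}

void rasterizePixelSized(const std::vector<TinyTriangle>& tris,
                         int width, int height,
                         std::vector<std::atomic<uint64_t>>& visibilityBuffer)
{
    for (const TinyTriangle& t : tris) {              // conceptually: one GPU thread per triangle
        // The centroid is good enough when the whole triangle fits inside a pixel.
        const float cx = (t.v0.x + t.v1.x + t.v2.x) / 3.0f;
        const float cy = (t.v0.y + t.v1.y + t.v2.y) / 3.0f;
        const float cz = (t.v0.z + t.v1.z + t.v2.z) / 3.0f;
        const int px = int(cx), py = int(cy);
        if (px < 0 || px >= width || py < 0 || py >= height)
            continue;
        const uint64_t candidate = packDepthId(cz, t.id);
        std::atomic<uint64_t>& dest = visibilityBuffer[py * width + px];
        // Atomic max: "closest triangle wins" with no fixed-function hardware involved.
        uint64_t prev = dest.load(std::memory_order_relaxed);
        while (candidate > prev &&
               !dest.compare_exchange_weak(prev, candidate, std::memory_order_relaxed)) {}
    }
}
```

The point being there's no triangle setup or coverage loop at all: one thread per tiny triangle competes for a single pixel with an atomic, which is where the win over the fixed-function path comes from at this triangle density.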
 
No, it won't, they clearly state the demo doesn't rely on the raw 5.5GB/s speed of the PS5 SSD, and can run on a regular PC SSD.

Even with that, a laptop RTX 2080 is a low bar to clear; it's clocked lower and is thermally challenged, meaning its effective clocks are often significantly lower than a desktop 2080's, down to the performance level of a desktop 2070. It gets worse if this is a 2080 Max-Q variant, as that one is clocked way lower and can have its performance drop to desktop 2060 level.

When the engineer talked about 40fps, he didn’t mention the model of the laptop.

The event was very casual, with the participants joking and eating throughout the show. The whole thing lasted more than 2 hours, mostly because they often went off-topic. It's hard to summarize because I find myself losing focus and doing other stuff when they drift off. :)
 
Primitive shaders are programmable and use the CUs, I'd imagine, but they output vertex data which is fed into the fixed raster hardware.
If Primitive shaders use CUs, then why not just use compute shaders that use CUs and feed their output into the hardware FF rasterisers? What's the difference here?
 
He mentioned it in a separate forum. Besides, the highest RTX GPU SKU in the laptop market is the RTX 2080 (technically it's the 2080 Super, but that launched very recently).

Do you have a link to his post?

EDIT:

I found the post. He wasn't on that forum at all. Another poster said he asked the engineer and paraphrased his response. It's a one-liner. I thought I could ask him directly, but no such luck.
 
If Primitive shaders use CUs, then why not just use compute shaders that use CUs and feed their output into the hardware FF rasterisers? What's the difference here?

Because the FF hardware rasterisers are too slow when polygons are too small. That's why they're software rasterizing instead. The scene they demoed had mostly 1 polygon per pixel, which is a horrible performance case for the fixed function hardware. The case where they switch back to the primitive shaders and the fixed function rasterizers is probably when a polygon covers multiple pixels.

Edit:
The rasterizer in each shader engine performs the mapping from the geometry-centric stages of the graphics pipeline to the pixel-centric stages. Each rasterizer can process one triangle, test for coverage, and emit up to sixteen pixels per clock. As with other fixed-function hardware, the screen is subdivided and each portion is distributed to one rasterizer.

If you have 1 polygon per pixel the hardware rasterizer is running at 1/16th of its potential. Essentially the four rasterizers in the gpu would be outputting four pixels per clock in that case. Their software rasterizer can likely beat four pixels per clock by a good margin, otherwise they'd never do this.
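
A quick back-of-the-envelope sketch of that point, using the figures above (4 rasterizers, 1 triangle per clock each, up to 16 pixels per clock each; purely illustrative):

```cpp
#include <algorithm>
#include <cstdio>
#include <initializer_list>

int main()
{
    const int rasterizers = 4;                 // one per shader array, as quoted above
    const int maxPixelsPerTriPerClock = 16;
    for (int triAreaPixels : {1, 2, 4, 8, 16, 64}) {
        // Each rasterizer handles 1 triangle/clock, so tiny triangles waste pixel throughput.
        const int pixelsPerClock = rasterizers * std::min(triAreaPixels, maxPixelsPerTriPerClock);
        std::printf("avg triangle area %2d px -> %3d pixels/clock (%.0f%% of peak)\n",
                    triAreaPixels, pixelsPerClock,
                    100.0 * pixelsPerClock / (rasterizers * maxPixelsPerTriPerClock));
    }
    return 0;
}
```

At 1-pixel triangles that's 4 pixels/clock against a 64 pixels/clock peak, i.e. the 1/16th figure.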
 
With straight lines you'll still have the same issue if they're testing pixel centers.
Yeah, spatial aliasing is a problem of the shape of the pixel, nothing else. Because pixels are squares, any "oblique" line they represent will be aliased, because it forms gaps in between pixels (represented here by the empty pyramids):

[Image: 26112010-22.png]


Even if the pixels were circles, the gaps would still be there; even triangular pixels would exhibit the same issue.

AA attempts to "hide" these gaps through intermediate colors that blend with the surrounding colors in a gradual way.

[Image: 26112010-26.png]
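
For what it's worth, here's a tiny sketch of what those intermediate colours amount to in practice, i.e. supersampling each pixel against the edge and shading by the covered fraction (my own illustration, not any particular AA technique; the edge is an arbitrary oblique line):

```cpp
#include <cstdint>

// An arbitrary oblique edge (a half-plane), standing in for whatever shape
// boundary the renderer is drawing.
static bool belowObliqueEdge(float x, float y)
{
    return y > 0.37f * x + 10.0f;
}

// Estimate how much of pixel (px, py) the edge covers by testing several
// sample positions, then shade proportionally.
uint8_t coverageShade(int px, int py, int samplesPerAxis = 4)
{
    int covered = 0;
    for (int sy = 0; sy < samplesPerAxis; ++sy)
        for (int sx = 0; sx < samplesPerAxis; ++sx) {
            // Evenly spread sample positions inside the pixel square.
            const float x = px + (sx + 0.5f) / samplesPerAxis;
            const float y = py + (sy + 0.5f) / samplesPerAxis;
            if (belowObliqueEdge(x, y))
                ++covered;
        }
    // 0 = background, 255 = fully covered; anything in between is the
    // intermediate colour that visually fills the "gaps" along the edge.
    return uint8_t(255 * covered / (samplesPerAxis * samplesPerAxis));
}
```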
 
Because the FF hardware rasterisers are too slow when polygons are too small. That's why they're software rasterizing instead. The scene they demoed had mostly 1 polygon per pixel, which is a horrible performance case for the fixed function hardware. The case where they switch back to the primitive shaders and the fixed function rasterizers is probably when a polygon covers multiple pixels.

Edit:

If you have 1 polygon per pixel the hardware rasterizer is running at 1/16th of its potential. Essentially the four rasterizers in the gpu would be outputting four pixels per clock in that case. Their software rasterizer can likely beat four pixels per clock by a good margin, otherwise they'd never do this.
I think we are misunderstanding each other and I appreciate the explanation. I get that there are two paths based on performance. My interest is in ascertaining which ALUs are responsible for which functions, not the functions themselves, which I understand.

If a FF rasteriser accepts vertex inputs, what difference does it make whether these inputs come from a Primitive shader or a Compute shader, if they're both executed on general purpose CUs? The CU's instruction set is the same.
 
No, it won't, they clearly state the demo doesn't rely on the raw 5.5GB/s speed of the PS5 SSD, and can run on a regular PC SSD.

Even with that, a laptop RTX 2080 is a low bar to clear; it's clocked lower and is thermally challenged, meaning its effective clocks are often significantly lower than a desktop 2080's, down to the performance level of a desktop 2070. It gets worse if this is a 2080 Max-Q variant, as that one is clocked way lower and can have its performance drop to desktop 2060 level.

Whole demo, yes, but, like Tutomos at ERA said:

Someone asked whether the 8GB/s bandwidth of the PS5's SSD is true; the developer said that's a question to ask Sony. He then said a high-performance SSD will definitely help streaming, because Nanite and Lumen require good I/O, but the demo they showed doesn't require specs as high as the PS5's SSD.

But neither XSX nor that laptop PC has I/O like the PS5's SSD. We don't know at what graphical fidelity XSX would run the demo.

Those laptops cost upwards of £€$1600. I don't think it's a fair comparison no matter your perspective (and would they even have the same IO or decompression tech? Certainly not).

I don't even think many PC gamers would suggest buying one of those laptops; they're hardly bang-for-buck.

That's not so expensive. :D
 
I think we are misunderstanding each other and I appreciate the explanation. I get that there are two paths based on performance. My interest is in ascertaining which ALUs are responsible for which functions, not the functions themselves, which I understand.

If a FF rasteriser accepts vertex inputs, what difference does it make whether these inputs come from a Primitive shader or a Compute shader, if they're both executed on general purpose CUs? The CU's instruction set is the same.

Primitive shader most likely takes advantage of the ALU in the CUs. The fixed-function raster hardware in the shader array does not, at least not to my understanding. It's a special purpose unit that assembles triangles from vertices, culls triangles and outputs pixels from coverage tests. Each unit can rasterize 1 polygon per clock, outputting up to 16 pixels. Both PS5 and Xbox Series X have 4 shader arrays, so 4 raster units each. The Series X has more CUs per shader array, but the performance of the rasterizer is still 16 pixels from 1 polygon per clock.

I don't think the rasterizer cares where it gets the vertices from (vertex shader, primitive shader, mesh shader, compute shader). The difference in UE5 is they don't use the rasterizer hardware at all for the vast majority of the pixels on screen. They replace its function in the compute shader to get better performance for small triangles. The compute shader accepts data, probably vertices and outputs pixels.
 
DavidGraham said:
No, it won't, they clearly state the demo doesn't rely on the raw 5.5GB/s speed of the PS5 SSD, and can run on a regular PC SSD.

Even with that, a laptop RTX 2080 is a low bar to clear; it's clocked lower and is thermally challenged, meaning its effective clocks are often significantly lower than a desktop 2080's, down to the performance level of a desktop 2070. It gets worse if this is a 2080 Max-Q variant, as that one is clocked way lower and can have its performance drop to desktop 2060 level.

Whole demo, yes, but, like Tutomos at ERA said:

Someone asked whether the 8GB/s bandwidth of the PS5's SSD is true; the developer said that's a question to ask Sony. He then said a high-performance SSD will definitely help streaming, because Nanite and Lumen require good I/O, but the demo they showed doesn't require specs as high as the PS5's SSD.

But neither XSX nor that laptop PC has I/O like the PS5's SSD. We don't know at what graphical fidelity XSX would run the demo.

On the subject of laptop GPU ...

The engineer said Lumen uses up more resources than Nanite. Although their target for next-gen consoles is 60fps, he hasn't hit it yet.
He was going to guarantee that he will [hit 60fps], but stopped short of completing his statement, and then, as a matter of fact, said that he has already got laptops running the beginning of the demo at 40fps in the editor. [He didn't mention the resolution, effects, or general settings for his test.]

This suggests that PS5 and XSX are at least as powerful as laptop GPUs. If the laptop GPU were more powerful, he wouldn't have used it as proof that he's making progress (he would be shredding his credibility publicly).


On the SSD, the final scene simply doesn't need that many resources to run. The engineer highlighted UE4.25/5's new feature, overlapped I/O. He did comment that a high-speed SSD will be helpful for large-world streaming. Fast I/O is what they (will) rely on. From Nanite's perspective, they can also use compression and a better disk layout to help (like the current gen's approach).

Overall, it's pretty clear that they are still working on the project. It's too early to judge performance. The demo was scheduled to be shown at GDC but was delayed due to the pandemic.

The presentation is named "First UE5 live cast", suggesting we may/will have more to come.
 