Expected perceptual difference between Scorpio and PS4 Pro spin-off

AlNom · Sep 20, 2016

Did someone say, 'stacked'?

function · Sep 20, 2016

sebbbi said:
I remember some PS4 first party developer claiming their 1080p rendertargets were 250 MB in total. 250 MB * 4 = 1 GB.

Most games switch their ESRAM contents a few times per frame. Not all draw calls render to the same buffers. Let's say you switch ESRAM contents 4 times per frame. That is 128 MB total (at 900p). 900p * 5.76 = 4K. Thus render targets would be 738 MB in total at 4K. Not 1 GB, but close enough.
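The scaling in the quote can be checked with a few lines of arithmetic. This is only a sketch: it assumes 900p means 1600x900 and 4K means 3840x2160, and the helper name `pixels` is made up for illustration. The result comes out at roughly 737 MB, in line with the ~738 MB quoted above.

```python
def pixels(width, height):
    """Pixel count of a render resolution."""
    return width * height

# Assumption: 900p = 1600x900, 4K = 3840x2160.
scale_900p_to_4k = pixels(3840, 2160) / pixels(1600, 900)

ESRAM_MB = 32            # from the post
SWITCHES_PER_FRAME = 4   # from the post ("let's say")

total_900p_mb = ESRAM_MB * SWITCHES_PER_FRAME
total_4k_mb = total_900p_mb * scale_900p_to_4k

print(f"900p -> 4K pixel ratio: {scale_900p_to_4k:.2f}")
print(f"render targets at 900p: {total_900p_mb} MB")
print(f"render targets at 4K:   {total_4k_mb:.0f} MB")
```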

How much memory does your virtual texturing solution take up at 4K? I know it scales with resolution, and based on your comments about texture sampling it doesn't seem like checkerboard will reduce the requirements any.

Also, if you were to support VR - with its need to support fast user head movements - would this increase size further?

sebbbi · Sep 20, 2016

function said:
How much memory does your virtual texturing solution take up at 4K? I know it scales with resolution, and based on your comments about texture sampling it doesn't seem like checkerboard will reduce the requirements any.

Also, if you were to support VR - with its need to support fast user head movements - would this increase size further?

VT cache size scales linearly with screen pixel count, so 4K requires a 4x larger cache than 1080p. With PBR materials (4 bytes per pixel) the following cache sizes are enough: 720p = 64 MB, 1080p = 128 MB, 4K = 512 MB.

These cache sizes reserve 16 texels per screen pixel. I have measured that filtering + geometry/UV discontinuities roughly multiply the pixel count by 4x. The remaining 4x is to ensure that a single scene takes at most 25% of the cache. This allows camera rotation, shaking, etc. without loading new data. You also need some leeway to load data ahead of the camera.
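As a rough sanity check, the numbers above can be reproduced in a few lines. This is a sketch under two assumptions that are not stated in the post: the 4 bytes are counted per cached texel, and the raw requirement is rounded up to the next power of two. The function names are hypothetical.

```python
import math

TEXELS_PER_PIXEL = 16  # 4x filtering/discontinuities times 4x headroom (from the post)
BYTES_PER_TEXEL = 4    # assumption: the 4-byte PBR figure applies per cached texel

def vt_cache_mib(width, height):
    """Raw cache requirement in MiB, before any rounding."""
    return width * height * TEXELS_PER_PIXEL * BYTES_PER_TEXEL / 2**20

def round_up_pow2(x):
    """Round up to the next power of two (assumed allocation granularity)."""
    return 2 ** math.ceil(math.log2(x))

RESOLUTIONS = {"720p": (1280, 720), "1080p": (1920, 1080), "4K": (3840, 2160)}

for name, (w, h) in RESOLUTIONS.items():
    raw = vt_cache_mib(w, h)
    print(f"{name}: {raw:.2f} MiB raw, {round_up_pow2(raw)} MiB rounded up")
```

Under those assumptions the rounded values come out at 64, 128 and 512, matching the sizes listed.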

VR would need a wider FOV render for the page ID texture. This pass would also provide a wider FOV depth buffer, reducing screen edge issues with SSAO, specular occlusion and screen space reflections.

Checkerboarding or interlacing doesn't reduce the virtual texture cache size requirement. These techniques also don't help any other mesh LOD or texture mip based streaming techniques. Streaming pools need to be just as big.

iroboto said:
Swapping contents 4x per frame??? Holy moly. Fingers crossed that you can eventually get around to having a presentation on the workflow on this one. I'm quite curious how things run well in ESRAM and when things run really poorly.
Even then. LOL wow. I'm surprised anything runs on XBO.

What did you expect from a 32 MB on-chip memory? GPU caches get thrashed thousands of times per frame. Copying 32 MB from ESRAM to DDR three times per frame costs only a few percent of DDR bandwidth. An easy sacrifice to ensure that you get the most out of the fast memory.
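To put a rough number on that copy cost, here is a back-of-the-envelope sketch. The 68.3 GB/s value is an assumption (the commonly quoted peak DDR3 bandwidth for Xbox One, not a sustained figure), the frame rates are examples, and `copy_share` is a made-up helper name.

```python
ESRAM_MB = 32           # from the post
COPIES_PER_FRAME = 3    # from the post
DDR3_PEAK_GBPS = 68.3   # assumption: commonly quoted Xbox One DDR3 peak bandwidth

def copy_share(fps):
    """Fraction of peak DDR3 bandwidth spent writing ESRAM contents out to DDR."""
    gb_per_second = ESRAM_MB * COPIES_PER_FRAME * fps / 1000
    return gb_per_second / DDR3_PEAK_GBPS

for fps in (30, 60):
    print(f"{fps} fps: {copy_share(fps):.1%} of peak DDR3 bandwidth")
```

Under these assumptions the copies stay in the single-digit percent range at both frame rates.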

You can copy in the background, utilizing the leftover bandwidth gaps (similar to async compute). Not a big deal. On Xbox 360 you had to resolve from EDRAM to main memory much more frequently. In 60 fps games resolves used 10%+ of total GPU cycles, as the GPU was idling during resolves. Xbox One doesn't have this problem. And often you don't even need to copy data out of ESRAM: you write it, consume it in the next pass and write something else over it. Vulkan and DirectX 12 on PC now also allow overlapping multiple resources in memory (you of course need to use barriers/fences to ensure that data races don't occur). But I don't think many PC games use this feature yet.
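A toy model of the write, consume, overwrite pattern, purely for illustration: the resource names, sizes and pass numbers below are invented, and the sketch ignores the barriers/fences a real renderer needs. It only shows that when lifetimes don't overlap, the peak live footprint can fit in 32 MB even though the frame writes more than that in total.

```python
# Hypothetical frame: (name, size in MB, pass that writes it, last pass that reads it).
RESOURCES = [
    ("shadow_map", 16, 0, 1),
    ("gbuffer",    16, 1, 2),
    ("lighting",   16, 2, 3),
    ("post_fx",    16, 3, 4),
]
ESRAM_MB = 32

def peak_live_mb(resources):
    """Largest sum of sizes of resources that are alive during the same pass."""
    last_pass = max(last for _, _, _, last in resources)
    return max(
        sum(size for _, size, first, last in resources if first <= p <= last)
        for p in range(last_pass + 1)
    )

total = sum(size for _, size, _, _ in RESOURCES)
peak = peak_live_mb(RESOURCES)
print(f"total written: {total} MB, peak live: {peak} MB, fits: {peak <= ESRAM_MB}")
```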

iroboto · Sep 20, 2016

sebbbi said:
What did you expect from a 32 MB on-chip memory?

¯\_(ツ)_/¯

lol honestly, I think my mistake here was that everyone started tossing out buffer sizes and doing some sort of math that said, look, Xbox One can only hold 2-3 targets at 1080p. And I guess that anchored in my mind that ESRAM was only there for holding render targets. I was never sure what it was actually being used for, so I assumed that you're writing into a buffer, performing something, and then writing back out to DDR3 and writing in new information.

sebbbi said:
GPU caches get thrashed thousands of times per frame. Copying 32 MB from ESRAM to DDR three times per frame costs only a few percent of DDR bandwidth. An easy sacrifice to ensure that you get the most out of the fast memory.

You can copy in the background, utilizing the leftover bandwidth gaps (similar to async compute). Not a big deal. On Xbox 360 you had to resolve from EDRAM to main memory much more frequently. In 60 fps games resolves used 10%+ of total GPU cycles, as the GPU was idling during resolves. Xbox One doesn't have this problem. And often you don't even need to copy data out of ESRAM: you write it, consume it in the next pass and write something else over it. Vulkan and DirectX 12 on PC now also allow overlapping multiple resources in memory (you of course need to use barriers/fences to ensure that data races don't occur). But I don't think many PC games use this feature yet.

This is pretty cool and intriguing. I honestly had to think really hard about what you were saying, but I think I got it.

Code:

main memory    esram          mov
esram          registers      operations.....modify.... GPU code
registers      main memory    mov
in parallel
main memory    esram          mov

Is that what you mean by "And often you don't even need to copy data out from ESRAM, you write it and consume it by the next pass and write something else over it"?

function · Sep 20, 2016

sebbbi said:
VT cache size scales linearly with screen pixel count, so 4K requires a 4x larger cache than 1080p. With PBR materials (4 bytes per pixel) the following cache sizes are enough: 720p = 64 MB, 1080p = 128 MB, 4K = 512 MB.

These cache sizes reserve 16 texels per screen pixel. I have measured that filtering + geometry/UV discontinuities roughly multiply the pixel count by 4x. The remaining 4x is to ensure that a single scene takes at most 25% of the cache. This allows camera rotation, shaking, etc. without loading new data. You also need some leeway to load data ahead of the camera.

VR would need a wider FOV render for the page ID texture. This pass would also provide a wider FOV depth buffer, reducing screen edge issues with SSAO, specular occlusion and screen space reflections.

Checkerboarding or interlacing doesn't reduce the virtual texture cache size requirement. These techniques also don't help any other mesh LOD or texture mip based streaming techniques. Streaming pools need to be just as big.

Cheers.

AlNom · Sep 20, 2016

iroboto said:
lol honestly, I think my mistake here was that everyone started tossing out buffer sizes and doing some sort of math that said, look, Xbox One can only hold 2-3 targets at 1080p. And I guess that anchored in my mind that ESRAM was only there for holding render targets.

Early days + esram tools :3

Globalisateur · Sep 20, 2016

chris1515 said:
He is the guy who invented TAA and FXAA at Epic. He worked at Nvidia and now works at AMD.

Many Xbox One games are rendered under 1080p? I don't understand the point.

He just said it is better to render games on PS4 Pro and Scorpio under 4K...

If you render a Scorpio game at 4K, you get less quality per pixel than on PS4...

My bad. I missed the word 'under', hence my confusion...

Thanks guys for patiently putting me back on the right track.
