Xbox Series X [XBSX] [Release November 10 2020]

Shifty Geezer · Jun 18, 2020

I didn't say that, but it wouldn't be unheard of either. How much use did the tessellation hardware in Xenos see? How much action has the ID buffer of PS4 Pro had?

This is a technical discussion. We should discuss the technology without leaning on faith that every choice is awesome or ground-breaking. Talk about what SF is, where SFS fits in, and how they'll be used.

Ronaldo8 · Jun 18, 2020

Shifty Geezer said:
I didn't say that, but it wouldn't be unheard of either. How much use did the tessellation hardware in Xenos see? How much action has the ID buffer of PS4 Pro had?

This is a technical discussion. We should discuss the technology without leaning on faith that every choice is awesome or ground-breaking. Talk about what SF is, where SFS fits in, and how they'll be used.

Leaning on faith? I am not the one drawing conclusions about a core next-gen feature based on speculative techniques (that failed) from generations past. James Stanard himself made it clear in a tweet conversation that those filters are necessary to pull off SFS correctly. Straight from the horse's mouth.

I only said Sampler Feedback was part of DX12 Ultimate. You can use it to figure out what texture pages to stream, but without our custom texture filters, you might notice "pop in" at page boundaries.
— James Stanard (@JamesStanard) April 15, 2020

iroboto · Jun 18, 2020

Ronaldo8 said:
Leaning on faith? I am not the one drawing conclusions about a core next-gen feature based on speculative techniques (that failed) from generations past. James Stanard himself made it clear in a tweet conversation that those filters are necessary to pull off SFS correctly. Straight from the horse's mouth.

I only said Sampler Feedback was part of DX12 Ultimate. You can use it to figure out what texture pages to stream, but without our custom texture filters, you might notice "pop in" at page boundaries.
— James Stanard (@JamesStanard) April 15, 2020

which is the way things are today unless you rolled your own custom solution to account for it.

nothing changes, it's up to developers to choose to adopt it or not. No known game at this moment that we know of uses it. Though, I largely suspect games with incredibly high IQ probably do.

Ronaldo8 · Jun 18, 2020

iroboto said:
which is the way things are today unless you rolled your own custom solution to account for it.

nothing changes, it's up to developers to choose to adopt it or not. No known game at this moment that we know of uses it. Though, I largely suspect games with incredibly high IQ probably do.

Of course hardware implementation changes everything. Developers are far more likely to use a feature if the hassle to use it is less burdensome and fixed-function silicon helps to do exactly that (along with elimination of overheads from software implementation). Same thing applies with Sony and its state of the art SSD pipeline.

iroboto · Jun 18, 2020

Ronaldo8 said:
Of course hardware implementation changes everything. Developers are far more likely to use a feature if the hassle to use it is less burdensome and fixed-function silicon helps to do exactly that (along with elimination of overheads from software implementation). Same thing applies with Sony and its state of the art SSD pipeline.

You gotta remake your pipeline however. It's like including ray tracing into your game. Yes the hardware supports it, but that doesn't mean it's a check box feature. As you see in the presentation, you're going to have to make changes to your pipeline to utilize it, not all developers will do this for their respective engines. But if say Unreal and Unity did it, it would be a checkbox feature for instance.

Ronaldo8 · Jun 18, 2020

iroboto said:
You gotta remake your pipeline however. It's like including ray tracing into your game. Yes the hardware supports it, but that doesn't mean it's a check box feature. As you see in the presentation, you're going to have to make changes to your pipeline to utilize it, not all developers will do this for their respective engines. But if say Unreal and Unity did it, it would be a checkbox feature for instance.

Guess whose stable of studios is the most consistent and biggest user of the Unreal Engine? Expect SFS to be heavily used.

iroboto · Jun 18, 2020

Ronaldo8 said:
Guess whose stable of studios is the most consistent and biggest user of the Unreal Engine? Expect SFS to be heavily used.

still takes time. No one is saying it won't be used. Just not to expect it just because it's available. There are performance implications. So some games may rather go without it than with it.

Ronaldo8 · Jun 18, 2020

iroboto said:
still takes time. No one is saying it won't be used. Just not to expect it just because it's available. There are performance implications. So some games may rather go without it than with it.

No free lunch.

Shifty Geezer · Jun 18, 2020

Ronaldo8 said:
Leaning on faith ?

Yes. Your discussion isn't technical. In that reply, you aren't talking about how SF and SFS will be used, and what advantages SFS could bring; you've just said, "MS have included it so it must be useful." That's a non-technical argument of faith.

From your tweet, you may notice pop-in at tile boundaries. So now let's talk about tile boundaries, possible solutions to tile pop-in, and where SFS fits in.

Ronaldo8 · Jun 18, 2020

Shifty Geezer said:
Yes. Your discussion isn't technical. You aren't talking about how SF and SFS will be used, and what advantages SFS could bring. You've just said, "MS have included it so it must be useful."

From your tweet, you may notice pop-in at tile boundary. So now let's talk about tile boundaries and possible solutions to tile pop-in.

You doth protest too much. James Stanard, one of the system architects of the XSX, said it was useful. I don't assume to know more than him unlike others.
Tech talk? How about this:

There is an excellent GDC talk by Sean Barrett about the case where a page is not yet resident. What to do? Use bilinear filtering on the residency map (in hardware since, in Barrett's own words, a software implementation is an unnecessary hassle) after "padding" it astutely, to solve the issue of sampling texels from adjacent pages that are inherently decorrelated. Sampling from adjacent pages will introduce artifacts, as Stanard mentioned in the tweet quoted earlier.

Those patented MS "texture filters" are in fact a modified form of bilinear filtering as explained in the patent (https://patentimages.storage.googleapis.com/ae/20/a0/313511519c3caa/US20180232940A1.pdf).

More importantly, texture filtering and blending is done any time a transition to the next LOD is occurring irrespective of whether the next LOD is resident or not. MS explicitly provide an example where a PRT is created and the residency map is constantly updated with successive mip levels corrected by a fractional blending factor.
Also, you can only prefetch what you can foresee and not what you figure out after sampling by which time you already need it.
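To make the residency-map idea above a bit more concrete, here is a minimal Python sketch. It is not the patented filter and not any real API: the map layout, the function names, and the "clamp the requested LOD to a bilinearly filtered minimum resident mip" rule are all assumptions for illustration. Each map entry holds the finest mip that is resident for one tile (lower number = more detail), and it may be fractional while a new mip is being blended in.

```python
import math

def bilinear(grid, u, v):
    """Bilinearly filter a 2D grid at normalized coordinates (u, v) in [0, 1].
    Texel centers sit at (i + 0.5) / size; edges are clamped."""
    h, w = len(grid), len(grid[0])
    x = min(max(u * w - 0.5, 0.0), w - 1.0)
    y = min(max(v * h - 0.5, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def effective_lod(requested_lod, residency_map, u, v):
    """Clamp the LOD the sampler wants to what is resident nearby.
    Filtering the map makes the clamp vary smoothly across tile
    boundaries instead of jumping from one tile to the next."""
    return max(requested_lod, bilinear(residency_map, u, v))

# Hypothetical 1x2 map: left tile has mip 0 resident, right tile only mip 3.
residency_map = [[0.0, 3.0]]
for u in (0.25, 0.5, 0.75):
    print(u, effective_lod(1.0, residency_map, u, 0.5))
```

With an unfiltered (point-sampled) map the clamp would switch abruptly from 1.0 to 3.0 at the tile edge, which is the kind of visible seam the filtering is meant to hide.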

Shifty Geezer · Jun 18, 2020

Ronaldo8 said:
Also, you can only prefetch what you can foresee and not what you figure out after sampling by which time you already need it.

That's not happening. You can't load a texture from storage the moment the GPU realises it's needed; that's just too slow. If SSDs could work that fast, there'd be no market for Optane. SSD access times are in microseconds, versus nanoseconds for DRAM (which in itself is horrifically slow compared to the working storage of processor caches).
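A quick back-of-envelope calculation shows the scale of the gap. The latency figures below are illustrative order-of-magnitude assumptions, not measurements of any console; the only hard number is the publicly stated 1.825 GHz XSX GPU clock.

```python
# Order-of-magnitude assumptions for illustration only.
DRAM_LATENCY_S = 100e-9   # ~100 ns, assumed
SSD_LATENCY_S = 100e-6    # ~100 microseconds, assumed
GPU_CLOCK_HZ = 1.825e9    # XSX GPU clock
FRAME_TIME_S = 1 / 60     # 60 fps frame budget

def cycles_stalled(latency_s, clock_hz=GPU_CLOCK_HZ):
    """GPU clock cycles that elapse while waiting for one access."""
    return latency_s * clock_hz

def fraction_of_frame(latency_s, frame_time_s=FRAME_TIME_S):
    """Share of one frame's time budget consumed by one access."""
    return latency_s / frame_time_s

if __name__ == "__main__":
    print("DRAM stall cycles:", round(cycles_stalled(DRAM_LATENCY_S)))
    print("SSD stall cycles: ", round(cycles_stalled(SSD_LATENCY_S)))
    print("SSD share of a 60 fps frame:", fraction_of_frame(SSD_LATENCY_S))
```

Under these assumptions one SSD round trip costs on the order of a thousand times more cycles than a DRAM miss, yet it is still well under 1% of a frame. That is far too long to wait for inside a draw, but short enough that data requested this frame can plausibly be there for the next one.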

Ronaldo8 · Jun 19, 2020

Shifty Geezer said:
That's not happening. You can't load a texture from storage the moment the GPU realises it's needed; that's just too slow. If SSDs could work that fast, there'd be no market for Optane. SSD access times are in microseconds, versus nanoseconds for DRAM (which in itself is horrifically slow compared to the working storage of processor caches).

SFS as a VRAM capacity (distinct from bandwidth) saver/multiplier implies exactly that.

Shifty Geezer · Jun 19, 2020

How do you address the microseconds of latency? Is the GPU going to sit waiting for the data to arrive, or stop what it's drawing, draw something else (or compute something else), then come back to drawing with the texture when it finally arrives and the object with the correct LOD in the right place without messing up the drawing it's already done?

Also, if you can fetch data on demand from disk, why would you need a mechanism for soft transitioning between LODs - SFS? Users would never see a transition because the correct LOD would always be present on demand, no?

iroboto · Jun 19, 2020

Shifty Geezer said:
How do you address the microseconds of latency? Is the GPU going to sit waiting for the data to arrive, or stop what it's drawing, draw something else (or compute something else), then come back to drawing with the texture when it finally arrives and the object with the correct LOD in the right place without messing up the drawing it's already done?

Also, if you can fetch data on demand from disk, why would you need a mechanism for soft transitioning between LODs - SFS? Users would never see a transition because the correct LOD would always be present on demand, no?

just thinking about frustum culling etc.
i think from further away say MIP10 because it’s so far away you pull that on demand, and it’s a blend of uggo at that draw distance.

If you’re strafing left and right, you’re loading in tiles that are out of view before you can actually see it I suspect. I don’t know how much of this translates to how tight you can cut it with SSD. But some testing would be required.

Shifty Geezer · Jun 19, 2020

iroboto said:
i think from further away say MIP10 because it’s so far away you pull that on demand, and it’s a blend of uggo at that draw distance.

Why is a high MIP something you'd pull on demand? As in, why is that more latency tolerant? The issue isn't BW but the time it takes from sampler feedback stating during texture sampling (as the object is being drawn), "I need a higher LOD on this texture" and that texture sampler getting new texture data from the SSD.

Texturing on GPUs is only fast and effective because the textures are pre-loaded into the GPU caches for the texture samplers to read. The regular 2D data structure and data access makes caching very effective. The moment texture data isn't in the texture cache, you have a cache miss and stall until the missing texture data, many nanoseconds away, is loaded. At that point, fetching data from SSD is clearly an impossible ask.

The described systems included mip mapping and feedback to load and blend better data in subsequent frames. You want to render a surface. The required LOD isn't in RAM so you use the existing lower LOD to draw that surface, and start the fetching process. When the higher quality LOD is loaded a frame or two later, you either have pop-in or you can blend between LOD levels, aided by SFS if that is present.
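As a rough sketch of that flow (draw with the coarser LOD you have, request the finer one, then blend it in once it arrives), here is a toy Python simulation. The class, its parameters, and the fixed two-frame load latency are hypothetical choices for illustration; it also assumes coarser mips are always resident.

```python
class StreamedTexture:
    """Toy model of feedback-driven mip streaming (lower mip = more detail)."""

    def __init__(self, resident_mip, load_latency_frames=2, blend_step=0.5):
        self.resident_mip = resident_mip
        self.load_latency_frames = load_latency_frames
        self.blend_step = blend_step          # max LOD change per frame
        self.pending_mip = None
        self.frames_until_loaded = 0
        self.shown_lod = float(resident_mip)

    def render_frame(self, wanted_mip):
        """Advance one frame and return the (possibly fractional) LOD drawn."""
        # 1. Complete a load that was requested in an earlier frame.
        if self.pending_mip is not None:
            self.frames_until_loaded -= 1
            if self.frames_until_loaded <= 0:
                self.resident_mip = self.pending_mip
                self.pending_mip = None
        # 2. Feedback: a finer mip was wanted than is resident, so request it.
        if wanted_mip < self.resident_mip and self.pending_mip is None:
            self.pending_mip = wanted_mip
            self.frames_until_loaded = self.load_latency_frames
        # 3. Draw with the best resident data, easing toward finer detail.
        target = float(max(wanted_mip, self.resident_mip))
        if target >= self.shown_lod:
            self.shown_lod = target           # coarser data is assumed resident
        else:
            self.shown_lod = max(target, self.shown_lod - self.blend_step)
        return self.shown_lod

tex = StreamedTexture(resident_mip=3)
print([tex.render_frame(wanted_mip=1) for _ in range(7)])
# prints [3.0, 3.0, 2.5, 2.0, 1.5, 1.0, 1.0]
```

Setting `blend_step` to a large value reproduces plain pop-in: the drawn LOD jumps straight from 3.0 to 1.0 on the frame the data lands.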

When it comes to mid-frame loads as described in that theoretical suggestion in the earlier interview (things to look into for the future), we'd be talking about replacing data that's no longer needed this frame. There's no way mid-rendering data from storage is ever going to happen on anything that's not approaching DRAM speeds. The latencies are just too high.

iroboto · Jun 19, 2020

Shifty Geezer said:
Why is a high MIP something you'd pull on demand? As in, why is that more latency tolerant? The issue isn't BW but the time it takes from sampler feedback stating during texture sampling (as the object is being drawn), "I need a higher LOD on this texture" and that texture sampler getting new texture data from the SSD.

Texturing on GPUs is only fast and effective because the textures are pre-loaded into the GPU caches for the texture samplers to read. The regular 2D data structure and data access makes caching very effective. The moment texture data isn't in the texture cache, you have a cache miss and stall until the missing texture data, many nanoseconds away, is loaded. At that point, fetching data from SSD is clearly an impossible ask.

The described systems included mip mapping and feedback to load and blend better data in subsequent frames. You want to render a surface. The required LOD isn't in RAM so you use the existing lower LOD to draw that surface, and start the fetching process. When the higher quality LOD is loaded a frame or two later, you either have pop-in or you can blend between LOD levels, aided by SFS if that is present.

When it comes to mid-frame loads as described in that theoretical suggestion in the earlier interview (things to look into for the future), we'd be talking about replacing data that's no longer needed this frame. There's no way mid-rendering data from storage is ever going to happen on anything that's not approaching DRAM speeds. The latencies are just too high.

The highest-level MIP is always going to be 1px in size. So I think by draw distance it’s not too relevant at high mip levels. I think Claire indicated that MIP selection and sampling become critical and difficult once we get closer to MIP4 through MIP0, which is the area you’re discussing.
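For a sense of scale, here is a small Python sketch of a full mip chain. The 4096x4096 base size is just an assumed example, and the function names are made up for illustration.

```python
def mip_chain(width, height):
    """Return the (width, height) of every mip level, halving down to 1x1."""
    dims = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        dims.append((width, height))
    return dims

def texel_share(dims, first_level, last_level):
    """Fraction of all texels in the chain held by levels first..last inclusive."""
    total = sum(w * h for w, h in dims)
    part = sum(w * h for w, h in dims[first_level:last_level + 1])
    return part / total

chain = mip_chain(4096, 4096)
print(len(chain), chain[10], chain[-1])   # 13 levels, MIP10 is 4x4, last is 1x1
print(texel_share(chain, 0, 0))           # MIP0 alone: about 0.75
print(texel_share(chain, 0, 4))           # MIP0..MIP4: about 0.999
```

So nearly all of the data sits in MIP0 through MIP4, and the distant levels like MIP10 are a handful of texels, cheap enough to simply keep resident.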

So I’m not sure when things are retrieved at that distance, I suspect it must be before you see the tile but I could be wrong. If it is exactly when you see the tile, that seems rather nutty.

Allandor · Jun 19, 2020

Shifty Geezer said:
Why is a high MIP something you'd pull on demand? As in, why is that more latency tolerant? The issue isn't BW but the time it takes from sampler feedback stating during texture sampling (as the object is being drawn), "I need a higher LOD on this texture" and that texture sampler getting new texture data from the SSD.

Texturing on GPUs is only fast and effective because the textures are pre-loaded into the GPU caches for the texture samplers to read. The regular 2D data structure and data access makes caching very effective. The moment texture data isn't in the texture cache, you have a cache miss and stall until the missing texture data, many nanoseconds away, is loaded. At that point, fetching data from SSD is clearly an impossible ask.

The described systems included mip mapping and feedback to load and blend better data in subsequent frames. You want to render a surface. The required LOD isn't in RAM so you use the existing lower LOD to draw that surface, and start the fetching process. When the higher quality LOD is loaded a frame or two later, you either have pop-in or you can blend between LOD levels, aided by SFS if that is present.

When it comes to mid-frame loads as described in that theoretical suggestion in the earlier interview (things to look into for the future), we'd be talking about replacing data that's no longer needed this frame. There's no way mid-rendering data from storage is ever going to happen on anything that's not approaching DRAM speeds. The latencies are just too high.

And because of that, I don't think the SSD will make that much difference in how something is rendered. It only reduces the size of the texture cache and the need to load everything as a package (multiple times). So you can use the memory more efficiently, but overall it doesn't take away the problem that you still must load ahead the things that might become visible in the next few frames. Texture tiling etc. and the high bandwidth just help to reduce the texture-cache footprint. That's more or less all. It won't radically change how games get developed, but it offers a method to use the RAM a bit more efficiently.

eastmen · Jun 19, 2020

just thought this xbox series x looked really cool.

Ronaldo8 · Jun 21, 2020

Possible application of DirectML in the context of XSX launch:

There are two patent applications from Xbox ATG member Martin J. I. Fuller that are of particular interest in trying to understand how the Scarlett engine will handle textures:

(1) Reducing the search space for real time texture compression (https://patentimages.storage.googleapis.com/3a/50/67/ab63ca347f8bca/US20190304138A1.pdf)
(2) Machine learning applied to textures compression or upscaling (https://patentimages.storage.googleapis.com/60/15/c9/14680f5408cfac/US20200105030A1.pdf)

The claims relate to two very interesting strategies:

(1) Compression of textures in a non-GPU-compatible format (like JPEG) to achieve greater compression and a lower memory footprint, and their conversion at runtime into GPU-compatible block-compressed (BC1 to BC7) data using machine learning.
(2) Upscaling of lower resolution textures to minimize memory footprint and load on memory bandwidth
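To put rough numbers on the footprint side of both strategies, here is a short Python sketch of block-compressed texture sizes. The block sizes are the standard ones (BC1 uses 8 bytes per 4x4 block, BC7 uses 16); the texture dimensions are arbitrary examples, and none of this comes from the patents themselves.

```python
BYTES_PER_BLOCK = {"BC1": 8, "BC7": 16}   # each block covers 4x4 texels

def bc_size_bytes(width, height, fmt):
    """Size of a single mip level stored in a block-compressed format."""
    blocks_x = (width + 3) // 4            # round up to whole blocks
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * BYTES_PER_BLOCK[fmt]

for size in (2048, 1024, 512):
    kib = bc_size_bytes(size, size, "BC7") // 1024
    print(f"{size}x{size} BC7: {kib} KiB")
# prints:
# 2048x2048 BC7: 4096 KiB
# 1024x1024 BC7: 1024 KiB
# 512x512 BC7: 256 KiB
```

Each halving of resolution cuts the footprint by 4x, so a texture shipped at 512x512 and upscaled at runtime would take 1/16 of the storage of its 2048x2048 equivalent, before counting any extra gain from a JPEG-like on-disk format.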

What's crazy is that we know that those particular methods are actively being used by some MS studios (Ninja Theory?) thanks to an interview given by PlayFab's head honcho: https://venturebeat.com/2020/02/03/...ext-generation-of-games-and-game-development/

Of note is this particular nugget of information:

"Gwertzman: You were talking about machine learning and content generation. I think that’s going to be interesting. One of the studios inside Microsoft has been experimenting with using ML models for asset generation. It’s working scarily well. To the point where we’re looking at shipping really low-res textures and having ML models uprez the textures in real time. You can’t tell the difference between the hand-authored high-res texture and the machine-scaled-up low-res texture, to the point that you may as well ship the low-res texture and let the machine do it.

Journalist: Can you do that on the hardware without install time?

Gwertzman: Not even install time. Run time.

Journalist: To clarify, you’re talking about real time, moving around the 3D space, level of detail style?

Gwertzman: Like literally not having to ship massive 2K by 2K textures. You can ship tiny textures."

Whatever MS is cooking at XBOX ATG, it promises to really change game development thanks to synchronisation between console and PC.

Jay · Jun 21, 2020

Definitely interesting, but I remember this aspect also.

Gwertzman: It’s especially good for photorealism, because that adds tons of data. It may not work so well for a fantasy art style.

So currently not a general solution, but it shows the sort of R&D that MS is doing, especially if it means leveraging their Azure servers I guess.
