Xbox Series X [XBSX] [Release November 10 2020]

Definitely interesting, but I remember this aspect also.

So it's currently not a general solution, but it shows the sort of R&D MS is doing, especially if it means leveraging their Azure servers, I guess.
Indeed. If you are going for an art style that it's never been trained on, it's not going to produce the right thing.
Luckily, the largest photogrammetry texture libraries belong to the realistic domain, making this useful for a variety of scenarios if it makes it out of theorycraft and into release.
 
Have to admit I just scanned it to look for the quote I remembered.
I didn't think it was just a matter of it not having been trained on it yet.
If so, even better.

Wonder if they can put an opt-out clause on any game a studio uploads to the store, so that its textures can be used as training assets.
 
That's an interesting point.

I imagine that to some degree you could make an AI upscaler that learned simply from the output images generated by games. It could learn to identify styles and the types of data it needed to recreate based on what was sent to the display device. It would probably be better to do it from the raw textures though, as you say.

We know that with the X1X, MS built in diagnostic tools that helped them analyse texture access patterns, and that the information this gave MS fed directly into the XSX Velocity Architecture (and perhaps DX12U Sampler Feedback in general). Maybe MS have designed into the XSX SoC the capability to read and then analyse any texture-formatted data consumed by the GPU in a similar way. It might make sense, agreements permitting.
 
I was thinking along the lines of using the raw texture as the ground truth and the JPEG as the input, and training the model to get from the JPEG to the raw texture.

I was also wondering if they could then ship with, say, 2 mip levels, and whether the training could take JPEG mip 0 to output mips 0-2, and JPEG mip 3 to generate mips 3-5 in real time.
It may take fewer resources to do the ML conversion for the lower mip levels.
They could then ship with not only smaller asset sizes but also fewer assets, and wouldn't need to swap them in and out of memory as much either.

I wonder if ML would work well with SFS, or would ML do better with the full JPEG texture in memory?

But anyway, they have all those textures in the store already; it would be a shame not to be able to use them for training.

I also feel that, during all those 200,000 hours of play testing they plan to do before the XSX release, it should be grabbing screenshots automatically for use as ground truth for ML upscaling. Or would the ground truth need to be at much higher quality than the required output?
In that case it could auto-pause the game and render at 8K with super high AA and AF, even if it took 5 seconds to do it.
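
To make the idea concrete, a minimal sketch of that training setup might look something like the following. Everything here is an illustrative assumption (the tiny residual CNN, the JPEG round-trip, the L1 loss), not anything MS has described:

```python
# Hypothetical sketch: train a small CNN to reconstruct a raw (uncompressed) mip
# from its JPEG-compressed counterpart. Ground truth = raw texture, input = JPEG.
import io
import torch
import torch.nn as nn
from PIL import Image
from torchvision.transforms.functional import to_tensor

def jpeg_roundtrip(img: Image.Image, quality: int = 75) -> Image.Image:
    """Simulate the shipped JPEG by compressing and re-decoding the raw texture."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

class JpegToRaw(nn.Module):
    """Tiny residual CNN: predicts the raw texture from its JPEG version.
    The same recipe could be applied per shipped mip (e.g. JPEG mip 0 -> raw
    mips 0-2) by adding an upscaling head."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)   # learn the compression residual

def train_step(model, opt, raw_img: Image.Image) -> float:
    x = to_tensor(jpeg_roundtrip(raw_img)).unsqueeze(0)   # input: JPEG version
    y = to_tensor(raw_img.convert("RGB")).unsqueeze(0)    # target: raw texture
    loss = nn.functional.l1_loss(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

model = JpegToRaw()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
# for each raw texture image in the training set: train_step(model, opt, image)
```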
 
The Xbox Series X isn't much bigger than the XSO. It's just that one dimension is bigger, while another is smaller and another is the same.

Hopefully where you want to put it has the room for it.
 
If you're doing that much ML work on the textures, just figure out how to run the Substance Designer graphs in realtime, lossless and with no tiling at all, assuming the artist is decent.
 
Well, Project Griffin is now in a bigger internal test, so it looks like it will be ready for the launch timeframe of the Xbox Series X.

In 2013, Netflix was going to release its own console with the Netflix app. It was pulled at the last minute. It was called Project Griffin...

Has this come back to life, using Lockhart as the base hardware? :)
 
Were they? Sounds like something DOA, to be honest.

But no, MS's Project Griffin is not hardware.
 
If you're doing that much ML work on the textures, just figure out how to run the Substance Designer graphs in realtime, lossless and with no tiling at all, assuming the artist is decent.
Which part are you talking about?
The training of the models, or using different models against fewer JPEGs when doing the up-conversion?

For the training, I was thinking of a way to make use of all the assets they already have available (obviously with studio permission).
 
Frenetic Pony means running the texture generation tools in realtime so you don't have texture-maps to upscale and can instead procedurally create textures at 'infinite' detail.

Which is great in theory, but I expect it's so processor intensive that ML upscaling is the better solution.
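
For a sense of what "procedural" means here: a texture is just a function evaluated per texel, so there is no stored map to upscale at all. A trivially cheap stand-in is sketched below (nothing like a real Substance graph, which chains many far heavier nodes; the function and parameters are purely illustrative):

```python
# Illustrative only: a procedural texture is a function evaluated per texel,
# so no stored texture map exists. Real material graphs chain many such nodes,
# which is where the runtime cost explodes.
import numpy as np

def value_noise(width: int, height: int, cells: int = 8, seed: int = 0) -> np.ndarray:
    """Generate a [0,1] greyscale texture by bilinearly interpolating a random lattice."""
    rng = np.random.default_rng(seed)
    lattice = rng.random((cells + 1, cells + 1))
    ys = np.linspace(0, cells, height, endpoint=False)
    xs = np.linspace(0, cells, width, endpoint=False)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    fy, fx = (ys - y0)[:, None], (xs - x0)[None, :]
    a = lattice[np.ix_(y0, x0)]
    b = lattice[np.ix_(y0, x0 + 1)]
    c = lattice[np.ix_(y0 + 1, x0)]
    d = lattice[np.ix_(y0 + 1, x0 + 1)]
    top = a * (1 - fx) + b * fx
    bot = c * (1 - fx) + d * fx
    return top * (1 - fy) + bot * fy   # shape (height, width)

tex = value_noise(1024, 1024)          # regenerate at any resolution, no storage needed
```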
 
From an interview with David Springate, Technical Director at Codemasters, about optimizing Dirt 5 for Xbox Series X:



Link: https://news.xbox.com/en-us/2020/06/10/inside-xbox-series-x-optimized-dirt-5/

I'm a bit late to this conversation, but I find it immensely fascinating.

Being able to stream in and load/unload assets (textures etc.) at such speeds is, I feel, going to be a common trend this generation.

Ratchet and Clank shows this well for PS5, and I believe Xbox Series X has its equivalent in "The Medium".

The Medium apparently renders 2 worlds simultaneously, the "real world" and the "shadow world".

There are several actors in the game; the main one, being a "medium", has the ability to travel instantly between the 2. Clicking a single button repaints the entire screen space almost instantly (within seconds), unloading one world while loading the other. I was wondering how it was loading/unloading all those assets in near realtime.

And it showed doing this at varying draw distances; in one case it was in a room, and in another it was a couple of hundred meters away where an entire building flipped between real/shadow.

And apparently the 2 worlds aren't the only triggers for full screen-space redrawing. The medium is able to interact with ghosts and then travel instantly to the ghost's point of view, e.g. the medium interacts with a ghost in the shadow world, and the world instantly redraws again to a point in time when the ghost died, reliving that moment.

I can't even imagine how it's doing all this redrawing.
 
Yea...The two world thing is neatly described in this patent:
The method of simultaneous playing in single-player video games (WO2017171567A1)
https://patents.google.com/patent/WO2017171567A1/en?oq=bloober+sa
 
DirectML: A hypothesis

Reading through the DirectML patent by MS, there is a very interesting passage:

"The devices and method may also modify MIP chains used with textures to reduce the size of applications on the hard disk, optical media, or when downloaded over the internet. MIP chains may consist of a plurality of images to use in textures, each of which may be a progressively lower resolution representation of the same image. A MIP chain may be created by repeatedly down-sizing the first image and/or one or more intermediate images in the MIP chains. For example, the first image of a MIP chain may take up the majority of the storage space for the MIP chain. The devices and methods may remove the first image and/or one or more intermediate images of the MIP chain and generate modified MIP chains to transmit with the applications in order to reduce input/output bandwidth and/or reduce the size of applications on the hard disk, optical media, or when downloaded over the internet.

The devices and methods may reconstruct the deleted first image and/or any removed intermediate images of the MIP chain, at runtime of the application and/or application installation, using, for example, a trained machine learning model, such as a generative adversarial network (GAN), and generate a hardware compatible compressed reconstructed MIP chain. The trained machine learning model may receive and/or otherwise access the modified MIP chain and may reconstruct the first image and/or any removed intermediate images of the MIP chain by upscaling the top image and/or a next largest intermediate image to recreate the missing image in the modified MIP chain and decompressing the reconstructed MIP chain directly into a hardware compatible compressed reconstructed MIP chain, such as a block compressed MIP chain."

Going back to the sampler feedback streaming framework, sampler feedback returns the following information:

(1) Where exactly in the texture the sampling was done
(2) At what MIP level

The hardware addressing system described in the SFS patent will then presumably check for the required page in DRAM if it is not resident in the GPU caches. What if it is not found in RAM? It may or may not be streamed directly from the SSD. A better way is to generate it on the fly using the information provided by sampler feedback and to seamlessly blend it in using the custom bilinear texture filters.
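
To pin the hypothesis down, the speculated runtime flow might look roughly like this. This is pure conjecture: the residency/loader/upscaler objects and every name below are placeholders layered on top of what the patent and sampler feedback actually describe:

```python
# Speculative sketch of the hypothesised flow: sampler feedback tells us which
# texture region and mip level were requested; if the page isn't resident we
# either queue an SSD read or synthesise the mip from a lower-resolution one.
from dataclasses import dataclass

@dataclass(frozen=True)
class FeedbackRecord:
    texture_id: int
    mip: int            # mip level the sampler wanted
    tile_x: int         # which tile within that mip was touched
    tile_y: int

def resolve_feedback(record, residency, ssd_queue, ml_upscale, blend):
    """Hypothetical handler for one sampler-feedback record."""
    if residency.is_resident(record.texture_id, record.mip, record.tile_x, record.tile_y):
        return  # nothing to do, the tile is already in memory

    # Option A (the conventional SFS path): request the tile from storage and
    # keep drawing with the coarser mip until it arrives a frame or two later.
    ssd_queue.request(record)

    # Option B (the hypothesis): synthesise the missing tile right now from the
    # next coarser resident mip, then blend it in so the real data can replace
    # it seamlessly once the SSD read completes.
    coarse = residency.get_tile(record.texture_id, record.mip + 1,
                                record.tile_x // 2, record.tile_y // 2)
    if coarse is not None:
        synthesised = ml_upscale(coarse)          # e.g. a small GAN/CNN
        blend(record, synthesised)                # custom bilinear filter blend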
 
Looks like I should've read the patent, as this was what I was talking about. Thanks.
 
In other news, increased efficiency in GPU-driven primitive culling via visibility buffers, with an ultra-heavy dose of ExecuteIndirect. Iroboto should be happy:

https://patents.google.com/patent/WO2019204064A1/en?inventor=Martin+Jon+Irwin+FULLER

There is a nice presentation from Frostbite in this regard, with name-drops of the usual suspects Alex Nankervis and James Stanard:

https://frostbite-wp-prd.s3.amazonaws.com/wp-content/uploads/2016/03/29204330/GDC_2016_Compute.pdf
 
@Ronaldo8 That's actually pretty cool. If you're using a full mip chain with partial residency, use sampler feedback to find the correct mip level and then use ML to upscale only the correct mip region/tile, so you don't even need to upscale the whole texture.
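
Something like the following coordinate math would be all that's needed to feed only the touched region through the network (the tile size and helper names are made up for illustration):

```python
# Illustrative: given the (mip, tile) pair sampler feedback reports, cut out only
# the matching region of the next coarser mip and upscale that, not the whole map.
import numpy as np

TILE = 128  # hypothetical tile size in texels at the requested mip

def tile_region_in_coarser_mip(tile_x: int, tile_y: int) -> tuple[slice, slice]:
    """A TILE x TILE tile at mip N covers a (TILE/2) x (TILE/2) area at mip N+1."""
    half = TILE // 2
    return (slice(tile_y * half, (tile_y + 1) * half),
            slice(tile_x * half, (tile_x + 1) * half))

def upscale_tile(coarser_mip: np.ndarray, tile_x: int, tile_y: int, upscaler) -> np.ndarray:
    ys, xs = tile_region_in_coarser_mip(tile_x, tile_y)
    small = coarser_mip[ys, xs]          # only the (TILE/2, TILE/2, channels) region
    return upscaler(small)               # returns a (TILE, TILE, channels) tile
```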
 
Unless you can explain how to cope with the SSD latency, that's not an option. ;) Ergo upscaling the MIPs or something else will be used instead. ML texture construction sounds like a great move.
You gave the answer yourself:
Why is a high MIP something you'd pull on demand? As in, why is that more latency-tolerant? The issue isn't BW but the time it takes from sampler feedback stating during texture sampling (as the object is being drawn), "I need a higher LOD on this texture", to that texture sampler getting new texture data from the SSD.

Texturing on GPUs is only fast and effective because the textures are pre-loaded into the GPU caches for the texture samplers to read. The regular 2D data structure and data access makes caching very effective. The moment texture data isn't in the texture cache, you have a cache miss and stall until the missing texture data, many nanoseconds away, is loaded. At that point, fetching data from SSD is clearly an impossible ask.

The described systems included mip mapping and feedback to load and blend better data in subsequent frames. You want to render a surface. The required LOD isn't in RAM so you use the existing lower LOD to draw that surface, and start the fetching process. When the higher quality LOD is loaded a frame or two later, you either have pop-in or you can blend between LOD levels, aided by SFS if that is present.
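
Put as a toy per-frame loop it is basically "draw with what you have, ask for better, blend when it shows up". The helper names, the blend window, and the mip convention (lower index = sharper) below are all made up for illustration:

```python
# Toy sketch of the "no mid-frame SSD reads" model described above: each frame
# draws with the best mip already resident, kicks off an async load for the mip
# it actually wanted, and fades the new data in over a few frames once it lands.
BLEND_FRAMES = 4  # arbitrary fade-in length to hide the pop

def frame(texture, wanted_mip, residency, loader, draw):
    best = residency.best_resident_mip(texture)          # possibly coarser than wanted_mip
    if best > wanted_mip and not loader.pending(texture, wanted_mip):
        loader.request(texture, wanted_mip)              # arrives frames later, not mid-draw

    if loader.just_arrived(texture, wanted_mip):
        residency.start_blend(texture, wanted_mip, BLEND_FRAMES)

    # Draw with whatever is resident; the blend factor ramps 0 -> 1 over
    # BLEND_FRAMES so the sharper mip fades in instead of popping.
    draw(texture, mip=best, blend=residency.blend_factor(texture))
```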

When it comes to mid-frame loads as described in that theoretical suggestion in the earlier interview (things to look into for the future), we'd be talking about replacing data that's no longer needed this frame. There's no way mid-rendering data from storage is ever going to happen on anything that's not approaching DRAM speeds. The latencies are just too high.

Case closed.
 