there should be a "lol" button next to the like button, would be useful!
I could use this button. I would rather hit the LOL button than ignore some people.
C'mon bro, no-one has time to go running in circles.
The post you responded to was about one of the potential benefits of real-time upresing, namely the impact on a texture streaming system of being able to stream smaller textures and ML upres them in real time. Basically, it was about the direct relationship between a hypothetical ML uprezzer and texture streaming requirements.
You responded "And texture streaming is an orthogonal issue altogether", leading me to think you were saying that ML upressing in real time is unrelated to a game's texture streaming requirements.
If what you meant was "streaming requirements are independent of a given drive's capabilities", then yeah. But that, ironically, is independent of the point I was trying to make...
I know this quote was directed at eastmen, but I wanted to point out that the training data could easily be uncompressed 24-bit 8K asset textures (or higher, for future compatibility), with the model tuned on a per-game, or even a per-level or per-material, basis (whatever you want). So: static images with definite states. That's a far better-behaved problem than the situation for DLSS, or MS's own ML HDR for backwards compatibility (ML is everywhere!!), both of which have to work on very dynamic frame buffers.
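To make "tuned on a per-material basis" a bit more concrete, here's a minimal sketch of what that offline training could look like. Everything here is an assumption for illustration: the tiny PyTorch upscaler, the material IDs, and the idea of faking the low-res input by downsampling the source crop. It's a sketch of the general technique, not anyone's actual pipeline.

```python
# Hypothetical sketch: one tiny 2x upscaler per material, trained offline on
# uncompressed source textures (static targets, unlike a DLSS-style frame buffer).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUpres(nn.Module):
    """2x upscaler: a few convs plus pixel-shuffle. Deliberately small."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3 * 4, 3, padding=1),  # 4 = 2x2 upscale factor
            nn.PixelShuffle(2),
        )

    def forward(self, x):
        return self.body(x)

def train_per_material(material_id, hires_crops, epochs=10):
    """hires_crops: crops from the uncompressed source textures, shape (N, 3, H, W)."""
    model = TinyUpres()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        # The runtime input would be the smaller stored mip; here we fake it
        # by downsampling the ground-truth crop.
        lowres = F.interpolate(hires_crops, scale_factor=0.5, mode="bilinear")
        opt.zero_grad()
        loss = F.l1_loss(model(lowres), hires_crops)
        loss.backward()
        opt.step()
    torch.save(model.state_dict(), f"upres_{material_id}.pt")  # one model per material
    return model
```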
By UpdateTileMappings, you mean the mapping of the virtual texture to the tile pool done by the hardware page table?
Exactly! And perhaps this is just an underestimated value. Microsoft does not accidentally promote the VRS technique; it probably has more potential.
I realize it's an oversimplification, but in principle a performance advantage would stack with an efficiency improvement.
1.18 (raw performance advantage) × 1.2 (a reasonable estimate for the VRS efficiency improvement) ≈ 1.42
Velocity architecture seems to be almost smoke and mirrors and very unclear as to what or how it works.
What parts are smoke and mirrors?
I heard that the Xbox Series X is a quantum machine and lives in a state of infinite probability. Just don't open it up or the wave function will collapse and you may not be happy with the outcome.
So it's not two stacked SoCs, it's two superpositioned SoCs!
What parts are smoke and mirrors?
M.2 SSD, 2.4 GB/s.
Hardware decompression
A newer, more modern storage API (the existing one, as we all know, is a huge ball and chain in Windows)
Have they given a Build-style presentation on it? No. But that's hardly unclear or smoke and mirrors.
How it all performs in the wild is a different matter.
It's looking like we might now know how MS are achieving the reduced overheads and reduced SSD latency they've talked about. @Ronaldo8 found a really interesting MS research paper from 2015 (perfect timing) that backs up some ideas a few of us had been kicking around. It's in the Velocity architecture thread in the tech forum. Pretty cool stuff.
Latency may (and I say 'may'!) be one of the only areas where MS's storage solution has a bit of an advantage over competitors. Though it's still going to be a lot slower than DRAM, of course.
Why do you think there could be a latency advantage? Sony's solution moves the data straight from the I/O controller to RAM via DMA. There's no way to make that lower latency, as the data moves directly without going through any host OS layers. The cache scrubbers implemented in both the GPU and the I/O controller should also help here, as coherency is achieved without the OS/CPU having to do work.
The MS research paper, which describes a technique that saves substantially on latency and overhead, wouldn't work as effectively (as far as I can tell) with a drive that has to manage its own flash translation layer. My suspicion has been for a while that MS are allowing developers to directly map (extended?) memory addresses to physical addresses on the SSD. And Zen 2 is an awful lot faster than an SSD's embedded Arm processor too.
An earlier MS research paper estimated the FTL cost at around 30 microseconds. Even if modern drives have reduced that, there's still going to be a cost. My thought is that because Sony are supporting a range of third-party drives, with performance seemingly the only limiting factor, and because I expect the add-on drive to have to manage its own FTL, Sony have to plan for and potentially accommodate greater drive latencies. MS, otoh, control exactly which drive and controller can work with them.
This is still conjecture though, as nothing has been confirmed by MS. And I do expect Sony to have lower latency and lower overhead access than typical PC drives anyway. Plus, once you exceed your transfer bandwidth your latency will go to crap, and Sony certainly have an advantage there, for sure.
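Just to make the "directly map memory addresses to the SSD" idea concrete, here's the ordinary OS-level analogue using Python's standard mmap module. On a PC this still goes through the filesystem, page cache and the drive's own FTL, so it only illustrates the programming model being speculated about, not whatever MS has actually built; the pack file name and tile layout are made up.

```python
# Illustration only: memory-mapping an asset pack so texture tiles are read
# by touching an address range instead of issuing explicit read() calls.
import mmap
import os

TILE_BYTES = 64 * 1024  # 64 KiB tiles, as used by tiled resources

fd = os.open("assets/textures.pak", os.O_RDONLY)       # hypothetical pack file
length = os.fstat(fd).st_size
view = mmap.mmap(fd, length, access=mmap.ACCESS_READ)  # map the whole file read-only

def read_tile(tile_index):
    """Touch one 64 KiB tile; a page fault pulls it in from the SSD on demand."""
    start = tile_index * TILE_BYTES
    return view[start:start + TILE_BYTES]

tile = read_tile(42)  # first touch faults the pages in; later touches hit the page cache
```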
If you are talking about sampler feedback, then that is an interesting idea. I was talking about the pure latency of the I/O subsystem.
Sampler feedback first has to see a miss. Once misses happen, they can be queued to be fetched.
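As a rough sketch of that miss-then-fetch flow (all names hypothetical; sampler feedback itself is a GPU feature, this only mimics the CPU-side loop that drains the feedback and issues loads):

```python
# Toy model of the "see a miss, queue it, fetch it" loop described above.
# load_tile_from_disk() stands in for the real streaming/DirectStorage-style
# request, which is not shown here.
from collections import deque

resident_tiles = set()   # tiles currently in the GPU tile pool
fetch_queue = deque()    # misses waiting to be streamed in
pending = set()          # avoid queuing the same tile twice

def load_tile_from_disk(tile):
    pass  # placeholder for the real I/O request

def on_frame(requested_tiles):
    """requested_tiles: tile IDs the sampler feedback map says were touched this frame."""
    for tile in requested_tiles:
        if tile not in resident_tiles and tile not in pending:
            fetch_queue.append(tile)   # the miss is only known after the fact
            pending.add(tile)

def drain_fetch_queue(budget_per_frame=8):
    """Issue a bounded number of loads per frame so streaming never stalls rendering."""
    for _ in range(min(budget_per_frame, len(fetch_queue))):
        tile = fetch_queue.popleft()
        load_tile_from_disk(tile)
        resident_tiles.add(tile)
        pending.discard(tile)
```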
It seems that the DirectStorage riddle has been resolved:
We have long suspected that MS has figured out a way of memory-mapping a portion of the SSD to reduce the I/O overhead considerably. I looked for research on SSD storage from Xbox research members with no success, until I realised I was looking in the wrong place to begin with. MS Research happens to count within its ranks Anirudh Badam as Principal Research Scientist. He has a paper published by IEEE about the concept of FlashMap, which subsumes three layers of address translation into one (https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/flashmap_isca2015.pdf). The claimed performance gain is a reduction in SSD access latency of up to 54%.
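The core idea, as I read it, is collapsing the separate page-table / file-map / FTL indirections into a single combined translation. A toy illustration of that idea follows; nothing here reflects the paper's actual data structures, and the addresses are made up.

```python
# Toy illustration of "three layers of address translation subsumed into one".
# Conventional path: virtual address -> (file, offset) -> logical block -> flash page.
page_table = {0x1000: ("textures.pak", 0)}   # VA -> (file, offset), hypothetical
file_map   = {("textures.pak", 0): 9001}     # (file, offset) -> logical block address
ftl        = {9001: 0xBEEF}                  # logical block -> physical flash page

def translate_layered(va):
    file_off = page_table[va]
    lba = file_map[file_off]
    return ftl[lba]                          # three dependent lookups per access

# FlashMap-style idea: pre-combine the layers into one map, so a memory-mapped
# SSD access pays a single translation instead of three.
combined = {va: ftl[file_map[file_off]] for va, file_off in page_table.items()}

def translate_combined(va):
    return combined[va]                      # one lookup per access

assert translate_layered(0x1000) == translate_combined(0x1000) == 0xBEEF
```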
Playing devil's advocate, the issue with this approach is that it's very much an after-the-fact kind of approach. It could be that by the time the missed pages are in memory, they're not needed anymore. It will be interesting to see whether the sampler feedback miss-then-fetch approach is better than using more CPU up front to figure out what is needed and avoiding the initial miss to begin with. My favourite idea for this is to train a DNN on scene data + player movement and see if a neural network could predict what is needed well enough to fetch the data ahead of time and avoid the misses.
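Purely as a sketch of that idea (hypothetical feature layout, tile count and prefetch call, not a claim that anyone is shipping this): a small network looks at camera/player state and outputs per-tile "will be needed soon" scores, and the streamer prefetches the top candidates before sampler feedback ever reports a miss.

```python
# Speculative sketch: predict which texture tiles will be needed next from
# camera/player state, and prefetch them before a sampler-feedback miss occurs.
import torch
import torch.nn as nn

NUM_TILES = 4096   # hypothetical number of streamable tiles in the level
FEATURES  = 16     # e.g. camera position, forward vector, velocity, yaw rate...

class PrefetchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEATURES, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, NUM_TILES),   # one "needed soon" logit per tile
        )

    def forward(self, x):
        return self.net(x)

def queue_tile_load(tile):
    pass  # stand-in for the real streaming request

def prefetch_step(model, scene_features, resident_tiles, k=32):
    """Prefetch the k most likely soon-needed tiles that aren't resident yet."""
    with torch.no_grad():
        scores = torch.sigmoid(model(scene_features))   # shape (NUM_TILES,)
    ranked = torch.argsort(scores, descending=True)
    picked = [int(t) for t in ranked if int(t) not in resident_tiles][:k]
    for tile in picked:
        queue_tile_load(tile)
    return picked

# Training would label each frame's feature vector with the tiles actually
# sampled shortly afterwards (e.g. from sampler feedback logs) and use a
# binary cross-entropy loss over the per-tile outputs.
```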
What parts are smoke and mirrors?
The PR around it.
It is a collection of technologies that addresses many things in the I/O chain; it's just become their buzzword for marketing.
Blast processing 2 ;-)
Should we rename this thread RDNA 2.9?
(Or "it seems the RDNA name doesn't matter")