there should be a "lol" button next to the like button, would be useful!
I could use this button. I would rather hit the LOL button than ignore some people.
C'mon bro, no-one has time to go running in circles.
The post you responded to was about one of the potential benefits of real-time upresing, namely the impact on a texture streaming system of being able to stream smaller textures and ML upres them in real time. Basically, it was about the direct relationship between a hypothetical ML uprezzer and texture streaming requirements.
You responded "And texture streaming is an orthogonal issue altogether", leading me to think you were saying that ML upressing in real time is unrelated to a game's texture streaming requirements.
If what you meant was "streaming requirements are independent of a given drive's capabilities", then yeah. But that, ironically, is independent of the point I was trying to make...
I know this quote was directed at eastmen, but I wanted to point out that the training data could easily be uncompressed 24-bit 8K asset textures (or higher, for future compatibility), with the model tuned on a per-game, or even a per-level or per-material, basis (whatever you want). So: static images with definite states. That's a far better-behaved problem than the situation for DLSS, or MS's own ML HDR for backwards compatibility (ML is everywhere!!), both of which have to work on very dynamic frame buffers.
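To make "tuned on a per-material basis" a bit more concrete, here's a minimal sketch of what that offline training could look like. Everything here is an assumption for illustration: the tiny PyTorch upscaler, the material IDs, and the idea of faking the low-res input by downsampling the source crop. It's a sketch of the general technique, not anyone's actual pipeline.

```python
# Hypothetical sketch: one tiny 2x upscaler per material, trained offline on
# uncompressed source textures (static targets, unlike a DLSS-style frame buffer).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUpres(nn.Module):
    """2x upscaler: a few convs plus pixel-shuffle. Deliberately small."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3 * 4, 3, padding=1),  # 4 = 2x2 upscale factor
            nn.PixelShuffle(2),
        )

    def forward(self, x):
        return self.body(x)

def train_per_material(material_id, hires_crops, epochs=10):
    """hires_crops: crops from the uncompressed source textures, shape (N, 3, H, W)."""
    model = TinyUpres()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        # The runtime input would be the smaller stored mip; here we fake it
        # by downsampling the ground-truth crop.
        lowres = F.interpolate(hires_crops, scale_factor=0.5, mode="bilinear")
        opt.zero_grad()
        loss = F.l1_loss(model(lowres), hires_crops)
        loss.backward()
        opt.step()
    torch.save(model.state_dict(), f"upres_{material_id}.pt")  # one model per material
    return model
```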
By UpdateTileMappings, you mean the mapping of the virtual texture to the tile pool done by the hardware page table?
Exactly! And perhaps this is just an underestimated value. Microsoft does not accidentally promote the VRS technique; it probably has more potential.
I realize it's an oversimplification, but in principle a performance advantage would stack with an efficiency improvement.
1.18 (raw performance advantage) × 1.2 (a reasonable estimate for the VRS efficiency improvement) ≈ 1.42
Velocity architecture seems to be almost smoke and mirrors and very unclear as to what or how it works.
What parts are smoke and mirrors?
I heard that the Xbox Series X is a quantum machine and lives in a state of infinite probability. Just don't open it up or the wave function will collapse and you may not be happy with the outcome.
So it's not two stacked SoCs, it's two superpositioned SoCs!
What parts are smoke and mirrors?
M.2 SSD, 2.4 GB/s.
Hardware decompression
A newer, more modern storage API (the existing one, as we all know, is a huge ball and chain in Windows)
Have they given a Build-style presentation on it? No. But that's hardly unclear or smoke and mirrors.
How it all performs in the wild is a different matter.
It's looking like we might now know how MS are achieving the reduced overheads and reduced SSD latency they've talked about. @Ronaldo8 found a really interesting MS research paper from 2015 (perfect timing) that backs up some ideas a few of us had been kicking around. It's in the Velocity architecture thread in the tech forum. Pretty cool stuff.
Latency may (and I say 'may'!) be one of the only areas where MS's storage solution has a bit of an advantage over competitors. Though it's still going to be a lot slower than DRAM, of course.
Why do you think there could be a latency advantage? Sony's solution moves the data straight from the I/O controller to RAM via DMA. There's no way to make that lower latency, as the data moves directly without going through any host OS layers. The cache scrubbers implemented in both the GPU and the I/O controller should also help here, as coherency is achieved without the OS/CPU having to do work.
The MS research paper, which describes a technique that saves substantially on latency and overhead, wouldn't work as effectively (as far as I can tell) with a drive that has to manage its own flash translation layer. My suspicion has been for a while that MS are allowing developers to directly map (extended?) memory addresses to physical addresses on the SSD. And Zen 2 is an awful lot faster than an SSD's embedded Arm processor too.
An earlier MS research paper estimated the FTL cost at around 30 microseconds. Even if modern drives have reduced that, there's still going to be a cost. My thought is that because Sony are supporting a range of third-party drives, with performance seemingly the only limiting factor, and because I expect the add-on drive to have to manage its own FTL, Sony have to plan for and potentially accommodate greater drive latencies. MS, otoh, control exactly which drive and controller can work with them.
This is still conjecture though, as nothing has been confirmed by MS. And I do expect Sony to have lower latency and lower overhead access than typical PC drives anyway. Plus, once you exceed your transfer bandwidth your latency will go to crap, and Sony certainly have an advantage there, for sure.
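Just to make the "directly map memory addresses to the SSD" idea concrete, here's the ordinary OS-level analogue using Python's standard mmap module. On a PC this still goes through the filesystem, page cache and the drive's own FTL, so it only illustrates the programming model being speculated about, not whatever MS has actually built; the pack file name and tile layout are made up.

```python
# Illustration only: memory-mapping an asset pack so texture tiles are read
# by touching an address range instead of issuing explicit read() calls.
import mmap
import os

TILE_BYTES = 64 * 1024  # 64 KiB tiles, as used by tiled resources

fd = os.open("assets/textures.pak", os.O_RDONLY)       # hypothetical pack file
length = os.fstat(fd).st_size
view = mmap.mmap(fd, length, access=mmap.ACCESS_READ)  # map the whole file read-only

def read_tile(tile_index):
    """Touch one 64 KiB tile; a page fault pulls it in from the SSD on demand."""
    start = tile_index * TILE_BYTES
    return view[start:start + TILE_BYTES]

tile = read_tile(42)  # first touch faults the pages in; later touches hit the page cache
```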
If you are talking about sampler feedback, then that is an interesting idea. I was talking about the pure latency of the I/O subsystem.
Sampler feedback first has to see a miss. Once misses happen, they can be queued to be fetched.
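As a rough sketch of that miss-then-fetch flow (all names hypothetical; sampler feedback itself is a GPU feature, this only mimics the CPU-side loop that drains the feedback and issues loads):

```python
# Toy model of the "see a miss, queue it, fetch it" loop described above.
# load_tile_from_disk() stands in for the real streaming/DirectStorage-style
# request, which is not shown here.
from collections import deque

resident_tiles = set()   # tiles currently in the GPU tile pool
fetch_queue = deque()    # misses waiting to be streamed in
pending = set()          # avoid queuing the same tile twice

def load_tile_from_disk(tile):
    pass  # placeholder for the real I/O request

def on_frame(requested_tiles):
    """requested_tiles: tile IDs the sampler feedback map says were touched this frame."""
    for tile in requested_tiles:
        if tile not in resident_tiles and tile not in pending:
            fetch_queue.append(tile)   # the miss is only known after the fact
            pending.add(tile)

def drain_fetch_queue(budget_per_frame=8):
    """Issue a bounded number of loads per frame so streaming never stalls rendering."""
    for _ in range(min(budget_per_frame, len(fetch_queue))):
        tile = fetch_queue.popleft()
        load_tile_from_disk(tile)
        resident_tiles.add(tile)
        pending.discard(tile)
```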
It seems that the DirectStorage riddle has been resolved:
We have long suspected that MS has figured out a way of memory-mapping a portion of the SSD to reduce the I/O overhead considerably. I looked for research on SSD storage from Xbox research members with no success, until I realised I was looking in the wrong place to begin with. MS Research happens to count within its ranks Anirudh Badam as Principal Research Scientist. He has a paper published by IEEE about the concept of FlashMap, which subsumes three layers of address translation into one (https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/flashmap_isca2015.pdf). The claimed performance gain is a reduction in SSD access latency of up to 54%.
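The core idea, as I read it, is collapsing the separate page-table / file-map / FTL indirections into a single combined translation. A toy illustration of that idea follows; nothing here reflects the paper's actual data structures, and the addresses are made up.

```python
# Toy illustration of "three layers of address translation subsumed into one".
# Conventional path: virtual address -> (file, offset) -> logical block -> flash page.
page_table = {0x1000: ("textures.pak", 0)}   # VA -> (file, offset), hypothetical
file_map   = {("textures.pak", 0): 9001}     # (file, offset) -> logical block address
ftl        = {9001: 0xBEEF}                  # logical block -> physical flash page

def translate_layered(va):
    file_off = page_table[va]
    lba = file_map[file_off]
    return ftl[lba]                          # three dependent lookups per access

# FlashMap-style idea: pre-combine the layers into one map, so a memory-mapped
# SSD access pays a single translation instead of three.
combined = {va: ftl[file_map[file_off]] for va, file_off in page_table.items()}

def translate_combined(va):
    return combined[va]                      # one lookup per access

assert translate_layered(0x1000) == translate_combined(0x1000) == 0xBEEF
```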
Playing devil's advocate, the issue with this approach is that it's very much an after-the-fact kind of approach. It could be that by the time the missed pages are in memory, they're not needed anymore. It will be interesting to see whether the sampler feedback miss-then-fetch approach is better than using more CPU up front to figure out what is needed and avoiding the initial miss to begin with. My favourite idea for this is to train a DNN on scene data + player movement and see if a neural network could predict what is needed well enough to fetch the data ahead of time and avoid the misses.
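Purely as a sketch of that idea (hypothetical feature layout, tile count and prefetch call, not a claim that anyone is shipping this): a small network looks at camera/player state and outputs per-tile "will be needed soon" scores, and the streamer prefetches the top candidates before sampler feedback ever reports a miss.

```python
# Speculative sketch: predict which texture tiles will be needed next from
# camera/player state, and prefetch them before a sampler-feedback miss occurs.
import torch
import torch.nn as nn

NUM_TILES = 4096   # hypothetical number of streamable tiles in the level
FEATURES  = 16     # e.g. camera position, forward vector, velocity, yaw rate...

class PrefetchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEATURES, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, NUM_TILES),   # one "needed soon" logit per tile
        )

    def forward(self, x):
        return self.net(x)

def queue_tile_load(tile):
    pass  # stand-in for the real streaming request

def prefetch_step(model, scene_features, resident_tiles, k=32):
    """Prefetch the k most likely soon-needed tiles that aren't resident yet."""
    with torch.no_grad():
        scores = torch.sigmoid(model(scene_features))   # shape (NUM_TILES,)
    ranked = torch.argsort(scores, descending=True)
    picked = [int(t) for t in ranked if int(t) not in resident_tiles][:k]
    for tile in picked:
        queue_tile_load(tile)
    return picked

# Training would label each frame's feature vector with the tiles actually
# sampled shortly afterwards (e.g. from sampler feedback logs) and use a
# binary cross-entropy loss over the per-tile outputs.
```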
What parts are smoke and mirrors?
The PR around it.
It is a collection of technologies that addresses many things in the I/O chain; it's just become their buzzword for marketing.
Blast processing 2 ;-)
Should we rename this thread RDNA 2.9?
(Or "it seems the RDNA name doesn't matter")