Nvidia Post-Volta (Ampere?) Rumor and Speculation Thread

Discussion in 'Architecture and Products' started by Geeforcer, Nov 12, 2017.

Thread Status:
Not open for further replies.
  1. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,254
    Likes Received:
    3,463
    The guy is a complete waste of time... his track record is non-existent, and he talks about so many things with no technical basis for most of his claims. Most of his videos are pure sensationalism, filled with sweeping overgeneralizations.

    I would say the best case for the XSX, WITH console optimizations, is the 2080, maybe the 2080 Super. Any game that stresses both the CPU and GPU will cut into the device's memory bandwidth due to memory contention, which will suppress the GPU's performance, capping its effective bandwidth well below the 2080's.
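
    A rough back-of-the-envelope sketch of that contention argument. The pool figure is the public Series X spec; the CPU traffic and the contention penalty are made-up assumptions purely for illustration, and the conclusion flips entirely depending on what you assume for them:

```cpp
#include <cstdio>

int main() {
    // Public Series X figure: 10 GB of GDDR6 at 560 GB/s (GPU-optimal pool).
    const double gpu_pool_bw    = 560.0;  // GB/s
    // Assumptions for illustration only:
    const double cpu_traffic    = 50.0;   // GB/s of CPU/audio/IO traffic hitting the same chips
    const double contention_tax = 2.5;    // each GB/s the CPU pulls costs ~2.5 GB/s of
                                          // GPU-usable bandwidth (made-up penalty factor)

    const double gpu_effective = gpu_pool_bw - cpu_traffic * contention_tax;
    std::printf("Effective GPU bandwidth: ~%.0f GB/s (RTX 2080: 448 GB/s)\n", gpu_effective);
    return 0;
}
```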
     
  2. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,000
    Likes Received:
    50
    This video reminded me why I stopped clicking on his videos. Wannabe analyst with an obvious vendor bias and no engineering background.
     
  3. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    453
    Likes Received:
    171
    Minimum requirements are a thing.
     
  4. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,719
    Likes Received:
    929
    Location:
    Guess...
    Yes, but say 8GB VRAM + 16GB RAM + SATA SSD (let's call that min spec) would presumably require significantly different hand optimisation to 12GB VRAM + 16GB RAM + Gen4 NVMe SSD, for example. The number of possible combinations of memory sizes and speeds is huge, so I don't see how you can hand-optimise for everything up front, whereas my understanding of HBCC is that it manages the different memory tiers automatically, like a cache.

    Edit: come to think of it, isn't that how it's supposed to work in the PS5 as well? Cerny said that the game engine didn't need to know what data was stored in which memory partition, it just called for it and the system itself managed data movement optimally. That sounds a lot like HBCC to me.
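
    To make that contrast concrete, a toy sketch (every threshold and name here is made up for illustration): the hand-optimised path has to branch on each detected configuration, while an HBCC-style setup just exposes one big virtual pool and lets the hardware demote cold pages on its own:

```cpp
#include <cstdio>

struct SystemConfig { unsigned vram_gb; unsigned ram_gb; bool fast_nvme; };

// Hand-tuned path: streaming budgets picked per configuration (illustrative values).
unsigned pick_texture_pool_gb(const SystemConfig& c) {
    if (c.vram_gb >= 12 && c.fast_nvme) return 10;  // keep almost everything resident
    if (c.vram_gb >= 8)                 return 6;   // stream mips more aggressively
    return 4;                                       // lean on system RAM / disk
}

// HBCC-style path: expose one large virtual pool regardless of physical VRAM;
// the memory controller pages cold data out to RAM/SSD, so no per-config branches.
constexpr unsigned kVirtualPoolGB = 32;

int main() {
    const SystemConfig pc{8, 16, false};
    std::printf("hand-tuned pool: %u GB, HBCC-style virtual pool: %u GB\n",
                pick_texture_pool_gb(pc), kVirtualPoolGB);
    return 0;
}
```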
     
  5. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    1,946
    Likes Received:
    815
    Location:
    Earth
    Sony's solution and HBCC are not the same. HBCC is, in essence, page swapping: unused pages are moved to disk and loaded back on a miss, possibly with some helper logic to try to avoid misses. It just works without engine integration, but when a miss happens it's very expensive, as the data is not in RAM and has to be loaded from disk and inserted into RAM before the GPU can continue.

    Sony's solution requires the developer to explicitly load content via those 6 different priority queues and, by extension, to manually evict data from RAM to make space for new data. Where Sony's cleverness comes in is that the controller decompresses the content, manages cache lines and loads the data directly to a given address without going through the CPU/OS layer, i.e. data goes straight from disk to RAM via DMA. The developer still has to manage manually what is loaded, when, and what gets discarded from RAM to make room. When data is discarded from RAM and replaced with newly streamed content, the cache lines pointing to the old data must be invalidated; that's what the cache scrubbers Sony has implemented in hardware are for.
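
    A rough sketch of that difference. None of this is an actual AMD or Sony API; the names and signatures are made up to illustrate the two models:

```cpp
#include <cstdint>
#include <cstdio>

// HBCC-style: the GPU touches a virtual address; on a miss the hardware/driver
// pulls the backing page into VRAM behind the engine's back. No engine
// integration needed, but the GPU stalls while the page-in happens.
void on_gpu_page_fault(uint64_t virtual_addr) {
    // (illustrative) locate the page in system RAM or on disk, copy it into VRAM,
    // patch the page tables, then let the stalled wavefronts resume.
    std::printf("fault at %#llx: paging in from backing store, GPU waits\n",
                static_cast<unsigned long long>(virtual_addr));
}

// Sony-style, as described above (not the real API): the developer explicitly
// queues reads at one of six priorities, the I/O block decompresses and DMAs the
// data straight to a target address, and the developer decides what to evict.
enum class Priority { P0, P1, P2, P3, P4, P5 };  // six priority levels

struct StreamRequest {
    const char* asset;      // what to load
    uint64_t    dest_addr;  // where in unified RAM it should land
    Priority    prio;       // how urgent it is
};

void submit(const StreamRequest& r) {
    // (illustrative) hand the request to the I/O complex; hardware decompresses,
    // DMAs into dest_addr, and cache scrubbers invalidate stale lines on eviction.
    std::printf("queue '%s' -> %#llx at priority %d\n",
                r.asset, static_cast<unsigned long long>(r.dest_addr),
                static_cast<int>(r.prio));
}

int main() {
    on_gpu_page_fault(0x7f0000000000ULL);
    submit({"rock_albedo.mip0", 0x20000000ULL, Priority::P1});
    return 0;
}
```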
     
    #805 manux, May 6, 2020
    Last edited: May 6, 2020
  6. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,913
    Likes Received:
    2,232
    Location:
    Germany
    Isn't that just unified memory?
     
  7. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,863
    Likes Received:
    2,793
    Location:
    Finland
    The two don't rule each other out. It's "just unified memory" if you don't care about speed and only want to address a single memory space. HBCC takes things further by bringing the SSD in too and allowing page-based addressing everywhere, which lets "anything" be loaded up quickly no matter where it lives.
     
  8. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,913
    Likes Received:
    2,232
    Location:
    Germany
    I was referring to Cerny's "the game engine didn't need to know what data was stored in which memory partition, it just called for it and the system itself managed data movement optimally."

    TBC, does Cerny include the SSD or Game-Servers in "memory partitions"?
     
  9. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,863
    Likes Received:
    2,793
    Location:
    Finland
    Servers, certainly not, but if they're using an HBCC-esque memory controller, the SSD should be included. I mean, what would be the point of emphasizing that it doesn't matter which memory something sits in? The PS4 already had a unified address space for the CPU & GPU in one unified memory, didn't it?
     
  10. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    1,946
    Likes Received:
    815
    Location:
    Earth
    Cerny didn't claim the SSD looks like RAM. What Cerny claimed is that there are 6 priority queues the developer can use to fetch data from the SSD into RAM very efficiently, i.e. the developer has to manage data/memory manually.

    To me it feels like Microsoft is taking a similar approach. The Microsoft side will become clear once they release the DirectStorage API specification.
     
  11. seahorsesaw

    Newcomer

    Joined:
    Oct 21, 2017
    Messages:
    45
    Likes Received:
    26
    pharma and PSman1700 like this.
  12. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,436
    Likes Received:
    813
    Location:
    France
    Remij, Adonisds, del42sa and 3 others like this.
  13. Konan65

    Joined:
    Sep 9, 2018
    Messages:
    6
    Likes Received:
    2
    Moore's Law is Dead update video -

    GA102 apparently has a 384-bit bus width | 5376 CUDA cores | 230W | 18 Gbps memory, and boosts to 2.2 GHz+.
    864 GB/s bandwidth (40% more than the 2080 Ti) | Overall performance 50% faster than the 2080 Ti.
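
    Those two bandwidth numbers are at least self-consistent: bandwidth is just bus width times per-pin data rate. A quick check (2080 Ti figures from its public spec):

```cpp
#include <cstdio>

// Bandwidth (GB/s) = bus width (bits) / 8 * per-pin data rate (Gbps)
double bandwidth_gbs(int bus_bits, double gbps) { return bus_bits / 8.0 * gbps; }

int main() {
    const double ga102   = bandwidth_gbs(384, 18.0);  // rumored GA102 -> 864 GB/s
    const double ti_2080 = bandwidth_gbs(352, 14.0);  // RTX 2080 Ti   -> 616 GB/s
    std::printf("GA102: %.0f GB/s, 2080 Ti: %.0f GB/s (+%.0f%%)\n",
                ga102, ti_2080, (ga102 / ti_2080 - 1.0) * 100.0);
    return 0;
}
```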
     
    pjbliverpool likes this.
  14. techuse

    Newcomer

    Joined:
    Feb 19, 2013
    Messages:
    220
    Likes Received:
    121
    Now 50% I can believe. I think he's just guessing, but 50% will probably be pretty accurate.
     
    pjbliverpool likes this.
  15. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    453
    Likes Received:
    171
    The boost clock seems pretty damned high, especially since it's supposed to be both cutting power use AND clocking over 20% higher than their previous highest boost. But the bus width and memory speed actually add up to the quoted bandwidth, so at least that's good, and the claimed performance uplift seems within reach.
    BUT... in terms of actual information, this is literally just the leak from over a month ago that you can see a few pages back, which this guy wasn't involved in at all.

    And while I still doubt the RAM specs are a good deal for the consumer, since devs will clearly, again, have finer-grained control over memory on consoles than on PC, that doesn't mean it's not what Nvidia is going to do anyway. I can sympathize slightly: matching RAM capacity to bus width against the new consoles is hard to figure out. Sure, you want at least 10GB at minimum, thanks MS, and 12GB should play it safe, but those are both fairly awkward numbers to hit with the usual bus widths. Do you go a full 16GB for your standard mid-tier 256-bit bus? But then your high-tier cards need like 20GB and 24GB or whatever, right? Seems like a path towards frustration and excessive material costs that may not see much use.
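
    For what it's worth, those capacities fall straight out of the bus width: one GDDR6 chip per 32-bit channel, with chips shipping in 8 Gb (1 GB) and 16 Gb (2 GB) densities. A quick enumeration (clamshell configurations ignored):

```cpp
#include <cstdio>

int main() {
    const int bus_widths[]    = {256, 320, 384};  // bits
    const int chip_sizes_gb[] = {1, 2};           // 8 Gb and 16 Gb GDDR6 chips

    for (int bus : bus_widths) {
        const int chips = bus / 32;               // one chip per 32-bit channel
        for (int gb : chip_sizes_gb)
            std::printf("%3d-bit bus: %2d chips x %d GB = %2d GB\n",
                        bus, chips, gb, chips * gb);
    }
    return 0;
}
```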
     
    #815 Frenetic Pony, May 12, 2020
    Last edited: May 12, 2020
  16. Kugai Calo

    Joined:
    Mar 6, 2020
    Messages:
    1
    Likes Received:
    0
    I'm curious whether they will make SIMD width match warp size, like what AMD did with RDNA.
     
  17. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,436
    Likes Received:
    813
    Location:
    France
    Sorry for the question, and it's maybe cross-topic with RDNA2, but given the latest rumors, Ampere (for gaming) is coming before RDNA 2, right?
     
  18. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,479
    Likes Received:
    219
    Location:
    msk.ru/spb.ru
    They did that back in Kepler, I think?
     
  19. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,719
    Likes Received:
    929
    Location:
    Guess...
    Some really interesting info in that video (again). It's all still sounding quite plausible to me, and he seems to have staked a lot of his credibility on this, as he's presenting a lot of very specific information as factual rather than speculation. If he's wrong about even half of it, his credibility is going to be shot.

    He also seems to mix factual leak info with his own speculation without clearly indicating which is which. That's particularly apparent in the Tensor-compressed video memory claims: previously he talked about this as essentially increasing both video memory size and bandwidth, but now we learn it actually comes with a performance penalty, so it's really only useful for giving the GPU some extra VRAM if it runs out, and it's a toggle rather than on by default. That sounds far more believable than the previous claim.

    The thing that most interests me is NVCache, and we should learn whether that's real or not at the HPC launch in a few days. That should give a good indicator of the reliability of the rest of this info.

    The claims on DLSS 3.0 were interesting too. Nvidia will override settings in some games forcing it on?? A controversial move if so....
     
  20. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,558
    Likes Received:
    600
    Location:
    New York
    Or like Nvidia did with Maxwell and Pascal?

    That would require more instruction scheduling hardware or abandoning the separate INT pipeline.
     