Nvidia Post-Volta (Ampere?) Rumor and Speculation Thread

Rather worried about Nvidia's bandwidth situation; GDDR6 is only hitting a max of 18 Gbps at the end of the year, and from the looks of it they've done all they reasonably can with compression.


If there's any truth to this rumor, Ampere will compensate for GDDR6 bandwidth with tensor-core accelerated VRAM compression and NVCache. Kinda similar to what the XSX does with BCPack compression and the Velocity Architecture.
 

If there's any truth to this rumor, Ampere will compensate for GDDR6 bandwidth with tensor-core accelerated VRAM compression and NVCache. Kinda similar to what the XSX does with BCPack compression and the Velocity Architecture.

Wow, there was a motherlode of information in that video if true, and it certainly came across as legitimate to me.

That said, I don't think the tensor-core accelerated VRAM compression is similar to what the XSX is doing, as the XSX decompresses the data before it goes into RAM, as far as I understand it. So that compression is just there to aid SSD bandwidth/storage; it has nothing to do with directly enhancing the size/bandwidth of video memory through compression.

If anything it's NVCache that is similar to what the XSX and PS5 are doing, in that it allows (like AMD's HBCC) GPUs to treat system memory and external storage drives (the SSD) as an extension of VRAM. The question then becomes how quick your connection is between SSD and VRAM, which is what BCPack and the Velocity Architecture are designed to address on the XSX, and what DirectStorage and Gen4 NVMe SSDs will hopefully do on the PC.
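For anyone wondering what "treating the SSD as an extension of VRAM" means in practice, here's a toy Python sketch of the general demand-paging idea behind HBCC/NVCache. The page size, capacities and LRU policy are my own assumptions purely for illustration, not anything either vendor has confirmed:

```python
# Illustrative sketch only: a toy model of HBCC/NVCache-style demand paging,
# where VRAM acts as a cache over a larger pool backed by system RAM / SSD.
# Page size, capacity and the LRU policy are assumptions for illustration.
from collections import OrderedDict

PAGE_SIZE = 64 * 1024                  # hypothetical 64 KiB GPU page
VRAM_PAGES = 8 * 1024**3 // PAGE_SIZE  # e.g. an 8 GiB card

class PagedVram:
    def __init__(self, capacity_pages=VRAM_PAGES):
        self.capacity = capacity_pages
        self.resident = OrderedDict()  # page id -> True, ordered by recency
        self.faults = 0

    def access(self, page_id):
        """Touch a page; fault it in from RAM/SSD if it is not resident."""
        if page_id in self.resident:
            self.resident.move_to_end(page_id)   # hit: refresh LRU position
            return "hit"
        self.faults += 1                         # miss: high-latency fetch
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)    # evict least recently used
        self.resident[page_id] = True
        return "miss"

vram = PagedVram(capacity_pages=4)
for page in [0, 1, 2, 3, 0, 4, 1]:
    print(page, vram.access(page))
print("faults:", vram.faults)
```

The point being that a miss here is exactly the "surprise high-latency fetch" problem people bring up later in the thread.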

Also interesting to hear him saying the XSX will be as powerful as a 2080Ti too (and that would be before accounting for console optimisations). I reached the same conclusion myself recently.

The RT comparison between Turing and RDNA2 should also be very interesting based on his comments.
 
If there's any truth to this rumor, Ampere will compensate for GDDR6 bandwidth with tensor-core accelerated VRAM compression and NVCache. Kinda similar to what the XSX does with BCPack compression and the Velocity Architecture.
That sounds weird. I'm not saying it's impossible, but I can't see how this would work. Especially because tensor cores are quite far from a memory controller.
 
Given how catastrophically wrong all the leakers were with Turing, I'll take everything with a hefty pinch of sodium chloride. Though the Ampere rumors do sound fairly sensible, I have to admit.
 
Edit: someone already pointed out how dumb the thing sounded.

Anyway, I do wonder if the 14th thing is just the expected HPC/AI stuff and if there are going to be any consumer cards at all. We know that's at least part of the announcement.
 
Edit: someone already pointed out how dumb the thing sounded.

Anyway, I do wonder if the 14th thing is just the expected HPC/AI stuff and if there are going to be any consumer cards at all. We know that's at least part of the announcement.

I'm curious to understand why this is so improbable? I'm not saying it isn't, just that I have insufficient understanding of how it works to determine for myself.

I'm talking about the memory compression btw as I understood enough of the rest to know it was all within the realms of possibility.
 
That sounds weird. I'm not saying it's impossible, but I can't see how this would work. Especially because tensor cores are quite far from a memory controller.
Yeah, that certainly sounds strange. (And power hungry.)

Perhaps it's something only meant for bandwidth reduction, e.g. creating a compressed, read-only HDR surface for the post-process pipeline. (I really don't know if tensor cores could be used for compression, or whether anything like this would be feasible.)
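Just to make my hand-waving a bit more concrete, below is the sort of thing I mean by a compressed read-only HDR surface: a toy shared-exponent quantizer in Python. Purely illustrative, and not a claim about what tensor cores or any real GPU surface format actually do:

```python
# Purely illustrative: per-block shared-exponent quantization of an HDR buffer,
# the kind of lossy, read-only compression that could in principle cut
# post-process bandwidth. Block size and bit layout are made up.
import math

BLOCK = 4  # hypothetical block size in texels

def compress_block(texels):
    """Store one shared exponent plus an 8-bit mantissa per texel."""
    peak = max(texels)
    exp = max(math.ceil(math.log2(peak)), 0) if peak > 0 else 0
    scale = 2.0 ** exp
    mantissas = [round(t / scale * 255) for t in texels]
    return exp, mantissas            # 1 byte exponent + 1 byte per texel

def decompress_block(exp, mantissas):
    scale = 2.0 ** exp
    return [m / 255 * scale for m in mantissas]

hdr_row = [0.02, 0.5, 1.7, 9.3]      # made-up HDR luminance values (floats)
exp, quantized = compress_block(hdr_row)
print("compressed:", exp, quantized)
print("reconstructed:", decompress_block(exp, quantized))
print("bytes: %d float32 -> %d packed" % (4 * len(hdr_row), 1 + len(hdr_row)))
```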
 
Can someone who has seen the video please explain in a couple of sentences what sort of tensor-based compression algorithm this guy is talking about? I don't want to give that video any additional clicks. Is there a citation or did they cook something up in their head?
 
tl;dr from the video
[Attached screenshots: nvamp-launchaojvw.png (launch), nvamp-perf98k2d.png (performance), nvamp-arc1ikf4.png (architecture)]
 
Aside from the technical stuff, the whole thing looks too predictable. There's nothing much surprising. AMD has HBCC? What if Nvidia had similar technology!

What is this guy's track record?
 
Next-gen consoles, with their explicit, developer-controlled streaming from a fast SSD, kind of make HBCC pointless. You get better and more reliable performance if developers explicitly manage IO streaming/memory instead of the driver trying to guess what is not needed and removing stuff from RAM behind the scenes. If HBCC guesses wrong, the miss is very high latency and a surprise to the engine, which thought the data was already loaded into RAM.

Edit: basically we really would like games to use a lot of streaming and DirectStorage once it's available on PC. That should allow very efficient utilization of memory and avoid loading stuff into RAM just in case (load times, wasted memory).
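To illustrate what "explicit, developer-controlled streaming" looks like compared to a driver-side cache, here's a rough Python sketch of an engine-owned, prioritized request queue with a per-frame IO budget. Every name in it is made up for the example; none of it is the DirectStorage API:

```python
# Minimal sketch of the "explicit, developer-controlled streaming" idea:
# the engine queues prioritized read requests itself instead of letting a
# driver-level cache guess. Asset names, priorities and the budget are
# invented for illustration.
import heapq

class StreamQueue:
    def __init__(self):
        self._heap = []      # (priority, sequence, asset), lower value first
        self._seq = 0

    def request(self, asset, priority):
        """Engine decides what it will need and how urgently."""
        heapq.heappush(self._heap, (priority, self._seq, asset))
        self._seq += 1

    def service(self, budget_bytes):
        """Issue reads until the per-frame IO budget is spent."""
        issued = []
        while self._heap and budget_bytes > 0:
            _, _, asset = heapq.heappop(self._heap)
            budget_bytes -= asset["bytes"]
            issued.append(asset["name"])
        return issued

q = StreamQueue()
q.request({"name": "hero_albedo_mip0", "bytes": 8_000_000}, priority=0)
q.request({"name": "distant_rock_mip3", "bytes": 500_000}, priority=5)
q.request({"name": "next_room_geometry", "bytes": 4_000_000}, priority=1)
print(q.service(budget_bytes=10_000_000))
```

Nothing here has to guess: the engine knows it is about to open the next room, so it asks for that geometry itself.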
 
Next-gen consoles, with their explicit, developer-controlled streaming from a fast SSD, kind of make HBCC pointless. You get better and more reliable performance if developers explicitly manage IO streaming/memory instead of the driver trying to guess what is not needed and removing stuff from RAM behind the scenes. If HBCC guesses wrong, the miss is very high latency and a surprise to the engine, which thought the data was already loaded into RAM.

Edit: basically we really would like games to use a lot of streaming and DirectStorage once it's available on PC. That should allow very efficient utilization of memory and avoid loading stuff into RAM just in case (load times, wasted memory).
Who's to say developers couldn't have control over how HBCC behaves, or what kind of enhancements AMD might have made since first introducing it?
 
Next-gen consoles, with their explicit, developer-controlled streaming from a fast SSD, kind of make HBCC pointless. You get better and more reliable performance if developers explicitly manage IO streaming/memory instead of the driver trying to guess what is not needed and removing stuff from RAM behind the scenes. If HBCC guesses wrong, the miss is very high latency and a surprise to the engine, which thought the data was already loaded into RAM.

Edit: basically we really would like games to use a lot of streaming and DirectStorage once it's available on PC. That should allow very efficient utilization of memory and avoid loading stuff into RAM just in case (load times, wasted memory).

But surely that's not applicable in the PC space with varying memory configurations. How can a developer plan the optimal dataset to store in VRAM vs RAM vs SSD if they don't know how much of each pool is going to be in the target system?
 
But surely that's not applicable in the PC space with varying memory configurations. How can a developer plan the optimal dataset to store in VRAM vs RAM vs SSD if they don't know how much of each pool is going to be in the target system?

Based on SSD speed and available memory, the engine would decide the mip level of textures to be streamed and cached. Lower-end systems get lower-quality textures and possibly more pop-in (less cache memory). It shouldn't be rocket science to automate this to a decent level. Console games with more hand tuning could get better quality than PC, which would be pretty funny in itself.
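Something like this toy heuristic would probably get you most of the way there. The thresholds below are completely made up, but they show how little is needed to adapt texture streaming to the installed hardware:

```python
# Rough sketch of the idea above: pick a streaming mip bias from measured SSD
# throughput and available cache memory. All thresholds are invented examples.
def pick_mip_bias(ssd_mbps, cache_mb):
    """Return 0 for full-resolution textures, higher values to drop mips."""
    bias = 0
    if ssd_mbps < 2000:   # slower than a typical Gen4 NVMe drive
        bias += 1
    if ssd_mbps < 500:    # SATA SSD / HDD territory
        bias += 1
    if cache_mb < 4096:   # little spare memory to hide pop-in
        bias += 1
    return bias

for system in [(7000, 8192), (3500, 4096), (550, 2048), (120, 2048)]:
    print(system, "-> mip bias", pick_mip_bias(*system))
```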
 
The more I think about this streaming, the more I think it would be a near-ideal thing to try to solve with machine learning. Someone is going to make a bunch of money by creating a machine-learnt streaming/caching algorithm that gets sold to some big game company. A DNN is much better than a human at this type of task when you consider the scale at which the game has to work; it will tirelessly figure out a near-optimal strategy for keeping the right things in cache at the right time.

Not an easy thing to solve, but I think it's very doable.
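Even a dumb first-order model learned from access traces would get you surprisingly far before you reach for a DNN. Here's a toy sketch of the idea, with a Markov table standing in for whatever network you'd actually train; asset names and the trace are made up:

```python
# Very loose sketch: learn from past access traces which asset tends to be
# needed next, and prefetch it. A first-order Markov table stands in for the
# DNN the post imagines; everything here is illustrative.
from collections import defaultdict, Counter

class NextAssetPredictor:
    def __init__(self):
        self.table = defaultdict(Counter)   # asset -> counts of what followed

    def train(self, trace):
        for current, following in zip(trace, trace[1:]):
            self.table[current][following] += 1

    def predict(self, current):
        """Suggest the asset to prefetch after `current`, if any was learned."""
        followers = self.table.get(current)
        return followers.most_common(1)[0][0] if followers else None

predictor = NextAssetPredictor()
predictor.train(["lobby", "corridor", "arena", "lobby", "corridor", "vault"])
print(predictor.predict("lobby"))      # -> "corridor"
print(predictor.predict("corridor"))   # -> "arena" or "vault" (tie broken by order seen)
```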
 
Who's to say developers couldn't have control over how HBCC behaves, or what kind of enhancements AMD might have made since first introducing it?
From what I remember HBCC could either work automatically or be developer controlled.
 