NVIDIA Maxwell Speculation Thread

So they shot down the guy who offered to help... I hope he keeps his job (or gets a better offer).
He spoke out of line, but I hope he keeps his job; he obviously likes nvidia and probably works hard for them. I do hope he learns to control himself because his post really didn't do nvidia any good.
 
He spoke out of line, but I hope he keeps his job; he obviously likes nvidia and probably works hard for them. I do hope he learns to control himself because his post really didn't do nvidia any good.
"He spoke out of line?" Well that is one way of putting it.
The other way of putting it is that he spoke honestly and without bias, but got beat down by marketing and higher-ups.

Your last part summarizes it nicely...
I do hope he learns to control himself because his post really didn't do nvidia any good.
We can't speak poorly about our dear leader and his vision/products. It doesn't help anyone other than those consumers who purchase our products based on our lies. /sarcasm
 
"He spoke out of line?" Well that is one way of putting it.
The other way of putting it is that he spoke honestly and without bias, but got beat down by marketing and higher-ups.

Your last part summarizes it nicely...

We can't speak poorly about our dear leader and his vision/products. It doesn't help anyone other than those consumers who purchase our products based on our lies. /sarcasm
He said there would be a driver patch, which there wasn't. The consequence is that he ended up lying too. Since he is part of nvidia, it just makes nvidia look even worse for not patching. What do you expect from there?
 
He said there would be a driver patch, which there wasn't. The consequence is that he ended up lying too. Since he is part of nvidia, it just makes nvidia look even worse for not patching. What do you expect from there?
Technically, if they were originally telling the truth that they didn't know about the memory allocation design, then a driver "fix" (loose definition) would be doable.
So, one or the other, not both.
 
Granted I haven't personally tested 980M, but we have already double-confirmed with NVIDIA that 980M has all 64 ROPs enabled, and the memory is arranged in a single segment. GTX 970 is the only GM204 SKU to use the reduced ROP configuration.

If I had to guess, they're using Nai's tool on a system with the 980M active, meaning some of the memory is already in use.

Sorry Ryan, but I highly doubt some memory occupancy can decrease the bandwidth that much, and only at the very end in that way.

Correct me if I am wrong
 
In the main 970 RAM thread on the Nvidia forum I found this:

https://forums.geforce.com/default/...tx-970-3-5gb-vram-issue/post/4444629/#4444629

ENBSeries (Boris) already posted this information in this thread several pages ago. He asked if there was an Nvidia engineer here to explain his findings. No one answered.

In his tests, the driver was 100% effective in avoiding using the slow 0.5GB, preferring to send the data to system RAM.

dblmstr1 said: I was reading some testing results on pcper http://www.pcper.com/reviews/Graphi...ory-Performance/COD-Advanced-Warfare-and-Clos and in the comment section, a person with the name ENBSeries, who is identified as Boris, claimed that Nvidia is telling us half-truths regarding the 970 having 4GB of vram. Here are his words:

"I repeat again, videocard works like any other, but have 3.5 gb only available and last 0.5 gb is actually non local video memory (which is system ram) and there is no slow video memory like NVidia said. It's lie and it can be easy prooved (i prooved by writing own tests). Just allocate blocks in vram and dump ram, search in dump the code of those "vram" blocks and you will see that last 0.5 gb is stored in ram. Is that so hard? I feel myself genious seeing noone notice obvious things."

On his facebook page https://www.facebook.com/enbfx, he continues: "Wrote another test to check how that slow 0.5 gb memory works and again it's the same thing which driver do for a long time, that memory is stored in RAM instead of VRAM, that's why it slow. Basically, this is standart behavior for the most videocards on the market (vram is physical vram + a bit ram). What it means on practice compared to another videocards? GTX 970 have 3.5 Gb of VRAM. What i see in articles with explanation from NVidia is half-lie... I don't think it's something horrible to loose 0.5 gb, but it's bad that NVidia hide such information (my own videocard with 2 gb or vram have access to 2.5 gb and nobody annonced it as 2.0 fast and 0.5 slow)..."
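
For what it's worth, the CUDA side of the test Boris describes (allocate a block of "VRAM", tag it with a recognizable value, then dump system RAM and search for that value) could be sketched roughly like this. This is a hypothetical illustration, not his actual tool; the block size and marker value are arbitrary, and the RAM dump/search is an OS-level step outside the program.

Code:
// Sketch of the "tag VRAM and look for it in system RAM" idea described above.
// Only the CUDA side is shown; dumping and searching host RAM happens outside
// this program with OS-level tools.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void tag(unsigned int *buf, size_t n, unsigned int magic)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) buf[i] = magic;             // fill the block with a recognizable value
}

int main()
{
    const size_t bytes = 128u << 20;       // 128 MB block (arbitrary choice)
    const unsigned int magic = 0xC0FFEE01; // arbitrary marker to search for later
    unsigned int *d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return 1;

    size_t n = bytes / sizeof(unsigned int);
    tag<<<(unsigned)((n + 255) / 256), 256>>>(d, n, magic);
    cudaDeviceSynchronize();

    std::printf("Tagged %zu MB with 0x%08X; now dump system RAM and search for it.\n",
                bytes >> 20, magic);
    // cudaFree(d) deliberately omitted so the allocation stays live while you dump RAM.
    std::getchar();                        // keep the process (and the allocation) alive
    return 0;
}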
 
Sorry Ryan, but I highly doubt some memory occupancy can decrease the bandwidth that much, and only at the very end in that way.

Correct me if I am wrong
16GB/sec is almost exactly what PCIe 3 x16 offers. And this is exactly how it happens; CUDA keeps allocating VRAM until it runs out, then it goes to system RAM.

For what it's worth, I've already tested this on a GTX 970 that's set to PCIe 1 x16. 23GB/sec to memory across the affected area, which is many times the 4GB/sec such a connection offers. This value does not change when switching to PCIe 3.

Furthermore if I run it in a non-headless system (to force 3 tiers), there's a clear drop off at the end from 23GB/sec to ~3.6GB/sec with PCIe 1, and 23GB/sec to ~14GB/sec with PCIe 3.

Varying memory speeds isn't very effective due to various caches, and because even 800MHz of DDR3 is 13GB/sec in dual channel mode (128-bit). It's almost impossible not to saturate the PCIe bus.
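
For reference, a back-of-the-envelope calculation of where those three ceilings come from (standard link-encoding assumptions and a 128-bit dual-channel DDR3 path; not official figures):

Code:
#include <cstdio>

int main()
{
    // PCIe x16 payload rate per direction, after link encoding:
    //   Gen1: 2.5 GT/s per lane with 8b/10b encoding
    //   Gen3: 8.0 GT/s per lane with 128b/130b encoding
    double pcie1_gbs = 2.5e9 * 16.0 * (8.0 / 10.0)    / 8.0 / 1e9; //  ~4.0 GB/s
    double pcie3_gbs = 8.0e9 * 16.0 * (128.0 / 130.0) / 8.0 / 1e9; // ~15.8 GB/s

    // Dual-channel (128-bit) DDR3 at 800 MT/s effective:
    double ddr3_gbs  = 800e6 * 128.0 / 8.0 / 1e9;                  // ~12.8 GB/s

    std::printf("PCIe 1.x x16 : %.1f GB/s\n", pcie1_gbs);
    std::printf("PCIe 3.0 x16 : %.1f GB/s\n", pcie3_gbs);
    std::printf("DDR3 dual-ch : %.1f GB/s\n", ddr3_gbs);
    return 0;
}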
 
Sorry Ryan, but I highly doubt some memory occupancy can decrease the bandwidth that much, and only at the very end in that way.

Correct me if I am wrong
You're wrong. You can show this exact same thing on a GTX 680 or GTX 980 as well. Depending on the amount of stuff already in GPU memory, at some point cudaMalloc will start returning pinned memory (system memory). On a 2GB card that benchmark will allocate 1920MB (128MB less than the total) and on a 4GB card it will allocate 3840MB (256MB less than the total). This means that once buffers start to overflow to system memory you'll start seeing PCI-Ex bandwidth. And since it's not necessary for the whole buffer to overflow, you can also get some combination of the two bandwidths.
That's basically what this Boris guy "discovered", and it has been pointed out time and again by many different people that this CUDA benchmark should be run in "headless mode" (without a display attached).
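
For anyone wanting to see this first-hand, here is a minimal sketch of a Nai-style chunked test (a hypothetical illustration, not Nai's actual code; the chunk size, repeat count and write kernel are arbitrary choices). Run it on a headless GPU, otherwise the display's own allocations will push the last chunks into system memory:

Code:
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Writes to every element so the whole chunk is actually touched.
__global__ void touch(float *p, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) p[i] = 1.0f;
}

int main()
{
    const size_t chunkBytes = 128u << 20;          // 128 MB chunks
    const size_t n = chunkBytes / sizeof(float);
    std::vector<float*> chunks;

    // Keep allocating until cudaMalloc refuses; on a non-headless card the
    // last chunks may silently end up backed by system memory.
    for (;;) {
        float *p = nullptr;
        if (cudaMalloc(&p, chunkBytes) != cudaSuccess) break;
        chunks.push_back(p);
    }
    std::printf("Allocated %zu MB in %zu chunks\n",
                (chunks.size() * chunkBytes) >> 20, chunks.size());

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Measure write bandwidth per chunk; chunks that overflowed to system RAM
    // should show roughly PCIe-level numbers instead of VRAM-level numbers.
    for (size_t c = 0; c < chunks.size(); ++c) {
        cudaEventRecord(start);
        for (int rep = 0; rep < 10; ++rep)
            touch<<<(unsigned)((n + 255) / 256), 256>>>(chunks[c], n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        double gbs = (10.0 * chunkBytes) / (ms / 1000.0) / 1e9;
        std::printf("chunk %3zu: %6.1f GB/s\n", c, gbs);
    }

    for (float *p : chunks) cudaFree(p);
    return 0;
}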
 
16GB/sec is almost exactly what PCIe 3 x16 offers. And this is exactly how it happens; CUDA keeps allocating VRAM until it runs out, then it goes to system RAM.

For what it's worth, I've already tested this on a GTX 970 that's set to PCIe 1 x16. 23GB/sec to memory across the affected area, which is many times the 4GB/sec such a connection offers. This value does not change when switching to PCIe 3.

Furthermore if I run it in a non-headless system (to force 3 tiers), there's a clear drop off at the end from 23GB/sec to ~3.6GB/sec with PCIe 1, and 23GB/sec to ~14GB/sec with PCIe 3.

Varying memory speeds isn't very effective due to various caches, and because even 800MHz of DDR3 is 13GB/sec in dual channel mode (128-bit). It's almost impossible not to saturate the PCIe bus.

Correct me if I'm wrong, but you are using that famous CUDA tool that everyone else is, while the ENB guy claims to be using something else...

What he is seeing is that, in scenarios closer to real-world usage, the 512MB is being ignored even when the 3.5GB/224-bit segment is full and system RAM is being used. That would mean the 512MB is completely useless and, for some reason, carries a bigger penalty than using system RAM, while that's not the case with that other tool, or it's simply not working properly.

Like in the previous post, I think.

edit: it looks like I didn't check the reddit post before posting, but in any case I think testing real-world usage could be more helpful, like the cases being posted on the nvidia forums.
 
This raises the question: is there something wrong with TSMC's 16nm FinFET process? According to rumors, Qualcomm, Apple and now NVIDIA have jumped ship. Same goes for AMD, though they're going with GloFo, but still to the same Samsung-developed 14nm FinFET process.
 
This raises the question: is there something wrong with TSMC's 16nm FinFET process? According to rumors, Qualcomm, Apple and now NVIDIA have jumped ship. Same goes for AMD, though they're going with GloFo, but still to the same Samsung-developed 14nm FinFET process.

Yes, with four big to huge customers, this is starting to look like a real trend. Actually, I wonder whether AMD will use Samsung or stick to GloFo only, because the latter is supposedly a little behind. The process is the same but you'd expect some latency between any development on Samsung's side and implementation on GloFo's.

Of course, if neither AMD nor NVIDIA is interested in using 14nm as soon as technically feasible, this might not matter.
 
Nvidia Newest Samsung Foundry Customer For 14nm Process

http://www.tomshardware.com/news/nvidia-samsung-14nm-apple-qualcomm,28493.html

If this is true then Nvidia will more than likely use it for the next generation of graphics GPUs and not mobile (except for a possible Denver SoC), as mobile is on a yearly release cadence and the next release is in Jan 2016.

Nvidia's next-gen GPUs were always planned to be on FinFET...same with the Parker SoC. But this is the first I'm hearing that they will move to Samsung...last I heard they were on TSMC. Will try to dig deeper.
This raises the question: is there something wrong with TSMC's 16nm FinFET process? According to rumors, Qualcomm, Apple and now NVIDIA have jumped ship. Same goes for AMD, though they're going with GloFo, but still to the same Samsung-developed 14nm FinFET process.

Apparently Samsung is a bit ahead of TSMC on FinFET (they entered mass production in December) and has also achieved a smaller feature size and lower power. These are all rumours though, so take them with a healthy grain of salt.
Yes, with four big to huge customers, this is starting to look like a real trend. Actually, I wonder whether AMD will use Samsung or stick to GloFo only, because the latter is supposedly a little behind. The process is the same but you'd expect some latency between any development on Samsung's side and implementation on GloFo's.

Of course, if neither AMD nor NVIDIA is interested in using 14nm as soon as technically feasible, this might not matter.

I'm sure both AMD and NVIDIA are keen to use 14nm as soon as technically feasible; they've basically skipped 20nm and have gone straight to FinFET.
 
A move from TSMC 16FF to Samsung 14nm means roughly 6 months just to change libraries. I need to be convinced first that NVIDIA can afford to delay Pascal and Parker by at least 6 months, as that doesn't sound like it's in their best interest right now.

On a side note, if they want to ship Parker before this year runs out, shouldn't it have taped out by now anyway?
 
A move from TSMC 16FF to Samsung 14nm means roughly 6 months just to change libraries.

They can always throw more manpower at it. Just ask Apple: apparently they're doing the A9 and A9X on Samsung 14FF and TSMC 16FF respectively. But if NV is in fact making a move, I suspect it was decided a long time back.
I need to be convinced first that NVIDIA can afford to delay Pascal and Parker by at least 6 months, as that doesn't sound like it's in their best interest right now.

Given the dominance of Maxwell, a delay to Pascal probably wouldn't hurt them all that much. But if they want to keep their mobile ambitions alive, they'd better not delay Parker.
On a side note, if they want to ship Parker before this year runs out, shouldn't it have taped out by now anyway?

I'd heard the earliest they could possibly ship was very late 2015, and even that was a maybe, so I don't expect it this year.
 