NVIDIA Maxwell Speculation Thread

Just curious, what games could you run on it which would actually benefit from 3 GB VRAM?


2GB is done for, there are already more than 10 games that exceed that amount.

Potential exceeders:
ArmA 3 (in huge maps with lots of objects)
Battlefield 4 (in long gaming sessions and with Mantle)
Thief (consumes a touch above 2GB without SSAA, more with it)
Dead Rising 3 (long game session)

Definite exceeders:
Call Of Duty Ghosts
TitanFall
DayLight
Wolfenstein The New Order
Watch_Dogs
Shadow Of Mordor
The Evil Within
Ryse
Call Of Duty Advanced Warfare

And maybe even more. Unfortunately, 3GB has become the minimum standard for 1080p and 4GB the minimum for 1440p, so I am not really sure 4GB won't be exceeded by some ambitious title down the line (we already have Shadow Of Mordor to give us the needed glimpse).

http://forum.beyond3d.com/showpost.php?p=1879063&postcount=34
 
It has only 75% of the CUDA cores and 71% of the memory B/W of GTX 980 to begin with. And base clock is 1038 MHz vs 1126 MHz of the GTX 980. So we're looking at 75% × (1038/1126) ≈ 69% of theoretical GTX 980 performance, not considering the fact that it may not boost as much due to TDP and/or cooling constraints. Hence I see ~70% being a realistic number.
Oh, you're right: for some reason I thought it has 13 SMM, but it's only 12, which brings the percentage down a few more points. I agree then that 70% sounds more realistic (which still isn't too shabby and should be in desktop GTX 770 ballpark). Though you'd think that a part with more SMMs but a lower clock would be a bit more efficient (as the clock is still quite high for a mobile part).
Though I really wonder about the TDP - nvidia could make it configurable for OEMs (within limits) in which case performance relative to GTX 980 could go down (or even up) quite a bit (not saying this would be necessarily a good idea, but most mobile chips have quite soft specs already anyway).
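The scaling estimate in the posts above can be reproduced with a quick back-of-envelope script (SMM counts and clocks as quoted in the thread; boost behavior and bandwidth limits ignored):

```python
# Rough theoretical scaling estimate for the 12-SMM mobile part vs GTX 980
# (numbers taken from the posts above; boost clocks are ignored).
gtx980_smm, mobile_smm = 16, 12            # 2048 vs 1536 CUDA cores
gtx980_base, mobile_base = 1126, 1038      # base clocks in MHz

core_ratio = mobile_smm / gtx980_smm       # 0.75
clock_ratio = mobile_base / gtx980_base    # ~0.92

print(f"theoretical ratio: {core_ratio * clock_ratio:.0%}")  # ~69%
```

Which matches the ~70% figure once you round up a little for real-world behavior.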
 
Being a half-empty glass kinda guy, this to me says that they didn't push the 980 desktop performance far enough. Given the power consumption and heat dissipation headroom advantage that a desktop GPU has over a mobile GPU, the performance gap should always be big.


That's a fair statement but we don't really know how well Maxwell scales at higher TDP.
 
Being a half-empty glass kinda guy, this to me says that they didn't push the 980 desktop performance far enough. Given the power consumption and heat dissipation headroom advantage that a desktop GPU has over a mobile GPU, the performance gap should always be big.

They may be waiting on 16FF for quite some time (I saw some powerpoint roadmap with Pascal landing in 2016), and if AMD doesn't come back with something new fairly soon I'll be really surprised. My guess is that clock/power headroom was left on the table so as to be able to respond to pressure from the competition in 2015.
 
Just curious, what games could you run on it which would actually benefit from 3 GB VRAM?
Edit - what pharma said.

Also, GTX 860M (GM107 GDDR5) is pretty fast. It's about like a 6950/7850/R9 270. Bigger textures are certainly useful with it.
 
Edit - what pharma said.

Also, GTX 860M (GM107 GDDR5) is pretty fast. It's about like a 6950/7850/R9 270. Bigger textures are certainly useful with it.

The GTX 860M is quite far away from the R9 270 (40% or so), though not that far behind the others you mentioned. Still, I don't think any of them really need more than 2GB of memory. Should it not be enough for some future title, you can always turn down some detail level, though it seems unlikely this is going to be needed for the resolution you're going to play at with this chip. I'm not really a big proponent of these asymmetric memory configurations in any case (which only nvidia does). There were actually 1GB versions of the HD 7850 which still did ok in most (but definitely no longer all) tests.
 
I'm not really a big proponent of these asymmetric memory configurations in any case (which only nvidia does).

In other words they'd need to go to 4GB to keep the bus width of the RAM pool from getting mixed. Good point.
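As an illustration of why an odd capacity on GM107's 128-bit bus forces a mixed-width pool (the per-channel chip densities here are assumed for the example, not taken from any actual 860M board):

```python
# Why 3GB on a 128-bit bus gives an asymmetric memory layout.
# Illustrative only: chip densities per channel are assumed.
# GM107 has four 32-bit memory channels.
channels = [1024, 1024, 512, 512]  # MB attached to each 32-bit channel

# Capacity striped evenly across all four channels (full 128-bit access):
symmetric = min(channels) * len(channels)
# Remaining capacity lives only on the two larger channels (64-bit access):
leftover = sum(channels) - symmetric

print(symmetric, leftover)  # 2048 1024
```

Going to 4GB (1024 MB on every channel) keeps the whole pool at full bus width, which is presumably the point being made.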
 
Resolution isn't the only consideration though. Mainstream GPUs are more than capable of working with very high resolution textures without a significant performance hit. GM107's large L2 makes it an even better fit.
 
Being a half-empty glass kinda guy, this to me says that they didn't push the 980 desktop performance far enough. Given the power consumption and heat dissipation headroom advantage that a desktop GPU has over a mobile GPU, the performance gap should always be big.

Good point and you could well be right. Though it's also possible that since they're already at pretty high clocks, they couldn't push it much further, well, not for a mainstream part at least.
I'm not sure that's true - performance as a function of TDP is fairly nonlinear.

I thought it's pretty much the opposite? For GPUs from the same IHV and with the same architecture, anyway. Look at GM107 vs GM204... perf/W is roughly the same.
Oh you're right for some reason I thought it has 13 SMM but it's only 12 which brings down the percentage some more points. I agree then 70% sounds more realistic (which still isn't too shabby and should be in desktop gtx 770 ballpark). Though you'd think that some part with some more SMM but lower clock would be a bit more efficient (as the clock is still quite high for a mobile part).

Yeah, it's still not bad at all for a mobile part... but I sense they've left some performance on the table for a refresh (possibly with the Broadwell quad cores only scheduled for Q1 next year).

That's exactly what I said in my first post ;) - http://forum.beyond3d.com/showpost.php?p=1879000&postcount=2405
Though I really wonder about the TDP - nvidia could make it configurable for OEMs (within limits) in which case performance relative to GTX 980 could go down (or even up) quite a bit (not saying this would be necessarily a good idea, but most mobile chips have quite soft specs already anyway).
AFAIK it is somewhat configurable, as the clocks for each mobile part are not fixed (neither is the memory speed, or even the type in some cases). And a lot of it would come down to the sustained boost clocks possible with the cooling solution employed, whether there is any throttling at play, etc.
Edit - what pharma said.

Also, GTX 860M (GM107 GDDR5) is pretty fast. It's about like a 6950/7850/R9 270. Bigger textures are certainly useful with it.

Mczak pretty much covered what I wanted to say. And besides, most laptops with GM107 employ 768p displays, very few have 1080p (and games which actually use more than 2GB of VRAM would probably not be playable at that resolution anyway).

Another point to note is that more RAM has a slight power penalty..so that is another consideration.
 
Clearly, NVIDIA is so incompetently lazy as to leave performance on the table for who knows what, chasing a ridiculous goal of low power consumption. The GTX 980 can reach an insane 1500MHz core speed quite easily, and people with custom boards can push even further; upping the default frequencies would have allowed for much greater performance while still retaining an excellent perf/W ratio, and playing their cards conservatively will do them no good.
 
Clearly, NVIDIA is so incompetently lazy as to leave performance on the table for who knows what, chasing a ridiculous goal of low power consumption. The GTX 980 can reach an insane 1500MHz core speed quite easily, and people with custom boards can push even further; upping the default frequencies would have allowed for much greater performance while still retaining an excellent perf/W ratio, and playing their cards conservatively will do them no good.

Any enthusiast gamer loves to overclock his components; if NVIDIA pushed frequencies to 1450/1500 MHz there would be no headroom left for OC.
 
Mczak pretty much covered what I wanted to say. And besides, most laptops with GM107 employ 768p displays, very few have 1080p (and games which actually use more than 2GB of VRAM would probably not be playable at that resolution anyway).

I have the 17.3" 1080p ASUS G750JM. ASUS has the 860M clocked beyond desktop 750 Ti specs (1230 MHz boost). It certainly can play the latest games at 1080p.
 
GM204's 32-wide SIMD

By the way I'd just like to interject and say that a 32-wide SIMD (matching the work item count of the hardware thread) is a beautiful thing :D

Anyone seen a discussion of the minimum number of hardware threads per SIMD to keep it fully occupied (excluding cases of memory latency, i.e. in pure register-bound code).

Can it issue successive instructions from the same hardware thread if the second instruction depends on the result of the first?
 
By the way I'd just like to interject and say that a 32-wide SIMD (matching the work item count of the hardware thread) is a beautiful thing :D

Anyone seen a discussion of the minimum number of hardware threads per SIMD to keep it fully occupied (excluding cases of memory latency, i.e. in pure register-bound code).

Can it issue successive instructions from the same hardware thread if the second instruction depends on the result of the first?

Maxwell is a rather interesting architecture in that it takes some of the concepts used in VLIW and translates them to a scalar architecture.

Basically, each instruction encodes a stall count, indicating how long the processor must wait before issuing the next instruction in order for correctness. Dual issue is also encoded here. This is sort of like issuing NOPs, except that it doesn't pollute the instruction cache, and there are special barriers for things like loads and stores, which have an unknown latency due to caching. One thing to keep in mind is that the processor can and will issue instructions from other threads to fill these gaps.

More on Maxwell here: https://code.google.com/p/maxas/w/list
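A toy model of the stall-count idea described above (purely illustrative: the real control-word encoding is documented on the maxas wiki, and a real SMM has four schedulers, not one):

```python
# Toy model of static stall-count scheduling: every instruction carries a
# stall count, and the scheduler fills the wait cycles with ready
# instructions from other warps. Illustrative only, not the real encoding.
from collections import deque

# (name, stall_count): cycles this warp must wait before issuing again
warp_a = deque([("A0", 2), ("A1", 0)])   # A1 depends on A0 (2-cycle stall)
warp_b = deque([("B0", 0), ("B1", 0)])   # independent instructions

warps = {"A": warp_a, "B": warp_b}
ready_at = {"A": 0, "B": 0}  # earliest cycle each warp may issue
timeline = []

cycle = 0
while any(warps.values()):
    for w, stream in warps.items():
        if stream and ready_at[w] <= cycle:
            name, stall = stream.popleft()
            timeline.append((cycle, name))
            ready_at[w] = cycle + 1 + stall
            break  # one issue per cycle in this single-scheduler toy
    cycle += 1

print(timeline)  # → [(0, 'A0'), (1, 'B0'), (2, 'B1'), (3, 'A1')]
```

Note how warp B's instructions slot into the two cycles A spends stalled, so no cycle is wasted, which is exactly the behavior the stall count enables without any dynamic dependency-checking hardware.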
 
Calling all GTX 970 / 980 owners - could you please take a BIOS dump from your card and send it to me via mail or throw a link to it via PM (also, please mention which card it belongs to, brand + model)

I'd take 'em from TPU but their database only has 4 different models' BIOS-files
 
@keldor314

Your post immediately makes me think of the Denver CPU having its own unconventional and advanced ways of scheduling things, though I don't know more than that general idea about it.
 
Anyone seen a discussion of the minimum number of hardware threads per SIMD to keep it fully occupied (excluding cases of memory latency, i.e. in pure register-bound code).

Can it issue successive instructions from the same hardware thread if the second instruction depends on the result of the first?

The minimum thread count to keep a Maxwell SMM fully occupied is 128: one warp for each of the four schedulers. Those warps would need to run with at least 6-deep ILP (since the pipeline latency for an ALU op is usually 6 cycles on Maxwell).

An instruction certainly can depend on the previous one, but there's still the pipeline latency, so if there are no extra warps to fill the scheduler gap you'll get a stall and lose throughput. The schedulers (probably!) don't execute out of order to try to fill the dependency stall from the same warp's future instruction stream. Obviously the compiler tries to interleave independent instructions to minimize the chance of stalls.
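The latency-hiding arithmetic above can be written out explicitly (numbers from the post: 4 schedulers per SMM, 32-wide warps, ~6-cycle ALU latency):

```python
# Minimum occupancy to hide ALU latency on a Maxwell SMM
# (numbers from the post above; real kernels sit somewhere in between).
schedulers_per_smm = 4
warp_width = 32
alu_latency = 6  # cycles for a typical ALU op

# Best case: one warp per scheduler suffices, but only if each warp has
# ~6 independent instructions in flight (6-deep ILP):
min_threads = schedulers_per_smm * warp_width
print(min_threads)  # 128

# Worst case (no ILP, every instruction depends on the previous one):
# each scheduler needs ~6 warps to cover the 6-cycle latency instead:
threads_no_ilp = schedulers_per_smm * alu_latency * warp_width
print(threads_no_ilp)  # 768
```

So 128 threads is the floor, and dependency-heavy code needs several times that to stay busy.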
 
@keldor314

Your post immediately makes me think of the Denver CPU having its own unconventional and advanced ways of scheduling things, though I don't know more than that general idea about it.

It really does. I actually wouldn't be surprised if internally Denver used the same internal ISA as something like Maxwell, though with additional execution pipes.
 