NVIDIA Maxwell Speculation Thread

So TechReport is using the new Beyond3D Test Suite. Is there a front page where an article explaining this wonderful technology gets linked?

I like the black/random fillrate test. Very nice. And as I've long suspected (from before Maxwell) NVidia has been doing something to make fill more efficient.
No article because it's not really for anyone but TR at this point due to being rough around the edges, but when it's finished I'll write it up.
 
Rys. Does the polygon throughput test draw any triangles or are they all culled? I suspect they're all front/back face culled or outside the view frustum.
 
Surely anyone who's serious about deep NN would be using FPGAs instead of GPUs. You're talking 2-3 years worth of performance advantage and access to arbitrary amounts of memory.

https://gigaom.com/2015/02/23/microsoft-is-building-fast-low-power-neural-networks-with-fpgas/
FPGAs have more of a role in running already-trained neural nets. It would be ungodly expensive and time-consuming to use FPGAs for training. However, if you want extensive neural nets to run on cell phones, that may be the way to go.

Neural net techniques, however, are just changing too quickly. You need the resources of the community to keep up, and that means CUDA. There are three main libraries (Torch, Caffe, and Theano), plus tons of tertiary libraries, and they all use CUDA. nVidia is investing in this technology; AMD is not.

Neural nets are the wild west of computer science. Anyone with a shoestring budget can make large contributions and even try them in brand new areas. 12 GB gives me a ton of flexibility. I am constantly running up against memory limitations when I am training a spatially sparse network. With 12 GB this card will be worthwhile for years; with 6 GB it would no longer be worthwhile after the first 980Ti is released.
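To give a rough sense of why the memory ceiling bites during training, here's a back-of-envelope sketch. The layer shapes, batch sizes and parameter count below are hypothetical, not any particular network I'm training; the point is just that activations kept around for backprop scale with batch size and eat VRAM fast:

```python
# Rough VRAM estimate for training a convnet: weights + gradients + the
# activations that have to be kept around for the backward pass.
# All shapes/counts here are illustrative, not a specific real network.

def training_mem_gb(batch, activation_shapes, n_params, bytes_per_val=4):
    acts = sum(batch * c * h * w for (c, h, w) in activation_shapes)
    weights = n_params * 2  # parameters plus their gradients
    return (acts + weights) * bytes_per_val / 1024**3

# Hypothetical VGG-ish stack: (channels, height, width) per stored activation.
shapes = [(64, 224, 224), (128, 112, 112), (256, 56, 56),
          (512, 28, 28), (512, 14, 14)]

for batch in (32, 64, 128):
    print(batch, round(training_mem_gb(batch, shapes, n_params=138_000_000), 2), "GB")
```

Double the depth or the batch and you blow straight past 6 GB, which is exactly where the 12 GB buys headroom.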

I am very happy that nVidia is supporting the deep learning community.
 
NVIDIA is being dumber and dumber again, holding back clocks to hit ridiculous power targets. The least they could have done is clock the card the same as the GTX 980!
Anandtech mentioned there is another 8-pin connector.
Nvidia may be holding back custom variants to see what AMD drops.
In return, AMD may be holding back its water-cooled variant to see what Nvidia drops.

I think you'd lose that bet, but I'm more annoyed by the fact that this 'laser trimming' lingo is still in use.
When they're done with the itty-bitty lightsabers, I'm sure we'll be the first to know.
whoaoamm whoosh bzzzt

Also, since when has the GTX 980 become a mid-range card?
I suppose it's all relative in the hype Olympics. There could be a series of performance tiers coming up, possibly more tiers than cards to fill them, so we may be in for a bit of a silly season.
 
I missed the Titan unveil stream yesterday, is there an offline version floating around on the webs somewhere?

...In case there isn't, Nvidia, why not? You don't want people to get hyped about your fking thousand dollar graphics card? :rolleyes:
 
Rys. Does the polygon throughput test draw any triangles or are they all culled? I suspect they're all front/back face culled or outside the view frustum.
Yep, 100% culled. There are other variants of that test that use different cull ratios and different vertex and index sizes that help explore throughput a bit more, which I'll ship in the next update.
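For anyone curious what a 100%-culled stream looks like in principle, here's a toy sketch (purely an illustration, not the actual B3D suite): every triangle is wound clockwise, so with counter-clockwise front faces and back-face culling enabled nothing ever reaches the rasteriser, and the draw measures setup/cull throughput. Vertex and index sizes are the knobs mentioned above.

```python
# Illustrative builder for an all-culled triangle stream (hypothetical,
# not the B3D test suite's actual implementation).
import random
import struct

def make_culled_triangles(n, seed=0):
    """n tiny clockwise-wound triangles scattered over NDC. With CCW front
    faces and back-face culling on, every one is rejected at setup."""
    rng = random.Random(seed)
    verts = []
    for _ in range(n):
        x, y = rng.uniform(-1.0, 0.9), rng.uniform(-1.0, 0.9)
        # Clockwise winding (y up) -> back-facing under CCW front-face rules.
        verts += [(x, y, 0.5), (x, y + 0.01, 0.5), (x + 0.01, y, 0.5)]
    vb = b"".join(struct.pack("3f", *v) for v in verts)  # 12 bytes per vertex
    ib = struct.pack(f"{3 * n}I", *range(3 * n))         # 4 bytes per index
    return vb, ib

vb, ib = make_culled_triangles(100_000)
print(len(vb), len(ib))  # buffer sizes you'd upload before the draw call
```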
 
Anandtech mentioned there is another 8-pin connector.
Nvidia may be holding back custom variants to see what AMD drops.

That's been on quite a lot of cards and re-used for Quadro and Tesla boards, which have their (sole) 8-pin connector out of the back of the card. IIRC Fermi was the first to use such an L-shaped power connector layout.
 
If each SM's throughput is half a vertex/prim per clock as has been said previously then 8 for a 16 SM GM204 makes sense. It's the GM200 results that surprised me. They're not better than GM204.
 
With Pascal not coming until 2016 (probably late at that), I wonder if we'll see a 16FF(+) shrink of Maxwell by the end of the year.
 
If each SM's throughput is half a vertex/prim per clock as has been said previously then 8 for a 16 SM GM204 makes sense. It's the GM200 results that surprised me. They're not better than GM204.
Oh, I missed the increase to half rate. And now even GK110 looks more like that. So yes, GM200 is underperforming; maybe the GigaThread Engine was not scaled up?
 
If each SM's throughput is half a vertex/prim per clock as has been said previously then 8 for a 16 SM GM204 makes sense. It's the GM200 results that surprised me. They're not better than GM204.
According to Hardware.fr's GM204 article it's 1/3 rate per SMM, reduced from 1/2 rate per SMX; the clusters are slimmer and scaled up in number relative to Kepler, so I assume they wanted to reduce front-end area per SM.
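Quick arithmetic on what the two readings imply per clock (SM counts are the public GM204/GM200 configurations; 1/2 vs 1/3 are just the two rates being debated here):

```python
# Peak culled-primitive setup rate per clock under each assumed per-SM rate.
for name, sms in (("GM204", 16), ("GM200", 24)):
    for rate in (1/2, 1/3):
        print(f"{name}: {sms} SMM x {rate:.3f} = {sms * rate:.2f} prims/clock")
```

So 1/2 rate gives the "8 per clock for a 16-SM GM204" figure, while at 1/3 rate it's GM200 that lands on 8 per clock.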
 
I just looked at those results in the TR review.

The hardware.fr article also has similar results. In this and their previous article GM107 is consistent with a 1/3 rate, while GM204 somewhat overperforms a 1/3 rate on culled tris. It is, however, closer to 1/3 rate than 1/2 rate, and I can only attribute the performance to GM204 hitting some rather large boost clock in what is possibly a relatively low-power scenario; GM200 probably isn't hitting similar boost levels, hence it looks like it's underperforming relative to its number of SMMs in comparison to GM204.
 
I just looked at those results in the TR review.

The hardware.fr article also has similar results. In this and their previous article GM107 is consistent with a 1/3 rate, while GM204 somewhat overperforms a 1/3 rate on culled tris. It is, however, closer to 1/3 rate than 1/2 rate, and I can only attribute the performance to GM204 hitting some rather large boost clock in what is possibly a relatively low-power scenario; GM200 probably isn't hitting similar boost levels, hence it looks like it's underperforming relative to its number of SMMs in comparison to GM204.
If the measurement is exact, though, it would correspond to a 1.5 GHz boost clock for the GTX 980. Seems a bit on the high side, though I'm not sure of the maximum possible boost. For the Titan X, however, the results would correspond to only 1 GHz; even assuming it can't reach as high a boost, that would only be roughly base clock. Maybe there are some front-end limitations somewhere?

FWIW I'm also surprised how terribly the R9 290X does with strips...

edit: actually these results have to be vertices/s not tris/s, right?
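Here's the same reasoning as a tiny calculation, run backwards from the throughput; the ~8 Gtris/s figure is an assumed, illustrative measurement rather than an exact number from the reviews, and a 1/3 prim/clock per SMM rate is assumed:

```python
# Implied clock if the culled-triangle rate were exactly `measured` Gtris/s
# at an assumed 1/3 prim per clock per SMM.

def implied_clock_ghz(measured_gtris_s, sm_count, rate_per_sm=1/3):
    return measured_gtris_s / (sm_count * rate_per_sm)

measured = 8.0  # assumed Gtris/s for both cards, purely illustrative
print("GTX 980 (16 SMM):", implied_clock_ghz(measured, 16))  # ~1.5 GHz
print("Titan X (24 SMM):", implied_clock_ghz(measured, 24))  # ~1.0 GHz
```

Which is exactly why the same absolute number looks like a big boost on GM204 but only base clock on GM200.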
 
I think Nvidia has parts with 1/3 and 1/2 prim rate per SM. GM200 is probably back to 1/3, but I think GM204 is 1/2. Nvidia probably saw this as a place they could cut area without hurting game performance.
 
So, it seems to me AMD is on a warning: performance per watt doesn't crumble when Titan X is overclocked, which prolly indicates there's another 20% performance lying in wait for 390X, e.g. overclocked AIB special versions of 980Ti running at 1400MHz with 6GB of memory.
 
With Pascal not coming until 2016 (probably late at that), I wonder if will see a 16FF(+) shrink of Maxwell for the end of the year.
GM204 seems like a prime candidate.
28nm 400mm² 256-bit ~180W -> 16FinFET ~220mm² 256-bit ~100W

Bulking GM206 up to 10 or 12 SMMs on 16FinFET would put it at ~120mm² 128-bit ~60W.
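Rough sanity check on those shrink numbers, assuming an illustrative ~0.55x area/power scaling factor for 28nm -> 16FF (the real factor depends heavily on the design, so treat this as a sketch, not data):

```python
# Back-of-envelope 28nm -> 16FF shrink of GM204, using the figures above.
scale = 0.55  # assumed area/power scaling factor, illustrative only
area_28nm, power_28nm = 400, 180  # GM204 as quoted in the post
print(f"~{area_28nm * scale:.0f} mm2, ~{power_28nm * scale:.0f} W")  # ~220 mm2, ~99 W
```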
 
So, it seems to me AMD is on a warning: performance per watt doesn't crumble when Titan X is overclocked, which prolly indicates there's another 20% performance lying in wait for 390X, e.g. overclocked AIB special versions of 980Ti running at 1400MHz with 6GB of memory.
Titan X Black Edition?

NVIDIA is certainly holding back thus far with Maxwell. They could have easily released the Maxwell parts with 15-20% higher clocks, but I think it made the best business sense to let AMD remain competitive (selling their chips nearly at cost). Sort of like how Intel does with CPUs.

Bottom line is, NVIDIA would never put itself at a performance disadvantage in the name of power efficiency. They simply no longer have to make 300W cards to beat AMD.
 