NVIDIA Maxwell Speculation Thread

So TechReport is using the new Beyond3D Test Suite. Is there a front page where an article explaining this wonderful technology gets linked?

I like the black/random fillrate test. Very nice. And as I've long suspected (from before Maxwell) NVidia has been doing something to make fill more efficient.
No article because it's not really for anyone but TR at this point due to being rough around the edges, but when it's finished I'll write it up.
 
Rys. Does the polygon throughput test draw any triangles or are they all culled? I suspect they're all front/back face culled or outside the view frustum.
 
Surely anyone who's serious about deep NN would be using FPGAs instead of GPUs. You're talking 2-3 years worth of performance advantage and access to arbitrary amounts of memory.

https://gigaom.com/2015/02/23/microsoft-is-building-fast-low-power-neural-networks-with-fpgas/
FPGAs have more of a role in running already-trained neural nets. It would be ungodly expensive and time-consuming to use FPGAs for training. However, if you want extensive neural nets to run on cell phones, that may be the way to go.

Neural net techniques, however, are just changing too quickly. You need the resources of the community to keep up, and that means CUDA. There are three main libraries (Torch, Caffe, and Theano), plus tons of tertiary libraries, and they all use CUDA. nVidia is investing in this technology; AMD is not.

Neural nets are the wild west of computer science. Anyone with a shoestring budget can make large contributions and even try them in brand new areas. 12 GB gives me a ton of flexibility. I am constantly running up against memory limitations when I am training a spatially sparse network. With 12 GB this card will be worthwhile for years; with 6 GB it would no longer be worthwhile after the first 980Ti is released.
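To give a rough sense of why the memory ceiling bites during training, here's a back-of-envelope sketch. The layer shapes, batch sizes and parameter count below are hypothetical, not any particular network I'm training; the point is just that activations kept around for backprop scale with batch size and eat VRAM fast:

```python
# Rough VRAM estimate for training a convnet: weights + gradients + the
# activations that have to be kept around for the backward pass.
# All shapes/counts here are illustrative, not a specific real network.

def training_mem_gb(batch, activation_shapes, n_params, bytes_per_val=4):
    acts = sum(batch * c * h * w for (c, h, w) in activation_shapes)
    weights = n_params * 2  # parameters plus their gradients
    return (acts + weights) * bytes_per_val / 1024**3

# Hypothetical VGG-ish stack: (channels, height, width) per stored activation.
shapes = [(64, 224, 224), (128, 112, 112), (256, 56, 56),
          (512, 28, 28), (512, 14, 14)]

for batch in (32, 64, 128):
    print(batch, round(training_mem_gb(batch, shapes, n_params=138_000_000), 2), "GB")
```

Double the depth or the batch and you blow straight past 6 GB, which is exactly where the 12 GB buys headroom.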

I am very happy that nVidia is supporting the deep learning community.
 
NVIDIA is being dumber and dumber again, holding back clocks to hit ridiculous power targets. The least they could have done is clock the card the same as the GTX 980!
Anandtech mentioned there is another 8-pin connector.
Nvidia may be holding back custom variants to see what AMD drops.
In return, AMD may be holding back its water-cooled variant to see what Nvidia drops.

I think you'd lose that bet, but I'm more annoyed by the fact that this 'laser trimming' lingo is still in use.
When they're done with the itty-bitty lightsabers, I'm sure we'll be the first to know.
whoaoamm whoosh bzzzt

Also, since when has the GTX 980 become a mid-range card?
I suppose it's all relative in the hype Olympics. There could be a series of performance tiers coming up, possibly more tiers than cards to fill them, so we may be in for a bit of a silly season.
 
I missed the Titan unveil stream yesterday, is there an offline version floating around on the webs somewhere?

...In case there isn't, Nvidia, why not? You don't want people to get hyped about your fking thousand dollar graphics card? :rolleyes:
 
Rys. Does the polygon throughput test draw any triangles or are they all culled? I suspect they're all front/back face culled or outside the view frustum.
Yep, 100% culled. There are other variants of that test that use different cull ratios and different vertex and index sizes that help explore throughput a bit more, which I'll ship in the next update.
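For anyone curious what a 100%-culled stream looks like in principle, here's a toy sketch (purely an illustration, not the actual B3D suite): every triangle is wound clockwise, so with counter-clockwise front faces and back-face culling enabled nothing ever reaches the rasteriser, and the draw measures setup/cull throughput. Vertex and index sizes are the knobs mentioned above.

```python
# Illustrative builder for an all-culled triangle stream (hypothetical,
# not the B3D test suite's actual implementation).
import random
import struct

def make_culled_triangles(n, seed=0):
    """n tiny clockwise-wound triangles scattered over NDC. With CCW front
    faces and back-face culling on, every one is rejected at setup."""
    rng = random.Random(seed)
    verts = []
    for _ in range(n):
        x, y = rng.uniform(-1.0, 0.9), rng.uniform(-1.0, 0.9)
        # Clockwise winding (y up) -> back-facing under CCW front-face rules.
        verts += [(x, y, 0.5), (x, y + 0.01, 0.5), (x + 0.01, y, 0.5)]
    vb = b"".join(struct.pack("3f", *v) for v in verts)  # 12 bytes per vertex
    ib = struct.pack(f"{3 * n}I", *range(3 * n))         # 4 bytes per index
    return vb, ib

vb, ib = make_culled_triangles(100_000)
print(len(vb), len(ib))  # buffer sizes you'd upload before the draw call
```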
 
Anandtech mentioned there is another 8-pin connector.
Nvidia may be holding back custom variants to see what AMD drops.

That's been on quite a lot of cards and re-used for Quadro and Tesla boards, which have their (sole) 8-pin connector out of the back of the card. IIRC Fermi was the first to use such an L-shaped power connector layout.
 
If each SM's throughput is half a vertex/prim per clock as has been said previously then 8 for a 16 SM GM204 makes sense. It's the GM200 results that surprised me. They're not better than GM204.
 
With Pascal not coming until 2016 (probably late at that), I wonder if we'll see a 16FF(+) shrink of Maxwell by the end of the year.
 
If each SM's throughput is half a vertex/prim per clock as has been said previously then 8 for a 16 SM GM204 makes sense. It's the GM200 results that surprised me. They're not better than GM204.
Oh, I missed the increase to half rate. And now even GK110 looks more like that. So yes, GM200 is underperforming; maybe the GigaThread Engine was not scaled up?
 
If each SM's throughput is half a vertex/prim per clock as has been said previously then 8 for a 16 SM GM204 makes sense. It's the GM200 results that surprised me. They're not better than GM204.
According to Hardware.fr's GM204 article it's 1/3 rate per SMM, reduced from 1/2 rate per SMX; the clusters are slimmer and scaled up in number relative to Kepler, so I assume they wanted to reduce front-end area per SM.
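Quick arithmetic on what the two readings imply per clock (SM counts are the public GM204/GM200 configurations; 1/2 vs 1/3 are just the two rates being debated here):

```python
# Peak culled-primitive setup rate per clock under each assumed per-SM rate.
for name, sms in (("GM204", 16), ("GM200", 24)):
    for rate in (1/2, 1/3):
        print(f"{name}: {sms} SMM x {rate:.3f} = {sms * rate:.2f} prims/clock")
```

So 1/2 rate gives the "8 per clock for a 16-SM GM204" figure, while at 1/3 rate it's GM200 that lands on 8 per clock.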
 
I just looked at those results in the TR review.

The hardware.fr article also has similar results. In this and their previous article GM107 is consistent with a 1/3 rate, while GM204 somewhat overperforms a 1/3 rate on culled tris. It is, however, closer to 1/3 rate than 1/2 rate, and I can only attribute the performance to GM204 hitting some rather large boost clock in what is possibly a relatively low-power scenario; GM200 probably isn't hitting similar boost levels, hence it looks like it's underperforming relative to its number of SMMs in comparison to GM204.
 
I just looked at those results in the TR review.

The hardware.fr article also has similar results. In this and their previous article GM107 is consistent with a 1/3 rate, while GM204 somewhat overperforms a 1/3 rate on culled tris. It is, however, closer to 1/3 rate than 1/2 rate, and I can only attribute the performance to GM204 hitting some rather large boost clock in what is possibly a relatively low-power scenario; GM200 probably isn't hitting similar boost levels, hence it looks like it's underperforming relative to its number of SMMs in comparison to GM204.
If the measurement is exact, though, it would correspond to a 1.5 GHz boost clock for the GTX 980. Seems a bit on the high side, though I'm not sure of the maximum possible boost. For the Titan X, however, the results would correspond to only 1 GHz; even assuming it can't reach as high a boost, that would only be roughly base clock. Maybe there are some front-end limitations somewhere?

FWIW I'm also surprised how terribly the R9 290X does with strips...

edit: actually these results have to be vertices/s not tris/s, right?
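Here's the same reasoning as a tiny calculation, run backwards from the throughput; the ~8 Gtris/s figure is an assumed, illustrative measurement rather than an exact number from the reviews, and a 1/3 prim/clock per SMM rate is assumed:

```python
# Implied clock if the culled-triangle rate were exactly `measured` Gtris/s
# at an assumed 1/3 prim per clock per SMM.

def implied_clock_ghz(measured_gtris_s, sm_count, rate_per_sm=1/3):
    return measured_gtris_s / (sm_count * rate_per_sm)

measured = 8.0  # assumed Gtris/s for both cards, purely illustrative
print("GTX 980 (16 SMM):", implied_clock_ghz(measured, 16))  # ~1.5 GHz
print("Titan X (24 SMM):", implied_clock_ghz(measured, 24))  # ~1.0 GHz
```

Which is exactly why the same absolute number looks like a big boost on GM204 but only base clock on GM200.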
 
I think Nvidia has parts with 1/3 and 1/2 prim rate per SM. GM200 is probably back to 1/3, but I think GM204 is 1/2. Nvidia probably saw this as a place they could cut area without hurting game performance.
 
So, it seems to me AMD is on a warning: performance per watt doesn't crumble when Titan X is overclocked, which prolly indicates there's another 20% performance lying in wait for 390X, e.g. overclocked AIB special versions of 980Ti running at 1400MHz with 6GB of memory.
 
With Pascal not coming until 2016 (probably late at that), I wonder if will see a 16FF(+) shrink of Maxwell for the end of the year.
GM204 seems like a prime candidate.
28nm 400mm² 256-bit ~180W -> 16FinFET ~220mm² 256-bit ~100W

Bulking GM206 up to 10 or 12 SMMs on 16FinFET would put it at ~120mm² 128-bit ~60W.
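Rough sanity check on those shrink numbers, assuming an illustrative ~0.55x area/power scaling factor for 28nm -> 16FF (the real factor depends heavily on the design, so treat this as a sketch, not data):

```python
# Back-of-envelope 28nm -> 16FF shrink of GM204, using the figures above.
scale = 0.55  # assumed area/power scaling factor, illustrative only
area_28nm, power_28nm = 400, 180  # GM204 as quoted in the post
print(f"~{area_28nm * scale:.0f} mm2, ~{power_28nm * scale:.0f} W")  # ~220 mm2, ~99 W
```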
 
So, it seems to me AMD is on a warning: performance per watt doesn't crumble when Titan X is overclocked, which prolly indicates there's another 20% performance lying in wait for 390X, e.g. overclocked AIB special versions of 980Ti running at 1400MHz with 6GB of memory.
Titan X Black Edition?

NVIDIA is certainly holding back thus far with Maxwell. They could have easily released the Maxwell parts with 15-20% higher clocks, but I think it made the best business sense to let AMD remain competitive (selling their chips nearly at cost). Sort of like how Intel does with CPUs.

Bottom line is, NVIDIA would never put itself at a performance disadvantage in the name of power efficiency. They simply no longer have to make 300W cards to beat AMD.
 