NVIDIA Maxwell Speculation Thread

3dilettante · Sep 8, 2014

Grall said:
Mayhaps. You don't need to buy a ludicrously expensive xeon to get virtualization though.

The thing about virtualization is that it's implemented in hardware that is non-negotiable as far as x86 CPU cores go and it's not a significant increase in hardware cost. It can be fused on or off irrespective of elements like the die, IO, and other physical parameters.

There are use cases for low-end virtualization that some buyers will pay money for, and it's also a prerequisite for many workloads for which buyers will pay massive amounts of money.
As such, there's some income on a higher-volume segment and income from very lucrative ones as well.

For a DP-capable GPU, there isn't a good dividing line. A high-throughput DP device will need high bandwidth, but so does a high-performance gaming GPU.
Their die sizes are going to be large no matter what, a GPU can hit its TDP with SP and DP, and various extras don't significantly change the fact that there are going to be two large dies with the engineering and manufacturing costs that go into each distinct ASIC.

A more economically established high-end niche might change that, as Nvidia's high-end Tesla chips seem to indicate, but AMD's not holding the high ground there.
Even then, the compute market is very focused on cost/performance, which is something more measurable than cost/virtualization (this tends to be more binary). On top of that, GPU compute is itself undercut by CPU products that frequently get better utilization in various workloads and which still have a vastly superior software situation.

Even when CPUs lose, they can fall back to very lucrative markets where they still win.
A DP-specific GPU that loses can go nowhere if the hardware isn't also similarly at the top of its class in SP. However, if that's the case you can't charge more for the DP hardware if it's always enabled, negating having it at all unless you jack up the prices on SP hardware.

3dcgi · Sep 8, 2014

Grall said:
Eh, I should pay $3000+ for a firepro card why the fuck for exactly? It's the same god damn silicon as in the regular radeons. Same with nvidia's overpriced "higher end" junk, by the way.

No GPU would have high DP rates at this point if there wasn't a market to pay for it. 1/4 or 1/2 rate DP adds significant cost to a GPU.

rpg.314 · Sep 9, 2014

3dcgi said:
It will probably happen after tessellation, but it's possible to perform some HSR prior to tessellation. If there's displacement mapping you probably need hints from software though. Or just let software do this level of HSR for you.

IMG doesn't do HSR before tessellation.

kalelovil · Sep 9, 2014

Grall said:
Yes, I know. However, say, virtualization in a CPU for example is not comparable to a GPU's DP performance. But yeah, deliberately gimping hardware just to charge more for the ungimped version is crap, no matter who's doing it (and intel has even experimented with paid CPU "ungimp DLC", making them the absolute worst of the worst by quite a degree really.)

As a side-note, I know of one (modern) instance where CPU floating point throughput was 'gimped' to provide market stratification for a single die.
AMD's 'Caspian' mobile CPU.
http://techreport.com/news/17567/amd-intros-new-notebook-platform-with-45nm-cpus
It can and has been done.

3dcgi · Sep 9, 2014

rpg.314 said:
IMG doesn't do HSR before tessellation.

I never said they do though I'm not sure how you would know considering IMG hasn't discussed their tessellation implementation. I was speaking of a hypothetical tiling architecture and what's possible. This has gotten off topic though so we should leave this tangent.

sebbbi · Sep 9, 2014

3dcgi said:
I never said they do though I'm not sure how you would know considering IMG hasn't discussed their tessellation implementation. I was speaking of a hypothetical tiling architecture and what's possible. This has gotten off topic though so we should leave this tangent.

Just wanted to say that HSR before tessellation is very much doable (and an already used technique in some rendering engines). However, it would be pretty hard for the GPU, since it needs to know the maximum distance the vertices can move. To allow GPUs to do this, a feature similar to DX11 "conservative depth output" could be introduced to give the GPU the guarantees it needs (to use the hi-z / tiling buffer to cull patches).
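A rough sketch of the culling idea described here, not any vendor's implementation. It assumes the application supplies a hypothetical `max_displacement` bound, that the undisplaced patch stays inside the bounding box of its control points, that the same bound applies on all three screen-space axes, and that larger depth means farther away. The names `patch_is_occluded`, `hiz`, and `tile_size` are made up for illustration.

```python
import math

def patch_is_occluded(control_points, max_displacement, hiz, tile_size):
    """Conservatively test a patch against a hi-z buffer before tessellation.

    control_points: list of (x, y, depth) in screen space.
    hiz[ty][tx]: farthest depth already drawn in that tile (assumed layout).
    Returns True only if no displaced vertex could possibly be visible.
    """
    xs = [p[0] for p in control_points]
    ys = [p[1] for p in control_points]
    zs = [p[2] for p in control_points]

    # Grow the bounds by the guaranteed maximum vertex movement.
    x0, x1 = min(xs) - max_displacement, max(xs) + max_displacement
    y0, y1 = min(ys) - max_displacement, max(ys) + max_displacement
    nearest = min(zs) - max_displacement

    rows, cols = len(hiz), len(hiz[0])
    tx0 = max(0, math.floor(x0 / tile_size))
    tx1 = min(cols - 1, math.floor(x1 / tile_size))
    ty0 = max(0, math.floor(y0 / tile_size))
    ty1 = min(rows - 1, math.floor(y1 / tile_size))
    if tx0 > tx1 or ty0 > ty1:
        return True  # entirely off-screen, nothing to draw

    for ty in range(ty0, ty1 + 1):
        for tx in range(tx0, tx1 + 1):
            if nearest <= hiz[ty][tx]:
                return False  # patch might poke in front of this tile's occluder
    return True

# Hypothetical 2x2 hi-z buffer with 8-pixel tiles and one flat patch at depth 0.8.
hiz = [[0.5, 0.5], [0.5, 0.5]]
patch = [(2, 2, 0.8), (6, 2, 0.8), (2, 6, 0.8), (6, 6, 0.8)]
culled_small = patch_is_occluded(patch, 0.1, hiz, 8)
culled_large = patch_is_occluded(patch, 0.4, hiz, 8)
```

With a small bound the patch stays behind the occluder and can be skipped; with a large bound the test has to give up and let the patch through, which is why the guarantee from software matters.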

Ailuros · Sep 9, 2014

rpg.314 said:
IMG doesn't do HSR before tessellation.

http://worldwide.espacenet.com/publ...T=D&ND=3&date=20140626&DB=EPODOC&locale=en_EP

I'm awful at understanding patents, but from my rather poor understanding it doesn't sound like it's definite even in that patent.

A1xLLcqAgt0qc2RyMz0y · Sep 9, 2014

Ailuros said:
http://worldwide.espacenet.com/publ...T=D&ND=3&date=20140626&DB=EPODOC&locale=en_EP

A more readable link: http://www.google.com/patents/US20140176544

Xmas · Sep 9, 2014

sebbbi said:
Just wanted to say that HSR before tessellation is very much doable (and an already used technique in some rendering engines). However, it would be pretty hard for the GPU, since it needs to know the maximum distance the vertices can move. To allow GPUs to do this, a feature similar to DX11 "conservative depth output" could be introduced to give the GPU the guarantees it needs (to use the hi-z / tiling buffer to cull patches).

http://www.khronos.org/registry/gles/extensions/EXT/EXT_primitive_bounding_box.txt

rpg.314 · Sep 10, 2014

3dcgi said:
I never said they do though I'm not sure how you would know considering IMG hasn't discussed their tessellation implementation. I was speaking of a hypothetical tiling architecture and what's possible. This has gotten off topic though so we should leave this tangent.

They have filed patents for it though.

xDxD · Sep 10, 2014

tieba.baidu.com/p/3287030270

fellix · Sep 10, 2014

That's more likely, but the TMU count is still erroneous.

xDxD · Sep 10, 2014

fellix said:
That's more likely, but the TMU count is still erroneous.

Perhaps it's GPU-Z that's wrong?

AnarchX · Sep 10, 2014

That's the old GPU-Z 0.7.7; it calculates this GPU as Kepler 1xx: 138 TMUs = 1664 / 192 * 16.
He should have used 0.7.9 with GM204 support.
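To spell out that arithmetic, here is a small sketch. The Kepler SMX figures (192 cores, 16 TMUs) are the ones in the formula above; the Maxwell SMM figures (128 cores, 8 TMUs) are taken from GM107 and are only assumed to carry over to GM204. That the tool truncates the fractional result is also an assumption, and `tmu_count` is a made-up helper name.

```python
# Per-SM layouts. The Maxwell entry is GM107's SMM, assumed unchanged for GM204.
KEPLER_SMX = {"cores": 192, "tmus": 16}
MAXWELL_SMM = {"cores": 128, "tmus": 8}

def tmu_count(total_cores, sm):
    """Derive a TMU count from the core count, truncating any fraction."""
    return int(total_cores / sm["cores"] * sm["tmus"])

kepler_guess = tmu_count(1664, KEPLER_SMX)    # 8.67 "SMX" worth of cores
maxwell_guess = tmu_count(1664, MAXWELL_SMM)  # exactly 13 SMM

print(kepler_guess, maxwell_guess)
```

Under those assumptions the Kepler formula reproduces the odd 138 figure, while the Maxwell layout gives a round 13 SMM and 104 TMUs.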

trinibwoy · Sep 10, 2014

So 16 CUs total with 3 disabled on the 970? Any guesses on die size? I figure <= 350mm^2.

xDxD · Sep 10, 2014

trinibwoy said:
So 16 CUs total with 3 disabled on the 970? Any guesses on die size? I figure <= 350mm^2.

Isn't that too big a difference between the 970 and 980? What if that card is a 960 (or the 980 has 15 CUs)?

AnarchX · Sep 10, 2014

trinibwoy said:
So 16 CUs total with 3 disabled on the 970? Any guesses on die size? I figure <= 350mm^2.

You missed the GM204 PCB leak? http://www.techpowerup.com/202714/is-this-the-first-picture-of-geforce-gtx-880.html
This is more like ~400mm².

xDxD said:
Isn't that too big a difference between the 970 and 980? What if that card is a 960 (or the 980 has 15 CUs)?

The driver they run these cards says "GTX 970" and there are also shop listings of 970 and 980.
GM206 will probably be end of 2014 / early 2015; 35x35mm package chips first shipped at Zauba in August, so probably 3 months to go.

Here is an N16E-GT (GTX 970M?), which uses only 10 CUs/SMMs: http://compubench.com/device.jsp?benchmark=compu20&os=Windows&api=cl&D=NVIDIA+N16E-GT&testgroup=info
Maybe today there are some other yield strategies, see Tonga at R9 285...

trinibwoy · Sep 10, 2014

xDxD said:
Isn't that too big a difference between the 970 and 980? What if that card is a 960 (or the 980 has 15 CUs)?

Depends on the clocks I guess. At similar clocks it's about a 20% deficit. The 7950 at launch was ~25% behind the 7970. 670 was ~20% behind the 680. So even at slightly lower clocks a 13 CU 970 falls in that range.
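As a back-of-the-envelope check on that figure, the sketch below computes the theoretical shader-throughput gap from the unit counts alone. It is not a performance prediction; the 0.95 clock ratio is a purely hypothetical value for "slightly lower clocks", and `deficit` is a made-up helper.

```python
def deficit(enabled_units, full_units, clock_ratio=1.0):
    """Fraction of theoretical throughput lost versus the full chip."""
    return 1.0 - (enabled_units / full_units) * clock_ratio

same_clock = deficit(13, 16)         # 13 of 16 CUs at identical clocks
lower_clock = deficit(13, 16, 0.95)  # hypothetical 5% lower clock

print(f"{same_clock:.1%} at equal clocks, {lower_clock:.1%} at 95% clocks")
```

At equal clocks the gap works out to 18.75%, and a modest clock reduction keeps it inside the 20-25% range quoted for the 7950 and 670.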

xDxD · Sep 10, 2014

AnarchX said:
You missed the GM204 PCB leak? http://www.techpowerup.com/202714/is-this-the-first-picture-of-geforce-gtx-880.html
This is more like ~400mm².

The driver they run these cards says "GTX 970" and there are also shop listings of 970 and 980.
GM206 will probably be end of 2014 / early 2015; 35x35mm package chips first shipped at Zauba in August, so probably 3 months to go.

Here is an N16E-GT (GTX 970M?), which uses only 10 CUs/SMMs: http://compubench.com/device.jsp?benchmark=compu20&os=Windows&api=cl&D=NVIDIA+N16E-GT&testgroup=info
Maybe today there are some other yield strategies, see Tonga at R9 285...

trinibwoy said:
Depends on the clocks I guess. At similar clocks it's about a 20% deficit. The 7950 at launch was ~25% behind the 7970. 670 was ~20% behind the 680. So even at slightly lower clocks a 13 CU 970 falls in that range.

Interesting, thank you

tviceman · Sep 10, 2014

Wow, only 15 SMMs? I thought for sure it'd have 20.
