NVIDIA Kepler speculation thread

Well, somebody once claimed that it's fundamentally impossible to have 4x distributed geometry processing without massive latency and gargantuan power consumption. That same someone then inexplicably tried to prove it by pointing at, wait for it: a cell characterization problem that was fixed with a metal patch. (WTF?) Then he murmured something about crossbars too, which is funny, because I didn't expect a crossbar that serves memory to have much to do with distributing geometry at the front end. Right? It even made him speculate that GK104 would only have 2x geometry, because that's what sensible people do.

It all makes me wonder: if Nvidia had used a 2x configuration instead of 4x, how much lower do you think GK104's power consumption could have been?

What a missed opportunity...

Edit: I forgot the best one: the distributed geometry architecture is responsible for increased power consumption during compute. You can't make this up...

My favorite was going from a statement that Fermi would be terrible at tessellation, to claiming Fermi had *too much* tessellation, when it turned out that Fermi was actually quite good at tessellation.
 
I think Charlie was the one who started the nonsense about Fermi's tessellation abilities.

Unfortunately most of that tessellation power is used to tessellate square Jersey barriers in Crysis 2.

Anyway, supposedly they buffed it up in Kepler by a factor of two, so pulling ahead in Heaven is not too surprising.
 
Latency? Crossbar interconnects are the first choice for low-latency communication between a moderately large number of clients.
Yup.

The problem with complex cross-bar interconnects is the accumulation of hotspots due to signal crossings.
Yup, that too. But I'm used to crossbars that are WAY bigger than whatever they use in GPUs. Look at it this way: in a GPU, how much area do you think a crossbar with just 4 units (because that's what we're talking about in this case) takes compared to the rest of the die?

In GF100 the distributed nature of both geometry processing (16x) and primitive setup (4x) called for a very dense wiring mesh. JHH said that this aspect of the architecture was the main reason for the product delays and metal re-spins.
All he said there was that the first spin came back dead because TSMC's process parameters didn't match real life. For a standard-cell-based design, that amounts to a characterization issue. It does not point to a fundamental architecture problem.

The other obstacle was the large transistor leakage variance.
Yes. But that has nothing to do with fundamental architecture issues: a fundamental architecture problem is not something you can fix with even a limited base-layer spin, IMO.
 
Or you might simply ask TechPowerUp's w1zzard if GPU-Z 0.5.9 fully supports Kepler yet. [Hint: it doesn't, as you have noted above w.r.t. shader clocks.]
Detecting basic GPU/memory frequencies should be quite trivial even if the particular chip isn't supported, no?
I mean, it does detect the memory frequency correctly anyway?
 
Yup, that too. But I'm used to crossbars that are WAY bigger than whatever they use in GPUs. Look at it this way: in a GPU, how much area do you think a crossbar with just 4 units (because that's what we're talking about in this case) takes compared to the rest of the die?

So how wide is the offending crossbar?
- 4 In, 4 Out
- how wide is each bus? 32-bits? 128-bits?

It's probably more of a track routing issue than a number-of-transistors issue....
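
Just to put rough numbers on it, here's a back-of-envelope sketch in Python (the bus widths are pure guesses, taken from the question above): a full crossbar needs on the order of inputs × outputs × bus-width wires, which is exactly why it reads as a track-routing problem rather than a transistor-count problem.

```python
# Back-of-envelope wire count for a full crossbar: every input port
# has a path to every output port, so wires scale as n_in * n_out * width.
def crossbar_wires(n_in: int, n_out: int, bus_width_bits: int) -> int:
    return n_in * n_out * bus_width_bits

for width in (32, 128):  # guessed bus widths from the post above
    print(f"4x4 crossbar, {width}-bit buses: {crossbar_wires(4, 4, width)} wires")
```

That's 512 wires at 32 bits and 2048 at 128 bits: the logic behind them is almost nothing, it's the tracks you have to route and cross that cost you.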
 
Detecting basic GPU/memory frequencies should be quite trivial even if the particular chip isn't supported, no?
I mean, it does detect the memory frequency correctly anyway?
There is always the possibility that a new generation introduces new power states or swizzles the old ones, so I would not take for granted anything displayed by a utility that does not support a particular architecture.
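
For what it's worth, here's a minimal sketch of what a clock readout through a public driver API looks like, using NVML via the pynvml bindings (my choice for illustration, not what GPU-Z actually uses). The point is that a tool only sees whatever the driver reports for the current power state:

```python
# Minimal sketch: query current GPU/memory clocks through NVML.
# Assumes the pynvml bindings; GPU-Z uses its own low-level paths,
# so this only illustrates the general idea.
from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
                    nvmlDeviceGetClockInfo, NVML_CLOCK_GRAPHICS, NVML_CLOCK_MEM)

nvmlInit()
try:
    dev = nvmlDeviceGetHandleByIndex(0)
    # These values reflect the *current* power state; on a chip the tool
    # doesn't know, a new or reshuffled state can make them look plausible
    # and still be wrong.
    print("core:", nvmlDeviceGetClockInfo(dev, NVML_CLOCK_GRAPHICS), "MHz")
    print("mem: ", nvmlDeviceGetClockInfo(dev, NVML_CLOCK_MEM), "MHz")
finally:
    nvmlShutdown()
```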
 
Given the theory that nVidia originally intended GK104 for a lower market segment, can we expect unrestricted geometry throughput from its 4 GPCs? GF114 ran at full speed.
 
Given the theory that nVidia originally intended GK104 for a lower market segment, can we expect unrestricted geometry throughput from its 4 GPCs? GF114 ran at full speed.
I don't expect Kepler to be any different from Fermi in consumer vs. professional segmentation -- half-rate setup without tessellation and full-rate with tessellation enabled.
 
So how wide is the offending crossbar?
- 4 In, 4 Out
- how wide is each bus? 32-bits? 128-bits?

It's probably more of a track routing issue than a number-of-transistors issue....
It's not going to change the opinion of the gullible, but 3 years after the fact, it's probably time to settle this once and for all: the issue in GF100-A01 was in a back-end bus that fed the memory controllers. It was not even in the general xbar that interconnects the usual agents. There was a custom-designed cell with a timing violation that was not picked up during characterization.

The net result was a broken MC system (no transactions to external memory at all), but not a bricked chip: major parts could be verified by rendering to PC memory over PCIe. A02 fixed all known bugs, but not those that were hiding behind MC-specific paths, so A03 was needed.

GF100-A01 had no issues at all with distributing geometry across GPCs. Distributed geometry never comes up in discussions about power. I don't think it should surprise anyone with a bit of a brain that SMs+TEX are where the power is.

Also: don't fret so much about crossbars in general. It's under control.

(Crawling back into my bear cave...)
 
It's not going to change the opinion of the gullible, but 3 years after the fact, it's probably time to settle this once and for all: the issue in GF100-A01 was in a back-end bus that fed the memory controllers. It was not even in the general xbar that interconnects the usual agents. There was a custom-designed cell with a timing violation that was not picked up during characterization.
Is that the main reason for the low memory clocks in Fermi?
 
The slide with the "adaptive VSync" appears a bit strange. What they describe is basically vsync'd triple buffering. Otherwise tearing would also appear below the refresh rate. :rolleyes:

From what I understand, that's exactly how it works. Above 60 fps you get vsync, and below it you get tearing but much smoother transitions. This is how many console games work, and it's long overdue IMO. It would be nice if the limit could be set manually to 30 fps as well, though. Although given the choice between the two, especially on a card of this power, I'd take 60 fps any day.
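
For anyone curious, that behavior boils down to something like this toy sketch (hypothetical names, not NVIDIA's actual driver logic): sync to vblank only when the frame fit inside one refresh interval, otherwise present immediately.

```python
# Toy model of adaptive vsync (hypothetical API, not NVIDIA's driver code):
# vsync on when the frame fits in one refresh interval, off when it doesn't.
REFRESH_HZ = 60.0
FRAME_BUDGET_S = 1.0 / REFRESH_HZ  # ~16.7 ms at 60 Hz

def present_adaptive(frame_time_s: float, swap_chain) -> None:
    if frame_time_s <= FRAME_BUDGET_S:
        # Fast enough: wait for vblank, no tearing.
        swap_chain.present(sync_interval=1)
    else:
        # Too slow: present immediately. This tears, but avoids the
        # hard drop to 30 fps that plain vsync would force.
        swap_chain.present(sync_interval=0)
```

A user-settable budget (e.g. 1/30 s instead of 1/60 s) is all it would take to get the manual 30 fps limit wished for above.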

I've gotta say I'm pretty damn excited about Kepler after the recent leaks. Outperforming the 7970 with all the benefits of adaptive vsync, TXAA and PhysX support. And all that at lower power draw/heat and presumably lower noise. I won't be going for the top-end (rip-off) edition, but a 670 Ti should be a massive upgrade over my 4890.
 