NVIDIA G92 : Pre-review bits and pieces

Rys

Just to let folks know that B3D won't have anything substantial up for a few days. NVIDIA scheduling for this product being what it is means I've had less than half a day to analyse it so far, so there's not much I can do for today.

So we'll flesh things out soon after I've spent more time with it. Not knocking the guys that got boards nearly a week ago before flying out to editor's day, but I could have done with the same if anyone from NVIDIA is listening. Heck only knows what the policy was there. Heck only knows (and maybe Heck will let you all know at some point, I'll ask him what he wants to say) why the launch was pulled in for today, too.

Arun and Tim know some extras which they'll talk about in here/the rumours thread now, and we have colleagues doing games/extra testing that we'll poke you at in a news post later.

Architecture and GPU Musings

* Very similar instruction issue rates compared to G8x products, depending on what you're measuring and comparing to. So no major cluster or SFU changes compared to older chips that I can tell as yet. Not finished there though.
* No double precision in hardware it seems. Ergo, no DP in 2007 for NVIDIA, one assumes.
* Minor scheduler/load balancing changes to favour certain executing conditions.
* 8 TA and 8 TF per cluster, with some odd results as you've already been pushing around in this thread.
* C/Z compression improvements at certain settings. Explains some of the perf versus GTS you might have seen.
* It's an 8 cluster GPU or I'll eat your hat, my hat, and the hat's hat. 8800 GT SKU is 7C though, obviously.
* 256-bit, 16 ROPs, same Z-only rate as G8x.
* 600/1500/900 (core/shader/memory, MHz) is very conservative in all areas.
* 17.5x18mm @ 734Mt @ 65nm.
* Maybe L1 is bigger. Maybe.
* PCIe Gen2. But maybe not the full spec, still figuring that one out.
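
For anyone who wants the headline rates implied by that list, here is a rough back-of-envelope sketch. It assumes (not stated in the post) that 600/1500/900 are core/shader/memory clocks in MHz, that the TA/TF units and ROPs run at the core clock, and that the memory is double data rate. These are theoretical peaks, not measurements.

```python
# Peak-rate arithmetic from the bullet list above.
# Assumptions: core clock 600 MHz, memory clock 900 MHz with two
# transfers per clock (DDR); TA/TF units and ROPs run at core clock.
CORE_HZ = 600 * 10**6
MEM_HZ = 900 * 10**6
CLUSTERS_GT = 7        # 8800 GT SKU (the full chip is thought to have 8)
TA_PER_CLUSTER = 8     # TF count per cluster is the same, so the rate matches
ROPS = 16
BUS_BITS = 256

die_area_mm2 = 17.5 * 18
texel_rate = CLUSTERS_GT * TA_PER_CLUSTER * CORE_HZ      # texels/s
pixel_rate = ROPS * CORE_HZ                              # pixels/s
bandwidth_bytes_per_s = (BUS_BITS // 8) * MEM_HZ * 2     # bytes/s

print(f"die area:  {die_area_mm2:.1f} mm^2")                    # 315.0 mm^2
print(f"texturing: {texel_rate / 1e9:.1f} Gtexels/s")           # 33.6
print(f"ROP rate:  {pixel_rate / 1e9:.1f} Gpixels/s")           # 9.6
print(f"bandwidth: {bandwidth_bytes_per_s / 1e9:.1f} GB/s")     # 57.6
```

Swap `CLUSTERS_GT` to 8 to see what a full-fat part at the same clocks would give.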

8800 GT Board Thoughts

* NVIDIA thermal solution engineers need to take off the earmuffs and nuke the wax buildup. I hate saying that.
* Fast with games at 1080p and IQ upper pills popped, despite the below.

Random Extras

* 169.01 mostly sucks.
* Which is kind of funny because it has 69 in the number. I laughed anyway.

Sorry I don't have much more for today, just haven't had the time spent with it for a full analysis. Only got back from the event yesterday morning.
 
Extra tidbits:

* NVIDIA confirmed to us that the chip supports GDDR4, but they obviously wouldn't comment on unannounced products.
* Blending is still half-rate, unlike R6xx which is full-rate (more important now given the number of ROPs). While you'd be bandwidth limited anyway for FP16 blending, it is clear that you can be ROP-limited for INT8/FP10 blending; not a major bottleneck, but still noteworthy.
* Triangle setup is also completely unmodified, with 0.5 tri/clk setup, 1 tri/clk culling and only 1 vertex/clk output to the post-T&L FIFO (without attributes, even in Z-only passes). Every single one of those metrics is at least 50% lower than R600's, and this is most likely a very real bottleneck in certain cases.
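
To put rough numbers on the blending and setup points, a quick sketch. Assumptions that are mine, not from the post: everything runs at a 600 MHz core clock, memory bandwidth is 256-bit at 900 MHz DDR, a blend costs one destination read plus one write, INT8/FP10 targets are 4 bytes per pixel and FP16 targets 8 bytes per pixel, and compression is ignored.

```python
# Rough peak rates for half-rate blending and triangle setup.
# All inputs below are assumptions labelled in the lead-in.
CORE_HZ = 600 * 10**6
ROPS = 16
BANDWIDTH = (256 // 8) * 900 * 10**6 * 2   # bytes/s, assuming DDR memory

setup_tris_per_s = CORE_HZ // 2            # 0.5 tri/clk
cull_tris_per_s = CORE_HZ                  # 1 tri/clk
vertices_out_per_s = CORE_HZ               # 1 vertex/clk to post-T&L FIFO

blend_pixels_per_s = ROPS * CORE_HZ // 2   # half-rate blending

def blend_traffic(bytes_per_pixel):
    """Bytes/s needed to feed the ROPs at peak blend rate (read + write)."""
    return blend_pixels_per_s * 2 * bytes_per_pixel

int8_rop_limited = blend_traffic(4) < BANDWIDTH   # ROPs run out first
fp16_bw_limited = blend_traffic(8) > BANDWIDTH    # memory runs out first

print(f"setup: {setup_tris_per_s / 1e6:.0f} Mtris/s")           # 300
print(f"blend: {blend_pixels_per_s / 1e9:.1f} Gpixels/s")       # 4.8
print(f"INT8 blend traffic: {blend_traffic(4) / 1e9:.1f} GB/s") # 38.4
print(f"FP16 blend traffic: {blend_traffic(8) / 1e9:.1f} GB/s") # 76.8
```

Under those assumptions the INT8/FP10 case needs less than the available bandwidth (so the ROPs are the limit), while FP16 needs more than the bus can supply, which lines up with the point above.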
 
I've been running the board since Saturday evening. The cooler is a mixed bag, but definitely more positive than negative. I swapped out a GTS 640, and at full load, this is significantly louder than my previous card. However, at idle, it is basically silent, and I've never heard it spin up to full in the course of normal operation (even during Crysis). I have to wonder how much overclocking headroom the thing has...

Qualitatively, it seems slightly faster than the GTS 640.

New TRMS mode is nice, but that's available on everything as far back as G70.

I have no idea what I can and can't say about it, but CUDA 1.1 is a lot more than your typical point release upgrade. G92 is a significant part of that. Remember how atomics were limited to Compute 1.1 devices (G84/6)? Yeah, G92 is Compute 1.1, and now it's not just atomics...
 
You know y'all make it sound more like a piece of laboratory equipment than a piece of gaming kit. :???:

Which is fine by me. :) I like the whole dry dissection approach without a whole lot of emotion to cloud the issue. We'll get lots of that at other sites.

Sounds like the cooling solution/sound will go in ATI's favor this round. Even if they end up slower.

Regards,
SB
 
Well, I'm not sure. The only time I've heard it spin to max speed is on bootup, and then it is really loud and whiny. But I haven't ever gotten it to spin up noticeably during gaming, so that's a definite plus.
 

Yep, I would heartily second that. I can't say I've noticed any difference between the GT and GTS in every-day usage/gaming.
 
A couple times I got my GTS to spin up (high ambient temps, long-term use). Maybe that will be the case with the GT as well, but yeah, thus far I haven't noticed any meaningful difference.
 
Is there some technical reason IHVs have the fans spin to max during the boot cycle? I'm guessing this is something controlled by the BIOS and/or drivers, but it seems obvious to me that it's not responding to heat at that point... so what's the rationale? Why not spin it to 50% and then let heat be the determiner?
 
Go on, overclock it. I think the average is about 700+ from what I have seen from reviews up to now for the core clock, and the shader overclock should give a good boost too.
 
Is there some technical reason IHVs have the fans spin to max during the boot cycle? I'm guessing this is something controlled by the BIOS and/or drivers, but it seems obvious to me that it's not responding to heat at that point... so what's the rationale? Why not spin it to 50% and then let heat be the determiner?

Fan speed control is voltage-regulated. At boot (pre-POST) everything runs @ full voltage.
 

And the definition of "full voltage" would be controlled in the BIOS of the card, wouldn't it? So I'd think you could still do some sleight of hand to get where I'm talking about. 50% of "true" full voltage in the BIOS entry for the fan, then let the drivers know that "true" max voltage is a 2x multiplier.

Let's just say I have faith in technology's ability to solve this problem. :LOL:
 
Any information about new TR-AA-mode?
NV claimed it was better than EATM. It's certainly better than the old TRMS, but I haven't seen EATM to be able to really compare. If somebody wants a HL2 Ep1 saved game I used to make screenshots of the new TRMS mode to compare against EATM, I'd be happy to give it to you.

ps: as I mentioned above, you get the new TRMS mode on everything back to G70 once you install 169.01. Also, NV has said "don't use TRMS in Orange Box because it's already doing some kind of transparency antialiasing." No idea what that's about, so if somebody has a better knowledge of Orange Box than I and would like to chime in with what it's actually doing, I wouldn't complain.
 
Remember how atomics were limited to Compute 1.1 devices (G84/6)? Yeah, G92 is Compute 1.1, and now it's not just atomics...

Any suggested reference material for background on this stuff so that I could do a bit more than gawk incomprehensibly at that sentence? :oops:
 
And the definition of "full voltage" would be controlled in the BIOS of the card, wouldn't it? So I'd think you could still do some sleight of hand to get where I'm talking about. 50% of "true" full voltage in the BIOS entry for the fan, then let the drivers know that "true" max voltage is a 2x multiplier.

Let's just say I have faith in technology's ability to solve this problem. :LOL:

Perhaps it's more of a diagnostic issue then of confirming operability under full load?
 
Any suggested reference material for background on this stuff so that I could do a bit more than gawk incomprehensibly at that sentence? :oops:
Well, there's the CUDA programming guide (PDF). There's also the UIUC CUDA class. Rys and I both went through all of the information posted online from that class, and that's how we both got a decent handle on how CUDA works.

Short version is that each G8x/G9x part corresponds to some Compute spec, which is just some level of CUDA functionality. In CUDA 1.0, the only difference between Compute 1.0 and 1.1 was atomic operations (e.g., an add that doesn't have to worry about race conditions from other threads trying to add at the same time). In CUDA 1.1, atomics are no longer the only thing that Compute 1.1 will get you.
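
If the race condition bit is unclear, here is a tiny illustration. This is not CUDA code, just a deterministic Python simulation of what goes wrong when several threads do a plain read-modify-write on one memory location, versus an atomic add where each read-modify-write is indivisible (CUDA exposes the latter as `atomicAdd`). The `Memory` class and both function names are made up for the example.

```python
# Simulated lost-update race versus atomic add (illustrative only).
class Memory:
    """A single shared memory word."""
    def __init__(self):
        self.value = 0

def non_atomic_adds(mem, n_threads):
    # Worst-case interleaving: every thread reads before any thread writes.
    snapshots = [mem.value for _ in range(n_threads)]
    # Each thread then writes back its own stale snapshot plus one,
    # overwriting the others' updates.
    for snap in snapshots:
        mem.value = snap + 1
    return mem.value

def atomic_adds(mem, n_threads):
    # Each thread's read-modify-write completes before the next begins,
    # which is the guarantee an atomic add provides.
    for _ in range(n_threads):
        old = mem.value
        mem.value = old + 1
    return mem.value

racy_total = non_atomic_adds(Memory(), 4)
atomic_total = atomic_adds(Memory(), 4)
print(racy_total, atomic_total)   # 1 4
```

Four threads each adding one should give 4; the racy version ends up at 1 because three of the updates are lost.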
 
Thanks Tim, it's a little clearer now. One day when I'm in Jawed mode I'll take a stab at the documentation.
 
Perhaps it's more of a diagnostic issue then of confirming operability under full load?

Possibly, but then I'd want to talk about why it's any better for the thing to fail then than to fail later? In fact, it may never fail later if it takes a 100% load to make it fail and real usage never puts it under a 100% load.

I suspect you're right, btw, as to that reasoning behind it (diagnostic), I just think it's insufficient.
 
Is there some technical reason IHVs have the fans spin to max during the boot cycle? I'm guessing this is something controlled by the BIOS and/or drivers, but it seems obvious to me that it's not responding to heat at that point... so what's the rationale? Why not spin it to 50% and then let heat be the determiner?

It's not controlled at all during bootup and runs at full (full meaning anywhere between 80 and 95% usually) for safety reasons until it gets a "handshake". Done that way in every hot piece of electronics anywhere. Diagnostics happen either in the 0-5% area or up there 90-100%, between that it's in "normal" working mode.

The reason is, even those few seconds without the cooler running could sometimes be enough to fry the chip.
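
As a sketch of that fail-safe logic: the fan sits at a fixed high duty until the controller gets its handshake, and only then follows temperature. Every number and name here is hypothetical (the 90% fail-safe duty is just picked from the 80-95% range mentioned above, and the linear temperature curve is invented for illustration); real boards will differ.

```python
# Hypothetical fail-safe fan control: high fixed duty before handshake,
# temperature-driven duty afterwards. All thresholds are illustrative.
def fan_duty(handshake_done, temp_c,
             failsafe_duty=0.9, min_duty=0.3, max_duty=1.0,
             t_low=50.0, t_high=90.0):
    """Return fan duty cycle in the range 0..1."""
    if not handshake_done:
        # No control yet: ignore temperature and run near full for safety.
        return failsafe_duty
    if temp_c <= t_low:
        return min_duty
    if temp_c >= t_high:
        return max_duty
    # Linear ramp between the low and high temperature thresholds.
    frac = (temp_c - t_low) / (t_high - t_low)
    return min_duty + frac * (max_duty - min_duty)

print(fan_duty(False, 25.0))  # 0.9: cold chip, but still pre-handshake
print(fan_duty(True, 40.0))   # 0.3: under control and cool
print(fan_duty(True, 95.0))   # 1.0: under control and hot
```

The point of the design is that the safe state needs no software at all: if the handshake never arrives, the chip is still being cooled.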
 