NVIDIA's Goldman Sachs Webcast, Summary/Transcript (info on G71, G80, RSX, etc.)

As a side note, IMO any IHV can screw up a design whether its units are unified or not, and no, I don't mean anything in particular by that, other than that a USC doesn't guarantee success or absolute efficiency per se either.

Instead of NVIDIA releasing a USC "that sucks", it's far better for them, I guess, to take a safer path that minimizes risks, and no, that doesn't imply anything about what ATI has been working on. In theory, and in my layman's mind, the benefits of a USC currently seem to outweigh the drawbacks, but until I can see one and compare it with a non-unified PC GPU, it all stays in theory.

Overall I expect the first D3D10 GPUs to finally deliver "SM3.0 done right" across the board (though I still have a huge question mark when it comes to NV and vertex/geometry texturing latency), with the additional D3D10 requirements being mostly decorative, under the usual consensus of "it's there for devs, let's care about performance later...".

For truly advanced HOS and programmable tessellation, good morning "D3D11" or whatever they'll call it.
 
_xxx_ said:
That's because very few people buy these, so they don't have them in stock immediately as opposed to cheaper models, I guess.
I think Dell uses some kind of upgradable module for the video on their high-end laptops, so I don't really think this kind of module is specially built to order! If it were, Dell would state the delay on the website, and buyers would be a bit put off if they expected to buy now but receive the part a month later. The delay can affect the buying decision either way, since the buyer could go for an alternative model, brand, etc. As usual, parts are hardly ever built to order... they must be kept as spare stock alongside current parts, so I really see little point in that delay. Just my belief.
 
satein said:
I think Dell uses some kind of upgradable module for the video on their high-end laptops, so I don't really think this kind of module is specially built to order!


Not the module, but the laptop itself with that module gets assembled specially to order, and this must happen on a production line to satisfy the process flow, since that's a must for proper handling of warranty issues etc. We, for instance, are also not allowed to swap any parts of a car on special orders; it must be assembled on the line in the respective factory.

I don't know for sure if they handle it that way, but it's very probable.
 
_xxx_ said:
Not the module, but the laptop itself with that module gets assembled specially to order, and this must happen on a production line to satisfy the process flow, since that's a must for proper handling of warranty issues etc. We, for instance, are also not allowed to swap any parts of a car on special orders; it must be assembled on the line in the respective factory.

I don't know for sure if they handle it that way, but it's very probable.
For the car assembly, yes, I see your explanation, as I've often talked with a friend who used to work as a logistics assistant manager on a Benz assembly line (we always discussed issues on his line and how it worked out, and I believe that Benz assembly line gives you enough headaches). But I believe this doesn't apply to laptop parts assembly, since Dell lets customers change the parts to be ordered as needed (on the website, always starting from a basic configuration). So that means there would be some components built as standard and some upgradable (changeable) parts installed afterwards... probably fitted on a subsequent line. As for the 7800 GTX at Dell, it's a module AFAIK, and it always goes in a DTR-type laptop like the Dell XPS, Latitude D8xx or some of the big Precision series.
 
Xmas said:
I don't see a reason to do that in a unified IMR. When there's no reason one type of shader would stall/starve the other, why would you decouple them?
Rather than just assuming that you meant this solely for a non-unified architecture, I thought I'd ask first.

Still, it creates a bifurcation for developers: if a non-unified D3D10 GPU's performance means that a work-around has to be coded to "explicitly pipeline" rendering, then that's hardly a step forwards, is it?

Anyway, as I said earlier, with Xenos carrying the banner for unified hardware, a non-unified GPU is always going to look like the red-headed stepchild.

Performance/Watt is an interesting one since it could be more about how much unnecessary work you save instead of whether all units are running at peak load.
Yeah, I agree - this is very much about making new kinds of algorithm possible on the GPU. Maximal utilisation is just the widget that gets you closer to those algorithms.
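
To make the load-balancing side of this concrete, here's a toy Python sketch (purely made-up unit counts and per-pass loads, not modelling any real chip): it compares how much offered shader work a fixed VS/PS split can keep busy versus a single unified pool as the vertex-to-pixel mix swings around.

```python
# Toy model: shader-array utilisation for a fixed VS/PS split vs a unified
# pool, as the vertex:pixel workload mix varies per pass.
# All numbers are invented for illustration; no real GPU is modelled here.

def fixed_split_throughput(vs_units, ps_units, vs_load, ps_load):
    """Work completed per clock with dedicated VS and PS pools.
    Spare units of one type cannot help with the other type of work."""
    return min(vs_units, vs_load) + min(ps_units, ps_load)

def unified_throughput(total_units, vs_load, ps_load):
    """Work completed per clock when any unit can run either shader type."""
    return min(total_units, vs_load + ps_load)

# 8 VS + 24 PS units vs a unified pool of 32, across different load mixes.
for vs_load, ps_load, label in [(4, 28, "pixel-heavy pass"),
                                (8, 24, "balanced pass"),
                                (24, 8, "vertex-heavy pass (e.g. shadow/z)")]:
    split = fixed_split_throughput(8, 24, vs_load, ps_load)
    unified = unified_throughput(32, vs_load, ps_load)
    print(f"{label:35s} split: {split:2d}/32   unified: {unified:2d}/32")
```

Only when the offered mix happens to match the hard split does the fixed design keep everything busy; the unified pool just tracks whatever the pass throws at it, which is exactly the stall/starve case above.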

Jawed
 
satein said:
and I believe that Benz assembly line gives you enough headaches

That you can bet!!! :LOL:

Don't even get me started there, I'd be bitching for a few pages...

EDIT: car manufacturers also let customers choose extra options etc., that's no different. But keep in mind that the assembly line is probably in Malaysia or somewhere similar, and every machine needs to be fully documented on the line etc. A friend of mine bought a Dell laptop just a few weeks ago and also had to wait a couple of weeks for it to be shipped from the Czech Republic to Germany, since the assembly line is over there (cheap labour etc.). But I certainly don't know how Dell handles this.

Now back on topic:
Maximal utilisation is just the widget that gets you closer to those algorithms

Not necessarily, IMHO - if that means more heat/mm², for instance. But that surely depends on the implementation.
 
Jawed said:
Still, it creates a bifurcation for developers: if a non-unified D3D10 GPU's performance
Assuming RSX and R500 would stand at a similar transistor count if they both had the same feature set and both had their ROPs on their primary chip, which seems like a given to me, RSX has a higher theoretical programmable FLOPS rating in the PS than Xenos has for PS+VS. Well, except that Xenos is 4+1 and RSX is 2+2/3+1, but for pixel shading applications, 4+1 isn't quite optimal; it's more of a compromise between VS and PS ideals.
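
To put some rough numbers on that compromise, a back-of-the-envelope sketch (the instruction mixes below are assumptions for illustration, not taken from any real ISA):

```python
# Toy lane-utilisation numbers for the "4+1 is a VS/PS compromise" point.
# Assumed (not from any spec): a typical VS instruction is a 4-wide vector op
# plus an independent scalar op; a typical PS instruction is a 3-wide colour
# op plus an independent scalar (alpha/fog-style) op.

ALU = ("vec4+scalar (Xenos-style)", 5)   # 5 lanes available per issue

workloads = {
    "VS-ish (vec4 + scalar)": 4 + 1,     # fills all 5 lanes
    "PS-ish (vec3 + scalar)": 3 + 1,     # the 4th vector lane sits idle
}

name, lanes = ALU
for wl, busy in workloads.items():
    print(f"{wl:24s} on {name}: {busy}/{lanes} lanes = {busy/lanes:.0%}")

# An ALU that can instead be split 3+1 (or 2+2) per issue, RSX-style, packs
# the PS-ish mix into 4/4 lanes - which is the sense in which 4+1 leans
# toward vertex ideals and is a compromise for pure pixel work.
```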

means that a work-around has to be coded to "explicitly pipeline" rendering, then that's hardly a step forwards, is it?
I'd love to know what "explicitly pipelining" rendering means. If what you mean is that the developer would automatically try to balance VS and PS, that's what LOD algorithms are for. The only problem that can arise is with multipass rendering, including shadow rendering with Z-passes. The limitation there tends to be a completely different one anyway, though.

Anyway, as I said earlier, with Xenos carrying the banner for unified hardware, a non-unified GPU is always going to look like the red-headed stepchild.
TBH, with Xenos carrying the banner for unification, you'd be pretty dumb to bother unifying anything, because it really isn't that amazing. Is it a cool idea? Yeah. Is it a proper implementation? Yeah. Is it the most transistor-efficient design ever? Hmm... no.

Yeah, I agree - this is very much about making new kinds of algorithm possible on the GPU. Maximal utilisation is just the widget that gets you closer to those algorithms.
Those algorithms tend not to require the VS=>Triangle=>PS paradigm much. As such, what counts in the end is the total pixel shading performance, unless you want to get hacky and use the VS too, which will just give you a headache. Is Xenos nice in those cases? Yeah sure. Is it much better than, say, RSX? Hmm... no.


Uttar
 
Well, this was my point above about the relationship of G70 to GF6. I don't think of that as a new generation - more of an "aggressive refresh" of GF6. That would seem to be consistent with how NV sees it. So, at the moment at least, I'd have to say that Joe has the advantage on NV naming in pointing at G100 as the likely part for late '08.

Having said that, they've obviously shown a penchant for throwing curveballs in naming, so who the hell really knows what the names will turn out to be.
 
Jawed said:
Still, it creates a bifurcation for developers: if a non-unified D3D10 GPU's performance means that a work-around has to be coded to "explicitly pipeline" rendering, then that's hardly a step forwards, is it?
That workaround would be done by the hardware and drivers, not by application developers.
 
So with the new architecture that Nvidia will be introducing later this year, that will be two complete generations beyond NV3x ~ GeForce FX, and I guess it will be around through 2007 and 2008, pretty much like NV4x has been around for a few years in the GF6/GF7 series and RSX.

The way I see it: G80 = "NV50", G90 = "NV55", and Nvidia might not have a fully unified shader architecture until "NV60".
 
Uttar said:
but for pixel shading applications, 4+1 isn't quite optimal; it's more of a compromise between VS and PS ideals.

That's an interesting point and something I have wondered about for a while now. So effectively, when we quote 240 GFLOPS for Xenos, it's not really fair to compare that to the 211 GFLOPS the GTX 512 has dedicated to pixel shading, because some of those FLOPS in Xenos will go to waste when doing pixel shading?

And if so, then I imagine this is one of the reasons why Nvidia are saying that a unified architecture isn't faster than a separate one. As long as you have the balance right between vertex and pixel resources, and as long as the developer sticks to that balance within the hardware's ability to manage any variance, then a separate shader architecture would actually be more powerful for a given number of FLOPS.
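
For reference, here's roughly where those two headline numbers come from, using the commonly quoted unit counts and clocks (my assumptions, so treat it as a sanity check rather than an official breakdown):

```python
# Rough derivation of the headline figures quoted above, using the commonly
# cited unit counts and clocks (assumptions, not official breakdowns).

FLOPS_PER_MADD = 2  # one multiply-add = 2 floating-point ops per lane

# Xenos: 48 unified ALUs, each vec4 + scalar (5 lanes), ~500 MHz.
xenos_gflops = 48 * (4 + 1) * FLOPS_PER_MADD * 0.5          # GHz
print(f"Xenos peak (VS+PS combined): {xenos_gflops:.0f} GFLOPS")        # ~240

# GeForce 7800 GTX 512: 24 fragment pipes, two vec4 MADD ALUs each, ~550 MHz.
gtx512_ps_gflops = 24 * 2 * 4 * FLOPS_PER_MADD * 0.55       # GHz
print(f"GTX 512 peak (pixel shaders only): {gtx512_ps_gflops:.1f} GFLOPS")  # ~211

# The Xenos figure has to cover vertex work too, and per Uttar's point the
# scalar lane / 4th vector lane won't always be filled by pixel-shader code,
# so the two peaks aren't directly comparable as "PS horsepower".
```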
 