The People Behind DirectX 10: Part 3 - NVIDIA's Tony Tamasi

Chalnoth said:
there is an inherent danger that one company's design decision will come out on top in a dramatic fashion.

You are the new Don King. I was excited about the upcoming fight before, but now my mouth is really watering. :)

I think you might be right.
 
_xxx_ said:
Following simple logic, nV had MUCH more time to develop, (re)design and test their new architecture. While ATI needed resources for Xenos development and for resolving the problems with R520, nV has had almost an extra year in which to invest most of their resources into theoretical development. That's why I think they'll have the significantly faster and cheaper-to-produce chip overall this time around, simply because they had more time for tweaking and optimizing the design.
This will be ATI's second unified GPU.

Never buy v1.0, eh?

Jawed
 
Tammuz said:
What is meant by "more consistent and specified behaviour"?
Texture filtering is a minefield currently. The visual quality varies depending on the GPU being used and each GPU has various levels of quality at different performance levels.

I suppose you could say the net result is that devs don't know what target they're aiming at. If they program texturing like so, will they get shimmer? etc.

Devs know the precision of the maths (i.e. how good "FP32" should be) but they don't know the precision of the texture filtering maths.

When ATI introduced anisotropic filtering, it was angle-dependent. When NVidia had performance problems with GeForce FX, they cut the quality of filtering right back, etc.

What would be "generalized tessellation"?
Presumably allowing the dev to say "here's an equation or set of control points for a curve, produce X triangles to apply that curve to this object". Or maybe "vary the level of detail (triangle count) for this object depending on how close to the camera it is" (though that's maybe too much of a special case to be called "generalised tessellation").
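The "equation or set of control points" idea can be sketched in a few lines - a toy, host-side version of what a tessellator stage might do, here for a cubic Bezier curve. The curve type, coordinates and segment count are illustrative assumptions, not anything from an actual D3D spec:

```python
def bezier_point(p0, p1, p2, p3, t):
    # De Casteljau evaluation of a cubic Bezier curve at parameter t.
    lerp = lambda a, b, t: tuple(ai + (bi - ai) * t for ai, bi in zip(a, b))
    a, b, c = lerp(p0, p1, t), lerp(p1, p2, t), lerp(p2, p3, t)
    d, e = lerp(a, b, t), lerp(b, c, t)
    return lerp(d, e, t)

def tessellate(p0, p1, p2, p3, segments):
    # "Here are control points, produce X segments": sample the curve at
    # evenly spaced parameter values and emit the resulting vertices.
    return [bezier_point(p0, p1, p2, p3, i / segments)
            for i in range(segments + 1)]

# A dev-chosen level of detail: more segments when closer to the camera.
verts = tessellate((0, 0), (0, 1), (1, 1), (1, 0), segments=8)
print(len(verts))  # 9 vertices -> 8 line segments
```

Swap "segments of a curve" for "triangles over a patch" and scale the segment count by camera distance, and you get the distance-based LOD idea above.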

Jawed
 
Jawed said:
This will be ATI's second unified GPU.

Never buy v1.0, eh?

Jawed

You could also say "this will be nV's eighth GPU", dunno if that means anything.

Xenos as a theoretical 1:1 PC version would be in the ballpark of what, the R420 maybe? R600 needs to be much more powerful than Xenos.

It will be their first unified GPU for the PC though, without the eDRAM and with loads of other stuff (DX10-related tweaks, ROPs, the Avivo part etc.). I think Xenos tells us about as much as R580 does - not enough for serious speculation on speed :)

I could imagine Xenos on steroids + the R580 memory controller on steroids in there, but how many steroids and what flavor?
 
_xxx_ said:
Xenos as a theoretical 1:1 PC version would be in the ballpark of what, the R420 maybe? R600 needs to be much more powerful than Xenos.
ALU FLOP performance is a fair amount higher on Xenos than on R480/R520, so, clock for clock, it's between R520 and R580. Xenos also has a higher overall texture sampling ability than any current ATI desktop part.
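For reference, peak MADD FLOP figures of this sort are just ALUs × components × 2 ops × clock. A quick sketch, using the commonly cited Xenos shader core configuration (48 unified ALUs, vec4 + scalar co-issue, 500 MHz) as an assumed input - treat the numbers as illustrative, not official:

```python
def madd_gflops(alus, components, clock_mhz):
    # One multiply-add counts as 2 floating-point ops per component per clock.
    return alus * components * 2 * clock_mhz / 1000.0

# Commonly cited Xenos figures: 48 ALUs, vec4 + scalar (5 components), 500 MHz.
print(madd_gflops(48, 5, 500))        # 240.0 GFLOPS

# "Clock for clock" comparisons just drop the clock term:
print(madd_gflops(48, 5, 500) / 500)  # 0.48 GFLOPS per MHz
```

The same formula with each desktop part's ALU count and issue width is what puts Xenos between R520 and R580 on a per-clock basis.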
 
_xxx_ said:
Following simple logic, nV had MUCH more time to develop, (re)design and test their new architecture. While ATI needed resources for Xenos development and for resolving the problems with R520, nV has had almost an extra year in which to invest most of their resources into theoretical development. That's why I think they'll have the significantly faster and cheaper-to-produce chip overall this time around, simply because they had more time for tweaking and optimizing the design.
Right, so this is nVidia's advantage. ATI's is that their current products are closer to DX10 in functionality than nVidia's. So it stands to reason that nVidia has the advantage of being able to spend more time getting things working optimally, and ATI has the advantage of experience.
 
Dave Baumann said:
ALU FLOP performance is a fair amount higher on Xenos than on R480/R520, so, clock for clock, it's between R520 and R580. Xenos also has a higher overall texture sampling ability than any current ATI desktop part.

Well then, between R520 and R580 wouldn't cut it for the next gen either. I just wanted to imply that R600 must be at least 2x that (speed-wise) in order to compete.
 
_xxx_ said:
Well then, between R520 and R580 wouldn't cut it for the next gen either.
Errrm, I don't think there is a suggestion that R600 would have the same performance as Xenos, is there?
 
Dave Baumann said:
Errrm, I don't think there is a suggestion that R600 would have the same performance as Xenos, is there?

No, but there is the assumption here that the Xenos design can be easily souped up and adapted to the PC space, which is just plain wrong. Making a PC chip based on Xenos is almost like a completely new development. It's not like they can just take 2x Xenos, R580's souped-up memory controller, ROPs, video part etc., put it all in a shaker and voilà - there you have your R600, if you get me. As if it were ever that easy...
 
So if R600 is actually heavily based on Xenos in a number of ways, we can come back to your post and point and laugh? Nobody said they are just putting disparate blocks in and shaking it about, but it's not a stretch to see Xenos heavily influence a PC part in a number of ways. They've said publicly that's the case, IIRC!
 
_xxx_ said:
No, but there is the assumption here that the Xenos design can be easily souped up and adapted to the PC space, which is just plain wrong.
With a unified shader design that is organised into three independent groups of 16 shader arrays, it seriously points to the notion of scalability.
 
Tammuz said:
Can you point me to links that stated or suggested :

1) the hw space cost; and
2) that this will never be in any future D3D10 versions?
Early D3D10 presentations (check out B3D's DX Next piece for example) made mention of a processing stage for high-order surfaces, separate from the GS. That was to perform on-chip adaptive tessellation of B-splines, Catmull-Rom curves, Bezier patches etc., but it was cut from the spec, and from all planned versions of D3D10, as Ail says.

Somewhat common knowledge.
 
Rys said:
So if R600 is actually heavily based on Xenos in a number of ways, we can come back to your post and point and laugh? Nobody said they are just putting disparate blocks in and shaking it about, but it's not a stretch to see Xenos heavily influence a PC part in a number of ways. They've said publicly that's the case, IIRC!

I know they said it, but why the laughing? All I'm saying is that it requires lots of work to implement it in a PC design, and it could have some pitfalls we know nothing of yet. It's definitely not cut-and-paste. Of course they'll use Xenos as a foundation, but how much will they have to change? Does the added circuitry for the PC design require severe changes to the design? Etc.
 
_xxx_ said:
No, but there is the assumption here that the Xenos design can be easily souped up and adapted to the PC space, which is just plain wrong. Making a PC chip based on Xenos is almost like a completely new development. It's not like they can just take 2x Xenos, R580's souped-up memory controller, ROPs, video part etc., put it all in a shaker and voilà - there you have your R600, if you get me. As if it were ever that easy...

But you should not forget that Xenos was sort of based on a unified PC part that never saw the light of day...
 
Dave Baumann said:
ALU FLOP performance is a fair amount higher on Xenos than on R480/R520, so, clock for clock, it's between R520 and R580. Xenos also has a higher overall texture sampling ability than any current ATI desktop part.

Not to mention that when the VS is a bottleneck (which varies game to game, but indications from a slide going around show it to be ~10% of the time) there is better allocation of ALU resources. Adding the GS into the mix further complicates the issue of resource utilization.

Of course that may be irrelevant if some of the rumblings are true about a certain IHV dragging its feet, as game design will ultimately be built around the least common denominator. Best to just offer features as [slow] check boxes and focus on legacy. D3D10 is sounding more like D3D9d all the time. I will pull out my hair if we go back to the days of declaring GPU winners based on who can hit 200fps in Quake (!)

Ok, pessimism aside, Xenos also has a number of nice featureset additions that can solve certain design issues (vertex texturing, point sampling, tessellation), and a number that also improve performance for existing techniques and bottlenecks (FP10, eDRAM for fillrate and framebuffers; the latter not being very relevant for a PC GPU of course). Doing the same effect in half the time is pretty relevant to performance.

Anyhow, as Dave said, comparing Xenos to an R420 (24-bit precision shaders, SM2.0) is pretty far off. They are a number of degrees apart in ALU FLOP performance, and the featureset is decidedly in Xenos' court.
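The VS/PS allocation point lends itself to a toy model: with a fixed split, frame time is gated by whichever stage is most overloaded, while a unified pool only cares about total load. The unit counts and workloads below are made-up illustrative numbers, not any real part's configuration:

```python
def fixed_split_throughput(vs_load, ps_load, vs_units, ps_units):
    # With dedicated VS and PS hardware, throughput is limited by
    # whichever stage is more overloaded relative to its unit count.
    return 1.0 / max(vs_load / vs_units, ps_load / ps_units)

def unified_throughput(vs_load, ps_load, units):
    # A unified pool reassigns ALUs between stages, so only total load matters.
    return units / (vs_load + ps_load)

# Vertex-heavy scene on a hypothetical 8 VS + 24 PS split vs 32 unified units:
print(fixed_split_throughput(16, 16, 8, 24))  # 0.5 (the 8 VS units bottleneck)
print(unified_throughput(16, 16, 32))         # 1.0 (load rebalanced across the pool)
```

When the workload happens to match the fixed split exactly, the two designs tie; the unified pool only wins when the VS/PS balance drifts - which is exactly the ~10%-of-the-time VS-bound case mentioned above, and the GS adds a third load to balance.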
 
Are we expecting the GS to be a separate unit for either ATI or NV? I suspect that wouldn't make a lot of sense for ATI. For NV tho, the options would be a separate unit (or units), or combining it with the VS?

The reason I ask is that we are getting hints here and there that GS performance is expected to suck mightily in the first gen -- an "ISV showcase/development tool" rather than the real deal. But why would that be so if there are, say, 64 of them for ATI, or 8 (10, 12, pick your number) for NV?
 
Wouldn't making each shader in a unified design GS-capable be a fundamental design change? I always expected the GS to be a separate unit in R600 as well.
 
Well, the messages seem to be a bit in tension with each other, hence figured I'd ask.
 