GraphixViolence said:
Merging the pixel and vertex shaders or increasing programmability doesn't seem like it will provide any significant benefit for at least another few years.

Merging pixel and vertex shaders would be more of a hardware-side efficiency improvement.
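As a rough illustration of why a unified shader pool can be a hardware-side efficiency win, here is a minimal sketch. All unit counts and workload figures are invented for the example; it only models the load-balancing argument, not any real chip:

```c
/* Toy model of split vs. unified shader units.  All numbers here are
 * hypothetical and only illustrate the load-balancing argument. */
#include <stdio.h>

/* Time to finish a frame given how many units work on each stage. */
static double frame_time(double vertex_work, double pixel_work,
                         int vertex_units, int pixel_units)
{
    double vt = vertex_work / vertex_units;
    double pt = pixel_work / pixel_units;
    return vt > pt ? vt : pt;   /* the slower stage limits the frame */
}

int main(void)
{
    /* A vertex-heavy frame and a pixel-heavy frame (arbitrary work units). */
    double frames[2][2] = { { 900.0, 300.0 }, { 200.0, 1000.0 } };

    for (int i = 0; i < 2; i++) {
        double v = frames[i][0], p = frames[i][1];
        /* Split design: 4 vertex units + 8 pixel units, fixed. */
        double split = frame_time(v, p, 4, 8);
        /* Unified design: the same 12 units shared across both stages
         * (idealized -- assumes perfect scheduling). */
        double unified = (v + p) / 12.0;
        printf("frame %d: split %.1f, unified %.1f time units\n",
               i, split, unified);
    }
    return 0;
}
```

Whichever pool a given frame doesn't stress sits partly idle in the split design, while the unified design can redistribute the same silicon.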
GraphixViolence said:
It could be argued that supporting formats with less than FP32 precision is just as much of a hardware efficiency improvement as merging pixel & vertex shaders. Every additional bit of precision you have to support takes up silicon that could instead be used for increasing performance. As the NV30 vs. R300 match-up showed, an 8-pipe 24-bit architecture can be a lot more attractive than a 4-pipe 32-bit architecture.

I really don't think that's an accurate analysis. As far as functional units are concerned, the R300 certainly doesn't have twice as many FP24 units as the NV35+ has FP32 units. The NV3x also has deeper pipelines, and there may be extra transistors involved in supporting the integer formats (which I feel should be nixed... though I think FP16 isn't such a bad thing until we move to higher-precision DACs). In other words, there are other things that make the NV3x more transistor-hungry than the R3xx, such that one cannot draw a direct comparison between the choice to use FP32 and FP24. There are too many other differences between the chips to single out that one as the cause.
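For a sense of scale on the precision-versus-silicon point, here is a very rough back-of-the-envelope model. It assumes an array multiplier whose area grows with the square of the mantissa width, and it ignores adders, registers, and control logic, so it supports the "bits cost silicon" argument for the multiplier array itself without settling the whole-chip comparison made in the reply above:

```c
/* Crude estimate of relative multiplier area for FP24 vs. FP32.
 * Assumption: an array multiplier's area scales roughly with the
 * square of the mantissa width (hidden bit included).  Real shader
 * ALUs contain far more than a multiplier, so treat this as a
 * rough illustration only. */
#include <stdio.h>

int main(void)
{
    const int fp24_mantissa = 17;  /* 16 stored bits + hidden bit */
    const int fp32_mantissa = 24;  /* 23 stored bits + hidden bit */

    double area24 = (double)fp24_mantissa * fp24_mantissa;
    double area32 = (double)fp32_mantissa * fp32_mantissa;

    printf("FP32 multiplier is roughly %.2fx the area of an FP24 one\n",
           area32 / area24);
    return 0;
}
```

Under this crude model the FP32 multiplier comes out at roughly twice the area of the FP24 one, which is the kind of budget the "more bits or more pipes" trade-off is about.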
GraphixViolence said:
As for increasing programmability, I wasn't implying that progress should be stopped in this area. I just think the next generation of graphics hardware would provide a good opportunity to let developers catch up to the current level and get more out of it, and the best way to do that would be to concentrate on adding more performance instead of more features that won't get used in the near term.

I really don't think so. If programmability progress stalls, it will only make it harder to get it started up again.
Chalnoth said:
GraphixViolence said:
It could be argued that supporting formats with less than FP32 precision is just as much of a hardware efficiency improvement as merging pixel & vertex shaders. Every additional bit of precision you have to support takes up silicon that could instead be used for increasing performance. As the NV30 vs. R300 match-up showed, an 8-pipe 24-bit architecture can be a lot more attractive than a 4-pipe 32-bit architecture.

I really don't think that's an accurate analysis. As far as functional units are concerned, the R300 certainly doesn't have twice as many FP24 units as the NV35+ has FP32 units.

GV did say NV30, which did appear to only have four FP units.
Chalnoth said:
The NV3x also has deeper pipelines,

How did you come to that conclusion?
OpenGL guy said:
Chalnoth said:
I really don't think that's an accurate analysis. As far as functional units are concerned, the R300 certainly doesn't have twice as many FP24 units as the NV35+ has FP32 units.

GV did say NV30, which did appear to only have four FP units.

Some support: http://www.beyond3d.com/forum/viewtopic.php?t=8005
OpenGL guy said:
GV did say NV30, which did appear to only have four FP units.

But the NV35 doesn't have many more transistors, and apparently has a similar number of functional FP32 units as the R300 has FP24 units. That essentially means that any argument about FP24 vs. FP32 that depends upon looking at the NV30 vs. R300 is meaningless because of the existence of the NV35.
Chalnoth said:
The NV3x also has deeper pipelines,

How did you come to that conclusion?

From this interview:

"Another example is if you're doing dependent texture reads where you use the result of one texture lookup to lookup another one. There's a much longer title time on the pipeline than there is in ours."

The typical way to improve performance with lots of dependent texture reads is to have a deeper pipeline.
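For readers unfamiliar with the term, a dependent texture read is simply a lookup whose coordinates come from the result of a previous lookup. The sketch below models it with plain array indexing (the names and sizes are invented for illustration); the point is that the second fetch cannot be issued until the first returns, which is exactly the latency a deeper pipeline, with more pixels in flight, tries to hide:

```c
/* Minimal sketch of a dependent texture read: the second lookup's
 * address is computed from the first lookup's result, so the second
 * fetch cannot start until the first one returns.  Array indexing
 * stands in for texture sampling here. */
#include <stdio.h>

#define TEX_SIZE 256

static unsigned char indirection[TEX_SIZE]; /* first texture  */
static unsigned char color_map[TEX_SIZE];   /* second texture */

/* One "pixel" of a dependent-read shader. */
static unsigned char shade_pixel(int uv)
{
    unsigned char t0 = indirection[uv % TEX_SIZE]; /* lookup #1     */
    unsigned char t1 = color_map[t0];              /* depends on t0 */
    return t1;
}

int main(void)
{
    /* Fill the tables with something deterministic. */
    for (int i = 0; i < TEX_SIZE; i++) {
        indirection[i] = (unsigned char)(255 - i);
        color_map[i]   = (unsigned char)(i / 2);
    }
    printf("pixel(10) = %u\n", shade_pixel(10));
    return 0;
}
```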
Chalnoth said:
From this interview:

"Another example is if you're doing dependent texture reads where you use the result of one texture lookup to lookup another one. There's a much longer title time on the pipeline than there is in ours."

The typical way to improve performance with lots of dependent texture reads is to have a deeper pipeline.

My goodness - Dr. Kirk said it - it must be true.
GraphixViolence said:
I voted for more pipelines. This provides exactly the same benefits as higher core clock frequency.
andypski said:
My goodness - Dr. Kirk said it - it must be true.

Considering he's the Chief Scientist at nVidia, I would tend to think he has a rather authoritative position on the inner workings of the NV3x architecture.
Chalnoth said:
andypski said:
My goodness - Dr. Kirk said it - it must be true.

Considering he's the Chief Scientist at nVidia, I would tend to think he has a rather authoritative position on the inner workings of the NV3x architecture.

The fact that he's the Chief Scientist at nVidia should imply that you should take his comments on other IHVs' architectures with just a little bit of skepticism. Did he provide performance numbers to back up his claim? Did he provide examples of how the NV3x is better at dependent reads? Didn't think so.
GraphixViolence said:
I voted for more pipelines. This provides exactly the same benefits as higher core clock frequency.

Well, that's really a question of which is more cost-effective: a larger, slower core, or a smaller, faster one? Both can realize the same performance, but each may not be realized at the same cost.
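As a concrete data point for the "wider and slower versus narrower and faster" question, here is the peak fill-rate arithmetic for the two parts discussed earlier, using their commonly quoted shipping clocks (325 MHz for the 8-pipe R300, 500 MHz for the 4-pipe NV30). Peak pixel throughput is just pipes times clock, so on paper the wider part wins even at a lower frequency:

```c
/* Peak pixel fill rate = pipelines x core clock.
 * Clock figures are the commonly quoted shipping speeds for the
 * Radeon 9700 Pro (R300) and GeForce FX 5800 Ultra (NV30). */
#include <stdio.h>

int main(void)
{
    struct { const char *name; int pipes; int mhz; } chips[] = {
        { "R300 (8 pipes @ 325 MHz)", 8, 325 },
        { "NV30 (4 pipes @ 500 MHz)", 4, 500 },
    };

    for (int i = 0; i < 2; i++) {
        long mpixels = (long)chips[i].pipes * chips[i].mhz;
        printf("%s: %ld Mpixels/s peak\n", chips[i].name, mpixels);
    }
    return 0;
}
```

That works out to 2600 vs. 2000 Mpixels/s peak, which says nothing about the cost side of the trade-off raised above: die area, yield, and power are what decide which approach is cheaper to build.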
Chalnoth said:
andypski said:
My goodness - Dr. Kirk said it - it must be true.

Considering he's the Chief Scientist at nVidia, I would tend to think he has a rather authoritative position on the inner workings of the NV3x architecture.

He's also a public spokesman for nVidia, which should put you on alert as to his desire to tell the truth vs. sell his company. I'm mainly thinking of his (IIRC) interviews that initially misled most sites to proclaim the 5800 as eight-pipeline. I also remember his latest interview with FS where he said ATi couldn't claim to know anything about nV's pipeline because they didn't create it, then turned around and detailed the exact number of cycles it takes for ATi to do certain ops. I fail to see how one couldn't come to a decent approximation of a pipeline by examining cycle times for certain operations, much like nV obviously did for ATi's hardware.
Pete said:
Well, that's really a question of which is more cost-effective: a larger, slower core, or a smaller, faster one? Both can realize the same performance, but each may not be realized at the same cost.

True. I guess I based this statement on the assumption that increasing the number of pipes would be more cost-effective than increasing the clock speed by an equivalent amount. Over the past few years, transistor counts have been roughly doubling each year, while clock speeds haven't quite been keeping pace (doubling around every 1.5-2 years). However, new process technologies don't seem to be rolling out as fast and furious as they once were, so it may not be possible for this trend to continue.
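To put those growth rates side by side, here is the compounded difference over a few years using only the doubling periods stated in the post above (one year for transistor count, 1.5 to 2 years for clock speed); the widening gap is what makes "more pipes" look cheaper than "more MHz" over time, assuming the trend holds:

```c
/* Compounded growth from the doubling periods quoted in the post:
 * transistor budget doubles every year, clock speed every 1.5-2 years. */
#include <math.h>
#include <stdio.h>

static double growth(double years, double doubling_period)
{
    return pow(2.0, years / doubling_period);
}

int main(void)
{
    for (int years = 1; years <= 4; years++) {
        printf("%d yr: transistors x%.1f, clock x%.1f-%.1f\n",
               years,
               growth(years, 1.0),   /* doubles every year      */
               growth(years, 2.0),   /* doubles every 2 years   */
               growth(years, 1.5));  /* doubles every 1.5 years */
    }
    return 0;
}
```

After four years that gives roughly a 16x transistor budget against only a 4x-6x clock increase, which is the asymmetry the argument rests on.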