DemoCoder said:
However, if I am not using pixel shaders, then 1600x1200x32 4XFSAA 16xANISO @ 60fps is currently possible even with large depth complexities, so why should I want more raw pixel fillrate? Unless I am doing multipass, I don't want it.
There is no way 1600x1200x32 4XFSAA 16xANISO @ 60fps is sustainable on the NV30 or the R9700. Even the averages are lower (we are talking sub-40 fps averages on the UT2003 Antalus flyby, which is much lighter than actual gameplay), and that is with the modest demands of today's games. Traditional fillrate is not a solved problem, particularly with the growth of LCD displays that you want to run at their native resolutions, and which in the future might even support decent framerates.
We want high minimum framerates here, mind you, at absolute numbers that ensure excellent responsiveness and precision at all times, not numbers judged by whether a static Word document would be a flickering hell. "Barely good enough as long as you don't really care" is not much of a goal to shoot for.
And, as you are well aware, there are a number of rendering techniques that might see wide use in the future if the fillrate is there to support them.
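To put some rough numbers on the 1600x1200 4XFSAA @ 60fps scenario quoted above (a back-of-envelope sketch of my own, with an assumed depth complexity of 4, not measured data):

[code]
# Rough fill estimate for 1600x1200 @ 60 fps with 4x FSAA.
# Depth complexity of 4 is an assumption for "large depth complexities";
# real scenes vary, and early-Z rejection will recover some of it.
width, height, fps = 1600, 1200, 60
fsaa_samples = 4          # 4x multisampling: 4 depth/colour samples per pixel
depth_complexity = 4      # assumed average number of surfaces covering each pixel

visible_pixels = width * height * fps                          # ~115 Mpixels/s
samples_per_sec = visible_pixels * fsaa_samples * depth_complexity

print(f"visible pixels/s : {visible_pixels / 1e6:.0f} M")
print(f"samples/s to fill: {samples_per_sec / 1e9:.2f} G")
# ~1.8 Gsamples/s of raw fill before a single texture is filtered at 16x aniso,
# which is already a large slice of a part with a theoretical rate of a few Gpix/s.
[/code]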
DemoCoder said:
I beg to differ. The NV30 is clocked at 500MHz so that it can execute shaders faster, not so that it can achieve a 4-gigapixel single-texturing fillrate. The Pentium 4 doesn't have enough system bandwidth to write out one longword per cycle either. If the NV30 were clocked at 250MHz, it could write 1 pixel from 1 pipe every cycle.
This is your main point, and it is valid. Fillrate is not the only measure of performance, and increasing the processing capabilities of the core increases performance for core-resident tasks. Whether such tasks will be what limits the overall performance of the NV30 on the programs it will run during its lifetime is another question that needs to be considered.
DemoCoder said:
It's not the 128-bit bus that constrains the fillrate; that has nothing to do with it. It is the ratio between the core clock and the memory clock. If you just blindly divide the bus width by the pipeline width * # pipelines, you are doing the wrong calculation.
It is exactly the right calculation, assuming the ratio of core clock to memory clock is constant (as it pretty much is in the R9700 vs NV30 case). Obviously width-per-pixel-pipe and clock ratios are independently variable. There are other issues as well for that matter, such as latencies, prefetching effectiveness, buffer sizes, cache hitrates, et cetera ad nauseam.
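For the record, here is the calculation we are arguing over, as a minimal sketch using the commonly quoted paper specs (8 pipes each, 64 bits written per pixel, no bandwidth-saving tricks; all of these figures are assumptions on my part, not measurements):

[code]
# Compare peak pixel output of the core against what the memory bus can absorb.
# All figures are paper specs and assumptions, not measured behaviour.

def fill_limits(pipes, core_mhz, bus_bits, mem_mhz_effective, bits_per_pixel=64):
    """bits_per_pixel = 32-bit colour + 32-bit Z/stencil per pixel, uncompressed."""
    core_fill = pipes * core_mhz * 1e6                                # pixels/s the pipes can emit
    bw_fill = (bus_bits * mem_mhz_effective * 1e6) / bits_per_pixel   # pixels/s the bus can take
    return core_fill, bw_fill

specs = {
    "NV30  (8 pipes, 500 core, 128-bit @ 1000 effective)": (8, 500, 128, 1000),
    "R9700 (8 pipes, 325 core, 256-bit @ 620 effective)":  (8, 325, 256, 620),
}
for name, spec in specs.items():
    core, bw = fill_limits(*spec)
    print(f"{name}: core {core / 1e9:.2f} Gpix/s, bus {bw / 1e9:.2f} Gpix/s, ratio {core / bw:.2f}")

# With the core:memory clock ratios roughly equal, the 2x difference between the
# two ratios above is exactly the 2x difference in bus width per pipe -- which is
# the "blind division" of bus width by pipeline width * # pipelines.
[/code]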
We need benchmark data.
DemoCoder said:
You might see a 1GHz or 2GHz GPU in the next few quarters, but you won't see a corresponding increase in bandwidth; the shader execution rate will have quadrupled. Is that so horrible? Memory bandwidth is expensive, so I maintain it is far better to be bandwidth limited, since it is cheaper to increase shader op rate.
Again, this is a solid point, but there isn't data around to support it. In the talks surrounding GDDR3 (Ack! Pfft!), datarates of 1.4 Gbit/pin were mentioned. That's 2.25 times what the R9700 uses today if they retain a 256-bit bus, and if nVidia goes 256-bit wide we are talking a total bandwidth factor of 2.8. And this is coming in less than a year for sure, which I doubt is the case for 2GHz GPUs.
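The arithmetic behind those factors, for anyone who wants to check it (the per-pin rates are the commonly quoted ones and should be treated as assumptions):

[code]
# Bandwidth scaling factors implied by the 1.4 Gbit/s/pin GDDR3 figure.
gddr3 = 1.4          # Gbit/s per pin, the figure mentioned in the GDDR3 talks
r9700_pin = 0.62     # Gbit/s per pin today (310 MHz DDR), 256-bit bus
nv30_pin = 1.0       # Gbit/s per pin today (500 MHz DDR-II), 128-bit bus

ati_factor = (gddr3 * 256) / (r9700_pin * 256)   # same 256-bit bus: ~2.26x
nv_factor  = (gddr3 * 256) / (nv30_pin * 128)    # moving from 128-bit to 256-bit: 2.8x

print(f"ATI    (staying 256-bit): {ati_factor:.2f}x")
print(f"nVidia (going 256-bit)  : {nv_factor:.2f}x")
# A 2.25-2.8x bandwidth jump well before a 4x jump in GPU clock looks plausible.
[/code]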
Besides, GPUs are basically SIMD ASICs (with very deep pipelines). As soon as you step away from problems that fit into that paradigm, you will see performance plummet. There are a lot of reasons why CPUs today run at 2-3 GHz with half a MB of full speed cache and tons of logic devoted to scheduling instruction flow. People who think GPUs should take over the tasks of CPUs have some rude awakenings in store. Your architectural comparison is invalid. The two processor classes have different jobs to do.
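To make the "different jobs" point concrete, a contrived little sketch (the names and structure are mine, purely illustrative): the first function is the regular, data-parallel work a deep SIMD pipeline thrives on; the second is the branchy, dependent, pointer-chasing work that all that CPU cache and scheduling logic exists for.

[code]
# Illustrative only -- two workload classes, not a real renderer or database.

# 1) GPU-friendly: the same short operation applied independently to every
#    element, with no data-dependent branching -- maps directly onto wide,
#    deeply pipelined SIMD hardware.
def shade(pixels, light):
    return [min(255, int(p * light)) for p in pixels]

# 2) CPU-friendly: data-dependent branches and pointer chasing; each step
#    depends on the previous one, so there is nothing to run in lockstep.
def find(node, key):
    while node is not None:
        if node["key"] == key:
            return node["value"]
        node = node["left"] if key < node["key"] else node["right"]
    return None
[/code]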
Entropy