Does Dave have ATI SM 3.0 Hardware?

trinibwoy said:
I'm stumped. Why does everyone think that ATI will move to SM3.0/FP32 and bring significant performance gains with R520, while Nvidia is saying that their refresh will be a modest improvement at best?

For one, ATI is expected to use a 0.09 micron (90 nm) process, while we expect nVidia to still be on 0.13/0.11 micron.

This doesn't guarantee anything of course, but it does provide rationale.
 
Why does everyone think that ATI will move to SM3.0/FP32 and bring significant performance gains with R520, while Nvidia is saying that their refresh will be a modest improvement at best?

Beats me... but it may just be that ATI had SM2 dominance for a longer period and that, in moving to SM3, they could have found something to improve upon in the existing architecture (as part of the move to SM3). NVidia's move was natural, mainly because they had to resolve their deficiencies in the previous architecture.
 
Joe DeFuria said:
For one, ATI is expected to use a 0.09 micron (90 nm) process, while we expect nVidia to still be on 0.13/0.11 micron.

This doesn't guarantee anything of course, but it does provide rationale.

Well ATI certainly has built up an impressive reputation with some folks. They're going to move to a new process, add transistors for new features plus increased precision and hit a home run with performance!! If they do that Nvidia's engineers should hang their heads in shame.
 
trinibwoy said:
I'm stumped. Why does everyone think that ATI will move to SM3.0/FP32 and bring significant performance gains with R520, while Nvidia is saying that their refresh will be a modest improvement at best?

Which is exactly why I'm going out on a limb and claiming around 20%. The fact that it will most likely be running FP32 and only FP32 is what is going to limit its performance.
 
trinibwoy said:
Well ATI certainly has built up an impressive reputation with some folks. They're going to move to a new process,

Which they've probably had quite a bit of practice with via the Xbox 2 chip...

add transistors for new features plus increased precision and hit a home run with performance!!

See R300. ;)

If they do that Nvidia's engineers should hang their heads in shame.

No, just a consequence of continually moving targets...and product cycles being driven by slightly different inflection points...
 
Joe DeFuria said:
If the hints and rumors turn out to be true, then there will be certain circumstances where the 520 architecture is a significant improvement over anything else on the market today. The branching capability in the NV4x architecture, for example, while functional, isn't particularly robust / fast. ATI's been dropping hints that they will have a better solution for it.

Well, it's not like ATI, or any company, would drop hints that they "may be able to perform on par with hardware released by a competitor one year ago." Given that that year has passed, I think it's safe to assume that whatever ATI releases will, and must, perform faster than NV4x.

ANova said:
The fact that it will most likely be running FP32 and only FP32 is what is going to limit its performance.

But they have the process shrink to allow more transistors. The only way FP32 limits performance, when the entire processor is built for it, is that you must allocate transistors to it that could otherwise have been used for other performance-boosting circuitry. The NV40 has no problem running FP32 and I doubt R520 will have any problems. Was this thinking inspired by the GeForce FX, where FP32 suffered because it was an FP16 design doing double duty? (Like a 32-bit CPU having to calculate 64-bit integers by looping.)
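To make that parenthetical a bit more concrete, here's a minimal C++ sketch (names and values are purely illustrative) of what "a 32-bit CPU calculating 64-bit integers" means: the wide operation gets emulated with two narrow ones plus an explicit carry, so each add costs roughly double. The suggestion above is that NV3x paid an analogous toll running FP32 through a pipeline sized for lower precision.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical illustration only: a 64-bit value held as two 32-bit halves,
// the way a CPU without native 64-bit registers would have to do it.
struct U64 {
    uint32_t lo;
    uint32_t hi;
};

// One "wide" addition becomes two narrow adds plus carry detection.
U64 add64(U64 a, U64 b) {
    U64 r;
    r.lo = a.lo + b.lo;              // add the low words
    uint32_t carry = (r.lo < a.lo);  // unsigned wrap-around means a carry out
    r.hi = a.hi + b.hi + carry;      // add the high words plus the carry
    return r;
}

int main() {
    U64 a{0xFFFFFFFFu, 0x00000001u}; // 0x1FFFFFFFF
    U64 b{0x00000001u, 0x00000000u}; // 0x000000001
    U64 c = add64(a, b);             // expect 0x200000000
    std::printf("0x%08X%08X\n", c.hi, c.lo);
    return 0;
}
```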
 
wireframe said:
Well, it's not like ATI, or any company, would drop hints that they "may be able to perform on par with hardware released by a competitor one year ago."
What was the name of the company that made the Duo-sucks-butt or whatever that overnight forgetter was called? I think they actually did make a claim similar to that. ;)
 
digitalwanderer said:
wireframe said:
Well, it's not like ATI, or any company, would drop hints that they "may be able to perform on par with hardware released by a competitor one year ago."
What was the name of the company that made the Duo-sucks-butt or whatever that overnight forgetter was called? I think they actually did make a claim similar to that. ;)

I dunno, but you seem to have proven me wrong. I shall make a full and formal retraction once the evidence is produced.
 
Joe DeFuria said:
ANova said:
How exactly is it different? It's built on the same architecture as the R420, which in turn is built on the R300. The only difference is that it will support SM3 and above and offer a modest core clock increase. I doubt it will feature more than 16 pipelines and it will definitely not be three times faster; more like 20-50%.

If the hints and rumors turn out to be true, then there will be certain circumstances where the 520 architecture is a significant improvement over anything else on the market today. The branching capability in the NV4x architecture, for example, while functional, isn't particularly robust / fast. ATI's been dropping hints that they will have a better solution for it.

We'll just have to wait and see.

I could see, at most, a 2x increase in "general throughput" though, with 50% being more realistic.

It depends on when we will see branching in applications, what kind of branching, and under what circumstances, doesn't it?

There were posts even on these boards showing driver entries that are obviously related to R520, with mentions of 512 PS30 instruction slots and somewhere over 1K VS30 slots. All I know is that the number of instruction slots says little about the number of instructions an architecture can theoretically execute. NV40 already claims (in a relative sense, as always) "unlimited instructions" (limited, obviously, to >65K executed for either PS or VS) and I don't expect R520 to claim anything less.

The point where I don't have a clue is if, where, and why the number of instruction slots plays any role at all, especially when it comes to branching. NV40 has 4096 PS30 and 544 VS30 instruction slots, according to the driver.

I also have no clue what kind of overhead branching takes in current drivers for the NV40; yet, on the other hand, I have severe doubts that R520 will be capable of single-cycle branching in the end. If the ballpark difference is, let's say, 3 vs. 5 clocks of overhead, I'm not sure that's enough for a peak of twice the performance; in a pure synthetic application, maybe. In a full game environment, though?

***edit: for the above I actually have R520 vs. NV40 in mind. If NVIDIA has something planned to counter R520 more effectively, something that comes as close as possible to R520's fill-rate, it might shrink the real-world difference between the two even further.
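For what it's worth, here's a crude back-of-envelope in C++ on that 3-vs-5-clock question, with entirely made-up instruction counts and ignoring batch/divergence effects (which on real hardware matter at least as much as raw branch overhead):

```cpp
#include <cstdio>

// Hypothetical numbers only: a shader body of 40 instructions, of which 30
// can be skipped for pixels that fail some condition; 'overhead' is the
// assumed per-branch cost in clocks.
double avgClocksPerPixel(double skipFraction, int bodyLen, int skippable, int overhead) {
    double fullPath  = bodyLen + overhead;                // pixels that run everything
    double shortPath = (bodyLen - skippable) + overhead;  // pixels that branch out early
    return skipFraction * shortPath + (1.0 - skipFraction) * fullPath;
}

int main() {
    // Pure synthetic test: nearly every pixel skips, so branching approaches
    // its peak benefit and the 3-vs-5 clock difference is visible.
    std::printf("synthetic: %.1f clocks (3c branch) vs %.1f clocks (5c branch)\n",
                avgClocksPerPixel(0.95, 40, 30, 3), avgClocksPerPixel(0.95, 40, 30, 5));
    // Game-like mix: only a third of the pixels skip; both versions land
    // in the low-to-mid 30s of clocks, so the overhead difference nearly vanishes.
    std::printf("game-like: %.1f clocks (3c branch) vs %.1f clocks (5c branch)\n",
                avgClocksPerPixel(0.33, 40, 30, 3), avgClocksPerPixel(0.33, 40, 30, 5));
    return 0;
}
```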
 
SM 3.0 isn't that interesting at this point in time. The level of shaders that will be used in games in the near future could be done in SM 2.0. The transistors required to support FP32 and SM 3.0 could be used to further enhance their current PS 2.0 speed, which I bet would run faster than 3.0 shaders anyway.
 
SM3.0 is extremely interesting when you consider that it's the general level the Xbox 2 is expected to use. Whether or not it's the truth (in the case of their specific product), publishers and developers alike expect the console versions of their games to be the money maker, with the PC version a bit of a side goal. That means they'll likely concentrate the bulk of their efforts on good Xbox 2 performance and do a more direct translation to the PC (rather than a ground-up design). So the closer your PC is to the Xbox 2, the more likely you are to have the performance and visuals the developers planned for.

I wouldn't be too surprised if most developers porting their games to the PC simply disable features on lesser hardware (even though the effects could technically be accomplished with relatively little effort on SM2 hardware), in which case if you don't have SM3 you're shit out of luck. Keep in mind, however, that when I say "most" I mean the quantitative majority (lots of games you likely wouldn't play anyway) - I highly doubt any developer of a major title would go this route, as they'd end up making significantly less profit than if they were to put in the full effort (and they have the resources to do a proper port).
 
hovz said:
SM 3.0 isn't that interesting at this point in time. The level of shaders that will be used in games in the near future could be done in SM 2.0. The transistors required to support FP32 and SM 3.0 could be used to further enhance their current PS 2.0 speed, which I bet would run faster than 3.0 shaders anyway.

SM3.0 is, IMO, a natural evolution of SM2.0. If a developer wants to deal with extremely long and complex shaders in the future, SM2.0 won't be good enough anymore. The longer and more complex the shader, the higher the internal precision required.
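As a toy illustration of that precision point (my own construction, not a model of any actual GPU): emulate a narrower mantissa by rounding after every operation and watch the error grow with the length of an arbitrary arithmetic chain.

```cpp
#include <cmath>
#include <cstdio>
#include <initializer_list>

// Crude emulation of a narrower float format: round the mantissa to 'bits'
// fractional bits after each operation. 23 ~ FP32, 16 ~ FP24-ish, 10 ~ FP16-ish.
// Real hardware rounding and exponent ranges differ; this is only a sketch.
float quantize(float x, int bits) {
    if (x == 0.0f) return 0.0f;
    int e;
    float m = std::frexp(x, &e);                  // x = m * 2^e with 0.5 <= |m| < 1
    float scale = std::ldexp(1.0f, bits);
    return std::ldexp(std::round(m * scale) / scale, e);
}

int main() {
    const int steps = 200;                        // "instructions" in the chain
    for (int bits : {23, 16, 10}) {
        float v = 1.0f;
        double exact = 1.0;
        for (int i = 0; i < steps; ++i) {
            v = quantize(v * 1.0173f + 0.0042f, bits);   // arbitrary mul-add chain
            exact = exact * 1.0173 + 0.0042;
        }
        std::printf("%2d mantissa bits, %d-step chain: rel. error %.2e\n",
                    bits, steps, std::fabs(v - exact) / exact);
    }
    return 0;
}
```

The arithmetic chain itself is arbitrary; the point is only that the accumulated error grows as the kept mantissa shrinks, which is why longer shaders push toward FP32 internally.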

Shaders in upcoming APIs are much closer to SM3.0 than anything else.

Current games are, IMHO, still DX7/DX9.0 sort of "hybrids". Once pure DX9.0 games appear in the future (I'd call UE3 an SM2.0 engine based on the currently known data, e.g. 50-200 instructions per shader), even there SM3.0 will serve only for possible performance increases. Whether an SM3.0 architecture is a better idea there than an SM2.0 architecture, we'll obviously find out next year. If developers beyond that want to use longer shaders, then I figure either SM3.0 or future WGF2.0 shaders will be far more efficient.

If IHVs don't deliver development platforms today, I'm afraid games 4-5 years down the line might still be restricted to SM2.0 and ~200 instructions max.
 
That's the thing: I doubt we'll see any games in the near future pushing shaders complex enough that SM2.0 is that much slower than 3.0.
 
Because R520 is not a refresh?

Usually ATI doubles performance with every new generation, with refreshes in between that have very little performance gain.
Look at R200 vs R100
R300 vs R200
R420 vs R300

So why would it suddenly be different for R520 vs R420?
 
Ilfirin said:
SM3.0 is extremely interesting when you consider that it's the general level the Xbox 2 is expected to use. Whether or not it's the truth (in the case of their specific product), publishers and developers alike expect the console versions of their games to be the money maker, with the PC version a bit of a side goal. That means they'll likely concentrate the bulk of their efforts on good Xbox 2 performance and do a more direct translation to the PC (rather than a ground-up design). So the closer your PC is to the Xbox 2, the more likely you are to have the performance and visuals the developers planned for.

I wouldn't be too surprised if most developers porting their games to the PC simply disable features on lesser hardware (even though the effects could technically be accomplished with relatively little effort on SM2 hardware), in which case if you don't have SM3 you're shit out of luck. Keep in mind, however, that when I say "most" I mean the quantitative majority (lots of games you likely wouldn't play anyway) - I highly doubt any developer of a major title would go this route, as they'd end up making significantly less profit than if they were to put in the full effort (and they have the resources to do a proper port).

No disagreement, Ilfirin, yet by the time those games appear for Xbox 2, I'd expect SM3.0 hardware to have a much higher presence in the average PC than it does today.
 
mjtdevries said:
Because R520 is not a refresh?

Usually ATI doubles performance with every new generation, with refreshes in between that have very little performance gain.
Look at R200 vs R100
R300 vs R200
R420 vs R300

So why would it suddenly be different for R520 vs R420?

Did you count the number of SIMD channels for each of those aforementioned architectures?

I can see 8 SIMD channels on R3xx and 16 on R4xx. If, of course, you expect R520 to have 8 quads/32 SIMD channels and the quite high rumoured clock speed most expect right now, then it's another story. Even if it were possible to ship such a board today at reasonable prices and in reasonable quantities (I still wouldn't know where you'd get the bandwidth for 8 quads from...), I'm sure you'd then expect 64 SIMD channels for R600, wouldn't you? :rolleyes:
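To put rough numbers on the bandwidth objection (every figure below is a placeholder I picked for illustration, not a spec or a leak), a quick C++ back-of-envelope:

```cpp
#include <cstdio>

// Back-of-envelope only; all numbers are hypothetical placeholders.
int main() {
    const double pixelsPerClock = 32;      // 8 quads
    const double coreClock      = 600e6;   // assumed core clock, Hz
    const double bytesPerPixel  = 8;       // ~4 B colour + 4 B Z, no texturing or AA

    const double fillRate = pixelsPerClock * coreClock;   // pixels per second
    const double needed   = fillRate * bytesPerPixel;     // bytes per second

    // A 256-bit bus at an effective 1.2 GHz moves (256/8) * 1.2e9 bytes per second.
    const double available = (256.0 / 8.0) * 1.2e9;

    std::printf("theoretical fill rate : %.1f Gpixels/s\n", fillRate / 1e9);
    std::printf("bandwidth needed (min): %.0f GB/s\n", needed / 1e9);
    std::printf("256-bit bus delivers  : %.0f GB/s\n", available / 1e9);
    return 0;
}
```

Even with generous assumptions, the raw colour/Z traffic alone comes out several times beyond what a 256-bit bus of the day could deliver, which is the point about feeding 8 quads.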
 
Ailuros said:
Ilfirin said:
SM3.0 is extremely interesting when you consider that it's the general level the Xbox 2 is expected to use. Whether or not it's the truth (in the case of their specific product), publishers and developers alike expect the console versions of their games to be the money maker, with the PC version a bit of a side goal. That means they'll likely concentrate the bulk of their efforts on good Xbox 2 performance and do a more direct translation to the PC (rather than a ground-up design). So the closer your PC is to the Xbox 2, the more likely you are to have the performance and visuals the developers planned for.

I wouldn't be too surprised if most developers porting their games to the PC simply disable features on lesser hardware (even though the effects could technically be accomplished with relatively little effort on SM2 hardware), in which case if you don't have SM3 you're shit out of luck. Keep in mind, however, that when I say "most" I mean the quantitative majority (lots of games you likely wouldn't play anyway) - I highly doubt any developer of a major title would go this route, as they'd end up making significantly less profit than if they were to put in the full effort (and they have the resources to do a proper port).

No disagreement, Ilfirin, yet by the time those games appear for Xbox 2, I'd expect SM3.0 hardware to have a much higher presence in the average PC than it does today.

Right. I wasn't trying to suggest that anyone needs to run out and get an SM3 card or anything, just that it's vital for ATI's next-gen part to be SM3, or else what I mentioned becomes a problem (the post I was replying to seemed to suggest that it would be better for ATI to simply invest, again, in a significantly higher-performance PS2.b part than to move to SM3 now).
 
hovz said:
SM 3.0 isn't that interesting at this point in time. The level of shaders that will be used in games in the near future could be done in SM 2.0. The transistors required to support FP32 and SM 3.0 could be used to further enhance their current PS 2.0 speed, which I bet would run faster than 3.0 shaders anyway.

The very same thing could have been said when going from SM 1.x to SM 2.0. Lots of games could have been made with huge amounts of very fast SM 1.x shaders. This was not to be.

The fact remains that SM 3.0 fleshes out SM 2.0 with precision and programmability. The way I understand the transition to SM 3.0, it is the programmability that is key. Yes, many things could be done with SM 2.0, but the 3.0 model makes this easier for developers, especially those used to thinking in C/C++ rather than ASM. By this I mean that SM 3.0 has more robust branching and a more unified programming model. There is less need to think, at the primitive pipeline-hardware level, about multiple branches that must coincide. SM 4.0 should complete this transition to programmability that will be very similar to CPU programming in C/C++. I am still not sure why there is talk about SM 4.0 forcing shared resources in hardware; my initial understanding was that the goal was simply to make combining PS and VS data transparent to the developer.
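A rough way to picture the "think in C/C++ instead of coinciding branches" point, using plain C++ as a stand-in for shader code (both path functions are invented placeholders, not anyone's actual shader):

```cpp
#include <cmath>
#include <cstdio>

// Invented per-pixel work, standing in for whatever the real shader would do.
float expensivePath(float x) { return std::sin(x) * std::sin(x) + 0.5f; }
float cheapPath(float x)     { return 0.5f * x; }

// SM2.0-style thinking: no dynamic branch available, so evaluate both sides
// for every pixel and pick one at the end (a cmp/lerp in the actual shader).
float shadeSelect(float x, bool inShadow) {
    float a = expensivePath(x);   // always paid
    float b = cheapPath(x);       // always paid
    return inShadow ? b : a;
}

// SM3.0-style thinking: write it like ordinary C and let the hardware skip
// the side that isn't needed (batch coherence permitting).
float shadeBranch(float x, bool inShadow) {
    if (inShadow)
        return cheapPath(x);
    return expensivePath(x);
}

int main() {
    std::printf("%f %f\n", shadeSelect(0.3f, true), shadeBranch(0.3f, true));
    return 0;
}
```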

I liken this to a painter creating his masterpiece by applying CMYK layers. This is entirely possible, but it would require an absolute genius, and you would have to think so far ahead that even a chess grandmaster would be impressed. Why not make it simple and mix colors as you go? Have a look here and there for shades and mix and match.

The Microsoft XNA and Nvidia Cg Toolkit provide strong examples of this open way of working that would allow programmers and non-programmers to create the effects needed in next generation titles.

A genius can always squeeze the impossible out of nothing, but the idea and the drive now seem to be to make this accessible and then have a compiler make it all happen: to let developers think about the larger concept and then fill in the details, rather than having to figure out how to puzzle the bits and pieces together in a way that works.
 
hovz said:
SM 3.0 isn't that interesting at this point in time. The level of shaders that will be used in games in the near future could be done in SM 2.0. The transistors required to support FP32 and SM 3.0 could be used to further enhance their current PS 2.0 speed, which I bet would run faster than 3.0 shaders anyway.
Whether the SM2.0 speed can be made faster is irrelevant when you consider developers' preferences. It will always be preferable to write SM3.0 code rather than worry about how to split SM2.0 code into multiple passes to make up for the lack of dynamic branching and the other features that SM3.0 introduces. SM3.0 will also allow ATI's implementation of instancing to fit "properly" within the spec, though that's certainly a minor issue.
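As a sketch of the multi-pass splitting being referred to (a toy framebuffer loop, not real API code; the shading functions and the mask handling are invented for illustration):

```cpp
#include <vector>

// Toy framebuffer model; 'lit' and 'shadowed' are invented stand-ins for
// the two branches of a pixel shader.
struct Pixel { float x; bool inShadow; float out; };

float lit(float x)      { return x * x + 0.5f; }
float shadowed(float x) { return 0.25f * x; }

// Without dynamic branching: the effect is split into two passes, with a mask
// (think stencil test) keeping each pass away from the other's pixels. Every
// pixel is still pushed through the pipeline twice.
void renderTwoPass(std::vector<Pixel>& fb) {
    for (Pixel& p : fb)                     // pass 1: lit pixels only
        if (!p.inShadow) p.out = lit(p.x);
    for (Pixel& p : fb)                     // pass 2: shadowed pixels only
        if (p.inShadow) p.out = shadowed(p.x);
}

// With SM3.0 dynamic branching: one pass, the decision lives inside the shader.
void renderOnePass(std::vector<Pixel>& fb) {
    for (Pixel& p : fb)
        p.out = p.inShadow ? shadowed(p.x) : lit(p.x);
}

int main() {
    std::vector<Pixel> fb{{0.2f, false, 0.0f}, {0.7f, true, 0.0f}};
    renderTwoPass(fb);
    renderOnePass(fb);
    return 0;
}
```

The per-pixel arithmetic is identical either way; the cost of the first version is the extra passes, state changes and mask bookkeeping the application has to juggle, which is exactly the work developers would rather not do.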
 