Next gen graphics and Vista.

Wunderchu said:
However, unlike NVIDIA, who basically have only one current high-end architecture (G70~RSX), ATI also have their Xenos architecture in addition to their R520 architecture, and I imagine they would like to get the technology developed for Xenos out into the PC space as soon as possible, and take further advantage of the R&D they spent on developing that part. I think this may especially be the case if Xbox 360 does well and 'unified shaders' receive positive hype.
Kudos for pointing that out;)
 
Dave Baumann said:
There is no such thing as "unified shading" - in DX10 you will have VS, PS and now GS operations, and at the very least the VS and PS will have the same programmable capabilities and the same instruction set. On a non-unified hardware platform the VS will be performed on the VS units and the PS on the PS units; on a unified hardware platform the same hardware is shared between the operations, that's it.
Well, from what I remember reading in the early info on WGF, the unified system they have doesn't just unify the instruction set, but also allows novel ways of changing where the PS/VS output goes, such as having the output of the VS go right back into another VS. I could be wrong, of course, which would destroy my dream of directly implementing a recursive algorithm on the GPU.

Dave Baumann said:
So efficient utilisation of the available execution units is not of use to games? Again, take my prior example of a GS unit in a hypothetical non-unified architecture - how often will that die area just sit unused, when it would be used (for non-GS operations) on a unified platform?
I'm sorry, I guess I misspoke. What I meant is that I don't see any benefit from the new algorithms in games. But that may just be because we've not seen any made public (or at least none widely known on the message boards).

The increased utilisation is a real benefit, but not a guaranteed one. You do lose something by making the pipelines more general, so the question becomes how whatever is lost in making the architecture unified balances out against the increased utilisation of the execution units.
 
Wunderchu said:
However, unlike NVIDIA, who basically have only one current high-end architecture (G70~RSX), ATI also have their Xenos architecture in addition to their R520 architecture, and I imagine they would like to get the technology developed for Xenos out into the PC space as soon as possible, and take further advantage of the R&D they spent on developing that part. I think this may especially be the case if Xbox 360 does well and 'unified shaders' receive positive hype.
I'm not sure I buy that, because Microsoft will likely be footing that bill (they did for the original X-Box, I know that). And besides, consoles are typically much higher-volume parts, and longer lasting, so I don't think ATI would have any problem recouping their investment there.

That said, of course ATI is going to want to keep R&D costs to a minimum regardless of how much money they may or may not stand to get back. But I don't see any connection between the R&D of the Xenos and when ATI should release their next architecture (after R520).

That is to say, they've already spent all this money on the R520, so it makes sense for them to attempt to get as much out of that investment as they can. The Xenos should easily make enough money on its own.
 
micron said:
Does this type of stuff put ATi on M$'s good side?
Well, I think Microsoft is still pissed at nVidia over the pricing of the X-Box chips, but I don't think they'd resort to last-minute changes to the API spec or other such things that would really screw nVidia over, so I don't think it has any bearing on how good or bad nVidia will be at accelerating DX10. (If they did, nVidia would sue Microsoft's ass off... Microsoft may be bigger, but nVidia's not tiny.)
 
Wunderchu said:
However, unlike NVIDIA, who basically have only one current high-end architecture (G70~RSX), ATI also have their Xenos architecture in addition to their R520 architecture, and I imagine they would like to get the technology developed for Xenos out into the PC space as soon as possible, and take further advantage of the R&D they spent on developing that part. I think this may especially be the case if Xbox 360 does well and 'unified shaders' receive positive hype.

ASAP = not earlier than ~end of 2006. However, if you're willing to speculate about the future roadmap, it would be interesting to see what will appear in the mainstream and budget markets, and when, even after the R600 release. I wouldn't yet exclude the possibility that ATI might keep RV5xx designs around longer than expected; those are not only the parts of the market with vastly higher profit margins, but it would also mean that the transition period and lifetime of the R5xx family is far longer than just a year.

It'll definitely be shorter than NVIDIA's three-year life-cycle for low-end SM3.0, but I don't see anything yet that would suggest a loss on ATI's investment in the R5xx standalone PC line.

Finally, I'd have to admit that NVIDIA's strategy of minimizing expenses and maximizing margins at the same time is better for the financials. ATI took way too many risks recently, IMHO; I personally would have expected them to avoid low-k 90nm for the time being, especially since not so long ago NV fell flat on its nose with low-k 130nm.
 
chalnoth said:
I'm not sure I buy that, because Microsoft will likely be footing that bill (they did for the original X-Box, I know that). And besides, consoles are typically much higher-volume parts, and longer lasting, so I don't think ATI would have any problem recouping their investment there.

That said, of course ATI is going to want to keep R&D costs to a minimum regardless of how much money they may or may not stand to get back. But I don't see any connection between the R&D of the Xenos and when ATI should release their next architecture (after R520).

That is to say, they've already spent all this money on the R520, so it makes sense for them to attempt to get as much out of that investment as they can. The Xenos should easily make enough money on its own.

Clearly Xenos will recoup ATI's investment and then some - after all, they're not fabbing anything for MS, it's strictly licensing - but I think you're putting too much focus on recouping the investment in R520. ATI will try to make as much money as possible off this architecture, to be sure, but it behooves them to go on the offensive when the opportunity presents itself rather than to wait on NVidia.

As I've said before, I feel it's likely that ATI currently has an edge in the buildup to DX10. R600 could turn out to be their NV30 equivalent in the end, who knows? That said, I'm sure at the moment they are looking forward to the finalization of the DX10 API so they can prepare what in their minds must be a chip they've dreamed about for a long time now. They are doubly advantaged in that the 360 will give them feedback and experience with an actual real-world implementation without their ever having risked anything in the PC marketplace.

I'm not counting NVidia out on the DX10 front, mind you; I expect to be pleasantly surprised... if that makes sense. But ATI has been building up to R600 for a long time now. However well R520 (and R580) end up doing, they'll still really only have been a bridge generation for ATI to further challenge NVidia on the performance and SM3.0 (and now HD decoding) fronts.
 
Ailuros said:
Not very likely at all.

I agree wholeheartedly. (I hope my post didn't come off as otherwise to you.)

Still - it's always good to put that hedge in; after all, who would've expected, pre-launch, that NV30 would 'fail' like it did?

For my part though, I'm very bullish on ATI's prospects with the R600 generation.
 
Dave Baumann said:
I don't see that as being the point of unified at all; I'd see it as benefiting any shader title, as the point of the design should be to better balance whatever workload is given to it.

I'm just saying it'll take time till we see any DX10 games. It may be the case that many current games have a VS/PS balance which better suits the non-unified architecture.

Of course, it's all just a stupid guess. Just saying it's not all black and white :)
 
Ailuros said:
Finally, I'd have to admit that NVIDIA's strategy of minimizing expenses and maximizing margins at the same time is better for the financials. ATI took way too many risks recently, IMHO; I personally would have expected them to avoid low-k 90nm for the time being, especially since not so long ago NV fell flat on its nose with low-k 130nm.
Xenos is proof that the "risk" in the transition to low-k 90nm wasn't very high. 230M odd transistors isn't chump change.

If Dave's speculation/hint is right that the "memory interface" (perhaps that means the ring bus) was the part of R520 failing due to soft ground, then I think it would be safer to point at that as the high-risk venture.

Jawed
 
Chalnoth said:
Well, from what I remember reading in the early info on WGF, the unified system they have doesn't just unify the instruction set, but also allows novel ways of changing where the PS/VS output goes, such as having the output of the VS go right back into another VS. I could be wrong, of course, which would destroy my dream of directly implementing a recursive algorithm on the GPU.
Yes, there will be cases where the outputs don't necessarily go down the traditional path and are looped back on one another (e.g. VS-->GS-->VS), but non-unified hardware will just have to find a way of handling the intermediate results and passing them to each of the units, either internally or by writing the intermediate results out externally.
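To make the loop-back idea concrete, here is a rough CPU-side sketch (my own illustration; the function names and data are invented, not any D3D10 API) of a "recursive" amplification algorithm expressed as repeated passes, with intermediate results streamed out to a buffer and fed back in as the next pass's input - the kind of handling non-unified hardware would need for a VS-->GS-->VS loop:

```python
# Hypothetical GS-style amplification: one line segment becomes two.
def subdivide(edge):
    a, b = edge
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return [(a, mid), (mid, b)]

def run_loopback(edges, passes):
    # Each pass writes its results out to an intermediate buffer,
    # which is then bound as the input of the next pass.
    for _ in range(passes):
        streamed = []
        for e in edges:
            streamed.extend(subdivide(e))
        edges = streamed  # looped back in as the next pass's input
    return edges

segments = run_loopback([((0.0, 0.0), (1.0, 0.0))], passes=3)
print(len(segments))  # 8 segments after three amplification passes
```

On unified hardware the same pool of ALUs could run every pass; dedicated hardware has to shuttle the intermediate buffer between unit types instead.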

Chalnoth said:
The increased utilisation is a real benefit, but not a guaranteed one. You do lose something by making the pipelines more general, so the question becomes how whatever is lost in making the architecture unified balances out against the increased utilisation of the execution units.
The question is, what do you lose? The ALUs themselves are still basically simple math units. Kirk's main gripe was about texturing and the latencies involved with it, but that appears not to be an issue with an architecture like Xenos, because the texture units are separated from the shader ALUs, and the architecture is designed to thread in such a manner that other operations/threads are performed on the ALUs while latency-bound operations are in flight.

_xxx_ said:
I'm just saying it'll take time till we see any DX10 games. It may be the case that many current games have a VS/PS balance which better suits the non-unified architecture.
Sorry, but how? How are developers under DX10 going to be able to perfectly predict the VS-to-PS balance on all hardware and tune their games' utilisation exactly to match it, given that the VS/PS ratio has changed slightly every generation and varies across the range of hardware? That utilisation balance also shifts on factors well outside the control of the developer, dictated by simple things such as the end user's settings and resolution selection.

Look at a graph of a benchmark plotted over time and the FPS is bouncing around all over the place - at any of these points in time the bottleneck is shifting from one element of the processing system to another. It's next to impossible to have an even load across all the units through even a few thousand frames, let alone the course of an entire game. No, the game can never be expected to provide the balance of power between PS/VS (and GS) utilisation, nor should it be a task of the developer to try (outside the reasonable bounds of the expected hardware capabilities); the only question is whether dedicated units can still be more optimal than a unified structure at hiding/minimising such bottlenecks.
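A toy model of this argument (the unit counts and per-frame workloads below are invented purely for illustration): with a shifting VS/PS bottleneck, a dedicated design strands whichever pool is underloaded, while a shared pool services whatever work is available.

```python
def dedicated_throughput(vs_work, ps_work, vs_units=8, ps_units=24):
    # Each pool can only service its own work type; surplus units idle.
    return min(vs_work, vs_units) + min(ps_work, ps_units)

def unified_throughput(vs_work, ps_work, units=32):
    # A unified pool of the same total size services either work type.
    return min(vs_work + ps_work, units)

# Frames alternating between geometry-heavy and fill-heavy loads
# (work expressed as "units' worth" of VS and PS demand per cycle).
frames = [(20, 12), (4, 40), (16, 16), (2, 60), (24, 8)]

for vs, ps in frames:
    print(f"VS={vs:2} PS={ps:2}  "
          f"dedicated={dedicated_throughput(vs, ps):2}  "
          f"unified={unified_throughput(vs, ps):2}")
```

In the geometry-heavy frames the dedicated design's PS units sit partly idle (and vice versa), so the unified pool's throughput is never lower in this model; the open question in the thread is what that flexibility costs in transistors.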
 
I didn't mean they'll code it with some kind of balance in mind, just that the "average" (unintended) balance may well turn out more favorable to dedicated pipes than to unified ones in many cases.

I guess the unified scenario will have better average fps, but less peak performance in many cases. Hence we'll have to say goodbye to timedemos as we know them now and go the [H] way instead (with fps over time).

But I'll stop the speculations here, makes no sense really :)
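As a quick sketch of the fps-over-time point (frame times invented for illustration): two traces with identical average fps can differ wildly in their worst frame, which a single timedemo average hides but an fps-over-time plot exposes.

```python
# Frame times in milliseconds for two hypothetical GPUs.
steady = [20.0] * 10           # 50 fps, every frame
spiky = [10.0] * 9 + [110.0]   # mostly fast frames, plus one hitch

def avg_fps(frame_times_ms):
    # The single number a classic timedemo reports.
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)

def worst_fps(frame_times_ms):
    # The instantaneous rate at the slowest frame.
    return 1000.0 / max(frame_times_ms)

print(avg_fps(steady), worst_fps(steady))  # 50.0 50.0
print(avg_fps(spiky), worst_fps(spiky))    # 50.0 and roughly 9.1
```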
 
Dave Baumann said:
The question is, what do you lose? The ALUs themselves are still basically simple math units. Kirk's main gripe was about texturing and the latencies involved with it, but that appears not to be an issue with an architecture like Xenos, because the texture units are separated from the shader ALUs, and the architecture is designed to thread in such a manner that other operations/threads are performed on the ALUs while latency-bound operations are in flight.
Well, a unified architecture is going to be more complex, for one. So for the same number of math units, you're probably going to be paying out more transistors to make them unified. Then you have the fact that vertex shaders are typically going to have different usage patterns than pixel shaders, and thus a non-unified architecture may be able to gain some efficiency for typical shaders by specializing.

Edit:
Just bear in mind that I have stated previously that I really do like the idea of a unified architecture. I'm just stating that in real world situations, it's not necessarily going to be the case that its performance turns out better for a similar die size.
 
Well, a unified architecture is going to be more complex, for one. So for the same number of math units, you're probably going to be paying out more transistors to make them unified.
I'm sure there is extra control logic in there, but if it were all that significant, how would that reconcile with putting it in devices bound for mobile platforms? Small devices are going to have few ALUs in the first place, so the extra logic required must cost less than just adding a few more dedicated VS/PS units.

Then you have the fact that vertex shaders are typically going to have different usage patterns than pixel shaders, and thus a non-unified architecture may be able to gain some efficiency for typical shaders by specializing.
Quantify this - in what way? They are basically both still dealing with vector math ops, aren't they?
 
Chalnoth, you seem to be wilfully ignoring the scheduler and the fact that it increases utilisation of ALL units, TMUs and ALUs, and minimises expensive pipeline stalls.

You also seem to be wilfully ignoring the fact that out of order scheduling allows the GPU to work on smaller batches, which brings greater algorithmic freedom - it means that dynamic flow control is a viable programming technique, unconstrained by multi-thousand pixel batches.

Jawed
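A toy model of the batch-size point (the branch probability and instruction costs are invented for illustration): with dynamic flow control, every pixel in a batch effectively pays for the longest path any pixel in that batch takes, so smaller batches waste far less work when only a scattering of pixels takes the expensive branch.

```python
import random

def wasted_fraction(pixels, batch_size, cheap=4, expensive=64):
    # Fraction of ALU work thrown away to branch divergence.
    random.seed(0)  # fixed seed so the illustration is repeatable
    # Each pixel takes the expensive branch with 10% probability.
    costs = [expensive if random.random() < 0.1 else cheap
             for _ in range(pixels)]
    actual = sum(costs)  # work a perfectly divergent machine would do
    paid = 0
    for i in range(0, pixels, batch_size):
        batch = costs[i:i + batch_size]
        # The whole batch runs as long as its slowest pixel.
        paid += max(batch) * len(batch)
    return 1 - actual / paid

print(wasted_fraction(4096, 4096))  # multi-thousand-pixel batches
print(wasted_fraction(4096, 16))   # small batches waste far less
```

In the multi-thousand-pixel case almost every batch contains at least one expensive pixel, so nearly everything runs the long path; with 16-pixel batches many batches are all-cheap and finish early.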
 
Jawed said:
You also seem to be wilfully ignoring the fact that out of order scheduling allows the GPU to work on smaller batches, which brings greater algorithmic freedom - it means that dynamic flow control is a viable programming technique, unconstrained by multi-thousand pixel batches.
That's not really a function of unification or not. I'd expect this to happen in either case.
 
Well it seems to me that unification requires the out of order scheduler.

But yes, if R5xx is also going to sport such a scheduler then it isn't directly a feature of unified architecture.

Jawed
 
I think the unified architecture makes as much sense for 'low-end' parts as it does for high-end parts - if not more. No more wondering what the ideal mix of pixel and vertex shaders should be; just pick a die size you want to target and stick in all the unified units you can. And therein lies the only possible drawback: would the same usage of die space with dedicated pipes yield a part with potentially higher peak performance? But Xenos (excluding the eDRAM) seems very reasonable transistor-wise for the power it should pack, and if R600 is successful from the outset, then - like Intel with Conroe - I don't know why ATI wouldn't transition as quickly as possible to a top-to-bottom range of solutions based on R600. Though I'm sure there would still be the odd part here and there based on legacy architectures.
 