ATI's decision concerning TMUs

It still comes down to the architecture of the family. If you want to pick on something, pick on that, not R580 per se (and yeah, I've said that a few times already). Given that architecture, what else could they have done? The answer seems to be, "add two more quads instead". Which would have given them not only 8 more texture units, but also 8 more ROPs. . .and they'd still have been math-underpowered per pipe. So they'd have had to add at least one more shader per pipe, instead of two. So now you have double the shaders per pipe *and* two more quads (8 more TMUs, branching units, and ROPs).
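To put rough unit counts on that what-if (just a back-of-envelope sketch; only the shipped R520/R580 figures are real, and the 6-quad part is purely the hypothetical above, not anything ATI built):

```python
# Back-of-envelope unit counts for the configurations discussed above.
# Only the shipped R520/R580 figures are real; the 6-quad part is the
# hypothetical from the post, not an actual ATI product.

def unit_counts(quads, alus_per_pipe, pipes_per_quad=4):
    pipes = quads * pipes_per_quad
    return {
        "pixel pipes": pipes,
        "TMUs": pipes,                      # on R5xx these scale with the quads
        "ROPs": pipes,                      # likewise (the post's premise)
        "pixel ALUs": pipes * alus_per_pipe,
    }

print("R520 as shipped:", unit_counts(quads=4, alus_per_pipe=1))
print("R580 as shipped:", unit_counts(quads=4, alus_per_pipe=3))
print("6-quad what-if: ", unit_counts(quads=6, alus_per_pipe=2))
# Both R580 and the what-if land on 48 pixel ALUs, but the what-if carries
# 8 extra TMUs and 8 extra ROPs (plus their branching units) on top.
```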

How big do you suppose that would have been?

And what would we have heard then? We'd have been inundated with complaints about ATI's silly and wholly unjustified decision to build a huge chip with 50% more ROPs than NV, and treated to analyses showing they weren't getting nearly the advantage out of them that they should. This whole argument seems to boil down to "R580 sucks because it isn't G71". Well, gee. You could turn that around, but that would be pretty silly too.

If you look at it from a consumer pov, X1900 right now is cheaper than G71 (tho G71 is early enough in its lifecycle that it's hard to say for how long), is performance competitive, and has feature advantages.

If you look at it from a stockholder perspective, ATI's non-chipset business margins are within their longtime corporate average (34-38%). NV has had a breakout upwards --bully for them if you are looking to buy stock, but don't miss the fact that NV is the one who changed the status quo there, not ATI. And if you go back and put your consumer hat on you could just as easily ask "hey, waitaminute, why is it that NV made a significant cost improvement and didn't pass on any of that love to me?"
 
Why would they have needed to add more ROPs too? It's an honest question since I'm not aware if one can de-couple ROPs from quads on R5x0.

. . .and they'd still have been math-underpowered per pipe.

Not absolutely necessary either. You're just taking under consideration a scenario with a simple increase in quads. If they had taken such an approach, they could also have increased the capabilities of the secondary subunit. That doesn't mean, though, that their current approach wasn't probably the cheapest and most sensible one given the aspects of the entire architecture.
 
Ailuros said:
Why would they have needed to add more ROPs too? It's an honest question since I'm not aware if one can de-couple ROPs from quads on R5x0.

They don't seem to have that, Ail. We haven't seen it anywhere from them. This is part of what I mean when I say that if you want to criticize, start with the family and its progenitor, R520. I'm certainly not saying they couldn't have decoupled ROPs for the R5xx family. . .I'm saying it seems unlikely you'd start something like that with R580.

Not absolutely necessary either. You're just taking under consideration a scenario with a simple increase in quads. If they had taken such an approach, they could also have increased the capabilities of the secondary subunit. That doesn't mean, though, that their current approach wasn't probably the cheapest and most sensible one given the aspects of the entire architecture.

Hokay. So call it .5 ALU and two more quads (TMU, branching units, and ROPs). But having said that, I'd consider that just a die-size point. From an engineering effort pov, what do you suppose takes longer --slapping another full alu that you already have in inventory on there, or redesigning a unit to be more functional? Which is more reasonable to expect from a refresh? Sure, NV did it with G70, but even the name suggests they considered it more than a "refresh".
 
geo said:
They don't seem to have that, Ail. We haven't seen it anywhere from them. This is part of what I mean when I say that if you want to criticize, start with the family and its progenitor, R520. I'm certainly not saying they couldn't have decoupled ROPs for the R5xx family. . .I'm saying it seems unlikely you'd start something like that with R580.

I'm just asking if it's possible, as there's no necessity for more than 16 ROPs for the time being.

Hokay. So call it .5 ALU and two more quads (TMU, branching units, and ROPs). But having said that, I'd consider that just a die-size point. From an engineering effort pov, what do you suppose takes longer --slapping another full alu that you already have in inventory on there, or redesigning a unit to be more functional? Which is more reasonable to expect from a refresh? Sure, NV did it with G70, but even the name suggests they considered it more than a "refresh".

I said already above that it probably was the most sensible and easiest decision for R580.

G70 is as much a refresh as anything else, in my mind. Rivatuner calls it NV47 :p
 
Neeyik said:
Just to let people know: you won't be getting any more replies from Gateway2, as I've just realised who that person actually is.

I was going to remonstrate with Trini for extending the recent rash of dual-accounts! :LOL:

I'm kidding. I think. Well, 51% I'm kidding. But if you say it was Trini, then I'm not. ;)
 
Ailuros said:
I'm just asking if it's possible, as there's no necessity for more than 16 ROPs for the time being.

Of course ATI would have to give the definitive answer there. We've already been assured by the NV camp that NV4x had it originally, even tho we didn't see it until 6600.

But wouldn't X1800GTO suggest pretty strongly the other way? Tho I suppose one might argue that X1800GTO is a marketing part, rather than a fall-out part, in which case it was crafted to hit a price/performance point rather than purely for technical "what can we get out of it" performance reasons.

Wouldn't the lack of clock domains in R5xx also suggest the ROPs aren't decoupled? "Well, geo, they don't have separate clock domains for the VS either --suggesting they aren't decoupled?"

Err, okay, so back to asking ATI guys. . .:LOL: In the meantime, I'll have to rely on the fact that we've seen R5xx in a healthy number of flavas by this point, and it seems reasonable to me to expect that if they had that in the toolbox they would have found somewhere in that collection of SKUs to use it to their advantage.
 
DemoCoder said:
You asking me? Go read Anand's DVD codec comparisons, complete with screenshots showing the artifacts that occur. Even if they have improved in recent drivers, where's the evidence that ATI's video engine is "much more competent" than NVidia's? Seems like a pretty bold statement to make, especially since they were playing catchup with NVidia's PureVideo codec. Would you care to explain the competency differences between ATI's and NVidia's HW and SW video processing? (Not you, geo, but Tahir.)

Hey Democoder..

I am going to make another thread with your posts and mine as a prelude - we can take it from there if you like.
This issue is definitely worth talking about in more detail.
 
geo said:
Of course ATI would have to give the definitive answer there. We've already been assured by the NV camp that NV4x had it originally, even tho we didn't see it until 6600.

But wouldn't X1800GTO suggest pretty strongly the other way? Tho I suppose one might argue that X1800GTO is a marketing part, rather than a fall-out part, in which case it was crafted to hit a price/performance point rather than purely for technical "what can we get out of it" performance reasons.

Wouldn't the lack of clock domains in R5xx also suggest the ROPs aren't decoupled? "Well, geo, they don't have separate clock domains for the VS either --suggesting they aren't decoupled?"

Err, okay, so back to asking ATI guys. . .:LOL: In the meantime, I'll have to rely on the fact that we've seen R5xx in a healthy number of flavas by this point, and it seems reasonable to me to expect that if they had that in the toolbox they would have found somewhere in that collection of SKUs to use it to their advantage.


What have clock domains to do with ROPs? The 7800GTX ROPs run at default 430MHz, unlike the VS units.

The X1800GTO is a gap filler for high performance mainstream or lower end high end parts (depending on POV). What else would they use today to react against the 7600GT?
 
trinibwoy said:
Every time this topic comes up, somebody asks this question. But that's too easy an escape. People are simply asking whether R580 would have been more impressive for its time with fewer ALUs and more TMUs. Given the ginormous die size it already has, maybe the trade-off would have resulted in an even bigger and impractical die size, which is fine.
Well that's it. More TMUs would have taken a bigger chunk of the transistor budget than more ALUs. More TMUs might make a more balanced part in certain circumstances, but the transistor cost would have been disproportionately high compared to ALUs, and something would have had to be cut from the transistor budget elsewhere.

It seems a lot of people don't really understand you can't just "add stuff" without losing stuff elsewhere.
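To make that concrete with a toy example (every number below is invented purely for illustration; none of them are real R5xx transistor costs):

```python
# Toy transistor budget: the only point is that a fixed budget forces a choice.
# All figures are made up for illustration and bear no relation to real R5xx costs.

BUDGET = 50.0                                              # arbitrary units of headroom
COST = {"alu": 1.0, "tmu_quad": 15.0, "rop_quad": 10.0}    # assumed relative unit costs

def plan_fits(plan):
    """Return total spend for a plan and whether it fits inside the budget."""
    spent = sum(COST[unit] * count for unit, count in plan.items())
    return spent, spent <= BUDGET

print(plan_fits({"alu": 32}))                                # (32.0, True)
print(plan_fits({"tmu_quad": 2, "rop_quad": 2}))             # (50.0, True)
print(plan_fits({"alu": 32, "tmu_quad": 2, "rop_quad": 2}))  # (82.0, False)
```

Either refresh fits on its own under this made-up budget; asking for both at once doesn't, which is the whole point about having to lose something elsewhere.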

trinibwoy said:
All speculative of course, but it seems to me that R580 will never be a "cheap" card to make and won't gracefully fall into the mainstream - it will just die a high-end death. I really do hope ATi is going to incorporate much of R580 into their USC from a financial standpoint.
Looks like it, particularly the memory controller (which is a big chunk of the transistor budget), and certainly the work they've done on SM3 for things like dynamic branching.

trinibwoy said:
The difference between NV40 and R580 is that NV40's checkbox advantages didn't cost Nvidia much transistor real estate - their recent boom was started with the NV40 and they were at a considerable die size disadvantage! Unfortunately for ATi, SM3.0 and SLI were marketable enough to make up the difference in sales - dynamic branching and ring-bus memory controllers are not :|
Yes, but how much value does "checkbox advantages" give us? Sure, they enable Nvidia to market a load of cards as "new & improved" so it's good for them, but for the customer these marketing checkboxes that give us unusable (and thus unused) features are actually a waste of transistor budget. "Stuff that we can market", as opposed to "stuff that works".
 
Bouncing Zabaglione Bros. said:
Yes, but how much value does "checkbox advantages" give us? Sure, they enable Nvidia to market a load of cards as "new & improved" so it's good for them, but for the customer these marketing checkboxes that give us unusable (and thus unused) features are actually a waste of transistor budget. "Stuff that we can market", as opposed to "stuff that works".

Stuff that can't be used (NV's dynamic branching) is wasted silicon, sure. Stuff that isn't used (ATI's dynamic branching) is also.

I'm still curious if R5xx owners will get to use those branching units via SM4 stuff compiled as SM3. Sure, that doesn't help today's marketing wars, nor the folks who buy a new card every 6 months. However, I still think that isn't the typical model for most people --I think most people keep their top-end cards for a couple of years.
 
geo said:
And if you go back and put your consumer hat on you could just as easily ask "hey, waitaminute, why is it that NV made a significant cost improvement and didn't pass on any of that love to me?"

This is not how the corporate world works. Nvidia answers to their stockholders and not to the consumer. Price cuts aren't made to satisfy the consumer unless there is a benefit to the stockholder. The same goes for ATI.

edited for grammar because i suck at it.
 
Mintmaster said:
R580 = same texturing power as R520 = 1.15-1.3X performance of R520
R580 = same pixel fillrate as R520 = 1.15-1.3X performance of R520
R580 = 10-15% more board cost = 1.15-1.3X performance of R520
G71 = 50% faster texturing than R520 = a bit slower performance than R580

And these numbers are improving all the time. Plus, you have an enormous marketing advantage over R520, which would have been toast compared to G71.
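Reading those ranges as performance per board dollar (a quick sketch of what the quoted figures imply, nothing more):

```python
# What the quoted ranges imply for performance per board cost, R520 = 1.0 baseline.
perf = (1.15, 1.30)   # R580 performance relative to R520 (from the post)
cost = (1.10, 1.15)   # R580 board cost relative to R520 (from the post)

worst = perf[0] / cost[1]   # least favourable pairing: ~1.00x
best  = perf[1] / cost[0]   # most favourable pairing:  ~1.18x
print(f"R580 perf per board dollar vs R520: {worst:.2f}x to {best:.2f}x")
```

In other words, even in the least favourable pairing R580 is no worse per board dollar than R520, with the marketing win on top.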

In the end, I'd say cost is all that matters. If nvidia is dominating in sales and coming close in performance with a part with significantly higher margins, then theirs is the better design.

ATI looks like it's fallen back to the pre-R300 days (minus the fact that ati actually is faster than nvidia now and has decent drivers), where it focuses more on technology checkboxes than on what is actually going to be utilized.

I really wanna see what console devs are going to be able to do with the Xbox 360's GPU, though. I'd assume it is similar in shading-to-texturing balance?

Ironically, the R580 wins against G71 in FEAR by 9% at 1600 and by 14% at 2048 (always at default frequencies) in that FS link you posted.

So would those be memory controller wins?

Let me ask you this: Why does the X1600XT hand the 6600GT its ass on a platter when it has half the texture units? Other games are closer, but that's precisely my point. Things are never as clear cut as you're making them.

Who cares if the X1600XT is drastically more efficient per TMU? It was priced against the X850XT when it came out.

BTW, how come CPUs never get drivers to improve performance? Why don't CPUs need drivers like video cards do? Well, I guess it's because the CPU really is the platform, and drivers are only needed to tell the CPU what to do with additional hardware, but it would still be nice if a new driver could come out for a CPU and deliver a 20% increase in something.
 
geo said:
Stuff that can't be used (NV's dynamic branching) is wasted silicon, sure. Stuff that isn't used (ATI's dynamic branching) is also.

I'm still curious if R5xx owners will get to use those branching units via SM4 stuff compiled as SM3. Sure, that doesn't help today's marketing wars, nor the folks who buy a new card every 6 months. However, I still think that isn't the typical model for most people --I think most people keep their top-end cards for a couple of years.

There's a difference between "stuff that can be used if you want" and "stuff that can't be used even if you want". I know as a customer what I prefer to have: the one that has the potential to be useful to me, not the one that is only useful to the vendor's marketing team.

We are going to see dynamic branching used because it makes life easy for devs, and down the line it will mean that R580 has more longevity than G71. That's also valuable to me as a potential customer.

Are we just an untypical target market because we endeavour to see beyond the marketing and to the facts of the matter?
 
Fox5 said:
In the end, I'd say cost is all that matters. If nvidia is dominating in sales and coming close in performance with a part with significantly higher margins, then theirs is the better design.
You're simply favouring the perspective of the business over the perspective of the consumer. In the long run ATI's design may prove significantly more futureproof, making it a better piece of engineering, and a better choice for the individual consumer.

ATI looks like it's fallen back to the pre-R300 days (minus the fact that ati actually is faster than nvidia now and has decent drivers), where it focuses more on technology checkboxes than on what is actually going to be utilized.
But that's why they delayed migration to SM 3.0, why they spent so many transistors on getting decent dynamic branching. It's Nvidia who used SM 3.0 support as a checkbox feature when the main benefit, dynamic branching, simply wasn't usable on their hardware.

And at the end of the day it's up to developers to utilise the features the hardware presents - you can't use a feature before it's present in the hardware! So there'll always be a period when new features aren't utilised - that doesn't mean they're just 'checkbox' features.
 
Subtlesnake said:
You're simply favouring the perspective of the business over the perspective of the consumer. In the long run ATI's design may prove significantly more futureproof, making it a better piece of engineering, and a better choice for the individual consumer.


But that's why they delayed migration to SM 3.0, why they spent so many transistors on getting decent dynamic branching. It's Nvidia who used SM 3.0 support as a checkbox feature when the main benefit, dynamic branching, simply wasn't usable on their hardware.

And at the end of the day it's up to developers to utilise the features the hardware presents - you can't use a feature before it's present in the hardware! So there'll always be a period when new features aren't utilised - that doesn't mean they're just 'checkbox' features.

And that's where ATI and nvidia are different. Nvidia has always, ever since their first chip, implemented features that weren't usable. It got the devs used to it, and by the time software was out that made use of a feature, hardware that could actually run it was out too. And at the same time, nvidia implemented these features for developers at no real expense to profit margins or to the capabilities of the current hardware. Sure, the discerning consumer may decide "wait a minute, the nvidia card may offer better image quality with these settings turned on, but it's unplayable! At playable settings, brand x's card offers a better experience, even though it lacks unusable feature y!", but most see more features and faster or comparable speed, and ignore the "typical" settings performance. Still, it lets the developers play around with actual hardware as soon as possible.

For ATI, they're always fully implementing some feature that comes at a significant cost to the rest of the chip's abilities and that very often does not end up being used at all; or, if it is used, it isn't until well after the card's usable lifetime, when the rest of the card can no longer keep up with modern requirements even if one particular aspect of it can.
ATI has a rather large list of properly implemented but dead-end features. Even their less drastic choices, such as PS1.4 for the radeon 8500, proved worthless, since by the time any games were out that used PS1.4 the 8500 was barely even able to play them. It could very well turn out to be the same with dynamic branching, where that one particular aspect of the card will hold up well but its texturing or some other abilities will keep it at sub-30 fps at 640x480. The nvidia card may be in a similar situation, perhaps even worse, but it was an immediate win for nvidia when it came out, and neither card will matter by the time the features are widely used. The radeon 9700 pro is the one exception to ati's history, and it can really be argued that its design was conservative feature-wise, which is why it fared better over time. (It gave top performance when it came out, and continued to outperform nvidia's cards later, and I can't think of any situation between two competing cards where the reverse has ever proven true.)
 
It's not, strictly speaking, true that DB is unusable on NV4x and G7x. It's just restricted to usage where multiple similar-but-alternate shaders can be consolidated into one. Here DB acts solely like a "select case"/"switch case" to choose the required code amongst the variations that have been consolidated. Execution isn't meant to vary per-pixel (because performance would, generally, tank).

G7x's revised scheduling (each quad has its own program counter) means that its smaller batches of pixels (compared with NV4x) are more likely to deliver a performance win.

Additionally, as I theorised recently, long and complex shaders that make use of a significant count of temporary registers (e.g. r0 to r7, for the sake of argument) will lead to each batch consisting of fewer pixels (the register file doesn't have the capacity to support the shader's register count for the full complement of pixels). The reduction in batch size effectively makes DB finer-grained, which theoretically increases the chances that per-pixel DB is a performance win.

It's also possible to theorise that NVidia may elect to perform some kind of shader replacement which artificially lowers the number of pixels in a batch, to help in situations where a shader uses DB extensively for performance (rather than as a switch).

All of these DB-enhancing behaviours necessitate a reduction in TMU throughput and increased exposure to texturing latency (since fewer pixels in a batch means that NV4x/G7x's fixed-length shader pipeline is running on a lot of bubbles, i.e. no-ops). So performance is only actually going to be a win if the gains from DB outweigh the losses in texturing/general throughput.
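To make the batch-size argument concrete, a toy model (the register-file capacity, batch cap, and branch take-rate below are placeholders, not real NV4x/G7x figures):

```python
# Toy model of the argument above: heavier temp-register usage shrinks the batch,
# and smaller batches waste less work when a branch is rarely taken.
# All constants are placeholders, not real NV4x/G7x figures.

REGISTER_FILE_SLOTS = 4096   # assumed per-fragment-unit temp register capacity
MAX_BATCH = 1024             # assumed upper bound on pixels per batch

def batch_size(temps_used):
    """Pixels per batch once every in-flight pixel needs `temps_used` temps."""
    return min(MAX_BATCH, REGISTER_FILE_SLOTS // temps_used)

def wasted_batches(batch, take_rate):
    """Fraction of batches forced to execute the branch body because at least
    one of their pixels takes it (one program counter per batch)."""
    return 1 - (1 - take_rate) ** batch

for temps in (4, 8, 16):
    b = batch_size(temps)
    print(f"{temps:2d} temps -> batch of {b:4d} pixels, "
          f"{wasted_batches(b, take_rate=0.001):.0%} of batches pay for the branch")
```

Which is also the trade-off in a nutshell: the finer granularity only comes from keeping fewer pixels in flight, and that is exactly what hurts latency hiding.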

I'm hoping that hardware.fr or 3DCenter.org will tackle this subject empirically before G80 turns up - to see if these ideas have merit.

Jawed
 
Subtlesnake said:
You're simply favouring the perspective of the business over the perspective of the consumer. In the long run ATI's design may prove significantly more futureproof, making it a better piece of engineering, and a better choice for the individual consumer.

And this is why nV kills any competitor in the discrete graphics market.

Subtlesnake said:
But that's why they delayed migration to SM 3.0, why they spent so many transistors on getting decent dynamic branching. It's Nvidia who used SM 3.0 support as a checkbox feature when the main benefit, dynamic branching, simply wasn't usable on their hardware.

And at the end of the day it's up to developers to utilise the features the hardware presents - you can't use a feature before it's present in the hardware! So there'll always be a period when new features aren't utilised - that doesn't mean they're just 'checkbox' features.

Do you really think that the x1900's DB performance will hold up to shaders that require dynamic branching? Those shaders are quite a bit longer and very expensive compared to the shaders we are using now ;) . Even ATi's "improved dynamic branching" performance on the x1900s won't be enough by the time those types of shaders are in heavy use.
 
geo said:
I was going to remonstrate with Trini for extending the recent rash of dual-accounts! :LOL:

I'm kidding. I think. Well, 51% I'm kidding. But if you say it was Trini, then I'm not. ;)

Ummmm, wtf? I'm just a tad more than a little insulted :???: Just because I have previously tried to make some of the same points that he did (in the Oblivion thread), it means I created a random new account with no reputation to say the same thing over again? :rolleyes: Based on that logic I guess I need to group a whole bunch of people on this board into one schizophrenic personality....
 
ondaedg said:
This is not how corporate world works. Nvidia answers to their stockholders and not to the consumer. Price cuts aren't made to satisfy the consumer unless there is a benefit to the stockholder. The same goes for ATI.

Yep, it's all about profit maximization. If two companies produce identical products, the one with the more efficient process or more cost-effective model isn't going to charitably pass that margin on to the consumer.
 