Next NV High-end

I don't think yields are the problem, though. Power consumption is. It's not a question of whether or not the signal is getting through all of the pipelines in time for the next clock cycle to start, it's a question of how much power the higher clock speed requires.
 
Ailuros said:
I'd rather think that ATI was initially aiming for even higher clockspeeds for R520. A part that went through multiple re-spins in order to reach the closest possible frequency to the initial target is hardly an indication to go by for follow-up parts. Especially since there aren't any soft-ground-related rumours floating around concerning R580.
I don't know whether ATI was aiming higher, but I still remember their frustration when they had the softground issue and due to clocks their XT part didn't even surpass the GTX. But you are indeed right, the R520 is no indication for future parts.
 
Chalnoth said:
Except the G70 was also on a smaller process than the NV40, and clocked only very slightly higher.

There's no use for selective quoting, when I in fact mentioned it myself in my own post.

Yeah. I just highly doubt that there will be enough efficiency gain in the process for the clockspeed not to drop.

Where's your guarantee that low-k 90nm can only tolerate a specific frequency A with chip complexity B in any case? Part of the reason why I don't agree here is what I can sense will come from NVIDIA's side both in terms of chip complexity and frequency in low-k 90nm.

***edit: and to further elaborate on that one, while transistor count is more or less in the same ballpark between R520 (low-k 90nm) and G70 (110nm), both upcoming high-end refreshes of those two are most likely going to use low-k 90nm. What won't change there, however, is NVIDIA using lower frequencies and higher amounts of units, in contrast to ATI, which uses higher frequencies and lower amounts of units (where I wouldn't be surprised in the least to see transistor counts again being fairly close between G7x/R580). That IS vastly oversimplified, but it should be good enough to get the message across.
 
Chalnoth said:
I don't think yields are the problem, though. Power consumption is. It's not a question of whether or not the signal is getting through all of the pipelines in time for the next clock cycle to start, it's a question of how much power the higher clock speed requires.

Has anybody downclocked the XT's memory to see how power consumption splits between GPU and memory on the XT? Or comparing the 256MB XT vs the 512MB XT might be instructive on that point as well.
 
Ailuros said:
What won't change there however is NVIDIA using lower frequencies and higher amounts of units in contrast to ATI which uses higher frequencies and lower amounts of units (where I wouldn't be surprised in the least to see again transistor counts being fairly close between G7x/R580).

It won't change? Admittedly I'm easily confounded by the technical jargon, but hasn't it been confirmed that R580 will triple those units to 48 (three shader processors per ROP)? Dave was recently alluding to how a higher shader processor count and lower clocks would allow for improved thermal characteristics of a mobile R580 relative to R520 (which apparently won't have a mobile version).
 
If you assume 90nm G7x has more than 6 quads (and the CW seems to have its money firmly on this proposition), then it still remains true that it has more units and lower frequencies (probably) than R580.

If I've digested this correctly, "apples to apples" R580 vs G70 is 48 vs 48 on ALUs
 
Hehe....my enzymes are slow when it comes to hardcore GPU architecture, so I'm still digesting.

From what we learned in Dave's R520 review, R580 should have 16 ROPs, 16 texture units, 48 shader units, and 16 of that Z/Stencil stuff (:oops:).

Excuse the oversimplification, but what would the corresponding "higher unit" expectation be for the 90nm G70 refresh? And let's keep those frying pans in check, s'il vous plait. :LOL:
 
kemosabe said:
Hehe....my enzymes are slow when it comes to hardcore GPU architecture, so I'm still digesting.

From what we learned in Dave's R520 review, R580 should have 16 ROPs, 16 texture units, 48 shader units, and 16 of that Z/Stencil stuff (:oops:).

Excuse the oversimplification, but what would the corresponding "higher unit" expectation be for the 90nm G70 refresh? And let's keep those frying pans in check, s'il vous plait. :LOL:

. . . and 8 VS (I think -- though it hasn't been explicitly addressed).

If you assume 8 ps quads (the CW's bet at the moment) for 90nm high-end NV G7x, my understanding (note danger words) at the moment is that it would be:

16 ROPs, 32 texture units, 64 shader units, and 8 VS. I *think* "the Z/stencil stuff" doesn't have unit-count implications per se, but may have (probably has?) implications for the number of transistors in individual units (ROPs?).

Edit: And, come to think of it, it seems to me I may have heard some speculation -- but not yet rising to the level of joining the CW's considered opinion on the matter -- that 10 VS could happen, if NV is tired of being punked by pure clock-rate there. But then they've apparently started clocking the VS separately, so mebbee not, or maybe it depends on just how big they see the clock deficit being.
 
I think nV is now forced to do something about HDR+AA, though I have no idea if it will be something HW-related or rather some dirty hack. That might consume a bunch of trannies as well as force them into some architectural decisions we won't be expecting IMHO.

At least I'd like to see a show like that :)
 
Geo, please bear with me as I try to understand the ramifications of these increases in shader units. If rumoured specs are accurate, ATI would be going from 16 units (R520) to 48 units (R580) while NVDA would be going from 48 units (G70) to 64 units (G7X)? Would the higher core clock of R520 alone account for the roughly equivalent performance in shader-intensive games despite the far lower shader unit count? And considering the relative increases in shader processors slated for each refresh (assuming just for the sake of argument that all else remains pretty much the same including core clocks), would that suggest that R580 would stand to achieve a relatively higher performance boost over R520 than G7X would over G70?
 
_xxx_ said:
I think nV is now forced to do something about HDR+AA, though I have no idea if it will be something HW-related or rather some dirty hack. That might consume a bunch of trannies as well as force them into some architectural decisions we won't be expecting IMHO.

At least I'd like to see a show like that :)

Well, I would think that you'd have to wait until the G80 refresh (i.e. the refresh of G80) at the very earliest for a response to HDR + AA in hardware from NV. I would think that G80 is utterly set in stone by now. Of course, there's a good chance that G80 was always going to support HDR + AA, so we'll likely never know exactly how it all happened.

It'll be interesting to see what NV releases this year. There will definitely be a 512MB 7800, but will it have a much higher core clock? If you use NV40 as a reference point, you can see that NV haven't been too worried about achieving full high-end bragging rights so long as their product is competitive, and 7800 is certainly that, especially if you take pricing into consideration. My feeling is that there will be a 512MB 7800 with higher core and memory speeds, but that it will still be called the GTX. I reckon this because I'm not sure that NV can produce a G70-based board that will conclusively take the performance lead, and I don't think they'll release an 'ultra' (or whatever they want to call it) unless they can do just that. Better to release a slightly quicker GTX core with 512MB of much faster memory and not get into a pissing contest with ATI (who have proven with the X800 XT PE that they are quite willing to put out a product essentially incompatible with their production yields just to get the PR performance lead).

Oh, and for the record, my expectation is that R580 will be clocked as high or higher than R520. So there.
 
geo said:
If you assume 90nm G7x has more than 6 quads (and the CW seems to have its money firmly on this proposition), then it still remains true that it has more units and lower frequencies (probably) than R580.

If I've digested this correctly, "apples to apples" R580 vs G70 is 48 vs 48 on ALUs
CW? :???:
 
geo said:
If I've digested this correctly, "apples to apples" R580 vs G70 is 48 vs 48 on ALUs

Well not really.

Yes they both could do 48 MADs per clock, but shaders aren't only about that.
Throw in some ADDs and Texture fetches with AF and G70 will have a serious per clock deficit.
 
Hyp-X said:
Well not really.

Yes they both could do 48 MADs per clock, but shaders aren't only about that.
Throw in some ADDs and Texture fetches with AF and G70 will have a serious per clock deficit.
However, G70 can do more texture fetches per clock, at the cost of MADs.
 
kemosabe said:
It won't change? Admittedly I'm easily confounded by the technical jargon, but hasn't it been confirmed that R580 will triple those units to 48 (three shader processors per ROP)? Dave was recently alluding to how a higher shader processor count and lower clocks would allow for improved thermal characteristics of a mobile R580 relative to R520 (which apparently won't have a mobile version).

If you had read the thread further up, you would have seen that I mentioned an expected 3:1 ALU:TMU ratio on R580; however, R580 will still have 16 SIMD channels, and that's what I was referring to.
 
kemosabe said:
Geo, please bear with me as I try to understand the ramifications of these increases in shader units. If rumoured specs are accurate, ATI would be going from 16 units (R520) to 48 units (R580) while NVDA would be going from 48 units (G70) to 64 units (G7X)?

Counting ALUs might confuse even more, because it's more important what each unit is capable of.

R520 = 16 * 4 MADDs = 64 MADDs/clk
G70 = 24 * 8 MADDs = 192 MADDs/clk

Now if R580 and R520 ALUs will have the same capabilities, then it might be:

R580 = 48 * 4 MADDs = 192 MADDs/clk

***edit: those are without clock frequency taken into account, obviously.


Would the higher core clock of R520 alone account for the roughly equivalent performance in shader-intensive games despite the far lower shader unit count? And considering the relative increases in shader processors slated for each refresh (assuming just for the sake of argument that all else remains pretty much the same including core clocks), would that suggest that R580 would stand to achieve a relatively higher performance boost over R520 than G7X would over G70?

One of the most important differences between the two competing architectures is that Radeons have texture and ALU ops de-coupled.

While there is a patent from NVIDIA that illustrates a quite similar technique, it's still unknown when and where it'll be incorporated in their GPUs.

The answer to your last question is IMHO no ;)

geo said:
If you assume 90nm G7x has more than 6 quads (and the CW seems to have its money firmly on this proposition), then it still remains true that it has more units and lower frequencies (probably) than R580.

If I've digested this correctly, "apples to apples" R580 vs G70 is 48 vs 48 on ALUs

R580 will have a frequency advantage though against G70.

Theoretical example:

192 MADDs/clk * 0.6 GHz = 115.2 GMADDs/s = 230.4 GFLOPs

192 MADDs/clk * 0.43 GHz = 82.56 GMADDs/s = 165.12 GFLOPs
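The peak-throughput arithmetic above can be sketched as a quick back-of-the-envelope calculation. A minimal sketch, assuming the speculative unit counts and clocks from this thread, and counting one MADD (multiply-add) as 2 FLOPs:

```python
def peak_gflops(alu_count, madds_per_alu, clock_ghz):
    """Peak shader throughput: ALUs * MADDs/ALU * clock, with MADD = 2 FLOPs."""
    madds_per_clock = alu_count * madds_per_alu
    gmadds_per_sec = madds_per_clock * clock_ghz
    return gmadds_per_sec * 2

# Speculative R580: 48 ALUs * 4 MADDs/clk at 600 MHz -> ~230.4 GFLOPs
print(round(peak_gflops(48, 4, 0.6), 2))
# Speculative G70: 24 pipes * 8 MADDs/clk at 430 MHz -> ~165.12 GFLOPs
print(round(peak_gflops(24, 8, 0.43), 2))
```

This is only the marketing-style peak number, of course; as noted above, real shaders mix in ADDs and texture fetches, so neither chip gets anywhere near it in practice.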
 
I do have to mention that, much like a 16 tex, 48 ALU architecture, I think a 32 tex, 64 ALU architecture would be similarly unbalanced (given memory bandwidth constraints).

If nVidia were to keep 24 texture units with the G75 and somehow beef up the ALUs, it might make for a better usage of transistors, though I do question how well they could make use of additional ALU power without decoupling the pipelines, so it may make the most sense to just add a couple more quads to the G70 for the 90nm incarnation.
 