Next NV High-end

Dave Baumann said:
Curious, what do you think it should have "more of" as opposed to shader capability? If I were asked to point out one primary weak point of R520 as a chip I'd point right to the PS ALUs.
Well, since it's pretty much inevitable that the R580 will be running at lower clockspeeds than the current R520, it would make sense to add a few more texture units to even things out, and fewer additional PS units (ex. 24 tex, 32 ps), but you'd need to disconnect the PS units from the texture units, as 24 tex, 48 ps would be unfeasible.
 
Chalnoth said:
Well, since it's pretty much inevitable that the R580 will be running at lower clockspeeds than the current R520, it would make sense to add a few more texture units to even things out, and fewer additional PS units (ex. 24 tex, 32 ps), but you'd need to disconnect the PS units from the texture units, as 24 tex, 48 ps would be unfeasible.

I don't think there is anything inevitable about it; the extra experience with the fabrication process could help them. It's probably an optimistic view to say it will be as fast, but I think the clock speeds will still be high (~600MHz).
 
AlphaWolf said:
I don't think there is anything inevitable about it; the extra experience with the fabrication process could help them. It's probably an optimistic view to say it will be as fast, but I think the clock speeds will still be high (~600MHz).
But adding the new PS units will inevitably increase power consumption (at the same clock speed). Since the R520 is already bumping up against the realistic limit of power consumption for consumer hardware, ATI will have to lower the clock speed of the core to compensate.

With really good cooling, I would expect the R580 to be able to clock just about as high as the R520, but not with normal cooling.
 
Chalnoth said:
Well, since it's pretty much inevitable that the R580 will be running at lower clockspeeds than the current R520, it would make sense to add a few more texture units to even things out, and fewer additional PS units (ex. 24 tex, 32 ps), but you'd need to disconnect the PS units from the texture units, as 24 tex, 48 ps would be unfeasible.

I've no idea where R580 will be clocked, but I doubt it'll end up at less than 600MHz, and even that's a conservative estimate on my part. As for the rest, have a closer look at RV530. The texture to ALU unit relation in RV530, R580 and Xenos is 1:3.
 
Actually, the ALU to texture ratio will be different on those desktop parts compared to Xenos, as Xenos's ALUs will be dealing with VS work (which is less likely to be using texturing as often) and while Xenos's ALUs are Vec4 + Scalar, the desktop parts have the additional ADD and Modifiers.

It's difficult to gauge how texture limited we are with current titles at the moment - looking at shader benchmarks, only the PS1.1/1.4 benchmarks vary that much with texture rate, and we also have texture bandwidth to take into account.
 
Actually, the ALU to texture ratio will be different on those desktop parts compared to Xenos, as Xenos's ALUs will be dealing with VS work (which is less likely to be using texturing as often) and while Xenos's ALUs are Vec4 + Scalar, the desktop parts have the additional ADD and Modifiers.

No doubt about that; yet from a vastly oversimplified POV Xenos also keeps the 3:1 (ALU<-->TMU) relation.

Speaking of texture bandwidth, what puzzles me about R520 and its 48GB/s of raw bandwidth is its performance numbers at ultra-high resolutions. The simplest explanation would be that there are still some "bubbles" in the driver that haven't been detected/removed yet.
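
For reference, the 48GB/s raw figure falls out of bus width times effective memory data rate. A minimal sketch, assuming a 256-bit bus and 750MHz double-data-rate memory (neither number is stated in this thread; the function name is made up for illustration):

```python
def raw_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Peak (raw) memory bandwidth in GB/s, using 1 GB = 1e9 bytes.

    transfers_per_clock=2 models double-data-rate memory.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Assumed R520 XT-style configuration: 256-bit bus, 750MHz DDR memory.
print(raw_bandwidth_gb_s(256, 750))
```

This is a theoretical peak only; it says nothing about how much of it the chip actually sustains at high resolutions, which is the open question here.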
 
Ailuros said:
I've no idea where R580 will be clocked, but I doubt it'll end up at less than 600MHz, and even that's a conservative estimate on my part. As for the rest, have a closer look at RV530. The texture to ALU unit relation in RV530, R580 and Xenos is 1:3.
Well, I don't doubt that the R580 will have a 1:3 tex/ALU ratio, but there's no way it's going to clock so close to the R520.
 
Chalnoth said:
Well, I don't doubt that the R580 will have a 1:3 tex/ALU ratio, but there's no way it's going to clock so close to the R520.

I don't see why it should be impossible; and no, R520 isn't necessarily a valid comparison either.
 
It's still being made on the same process, so power consumption constraints won't allow it to run at the same clockspeeds. You can't simply increase the transistor counts by 33%-50% with dense logic, on the same process, and expect to keep the clockspeed the same, not unless you want to deal with 33%-50% more heat from the chip. There's no way that an R580 will run at the same clockspeed as an R520 without significantly better cooling, which I don't think will happen for a retail part.
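
The argument above can be put into rough numbers. This is only a first-order sketch, assuming dynamic power scales as transistor count × voltage² × clock and ignoring leakage, activity factor, and any process tweaks; the function names are hypothetical:

```python
def relative_power(transistor_scale, clock_scale, voltage_scale=1.0):
    """Dynamic power relative to a baseline chip, assuming P ~ N * V^2 * f."""
    return transistor_scale * voltage_scale ** 2 * clock_scale


def clock_scale_for_same_power(transistor_scale, voltage_scale=1.0):
    """Clock multiplier that keeps power at the baseline level under the same model."""
    return 1.0 / (transistor_scale * voltage_scale ** 2)


# Transistor growth figures taken from the discussion: 25%, 33%, 50%.
for growth in (0.25, 0.33, 0.50):
    n = 1.0 + growth
    print(
        f"+{growth:.0%} transistors: "
        f"power x{relative_power(n, 1.0):.2f} at the same clock, "
        f"clock x{clock_scale_for_same_power(n):.2f} for the same power"
    )
```

Under this simple model, a small voltage reduction buys back a lot of clock because of the squared term, which is one reason a maturing process can change the picture.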
 
Chalnoth said:
It's still being made on the same process, so power consumption constraints won't allow it to run at the same clockspeeds. You can't simply increase the transistor counts by 33%-50% with dense logic, on the same process, and expect to keep the clockspeed the same, not unless you want to deal with 33%-50% more heat from the chip. There's no way that an R580 will run at the same clockspeed as an R520 without significantly better cooling, which I don't think will happen for a retail part.

How in the world did you get to that estimation? You're putting forward 430-480M transistors as your estimation of R580?
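
The 430-480M range is just the 33%-50% figure applied to R520's transistor count. A quick check, assuming the commonly quoted ~321M transistors for R520 (that number is not given in this thread):

```python
R520_TRANSISTORS_M = 321  # assumption: commonly quoted figure, in millions


def scaled_count(base_millions, growth):
    """Transistor count (in millions) after growing the base by the given fraction."""
    return base_millions * (1.0 + growth)


# Growth figures from the thread: 25% (the later guess), 33% and 50% (the range in dispute).
for growth in (0.25, 0.33, 0.50):
    print(f"+{growth:.0%}: ~{scaled_count(R520_TRANSISTORS_M, growth):.0f}M transistors")
```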
 
I'd rather think that ATI was initially aiming for even higher clockspeeds for R520. A part that went through multiple re-spins in order to get as close as possible to the initial target frequency is hardly an indication to go by for follow-up parts. Especially since there aren't any soft ground related rumours floating around concerning R580.

Way too many people made the mistake of thinking that it would be impossible for NV to squeeze 6 quads into 110nm. Not only did they manage it (and yes, of course it is a smaller process than 130nm), but it is also clocked higher than NV40 and consumes less power. Ironically, I myself thought in the past that it was impossible, mostly because I wasn't aware of some highly important details. I wasn't expecting 302M transistors for that one either.
 
geo said:
How in the world did you get to that estimation? You're putting forward 430-480M transistors as your estimation of R580?

I'd guess more like 25% more transistors, but that's beside the point.
 
Ailuros said:
Way too many people made the mistake of thinking that it would be impossible for NV to squeeze 6 quads into 110nm.
Except the G70 was also on a smaller process than the NV40, and clocked only very slightly higher.
 
Chalnoth said:
It's still being made on the same process, so power consumption constraints won't allow it to run at the same clockspeeds. You can't simply increase the transistor counts by 33%-50% with dense logic, on the same process, and expect to keep the clockspeed the same, not unless you want to deal with 33%-50% more heat from the chip. There's no way that an R580 will run at the same clockspeed as an R520 without significantly better cooling, which I don't think will happen for a retail part.

If you told us that R580 "may" not hit the same clocks as R520, a lot of people could've lived with it. However, the way you compare A vs. B based solely on transistor counts and draw conclusions from that should tell you that you're a long way off from real process design and execution.

R580 is not just R520 with added complexity, and R520 is not running at its peak speeds either. Second, it's on the same process, yes, but the process itself matures over time, which is another variable you have to throw into your "assumption". If you think about the hints there have been about ATi's mobile parts, which will be based on R580 rather than R520, it should at least tell you that there is more to it than just complexity.

Also, the cooling solution used for the R520 XT SKU is a lot more capable than you might think; actually, I think it's one of the best cooling solutions delivered to date, if you take into account its heat dissipation ability.

Finally, let me again emphasize that you can't draw conclusions about clock speeds in general the way you just did, because there is a lot more to it than just adding 1+1 and getting the result. Chip design is not just about plain mathematics; it's like "expecting" the unexpected.
 
Actually, even "the same process" isn't necessarily the case - outwardly R300, R350, R360 were all "the same" 150nm process but people may recall sireric's listing of the tweaks made to those processes over time that allowed for the increases in clock. Given the issues ATI have had with their 90nm adoption (not specifically with the processes, but surrounding issues) and the proximity of R580 to the final R520 in development I doubt that this will play a part with R580, but "the same process" doesn't necessarily tell everything - I'd say that G70 probably uses a different mix of 110nm parameters and tweaks to the 110nm process ATI first used with RV370.
 
Dave Baumann said:
Actually, even "the same process" isn't necessarily the case - outwardly R300, R350, R360 were all "the same" 150nm process but people may recall sireric's listing of the tweaks made to those processes over time that allowed for the increases in clock. Given the issues ATI have had with their 90nm adoption (not specifically with the processes, but surrounding issues) and the proximity of R580 to the final R520 in development I doubt that this will play a part with R580, but "the same process" doesn't necessarily tell everything - I'd say that G70 probably uses a different mix of 110nm parameters and tweaks to the 110nm process ATI first used with RV370.

Dave obviously got what I was hinting at.

R300/R350/R360 used 2 different process targets back then, even if it was considered to be the same process. With R300 they started with TSMC's high performance node (which was delay optimized) and continued with an ultra high-speed copper process target, used for R350/R360. They wouldn't have even come close to those speeds if the latter hadn't been that mature, but 150nm improved a lot over time.
 
Dave Baumann said:
Actually, even "the same process" isn't necessarily the case - outwardly R300, R350, R360 were all "the same" 150nm process but people may recall sireric's listing of the tweaks made to those processes over time that allowed for the increases in clock. Given the issues ATI have had with their 90nm adoption (not specifically with the processes, but surrounding issues) and the proximity of R580 to the final R520 in development I doubt that this will play a part with R580, but "the same process" doesn't necessarily tell everything - I'd say that G70 probably uses a different mix of 110nm parameters and tweaks to the 110nm process ATI first used with RV370.
Yeah. I just highly doubt that there will be enough efficiency gain in the process for the clockspeed not to drop.
 
Chalnoth said:
Yeah. I just highly doubt that there will be enough efficiency gain in the process for the clockspeed not to drop.

Wouldn't that all depend on whether the XT's clockspeeds are artificially locked at these speeds by ATI so they can release a faster part, or whether these clockspeeds are the best they could obtain with the highest yields?

I don't see why, 6 months from now when the R580 comes out, they can't clock the same or higher with another set of quads. They would have had 6 months or so of tweaking the process towards the cards, and made respins to get thermal output or power consumption lower.
 