Anand talk R580

Somewhat OT, but regarding G70 - each group of 4 shader units works on a single quad each cycle, correct?

One way to improve dynamic branching performance might be to scan a batch for "active quads" (those containing at least 1 pixel executing the current instruction in the batch) ahead of issue time, and at a rate greater than the issue rate (again, from what I understand, 1 quad per cycle per 4 shader units).

To cope with memory latency, you'd still want to be able to handle multiple batches, but maybe one could get away with a much smaller number of threads (say 4 batches of 64 pixels per group of 4 shader units, or 2 batches of 128 pixels) and still get quite a decent speedup...
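
A very rough sketch of what I mean, as Python pseudologic (the names and structure here are made up for illustration, not how any real hardware works):
Code:
def active_quads(batch, instruction):
    """Yield only the quads with at least one pixel taking this instruction."""
    for quad in batch:                                    # a quad = 4 pixels
        if any(pixel.takes(instruction) for pixel in quad):   # hypothetical predicate check
            yield quad

def issue_batch(batch, instruction, shader_group):
    # The scan can run ahead of issue time and faster than the issue rate
    # (1 quad per cycle per group of 4 shader units), so quads with no
    # active pixels never consume an issue slot.
    for quad in active_quads(batch, instruction):
        shader_group.issue(quad, instruction)             # hypothetical issue hook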
 
Chalnoth said:
Personally, I think it's unlikely that a 90nm G7x will be able to be that fast. As such, nVidia will really need to be working on some significant core changes if they want their 90nm product to beat ATI's R580 across the board.

If Nvidia thinks it can compete with the R580 with minor changes (adding more pipes) and clocking it higher, then they have already lost. Adding 8 pipes to the G70's 24 pipes isn't going to affect its performance as much as adding 8 pipes to the 16-pipe R520. The fact that a 16-pipe R520 is in the same ballpark as a 24-pipe G70 should send shivers up the spines of Nvidia's engineers.
 
Junkstyle said:
If Nvidia thinks it can compete with the R580 with minor changes (adding more pipes) and clocking it higher, then they have already lost. Adding 8 pipes to the G70's 24 pipes isn't going to affect its performance as much as adding 8 pipes to the 16-pipe R520. The fact that a 16-pipe R520 is in the same ballpark as a 24-pipe G70 should send shivers up the spines of Nvidia's engineers.
More than that, it's not a 24 "pipe" part. 16-1-3-1/2: ROPs, texture units per ROP, shader units per ROP, Z/stencil multiplier per ROP.
 
Junkstyle said:
If Nvidia thinks it can compete with the R580 with minor changes (adding more pipes) and clocking it higher, then they have already lost. Adding 8 pipes to the G70's 24 pipes isn't going to affect its performance as much as adding 8 pipes to the 16-pipe R520. The fact that a 16-pipe R520 is in the same ballpark as a 24-pipe G70 should send shivers up the spines of Nvidia's engineers.

Adding 8 pipes to the 6800 made the G70 GTX 256MB 100% faster than the 6800 in Shadermark tests. Bumping the clock rate by 27% made the GTX 512MB 25% faster in most Shadermark tests. (Dave's article)

I look at it this way. The X1800XT has a 45% higher clock rate than the GTX, and the GTX has 50% more pipelines. They are roughly balanced, and I don't see anything "amazing" about the R520's 16 pipelines at a 45% higher clock hanging in the ballpark with the G70's 50% more pipelines, or vice versa.

If the R580 triples the number of quads, they will increase the shader workload they can handle from X to 3X. Likewise, if you take a G70 256MB, add 33% more pipes (32 pipes), and 27% higher clocks, your shader throughput will go from X to 2.53X (each additional G70 ALU is like 1.5 ALUs).

I don't see it being a slam dunk either way, but anyway, don't discount the G80.
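
Spelled out as a rough back-of-the-envelope sketch (all numbers speculative, and the 1.5x weighting is only as solid as the assumption behind it):
Code:
# Speculative scaling arithmetic for the figures above (not measured data).
r580_scaling = 3.0                       # triple the quads -> 3X shader workload

pipe_scaling  = 32 / 24                  # G70: 24 -> 32 pipes (~33% more)
alu_weight    = 1.5                      # counting each G70 ALU as ~1.5 ALUs
clock_scaling = 1.27                     # 27% higher clock

g7x_scaling = pipe_scaling * alu_weight * clock_scaling
print(round(r580_scaling, 2), round(g7x_scaling, 2))   # 3.0 vs ~2.54 (the ~2.53X above)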
 
Junkstyle said:
If Nvidia thinks it can compete with the R580 with minor changes (adding more pipes) and clocking it higher, then they have already lost. Adding 8 pipes to the G70's 24 pipes isn't going to affect its performance as much as adding 8 pipes to the 16-pipe R520. The fact that a 16-pipe R520 is in the same ballpark as a 24-pipe G70 should send shivers up the spines of Nvidia's engineers.

That's too oversimplified for my taste, since ATI's competing products are clocked significantly higher than their counterparts from NVIDIA. I can see the 7800GT and X1800XL being more or less in the same performance ballpark.

Above that, you have the widely available 256MB 7800GTX@430MHz sitting a slight notch below the X1800XT on average, while the just-announced 512MB 7800GTX@550MHz ends up a slight notch above the still limited-availability X1800XT@625MHz, the highest R520.

How you suddenly come to any preliminary conclusion with so little known right now about the coming products is a bit beyond me. Assume nothing has changed compared to current products, and here is just a sterile calculation that affects only one part of the pipeline:

48 * 4 MADDs = 192 MADDs/clk; 192 MADDs/clk * 0.675GHz = 130 GMADDs/s = 260 GFLOPs

32 * 8 MADDs = 256 MADDs/clk; 256 MADDs/clk * 0.6GHz = 154 GMADDs/s = 307 GFLOPs

As you can see, it comes down to final clock speed for either one, under the presupposition that nothing will have changed in either architecture's ALUs. And yes, that is a lopsided, speculative, MADD-only illustration, but I can't necessarily find anything that signifies a disadvantage yet.

I personally doubt that NV will be able to reach 700MHz frequencies that easily (at least right from the start), yet if they do, then they're far from being in trouble.
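
Or the same numbers as a quick sketch (the ALU counts and clocks are purely speculative placeholders):
Code:
# MADD-only peak throughput sketch; ALU counts and clocks are speculation, not specs.
def gflops(alus, madds_per_alu, clock_ghz):
    madds_per_clk = alus * madds_per_alu         # MADDs per clock
    gmadds_per_s  = madds_per_clk * clock_ghz    # billions of MADDs per second
    return 2 * gmadds_per_s                      # one MADD = 2 FLOPs (multiply + add)

print(gflops(48, 4, 0.675))   # ~259 GFLOPs  (48 ALUs x 4 MADDs @ 675MHz)
print(gflops(32, 8, 0.6))     # ~307 GFLOPs  (32 ALUs x 8 MADDs @ 600MHz)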
 
DemoCoder, your math is only correct if each of the 32 "pipes" also has 1.5 times the throughput of an old G70 "pipe".
 
Ailuros said:
That's too oversimplified for my taste, since ATI's competing products are clocked significantly higher than their counterparts from NVIDIA. I can see the 7800GT and X1800XL being more or less in the same performance ballpark.

Above that, you have the widely available 256MB 7800GTX@430MHz sitting a slight notch below the X1800XT on average, while the just-announced 512MB 7800GTX@550MHz ends up a slight notch above the still limited-availability X1800XT@625MHz, the highest R520.

How you suddenly come to any preliminary conclusion with so little known right now about the coming products is a bit beyond me. Assume nothing has changed compared to current products, and here is just a sterile calculation that affects only one part of the pipeline:

48 * 4 MADDs = 192 MADDs/clk; 192 MADDs/clk * 0.675GHz = 130 GMADDs/s = 260 GFLOPs

32 * 8 MADDs = 256 MADDs/clk; 256 MADDs/clk * 0.6GHz = 154 GMADDs/s = 307 GFLOPs

As you can see, it comes down to final clock speed for either one, under the presupposition that nothing will have changed in either architecture's ALUs. And yes, that is a lopsided, speculative, MADD-only illustration, but I can't necessarily find anything that signifies a disadvantage yet.

I personally doubt that NV will be able to reach 700MHz frequencies that easily (at least right from the start), yet if they do, then they're far from being in trouble.


ATI may depend on higher clocks, but Nvidia is getting closer. Never mind the memory being 200MHz higher; we're at a mere 75MHz spread for the roughly 18% average increase across the board in all titles with the Ultra compared to the XT. Add to that that Nvidia is running a 2-TMU architecture in addition to the 24 pixel pipelines, and is it really a surprise? I think ATI is faring well with the route they took, to be honest. Had this card met its early-summer launch, I think we'd all be in awe of ATI. It launched late, however, but to me that doesn't diminish what ATI chose to do architecturally, and how it stands up in today's games. ATI still holds the lead in FEAR and SC:CT, so again, as titles become more shader limited, I wonder if that's any indication of the trends we will see.

A small note as well, since I'm sure it's been mentioned: there are reports that TSMC is setting up an 80nm fab and that the R580, RV535 and RV560 will be using it not too long after launch. What's the R600 speculation on die size? Is 80nm the fair assumption for that core as well, or is 65nm a possible reality this time next year for graphics cores?

http://www.xbitlabs.com/news/other/display/20051111233347.html
http://www.anandtech.com/video/showdoc.aspx?i=2596
 
The question mark I have for the hypothetical 8-quad scenarios is what on God's green earth we'd need 32 TMUs for exactly; granted, it's probably the easiest way to get there given the current layout of G7x, but I'm wondering if something like a 4-way TMU array capable of 16 trilinear/32 bilinear texels would make way more sense.
 
Ailuros said:


I will kindly point out that they did not disable the FEAR mis-optimization that ATI has running for their cards in that driver, for that particular game.

http://www.driverheaven.net/reviews/7800512dhreview/fear.htm

And to be honest, while I can be criticized for it, Nvidia charging 700 dollars for a card that performs 1 frame faster in SC with AA enabled (the only tests I look at; you don't buy cards like that with no AA) is a loss. We cannot doubt the fact that Nvidia is indeed taking advantage of the "performance" improvement by charging a large premium price. Purely benchmark-wise, hey, we can argue in circles about what ATI could do with 1.1ns memory or a core clocked to about 700MHz. Faster cards will be coming soon.

I still see ATI with a significant lead in shader-limited titles, as well as those to come.
 
DemoCoder said:
Adding 8 pipes to the 6800 made the G70 GTX 256MB 100% faster than the 6800 in Shadermark tests. Bumping the clock rate by 27% made the GTX 512MB 25% faster in most Shadermark tests. (Dave's article)

I look at it this way. The X1800XT has a 45% higher clock rate than the GTX, and the GTX has 50% more pipelines. They are roughly balanced, and I don't see anything "amazing" about the R520's 16 pipelines at a 45% higher clock hanging in the ballpark with the G70's 50% more pipelines, or vice versa.

If the R580 triples the number of quads, they will increase the shader workload they can handle from X to 3X. Likewise, if you take a G70 256MB, add 33% more pipes (32 pipes), and 27% higher clocks, your shader throughput will go from X to 2.53X (each additional G70 ALU is like 1.5 ALUs).

I don't see it being a slam dunk either way, but anyway, don't discount the G80.

Umm, do you think the 7800 is just a 6800 with 8 more pipes? You're crazy; there are many other core tweaks that factor into the Shadermark tests.

What I personally find amazing is that even though the 7800GTX 512MB closes the clock gap considerably and surpasses the X1800XT on memory, it didn't gain a whole lot of ground on it. Sure, it's faster for the most part, but not that much faster. And when you compare the specs it's rather shocking; I'd personally expect the 7800GTX 512MB to be bitch-slapping the X1800XT, but that is most certainly not the case.

Also, do you think the G80 will be out in time for the R580? I personally don't think so. I don't see Nvidia really having anything to counter the R580, and if ATi gets it out in time they'll have a good high-end card in their hands. Too bad I think it'll be too late, and with Vista coming (relatively) soon after, you have to wonder how soon both companies will launch their totally new cards.

I honestly see ATi in a very good position; if they hadn't had delays they'd be kicking NV's ass. But then again, NV made several very wise business choices that have put them in a good spot. From my viewpoint the ball is in ATi's court, and if they play it right they could seriously make a move on NV. Of course, as of late ATi has shown their ball-handling skills need some work...
 
I still speculate that every core Nvidia launches between now and the flagship core they launch for Vista will be GXX-codenamed, but that the true flagship Vista core, the fully DX10-compliant part, will be labelled NV50. So if there is a G80, I wonder what it may be other than a further modification of current technology. My conspiracy theory :)
 
SugarCoat said:
I still speculate that every core Nvidia launches between now and the flagship core they launch for Vista will be GXX-codenamed, but that the true flagship Vista core, the fully DX10-compliant part, will be labelled NV50. So if there is a G80, I wonder what it may be other than a further modification of current technology. My conspiracy theory :)

Is NV50 Nvidia's R400? Lol.
 
DemoCoder said:
Adding 8 pipes to the 6800 made the G70 GTX 256MB 100% faster than the 6800 in Shadermark tests. Bumping the clock rate by 27% made the GTX 512MB 25% faster in most Shadermark tests. (Dave's article)
Nope, G70 pipes and NV40 pipes are almost exactly the same in Shadermark, per pipe per clock, on average. (I assume you mean 6800GT or 6800U)
Code:
              7800 GTX      6800 Ultra    % increase    ratio, PPPC
shader 2      1437          958           50.00%        0.930232558
shader 3      1196          778           53.70%        0.953348877
shader 4      1196          777           53.90%        0.954575838
shader 5      1077          698           54.30%        0.956886786
shader 6      1196          778           53.70%        0.953348877
shader 7      1077          658           63.70%        1.015056196
shader 8      778           419           85.70%        1.15150506
shader 9      1476          1075          37.30%        0.85148729
shader 10     1316          838           57.00%        0.973895025
shader 11     1196          718           66.60%        1.033015914
shader 12     718           479           49.90%        0.929585215
shader 13     703           421           67.00%        1.035555801
shader 14     778           479           62.40%        1.00726643
shader 15     539           329           63.80%        1.015998681
shader 16     568           359           58.20%        0.981192373
shader 17     777           444           75.00%        1.085271318
shader 18     89            56            58.90%        0.985603544
shader 19     297           178           66.90%        1.03475307
shader 20     99            64            54.70%        0.959302326
shader 21     95            92            3.30%         0.640377486
shader 22     238           120           98.30%        1.22997416
shader 23     279           133           109.80%       1.300926735
shader 24     180           82            119.50%       1.361315939
shader 25     174           99            75.80%        1.089969462
shader 26     166           94            76.60%        1.095167409
Data from Dave's 7800 preview.
Last column = (7800GTX score / 1.5 / 430) / (6800U score / 400)

Geometric mean of last column: 1.011

1% increase on average. Removing the shader 21 anomaly (a dynamic branching shader, which makes no sense to me) makes it 3%.
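
For reference, the same calculation as a quick Python sketch (scores copied from the table above):
Code:
from math import prod

# (7800 GTX, 6800 Ultra) Shadermark scores for shaders 2..26, from the table above.
scores = [
    (1437, 958), (1196, 778), (1196, 777), (1077, 698), (1196, 778),
    (1077, 658), (778, 419), (1476, 1075), (1316, 838), (1196, 718),
    (718, 479), (703, 421), (778, 479), (539, 329), (568, 359),
    (777, 444), (89, 56), (297, 178), (99, 64), (95, 92),
    (238, 120), (279, 133), (180, 82), (174, 99), (166, 94),
]

# Per-pipe, per-clock ratio: the GTX has 1.5x the pipes (24 vs 16) at 430MHz,
# the 6800 Ultra runs at 400MHz.
pppc = [(gtx / 1.5 / 430) / (u / 400) for gtx, u in scores]

geomean = prod(pppc) ** (1 / len(pppc))
print(round(geomean, 3))   # ~1.01, i.e. roughly the same throughput per pipe per clock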

In FEAR (with a renamed exe), the X1600XT beats the 6800GS (and thus the 6800GT) even without AA. A 32-pipe G70 @ 700MHz would quadruple the 6800GT's score. R580 @ 590MHz should quadruple the X1600XT's score. In all likelihood, R580 will have the higher clocks, so you can expect ATI to win by around 30%.

Of course, there's the odd game where RV530 gets a little over half the framerate of the 6800GS. Then 4x RV530 brings you right up to R520, i.e. all those math units are doing nothing. And this comparison has a large margin of error, because ATI usually pares down the HiZ and other optimizations for lower-end parts.

Basically I'm just telling you what we all know: it all depends on the pixel shader workload. Obviously, if the 90nm 32-pipe G70 has twice the texturing rate of the R580, it will win sometimes.
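
The quadrupling above is just units times clocks; as a sketch (the unit counts and clocks here are my assumptions, not measured figures):
Code:
# Rough scaling estimate: (shader units ratio) x (clock ratio). Assumed figures:
# 6800GT = 16 pipes @ 350MHz, hypothetical G70 = 32 pipes @ 700MHz,
# X1600XT (RV530) = 12 pixel shader units @ 590MHz, R580 = 48 units @ 590MHz.
def scaling(units_new, units_old, clk_new, clk_old):
    return (units_new / units_old) * (clk_new / clk_old)

print(scaling(32, 16, 700, 350))   # 4.0 -> "quadruple the 6800GT's score"
print(scaling(48, 12, 590, 590))   # 4.0 -> "quadruple the X1600XT's score"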
 
TBH, after seeing some of these benchmarks I don't think an overclocked XT would be that far behind the Ultra, if at all.
 
SugarCoat said:
I will kindly point out that they did not disable the FEAR mis-optimization that ATI has running for their cards in that driver, for that particular game.
What do you mean? Disabling Catalyst A.I. will give better performance in this game?

Never mind. I've read the DriverHeaven review now.
 
Mintmaster said:
Nope, G70 pipes and NV40 pipes are almost exactly the same in Shadermark, per pipe per clock, on average. (I assume you mean 6800GT or 6800U)

Nope what? Did you ever bother to read my message? I never said anything about per-pipe per-clock improvements.

Dave's article on the 512mb GTX http://www.beyond3d.com/previews/nvidia/78512/index.php?p=05#ps

The GTX scaling is exactly in line with theory. The GTX is 100% faster than the 6800U, and the 512MB GTX is 25% faster than the 256MB version, so adding pipes and increasing clocks produces results exactly as one would expect.
 
Mintmaster said:
More temporary registers are nice if you're coding in ASM, but with HLSL or GLSlang I don't see it being anything more than a minor convenience.
Er, bear in mind that you don't have easy access to external storage as on a CPU. So there is always the possibility of algorithms that simply require more registers to execute, and would otherwise need to be broken up into multiple passes (which HLSL and GLSL don't help with). More constant registers are probably more helpful for HLSL and GLSL than for ASM programming, actually, as you can realistically write much longer programs.
 
Mintmaster: this calculation

(7800GTX score / 1.5 / 430) / (6800U score / 400)

isn't accurate (memory clock etc.). Look at this comparison.
 