The LAST R600 Rumours & Speculation Thread

Bouncing Zabaglione Bros. · Jan 20, 2007

zgemboandislic said:
That's what the majority thought about R520. It was also on a lower process node, ring bus etc. In the end, it turned out that it could only match G70.

I disagree there. R5x0 had better image quality at the same speed, and better speed at the same image quality. As usual, Nvidia marketing triumphed when it came to hiding issues like texture shimmering.

IbaneZ · Jan 20, 2007

Anarchist4000 said:
Could we be looking at another 1800->1900 style release?

Who knows. I'm sure AMD has a backup plan if the GF 8900 (?) kicks too much butt.

But I doubt we'll see another "R520 fiasco". Hopefully the R600 will pwn. We need two players in this game.

INKster · Jan 20, 2007

Bouncing Zabaglione Bros. said:
I disagree there. R5x0 had better image quality at the same speed, and better speed at the same image quality. As usual, Nvidia marketing triumphed when it came to hiding issues like texture shimmering.

Except this time the shimmering is gone and the image quality is (way) up on G8x, so ATI better come up with a pretty big reason for this late showing.

no-X · Jan 20, 2007

INKster said:
Except this time the shimmering is gone and the image quality is (way) up on G8x, so ATI better come up with a pretty big reason for this late showing.

Can you say at this moment, that R600 won't have better IQ than G80? (e.g. something better/faster than TrAA or an AA mode, which will be more compatible than CSAA?)

Razor1 · Jan 20, 2007

IQ is probably going to be equal on both sets of cards, but the r600 is late enough that it has to compete with nV's refresh. We are talking about 5-6 months, and market share swings pretty fast when OEMs and system builders don't have anything to sell against the competition. We saw this with the g70. The r600 has to make a big impact to change perspective.

Retail buyers will need to see the difference too. nV has had more time to tweak their drivers, so it is quite possible that the current g80 can compete well in the first few months of the r600's lifetime just because of this advantage, even if the r600 is capable of more with later driver revisions. All it takes is the first week or two of reviews. Very few people really look into newer reviews. First impressions count the most. The r600's advantage of having xenos is good but will only help so much: the xbox 360 is a closed system where a lot fewer things can go wrong, and developers will be more keen on getting the most out of the xbox 360 than they will be with a pc game, since the pc is upgradable.

INKster · Jan 20, 2007

no-X said:
Can you say at this moment, that R600 won't have better IQ than G80? (e.g. something better/faster than TrAA or an AA mode, which will be more compatible than CSAA?)

Can you say it will?
And if it does, will it even be visible enough in current and near future apps to justify the delay?
I have a hard time telling between X1950 XTX and Geforce 8 IQ as it is, let alone something else.

Ailuros · Jan 20, 2007

INKster said:
Except this time the shimmering is gone and the image quality is (way) up on G8x, so ATI better come up with a pretty big reason for this late showing.

Besides the shimmering being mostly eliminated (you can never get completely rid of it if the game farts around with insane negative LOD values), here's a performance breakdown between high quality and quality:

http://www.hardtecs4u.com/reviews/2007/nvidia_g80_roundup/index8.php

Ailuros · Jan 20, 2007

no-X said:
Can you say at this moment, that R600 won't have better IQ than G80? (e.g. something better/faster than TrAA or an AA mode, which will be more compatible than CSAA?)

My guess is that it'll rather be in the nitpicking department than anything else, especially when it comes to AF. If I assume that AF quality is more or less the same on R600 as on R5x0, it would take eagle eyes to detect the slightly higher angle dependency on 45 degree angles on the latter. As for AA I'm willing to bet that they'll downplay coverage to being useless, since real 16xMSAA would be a pretty tall order despite the insane bandwidth and higher memory footprint.

I personally don't expect to be able to play upcoming games like Crysis, or any other future game, on the G80 at insane resolutions with a high AA sample density. That's not because of the 86GB/s bandwidth or the 768MB framebuffer but because of a bundle of bottlenecks; I'd consider myself lucky if I can reach 2xAA in 1600*1200 with all the ingame whizz-bang enabled.
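
As a rough back-of-the-envelope for the memory footprint point, here is a minimal Python sketch. It assumes uncompressed storage, 32-bit colour plus 32-bit depth/stencil per sample, and no extra resolve or front buffers; none of those figures come from the posts above, and `msaa_bytes` is just an illustrative helper. Only the 1600*1200 resolution and the 768MB framebuffer are taken from the post.

```python
def msaa_bytes(width, height, samples, bytes_per_sample=8):
    """Raw multisample buffer size in bytes.

    bytes_per_sample=8 is an assumption: 4 bytes of colour plus
    4 bytes of depth/stencil, stored uncompressed for every sample.
    """
    return width * height * samples * bytes_per_sample


def to_mib(n_bytes):
    """Convert bytes to MiB (2**20 bytes)."""
    return n_bytes / 2**20


if __name__ == "__main__":
    framebuffer_mib = 768  # card memory quoted in the post
    for samples in (2, 4, 8, 16):
        size = to_mib(msaa_bytes(1600, 1200, samples))
        share = size / framebuffer_mib
        print(f"{samples:>2}x MSAA: {size:7.1f} MiB ({share:.0%} of 768MB)")
```

Under those assumptions a real 16xMSAA buffer at 1600*1200 comes to roughly 234 MiB, a bit under a third of a 768MB card before any textures or geometry. Real hardware compresses colour and depth, so treat this as an upper bound on storage rather than a measurement.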

Nelsieus · Jan 20, 2007

IbaneZ said:
5-10% faster than G80 in DX10? If that's true, I'm not that impressed.

I usually don't chime in (not wanting to disturb the experts at their craft), but should we perhaps consider a possible speed boost with G80 by activation of the "missing MUL?"

http://www.beyond3d.com/forum/showthread.php?t=37228

Not sure how much that would change the supposed R600 +5% - +10% lead, though.

Arun · Jan 20, 2007

You'd definitely expect a fair bit of driver-related potential in the G80, given that it's a new architecture and so on - the "missing MUL" is one example of something they could make better use of, since it's currently only used for perspective correction. The shader compiler in general, however, might get some boosts in the future - while G80's scalar nature makes it look easier for the compiler, there are some particularities (such as how the register file works...) that will benefit from some cleverness.

Another example of something they might get some boost from is the memory controller. It's basically 100% new, so there might be some room there based on how they partition things between the different memory banks etc. - if you remember the NV20, they had huge boosts over its lifetime which they attributed to the crossbar controller. It likely wouldn't be as big again, since they have more experience there, but you get the point.

R600 is also a new architecture though, but they've had working samples for their driver teams to toy with for a long time now - so hopefully their drivers will be quite mature and highly competitive when the part finally comes out. As I said, I clearly hope AMD is ready if NVIDIA manages 10%+ driver-based boosts while simultaneously releasing a 15% higher-clocked part. Both of those estimates feel quite conservative to me, so I'm sure AMD is taking that kind of risk clearly into consideration and will be prepared for it. We'll see!
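
To put numbers on that scenario, here is a small Python sketch of how the two estimates compound. It assumes performance scales linearly with clock speed and that driver and clock gains are independent, which is optimistic; `combined_speedup` is an illustrative name, and the 5-10% R600 lead is the rumoured figure quoted earlier in the thread, not a measurement.

```python
def combined_speedup(driver_gain, clock_gain):
    """Multiply two independent fractional gains into one speedup factor.

    Assumes performance scales linearly with clock, which real
    workloads rarely do once they become bandwidth limited.
    """
    return (1.0 + driver_gain) * (1.0 + clock_gain)


if __name__ == "__main__":
    # Today's G80 is the baseline at 1.0.
    refresh = combined_speedup(0.10, 0.15)
    print(f"Refreshed G80 vs. today's G80: {refresh:.3f}x")

    # Rumoured R600 lead over today's G80 (5% to 10%).
    for r600_lead in (0.05, 0.10):
        r600 = 1.0 + r600_lead
        print(f"R600 at +{r600_lead:.0%}: {r600 / refresh:.2f}x of the refreshed part")
```

The point is only that the gains multiply (1.10 * 1.15 is about 1.265), so a 5-10% lead over the current part would turn into a deficit against a refresh that hit both estimates.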

Uttar

Razor1 · Jan 20, 2007

Uttar said:
R600 is also a new architecture though, but they've had working samples for their driver teams to toy with for a long time now - so hopefully their drivers will be quite mature and highly competitive when the part finally comes out. As I said, I clearly hope AMD is ready if NVIDIA manages 10%+ driver-based boosts while simultaneously releasing a 15% higher-clocked part. Both of those estimates feel quite conservative to me, so I'm sure AMD is taking that kind of risk clearly into consideration and will be prepared for it. We'll see!
Uttar

Well, if the shader units are similar to the xenos or r580 (vec4), would the R600 have similar performance boosts from drivers as the g80? The r600 will also have the ring bus, and its memory subsystem might be very similar to that of the r5x0, so IMO we won't see as much potential for a performance boost in the r600 as would be possible for the g80.

The performance increase we saw with the r520 wasn't that great if I remember correctly. I just went and looked around for driver benchmarks; 5.1x cats were used for the initial benchmarks of the x1800 series. So there were some improvements but nothing out of the ordinary. The area where I remember a substantial increase was when AA was activated in Doom or Quake, well, the Doom 3 engine in general.

http://www.tweaktown.com/articles/864/

Ailuros · Jan 20, 2007

If you skim through the results of the recent hardtecs4u review I linked to above, you'll notice that in some applications the performance drop for AA is probably too high, compared to G7x for example.

Granted they've got TSAA enabled for all tests, but have a look:

http://www.hardtecs4u.com/reviews/2007/nvidia_g80_roundup/index20.php

http://www.hardtecs4u.com/reviews/2007/nvidia_g80_roundup/index21.php

Look at the SS2 results; there's definitely quite a bit of headroom for improvements.

no-X · Jan 20, 2007

Razor1 said:
The performance increase we saw with the r520 wasn't that great if I remember correctly. I just went and looked around for driver benchmarks; 5.1x cats were used for the initial benchmarks of the x1800 series. So there were some improvements but nothing out of the ordinary. The area where I remember a substantial increase was when AA was activated in Doom or Quake, well, the Doom 3 engine in general.

http://www.tweaktown.com/articles/864/

I usually make some notes for each Catalyst release, and these releases boosted performance for the X1xxx series:

- ring-bus patch - DooM3 engines + MSAA 4x up-to 35%
- 5.11 - includes rb-patch + further optimisations - D3 engine up-to 45% (compared to 5.10 w/o rb-patch)
- 5.12 - optimisations for dual-core (5-25% for some D3D games)
- 5.13 - further optimisations for DC
- 6.1 - performance optimisations for 3DM06
- 6.2 - slight performance improvements - D3D (3DMark05/06, F.E.A.R, FarCry, HL2), OpenGL (D3, Q4)
- 6.3 - COD2 5-10%, Q4 - a few percent
- 6.4 - minor (<5%) performance improvements in some D3D apps
- 6.5 - minor (<5%) performance improvements in some OpenGL apps
- 6.6 - nothing
- 6.7 - probably nothing (some people were reporting slight perf. improvement when using NF4 chipset)
- 6.8 - massive performance boost in OpenGL games (up-to 20%, shader compiler), slight perf. incr. in COD2 on X1800XT-256MB
- 6.9 - further improvements related to OpenGL
- 6.10 - nothing
- 6.11 - nothing
- 6.12 - CoH
- 7.1 - some people reporting performance increase in some D3D apps (Dark Messiah of MaM, Rainbow Six Vegas)

sorry for slight OT

chrisi · Jan 20, 2007

Jawed said:
Last night I realised a possible way to configure the register file in order to effect thread packing ...

Thanks for your nice drawings! However, what is the point of skewing the components across the banks?

Jawed said:
Each bank has one read port per operand.

You assumed that you can choose the address (row) per bank. Is this feasible?

Razor1 · Jan 20, 2007

no-X said:
I usually make some notes for each Catalyst release, and these releases boosted performance for the X1xxx series:

- ring-bus patch - DooM3 engines + MSAA 4x up-to 35%
- 5.11 - includes rb-patch + further optimisations - D3 engine up-to 45% (compared to 5.10 w/o rb-patch)
- 5.12 - optimisations for dual-core (5-25% for some D3D games)
- 5.13 - further optimisations for DC
- 6.1 - performance optimisations for 3DM06
- 6.2 - slight performance improvements - D3D (3DMark05/06, F.E.A.R, FarCry, HL2), OpenGL (D3, Q4)
- 6.3 - COD2 5-10%, Q4 - a few percent
- 6.4 - minor (<5%) performance improvements in some D3D apps
- 6.5 - minor (<5%) performance improvements in some OpenGL apps
- 6.6 - nothing
- 6.7 - probably nothing (some people were reporting slight perf. improvement when using NF4 chipset)
- 6.8 - massive performance boost in OpenGL games (up-to 20%, shader compiler), slight perf. incr. in COD2 on X1800XT-256MB
- 6.9 - further improvements related to OpenGL
- 6.10 - nothing
- 6.11 - nothing
- 6.12 - CoH
- 7.1 - some people reporting performance increase in some D3D apps (Dark Messiah of MaM, Rainbow Six Vegas)

sorry for slight OT

Those performance improvements weren't for all resolutions (look at the chart I showed you), and they weren't all for the x1800xt either. Here is a link for the latest cat drivers:

http://www.tweaktown.com/articles/1034/1/page_1_introduction/index.html

Dual core optimizations, Doom 3 stuff, memory controller optimizations: take all that out and I don't see any of the substantial performance increases mentioned in the driver releases, though in OGL the Doom 3 engine did see quite a bit. The improvements you listed were not for every single graphics card, but for specific graphics cards within the x1x00 family. Then you have your small tweaks, nothing across-the-board ZOMG type stuff like we saw with the gf7 at the release of the x1x00's, or with the gf3's or gf6's, where there were performance drivers that did have a large boost overall.

This is an assumption, because we don't know what the ALU structure of the r600 is of course.

I also remember specifically that the shader compiler was a major issue, and ATi made sure there wouldn't be too many problems or performance degradations with the new architecture. They made sure the new GPU wouldn't need a completely new compiler, and that the old one would work well with it.

Dave Baumann · Jan 20, 2007

The shader compiler optimises specifically for the shader organisation of the graphics core; the organisation between R300 to R580 largely stayed the same, with a minor change in the PS core from R300 to R420.

Rangers · Jan 20, 2007

zgemboandislic said:
That's what the majority thought about R520. It was also on a lower process node, ring bus etc. In the end, it turned out that it could only match G70.

It was faster than G70, as ATI has been basically throughout.

Other than a short time where the paper product 7800GTX512 was top dog.

Jawed · Jan 21, 2007

chrisi said:
Thanks for your nice drawings! However, what is the point of skewing the components across the banks?

To provide the flexibility needed to read different combinations of data. This is all guesswork. It's part of my theory about improving ALU execution efficiency by re-ordering pixel scheduling depending on the number of components of an instruction.

e.g. in a simplistic 4-way SIMD ALU, MUL r1.rg, r0.rg, r2.rg wastes half the ALU. If you could schedule two pixels (each running the same instruction) through the same ALU, with the second pixel using the blue and alpha components (i.e. temporarily translating ".rg" into ".ba") then you waste nothing.
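
A toy model of the example above, as a sketch only. It assumes a 4-lane ALU where every pixel in a batch runs the same instruction with the same component count, and where pixels can be packed side by side whenever their components fit in the free lanes. `LANES`, `issue_slots` and `utilisation` are illustrative names; nothing here reflects how R600 (or any real part) schedules work.

```python
from math import ceil

LANES = 4  # width of the hypothetical SIMD ALU


def issue_slots(pixels, components, packed):
    """Number of ALU issue slots needed for a batch of pixels.

    Unpacked: one pixel per slot, unused lanes are wasted.
    Packed: as many pixels per slot as fit (e.g. two .rg pixels,
    the second one remapped onto the .ba lanes).
    """
    if not 1 <= components <= LANES:
        raise ValueError("components must be between 1 and LANES")
    if not packed:
        return pixels
    pixels_per_slot = LANES // components
    return ceil(pixels / pixels_per_slot)


def utilisation(pixels, components, packed):
    """Fraction of lane-slots that do useful work."""
    slots = issue_slots(pixels, components, packed)
    return (pixels * components) / (slots * LANES)


if __name__ == "__main__":
    for components in (1, 2, 3, 4):
        plain = utilisation(16, components, packed=False)
        paired = utilisation(16, components, packed=True)
        print(f"{components}-component op: {plain:.0%} unpacked, {paired:.0%} packed")
```

For the two-component MUL this gives 50% utilisation unpacked and 100% packed. Three-component instructions stay at 75% in this model because a second pixel does not fit in the single remaining lane; mixing different instructions in one slot would be needed to fill it, which is outside this sketch.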

Additionally, as I've shown, this flexibility also supports packing threads to improve ALU utilisation for dynamic branching.

You can see my earlier discussion here (using 8-wide banks instead of 4-wide):

http://www.beyond3d.com/forum/showthread.php?p=900211#post900211

which assumes a fully symmetric 32-component ALU. i.e. special functions (like SIN or RSQ) are handled by all components of the ALU.

chrisi said:
You assumed that you can choose the address (row) per bank. Is this feasible?

Ha, well what do I know about register file design? How does any register file support dual, triple etc. read ports?... How do you guarantee that all your read ports can always fetch the operands you require? What happens with co-issue operand fetching?

Trying to find much concrete stuff about register files in GPUs is extremely hard. One patent application I've got talks about implementing a register file (as an aside, not the main point of the patent) where every location exists twice - that's how dual-porting is "implemented" (it's a suggestion, nothing more). Now, that seems sorta unbelievable to me, actually loony.

So, in short, there's no way I can back any of this up. I'm hopeful that there'll be a discussion, that's all...

Jawed

KimB · Jan 21, 2007

Well, Jawed, the problem is that it may take too much time to search the instruction queue to pair up instructions for maximal dual-issue throughput. There's also the issue that it might play havoc with getting pixels ordered nicely for maximal texture cache hits.

Acert93 · Jan 21, 2007

IbaneZ said:
5-10% faster than G80 in DX10? If that's true, I'm not that impressed.

I wonder what DX10 products they are testing to arrive at this conclusion.
