ATi 2900 XT 1GB DDR4 for pre-order...

I still don't understand large parts of it, but from what I gather I can safely say that you're wrong about a number of things, and it's probably pointless to highlight them, as the language barrier may be a tough one to break. Ask yourself this: what would be the inherent reason for a huge performance jump in DX10 for the R600, allowing it to surpass the G80 (aside from the "NV30 sucked at DX9" party line)? What are the huge differences that make the jump from DX9 code to DX10 code an occasion for R600 muscle-flexing? What makes you think that the 320 stream processors (omg, big number) are underutilized ATM?

The problem is how ATI advertises the HD 2900 XT: there aren't actually 320 stream processors on that chip, only 64 real processors, but each is capable of 5 operations per shader clock. The 320 individual stream processing units in R600 are arranged in 4 SIMD arrays of 80 units each, and each functional unit is arranged as a 5-way superscalar shader processor. First, most of the stream processors are simpler and aren't capable of special-function operations. For every block of five stream processors, only one can handle either a special-function operation or a regular floating-point operation. The special-function stream processor is also the only one able to handle integer multiply, while the others can perform simpler integer operations. This means that each of the five stream processors in a block must run instructions from the same thread.
Although the unified shader concept is similar between the two cores, the way they go about presenting this functionality is a bit different. Whereas the G80 has 128 aptly-named Unified Shaders, the R600 has 320 Stream Processors. Clearly 320 is a bigger number than 128, but as we know in the hardware world, bigger numbers don't always mean something is better. The fact of the matter is that Stream Processors are different from Unified Shaders. ATI's Stream Processors are an integral part of the superscalar architecture implemented on the R600. There are 320 processors on the R600, but some of them are standard ALUs and some of them are special-function ALUs.
In contrast, NVIDIA's G80 has up to 8 groups of 16 (128 total) fully generalized, fully decoupled, scalar stream processors, but keep in mind the SPs in G80 run in a separate clock domain and can be clocked as high as 1.5GHz. In ATI's R600, each functional SP unit can handle 5 scalar floating-point MAD instructions per clock, and one of the five shader processors can also handle transcendentals. In each shader processor there is also a branch execution unit that handles flow control and conditional operations, plus a number of general-purpose registers to store input data, temporary values, and output data. Simply put, the R600 still has a chance; in conclusion we have to wait on driver updates from AMD and also on how DX10 code paths handle the R600 architecture. I could be wrong about the R600's future; the R600 is still, to some extent, a failed design, I will accept that.
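To illustrate the packing problem described above: the snippet below is a toy sketch, not ATI's actual shader compiler, showing how an R600-style 5-wide unit depends on finding independent scalar operations within one thread, while a dependent chain leaves most lanes idle. Instruction names and the scheduling rule are made up purely for illustration.

```python
# Toy illustration (not ATI's real scheduler): greedily pack independent
# scalar ops into 5-wide bundles, the way an R600-style compiler must.
# A G80-style scalar SP issues one op per clock, so a dependent chain
# "wastes" nothing; on a 5-wide unit it leaves 4 of 5 lanes idle.

def vliw_cycles(ops, width=5):
    """Count cycles to issue ops as `width`-wide bundles; an op may issue
    only after all of its dependencies issued in an earlier bundle."""
    done, cycles = set(), 0
    pending = list(ops)
    while pending:
        bundle = []
        for name, deps in pending:
            if len(bundle) < width and all(d in done for d in deps):
                bundle.append((name, deps))
        if not bundle:
            raise ValueError("unsatisfiable dependencies")
        for op in bundle:
            pending.remove(op)
            done.add(op[0])
        cycles += 1
    return cycles

# Fully dependent chain: each op needs the previous result.
chain = [("a", []), ("b", ["a"]), ("c", ["b"]), ("d", ["c"]), ("e", ["d"])]
# Five independent ops: the ideal case for a 5-wide unit.
independent = [(n, []) for n in "abcde"]

print(vliw_cycles(chain))        # 5 cycles -- 4 of 5 lanes idle each cycle
print(vliw_cycles(independent))  # 1 cycle  -- all 5 lanes busy
```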
 
Utilizing shaders is simply not R600's problem.

Everything around the shaders (TMUs and ROPs, specifically the lack of MSAA resolve) is the problem.

You're barking up the wrong tree.

R600 can (and does) crush G80 anywhere raw shader power is the issue. The problem is they're rarely the issue on R600.
 
R600 can (and does) crush G80 anywhere raw shader power is the issue. The problem is they're rarely the issue on R600.

It's not quite as bad as it first seems. The Ultra actually has over 80% of the XT's MADD capability, and when you take into account the more efficient utilisation of that power due to the purely scalar design, it should be pretty close. And that's obviously completely discounting the MUL, which, if used, would take G80's shader power well beyond R600's.
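As a rough back-of-the-envelope check of that 80% figure, using the shipping clocks (742 MHz core on the HD 2900 XT, 1.512 GHz shader clock on the 8800 Ultra) and counting a MADD as two FLOPs:

\[
\text{R600 (2900 XT)}: 320 \times 2 \times 0.742\,\text{GHz} \approx 475\ \text{GFLOPS}
\]
\[
\text{G80 (8800 Ultra, MADD only)}: 128 \times 2 \times 1.512\,\text{GHz} \approx 387\ \text{GFLOPS} \quad\Rightarrow\quad 387/475 \approx 81\%
\]
\[
\text{G80 counting the co-issued MUL}: 128 \times 3 \times 1.512\,\text{GHz} \approx 581\ \text{GFLOPS}
\]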

I think it's clear that ATI were shooting for a DX10-focused GPU with the shaders and the bandwidth, and thought that the DX9 boost R600 would bring would be good enough to compete with whatever nvidia came up with. They just didn't expect G80 to be such a DX9 monster, and probably didn't expect the first lot of DX10 games (DX9 with DX10 tacked on, effectively) to be so reliant on DX9 performance characteristics. Perhaps they expected DX10 to worm its way into the market much faster (much, much faster considering when R600 was supposed to be released)...
 
The problem is how ATI advertises the HD 2900 XT: there aren't actually 320 stream processors on that chip, only 64 real processors, but each is capable of 5 operations per shader clock. The 320 individual stream processing units in R600 are arranged in 4 SIMD arrays of 80 units each, and each functional unit is arranged as a 5-way superscalar shader processor. First, most of the stream processors are simpler and aren't capable of special-function operations. For every block of five stream processors, only one can handle either a special-function operation or a regular floating-point operation. The special-function stream processor is also the only one able to handle integer multiply, while the others can perform simpler integer operations. This means that each of the five stream processors in a block must run instructions from the same thread.
Although the unified shader concept is similar between the two cores, the way they go about presenting this functionality is a bit different. Whereas the G80 has 128 aptly-named Unified Shaders, the R600 has 320 Stream Processors. Clearly 320 is a bigger number than 128, but as we know in the hardware world, bigger numbers don't always mean something is better. The fact of the matter is that Stream Processors are different from Unified Shaders. ATI's Stream Processors are an integral part of the superscalar architecture implemented on the R600. There are 320 processors on the R600, but some of them are standard ALUs and some of them are special-function ALUs.
In contrast, NVIDIA's G80 has up to 8 groups of 16 (128 total) fully generalized, fully decoupled, scalar stream processors, but keep in mind the SPs in G80 run in a separate clock domain and can be clocked as high as 1.5GHz. In ATI's R600, each functional SP unit can handle 5 scalar floating-point MAD instructions per clock, and one of the five shader processors can also handle transcendentals. In each shader processor there is also a branch execution unit that handles flow control and conditional operations, plus a number of general-purpose registers to store input data, temporary values, and output data. Simply put, the R600 still has a chance; in conclusion we have to wait on driver updates from AMD and also on how DX10 code paths handle the R600 architecture. I could be wrong about the R600's future; the R600 is still, to some extent, a failed design, I will accept that.

Thanks for schooling me on the R600... I was thinking it was totally something else and thus asking silly questions like what makes you think DX9 and DX10 are so fundamentally different that wing-spreading for the R6xx architecture may occur in DX10 :p :D

All I can say is to start re-reading the many, many reviews that you obviously read, then cast an eye over the numerous DX10 articles that have appeared on the interweb, then sit back, have some ice cream, sip a nice beverage of choice, and meditate. It will hit you: DX9 and DX10 behaviour will be quite similar (if card X sucks in DX9 it'll suck in DX10, and if it's good at DX9 it'll be quite good at DX10 in normal conditions as well). Of course contorted cases can be built, where emphasis falls on one feature and thus shows huge deltas (as strengths tend to vary between architectures/IHVs), but once you patch the many individual cases into the big picture, those become quite irrelevant (and no, DX10 does not equal solely GS as... umm... some would like the consumer base to think, nor do I think that the GS will be used extensively in the near term on current GPU architectures).

The R600 doesn't suck at math. Their compiler will probably become better, but in terms of math the R600 is strong. The 320 stream processors are... umm... fully utilized. The thing is, it tends to suck (OK, this is harsh, make that rather mediocre) at many other things that ARE required for producing high-quality graphics: texturing, AA, AF (due to lacking texturing strength). The future is not quite as you and some guys on Rage3D want to envision it: math, math, math, procedurally generated textures etc. Math load is increasing, but so is texturing load, and the ratios ATi likes to give are not the all-telling story that some make them out to be; there's a bit more balance between math and texturing. ATi would have to work far closer with devs (think the son-of-the-devil TWIMTBP type of program :D) in order to push their stuff more and make them cater to the R600's primary strength of math, math, math. They currently seem unable to make one DX10 title work out of the box on their HW without hotfixing/waiting for drivers (Lost Planet: botched for x64 users, as they're a minority and don't deserve a hotfix; the new World in Conflict beta: botched in DX10; COJ had those image quality issues with the early version; COH gets the driver stuck in an infinite loop now and then, but let's cut some slack for the last two as those tend to work most of the time). So I'm not holding my breath... perhaps you shouldn't either.
 
Isn't it much easier to make texture content than to describe the game world in formulas? I don't think the R600 shader units are underutilized; the R600 superscalar design is an outgrowth of the R5xx and previous designs, so there should be enough knowledge to get a good load going. It's just that Nvidia made a much better and more balanced design than ATI. IMHO it's like the Nvidia 4xxx series vs the 8500 again - ATI is not that hot but has geometry interpolation, Nvidia has the FPS but not as many geek features.

It may be that ATI tried to make one design to fit both the GPGPU market and the graphics market; the replacement of dedicated MSAA circuitry with shader calculations indicates that. The GPGPU market is certainly not interested in AF or AA or anything other than shader power. But it seems that Nvidia is eating their lunch here too, according to reports on this board. Poor suckers. Perhaps it would be better to make an RVxxx variant with 640 shader processors and not unbalance their main design.

If their next design is 16 TMUs and 80+ shader CPUs... then I think AMD should partition and sell off everything but the ATI chipset division, since ATI is clearly a lost cause. AMD could just as well have bought SiS instead if they wanted chipsets and some low-power graphics, and saved themselves a lot of money and headache.
 
Isn't it much easier to make texture content than to describe the game world in formulas? I don't think the R600 shader units are underutilized; the R600 superscalar design is an outgrowth of the R5xx and previous designs, so there should be enough knowledge to get a good load going. It's just that Nvidia made a much better and more balanced design than ATI. IMHO it's like the Nvidia 4xxx series vs the 8500 again - ATI is not that hot but has geometry interpolation, Nvidia has the FPS but not as many geek features.

I wonder if we will see any big performance improvements on R600 when the tessellator gets used in ported Xbox 360 games? Or perhaps the higher-level geometry will simply be deactivated on G80. That would certainly be a big advantage for R600 owners.
 
I wonder if we will see any big performance improvements on R600 when the tessellator gets used in ported Xbox 360 games? Or perhaps the higher-level geometry will simply be deactivated on G80. That would certainly be a big advantage for R600 owners.

IF the tessellator is ever used on the R600 (I know it's part of future DX iterations, but since the R600 isn't, it's irrelevant to it). That is not a certainty. X360 ports have been fine and dandy on former cards (think R5xx, G7x) and were carbon copies visually. I think the tessellator is one of those things that was so easy to include that ATi said: sure, why not. It was already researched and tested on the X360, integration couldn't be a major pain, etc. But it'll probably be mostly a fringe feature for this gen, if anything.
 
IF the tessellator is ever used on the R600 (I know it's part of future DX iterations, but since the R600 isn't, it's irrelevant to it). That is not a certainty. X360 ports have been fine and dandy on former cards (think R5xx, G7x) and were carbon copies visually. I think the tessellator is one of those things that was so easy to include that ATi said: sure, why not. It was already researched and tested on the X360, integration couldn't be a major pain, etc. But it'll probably be mostly a fringe feature for this gen, if anything.

Yeah, I think ATI said in the interview on this site that the tessellator was small and very easy to implement. It will be a shame if it doesn't get used until DX11, though.

Still, you have got to wonder if it's being used on the 360 and, if so, how the function is handled when those games are ported to the PC. Could G80 handle it via the geometry shader?
 
Yeah, I think ATI said in the interview on this site that the tessellator was small and very easy to implement. It will be a shame if it doesn't get used until DX11, though.
Well, the fact that they left the tessellator intact in the skinny RV610, despite the chopped-down HiZ and reduced caching and compression, indeed points out that this thing is not rocket science, nor a greedy die eater.
 
Yeah, I think ATI said in the interview on this site that the tessellator was small and very easy to implement. It will be a shame if it doesn't get used until DX11, though.

Still, you have got to wonder if it's being used on the 360 and, if so, how the function is handled when those games are ported to the PC. Could G80 handle it via the geometry shader?

Unlikely... the performance hit would be too significant. The R600 wouldn't fare much better either if one were to try to emulate heavy tessellator usage through the GS, IMO. That being said, I have absolutely no idea how much it is used in X360 titles. And I have serious doubts any serious developer would putz around with something that only a few users could enjoy.
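For a sense of why GS emulation would hurt: tessellation is basically geometry amplification, and the toy sketch below (plain CPU-side Python, purely illustrative, not any actual shader or shipping code path) shows how fast the primitive count grows with simple midpoint subdivision. It is exactly this per-primitive output amplification that first-generation DX10 geometry shader implementations handled slowly.

```python
# Purely illustrative sketch: emulating a tessellator in the geometry shader
# amounts to amplifying each input triangle into many output triangles.
# Midpoint subdivision quadruples the triangle count per level.

def subdivide(tri):
    """Split one triangle into four by connecting its edge midpoints."""
    a, b, c = tri
    mid = lambda p, q: tuple((pi + qi) / 2 for pi, qi in zip(p, q))
    ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

def tessellate(tris, levels):
    """Apply `levels` rounds of midpoint subdivision to a triangle list."""
    for _ in range(levels):
        tris = [small for tri in tris for small in subdivide(tri)]
    return tris

base = [((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))]
for lvl in range(5):
    print(lvl, len(tessellate(base, lvl)))  # 1, 4, 16, 64, 256 triangles
```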
 
Let's just look to TruForm's success (or lack thereof). And that was before the currently prevalent fancy geometry simulation through normal mapping, etc.
 
Utilizing shaders is simply not R600's problem.

Everything around the shaders (TMUs and ROPs, specifically the lack of MSAA resolve) is the problem.

You're barking up the wrong tree.

R600 can (and does) crush G80 anywhere raw shader power is the issue. The problem is they're rarely the issue on R600.

I wrote about the stream processor comparison :)

But of course the TMUs and ROPs hold R600 back, because there wasn't much space left on the 80nm process. The chip already uses lots of transistors, which increases its size and complexity; the wafers on which chips are made are fixed in size, so if you have a chip with lots of transistors it takes up lots of space and you can't make as many of them from one wafer. Big, complex chips can really hurt manufacturing cost; that is why ATI did not add more TMUs or ROPs; it would have cost a fortune.
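To put some very rough numbers on that (using the commonly quoted ~420 mm² die size for R600 on 80nm and a 300 mm wafer, ignoring edge loss and defect yield, so these are only illustrative):

\[
\text{wafer area} = \pi (150\,\text{mm})^2 \approx 70\,700\,\text{mm}^2,
\qquad
\text{gross dies} \approx \frac{70\,700}{420} \approx 168
\]
\[
\text{a die grown } \sim\!20\%\ \text{for extra TMUs/ROPs: } \frac{70\,700}{\sim\!500} \approx 141\ \text{gross dies, before any yield losses}
\]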
 
I wrote about the stream processor comparison :)

But of course the TMUs and ROPs hold R600 back, because there wasn't much space left on the 80nm process. The chip already uses lots of transistors, which increases its size and complexity; the wafers on which chips are made are fixed in size, so if you have a chip with lots of transistors it takes up lots of space and you can't make as many of them from one wafer. Big, complex chips can really hurt manufacturing cost; that is why ATI did not add more TMUs or ROPs; it would have cost a fortune.

You seem to have a penchant for stating the obvious at times :D How does this support your reasoning?
 
Simply put, ATI followed in NVIDIA's footsteps from the past :D
NV30 was too complex in the sense of having too many transistors for the 130nm process.

I think it is much more likely that nV3x was simply too complex to have been a good overall design--at least in terms of its competition at the time. I say this for two reasons:

1) Prior to shipping nV3x, nVidia went to great lengths to explain, publicly and often at length, why nV3x was "impossible" except by way of 130nm production.

2) Subsequent nVidia GPUs post-nV3x (nV4x and later) were not related to nV3x in terms of architecture, but instead have been radically different. So this would indicate to me that nV3x was "too complex" for production regardless of process.

I still, at this late date, do not know what to actually make of R600 as it is currently shipping. Some reviews I've read are very negative, some are very positive, and much of the comment I read from R600 owners disagrees with the more negative reviews I've read -- sometimes fundamentally. Reviews and customer anecdotes written about nV3x were almost uniformly consistent at the time; the only party in almost complete disagreement with those opinions was nVidia, IIRC.
 
Continuing the OT, the primary reason the NV30 did so little with more hardware is that it wasn't a quad-based rendering architecture, in contrast to its rival at the time -- R300. Because of that, there was a lot of redundant logic to spend transistors on, and that's without even counting the memory bus, which was half as wide.
 
I think R600 would be looked at more favourably if it were running at the 1 GHz mark, taking the edge off some of its problems (like the massive AA hit due to its lower-clocked shaders). I can only hope that the 65 nm refresh manages to hit a much higher speed, giving the core the boost it needs to overcome the low points of its design.
 
Honestly, I doubt the shader capacity is the bottleneck here, especially for the rather dull box resolve filter. There is some underutilization in the loop-back path where the subsamples are fetched to the SIMD arrays.

Anyway, I wouldn't mind seeing two more SIMD arrays added in the next die-shrink re-spin of the R600 -- it's the cheapest route to an intermediate architectural upgrade, and it would be a good "benchmarking" hype trick to catch the eye of the masses.
6:1 ALU:TEX ratio, baby. :D
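For reference, that ratio falls straight out of the unit counts, assuming the usual description of R600 as 4 SIMD arrays of 16 five-wide units against 16 texture units:

\[
\text{as shipped: } \frac{4 \times 16}{16} = 4{:}1
\qquad
\text{with two extra SIMD arrays: } \frac{6 \times 16}{16} = 6{:}1
\]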
 