That backs up pretty well what I said earlier about the R600 being core-limited rather than memory-limited, at least in current games.
Despite a fairly substantial memory bandwidth increase, the GPU gains practically no performance. I think the reason is fairly obvious: the R600 isn't bandwidth-limited at all. Its core is simply too underpowered for the current crop of games to make full use of even 100GB/sec, never mind 128GB/sec.
Not only does Diamond’s Viper Radeon HD 2900 XT 1GB sport more memory than your typical Radeon HD 2900 XT card, as we outlined earlier, it’s packing 1.0GHz (2.0GHz effective) memory. As a result, the memory subsystem is capable of delivering up to 128GB/sec of peak bandwidth to the graphics core. That’s an impressive figure that no other GPU on the market can match, but this number can be a little deceiving. We’ll discuss this in more depth on the next page.
On paper, the 2.0GHz GDDR4 memory used on Diamond’s Viper Radeon HD 2900 XT 1GB is without equal in the desktop graphics market; nothing else really comes close to matching it for peak memory bandwidth. But as we discovered with the Radeon X1950 XTX, GDDR4 memory runs at much higher latencies than GDDR3, and this hampers performance.
In the case of the X1950 XTX, the board’s GDDR4 memory ran over 200MHz faster than the X1900 XTX’s, yet in many benchmarks the X1950 XTX was only 3-5% faster than the X1900 XTX. The clock delta separating Diamond’s Viper Radeon HD 2900 XT from the stock Radeon HD 2900 XT isn’t as large, so it’s possible that the performance gap between the two cards could be even slimmer in games that don’t take advantage of the Diamond board’s added memory.
Looks like the 1GB version of the HD 2900 XT has 128GB/s of memory bandwidth, but since it’s GDDR4 it runs at higher latencies. (Not much help.)
http://www.firingsquad.com/hardware/diamond_radeon_2900_xt_1gb/page2.asp
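For anyone wondering where those bandwidth figures come from, here’s a quick back-of-the-envelope sketch (Python, purely for illustration): peak bandwidth is just bus width times effective memory clock. The 512-bit bus is the R600 spec mentioned later in the thread, and the ~1.65GHz effective GDDR3 clock for the stock card is the figure quoted further down.

```python
# Rough peak-bandwidth sketch: bus width (bits) / 8 * effective clock (transfers/s) -> bytes/s
def peak_bandwidth_gb_s(bus_width_bits, effective_clock_mhz):
    # "GB" here means 10^9 bytes, the convention marketing figures use
    return bus_width_bits / 8 * effective_clock_mhz * 1e6 / 1e9

stock_2900xt = peak_bandwidth_gb_s(512, 1650)  # stock GDDR3, ~825MHz (1.65GHz effective)
diamond_1gb  = peak_bandwidth_gb_s(512, 2000)  # GDDR4, 1.0GHz (2.0GHz effective)

print(f"Stock HD 2900 XT 512MB: {stock_2900xt:.1f} GB/s")  # ~105.6 GB/s
print(f"Diamond 2900 XT 1GB:    {diamond_1gb:.1f} GB/s")   # 128.0 GB/s
print(f"Increase: {diamond_1gb / stock_2900xt - 1:.0%}")   # ~21%
```

So the Diamond board has roughly a fifth more raw bandwidth than the stock card, which is the gap being argued over below.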
Yup, I read that, but we’re talking about a roughly 20% bandwidth increase here for pretty much a 0% speedup. I can’t see the latencies being that bad. In fact, it would be pretty damn coincidental if they were (hurting performance by exactly the same amount that the extra bandwidth helps). More likely the R600 just can’t use the extra bandwidth.
More likely it’s the GPU that’s struggling, not the memory.
But maybe AMD/ATI could somehow find a way, in future driver updates, to enable those 320 stream processing units and put them to good use.
The connection being? You have a feeling that it's the R600's math ability that's lacking, due to the "OMG big number" stream processors not being fully enabled?
I expect the R600, with its 320 streams and 512-bit memory, will have a better chance in new DX10 games that can actually use all 320 shader processors. Current games aren’t exactly "DX10"; sadly, the first games we’re getting are pretty bad at being DX10, both graphics-wise and gameplay-wise. Wait till we see second- and third-generation titles built entirely on DX10 that can use the 320 streams on the HD 2900 XT effectively, the way they’re meant to be used.

But my main concern has to do with the texture units: the HD 2900 XT has 16, running at roughly 740MHz, where the GF8800GTX has 64 running at 575MHz. ATI simply does not have enough texture units, and this is where I see the main problem for the HD 2900 XT. In my opinion, 320 streaming processors using a VLIW architecture is too complex for a graphics card. They eat up too many transistors, which is why the chip doesn’t have enough texture units or AA resolve units. To compensate for this, ATI boosted the clock speed, which makes the card run hotter and use more power, and thus necessitates a louder fan.

I like NVIDIA’s idea of running the shaders at a higher clock than the rest of the chip, because then you get extra performance without eating into your transistor count and die size. Using lots of transistors is very bad, because it increases the size and complexity of the chip. The wafers chips are made on are fixed in size, so a chip with lots of transistors takes up a lot of space and you can’t make as many of them from one wafer. Big, complex chips can really hurt how much they cost to make. That’s why ATI didn’t add more texture or AA resolve units; it would have cost a fortune.
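To put rough numbers on that texture-unit gap, here’s a quick sketch (Python, purely illustrative) using the unit counts and clocks quoted above, and treating each unit as one texel per clock, which glosses over addressing-vs-filtering differences:

```python
# Peak texel-rate sketch: texture units * core clock, using the figures quoted above
def texel_rate_gtexels(tmus, clock_mhz):
    return tmus * clock_mhz * 1e6 / 1e9

hd2900xt  = texel_rate_gtexels(16, 740)  # ~11.8 GTexels/s
gf8800gtx = texel_rate_gtexels(64, 575)  # ~36.8 GTexels/s

print(f"HD 2900 XT: {hd2900xt:.1f} GTexels/s")
print(f"8800 GTX:   {gf8800gtx:.1f} GTexels/s (~{gf8800gtx / hd2900xt:.1f}x)")
```

Even the higher core clock doesn’t come close to making up for having a quarter of the texture units.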
Thank you, you've given me a headache. Can't isolate the reason, but it's there, and it started while trudging through that very dense post. Anyway, if you're waiting for third-generation titles to put the 2900 in a better light... well, I'm sure I'll be pleased in a few years when I get 3 FPS with everything on max on my CF 2900s, compared with my neighbour's 1 FPS on SLIed 8800s... I'll be three times as fast, weee!
Seriously now, are you basing your beliefs on anything other than "they must've done something really right, it can't be this mediocre across the board"?
OK, fair enough... (software drivers are only a partial factor holding the R600 back).
Driver improvements, yes; the mathematical ability currently falls short of its full potential because the 320 stream processors aren’t being fully used.
Here are the prices, to help with your decision:
HD 2900 XT 512MB: $400
HD 2900 XT 1GB: $550 ($150 extra for a small bump)
Eh? My 1GB cost me $480.
The 1GB GDDR4 (2GHz) has about the same performance as the 512MB GDDR3 (1.65GHz)? How?
Could you explain to the noob here?
Latencies.
So, what's the advantage of having the same card with GDDR4 but with higher latencies?
I haven't managed to decipher what you're trying to say... maybe in the first paragraph I can get a glimpse of a meaning, but the second one is totally gibberish to me. I know that English is not your first language, but could you please try to rephrase that somehow? Because it makes no sense ATM. Thank you.
There isn't one, which is why I don't think it's down to the latencies alone, but rather down to the fact that the R600 core simply isn't powerful enough to make use of 100+ GB/sec in current games.
Perhaps in pure DX10 games it will gobble up that bandwidth but by the time we see those games, R600 will be obsolete.
My English vocabulary is limited.
ATI: 64 physical pipelines times 5 gives a total of 320 stream processors.
NVIDIA: 128 physical pipelines times 1 gives a total of 128 stream processors.
NVIDIA’s is the more efficient way to get maximum performance out of its 128 stream processors. ATI, on the other hand, has a more complex job squeezing maximum performance out of its 320 streams. (Example: look at the R580’s 3:1 ratio with 48 pixel shaders… not a very efficient way to push all 48 shaders to the max from 16 physical pipelines.)
And writing driver software that does the maths efficiently and accurately is more complex, because developers who program game engines have to write code that maps well onto those physical pipelines, each with the 5x multiplier inside, to actually produce the output of all 320 stream processors.
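If it helps, here’s a toy sketch (Python, purely illustrative; the 64x5 layout comes from the numbers above, and real shader compilers and schedulers are far more involved) of why the "320" figure depends on how well each 5-wide pipeline gets filled:

```python
# Toy model of VLIW5 packing: the R600 has 64 physical pipelines, each 5 ALUs wide
# (64 * 5 = 320 stream processors), but all 5 slots only do useful work when the
# shader/compiler can issue 5 independent operations to a pipeline per clock.
PIPELINES, VLIW_WIDTH = 64, 5
TOTAL_SPS = PIPELINES * VLIW_WIDTH  # 320

def busy_alus(avg_slots_filled):
    """How many of the 320 ALUs do useful work, given how many of the 5 slots get filled."""
    return PIPELINES * min(avg_slots_filled, VLIW_WIDTH)

for filled in (5, 4, 3, 2):
    alus = busy_alus(filled)
    print(f"{filled}/5 slots filled -> {alus:3d} of {TOTAL_SPS} ALUs busy ({alus / TOTAL_SPS:.0%})")

# A scalar design like G80's 128 SPs doesn't have this packing problem: each processor
# handles one operation at a time, so utilisation doesn't depend on finding several
# independent operations to bundle together.
```

In this toy model, filling only two or three slots on average leaves the chip behaving more like 128-192 units, which is basically the point being argued here.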
What makes you think that the 320 stream processors (OMG, big number) are underutilized ATM?