Chip Comparison Chart

OK, so the Radeon SDR (4 2Mx32 chips) also has a dual-channel memory interface???
That sounded strange, so just for reference I also examined a Voodoo3 and a good old TNT (both have 8 1Mx16 chips). According to the DMM, both of these also have two memory channels? WTF?
Two theories about the results:
1) All these cards really have two memory channels. Doesn't sound likely (OK, maybe a Radeon, but a TNT?)
2) The address signals are not connected together physically, but logically they are still the same bus. For instance, two output drivers could be used to provide cleaner signals (though that isn't likely either; on motherboards, far more than eight chips can be driven by the same driver).

Someone please help me and tell me what's going on...
 
Having two physical address buses on a graphics card (even if they are logically connected) has at least 2 major advantages over just having one:
  • It simplifies board routing, as you no longer need to draw traces that span all across the board. This can reduce board layer count and board size, thus reducing costs.
  • It results in shorter buses with fewer loads and less skew between the loads, so that you can run the address bus at higher clock speeds and still run stable.
The main disadvantage is about 20 extra pins on the GPU package, which adds negligible cost.
 
K.I.L.E.R said:
adds negligible cost

So if you ship 1 million video cards, each with two physical address buses, will the cost still be "negligible" to the IHV?
It depends on which costs more: the 20 extra pins on the GPU, or the slightly more complicated routing of a single bus. Though the routing really shouldn't be much more difficult with only one bus, as you still need to connect all the RAM chips to the GPU either way (and the address pins can be placed on the side of the GPU closer to the respective RAM chips).

arjan de lumens said:
It results in shorter buses with fewer loads and less skew between the loads, so that you can run the address bus at higher clock speeds and still run stable.
I'd agree with that if there were more than 8 chips (or more than 4, in the case of the Radeon SDR!) and the board's memory bus actually ran at a high frequency. But of the 4 boards tested, 3 have really "motherboard-like" RAM frequencies, and motherboards have both many more chips and longer traces.
Unfortunately, the experiments didn't help to find out whether the R9000pro has a dual-channel memory controller. Could someone lend me a logic analyzer? ;)
 
mczak, is it possible that the Radeon has dual 64-bit busses so that a low-cost board could be configured with a single 64-bit SDRAM path? (Kind of like the ultra-low end Geforce4/MX boards.) Or does that have nothing to do with it? (I'm clueless when it comes to memory-controller configs.)
 
Most if not all video chips can do interleaving, as it reduces memory latency. This refers to the chip's ability to write to one bank of memory while another bank is refreshing.
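The bank interleaving mentioned above can be sketched with a toy address map. This is purely illustrative: the bank count and the choice of low address bits as the bank selector are assumptions for the example, not the actual layout of any of the chips discussed here.

```python
# Toy model of bank-interleaved addressing. With the low address bits
# selecting the bank, a linear burst alternates banks, so the controller
# can access one bank while the other is busy refreshing or precharging.

NUM_BANKS = 2  # assumed for this sketch


def split_address(addr: int) -> tuple[int, int]:
    """Map a linear address to (bank, offset within that bank)."""
    return addr % NUM_BANKS, addr // NUM_BANKS


def access_pattern(addresses):
    """Return the bank hit by each access in sequence."""
    return [split_address(a)[0] for a in addresses]


# A linear burst ping-pongs between the two banks:
print(access_pattern(range(8)))  # [0, 1, 0, 1, 0, 1, 0, 1]
```

With this mapping, consecutive accesses never land on the same bank, which is what lets a write to one bank overlap a refresh in the other.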
 
What about S3 Savage series? Does anyone know their specs?

STG 4000 - Fab ST
STG 4500, STG 4800 - Fab TSMC <- Hmm.......... isn't that wrong?
 
demalion said:
First, it seems like a good idea to have some distinction between fixed function and shader performance characteristics, and not just for pixel processing (as the "pixel pipe" and "TMU" no longer directly map), but for vertex processing.

:idea:

I'm slightly off-topic here, but I think this change should also be made for product reviews and any other product articles. Things have changed, the old pipeline model has blurred, so B3D -- as probably the most educational enthusiast 3D graphics site out there -- should be the first to reflect that! (Of course the old system is still fine for all sub-DX8 gear.)
 
Dave - I'm not overly confident about the NV34 being a 4 x 1 config, having just spent the past 30 mins or so sifting through the ORB:

5200 results:
http://service.futuremark.com/compare?2k3=752290
Single = 681
Multi = 815

http://service.futuremark.com/compare?2k3=703481
S = 561
M = 852

http://service.futuremark.com/compare?2k3=786564
S = 629
M = 748

5600U (4 x 1) results:
http://service.futuremark.com/compare?2k3=802030
S = 1298
M = 1370

http://service.futuremark.com/compare?2k3=740470
S = 989
M = 1186

http://service.futuremark.com/compare?2k3=792195
S = 996
M = 1200

5800U (4 x 2) results:
http://service.futuremark.com/compare?2k3=747164
S = 1215
M = 2778

http://service.futuremark.com/compare?2k3=792516
S = 1294
M = 3268

http://service.futuremark.com/compare?2k3=788136
S = 1378
M = 3453

GF4 Ti4600 (also 4 x 2) results:

http://service.futuremark.com/compare?2k3=450368
S = 963
M = 2534

http://service.futuremark.com/compare?2k3=665701
S = 832
M = 2174

GF4 MX440 results (2 x 2):
http://service.futuremark.com/compare?2k3=665701
S = 714
M = 1319

http://service.futuremark.com/compare?2k3=424109
S = 563
M = 1046

GF2 MX (2 x 2) results:
http://service.futuremark.com/compare?2k3=545741
S = 199
M = 320

http://service.futuremark.com/compare?2k3=409505
S = 296
M = 552

To me those figures have 2 x 1 for the 5200 stamped all over them. Bandwidth can't be blamed here, given that the 5200 non-Ultra is on par with the MX440 (and the 5200U with the Ti4600). Hmmm - the 5200 isn't packing any buffer compression and neither is the GF2 MX, and they both show a similarly small increase between single and multi:

5200
~120% M/S

5600U
~100% M/S

5800U
~230% M/S

Ti4600
~260% M/S

MX440
~180% M/S

GF2 MX
~160% M/S

Bah...I've just convinced myself that I've been talking nonsense...what a wasted post this was! Pfff - I'm clicking Submit anyway after typing all of that. :devilish:
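For what it's worth, the M/S ratios above can be recomputed from the scores quoted in this post. This is just a sketch that averages the per-result multi/single ratios; the rounded percentages in the post were presumably eyeballed, so the averages here won't match them exactly.

```python
# Recompute multitexturing/single-texturing ratios from the 3DMark03
# fillrate scores quoted above. Pairs are (single, multi) per result.

scores = {
    "5200":   [(681, 815), (561, 852), (629, 748)],
    "5600U":  [(1298, 1370), (989, 1186), (996, 1200)],
    "5800U":  [(1215, 2778), (1294, 3268), (1378, 3453)],
    "Ti4600": [(963, 2534), (832, 2174)],
    "MX440":  [(714, 1319), (563, 1046)],
    "GF2 MX": [(199, 320), (296, 552)],
}

for card, results in scores.items():
    ratios = [m / s for s, m in results]
    avg = sum(ratios) / len(ratios)
    print(f"{card:7s} M/S ~ {avg:.0%}")
```

The split is clear either way: the 4 x 2 parts (5800U, Ti4600) land well above 200%, while the 5200 and 5600U barely move between single and multi.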
 
To me those figures have 2 x 1 for the 5200 stamped all over them.

If the 5200 were 2x1 then its maximum theoretical fillrate would be twice its clock speed, i.e. 500 mtexels/sec for those clocked at 250MHz (first and third ones you linked) and 600 mtexels/sec for the 300MHz one (the second). Since the realized texel fillrate is higher than that, the 5200 cannot possibly be a 2x1. In fact, since the realized pixel fillrate is higher than that, the 5200 cannot possibly be a 2 x anything.
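Dave H's ceiling argument is just pipes × TMUs × clock; a quick sketch of the arithmetic, using the clocks quoted above:

```python
# Peak theoretical texel fillrate: a P x T part at C MHz can never exceed
# P * T * C Mtexels/sec, so any realized texel rate above that ceiling
# rules the config out.

def max_mtexels(pipes: int, tmus_per_pipe: int, clock_mhz: int) -> int:
    """Peak theoretical texel fillrate in Mtexels/sec."""
    return pipes * tmus_per_pipe * clock_mhz


print(max_mtexels(2, 1, 250))  # 500  -- 2x1 ceiling at 250MHz
print(max_mtexels(2, 1, 300))  # 600  -- 2x1 ceiling at 300MHz
print(max_mtexels(4, 1, 250))  # 1000 -- 4x1 ceiling at 250MHz
```

Since the 5200 results above exceed the 2x1 ceilings, the argument rules out any 2-pipe config.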
 
Good point Dave H, but if it really is 4 x 1, then surely the loss of buffer compression can't be having that much of an impact on the fill rate? It should (in theory) be 1000 Mtexels/sec for single texturing at the default clock speed and, quite frankly, it's nowhere near that. I've tested my GF4 with buffer compression both on and off, and there's no difference in the results.

Edit: I've not been very clear - I agree with you Dave that it can't be a 2 x 1 (or 2 x 2) config on the basis that the single texturing fill rates are too high for this but I'm just wondering out loud as to why it's so poor for a 4 x 1.

Edit2: Sorry for the spamming but this is just to add that I can see that it's the overall bandwidth that's limiting the single texturing rates; it's just the multitexturing that seems so odd:

http://service.futuremark.com/compare?2k3=761228
Single = 863
Multi = 847
Theoretical (based on clocks) = 1176 (4x1)

http://service.futuremark.com/compare?2k3=765053
S = 929
M = 880
T = 1216 (4x1)

http://service.futuremark.com/compare?2k3=752290
S = 681
M = 815
T = 1000 (4x1)

The first one seems a little odd, but the second is just bizarre! At least the third (non-overclocked) one is more sensible.
 
Neeyik said:
Edit2: Sorry for the spamming but this is just to add that I can see that it's the overall bandwidth that's limiting the single texturing rates; it's just the multitexturing that seems so odd
Yes, it's very common that graphics cards can't realize their theoretical single-texturing rates; the Radeon 9000 is also only about 60% efficient, not to mention the Radeon 9500pro, which is even lower for obvious reasons. The FX 5200 (and also the FX 5600) does indeed have a lower-than-expected multitexturing score. If I had to guess, I'd say the loopbacks have a performance hit: the pipelines seem to operate on 2x2 pixel blocks (?), so 1 additional clock per loopback would drop the multitexturing efficiency to 90% max (4 + 1 + 4 clocks for four dual-textured pixels). By that logic, the efficiency would drop even further with more than 2 textures (down to a max of 80% for an infinite number of textures, which is of course not possible). That would, however, mean that single-texturing fillrate would be higher than multitexturing fillrate (assuming there's enough memory bandwidth). Sounds too strange even to me, so I'm probably way off :)

Edit: however, given the 3DMark scores, which indeed show that single texturing is faster than multitexturing in some cases, this explanation suddenly sounds more reasonable.
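The arithmetic in that guess can be sketched directly. The assumptions here are the ones stated in the post, not confirmed hardware behavior: each pipeline works on a 2x2 block (4 pixels), spends 4 clocks per texture pass, and pays 1 extra clock per loopback between passes.

```python
# mczak's loopback model: 4 clocks per texture pass over a 2x2 block,
# plus 1 stall clock per loopback. Efficiency is texel fetches per clock
# relative to the 1-texel-per-clock peak.

from fractions import Fraction


def loopback_efficiency(n_textures: int) -> Fraction:
    """Texels fetched per clock vs. peak, for n textures per pixel."""
    texels = 4 * n_textures                     # 4 pixels * n textures
    clocks = 4 * n_textures + (n_textures - 1)  # passes + loopback stalls
    return Fraction(texels, clocks)


print(float(loopback_efficiency(1)))  # 1.0 -- single texturing, no loopback
print(float(loopback_efficiency(2)))  # 8/9, i.e. the "90% max" figure
# As n grows, 4n / (5n - 1) approaches 4/5 -- the "80% for infinite
# textures" limit mentioned above.
```

So the model reproduces both numbers in the post: ~89% for dual texturing and an 80% floor for many textures.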
 
2 things, well, actually 3 ;)

1) When viewing the details of NV20, it says "Theoretical Performance (@250MHz)". The original NV20 did not run at 250 MHz, but at 200 MHz. It was only with the Ti500 that the core clock went up.

2) If I remember correctly, the NV10 ran at a core clock of 120 MHz. It's been a while since I used the GF DDR, but I seem to remember speeds of 120 for core and 150 for mem.

3) The chip code names are barely visible against the dark blue background :?
 
Neeyik said:
Edit: I've not been very clear - I agree with you Dave that it can't be a 2 x 1 (or 2 x 2) config on the basis that the single texturing fill rates are too high for this but I'm just wondering out loud as to why it's so poor for a 4 x 1

I know I shouldn't insist so much on this, but anyway...

A few days ago, I started a thread on the NV3x influence from ILDP ( Instruction Level Distributed Processing ) architectures. Of course, it was wayyy too insane to get any reply, but, to summarize...

The NV3x looks a LOT like it doesn't have multiple vertex/pixel pipelines, but more like ILDP-based units included in the global pipeline (the global pipeline being Input->VBA->VS->Triangle Setup->Rasterization->PS->Output).

So, how does this answer your question?
The primary idea of ILDP is to have many small units, usable in a theoretically unlimited number of ways (I mean, you can do FP->FX or FX->FP, for example), which are *less efficient, clock-for-clock, than a comparable traditional design* due to additional communication delays, but which allow *higher clock speeds*.

Sound familiar? It sure does to me. If you're interested, you may want to read my thread about it, or maybe even read the whole paper I'm linking to there (it's long, though!).

http://www.beyond3d.com/forum/viewtopic.php?t=6303


Uttar
 
2) If I remember correctly, the NV10 ran at a core clock of 120 MHz. It's been a while since I used the GF DDR, but I seem to remember speeds of 120 for core and 150 for mem.

120 core is correct... for the SDR version, the memory defaulted to 166MHz.
The DDR version did have 150MHz for the memory, as you remember.
 
About the 9000 and 5200 having lower efficiencies: couldn't that be a result of smaller caches? There are framebuffer write caches too, and those could be especially useful for alpha blending.

Anyway, being budget cards with an emphasis on lower transistor count, I'm sure they reduced the sizes of the caches and FIFOs, negatively impacting efficiency.
 