Digit Life convinced that 9500 Pro is a 4x2 with a 256-bit bus

McElvis

Regular
http://www.digit-life.com/articles2/radeon/r9500pro.html

Here's a quote -

It proves our theory, doesn't it? :)
In closing of the synthetic part of the tests I can only repeat what I said after the fillrate test: the RADEON 9500 PRO has 4 pipelines with two texture units on each instead of 8 pipelines with one texture unit. That is, 4x2. Also, we proved that it's coupled with the 256bit memory bus. Well, ATI is really smart - it modernized the PCB, made it cheaper and took in many users and testers convincing of the 128bit bus. In the mode of execution of vertex shaders the chip is forcedly slowed down relative to the 9700, while in the fixed TCL mode its performance is not limited and equals the normal R300, which will definitely affect applications without vertex shaders.

However, everyone is satisfied: ATI is not bothered with questions like how they dare make relatively cheap cards with a 256bit bus, NVIDIA doesn't giggle aside thinking that that it might be too good to provide 256 bits on a mainstream card. Users of the RADEON 9700 are sure they are the only who have 256 bits and all 9500 PRO are coupled with the promised 128 bits. The protection works, and all tweakers are sure about 128 bits on the RADEON 9500 PRO.
 
Sigh. They come to this conclusion based on the single/multitexturing fillrate tests:

The RADEON 9500 PRO (8 pipelines) runs at the same speed in the single texture mode as the RADEON 9500 (4 pipelines).

What they fail to take into account is that a 128bit memory bus just isn't able to cope with all 8 pixels every cycle. In multitexturing mode it is doing 8 pixels every two cycles, so on the cycle it writes on it still can't output all 8 pixels; some (4?) are just cached and written on the next cycle (while the chip is doing the second texture cycle of the next 8 pixels).

Of course, a simple way of testing this is to do the 3DMark fillrate test in 16bit and see what the performance is like then - I see they didn't do this (I would, but have to go out now).
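To make the argument above concrete, here is a rough sketch of the steady-state rates in both cases. It is a toy model, not a description of the actual hardware: the function name and its parameters are made up for illustration, the bus is assumed to pass 4 32bit pixels per clock (as argued above), and Z traffic, caches and clock differences are ignored.

```python
from fractions import Fraction

def steady_state_rates(pipelines, tmus_per_pipe, layers, bus_pixels_per_clock):
    """Return (pixels per clock, texels per clock) in a toy steady-state model."""
    # Clocks one pipe spends on a pixel: it loops back until every layer is applied.
    clocks_per_pixel = -(-layers // tmus_per_pipe)  # ceiling division
    # Pixels the pipelines can finish per clock, before any bus limit.
    shaded = Fraction(pipelines, clocks_per_pixel)
    # The bus caps how many finished pixels can actually be written per clock.
    written = min(shaded, Fraction(bus_pixels_per_clock))
    return written, written * layers

# Assumed configurations: 8x1 behind a bus that passes 4 pixels per clock,
# versus the article's claimed 4x2 with a bus wide enough not to matter.
real_8x1 = [steady_state_rates(8, 1, n, 4) for n in (1, 2)]
claimed_4x2 = [steady_state_rates(4, 2, n, 8) for n in (1, 2)]
print(real_8x1)
print(claimed_4x2)
```

Under these assumptions both configurations give the same single and dual texture numbers, which is why a 32bit fillrate test alone cannot tell them apart.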
 
DaveBaumann said:
In multitexturing mode it is doing 8 pixels every two cycles, so on the cycle it writes on it still can't output all 8 pixels; some (4?) are just cached and written on the next cycle (while the chip is doing the second texture cycle of the next 8 pixels).

Not sure if you said this right. In multitexturing mode it does 8 pixels in 8 cycles, each with 8 textures (from what I know about 3DM2K1). Single texturing mode is where there are bandwidth problems. In this case, you're right in that just under 4 get done per clock, but the rest of the pixels just get backed up in a fifo, and so the card has to wait for room to add the next 8 pixels into the fifo.
 
Mintmaster said:
In multitexturing mode it does 8 pixels in 8 cycles, each with 8 textures (from what I know about 3DM2K1).

That implies that the multitexture test is using 8 texture layers?

I'm not familiar with what the fillrate tests in 3DM are actually doing, but my explanation would be the case if it were just dual texturing. What I explained would still hold for more textures, but obviously there would be fewer pixel writes per cycle.
 
Good lord, those guys can't see alternate explanations for anything!

Like, maybe it gets the same single texture fillrate as the 9500 (non pro) because it is bandwidth bound, and gets higher on the multi-texture fillrate tests because it is fillrate bound there?

It's like, DUHHHH?

And their logic for a 256bit bus blows my mind.

This is one of my favorite quotes:
It's interesting that the developers overlooked the performance when lighting is enabled, and the RADEON 9500 PRO overtakes the RADEON 9700. According to ATI, they have identical chips, and it again proves that the card lost its weight on the software level, and the designers didn't account for all nuances. In the geometrical issues it's a higher clock speed that make an effect on the performance.

No shit! You mean T&L performance is bound by clockrate? What a surprise! And the conclusion that this "proves" the card is modified through software is ludicrous.

And DAMNIT!
break the page up into more than one huge massive farging thing!
 
lol. :) I prefer the whole article on one page :p

Single-Texturing: There are 64 surfaces with one texture each. This means that the graphics hardware fills each of these objects separately, no matter how many texture layers that card is capable of drawing in a single pass.
Multi-Texturing: We draw 64 texture layers as fast as possible. This means that we take advantage of the fact that modern cards are usually capable of drawing multiple texture layers on a single object as fast as they would draw one single layer. 64 texture layers are distributed so that each surface in use has as many texture layers as that particular card can draw in a single pass. For example, if your card can draw 8 texture layers in a single pass, then there will be 8 objects with 8 texture layers each. If your card is capable of doing 6 texture layers in a pass, there will be 10 objects with 6 layers and an 11th object with the remaining 4 layers.

That's what the 3DMark fillrate tests are supposed to do (2001 anyway; 2000 and 99 only had 4 layers).
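The layer distribution described in that text can be sketched in a few lines. This only illustrates the description quoted above; it is not 3DMark's actual code, and the function name is made up.

```python
def distribute_layers(total_layers, layers_per_pass):
    """Split total_layers across objects, filling each to the per-pass limit."""
    full_objects, remainder = divmod(total_layers, layers_per_pass)
    objects = [layers_per_pass] * full_objects
    if remainder:
        # One last object carries whatever layers are left over.
        objects.append(remainder)
    return objects

for per_pass in (8, 6):
    layout = distribute_layers(64, per_pass)
    print(per_pass, "layers per pass:", len(layout), "objects", layout)
```

With 8 layers per pass this gives 8 objects of 8 layers; with 6 it gives 10 objects of 6 layers plus one with the remaining 4, matching the two examples in the text.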
 
I prefer it on 1 page also. Makes it easier to read than go back and forth between pages unless I am using opera :D

As for their conclusion, I have to say they always seem to be looking for conspiracies. Their last review had some odd conclusions too (I think it said the 9500 used a 256 bit bus), which were refuted by the board members here; or I should say the board members disagreed with their conclusions, because the data and tests needed to back those claims up were not sufficient.
 
Article is down now.

Sent the following (it was quickly typed, so it's fairly poorly done):

Hi,

I’d just like to suggest a few things for you to test in relation to your recent 9500 PRO review.

I see that you have concluded that the 9500 PRO is a 4x2 based on the 3DMark fillrate tests – IMO that conclusion is wrong. I’ll explain why…

First off, the Radeon 9500 PRO is an 8 pixel pipe card with a 128bit bus. At 32bit, writing 8 pixels all at once (and that’s not including other memory transactions such as Z checks etc) requires 256 bits of transfer; with only a 128bit bus this is not sustainable, and only the equivalent of 4 pixels per clock can be passed out. In multitexturing mode it’s not an issue, as 8 pixels are produced in 8 cycles (not one per cycle, but 8 cycles pass, then all 8 are ready to be passed to the framebuffer); in this instance only 4 pixels can exit in the clock they are ready, but the other four are stored in the fifo buffer, and because the pixel pipelines are still busy texturing the next set of pixels over several clocks, those four are just passed out of the fifo on the next write cycle.

This is why the fillrate tests ‘appear’ to show 9500 PRO as a 4x2 card, but in fact all it is doing is balancing between the ‘loop-back’ texturing and how many 32 bit pixels can be written out per clock cycle. GeForce FX is also an 8x1 chip, but because it is also limited to a 128bit bus it will display very similar tendencies.

An easy way to test this is to run the 3DMark test in 16bit mode (16bit frame buffer and z buffer)….

Radeon 9500 PRO 32bit 3DMark Single Texture Fillrate: 937 MT/sec
Radeon 9500 PRO 16bit 3DMark Single Texture Fillrate: 1523.7 MT/sec

As you can see the fillrate in 16bit mode is greater than the theoretical maximum of a 4 pipe card running at 275MHz. If you also test a 9700 (non-pro) you’ll see that this fillrate is very similar to its 32bit fillrate.

This also throws into question your conclusion that it is a 256-bit board. However, IMO this was on fairly shaky ground in the first place. Concluding that it is 256bit based on AA performance may not actually be possible with chips such as R300 – R300, like GeForce FX, features AA colour compression, so it will be very difficult to assess exactly how much bandwidth is being used. It’s also clear from the board layout and the power regulation circuitry that there are no memory traces going to the right hand portion of the chip, which is where two of the four 64bit memory controllers of R300 are located – all the memory traces of the 9500 PRO go to the top portion of the chip, meaning they only go to two memory controllers.

Cheers,
Dave Baumann
Editor-In-Chief
http://www.beyond3d.com
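The arithmetic in the email can be checked with a quick sketch. It uses the same simplification the email does: the bus is treated as passing 128 bits per clock, and Z traffic, DDR signalling and any core/memory clock difference are ignored. The function names are made up for illustration; the fillrate figures and the 275MHz clock are the ones quoted in the email.

```python
def pixels_per_clock(pipelines, bus_bits, bits_per_pixel):
    """Pixels that can be written per clock: limited by pipes or by bus width."""
    return min(pipelines, bus_bits // bits_per_pixel)

def peak_fillrate(pipelines, core_mhz):
    """Theoretical peak in megapixels/sec (equal to MT/sec with one texture)."""
    return pipelines * core_mhz

CORE_MHZ = 275                      # clock used in the email
measured = {32: 937.0, 16: 1523.7}  # single texture results quoted in the email

four_pipe_max = peak_fillrate(4, CORE_MHZ)
for bpp, score in measured.items():
    ppc = pixels_per_clock(8, 128, bpp)
    print(f"{bpp}bit: {ppc} pixels/clock writable, measured {score} MT/sec, "
          f"4 pipe maximum {four_pipe_max} MT/sec")
```

The 16bit result sits above the 4 pipe ceiling of 1100 MT/sec, which a 4x2 chip could not reach in a single texture test, while the 32bit result stays below it as the 128bit bus limit predicts.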

Only just received the following reply:

DB> Hi,

DB> I'd just like to suggest a few things for you to test in relation to
DB> your recent 9500 PRO review.

thanks!

we now reinvestigate this issue and probably we have mistake with 256 bit as you say

we found way to reprogram 9700 for work only with 2 controllers - 128 bit and it show WERY similar with 9500 pro results

so probably you right - here only 128 bit acces to memory

we will update and reanounce our article tomorrow

thank you for attention and detailed arguments

Best regards,
/Alexander Medvedev,
Technical Director,
 
I have sneaking, climbing suspicion that it be WERY possible that maybe from now on and in the future we vill look much mor-r-r-r-r-re closely at manufacturer's published specifications. Ve haf sneaking, climbing suspicion that mabye card maker know how they make cards and what for inside! You think the truth be proven that, maybe?

Ya!
 
Well, it's back up again, and it looks like they took on board what I suggested:

So, we definiely has a 128bit bus. The hope of turning the 9500 PRO into the 9700 on the software level vanishes away.

The 9500 and 9500 PRO have equal scores in the Fillrate test because of the limited throughput of the 128bit bus - it just can't record more than 4 32bit values per clock. To prove it we ran the Fillrate test in the 16bit mode. And now the difference between 4 pipelines of the RADEON 9500 and 8 pipelines of the RADEON 9500 PRO is noticeable. The 128bit bandwidth is sufficient for recording of eight 16bit color values per clock.

And what about the performance drop in the 2x MSAA mode? A 128bit bus wouldn't be enough for recording twice more values without losing the speed. The answer is that the R300, like NV30, supports the technology of color compression in the MSAA mode. This, in fact, is lossless compression, close to 2:1 - each pair of MSAA samples is compressed, and if they are identical (it happens almost always except edges of polygons which make a very small part) only one value is recorded. In the 2x mode effectiveness of such compression will be equal to the NV30, but in the 4x mode the drop is much greater - it seems that only pairs of samples can be compressed, - if there are 4 identical samples, the factor is not 4:1 like in case of the NV30, but remains 2:1. It results in a twice greater frame buffer in the MSAA 4x mode. However that may be, but 2:1 is much better than nothing, and this is what we can see comparing performance of the RADEON and GeForce 4 in the MSAA mode.
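The pair-versus-full compression claim in the updated article can be put into numbers with a small sketch. It models only what the article describes (their guess about pairs on R300 and 4:1 on NV30), not confirmed hardware behaviour, and the function name and the group_size parameter are made up for illustration.

```python
def values_written(samples, group_size, identical=True):
    """Colour values stored for one pixel under a toy MSAA compression model.

    group_size is how many identical samples collapse into one stored value:
    2 for the pair-only scheme the article attributes to R300, and the full
    sample count for the 4:1 scheme it attributes to NV30.
    """
    if not identical:
        # Polygon edge: samples differ, so every one of them is stored.
        return samples
    return samples // group_size

for samples in (2, 4):
    pairs = values_written(samples, 2)
    full = values_written(samples, samples)
    print(f"{samples}x MSAA interior pixel: pair scheme writes {pairs}, "
          f"full scheme writes {full}")
```

In this model the two schemes match at 2x, and at 4x the pair scheme writes twice as many values per interior pixel, which is the "twice greater frame buffer" the article mentions.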
 
DaveBaumann said:
Well, it's back up again, and it looks like they took on board what I suggested:

So, we definiely has a 128bit bus. The hope of turning the 9500 PRO into the 9700 on the software level vanishes away.

The 9500 and 9500 PRO have equal scores in the Fillrate test because of the limited throughput of the 128bit bus - it just can't record more than 4 32bit values per clock. To prove it we ran the Fillrate test in the 16bit mode. And now the difference between 4 pipelines of the RADEON 9500 and 8 pipelines of the RADEON 9500 PRO is noticeable. The 128bit bandwidth is sufficient for recording of eight 16bit color values per clock.

And what about the performance drop in the 2x MSAA mode? A 128bit bus wouldn't be enough for recording twice more values without losing the speed. The answer is that the R300, like NV30, supports the technology of color compression in the MSAA mode. This, in fact, is lossless compression, close to 2:1 - each pair of MSAA samples is compressed, and if they are identical (it happens almost always except edges of polygons which make a very small part) only one value is recorded. In the 2x mode effectiveness of such compression will be equal to the NV30, but in the 4x mode the drop is much greater - it seems that only pairs of samples can be compressed, - if there are 4 identical samples, the factor is not 4:1 like in case of the NV30, but remains 2:1. It results in a twice greater frame buffer in the MSAA 4x mode. However that may be, but 2:1 is much better than nothing, and this is what we can see comparing performance of the RADEON and GeForce 4 in the MSAA mode.

*chuckle* these guys are something else...;) Do they just totally ignore everything the manufacturer says about various things--like all the compression comments they've made thus far? A strange crew--they want to second-guess everything ATI says, but accept on faith everything nVidia says about the nv30. Remarkable. Weird........!
 
Sorry to dig up this old thread again, but there's another Digit-Life review that seems to suffer from the belief that the 128 MB Radeon 9500 (non-Pro) has a 256 bit memory bus. I'm not sure how old this review is, but they seem to really want to cling to that belief.

http://www.ixbt.com/video2/images/r9500-2/3dm-single.png
http://www.ixbt.com/video2/images/r9500-2/3dm-multi.png

In this test we are to prove that the RADEON 9500 128MB has a 256-bit bus indeed. Look at the first scores. We enabled the AA2x mode (the lightest AA) knowing that it didn't cause speed drops in the RADEON 9700Pro. Taking into account that the RADEON 9500 is deprived of half of the pipelines, it won't be easy for it to cope with the AA even with a 256-bit bus, that is why we simplified the task as in this test the chip doesn't have much work to do. In 1024x768 the speed remains the same, but after that it goes down. 

In the multitexturing mode when the chip may use all its possibilities in operation with the textures, the performance doesn't worsen at all when the mode changes for AA2x! This fact proves that the card comes with a 256-bit bus!

Are they wrong yet again?
 
A non-Pro Radeon 9500? It will have the exact same performance hit as a Radeon 9700 Pro from FSAA because it has the same memory bandwidth/fillrate ratio, on a 128-bit bus.
 
DaveBaumann said:
That implies that the multitexture test is using 8 texture layers?
The multitexture test uses as many layers as it can. So if you support 2 textures at once, it will use 2. If you support 8, then it uses 8. Note that the final result has 64 texture layers, so even with 8 textures at once, it still takes 8 full screen polygons each with 8 textures.
 
Chalnoth said:
A non-Pro Radeon 9500? It will have the exact same performance hit as a Radeon 9700 Pro from FSAA because it has the same memory bandwidth/fillrate ratio, on a 128-bit bus.

That seems a little too generalized to me, Chalnoth.
The Radeon 9500 doesn't have fewer VS units. That means it's very good in geometry-limited situations.


Uttar
 
Uttar said:
That seems a little too generalized to me, Chalnoth.
The Radeon 9500 doesn't have fewer VS units. That means it's very good in geometry-limited situations.

Show me any real world application that is geometry limited on the 9700 ... :)
 
Hyp-X said:
Show me any real world application that is geometry limited on the 9700 ... :)

UT2K3 at 640x480, no AA/AF, minimum texture details and model/world details at maximum with shadows and stuff? :D

Okay, it isn't really "real world", nobody would do that. But eh, maybe future games will be. It's still worth saying.


Uttar
 
Uttar said:
Hyp-X said:
Show me any real world application that is geometry limited on the 9700 ... :)

UT2K3 at 640x480, no AA/AF, minimum texture details and model/world details at maximum with shadows and stuff? :D

I think it would be CPU limited in that configuration...
 