One more try... please, R300 *only* and 2 TMUs

The original topic was that Anand stated in his Gigabyte review that the R300 would be *held back* from really being able to perform (or perform a lot higher) once the card is clocked at 400MHz and has 400MHz+ DDR II, without the addition of a second TMU.

My feeling is that the card is designed for best operation with one TMU, given the flexibility of the pipeline, etc. Even if you toss in 5-7 more Gbit of bandwidth...

All I want to know is: do you think, from a technical standpoint, that Anand has a valid point or not? And what would the increases be, if any, given more bandwidth etc. and a second TMU?

Thank you.
 
Dave,

I had made a comment in the other thread about the 9000 not being as flexible as the 9700. You said *eh?*

I did not think the 9000 could do 3 simultaneous operations or have the sampling capabilities that the 9700 has.
 
I'll take a shot at this one. :)


I imagine it's a lot more complicated than simply adding a 2nd TMU to each of the pipes. You would probably have to enlarge the texture cache to (efficiently) compensate for the increased demands, eight individual memory controllers may become more prudent to feed the fatter pipelines, and raw bandwidth would have to be increased by quite a bit.

I'm sure you wouldn't have to do ALL that to make it perform faster with a 2nd TMU, but I don't know how much speed you would gain.


If I took a guess at the best way to speed the R300 up, it would be to clock it up to 400MHz+ (maybe via .13um) and take advantage of its DDR2 support. :)
 
Getstuff...

You need to go read Anand's Gigabyte review. That card already clocks at 400MHz at .15um without additional cooling.
 
Hellbinder[CE] said:
Getstuff...

You need to go read Anand's Gigabyte review. That card already clocks at 400MHz at .15um without additional cooling.


So what? Did you notice the MAYBE via .13um in my last post? Just because the card overclocks to 400MHz does not make it a viable product to sell, or perhaps it could be. I wasn't making an assumption either way, but I would say over 400MHz would be a better target for a refresh.

I tried to shed some light on your question, but you crap on your own thread to prove some meaningless point? Christ....
 
My word, everyone is soooo touchy today...

I was not trying to offend anyone... perhaps I should have posted a smiley???

I was simply saying they already are achieving 400MHz... that's all. Just that .13um might not be needed to do it.

I apologize for having rubbed you the wrong way.
 
(Copied from other thread)

I think the "disadvantage" of only one TMU will depend a lot on the application. On dual-textured, fill-rate constrained applications, it certainly could hold it back. But since it already has games of that generation playable at 1600x1200 with AA and aniso, I'm not sure that is a "real" issue.

I think for more advanced apps, whether it is a drawback will depend on the operations in your pixel shaders. If your engine is written to use a lot of textures in one pass in the pixel shaders, and your pixel shaders are not otherwise compute bound, I think the second TMU is an advantage.
 
To combine the thoughts of the above together....

Look at the Radeon 9000 vs. the 8500. That's the best example we have of "does a 2nd TMU help, given similar bandwidth-to-pixel-rate ratios". It seems to tell this story:

In many "older games" that are optimized for a multitexturing card, and specifically "dual texturing", the extra TMU may help boost scores...as seen on 8500 vs. 9000 benchmarks.

In games that start to utilize shaders more and more though, the extra silicon for those TMUs can end up just taking up space, and not adding any real significant value.

So, we might see a dual TMU per pipe 9700 getting some better Quake3 scores, but I think most would agree they're fast enough already. Going forward I think there's going to be a reversion to 1 TMU per pipe, and an emphasis on increasing clock, and the number of pixel pipes, rather than TMUs.

If Carmack ever clues us in on the relative performance of the 9000 vs. the 8500, that would help answer this question.
 
Probably we'll see the second TMU introduced in the R350:

8 pipelines x 2 TMUs each

+

500MHz core speed

+

1GHz DDR-II memory speed.
 
Joe DeFuria said:
So, we might see a dual TMU per pipe 9700 getting some better Quake3 scores, but I think most would agree they're fast enough already. Going forward I think there's going to be a reversion to 1 TMU per pipe, and an emphasis on increasing clock, and the number of pixel pipes, rather than TMUs.

I'll second that view. Actually I think that any 'spare' silicon should be used to increase the computational power in the shaders to decrease the cycles needed to perform a given complex op.

Multi-texturing as we know it in Quake III should be a thing of the past so kill the extra TMU with it. Focus should be on multiple textures in one pass and trying to get that down to as few cycles as possible by using the silicon on optimizing texture fetch, larger texture caches etc.
 
Hellbinder[CE] said:
The original topic was that Anand stated in his Gigabyte review that the R300 would be *held back* from really being able to perform (or perform a lot higher) once the card is clocked at 400MHz and has 400MHz+ DDR II, without the addition of a second TMU.

To me that doesn't make sense, as the clock increases would be relatively linear. If it's being held back at 400/400, we should be able to say relatively the same thing at 325/310. Now if the increase went to say 400 core/500 mem, then I think there'd be a case for having a second TMU, but as it stands, as long as the memory and clock increase at roughly the same pace, then there shouldn't really be any problem with wasted bandwidth.
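The "relatively linear" point can be sanity-checked with some back-of-the-envelope arithmetic. A rough sketch (the 8-pipe, 256-bit DDR bus figures are my assumptions for illustration, not numbers from this thread):

```python
# Rough sketch of the "scales roughly linearly" argument.
# Assumed specs: 8 pixel pipes, 256-bit DDR bus (32 bytes wide, 2 transfers/clock).

def bytes_per_pixel(core_mhz, mem_mhz, pipes=8, bus_bytes=32):
    bandwidth = mem_mhz * 1e6 * bus_bytes * 2   # DDR: 2 transfers per memory clock
    fillrate = core_mhz * 1e6 * pipes           # pixels per second
    return bandwidth / fillrate                  # bytes of bandwidth per pixel

for core, mem in [(325, 310), (400, 400), (400, 500)]:
    print(f"{core}/{mem}: {bytes_per_pixel(core, mem):.2f} bytes per pixel")
```

Under those assumptions, 400/400 gives almost the same bandwidth-per-pixel ratio as 325/310 (about 8.0 vs. 7.6), which is the point being made; only something like 400/500 meaningfully shifts the balance toward having spare bandwidth for a second TMU.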

But in all seriousness, does any of it really matter, as in the vast majority of cases the card itself at just 325/310 is already overkill, IMO. There's nothing out there that could even pretend to stress this card.
 
I think you need to think of TMUs as just another functional unit for a pixel shader.

The more multipliers, dividers, etc. available, the more instructions you can issue in parallel, and the better your per-clock efficiency. If you have only 1 TMU, then any time you encounter a shader instruction which operates on a texture, you must wait a number of instructions before fetching another one to maximize parallelism. If you have 2 texture fetch instructions in a row, you stall the other functional units.


Since the vast majority of instructions in future shaders will be mathematical operations (I'm betting more proceduralism vs lookup), I doubt there will be a need to issue more than 1 texture lookup per cycle. 2 would be the maximum and I think the benefits will be marginal for DX9 titles.

But for older games, 2 TMUs would probably yield better performance.
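The stall described above can be sketched with a toy issue-slot model (everything here is invented for illustration; real hardware scheduling is far more complex than co-issuing one ALU op with up to N texture fetches per clock):

```python
# Toy model: each clock can issue 1 ALU op plus up to n_tmus texture fetches,
# in program order. Back-to-back "tex" instructions stall a single-TMU design.

def issue_cycles(shader, n_tmus):
    cycles = 0
    i = 0
    while i < len(shader):
        tex_slots, alu_slots = n_tmus, 1
        # Greedily co-issue in-order ops until a slot type runs out.
        while i < len(shader):
            if shader[i] == "tex" and tex_slots:
                tex_slots -= 1
            elif shader[i] == "alu" and alu_slots:
                alu_slots -= 1
            else:
                break
            i += 1
        cycles += 1
    return cycles

# Multitexture-style shader: lots of back-to-back fetches...
legacy = ["tex", "tex", "alu", "tex", "tex", "alu"]
# ...vs. a math-heavy "procedural" shader with a single lookup.
procedural = ["tex"] + ["alu"] * 7

for name, s in [("legacy", legacy), ("procedural", procedural)]:
    print(name, "- 1 TMU:", issue_cycles(s, 1), "cycles; 2 TMUs:", issue_cycles(s, 2))
```

In this toy model the fetch-heavy shader halves its cycle count with a second TMU (4 down to 2), while the math-heavy shader sees no benefit at all (7 either way), matching the argument that DX9-style workloads gain little.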
 
But in all seriousness, does any of it really matter, as in the vast majority of cases the card itself at just 325/310 is already overkill, IMO. There's nothing out there that could even pretend to stress this card.
Uh, UT2K3 benches indicate otherwise.
 
LeStoffer said:
Actually I think that any 'spare' silicon should be used to increase the computational power in the shaders to decrease the cycles needed to perform a given complex op.

Multi-texturing as we know it in Quake III should be a thing of the past so kill the extra TMU with it. Focus should be on multiple textures in one pass and trying to get that down to as few cycles as possible by using the silicon on optimizing texture fetch, larger texture caches etc.

I would agree with this and DemoCoder's last post. Multiple TMU's would enhance performance on DX7 titles that use multitexturing (which is a lot of current games), but most of those games are perfectly playable with IQ maxed out by the 9700, and presumably by the NV30.

We're just now really making the transition into DX8 games, which will have more of a balance between shaders and traditional lookup texturing. A second TMU will probably start to show diminishing returns in many DX8 games, which we'll see a host of next year.

Beyond that, it would seem that for pure DX8 shader coding and DX9+, shader computational power really becomes the key. Multiple TMUs are probably a step in the wrong direction at this point, though if the NV30 shows up in an 8x2 arrangement and posts the highest QIII scores of the millennium, I wouldn't be too shocked to see ATI respond by offering an 8x2 refresh. ATI is, after all, still having to "live up to" the "NVIDIA standard" on raw performance (at least, that's the way most of the world sees it).

Which has to make you wonder... wtf was Matrox thinking???
 
Let me just clarify a touch. The old term of '2 TMUs' is broken down into two parts now.

One is to double the throughput of the pixel shader (the number of instructions that can be executed per pixel per clock). This improves your performance on ALU operations but doesn't help at all with texturing.

The other is to double the number of texture fetches. This involves both increasing the size of the texture cache and (expensively!) doubling the number of read ports on it, so you can get the data out. If done without the former, this improves your texture performance in fetch-limited situations (16X anisotropic, for example) assuming you have the bandwidth to feed the texture unit.

Combining the two, without adding anything to the rate at which you can send whole pixels through the Z and colour parts of the pipe, is equivalent to the old fashioned idea of '2 TMU's'.

I honestly don't know what would be better for future expansion, but I'm sure there are people in ATI who do :)
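The split described in that post can be seen with a crude bottleneck model (all numbers invented; real pipelines overlap work far more than this), where whichever side is slower sets the cycles per pixel:

```python
# Crude bottleneck model: a pixel needs some texture fetches and some ALU ops;
# the slower of the two sides determines the cycles per pixel.
import math

def cycles_per_pixel(fetches, alu_ops, fetch_rate=1, alu_rate=1):
    return max(math.ceil(fetches / fetch_rate), math.ceil(alu_ops / alu_rate))

# Fetch-limited case (e.g. heavy anisotropic filtering): doubling the
# texture cache read ports cuts the cycle count...
aniso = dict(fetches=8, alu_ops=2)
print("fetch-limited:", cycles_per_pixel(**aniso),
      "-> doubled fetch rate:", cycles_per_pixel(**aniso, fetch_rate=2))

# ...but in an ALU-limited shader, doubling fetch ports changes nothing;
# only doubling shader instruction throughput would help.
shader = dict(fetches=2, alu_ops=8)
print("ALU-limited:", cycles_per_pixel(**shader),
      "-> doubled fetch rate:", cycles_per_pixel(**shader, fetch_rate=2))
```

This is why the two halves of the old '2 TMUs' idea have to be considered separately: each doubling only pays off when its side of the pipe is the bottleneck.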
 
Pete said:
But in all seriousness, does any of it really matter, as in the vast majority of cases the card itself at just 325/310 is already overkill, IMO. There's nothing out there that could even pretend to stress this card.
Uh, UT2K3 benches indicate otherwise.

Looking at the UT2003 benches, the only areas where it dropped below 60 fps were with AA AND aniso at resolutions greater than 1024x768. In my opinion that is not really stressing the card. I probably don't have the highest of standards, though, considering I'm still fairly happy with my 3-year-old GF256, so maybe you do have a point.
 
Which has to make you wonder... wtf was Matrox thinking???

Remember that this is Matrox's first exposure to the high end market, their first experience in this field.

Remember the GF3? How it initially only ran at 200MHz, and it took Nvidia some time to raise the clock speeds?

Well, we see the same thing here and I really believe the refresh of Parhelia 512 on 0.13 will be done the right way.

Maybe it was a mistake to make Parhelia 512 4x4, maybe not...

Matrox still has to improve its drivers, and it needs to implement bandwidth-saving techniques like the competition does (John Carmack himself said how "unhappy" he is with the performance of a first 256-bit card).

If they do all of the above and do it the way it should be done, I see them competing on the high end with the likes of R300, NV30 and possibly the next card from 3DLabs.
 
Clashman, UT2K3 is a multiplayer game. The real question is how many other players/bots were running around in Anand's demos; I suspect fewer than the usual 16+.

Plus, too much is never enough with a $400 component (and Doom 3 coming up). :)
 
Bigus Dickus said:
Which has to make you wonder... wtf was Matrox thinking???

At this point it looks like Matrox might have been better off to cut out half the texture units and go with 8 pipes for an 8x1 setup like ATi, but they didn't add the extra texture units at the cost of shader stages. Parhelia should be able to execute longer shaders than GF4 and R8500 in fewer cycles. Also, it's possible that the extra texture units don't take up as much space as more pipelines would.
 