Q@Reverend

rubank

Newcomer
Hi,
maybe this question doesn't warrant a new thread, but here goes:

I've noticed you have said a couple of times that o/c'ing the core is percentage-wise more effective than o/c'ing the memory with regard to PS2 scores.

I haven't found it to be so; maybe I have misread you?

Here's some food for thought:

my 9600 Pro @ 489/369 scores 32,3 in the 3DM03 PS2 test.
Altering the clocks gives these results:
29,7 @ 405/369
28,4 @ 489/303

These numbers seem to contradict your remarks.
Have I missed something?
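
To put the three results above on a percentage basis, here is a minimal Python sketch. The clocks and scores are the ones quoted in the post (written with decimal points); the helper name `pct_change` and the dictionary layout are only illustrative choices.

```python
def pct_change(new, old):
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0

# (core MHz, memory MHz, 3DM03 PS2 score) as quoted in the post.
baseline = (489, 369, 32.3)
runs = {
    "core lowered": (405, 369, 29.7),
    "memory lowered": (489, 303, 28.4),
}

results = {}
for label, (core, mem, score) in runs.items():
    results[label] = {
        "core_pct": pct_change(core, baseline[0]),
        "mem_pct": pct_change(mem, baseline[1]),
        "score_pct": pct_change(score, baseline[2]),
    }
    r = results[label]
    print(f"{label}: core {r['core_pct']:+.1f}%, "
          f"mem {r['mem_pct']:+.1f}%, score {r['score_pct']:+.1f}%")
```

Both clock reductions are of similar size (roughly 17% for the core and 18% for the memory), while the score drops by about 8% in the first case and about 12% in the second, which is the tendency the post describes.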

rubank
 
Did he say PS 2.0 shading, or scores in the 3dmark 03 PS 2.0 test? And was he talking about the RV350 in particular? Your wording is unclear, and I don't recall the comment at the moment. I'll address the specific PS 2.0 test you mention here.

You shouldn't treat the PS 2.0 test too simply... it doesn't just test PS 2.0 shader execution, it tests the application of that shading to a specific goal: procedural texturing. The bottlenecks are whichever resources are limited, and there are textures being looked up and pixels being written as well as shader processing... with its shading efficiency, it doesn't seem too surprising that bandwidth would be a bottleneck considering the balance of the default card's core/memory clock speed ratios and the memory bus width.

What would surprise me is if you started with equal memory and core clocks, increased from there, and Rev's statement didn't hold true. But there is another factor involved that might produce that behavior: the PS 2.0 test in 3dmark 03 uses the sin macro... the NV3x family calculates that instruction, but the R3xx uses a texture lookup. Given its ability to do texture lookups at the same time as another op or two in a given pipe and clock cycle, the issue for that operation when it occurs would be dependency scheduling and texture cache utilization. Since other texture lookups are occurring, the fact that reducing memory bandwidth bottlenecks throughput more significantly just seems to point out that the ratio (and likely the driver's management of the texture lookup for the macro implementation) was well balanced at the outset.

Or was he referring to general case PS 2.0 shading?
 
Heh, I just found the answer in the very next thread in my list. He listed specific cards (not the RV350) with different shader processing/bandwidth balances, and he wasn't talking about the PS 2.0 test in 3dmark 03.

Well, the R300 has a similar balance in its architecture for the most part (probably not a similar texture cache impact on the test, for one thing) and has more relative bandwidth for the given task/screen resolution/texture sizes used, but the NV3x cards listed are significantly different, with much less PS 2.0 shading power to boot. My discussion above should at least point out some things that you might not have been taking into consideration.
 
Demalion,
in the recent review of the Albatron FX5600 Turbo, Rev is specifically targeting the 3DM03 PS2 test, and he remarks that he's testing o/c of that card's core clock only, since it is allegedly more efficient.
The way I read it (and his comment in another thread), it sounds like this would be so in general; that's why I specifically ask if I have misread him or have missed some other fine point.

He doesn't provide any testing to substantiate his claim, hence my curiosity.

So I tested my card with that application and came to another conclusion; that's why I put forth the question. Whether that is treating the PS2 test too simply is of course something you're entitled to an opinion on.

The FX5600 has a 128-bit bus just like the ATI 9600.

Starting the test with equal (low) mem and core clocks and increasing from there seems pointless to me given the numbers I provided; the tendency is pretty clear.

Other than that I will wait for his answer.

rubank
 
Well, if you're not interested in other answers, perhaps a PM would have been more suitable? It is one thing to want his answer, it is another to ignore another answer given, and to do so rudely, especially when you are demonstrating that you are failing to consider much of the information I provided.

Please note that my explanation provided you with relevant info, when/if you decide it is acceptable for me to offer it. Namely, that the PS 2.0 shader processing/bandwidth ratios of the 9600 Pro and 5600 Ultra are not the same: I did not mention bandwidth by itself, though your reply failed to take note of anything in my response other than bus width.

You also dismissed my example case and explanation completely without apparent recognition that it relates directly to your statements...that doesn't seem to ever be constructive, but hopefully the answer coming from Reverend will garner a different response.

As for the post that I thought you were referring to, it was this one, which does seem to refer to other PS 2.0 testing AFAICS. If you'll note, my first post did also address what you were actually talking about: cards other than the one you tested running the PS 2.0 test in 3dmark 03, if at some point discussing the details with me would be acceptable to you. :-?
 
Yes, I was being deliberately rude, to some extent, because I don't like the presumption that your info, though relevant, would be new to me.
I know there are architectural differences between the two cards in question but, again, I read Rev's statements as having general validity, as I said in my earlier answer to you.
That is why, again, I specifically asked if I had misread him. If I have done so, this thread has no merit, and that's why I await his answer.

And yes, Demalion, you did mention bandwidth by itself [with its shading efficiency, it doesn't seem too surprising that bandwidth would be a bottleneck considering the balance of the default card's core/memory clock speed ratios and the memory bus width.]
And this is in contradiction to Rev's remark, no?

If you, as indicated in your first response, didn't know of the remarks I attributed to Rev, you could've just politely asked for a reference.

Now this is way off topic.

rubank
 
Umm...please note the word "balance" again, and what goes with it, like the mention of shader processing capability. That's not "bandwidth by itself", which is the point of the statement I made...that's a discussion of the relationship between bandwidth and shader processing throughput, and the rather significant relevance of that relationship to what you are observing on a card that Reverend was not discussing. :-? Perhaps reading my initial post again would help when your desire for "on topic" discussion rather than animosity includes my person, if that is possible.

That concept is in fact central to what I was attempting to point out about your question and the applicability of your observations to the statements in question by Rev.

However, you're right, the animosity is in fact off topic...but, err, it was your decision to make that the focus of your discussion. :oops: . The "on topic" portion of my answer is still there for you to address if an "on topic" discussion becomes appealing to you again.
 
Rubank - the PS2.0 test should be reasonably processor limited (i.e. Pixel Shader limited); however, the procedural textures are built with the use of a noise map which is generated at load time. This noise map needs to be sampled during rendering, which inevitably uses bandwidth.

The card Rev was using had default core/mem clocks at the same speed, which would make it less bandwidth limited - hence, overclocking the core on what should be a shader-limited test is likely to make a difference. You're using a 9600 PRO, which has the memory running much slower than the core, making it a little bandwidth starved in a number of cases in the first place. Try underclocking your core to the same speed as the RAM and then take the testing from there.
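
The balance argument above can be illustrated with a deliberately crude toy model. This is only a sketch under loud assumptions: frame time is taken as a shader part plus a memory part, each inversely proportional to its own clock, and the two work constants are hypothetical and set equal. It is not a description of how 3DMark03 or either GPU actually behaves.

```python
# Hypothetical work constants (arbitrary units), chosen equal for illustration.
SHADER_WORK = 1.0
MEMORY_WORK = 1.0

def frame_time(core_mhz, mem_mhz):
    """Toy model: shader time plus memory time, each scaling with 1/clock."""
    return SHADER_WORK / core_mhz + MEMORY_WORK / mem_mhz

def gain_pct(time_before, time_after):
    """Throughput gain in percent when frame time drops."""
    return (time_before / time_after - 1.0) * 100.0

def compare(core_mhz, mem_mhz, step=1.10):
    """Return (gain from a 10% core o/c, gain from a 10% memory o/c)."""
    base = frame_time(core_mhz, mem_mhz)
    core_gain = gain_pct(base, frame_time(core_mhz * step, mem_mhz))
    mem_gain = gain_pct(base, frame_time(core_mhz, mem_mhz * step))
    return core_gain, mem_gain

balanced = compare(300, 300)      # core and memory at the same clock
mem_starved = compare(400, 300)   # memory slower than the core

print("equal clocks:  core %+.2f%%, mem %+.2f%%" % balanced)
print("slower memory: core %+.2f%%, mem %+.2f%%" % mem_starved)
```

In this toy setup the two overclocks help equally when the clocks start out equal, and the memory overclock helps more once the memory starts out slower than the core. Different work constants would shift the crossover point, so the numbers themselves mean nothing beyond showing the direction of the effect.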
 
Dave,
as far as I can see, the card Rev is testing runs at 325/300 for core/memory (memory at "600"), not the same speed.
So, for the sake of it, I clocked my card to 327/303 (I can't reproduce the exact speeds) and started from there, giving 24,2 in PS2.
At 363/303 (equiv. to Rev's o/c) the score is up by ~5%. But at 327/363 the score is up by ~9%, so in my case o/c'ing the memory still gives the bigger gain.
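
One thing worth checking with these figures is that the two overclocks are not the same size in relative terms. A small Python sketch, using only the clocks and the rounded ~5% and ~9% gains quoted above (the function name `scaling_ratio` is just an illustrative choice), normalizes each gain by its clock increase:

```python
def scaling_ratio(score_gain_pct, clock_old, clock_new):
    """Score gain per percent of clock increase."""
    clock_gain_pct = (clock_new - clock_old) / clock_old * 100.0
    return score_gain_pct / clock_gain_pct

# Core overclock: 327 -> 363 MHz, score up ~5% (rounded figure from the post).
core_ratio = scaling_ratio(5.0, 327, 363)

# Memory overclock: 303 -> 363 MHz, score up ~9% (rounded figure from the post).
mem_ratio = scaling_ratio(9.0, 303, 363)

print(f"core: {core_ratio:.2f} % score per % clock")
print(f"mem:  {mem_ratio:.2f} % score per % clock")
```

The core step is about 11% and the memory step about 20%, so per percent of clock the two come out at roughly 0.45 each. Since the quoted gains are rounded, this only suggests that at this starting point the two are close on a strictly percentage-wise basis, even though the memory overclock gives the larger absolute gain.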

However, running the complete "games" test at the same clocks gives different results. In this case the core o/c gives a ~6% increase while the mem o/c gives ~4%. Practically all of the difference is related to GT4, which obviously benefits more from increased core speed.

It should be noted, though, that this sets my card at a "balance" it's not designed for.

(OT sidenote: I'm surprised at the level of performance this card gives at this wild underclock. I'm less surprised that the texturing artifacts in Troll's Lair are still there; I did wonder if they were due to the stressful memory clock I normally use, but they aren't.)

rubank
 
The 9600 has 2 or 4 times the PS2.0 processing performance of the NV34 - the NV34 is going to be far more limited by PS processing power than by bandwidth in comparison to the 9600, which is why increasing the core clock on it would make much more of a difference.
 
Dave,
I realise that. But, again, I read Rev's comments as being of a more general nature. I also asked if I had misread him.
In the "512bit memory thread" he says this is true for R300, NV31 and NV34.
In the Albatron test he says "I only tested overclocking the core in 3DMark03 since it is the core that determines the T&L and shader performances in this benchmark". Note "this benchmark", not "this card".

It seems I did misread him. End of story.

rubank
 
rubank, first of all, if you wanted a response from me and only me, use PM. Otherwise, posting things here in our forums and expecting folks to assume you are knowledgeable about certain things, when you don't provide all the relevant info, is just as presumptuous on your part as the presumption you are annoyed at others for making, namely that you are not knowledgeable. Not being mean to you, just telling you about the possible scenario :)

As for the topic at hand, in that 512-bit thread, I specifically referred to my own PS research. Perhaps if I had reversed the order of my two paragraphs, it would have been clearer to you why I think GPU performance should become more important going forward compared to bandwidth: I was referring specifically to my PS research using the three cards/cores (R300, NV31 and NV34) and, based on this, to how I think things are progressing. My PS research cannot, in any possible way, have any relation to 3DMark03's PS_2_0 test, whether in terms of GPU performance or memory bandwidth considerations. I haven't reached the stage where I can give you indisputably correct and specific performance numbers on what I'm working on... for the moment, I'm concentrating on implementation (I'd guess it's about 40 to 50% done now), before I start cleaning up the code and looking for performance optimizations. And then I'd pass it on to Kristof to check if I'm a lousy programmer or not :)

[In the Albatron test he says "I only tested overclocking the core in 3DMark03 since it is the core that determines the T&L and shader performances in this benchmark". Note "this benchmark", not "this card".]
It wasn't clear to you that I was talking about the card in the review? Maybe it would be clearer if I had said "I only tested overclocking the core in 3DMark03 since it is the core that determines the T&L and shader performances in this benchmark with this card" (even though it should be clear what card I was reviewing)? :)

[It seems I did misread him.]
You misunderstood or misinterpreted what I said.

[End of story.]
Indeed, it should be.
 