How can Nvidia be ahead of ATI but 360 GPU is on par with RSX?

ihamoitc2005 said:
Rendering each frame takes X number of clock cycles.

Y number of clock cycles dedicated to vertex shader operations means (X-Y) number of clock cycles available for pixel shader operations no?
No. Because rendering a frame in modern graphics engines no longer consists of a single pass of vertex shading leading directly into pixel shading, i.e. vertices leading into rasterised pixels, which are then textured and shaded.

Some shadow-capable engines construct the shadows in a scene using a pre-pass for the frame. The graphics card constructs a z/stencil model which describes how the shadows from each light fall across the scene. While constructing this model, there's no need to texture or shade any pixels. Colour is immaterial.

The engine simply wants to know which pixels are shadowed.
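To make the two-pass structure concrete, here is a minimal sketch in plain Python rather than any real graphics API - the depth values and per-pixel data are invented purely for illustration:

```python
# A toy two-pass frame: pass 1 builds a per-pixel shadow mask using depth
# comparisons only (no texturing, no colour maths); pass 2 does the usual
# colour/shading work and simply consumes that mask.
# All numbers below are made up for illustration.

light_depth = {            # hypothetical shadow map: light-space (x, y) -> nearest occluder depth
    (0, 0): 2.0,
    (1, 0): 5.0,
}

pixels = [                 # hypothetical per-pixel data after rasterisation
    {"screen": (10, 10), "light_xy": (0, 0), "light_z": 4.0, "albedo": (1.0, 0.2, 0.2)},
    {"screen": (11, 10), "light_xy": (1, 0), "light_z": 4.0, "albedo": (0.2, 1.0, 0.2)},
]

# --- Pass 1: shadow pre-pass. Colour is immaterial; we only want a yes/no answer. ---
shadow_mask = {}
for p in pixels:
    occluder_z = light_depth.get(p["light_xy"], float("inf"))
    shadow_mask[p["screen"]] = p["light_z"] > occluder_z      # True = something sits between pixel and light

# --- Pass 2: colour pass, texturing/shading modulated by the mask. ---
frame = {}
for p in pixels:
    lit = 0.1 if shadow_mask[p["screen"]] else 1.0            # ambient only when shadowed
    frame[p["screen"]] = tuple(round(c * lit, 2) for c in p["albedo"])

print(frame)   # {(10, 10): (0.1, 0.02, 0.02), (11, 10): (0.2, 1.0, 0.2)}
```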

Jawed
 
Jawed said:
48 shader pipes to run vertex programs instead of 8, when doing any kind of shadow pre-render.
For the last time, filling shadowmaps or volumes is NOT a shader-heavy operation; it's not even remotely close to being one.
With shadowmaps you're talking about rendering stuff where 4 VS could be enough to hit the vertex setup limit of Xenos. It's pretty much a given that RSX will also be setup-limited in this situation - the question is what the triangle setup speed really is. It may be lower or higher than Xenos's, but I am quite sure it's not anywhere close to being 6x lower.
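To see why so few vertex pipes suffice, here is a back-of-envelope sketch of that point; the vertex-reuse and per-vertex cycle figures are assumptions picked for illustration, not published numbers:

```python
# Xenos's setup rate is widely quoted as 500M triangles/s at 500MHz, i.e. one
# triangle per clock. Assume (for illustration only) good post-transform cache
# reuse on indexed meshes (~1 new vertex per triangle) and a tiny position-only
# shadow-pass vertex program costing ~4 cycles per vertex.
setup_tris_per_sec = 500e6
core_clock_hz      = 500e6
tris_per_clock     = setup_tris_per_sec / core_clock_hz   # 1.0

verts_per_tri   = 1.0    # assumed effective vertex reuse
cycles_per_vert = 4      # assumed cost of a trivial shadow-pass vertex shader

vs_pipes_to_saturate_setup = tris_per_clock * verts_per_tri * cycles_per_vert
print(tris_per_clock, vs_pipes_to_saturate_setup)   # -> 1.0 triangle/clock, ~4 vertex pipes
```

Past that point, extra vertex ALUs just wait on triangle setup, which is the sense in which the pass is setup-limited rather than shader-limited.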
 
shadows

Jawed said:
No. Because rendering a frame in modern graphics engines no longer consists of a single pass of vertex shading leading directly into pixel shading, i.e. vertices leading into rasterised pixels, which are then textured and shaded.

Some shadow-capable engines construct the shadows in a scene using a pre-pass for the frame. The graphics card constructs a z/stencil model which describes how the shadows from each light fall across the scene. While constructing this model, there's no need to texture or shade any pixels. Colour is immaterial.

The engine simply wants to know which pixels are shadowed.

Jawed

So this scene is constructed entirely of shadows without textures or pixels? You will have to educate me on this.

By the way, triangle setup limit on Xenos is 500M per sec no? So how many triangles per clock?
 
ihamoitc2005 said:
Efficiency is per scene, based on vertex and pixel shader requirements. Xenos can adapt to varying needs. In scenes where 1 vertex shader is required, Xenos is efficient, but RSX wastes 7 vertex shaders, hence RSX is less efficient. RSX compensates by having more pixel-shader capability than Xenos, so the wastage is not a liability except in terms of chip size and cost.

Yeah, I can see that. However, I was replying to the poster who picked an arbitrary number, i.e. 90% efficiency... what exactly is that 90% from the PR measuring?

Jawed said:
Before we even start to count all the scheduling efficiencies that accrue from Xenos's scheduler, you've got basic stumbling blocks inside RSX's superscalar architecture: RSX's ALUs sitting idle because of register bandwidth restrictions, or because dual-issue is not possible on successively dependent instructions.

Do you seriously expect me to accept that a stall-less, zero-latency-branch pipeline is not more efficient than a conventional GPU pipeline? Have you entirely forgotten the scheduler discussions?

I'm not going over old ground again. Look at my reply; there was a reason I bolded VASTLY. It's your liberal use of unfounded superlatives that I was objecting to...
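On the dual-issue point in the quote, here is a toy issue model in Python - purely illustrative, not a description of RSX's actual scheduling rules - where a unit can pair two instructions per cycle only if the second doesn't read the first's result:

```python
# Toy dual-issue model: greedily co-issue two neighbouring instructions per
# cycle unless the second one depends on the first's destination register.
def cycles(instrs):
    """instrs: list of (dest, srcs). Returns issue cycles consumed."""
    n, i = 0, 0
    while i < len(instrs):
        n += 1
        if i + 1 < len(instrs) and instrs[i][0] not in instrs[i + 1][1]:
            i += 2          # second instruction is independent: dual-issue
        else:
            i += 1          # dependent: issue alone
    return n

independent = [("r0", ["a"]), ("r1", ["b"]), ("r2", ["c"]), ("r3", ["d"])]
dependent   = [("r0", ["a"]), ("r1", ["r0"]), ("r2", ["r1"]), ("r3", ["r2"])]
print(cycles(independent), cycles(dependent))   # 2 vs 4 cycles for the same instruction count
```

A dependent chain takes twice as many issue slots as the same number of independent instructions, which is the kind of idle ALU time being argued about.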

Jawed said:
With predication on each pixel. See the ATI patent on nested control flow:

http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=US2005154864&F=0

The big deal about Xenos is the 64-pixel batches. As opposed to 1024 in RSX. Makes a vast difference in whether code with a dynamic branch is faster or not. The ideal would be 1-pixel batches. But the scheduler/register file/batch queue would be ginormous. So the compromise is an 8x8 tile. As opposed to RSX's 32x32 tile.

Jawed

So Xenos can't do per-pixel dynamic branching then, as you suggested...
And can you show me a link where RSX has 1024 pixel batches? IIRC, someone on this board hinted at 256...
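For a feel of why the batch size matters so much here, a small expected-value sketch (the 0.5% branch density is an arbitrary assumption, and the 1024 figure is exactly the number being disputed):

```python
# If any pixel in a batch takes the expensive branch, the whole batch pays for
# it (predication). p is the true fraction of pixels that need the branch; the
# function returns the extra fraction dragged along just by sharing a batch.
def dragged_along(batch_size, p=0.005):
    return (1 - (1 - p) ** batch_size) - p

for size in (1, 8 * 8, 32 * 32):        # ideal, an 8x8 tile, a 32x32 tile
    print(size, round(dragged_along(size), 3))
# 1 -> 0.0, 64 -> ~0.27, 1024 -> ~0.99: small batches waste far less work on divergence
```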

Dave Baumann said:
I'm illustrating that these "if Xenos does this and RSX does that with the vertex shaders / pixel shaders one is less powerful than the other" trails are just stupid. It completely ignores the structure of the ALUs themselves, it ignores the arrangement of the ALUs, it ignores their efficiency - I mean we haven't even counted the extra 16 (filtered) texture address processors on Xenos, which use ALU cycles on RSX; do we know if RSX has dedicated shader interpolators either?

Okay, your post wasn't clear; with the numbers you presented, it seemed to be contradicting my reply to Scooby.

Yes, I agree with your post in general. But my reply to Scooby was an extreme case that shows the boundaries/limits of the architectures, including the CPU. You have a narrow set of options. In that extreme scenario you can get an 'idea' of the options available to you and attempt a comparison. And it was addressing Scooby's comment about Cell:

"If PS3 needs to use CELL to achieve the same graphics processing power as the Xenos, then doesn't much of the PS3 CPU advantage go out the window?"

And no... Cell's advantage doesn't go out of the window...
 
Lysander said:
what demos? realtime, ingame, "realtime" cinematic, prerendered, prerendered with only 1 3d frame real time....

We've seen the Sega Lindbergh (whatevah!) demos, and also the realtime PS3 demos running on G70s (UE3 at 60fps, Fight Night, FFVII at 60fps, MGS4 at 60?, Warhawk, Heavenly Sword - the inside areas ran OK, iirc, no?). Xenos graphics should not just be on par, but should actually put such demos to shame (close transistor count, unified, supposedly practically 'free' 4xAA+HDR, tons of extra goodies, and at a higher clock).

I've yet to see a 360 graphics demonstration that not just equals but pretty much completely outclasses said G70 demos.
blakjedi said:
Um, no, what you've been seeing is a significant real-world advantage of the 7800 GTX over the X800/850... the latter of which, I might add, is not SM3.0 compliant nor as fast.

What you are seeing in launch games is X800-level development with Xenos optimisations. I'd wager that the improvement between what you see on X360 to date and next fall (a full year with the actual Xenos) will be significant. Much of the Xenos panning here will go by the wayside...
These games were supposed to be developed with the final h/w design in mind, at least the first-party games. There's also talk of a likely 30fps for several games and, from what we've read in this thread, talk of using 2x rather than 4x AA in final s/w - not something you'd expect if they were barely scratching the surface of the machine.

So it's: a) tough to dev for/port to, b) devs are being supah lazy, or c) it's not beyond the visual range of what's possible on G70-based h/w.

ihamoitc2005 said:
So this scene is constructed entirely of shadows without textures or pixels? You will have to educate me on this.

By the way, triangle setup limit on Xenos is 500M per sec no? So how many triangles per clock?

Not a scene - I think what he means is that today we calc all the shadows of a scene in a single step during the rendering of that scene (and that during that step Xenos would have an advantage), rather than calc shadows as we go about rendering the scene. Faf commented that this was a non-issue because it doesn't require much VS capability before something else (setup) becomes the limiting factor.
 
Whether he was trolling or not, I don't think this forum as a whole takes such accusations from newbie members with much credibility. In fact, I think the allegation alone works against the accuser more likely than not.
 
ecliptic said:
I got a better question. How come trolls such as seismologist are allowed to thrive here?

I don't think he is a troll. He is just a Sony/PS3 fan that asked a question and let his bias slip in a bit. Regardless, the thread is still interesting.
 
zidane1strife said:
We've seen the Sega Lindbergh (whatevah!) demos, and also the realtime PS3 demos running on G70s (UE3 at 60fps, Fight Night, FFVII at 60fps, MGS4 at 60?, Warhawk, Heavenly Sword - the inside areas ran OK, iirc, no?). Xenos graphics should not just be on par, but should actually put such demos to shame (close transistor count, unified, supposedly practically 'free' 4xAA+HDR, tons of extra goodies, and at a higher clock).
Have you been paying attention to those correcting you? eDRAM et al. are not necessarily wins. They're trades. Trades are wins, losses or draws (gosh, that sounds like a great TV show idea) depending on the situation. Your expectation of significantly greater graphics - especially amid subjective judgments of style and such - is just not warranted. Stop listening to rabid Xbox fans.
 
Is it just me, or does it seem like Nvidia strongly wants consumers to correlate the G70 with the RSX?

As I recall, during the Xbox launch they used a similar marketing tactic by suggesting the GF3 was the same product that was being used in the Xbox.

I'm not condemning this strategy. But I do think it's pretty potent. Consumers currently in the market for GFX hardware will see the G70 line as technology from future consoles on their desktops today. And those purchasing after the PS3 is released will look for Nvidia products when wanting similar visual quality on their desktops.

Nvidia stands to gain a lot by being discreet about the difference between RSX and G70 for as long as possible.
 
Dave Baumann said:
I'm illustrating that these "if Xenos does this and RSX does that with the vertex shaders / pixel shaders one is less powerful than the other" trails are just stupid. It completely ignores the structure of the ALUs themselves, it ignores the arrangement of the ALUs, it ignores their efficiency - I mean we haven't even counted the extra 16 (filtered) texture address processors on Xenos, which use ALU cycles on RSX; do we know if RSX has dedicated shader interpolators either?

We know the NV40 @ 400MHz was almost equal to the R420 @ 500MHz (which uses a separate texture address processor design) in most games, and the NV40 is faster in 3DMark05's pixel shader test and ShaderMark 2.1.

Now the case is RSX @ 550MHz compared with Xenos @ 500MHz:

RSX = [(8+2)*8 VS + (8+8)*24 PS] * 550MHz = 255.2 GFLOPS (FP32, and supposing Sony does not use the delta geometry clocking tech)

Xenos = (8+2)*48 US * 500MHz = 240 GFLOPS (FP32)

When Xenos uses 1 bank to do VS processing, that means there are only 2 banks to do PS; in this case Xenos's pixel shader power is 8*32 US * 500MHz = 128 GFLOPS and the vertex shader power is 8*16 US * 500MHz = 64 GFLOPS.

I think in most games Xenos will have to run VS and PS at the same time much of the time; if so, Xenos's pixel shader power will be lower than RSX's, but its VS power will be much higher.
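Restating that arithmetic in one place (the FLOPs-per-ALU counts are the poster's assumptions, and the counting method itself is disputed in the reply below):

```python
# Peak FP32 programmable-shader FLOPS under the assumptions in the post above.
GIGA = 1e9

rsx_vs = (8 + 2) * 8            # 8 vertex pipes, assumed 10 FLOPs/clock each
rsx_ps = (8 + 8) * 24           # 24 pixel pipes, assumed 16 FLOPs/clock each
rsx_gflops   = (rsx_vs + rsx_ps) * 550e6 / GIGA      # 255.2
xenos_gflops = (8 + 2) * 48 * 500e6 / GIGA           # 240.0

# The "1 bank on VS" split (note the +2 scalar op is dropped here, as in the post):
xenos_ps_split = 8 * 32 * 500e6 / GIGA               # 128.0
xenos_vs_split = 8 * 16 * 500e6 / GIGA               # 64.0

print(round(rsx_gflops, 1), xenos_gflops, xenos_ps_split, xenos_vs_split)
```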
 
dukmahsik said:
xenos is mightily impressive reading from all this
Yes, it's a very groovy processor. Lots of clever things going on and all sorts of potential. No-one's dissing it. The argument here is...

'Xenos is the greatest GPU ever, and ATi won't have anything comparable for 5 years, and nVidia won't have anything comparable for 15 years, and RSX is poo by comparison'

...vs...

'Xenos is nice and all, but it's not God's greatest gift to humanity, and in many situations RSX might well perform as well if not better, and a lot of marketing speak is going round to hype Xenos when there's no real-world evidence to support claims of superior performance.'

Very similar to 'Cell is da greatest CPU evarrrrr, and it'll rule the world and XeCPU is poo by comparison' vs. 'Cell is nice and all but it's not God's greatest gift to humanity and in many situations XeCPU might well perform as well if not better.' Though in the case of Cell, at least we have some real-world examples of it in action to base such opinions on, one way or another.
 
cho said:
We know the NV40 @ 400MHz was almost equal to the R420 @ 500MHz (which uses a separate texture address processor design) in most games, and the NV40 is faster in 3DMark05's pixel shader test and ShaderMark 2.1.
And this comparison is totally meaningless in relation to Xenos because R420 has a significantly different ALU structure and shader engine from NV40, G70 and Xenos. It's a pointless comparison.

Now the case is RSX @ 550MHz compared with Xenos @ 500MHz:

RSX = [(8+2)*8 VS + (8+8)*24 PS] * 550MHz = 255.2 GFLOPS (FP32, and supposing Sony does not use the delta geometry clocking tech)

Xenos = (8+2)*48 US * 500MHz = 240 GFLOPS (FP32)

I'm not sure why you are calculating in such a fashion, since Xenos's ALUs are always capable of working on "5D" vectors - in terms of peak math capability per cycle Xenos's ALUs are higher, and even more so if we account for another 16 floating point filtered texture address processors; this is outside of the number of instructions actually calculated, because both are significantly different and getting them to meet their peak "ops per cycle" is highly dependent on the code and also not always very likely.

When Xenos uses 1 bank to do VS processing, that means there are only 2 banks to do PS; in this case Xenos's pixel shader power is 8*32 US * 500MHz = 128 GFLOPS and the vertex shader power is 8*16 US * 500MHz = 64 GFLOPS.
Again, you are doing meaningless calculations - Xenos is not going to "use one bank to do VS"; you can't calculate the number of ops in such a manner because the shader ALUs are completely re-allocatable between VS and PS operations on a per-batch basis - almost never will you find a "full second" where a third of the rendering resources is dedicated to VS tasks.

This, of course, leads directly into the argument of workload-type efficiency - RSX will have a fixed VS/PS ratio, meaning that to hit peak efficiency between the units the workload has to be distributed according to that ratio; if we have a VS:PS complexity of 1:5 in the game then that's not going to benefit RSX, and if we have 5:1 that's also not going to benefit RSX. In both cases Xenos will reassign the workload to best make use of the ALUs available, and in neither case would your "when Xenos uses 1 bank to do VS processing" calculation even come close to panning out!
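As a toy model of that workload-ratio argument (unit counts as argued above, the "unified" case idealised to perfect reassignment):

```python
# A fixed 8 VS / 24 PS split versus a fully reassignable shader array, for
# different VS:PS demand ratios. "Utilisation" = fraction of the shader units
# kept busy when the proportionally most-overloaded stage throttles the frame.
def fixed_split_utilisation(vs_demand, ps_demand, vs_units=8, ps_units=24):
    throttle = min(vs_units / vs_demand, ps_units / ps_demand)
    busy = min(vs_demand * throttle, vs_units) + min(ps_demand * throttle, ps_units)
    return busy / (vs_units + ps_units)

for vs, ps in [(1, 5), (1, 3), (5, 1)]:
    unified = 1.0   # a unified array reassigns ALUs per batch, so (ideally) nothing idles
    print(f"VS:PS {vs}:{ps}  fixed split {fixed_split_utilisation(vs, ps):.0%}  unified {unified:.0%}")
# 1:5 -> 90%, 1:3 (the fixed hardware's own ratio) -> 100%, 5:1 -> 30% for the fixed split
```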
 
Dave Baumann said:
And, FYI, the logic portion of Xenos works out at 257M transistors.

Thanks for the reminder (I had said 252M). Like I mentioned earlier, G70 minus Purevideo (Purevideo 1 was a tad over 20M; I believe Purevideo 2 has had some upgrades) would put G70 in the 275-280M transistor range. So we are looking at an 8% difference (and a 16% difference if we toss in Purevideo, which does not do anything for shading or rendering anyhow).

To compare within the framework of recent GPU history, NV30 (125M) had almost 14% more transistors than R300 (110M), and NV40 (222M) had almost 39% more transistors than R420 (160M; 26% if you don't count Purevideo on NV40).
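The same percentages worked out explicitly (the ~275M G70-minus-Purevideo figure is the estimate from the post above, not an official number):

```python
# Percentage transistor-count gaps quoted above.
pairs = {
    "NV30 (125M) vs R300 (110M)":                        (125, 110),
    "NV40 (222M) vs R420 (160M)":                        (222, 160),
    "NV40 minus ~20M Purevideo vs R420":                 (202, 160),
    "G70 minus Purevideo (~275M, est.) vs Xenos (257M)": (275, 257),
}
for name, (a, b) in pairs.items():
    print(f"{name}: {100 * (a - b) / b:.0f}% more")
# ~14%, ~39%, ~26%, ~7% (or ~9% at the 280M end of the range) -- in line with the figures quoted
```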

Titiano said:
Thing is, Xenos is not using a similar transistor budget for its shaders as G70/RSX.

Based on the above (basically a repeat of the 1st page and of past posts on this very issue), how can you say that Xenos is not similar to G70? The transistors dedicated to logic are very similar - more so than in significant past generations of desktop GPU hardware. I would argue they are more similar than dissimilar in this respect, given the traditional differences between the two IHVs. As the chips get bigger, the percentage gap between them also becomes less relevant in many ways.

Obviously architecture is the important thing, but it would not seem accurate to say Xenos does not have a similar transistor budget for shaders compared to G70, because it does seem, within the general GPU history of the IHVs, that they are even closer than past gens in a similar performance "class". How effective those transistors are relates to their architecture, bottlenecks and feature capabilities, and is a different discussion (as I highlighted above a couple of times).
 
Fafalada said:
For the last time, filling shadowmaps or volumes is NOT a shader-heavy operation; it's not even remotely close to being one.
With shadowmaps you're talking about rendering stuff where 4 VS could be enough to hit the vertex setup limit of Xenos. It's pretty much a given that RSX will also be setup-limited in this situation - the question is what the triangle setup speed really is. It may be lower or higher than Xenos's, but I am quite sure it's not anywhere close to being 6x lower.
If you want to run the shadowing algorithm on the GPU (instead of wasting CPU time) then vertex shading capability is most definitely relevant.

[attached chart: b3d36.gif]


In the chart I've made 10% the proportion of frame render time that can be spent on the shadowing pass, since you want to spend the majority of the frame render time on the colour pass.

If you want to tessellate the scene and you want to do light-dependent tessellation (i.e. the smoothness of the shadow depends on the distance of the object from the light) as well as viewport-dependent tessellation, then you want all the vertex shader clock cycles you can get hold of.

The conclusion is that RSX depends on Cell to tessellate the scene according to the lighting and requires Cell to assist in shadow hull determination/transformation because of its meagre vertex shading resources - much like the Doom 3 engine has a heavy CPU workload and a very limited polygon budget.

http://www.gamedev.net/reference/articles/article1873.asp

Recently Mark Kilgard pointed out that computing the silhouette edges within the vertex shader may be detrimental to performance if the occluders have high polygon counts or if there are a lot of shadow-casting light sources. This assessment stems from the fact that we need to push more vertices into the pipeline, and all of these have to be passed through the silhouette edge testing within the vertex shader. Consequently, occluders with high polygon counts would generate a large amount of wasted vertices (degenerate quads), and the cost of testing all these extra vertices may not cover the geometry upload savings we get by using vertex shaders! Having more light sources will obviously worsen such a vertex shader implementation further. Hence, an implementation of shadow volumes on programmable vertex hardware should be thoroughly tested to ensure that we have a net performance gain over an implementation utilizing the CPU. If the CPU is needed for heavy A.I. or game logic computation, a vertex shader implementation of shadow volumes may be more efficient. However, it might also be better in many cases to just use the vertex shader as an assist instead of trying to do everything within the vertex shader.
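To ground what the article means by silhouette-edge determination, here is the test done on the CPU in plain Python (a made-up tetrahedron and light position; real engines walk the occluder's edge list, or approximate this in the vertex shader as described above):

```python
from collections import defaultdict

def sub(a, b):   return tuple(x - y for x, y in zip(a, b))
def dot(a, b):   return sum(x * y for x, y in zip(a, b))
def cross(a, b): return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def faces_light(tri, light):
    a, b, c = tri
    n = cross(sub(b, a), sub(c, a))          # face normal (consistent winding assumed)
    return dot(n, sub(light, a)) > 0

def silhouette_edges(tris, light):
    """An edge is on the shadow silhouette when exactly one of its two adjacent
    triangles faces the light -- those are the edges extruded into the volume."""
    edge_faces = defaultdict(list)
    for t in tris:
        for e in ((t[0], t[1]), (t[1], t[2]), (t[2], t[0])):
            edge_faces[tuple(sorted(e))].append(faces_light(t, light))
    return [e for e, lit in edge_faces.items() if len(lit) == 2 and lit[0] != lit[1]]

# Made-up occluder: a tetrahedron with outward-facing winding.
A, B, C, D = (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)
tetra = [(A, C, B), (A, B, D), (A, D, C), (B, C, D)]
print(silhouette_edges(tetra, light=(10, 10, 10)))   # the three edges of face BCD
```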

Jawed
 
Just wanted to blurt this out...

Sorry if it's kind of off-topic :S and is leaning to the annoying "What if's scenarios"...

But I was wondering...

Seeing as how the Cell has performed in the graphics department...

Namely the "Getaway" presentation from E3 and the Satelite-feed Landscape...

I was wondering if Sony went with its original plan to use a cell-based GPU...

Would it have been a more formidable GPU than the one made by Nvidia???
 
Jaws said:
Yeah, I can see that. However, I was replying to the poster who picked an arbitrary number, i.e. 90% efficiency... what exactly is that 90% from the PR measuring?

If anyone is interested, the 95% efficiency was in relation to ATI's current GPU efficiency, which they placed at 50-70% efficient in regards to shader utilization.

So there is a context, and it is a meaningful one in that ATI is comparing a new product and architecture (Xenos and USA) against their own products and established architecture (R300, R420; traditional pipeline with dedicated PS/VS units). ATI is very proud of R300, which has been a huge success for them, so the comparison they draw is not irrelevant. How it plays out in the real world is still unknown, but shader utilization is a known issue and bottleneck. This is not "extreme pipelines" but a real case of "problem, meet solution". How good that solution is, in the real world, well, conjecture away. ATI staked their claim at 95%.
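Put as arithmetic, the claim is only about that utilisation multiplier (the 100 GFLOPS peak below is an arbitrary round number, not either chip's spec):

```python
# ATI's pitch: ~95% shader utilisation for the unified design versus the
# 50-70% they quote for their own traditional parts. Same peak figure,
# different fraction of it doing useful shading work. PR-grade numbers.
peak = 100.0   # arbitrary illustrative peak, in GFLOPS
for label, util in [("traditional, low end", 0.50),
                    ("traditional, high end", 0.70),
                    ("claimed for Xenos", 0.95)]:
    print(f"{label}: {peak * util:.0f} of {peak:.0f} GFLOPS effective")
```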

'Xenos is the greatest GPU ever, and ATi won't have anything comparable for 5 years, and nVidia won't have anything comparable for 15 years, and RSX is poo by comparison'

...vs...

'Xenos is nice and all, but it's not God's greatest gift to humanity, and in many situations RSX might well perform as well if not better, and a lot of marketing speak is going round to hype Xenos when there's no real-world evidence to support claims of superior performance.'

Kind of slanted when some of the first posts are of the "Xenos cannot be more advanced, it is older and RSX will have 6 months of newer tech" nature. Also, the caricature of Xenos fans (which ones, specifically, I would like to know) in regards to the 5/15 year comments and the like is really out of place.

Basically you setup your position as very reasonable and then mocked those who disagree with you by exaggerating--and not so subtly at that--their stance.

I guess I could take the comparison more seriously if it was representative of all the angles present and did not bash your opponent.

I think what you are saying has a point (and said differently could be a good summary of the different positions), but it could have been said more eloquently and fairly. Right now all I get out of it is a very weak attempt at masked name calling and trolling. Comparing a "sensible pro-RSX" perspective against a fictitious and extreme "Xenos best eva" position, while ignoring the "RSX extreme" crowd and those with a more "sensible pro-Xenos" position like Jawed and Dave, just seems to point this thread in a direction it did not need to go. Picking out the best of one group against the worst of another is just unfair. And I would go as far as saying your caricature of the biased Xenos fans really is unfair.

This thread probably did not start off on the best first step, and I almost did not bother to come back and read it because I thought it would implode. But it has not, and the reasoning, even among those who disagree, has been very good.

So let's keep it that way by not setting up straw men or making veiled attempts at name calling. Focus on the issues, not the people :D
 